Index of /public/dataset/RepeatExpansionDisorders_NoAmp
README (Last Updated 06/08/2020)
********************
INTRODUCTION
********************
This README file describes the contents of this directory.
This dataset contains raw, intermediate, and processed files of targeted
sequencing data for 7 samples with repeat-expansion genotypes and 1
control sample with no repeat expansions at the targeted sites. The targeted
loci are HTT and FMR1. The library was sequenced on the Sequel II System,
and the data were processed with community analysis tools from GitHub.
For more information on the No-Amp method, see [1]; for the bioinformatics
analysis, see the PacBio GitHub repository[2] and the additional references below.
********************
SAMPLE
********************
Seven genomic DNA samples from the Coriell Institute and one DNA sample
from the HEK293 cell line:
Samples with HTT CAG repeat expansions
NA13505 with sequencing barcode BC1015
NA13509 with sequencing barcode BC1016
NA20253 with sequencing barcode BC1017
NA14044 with sequencing barcode BC1018
Samples with FMR1 CGG repeat expansions
NA13664 with sequencing barcode BC1020
NA06896 with sequencing barcode BC1021
NA07537 with sequencing barcode BC1022
Sample without known repeat expansions (control):
HEK293 with sequencing barcode BC1019
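For scripted access, the barcode-to-sample table above can be kept as a small lookup. The following is a minimal Python sketch transcribed from this section; BARCODES, file_barcode, and samples_by_gene are hypothetical helpers, not part of the dataset or of RepeatAnalysisTools. Note that file names under analysis/ use the lowercase, paired barcode form (for example bc1015--bc1015).

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Barcode -> (sample ID, gene with a known repeat expansion), transcribed from
# the SAMPLE section. None marks HEK293, the control without known expansions.
BARCODES: Dict[str, Tuple[str, Optional[str]]] = {
    "BC1015": ("NA13505", "HTT"),
    "BC1016": ("NA13509", "HTT"),
    "BC1017": ("NA20253", "HTT"),
    "BC1018": ("NA14044", "HTT"),
    "BC1019": ("HEK293", None),
    "BC1020": ("NA13664", "FMR1"),
    "BC1021": ("NA06896", "FMR1"),
    "BC1022": ("NA07537", "FMR1"),
}


def file_barcode(barcode: str) -> str:
    """Barcode as it appears in analysis/ file names, e.g. 'bc1015--bc1015'."""
    bc = barcode.lower()
    return f"{bc}--{bc}"


def samples_by_gene(barcodes: Dict[str, Tuple[str, Optional[str]]]) -> Dict[str, List[str]]:
    """Group sample IDs by expanded gene; the control is listed under 'none'."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for sample, gene in barcodes.values():
        groups[gene or "none"].append(sample)
    return dict(groups)


if __name__ == "__main__":
    for gene, samples in samples_by_gene(BARCODES).items():
        print(gene, ", ".join(samples))
```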
********************
METHODS
********************
Library Preparation:
Procedure & Checklist – No-Amp Targeted Sequencing Utilizing the CRISPR-Cas9 System (PN 101-801-500)
Sequencing:
Sequel II System with Sequel II Binding Kit 2.0 (PN 101-842-900) and
Sequel II Sequencing Kit 2.0 (4 rxn) (PN 101-820-200)
Run time:
30-hour movie + 0.5-hour pre-extension
Analysis:
PacBio GitHub RepeatAnalysisTools pipeline[2], using the following executable versions for
data preparation:
ccs 4.2.0 (commit v4.2.0-1-g450908e4) (available from pbbioconda[3])
lima 1.11.0 (commit v1.11.0-1-gec618c9)
pbmm2 1.2.0 (commit v1.2.0-1-g31b4be0)
Post-mapping repeat analysis was performed using GitHub scripts[2].
********************
FILE DESCRIPTION
********************
========================
WHAT FILES SHOULD I USE?
========================
Users who want to use the processed, demultiplexed, and mapped results
immediately in third-party tools should use the following BAM files:
analysis/align/
├── m64012_191221_044659.ccsset.bc1015--bc1015.bam
├── m64012_191221_044659.ccsset.bc1016--bc1016.bam
├── m64012_191221_044659.ccsset.bc1017--bc1017.bam
├── m64012_191221_044659.ccsset.bc1018--bc1018.bam
├── m64012_191221_044659.ccsset.bc1019--bc1019.bam
├── m64012_191221_044659.ccsset.bc1020--bc1020.bam
├── m64012_191221_044659.ccsset.bc1021--bc1021.bam
└── m64012_191221_044659.ccsset.bc1022--bc1022.bam
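The eight aligned BAM names above differ only in the barcode, so they can be generated rather than typed. A minimal sketch, assuming the dataset has been downloaded with its directory layout intact; aligned_bam_paths is a hypothetical helper, and the script only reports whether each expected file exists.

```python
from pathlib import Path
from typing import Dict

MOVIE = "m64012_191221_044659"  # movie ID shared by every file name in this dataset


def aligned_bam_paths(root: Path) -> Dict[str, Path]:
    """Map each barcode (bc1015..bc1022) to its mapped BAM under analysis/align/."""
    paths = {}
    for n in range(1015, 1023):
        bc = f"bc{n}"
        # Names follow <movie>.ccsset.<bc>--<bc>.bam (same barcode on both ends).
        paths[bc] = root / "analysis" / "align" / f"{MOVIE}.ccsset.{bc}--{bc}.bam"
    return paths


if __name__ == "__main__":
    root = Path(".")  # assumption: run from the top of the downloaded dataset
    for bc, path in aligned_bam_paths(root).items():
        status = "ok" if path.exists() else "missing"
        print(f"{bc}\t{status}\t{path}")
```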
Additionally, users who want only the extracted repeat-expansion regions, as defined
in the target BED file (see Auxiliary files), should use the following FASTQ files:
analysis/fastq/
├── m64012_191221_044659.ccsset.bc1015--bc1015.extracted_FMR1.fastq
├── m64012_191221_044659.ccsset.bc1015--bc1015.extracted_HTT.fastq
├── m64012_191221_044659.ccsset.bc1016--bc1016.extracted_FMR1.fastq
├── m64012_191221_044659.ccsset.bc1016--bc1016.extracted_HTT.fastq
├── m64012_191221_044659.ccsset.bc1017--bc1017.extracted_FMR1.fastq
├── m64012_191221_044659.ccsset.bc1017--bc1017.extracted_HTT.fastq
├── m64012_191221_044659.ccsset.bc1018--bc1018.extracted_FMR1.fastq
├── m64012_191221_044659.ccsset.bc1018--bc1018.extracted_HTT.fastq
├── m64012_191221_044659.ccsset.bc1019--bc1019.extracted_FMR1.fastq
├── m64012_191221_044659.ccsset.bc1019--bc1019.extracted_HTT.fastq
├── m64012_191221_044659.ccsset.bc1020--bc1020.extracted_FMR1.fastq
├── m64012_191221_044659.ccsset.bc1020--bc1020.extracted_HTT.fastq
├── m64012_191221_044659.ccsset.bc1021--bc1021.extracted_FMR1.fastq
├── m64012_191221_044659.ccsset.bc1021--bc1021.extracted_HTT.fastq
├── m64012_191221_044659.ccsset.bc1022--bc1022.extracted_FMR1.fastq
└── m64012_191221_044659.ccsset.bc1022--bc1022.extracted_HTT.fastq
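As a quick sanity check on an extracted FASTQ, one can count the longest uninterrupted run of the target motif in each read (CAG for HTT, CGG for FMR1). This is a naive sketch, not the motif-counting method used by RepeatAnalysisTools: it assumes well-formed four-line FASTQ records, stops counting at any interruption (such as the AGG interruptions common in FMR1 alleles), and does not search for the reverse-complement motif (CTG or CCG), which may also be needed if the extracted reads are not strand-normalized. read_fastq and longest_motif_run are hypothetical names.

```python
from typing import Iterable, Iterator, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (read name, sequence) pairs from well-formed 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.strip()
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it).strip()
        next(it)  # '+' separator line
        next(it)  # quality string (unused here)
        yield header[1:].split()[0], seq


def longest_motif_run(seq: str, motif: str) -> int:
    """Longest uninterrupted tandem run of `motif` in `seq`, in motif units."""
    seq, motif = seq.upper(), motif.upper()
    k, best = len(motif), 0
    for offset in range(k):  # check each of the k possible reading frames
        run = 0
        for i in range(offset, len(seq) - k + 1, k):
            run = run + 1 if seq[i:i + k] == motif else 0
            best = max(best, run)
    return best
```

For example, open one of the extracted_HTT.fastq files listed above, iterate over read_fastq(fh), and call longest_motif_run(seq, "CAG") on each sequence.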
Plots of all on-target reads, including waterfall plots and expansion-size distributions,
as well as per-read motif counts, can be found in the reports directory:
analysis/reports/
├── m64012_191221_044659.ccsset.bc1015--bc1015.extracted_FMR1.counts.csv
├── m64012_191221_044659.ccsset.bc1015--bc1015.extracted_FMR1.insertSize.png
├── m64012_191221_044659.ccsset.bc1015--bc1015.extracted_FMR1.motifCount.png
├── m64012_191221_044659.ccsset.bc1015--bc1015.extracted_FMR1.waterfall.pdf
├── m64012_191221_044659.ccsset.bc1015--bc1015.extracted_HTT.counts.csv
├── m64012_191221_044659.ccsset.bc1015--bc1015.extracted_HTT.insertSize.png
├── m64012_191221_044659.ccsset.bc1015--bc1015.extracted_HTT.motifCount.png
├── m64012_191221_044659.ccsset.bc1015--bc1015.extracted_HTT.waterfall.pdf
... (truncated)
Clustered per-allele results with confidence intervals on repeat expansions
and colorized BAMs (for viewing in IGV) can be found in the cluster directory:
analysis/cluster/
├── m64012_191221_044659.ccsset.bc1015--bc1015.FMR1.hptagged.bam
├── m64012_191221_044659.ccsset.bc1015--bc1015.FMR1.hptagged.bam.bai
├── m64012_191221_044659.ccsset.bc1015--bc1015.FMR1.readnames.txt
├── m64012_191221_044659.ccsset.bc1015--bc1015.FMR1.summary.csv
├── m64012_191221_044659.ccsset.bc1015--bc1015.HTT.hptagged.bam
├── m64012_191221_044659.ccsset.bc1015--bc1015.HTT.hptagged.bam.bai
├── m64012_191221_044659.ccsset.bc1015--bc1015.HTT.readnames.txt
├── m64012_191221_044659.ccsset.bc1015--bc1015.HTT.summary.csv
... (truncated)
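The cluster summaries report per-allele confidence intervals; how RepeatAnalysisTools computes them is not described in this README. To illustrate the general idea only, the sketch below swaps in a generic technique, a percentile bootstrap of the median repeat count across the reads assigned to one allele. The function name, the default of 2,000 resamples, and the 95% level are assumptions for illustration.

```python
import random
import statistics
from typing import List, Tuple


def bootstrap_median_ci(counts: List[int], n_boot: int = 2000,
                        alpha: float = 0.05, seed: int = 0) -> Tuple[float, float, float]:
    """Median repeat count of one allele cluster with a percentile-bootstrap CI.

    Generic illustration, not the RepeatAnalysisTools method.
    """
    if not counts:
        raise ValueError("need at least one read")
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    # Resample reads with replacement and record the median of each resample.
    medians = sorted(
        statistics.median(rng.choices(counts, k=len(counts))) for _ in range(n_boot)
    )
    lo = medians[int((alpha / 2) * n_boot)]
    hi = medians[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return statistics.median(counts), lo, hi
```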
========================
Raw Subreads
========================
The rawMovie/ directory contains the subreads BAM file from the sequencing movie,
together with its .pbi index and supporting files.
rawMovie/
├── m64012_191221_044659.adapters.fasta
├── m64012_191221_044659.sts.xml
├── m64012_191221_044659.subreads.bam
├── m64012_191221_044659.subreads.bam.pbi
└── md5sums.txt
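After downloading rawMovie/, the transfer can be checked against md5sums.txt. A minimal sketch, assuming md5sums.txt follows the GNU md5sum text format (one "<hex digest>  <file name>" entry per line) with names relative to rawMovie/; the exact layout of that file has not been confirmed here. md5_of and verify_md5sums are hypothetical helpers. Under the same assumption, running md5sum -c md5sums.txt from inside rawMovie/ performs the equivalent check.

```python
import hashlib
from pathlib import Path
from typing import Dict


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in 1 MiB chunks so large BAMs stay out of memory."""
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_md5sums(listing: Path) -> Dict[str, bool]:
    """Check each '<md5>  <file>' line; names resolve relative to the listing's directory.

    Assumes GNU md5sum text format. Missing files are reported as False.
    """
    results = {}
    for line in listing.read_text().splitlines():
        if not line.strip():
            continue
        digest, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # md5sum marks binary-mode entries with '*'
        target = listing.parent / name
        results[name] = target.is_file() and md5_of(target) == digest.lower()
    return results
```

For example, verify_md5sums(Path("rawMovie/md5sums.txt")) returns a dict with one True/False entry per listed file.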
========================
Auxiliary files
========================
See the auxiliary/ directory for the barcoded adapter sequences and the target BED file.
auxiliary/
├── Barcoded_Adapter_8B.fasta
└── human_hs37d5.targets_repeatonly.bed
The hs37d5 reference (GRCh37/hg19 with decoy sequences) can be downloaded here:
ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz
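The target BED file uses the standard BED convention of 0-based, half-open coordinates, while region strings for tools such as samtools and IGV are 1-based and closed. The sketch below parses the BED file and converts each target; it assumes the fourth column holds a target name and falls back to the coordinates if it is absent. Note that hs37d5 names chromosomes without a "chr" prefix (for example 4 and X). Target, read_bed, and to_region are hypothetical helpers.

```python
from typing import Iterable, List, NamedTuple


class Target(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # 0-based, exclusive
    name: str


def read_bed(lines: Iterable[str]) -> List[Target]:
    """Parse BED lines into Target records, skipping comments and track/browser lines."""
    targets = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        # Assumption: column 4, if present, names the target (e.g. the gene).
        name = fields[3] if len(fields) > 3 else f"{fields[0]}:{fields[1]}-{fields[2]}"
        targets.append(Target(fields[0], int(fields[1]), int(fields[2]), name))
    return targets


def to_region(t: Target) -> str:
    """1-based, closed region string such as samtools view accepts."""
    return f"{t.chrom}:{t.start + 1}-{t.end}"
```

For example, open auxiliary/human_hs37d5.targets_repeatonly.bed, pass the file handle to read_bed, and print t.name and to_region(t) for each target.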
========================
Intermediate Results
========================
The analysis/ccs/ and analysis/demux/ directories contain BAM files of unaligned CCS reads
and unaligned demultiplexed reads, respectively.
********************
REFERENCES
********************
[1] PacBio No-Amp landing page: https://www.pacb.com/applications/targeted-sequencing/no-amp-targeted-sequencing/
[2] Community tool RepeatAnalysis: https://github.com/PacificBiosciences/apps-scripts/tree/master/RepeatAnalysisTools
[3] PacBio pbbioconda landing page: https://github.com/PacificBiosciences/pbbioconda