Index of /public/onso/2024Q1/Twist_Exome_2p0
Name Last modified Size Description
checksums/ 2024-03-26 16:23 -
fastqs/ 2024-03-26 16:29 -
README.txt 2024-03-13 12:35 2.5K
# PacBio Onso Sequencing of NA12878 using Twist Exome 2.0 hybrid capture
## Legal disclaimer
All trademarks, trade names, or logos mentioned or used are the property of their respective owners.
## Data
The "Onso_Twist_Exome" folder contains target-enriched NA12878 library reads sequenced on a PacBio Onso instrument in San Diego, CA. The reads contain Illumina P5/P7 adapters, which should be trimmed prior to analysis (see the Trimming section). The libraries were sequenced with paired-end 2x100 bp sequencing chemistry and demultiplexed using an in-house tool.
### Samples
In total there are eight replicates of the same sample, all sequenced using the PE100 run configuration.
### Read filtering and subsampling
Prior to release, reads were demultiplexed using an internal tool. Read pairs were then discarded if one or both reads were shorter than 100 bp. This was performed using the following example command:
```bash
# Discard a read pair if either read is shorter than 100 bp
cutadapt \
--minimum-length 100 \
-j 10 \
-o {output.fastq1} \
-p {output.fastq2} \
{input.fastq1} \
{input.fastq2}
```
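As a quick sanity check on the length filter, the number of reads below the threshold can be counted with `awk`. This is a minimal sketch, not part of the original workflow: the function name `count_short_reads` and the demo file are hypothetical, and it assumes uncompressed FASTQ with standard four-line records.

```bash
#!/usr/bin/env bash
# Sketch: count reads shorter than a minimum length in an uncompressed FASTQ.
# Assumes four-line FASTQ records (no wrapped sequence lines).
count_short_reads() {
    local fastq="$1" min_len="${2:-100}"
    # The sequence is the 2nd line of every 4-line record.
    awk -v min="$min_len" \
        'NR % 4 == 2 && length($0) < min { n++ } END { print n + 0 }' "$fastq"
}

# Tiny demonstration on a hypothetical two-read file with a 5 bp threshold.
demo_fastq="$(mktemp)"
printf '@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nACG\n+\nIII\n' > "$demo_fastq"
count_short_reads "$demo_fastq" 5   # prints 1 (only r2 is shorter than 5 bp)
rm -f "$demo_fastq"
```

On a correctly filtered file, the count with the default threshold of 100 should be 0 for both R1 and R2.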
Reads were then merged across both lanes (L01 and L02) and subsampled to 60 million reads (30 million read pairs) per sample using `seqtk`. Two samples had fewer than 60 million reads, so all reads from those samples are shared. The same seed (`-s100`) is used for both files so that read pairing is preserved. Subsampling was performed with the following example commands:
```bash
seqtk sample \
-s100 \
{input.fastq1} 30000000 | gzip > {output.fastq1}
seqtk sample \
-s100 \
{input.fastq2} 30000000 | gzip > {output.fastq2}
```
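After subsampling, it can be useful to confirm that the R1 and R2 files still contain the same number of reads. The sketch below is illustrative only: the function names `count_reads` and `check_pair_counts` are hypothetical, and it assumes gzip-compressed FASTQ with four-line records.

```bash
#!/usr/bin/env bash
# Sketch: count reads in a gzipped FASTQ and compare R1 against R2.

count_reads() {
    local lines
    lines="$(gzip -cd "$1" | wc -l)"
    # Each FASTQ record spans four lines, so reads = lines / 4.
    echo $(( lines / 4 ))
}

check_pair_counts() {
    local n1 n2
    n1="$(count_reads "$1")"
    n2="$(count_reads "$2")"
    if [ "$n1" -eq "$n2" ]; then
        echo "OK: $n1 read pairs"
    else
        echo "MISMATCH: R1=$n1 R2=$n2"
        return 1
    fi
}

# Example call (placeholders as in the commands above):
#   check_pair_counts {output.fastq1} {output.fastq2}
```

For a fully subsampled sample, the expected result is 30000000 read pairs. The check only compares counts; it does not verify that read names match record by record.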
### Data processing
Below are the analysis steps performed by PacBio internally to assess the data's quality.
#### Trimming
The Illumina P5/P7 adapters were trimmed using `cutadapt` with the following example command:
```bash
cutadapt \
-a AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT \
-A AGATCGGAAGAGCACACGTCTGAACTCCAGTCA \
--overlap 3 \
-j 10 \
-o {output.fastq1} \
-p {output.fastq2} \
{input.fastq1} \
{input.fastq2}
```
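One rough way to gauge trimming is to count reads that still contain the start of the adapter. The sketch below searches for `AGATCGGAAGAGC`, the 13-base prefix shared by the two adapter sequences in the command above. The function name `count_adapter_reads` is hypothetical, the input is assumed to be uncompressed four-line FASTQ, and only exact matches are counted, so a small number of hits may reflect genuine genomic sequence rather than leftover adapter.

```bash
#!/usr/bin/env bash
# Sketch: count reads whose sequence contains an exact adapter prefix.
# Assumes uncompressed FASTQ with four-line records.
count_adapter_reads() {
    local fastq="$1" adapter="${2:-AGATCGGAAGAGC}"
    # index() returns 0 when the adapter string is absent from the sequence line.
    awk -v ad="$adapter" \
        'NR % 4 == 2 && index($0, ad) > 0 { n++ } END { print n + 0 }' "$fastq"
}

# Example: compare counts before and after trimming (placeholders as above,
# decompressed first if the files are gzipped):
#   count_adapter_reads {input.fastq1}
#   count_adapter_reads {output.fastq1}
```

A sharp drop in the count between the input and the trimmed output suggests the adapters were removed as intended.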
#### Alignment
Reads were aligned to the GRCh38 reference without alt contigs (hg38_no_alt) using BWA-MEM, and duplicates were marked with Samtools. Alignment and duplicate marking were performed with the following example commands:
```bash
bwa mem -t12 \
{REF} \
{input.fastq1} {input.fastq2} | \
samtools sort -n - | \
samtools fixmate -m - - | \
samtools sort - | \
samtools markdup - - \
> {output.bam_out}
```
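To summarize the result of duplicate marking, the records carrying the SAM duplicate flag (bit `0x400`, decimal 1024) can be counted from SAM text. This is a minimal sketch with a hypothetical function name, `duplicate_fraction`. It counts every alignment record, including secondary and supplementary ones, so the figure is only approximate.

```bash
#!/usr/bin/env bash
# Sketch: report "duplicates/total" for SAM records read from stdin.
duplicate_fraction() {
    awk -F '\t' '
        /^@/ { next }                        # skip SAM header lines
        { total++ }
        int($2 / 1024) % 2 == 1 { dup++ }    # FLAG bit 0x400 marks a duplicate
        END { printf "%d/%d\n", dup, total }
    '
}

# Example call (placeholder as in the command above):
#   samtools view -h {output.bam_out} | duplicate_fraction
```

The bit test uses integer division rather than a bitwise function so that it works in any POSIX `awk`.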
*Rev 2024-03-13*