# PacBio Onso Sequencing of NA12878 using Twist Exome 2.0 hybrid capture
## Legal disclaimer
All trademarks, trade names, or logos mentioned or used are the property of their respective owners.
## Data
The "Onso_Twist_Exome" folder contains target-enriched NA12878 library reads sequenced on a PacBio Onso instrument in San Diego, CA. The reads contain Illumina P5/P7 adapters, which should be trimmed prior to analysis (see "Trimming" below). The libraries were sequenced with paired-end 2x100 bp chemistry and demultiplexed using an in-house tool.
### Samples
In total there are eight replicates of the same sample, all sequenced using the PE100 run configuration.
### Read filtering and subsampling
Prior to release, reads were demultiplexed using an internal tool. Read pairs were then excluded if one or both reads were shorter than 100 bp. This was performed using the following example command:
```bash
cutadapt \
    --minimum-length 100 \
    -j 10 \
    -o {output.fastq1} \
    -p {output.fastq2} \
    {input.fastq1} \
    {input.fastq2}
```
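As a quick sanity check on the length filter, the minimum read length in a released file can be confirmed with standard tools. The helper below is a minimal sketch and not part of the original workflow; the function name `fastq_min_len` is hypothetical, and it assumes gzipped FASTQ files with four lines per record.

```bash
# Hypothetical helper (not part of the original workflow): print the number of
# reads and the shortest read length in a gzipped FASTQ file.
# Assumes 4-line FASTQ records (no wrapped sequence lines).
fastq_min_len() {
    gzip -dc "$1" | awk 'NR % 4 == 2 {
        n++                                   # count sequence lines
        if (min == "" || length($0) < min) min = length($0)
    } END { print n + 0, min + 0 }'
}
```

Calling `fastq_min_len` on each FASTQ file prints two numbers: the read count and the shortest read length. Under the filter described above, the second number should be 100 or more, and the read counts of the two files in a pair should match.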
Reads were then merged across both lanes (L01 and L02) and subsampled to 60 million reads (30 million read pairs) per sample using `seqtk`. Two samples had fewer than 60 million reads, so all reads from those samples are shared. This was performed with the following example commands:
```bash
# The same seed (-s100) is used for both files so that read pairs stay matched.
seqtk sample \
    -s100 \
    {input.fastq1} 30000000 | gzip > {output.fastq1}
seqtk sample \
    -s100 \
    {input.fastq2} 30000000 | gzip > {output.fastq2}
```
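Because the two files of a pair are subsampled separately, pairing is preserved only when both `seqtk sample` calls use the same seed. The sketch below is one way to confirm that two files still list the same read names in the same order. The function names are hypothetical, and it assumes 4-line FASTQ records whose names either carry a `/1` or `/2` suffix or are followed by a space-separated comment. It compares a CRC checksum and byte count of the two name lists rather than the lists themselves.

```bash
# Hypothetical helpers (not part of the original workflow).

# Print one read name per record, without the leading "@", any comment field,
# or a trailing /1 or /2 mate suffix.
fastq_names() {
    gzip -dc "$1" | awk 'NR % 4 == 1 {
        sub(/^@/, "", $1)
        sub(/\/[12]$/, "", $1)
        print $1
    }'
}

# Succeed (exit status 0) only if both files yield the same name list,
# judged by the CRC and byte count reported by cksum.
pairs_in_sync() {
    _sum1=$(fastq_names "$1" | cksum)
    _sum2=$(fastq_names "$2" | cksum)
    [ "$_sum1" = "$_sum2" ]
}
```

Given the two subsampled files of a pair as arguments, `pairs_in_sync` returns a zero exit status when the name lists agree and a non-zero status otherwise.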
### Data processing
Below are the analysis steps performed by PacBio internally to assess the data's quality.
#### Trimming
The Illumina P5/P7 adapters were trimmed using `cutadapt` with the following example command:
```bash
cutadapt \
    -a AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT \
    -A AGATCGGAAGAGCACACGTCTGAACTCCAGTCA \
    --overlap 3 \
    -j 10 \
    -o {output.fastq1} \
    -p {output.fastq2} \
    {input.fastq1} \
    {input.fastq2}
```
#### Alignment
Reads were aligned to the GRCh38 reference without alt contigs (hg38_no_alt) using BWA-MEM. Samtools was used to mark duplicates. Alignment and duplicate marking were performed with the following example commands:
```bash
bwa mem -t12 \
    {REF} \
    {input.fastq1} {input.fastq2} | \
    samtools sort -n - | \
    samtools fixmate -m - - | \
    samtools sort - | \
    samtools markdup - - \
    > {output.bam_out}
```
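Since `samtools markdup` marks duplicates by setting SAM flag bit 0x400 rather than removing the reads, the duplicate rate can be summarized directly from the flags. The function below is a minimal sketch, not part of the original workflow. The name `dup_fraction` is hypothetical, and ignoring secondary and supplementary alignments is an assumption made here for illustration.

```bash
# Hypothetical helper (not part of the original workflow): read SAM records on
# stdin and print "<primary alignments> <flagged duplicates> <fraction>".
dup_fraction() {
    awk -F '\t' '
        /^@/ { next }                                   # skip SAM header lines
        {
            flag = $2
            # Ignore secondary (0x100) and supplementary (0x800) alignments.
            if (int(flag / 256) % 2 || int(flag / 2048) % 2) next
            total++
            if (int(flag / 1024) % 2) dup++             # 0x400: marked duplicate
        }
        END { printf "%d %d %.4f\n", total, dup, (total ? dup / total : 0) }
    '
}
```

A typical call would pipe the BAM through `samtools view`, for example `samtools view {output.bam_out} | dup_fraction`.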
*Rev 2024-03-13*