# PacBio Onso Sequencing of NA12878 using Twist Exome 2.0 hybrid capture

## Legal disclaimer

All trademarks, trade names, or logos mentioned or used are the property of their respective owners.

## Data

The "Onso_Twist_Exome" folder contains target-enriched NA12878 library reads sequenced on a PacBio Onso instrument in San Diego, CA. The reads contain Illumina P5/P7 adapters, which should be trimmed prior to analysis (see "Trimming" below). The libraries were sequenced with paired-end 2x100 bp chemistry and demultiplexed using an in-house tool.

### Samples

There are eight replicates of the same sample in total, all sequenced using the PE100 run configuration.

### Read filtering and subsampling

Prior to release, reads were demultiplexed using an internal tool. Read pairs were then excluded if one or both reads were shorter than 100 bp. This was performed using the following example command:

```bash
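# Discard a pair if either read is shorter than 100 bp
# (cutadapt's default --pair-filter=any applies to paired-end input).
cutadapt \
    --minimum-length 100 \
    -j 10 \
    -o {output.fastq1} \
    -p {output.fastq2} \
    {input.fastq1} \
    {input.fastq2}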
```
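As a quick sanity check on the length filter described above, the number of reads shorter than the threshold can be counted directly from a gzipped FASTQ. The following is a minimal sketch and not part of the original workflow: `count_short_reads` is a hypothetical helper, and it assumes standard four-line FASTQ records (no wrapped sequences).

```bash
# Hypothetical helper (not part of the original workflow).
# Usage: count_short_reads <fastq.gz> <min_length>
# Assumes four-line FASTQ records; prints the number of reads shorter than min_length.
count_short_reads() {
    gzip -dc "$1" | awk -v min="$2" '
        # Line 2 of every 4-line record holds the sequence.
        NR % 4 == 2 && length($0) < min { n++ }
        END { print n + 0 }'
}
```

For a correctly filtered file, `count_short_reads {output.fastq1} 100` would be expected to print `0`.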

Reads were then merged across both lanes (L01 and L02) and subsampled to 60 million reads (30 million read pairs) per sample using `seqtk`. Two samples had fewer than 60 million reads, so all reads from those samples are shared. This was performed with the following example commands:

```bash
# Use the same seed (-s100) for both files so that read pairs stay in sync.
seqtk sample \
    -s100 \
    {input.fastq1} 30000000 | gzip > {output.fastq1}

seqtk sample \
    -s100 \
    {input.fastq2} 30000000 | gzip > {output.fastq2}
```
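The exact lane-merging command is not given here. One common approach, shown as a sketch only, is to concatenate the per-lane gzipped files (concatenated gzip streams remain valid gzip) and then count reads to see whether a sample reaches the 30 million pair target. The helper names `merge_lanes` and `count_reads` and the file names in the comments are hypothetical.

```bash
# Hypothetical helpers (assumptions, not the tool used for the released data).

# Usage: merge_lanes <merged.fastq.gz> <lane1.fastq.gz> [<lane2.fastq.gz> ...]
# Concatenates per-lane gzipped FASTQ files in the order given.
merge_lanes() {
    out="$1"
    shift
    cat "$@" > "$out"
}

# Usage: count_reads <fastq.gz>
# Assumes four-line FASTQ records; prints the number of reads.
count_reads() {
    gzip -dc "$1" | awk 'END { printf "%d\n", NR / 4 }'
}

# Example with hypothetical file names:
#   merge_lanes sample_R1.fastq.gz sample_L01_R1.fastq.gz sample_L02_R1.fastq.gz
#   count_reads sample_R1.fastq.gz
```

R1 and R2 files must be merged in the same lane order so that mates remain at matching positions before subsampling.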

### Data processing

Below are the analysis steps performed by PacBio internally to assess the data's quality.

#### Trimming

The Illumina P5/P7 adapters were trimmed using `cutadapt`. Adapter trimming was completed with the following example command:

```bash
cutadapt \
    -a AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGT \
    -A AGATCGGAAGAGCACACGTCTGAACTCCAGTCA \
    --overlap 3 \
    -j 10 \
    -o {output.fastq1} \
    -p {output.fastq2} \
    {input.fastq1} \
    {input.fastq2}
```
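To see how much sequence adapter trimming removed, it can help to tabulate read lengths before and after trimming. The sketch below is illustrative only: `read_length_histogram` is a hypothetical helper, and it assumes four-line FASTQ records.

```bash
# Hypothetical helper (not part of the original workflow).
# Usage: read_length_histogram <fastq.gz>
# Prints "length<TAB>count" lines, sorted by increasing read length.
read_length_histogram() {
    gzip -dc "$1" |
        awk 'NR % 4 == 2 { print length($0) }' |   # sequence lines only
        sort -n |
        uniq -c |
        awk '{ print $2 "\t" $1 }'                 # reorder to length, count
}
```

Because the released reads were pre-filtered to at least 100 bp at PE100, untrimmed input should show a single length of 100, and any shorter lengths in the trimmed output reflect removed adapter sequence.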

#### Alignment

Reads were aligned to the GRCh38 reference without alt contigs (hg38_no_alt) using BWA-MEM. Samtools was used to mark duplicates. Alignment and duplicate marking were performed with the following example commands:

```bash
bwa mem -t12 \
    {REF} \
    {input.fastq1} {input.fastq2} | \
    samtools sort -n - | \
    samtools fixmate -m - - | \
    samtools sort - | \
    samtools markdup - - \
    > {output.bam_out}
```
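Once duplicates are marked, a duplicate rate can be derived from `samtools flagstat` output. This is a minimal sketch and not necessarily the metric PacBio used: `duplicate_rate` is a hypothetical helper that reads flagstat's default text output on standard input and reports duplicates as a percentage of all QC-passed records (including secondary and supplementary alignments).

```bash
# Hypothetical helper (an assumption, not part of the original workflow).
# Usage: samtools flagstat {output.bam_out} | duplicate_rate
# Parses the "in total" and "duplicates" lines of flagstat's default text output.
duplicate_rate() {
    awk '
        $4 == "in" && $5 == "total" { total = $1 }   # "N + M in total (...)"
        $4 == "duplicates"          { dup = $1 }     # "N + M duplicates"
        END {
            if (total > 0) printf "%.2f\n", 100 * dup / total
            else print "NA"
        }'
}
```

Matching on the fourth field skips the separate "primary duplicates" line that newer samtools versions print.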

*Rev 2024-03-13*