# PacBio Onso Whole Genome Sequencing of HG002

## Legal disclaimer

All trademarks, trade names, or logos mentioned or used are the property of their respective owners.

## Data

The Onso_hg002_PCR_free_WGS_OSQ folder contains untrimmed and adapter-trimmed reads sequenced on a PacBio Onso instrument in San Diego, CA. The libraries were sequenced with paired-end 2x150 bp sequencing chemistry.

Below is a brief description of the files:

- `Onso_hg002_PCR_free_WGS_OSQ_R1.fastq.gz`: read1 fastq containing untrimmed reads
- `Onso_hg002_PCR_free_WGS_OSQ_R2.fastq.gz`: read2 fastq containing untrimmed reads
- `Onso_hg002_PCR_free_WGS_OSQ_trimmed_R1.fastq.gz`: read1 fastq containing adapter-trimmed reads
- `Onso_hg002_PCR_free_WGS_OSQ_trimmed_R2.fastq.gz`: read2 fastq containing adapter-trimmed reads
- `md5sums.txt`: md5 checksums of the fastq files
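
After downloading, the fastqs can be checked against `md5sums.txt`. The sketch below assumes GNU coreutils `md5sum` is available and that `md5sums.txt` uses the standard `<hash>  <filename>` layout with filenames relative to the data folder; neither is confirmed here. The function name `verify_fastqs` is hypothetical.

```shell
# Sketch: verify downloaded files against md5sums.txt.
# Assumption: md5sums.txt is in the format written by `md5sum`,
# with filenames relative to the folder that contains it.
verify_fastqs() {
    # $1: folder holding the fastqs and md5sums.txt (defaults to ".")
    # The subshell keeps the caller's working directory unchanged.
    ( cd "${1:-.}" && md5sum -c md5sums.txt )
}
```

For example, `verify_fastqs Onso_hg002_PCR_free_WGS_OSQ` prints one status line per file and returns a non-zero exit code if any checksum does not match.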


### Adapter Trimming
The trimmed fastqs were produced by adapter trimming with cutadapt (https://cutadapt.readthedocs.io/en/stable/) using the following command:
```
cutadapt \
    -a ATCGATTCGTGCTTGTCCGTGGTACTCGGCA \
    -A ATCGATTCGTGCTCGATGAACCGGGCGCTTA \
    --overlap 8 \
    -j 10 \
    -o {output.fastq1} \
    -p {output.fastq2} \
    {input.fastq1} \
    {input.fastq2}
```
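
As an illustration, the placeholders in the command above can be filled in from a shared filename prefix. This sketch assumes the untrimmed `_R1`/`_R2` fastqs were the inputs and the `_trimmed_R1`/`_trimmed_R2` fastqs the outputs, following the file naming in this folder; the wrapper name `trim_pair` is hypothetical.

```shell
# Sketch: the cutadapt call above with concrete filenames.
# Assumption: inputs are <prefix>_R1/_R2.fastq.gz and outputs are
# <prefix>_trimmed_R1/_R2.fastq.gz, as in the file list of this README.
trim_pair() {
    # $1: filename prefix shared by the untrimmed R1 and R2 fastqs
    prefix="$1"
    cutadapt \
        -a ATCGATTCGTGCTTGTCCGTGGTACTCGGCA \
        -A ATCGATTCGTGCTCGATGAACCGGGCGCTTA \
        --overlap 8 \
        -j 10 \
        -o "${prefix}_trimmed_R1.fastq.gz" \
        -p "${prefix}_trimmed_R2.fastq.gz" \
        "${prefix}_R1.fastq.gz" \
        "${prefix}_R2.fastq.gz"
}
```

Calling `trim_pair Onso_hg002_PCR_free_WGS_OSQ` would then read the two untrimmed fastqs and write the two trimmed ones. Adjust `-j` to the number of cores available.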


### Alignment
Reads can be aligned to a reference fasta (e.g. hg38 without alt contigs) using bwa mem, then coordinate-sorted and indexed with samtools:
```
bwa mem -t24 -R {RG_TAG} {REFERENCE_FASTA} {input.fastq1} {input.fastq2} | \
    samtools sort -@4 -o {output.bam}
samtools index {output.bam}
```
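
The `{RG_TAG}` placeholder is a SAM read-group header line. A minimal sketch of building one is shown below; the helper name `make_rg_tag` and the example ID and sample values are hypothetical, so choose values that suit your own pipeline. `bwa mem -R` accepts the tabs written as the two-character sequence `\t`, which is what this helper emits. Note that the reference must be indexed once with `bwa index` before `bwa mem` can use it.

```shell
# Sketch: build a read-group string for `bwa mem -R`.
# The ID and SM values passed in are the caller's choice (hypothetical here).
make_rg_tag() {
    # $1: read-group ID, $2: sample name
    # Emits literal backslash-t separators, which bwa mem converts to tabs.
    printf '@RG\\tID:%s\\tSM:%s\n' "$1" "$2"
}

# Example use (not run here; paths are placeholders):
#   bwa index "$REFERENCE_FASTA"                  # one-time index build
#   RG_TAG=$(make_rg_tag Onso_hg002 HG002)
#   bwa mem -t24 -R "$RG_TAG" "$REFERENCE_FASTA" R1.fastq.gz R2.fastq.gz | \
#       samtools sort -@4 -o out.bam
```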

 
*Rev 2023-09-15*