README (Last Updated 10/23/2020)

********************
INTRODUCTION
********************

This README file describes the contents of this directory.

This dataset contains HiFi reads and clustered consensus sequences for amplicons of 6 HLA genes from 8 samples. Amplicons were generated using the NGSgo-MX6-1 kit from GenDx[1], and amplicon lengths range from approximately 3.1 kb to 5.9 kb. The library was sequenced on the Sequel II System and processed using the PacBio Amplicon Analysis tool pbAA[2]. Genotyping was validated with NGSEngine[3].

Sample data are provided in two sets:
- High-Coverage (>10,000 reads; all HiFi data for these samples from this run)
- Recommended Minimum for 6 HLA loci (600 reads; random subset)

********************
SAMPLES
********************

Eight genomic DNA samples from the Coriell Institute:
12878-HG001
24143-HG004
24149-HG003
24385-HG002
24631-HG005
24695-HG007
06896-3
C1-218

Target Genes:
HLA-A
HLA-B
HLA-C
HLA-DPB1
HLA-DQB1
HLA-DRB1

********************
METHODS
********************

Library Preparation:
  Amplified using NGSgo-MX6-1 from GenDx.
  PacBio SMRTbell Express Template Prep Kit 2.0 with barcoded overhang adapters.

Sequencing:
  Sequel II System with:
    Sequel II Binding Kit 2.0
    Sequel II Sequencing Kit 2.0
  54 pM on-plate concentration
  Run time: 20 hr movie + 1.1 hr pre-extension

Analysis:
  ccs 5.0.0 (min 3 passes, min QV 20)
  lima 2.0.0
  samtools 1.10
  pbaa 0.1.2 (commit 92ce879)
  NGSEngine 2.18.0.17625

********************
FILE DESCRIPTION
********************

========================
WHAT FILES SHOULD I USE?
========================

Users who want to demo the pbAA clustering program should, at minimum, download the HLA clustering guide and the FASTQ file(s), including the .fai index files (or create the .fai files with samtools v1.9+).

HLA_11locus_clustering_guide.fasta
HLA_11locus_clustering_guide.fasta.fai

fastq (High Coverage)
|   |-- demultiplex.06896-3.fastq
|   |-- demultiplex.06896-3.fastq.fai
|   |-- demultiplex.12878-HG001.fastq
|   |-- demultiplex.12878-HG001.fastq.fai
...
(truncated)

and/or

|-- fastq_600
|   |-- demultiplex.06896-3.fastq
|   |-- demultiplex.06896-3.fastq.fai
|   |-- demultiplex.12878-HG001.fastq
|   |-- demultiplex.12878-HG001.fastq.fai
... (truncated)

Genotype calls for the HiFi data (without clustering via pbAA), as well as calls from the pbAA outputs at both high and recommended-minimum coverage:

NGSEngine_HiFi_typing.pdf
NGSEngine_pbAA_consensus_typing.pdf
NGSEngine_pbAA_consensus_typing_600.pdf

========================
pbAA Clustered Outputs
========================

Outputs from the pbAA runs on both data sets are in the "pbaa*" directories. The "seq.fofn" file in each location defines the pbAA inputs, and "run.sh" contains the pbAA commands. The "*painted.bam" files contain the FASTQ reads from each subset/sample aligned to the clustering guide and labeled with their cluster assignments for viewing in IGV.

pbaa
|   |-- 06896-3
|   |   |-- pbaa.log
|   |   |-- pbaa_06896-3_failed_cluster_sequences.fasta
|   |   |-- pbaa_06896-3_painted.bam
|   |   |-- pbaa_06896-3_painted.bam.bai
|   |   |-- pbaa_06896-3_passed_cluster_sequences.fasta
|   |   `-- pbaa_06896-3_read_info.txt
|   |-- 12878-HG001
|   |   |-- pbaa.log
|   |   |-- pbaa_12878-HG001_failed_cluster_sequences.fasta
|   |   |-- pbaa_12878-HG001_painted.bam
|   |   |-- pbaa_12878-HG001_painted.bam.bai
|   |   |-- pbaa_12878-HG001_passed_cluster_sequences.fasta
|   |   `-- pbaa_12878-HG001_read_info.txt
... (truncated)

pbaa_600
|-- 06896-3
|   |-- pbaa.log
|   |-- pbaa_06896-3_failed_cluster_sequences.fasta
|   |-- pbaa_06896-3_painted.bam
|   |-- pbaa_06896-3_painted.bam.bai
|   |-- pbaa_06896-3_passed_cluster_sequences.fasta
|   `-- pbaa_06896-3_read_info.txt
|-- 12878-HG001
|   |-- pbaa.log
|   |-- pbaa_12878-HG001_failed_cluster_sequences.fasta
|   |-- pbaa_12878-HG001_painted.bam
|   |-- pbaa_12878-HG001_painted.bam.bai
|   |-- pbaa_12878-HG001_passed_cluster_sequences.fasta
|   `-- pbaa_12878-HG001_read_info.txt
...
(truncated)

********************
REFERENCES
********************

[1] GenDx NGSgo-MX6-1: https://www.gendx.com/product_line/ngsgo-mx6-1/
[2] PacBio pbAA: https://github.com/PacificBiosciences/pbAA
[3] GenDx NGSEngine: https://www.gendx.com/product_line/ngsengine/

More information on HLA sequencing with PacBio: https://www.pacb.com/applications/targeted-sequencing/hla/
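This README describes the 600-read fastq_600 files only as a "random subset" and does not say how they were drawn. As a minimal sketch for users who want a comparable subset from their own data, the Python below draws k reads uniformly at random in a single pass using reservoir sampling (Algorithm R). It assumes plain four-line FASTQ records (one sequence line and one quality line per read). The helper names read_fastq, reservoir_sample, and format_fastq are hypothetical and are not part of pbAA or samtools; this is not necessarily the procedure used to make the files in this directory.

```python
import random
from typing import Iterable, Iterator, List, Tuple

# One FASTQ record: (header, sequence, separator, quality), line endings stripped.
Record = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Yield four-line FASTQ records (assumes unwrapped sequence/quality lines)."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\r\n")
        if not header:
            continue  # skip blank lines, e.g. a trailing empty line
        seq = next(it, "").rstrip("\r\n")
        plus = next(it, "").rstrip("\r\n")
        qual = next(it, "").rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        yield header, seq, plus, qual


def reservoir_sample(records: Iterable[Record], k: int, seed: int = 0) -> List[Record]:
    """Draw k records uniformly at random in one pass (Algorithm R).

    If there are k or fewer records, all of them are returned in input order.
    """
    rng = random.Random(seed)  # fixed seed -> reproducible subset
    reservoir: List[Record] = []
    for i, rec in enumerate(records):
        if i < k:
            reservoir.append(rec)
        else:
            # Keep record i with probability k / (i + 1), evicting a random slot.
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = rec
    return reservoir


def format_fastq(records: Iterable[Record]) -> str:
    """Serialize records back to four-line FASTQ text."""
    return "".join(f"{h}\n{s}\n{p}\n{q}\n" for h, s, p, q in records)
```

For example, passing an open handle to demultiplex.06896-3.fastq to read_fastq and calling reservoir_sample with k=600 would give a subset the same size as the recommended minimum. Reservoir sampling avoids holding the full high-coverage file in memory, and the fixed seed makes the draw repeatable. The resulting FASTQ would still need its own .fai index, created with samtools as described above.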
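The passed and failed cluster files in each pbaa* sample directory are FASTA files of consensus sequences. As a quick sanity check, the sketch below lists each consensus length and flags any that fall outside the roughly 3.1 kb to 5.9 kb amplicon range given in the introduction. It assumes only standard FASTA formatting: because this README does not document the pbAA header fields, it keeps just the first whitespace-delimited token of each header as the ID. The helpers read_fasta and summarize_lengths are hypothetical (not part of pbAA), and the default bounds min_len and max_len are assumptions. Since the stated range is approximate, a flagged sequence is a prompt for a closer look in IGV, not evidence of an error.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs; wrapped sequence lines are joined."""
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            # Keep only the first token: the header layout is not documented here.
            name = fields[0] if fields else ""
            chunks = []
        elif name is not None:
            chunks.append(line)  # sequence lines before the first header are ignored
    if name is not None:
        yield name, "".join(chunks)


def summarize_lengths(
    records: Iterable[Tuple[str, str]],
    min_len: int = 3100,  # assumed lower bound (~3.1 kb)
    max_len: int = 5900,  # assumed upper bound (~5.9 kb)
) -> Dict[str, object]:
    """Report consensus lengths and the IDs falling outside [min_len, max_len]."""
    lengths = [(rid, len(seq)) for rid, seq in records]
    outside = [rid for rid, n in lengths if not min_len <= n <= max_len]
    return {"n_sequences": len(lengths), "lengths": lengths, "outside_range": outside}
```

Applied to an open handle on a file such as pbaa_06896-3_passed_cluster_sequences.fasta, this yields one (ID, length) pair per consensus sequence.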