Index of /public/dataset/pbAmpliconAnalysis_HLA
Name                                      Last modified     Size
fastq/                                    2020-10-23 15:21  -
fastq_600/                                2020-10-23 15:21  -
pbaa/                                     2020-10-23 15:37  -
pbaa_600/                                 2020-10-23 15:40  -
HLA_11locus_clustering_guide.fasta.fai    2020-10-22 15:24  2.5K
README.txt                                2020-10-23 17:18  4.3K
md5sum.txt                                2020-10-23 17:21  11K
HLA_11locus_clustering_guide.fasta        2020-10-22 15:24  417K
NGSEngine_HiFi_typing.pdf                 2020-10-23 14:03  2.5M
NGSEngine_pbAA_consensus_typing.pdf       2020-10-23 15:19  3.3M
NGSEngine_pbAA_consensus_typing_600.pdf   2020-10-23 15:02  3.3M
README (Last Updated 10/23/2020)
********************
INTRODUCTION
********************
This README file describes the contents of this directory.
This dataset contains HiFi reads and clustered consensus sequences for
amplicons of 6 HLA genes from 8 samples. Amplicons were generated using the
NGSgo-MX6-1 kit from GenDx[1]; amplicon lengths range from approximately
3.1 kb to 5.9 kb. The library was sequenced on the Sequel II System and
processed using the PacBio Amplicon Analysis tool pbAA[2]. Genotyping was
validated with NGSEngine[3].
Sample data are provided in two sets:
- High coverage (>10,000 reads; all HiFi data for these samples from this run)
- Recommended minimum for 6 HLA loci (600 reads; random subset)
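This README does not state how the random subset of 600 was drawn. As an illustration only, a uniform random subset of a FASTQ file can be taken in a single pass with reservoir sampling. The sketch below assumes 4-line FASTQ records (no wrapped sequence lines); the function names and the seed are hypothetical and are not part of the tools listed under METHODS.

```python
import random
from typing import Iterable, Iterator, List, Tuple

# One FASTQ record: (header, sequence, separator, quality)
FastqRecord = Tuple[str, str, str, str]


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield 4-line FASTQ records; assumes sequences are not line-wrapped."""
    block: List[str] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line and not block:
            continue  # tolerate blank lines between records
        block.append(line)
        if len(block) == 4:
            if not block[0].startswith("@") or not block[2].startswith("+"):
                raise ValueError(f"malformed FASTQ record near {block[0]!r}")
            yield (block[0], block[1], block[2], block[3])
            block = []
    if block:
        raise ValueError("truncated FASTQ record at end of input")


def subsample(records: Iterable[FastqRecord], n: int, seed: int = 0) -> List[FastqRecord]:
    """Reservoir sampling (Algorithm R): uniform sample of up to n records."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    reservoir: List[FastqRecord] = []
    for i, rec in enumerate(records):
        if i < n:
            reservoir.append(rec)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < n:
                reservoir[j] = rec
    return reservoir
```

For example, `subsample(read_fastq(open("fastq/demultiplex.12878-HG001.fastq")), 600)` would return 600 records from that file, or every record if the file holds fewer than 600.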
********************
SAMPLE
********************
Eight genomic DNA samples from Coriell Institute:
12878-HG001
24143-HG004
24149-HG003
24385-HG002
24631-HG005
24695-HG007
06896-3
C1-218
Target Genes:
HLA-A
HLA-B
HLA-C
HLA-DPB1
HLA-DQB1
HLA-DRB1
********************
METHODS
********************
Library Preparation:
Amplified using NGSgo-MX6-1 from GenDx.
PacBio SMRTbell Express Template Prep Kit 2.0 with barcoded overhang adapters.
Sequencing:
Sequel II System with Sequel II Binding Kit 2.0
Sequel II Sequencing Kit 2.0
54 pM on-plate concentration
Run time:
20 hr movie + 1.1 hr pre-extension
Analysis:
ccs 5.0.0 (min 3 passes & min QV 20)
lima 2.0.0
samtools 1.10
pbaa 0.1.2 (commit 92ce879)
NGSEngine 2.18.0.17625
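To make the ccs thresholds above concrete: on the Phred scale, QV 20 corresponds to an error probability of 0.01, that is, a predicted read accuracy of 99%. The sketch below shows only this arithmetic; it is not how ccs itself is implemented, and the function names are hypothetical.

```python
import math


def qv_to_error(qv: float) -> float:
    """Phred scale: error probability P = 10^(-QV/10)."""
    return 10 ** (-qv / 10)


def accuracy_to_qv(accuracy: float) -> float:
    """Inverse mapping: QV = -10 * log10(1 - accuracy)."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in [0, 1)")
    return -10 * math.log10(1 - accuracy)


def passes_filters(num_passes: int, read_accuracy: float,
                   min_passes: int = 3, min_qv: float = 20) -> bool:
    """Apply the two thresholds quoted in this README to one read.

    Illustrative only: the defaults mirror 'min 3 passes & min QV 20'.
    """
    min_accuracy = 1 - qv_to_error(min_qv)
    return num_passes >= min_passes and read_accuracy >= min_accuracy
```

Under these definitions, a read with 3 passes and a predicted accuracy of 0.995 passes, while a read with 10 passes and a predicted accuracy of 0.98 does not.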
********************
FILE DESCRIPTION
********************
========================
WHAT FILES SHOULD I USE?
========================
Users wishing to demo the pbAA clustering program should, at a
minimum, download the HLA clustering guide and fastq file(s),
including the .fai index files (or create the .fai files with samtools v1.9+).
HLA_11locus_clustering_guide.fasta
HLA_11locus_clustering_guide.fasta.fai
|-- fastq (High Coverage)
| |-- demultiplex.06896-3.fastq
| |-- demultiplex.06896-3.fastq.fai
| |-- demultiplex.12878-HG001.fastq
| |-- demultiplex.12878-HG001.fastq.fai
... (truncated)
and/or
|-- fastq_600
| |-- demultiplex.06896-3.fastq
| |-- demultiplex.06896-3.fastq.fai
| |-- demultiplex.12878-HG001.fastq
| |-- demultiplex.12878-HG001.fastq.fai
... (truncated)
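The .fai files listed above are plain tab-separated text indexes, so they can be used for a quick look at read counts and lengths without loading the sequences. The sketch below assumes the samtools faidx layout (NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH, plus a sixth QUALOFFSET column for fastq indexes); the function names are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def read_fai_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Map each sequence name in a .fai index to its length in bases.

    Accepts 5-column (fasta) and 6-column (fastq) index lines.
    """
    lengths: Dict[str, int] = {}
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        fields = line.split("\t")
        if len(fields) not in (5, 6):
            raise ValueError(f"unexpected .fai line: {line!r}")
        lengths[fields[0]] = int(fields[1])
    return lengths


def length_range(lengths: Dict[str, int]) -> Tuple[int, int]:
    """Return (shortest, longest) sequence length in the index."""
    if not lengths:
        raise ValueError("index is empty")
    values = list(lengths.values())
    return min(values), max(values)
```

Reading `fastq_600/demultiplex.12878-HG001.fastq.fai` this way gives the number of reads as `len(lengths)`, and `length_range` can be compared against the approximate amplicon lengths given in the INTRODUCTION.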
Genotype calls made directly from HiFi reads (without clustering via pbAA), as well
as calls made from pbAA outputs for both the high-coverage and recommended-minimum sets:
NGSEngine_HiFi_typing.pdf
NGSEngine_pbAA_consensus_typing.pdf
NGSEngine_pbAA_consensus_typing_600.pdf
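Downloaded files can be checked against the md5sum.txt file in this directory. The sketch below assumes md5sum.txt follows the usual output format of the md5sum utility (a 32-character hex digest, a space, a space or "*", then a path relative to this directory); that format is an assumption, not something this README states, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import List, Tuple


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parse_md5sum_line(line: str) -> Tuple[str, str]:
    """Split one 'DIGEST  PATH' line (assumed md5sum format)."""
    line = line.rstrip("\r\n")
    if len(line) < 35 or line[32] != " ":
        raise ValueError(f"unexpected md5sum line: {line!r}")
    return line[:32].lower(), line[34:]


def verify_md5sums(md5_file: Path, root: Path) -> List[Tuple[str, str]]:
    """Return (path, status) pairs; status is OK, MISMATCH, or MISSING."""
    results: List[Tuple[str, str]] = []
    with open(md5_file, "r") as handle:
        for line in handle:
            if not line.strip():
                continue
            expected, name = parse_md5sum_line(line)
            target = Path(root) / name
            if not target.is_file():
                results.append((name, "MISSING"))
            elif md5_of_file(target) == expected:
                results.append((name, "OK"))
            else:
                results.append((name, "MISMATCH"))
    return results
```

Files that were not downloaded are reported as MISSING rather than raising an error, so the check also works on a partial download.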
========================
pbAA Clustered outputs
========================
Outputs from pbAA runs on both sets of data can be found in the "pbaa*"
directories. The file "seq.fofn" in each location defines the inputs for pbAA, and
"run.sh" defines the commands. The "*painted.bam" files contain the fastq reads from each
subset/sample, aligned to the clustering guide and labeled by cluster result, for viewing
in IGV.
pbaa
|-- 06896-3
| |-- pbaa.log
| |-- pbaa_06896-3_failed_cluster_sequences.fasta
| |-- pbaa_06896-3_painted.bam
| |-- pbaa_06896-3_painted.bam.bai
| |-- pbaa_06896-3_passed_cluster_sequences.fasta
| `-- pbaa_06896-3_read_info.txt
|-- 12878-HG001
| |-- pbaa.log
| |-- pbaa_12878-HG001_failed_cluster_sequences.fasta
| |-- pbaa_12878-HG001_painted.bam
| |-- pbaa_12878-HG001_painted.bam.bai
| |-- pbaa_12878-HG001_passed_cluster_sequences.fasta
| `-- pbaa_12878-HG001_read_info.txt
... (truncated)
pbaa_600
|-- 06896-3
| |-- pbaa.log
| |-- pbaa_06896-3_failed_cluster_sequences.fasta
| |-- pbaa_06896-3_painted.bam
| |-- pbaa_06896-3_painted.bam.bai
| |-- pbaa_06896-3_passed_cluster_sequences.fasta
| `-- pbaa_06896-3_read_info.txt
|-- 12878-HG001
| |-- pbaa.log
| |-- pbaa_12878-HG001_failed_cluster_sequences.fasta
| |-- pbaa_12878-HG001_painted.bam
| |-- pbaa_12878-HG001_painted.bam.bai
| |-- pbaa_12878-HG001_passed_cluster_sequences.fasta
| `-- pbaa_12878-HG001_read_info.txt
... (truncated)
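The per-sample layout shown above follows a regular naming pattern, which makes it straightforward to check that a download is complete. The sketch below builds the expected file list for one sample from that pattern; the sample names are those listed under SAMPLE, and the function names are hypothetical.

```python
from pathlib import Path
from typing import List

# Sample directory names, as listed in the SAMPLE section of this README.
SAMPLES = ["12878-HG001", "24143-HG004", "24149-HG003", "24385-HG002",
           "24631-HG005", "24695-HG007", "06896-3", "C1-218"]

# File name endings taken from the directory trees shown above.
OUTPUT_SUFFIXES = (
    "failed_cluster_sequences.fasta",
    "painted.bam",
    "painted.bam.bai",
    "passed_cluster_sequences.fasta",
    "read_info.txt",
)


def expected_outputs(run_dir: str, sample: str) -> List[Path]:
    """List the files expected under <run_dir>/<sample>/."""
    sample_dir = Path(run_dir) / sample
    files = [sample_dir / "pbaa.log"]
    files += [sample_dir / f"pbaa_{sample}_{suffix}" for suffix in OUTPUT_SUFFIXES]
    return files


def missing_outputs(run_dir: str, sample: str) -> List[Path]:
    """Return the expected files that are not present on disk."""
    return [path for path in expected_outputs(run_dir, sample) if not path.is_file()]
```

Calling `missing_outputs("pbaa_600", sample)` for each entry in `SAMPLES` returns an empty list when every expected file for that sample is present.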
********************
REFERENCES
********************
[1] GenDx NGSgo-MX6-1: https://www.gendx.com/product_line/ngsgo-mx6-1/
[2] PacBio pbAA: https://github.com/PacificBiosciences/pbAA
[3] GenDx NGSEngine: https://www.gendx.com/product_line/ngsengine/
More info on HLA Sequencing with PacBio: https://www.pacb.com/applications/targeted-sequencing/hla/