Index of /public/dataset/pbAmpliconAnalysis_HLA

Name                                      Last modified      Size
fastq/                                    2020-10-23 15:21      -
fastq_600/                                2020-10-23 15:21      -
pbaa/                                     2020-10-23 15:37      -
pbaa_600/                                 2020-10-23 15:40      -
HLA_11locus_clustering_guide.fasta        2020-10-22 15:24   417K
HLA_11locus_clustering_guide.fasta.fai    2020-10-22 15:24   2.5K
NGSEngine_HiFi_typing.pdf                 2020-10-23 14:03   2.5M
NGSEngine_pbAA_consensus_typing.pdf       2020-10-23 15:19   3.3M
NGSEngine_pbAA_consensus_typing_600.pdf   2020-10-23 15:02   3.3M
README.txt                                2020-10-23 17:18   4.3K
md5sum.txt                                2020-10-23 17:21    11K

README  (Last Updated 10/23/2020)

********************
INTRODUCTION
********************

   This README file describes the contents of this directory.

   This dataset contains HiFi reads and clustered consensus sequences for
amplicons of 6 HLA genes from 8 samples. Amplicons were generated using the
NGSgo-MX6-1 kit from GenDx[1], and amplicon lengths range from approximately
3.1 kb to 5.9 kb. The library was sequenced on the Sequel II System and
processed using the PacBio Amplicon Analysis tool pbAA[2]. Genotyping was
validated with NGSEngine[3].

Sample data are provided in two sets:
- High-Coverage (>10,000 -- all HiFi data for these samples from this run)
- Recommended Minimum for 6 HLA loci (600 -- random subset)
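
The README does not state how the random subset was drawn. Purely as an
illustration, and assuming the figure 600 refers to a number of reads, a
uniform random subset of a FASTQ file can be drawn with reservoir sampling.
The function name, the seed, and the four-lines-per-record assumption below
are hypothetical and are not taken from the original analysis.

```python
import random
from itertools import islice


def subsample_fastq(in_path, out_path, n=600, seed=0):
    """Write a uniform random sample of up to n records to out_path.

    Assumes each FASTQ record occupies exactly four lines (no wrapped
    sequence or quality lines). Returns the number of records written.
    """
    rng = random.Random(seed)
    reservoir = []
    with open(in_path) as handle:
        index = 0
        while True:
            record = list(islice(handle, 4))
            if not record:
                break
            if len(record) < 4:
                raise ValueError("truncated FASTQ record")
            if index < n:
                # Fill the reservoir with the first n records.
                reservoir.append(record)
            else:
                # Keep the new record with probability n / (index + 1).
                j = rng.randint(0, index)  # both ends inclusive
                if j < n:
                    reservoir[j] = record
            index += 1
    with open(out_path, "w") as out:
        for record in reservoir:
            out.writelines(record)
    return len(reservoir)
```

Reservoir sampling reads the input once and holds only n records in memory,
which suits the high-coverage files. Records are written in reservoir order,
not in their original file order.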

********************
SAMPLE
********************

Eight genomic DNA samples from Coriell Institute:

12878-HG001
24143-HG004
24149-HG003
24385-HG002
24631-HG005
24695-HG007
06896-3
C1-218

Target Genes:
HLA-A
HLA-B
HLA-C
HLA-DPB1
HLA-DQB1
HLA-DRB1

********************
METHODS
********************

Library Preparation: 
Amplified using the NGSgo-MX6-1 kit from GenDx.
PacBio SMRTbell Express Template Prep Kit 2.0 with barcoded overhang adapters.

Sequencing: 
Sequel II System with Sequel II Binding Kit 2.0
Sequel II Sequencing Kit 2.0
54 pM on-plate concentration

Run time: 
20hr movie + 1.1hr pre-extension 

Analysis: 
ccs 5.0.0 (min 3 pass & min QV 20)
lima 2.0.0
samtools 1.10
pbaa 0.1.2 (commit 92ce879)
NGSEngine 2.18.0.17625
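
For readers unfamiliar with the "min QV 20" threshold above: under the
standard Phred convention, QV = -10 * log10(error probability), so QV 20
corresponds to a predicted accuracy of 99%. The short sketch below only
illustrates that conversion; the function names are hypothetical and are not
part of ccs.

```python
import math


def qv_to_accuracy(qv):
    """Convert a Phred-scaled quality value to predicted accuracy."""
    return 1.0 - 10.0 ** (-qv / 10.0)


def accuracy_to_qv(accuracy):
    """Convert predicted accuracy (strictly below 1.0) to a Phred-scaled QV."""
    return -10.0 * math.log10(1.0 - accuracy)
```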


********************
FILE DESCRIPTION
********************

========================
WHAT FILES SHOULD I USE? 
========================
Users wishing to demo the pbAA clustering program should, at a minimum,
download the HLA clustering guide and the fastq file(s), together with
their .fai indexes (or create the .fai files with samtools 1.9 or later).

HLA_11locus_clustering_guide.fasta
HLA_11locus_clustering_guide.fasta.fai

fastq (High Coverage)
|-- demultiplex.06896-3.fastq
|-- demultiplex.06896-3.fastq.fai
|-- demultiplex.12878-HG001.fastq
|-- demultiplex.12878-HG001.fastq.fai
... (truncated)

and/or

fastq_600 (Recommended Minimum)
|-- demultiplex.06896-3.fastq
|-- demultiplex.06896-3.fastq.fai
|-- demultiplex.12878-HG001.fastq
|-- demultiplex.12878-HG001.fastq.fai
... (truncated)
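
The supplied .fai files, or ones created with samtools, are the ones to use
with pbAA. To illustrate what such an index records for a fastq file, the
sketch below builds the six tab-separated columns documented for the faidx
format on FASTQ input (NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH,
QUALOFFSET). It assumes unwrapped four-line records; the function name is
hypothetical, and the output has not been checked byte-for-byte against
samtools.

```python
def build_fastq_fai(fastq_path, fai_path=None):
    """Index a four-line-per-record FASTQ file in the faidx column layout.

    Returns a list of (name, length, offset, linebases, linewidth,
    qualoffset) tuples and optionally writes them to fai_path.
    """
    rows = []
    offset = 0  # byte offset of the current record's header line
    with open(fastq_path, "rb") as handle:  # binary mode for exact offsets
        while True:
            header = handle.readline()
            if not header:
                break
            seq = handle.readline()
            plus = handle.readline()
            qual = handle.readline()
            if not header.startswith(b"@") or not plus.startswith(b"+"):
                raise ValueError(f"malformed record at byte {offset}")
            # The read name is the header text up to the first whitespace.
            name = header[1:].split()[0].decode()
            bases = len(seq.rstrip(b"\r\n"))
            seq_offset = offset + len(header)
            qual_offset = seq_offset + len(seq) + len(plus)
            rows.append((name, bases, seq_offset, bases, len(seq), qual_offset))
            offset = qual_offset + len(qual)
    if fai_path is not None:
        with open(fai_path, "w") as out:
            for row in rows:
                out.write("\t".join(map(str, row)) + "\n")
    return rows
```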

NGSEngine genotype calls for the HiFi reads (without clustering via pbAA),
as well as calls from the pbAA consensus outputs for both the high-coverage
and the recommended-minimum data sets:

NGSEngine_HiFi_typing.pdf
NGSEngine_pbAA_consensus_typing.pdf
NGSEngine_pbAA_consensus_typing_600.pdf 

========================
pbAA Clustered outputs
========================

Outputs from pbAA runs on both sets of data can be found in the "pbaa" and
"pbaa_600" directories. In each location, the file "seq.fofn" defines the
inputs for pbAA and "run.sh" defines the commands. The "*painted.bam" files
contain the fastq reads from each subset/sample aligned to the clustering
guide and labeled by cluster result for viewing in IGV.

pbaa
|   |-- 06896-3
|   |   |-- pbaa.log
|   |   |-- pbaa_06896-3_failed_cluster_sequences.fasta
|   |   |-- pbaa_06896-3_painted.bam
|   |   |-- pbaa_06896-3_painted.bam.bai
|   |   |-- pbaa_06896-3_passed_cluster_sequences.fasta
|   |   `-- pbaa_06896-3_read_info.txt
|   |-- 12878-HG001
|   |   |-- pbaa.log
|   |   |-- pbaa_12878-HG001_failed_cluster_sequences.fasta
|   |   |-- pbaa_12878-HG001_painted.bam
|   |   |-- pbaa_12878-HG001_painted.bam.bai
|   |   |-- pbaa_12878-HG001_passed_cluster_sequences.fasta
|   |   `-- pbaa_12878-HG001_read_info.txt
... (truncated)

pbaa_600
    |-- 06896-3
    |   |-- pbaa.log
    |   |-- pbaa_06896-3_failed_cluster_sequences.fasta
    |   |-- pbaa_06896-3_painted.bam
    |   |-- pbaa_06896-3_painted.bam.bai
    |   |-- pbaa_06896-3_passed_cluster_sequences.fasta
    |   `-- pbaa_06896-3_read_info.txt
    |-- 12878-HG001
    |   |-- pbaa.log
    |   |-- pbaa_12878-HG001_failed_cluster_sequences.fasta
    |   |-- pbaa_12878-HG001_painted.bam
    |   |-- pbaa_12878-HG001_painted.bam.bai
    |   |-- pbaa_12878-HG001_passed_cluster_sequences.fasta
    |   `-- pbaa_12878-HG001_read_info.txt
... (truncated)
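
As a quick way to compare the two runs, the passed and failed consensus
sequences can be counted per sample. The sketch below relies only on the
directory layout and file naming shown above and on FASTA headers starting
with ">"; the function names are hypothetical, and nothing is assumed about
the contents of the headers or of "read_info.txt".

```python
from pathlib import Path


def count_fasta_records(path):
    """Count FASTA records by counting header lines that start with '>'."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def summarize_pbaa_outputs(root):
    """Return {sample: (passed, failed)} for each sample directory in root.

    root is a directory laid out like "pbaa" or "pbaa_600" above. Sample
    directories missing either FASTA file are skipped.
    """
    summary = {}
    for sample_dir in sorted(Path(root).iterdir()):
        if not sample_dir.is_dir():
            continue
        sample = sample_dir.name
        passed = sample_dir / f"pbaa_{sample}_passed_cluster_sequences.fasta"
        failed = sample_dir / f"pbaa_{sample}_failed_cluster_sequences.fasta"
        if passed.exists() and failed.exists():
            summary[sample] = (
                count_fasta_records(passed),
                count_fasta_records(failed),
            )
    return summary
```

Calling summarize_pbaa_outputs("pbaa") and summarize_pbaa_outputs("pbaa_600")
after downloading both directories gives per-sample counts that can be
compared side by side.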

********************
REFERENCES
********************

[1] GenDx NGSgo-MX6-1: https://www.gendx.com/product_line/ngsgo-mx6-1/
[2] PacBio pbAA: https://github.com/PacificBiosciences/pbAA
[3] GenDx NGSEngine: https://www.gendx.com/product_line/ngsengine/

More info on HLA Sequencing with PacBio: https://www.pacb.com/applications/targeted-sequencing/hla/