Index of /public/dataset/SarsCov2-Eden-ATCC

Name                                 Last modified      Size
subsample_100/                       2020-05-04 11:12   -
subsample_1000/                      2020-05-04 11:11   -
subsample_20/                        2020-05-04 11:12   -
NC_045512.2.fasta                    2020-03-24 09:27   30K
README.txt                           2020-05-09 07:54   3.9K
eden.primers.fasta                   2020-05-04 11:25   1.2K
eden.primers.plus_M13constant.fasta  2020-05-04 11:25   1.6K
run_juliet_per_sample.sh             2020-05-04 11:17   1.5K
sarscov2.json                        2020-04-09 06:27   31K

README  (Last Updated 05/09/2020)

********************
INTRODUCTION
********************

   This README file describes the contents of this directory.

   This dataset contains processed data from SARS-CoV-2 sequencing on PacBio
systems [1] using the Eden primer set [2] on ATCC full-length controls [3].

   Bioinformatics processing is described in the CoSA tutorial [4], using the
2020-05-01 version of the workflow.

   For issues or questions regarding this dataset, please file an issue
at https://github.com/Magdoll/CoSA/issues.

********************
SAMPLE
********************
 
ATCC VR-1986D Lot# 70034826
(https://www.atcc.org/en/Global/Products/VR-1986D.aspx)


********************
METHODS
********************

Library Preparation & Sequencing:

The library was constructed using the SMRTbell Express Template Prep Kit 2.0.
Sequencing was performed on one SMRT Cell 8M on the Sequel II system for 15 hours,
with a 0.6-hour pre-extension time, using the Sequel II Binding Kit 2.0.


Analysis: 

Detailed bioinformatics processing is described in the CoSA tutorial [4],
using the 2020-05-01 version of the workflow. Briefly, CCS reads were generated
using SMRT Link and then demultiplexed by M13 barcode. A second round of
demultiplexing (using lima) was performed to identify the Eden primers, allowing
only adjacent primer pairs (e.g., A3F--A3R) and filtering out invalid pairs
(e.g., A1F--A3R). The demultiplexed, trimmed, and filtered CCS reads were then
pooled and downsampled to 1000, 100, and 20 reads per amplicon using the CoSA
script `subsample_amplicons.py`.
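
The primer-pair filter and per-amplicon downsampling described above can be
sketched in Python as follows. This is a minimal illustration under stated
assumptions, not the CoSA `subsample_amplicons.py` script or lima's own logic:
it assumes primer names follow the `A<number>F` / `A<number>R` pattern of the
examples above, treats a pair as valid only when it joins the forward and
reverse primer of the same amplicon, and uses hypothetical function names
(`amplicon_of_pair`, `subsample_per_amplicon`) and a fixed random seed.

```python
import random
import re
from collections import defaultdict
from typing import Optional

# Assumed naming convention, taken from the examples in the text:
# "A" + amplicon number + "F" (forward) or "R" (reverse), e.g. "A3F", "A3R".
PRIMER_RE = re.compile(r"^A(\d+)([FR])$")


def amplicon_of_pair(p1: str, p2: str) -> Optional[str]:
    """Return the amplicon (e.g. "A3") if p1/p2 are its F and R primers, else None."""
    m1, m2 = PRIMER_RE.match(p1), PRIMER_RE.match(p2)
    if not (m1 and m2):
        return None
    num1, dir1 = m1.groups()
    num2, dir2 = m2.groups()
    # Invalid: primers from different amplicons (A1F--A3R) or two primers
    # of the same direction (A1F--A1F).
    if num1 != num2 or {dir1, dir2} != {"F", "R"}:
        return None
    return "A" + num1


def subsample_per_amplicon(reads, n, seed=0):
    """Keep at most n randomly chosen reads per amplicon.

    reads: iterable of (read_id, primer1, primer2) tuples.
    Returns {amplicon: [read_id, ...]}; reads with invalid pairs are dropped.
    """
    by_amplicon = defaultdict(list)
    for read_id, p1, p2 in reads:
        amp = amplicon_of_pair(p1, p2)
        if amp is not None:
            by_amplicon[amp].append(read_id)
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    return {
        amp: ids if len(ids) <= n else rng.sample(ids, n)
        for amp, ids in by_amplicon.items()
    }
```

Amplicons with fewer than `n` reads keep all of their reads, which is why a
low-coverage amplicon can be under-represented even in `subsample_1000`.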

Reads were mapped to the reference genome using pbmm2 (a minimap2 wrapper),
and variants were then called with juliet (minorseq) using a 10% minimum
frequency cutoff (--min-perc 10).
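
To illustrate what a 10% frequency cutoff means, the hypothetical Python
function below reports the non-reference bases at a single position whose
share of the covering reads is at least `min_perc` percent. It is a simplified
per-base sketch, not juliet's algorithm, and it assumes the cutoff is inclusive
(a variant observed in exactly 10% of reads is kept).

```python
from collections import Counter


def call_minor_variants(ref_base, observed_bases, min_perc=10.0):
    """Return {alt_base: percent} for non-reference bases at or above min_perc.

    ref_base: reference base at this position, e.g. "C".
    observed_bases: iterable of the bases seen in reads covering the position.
    """
    counts = Counter(b.upper() for b in observed_bases)
    depth = sum(counts.values())
    if depth == 0:
        return {}  # no coverage, nothing to call
    calls = {}
    for base, n in counts.items():
        if base == ref_base.upper():
            continue  # reference allele is not reported
        perc = 100.0 * n / depth
        if perc >= min_perc:  # assumed inclusive cutoff
            calls[base] = perc
    return calls
```

For example, with 85 reads showing C (the reference), 12 showing T, and 3
showing G, only T (12%) passes the cutoff; G (3%) is filtered out.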


Analysis tool versions:

ccs v5.0.0 (using SMRT Link v9.1.0.94448)
lima v1.11.0
pbmm2 v1.2.1
juliet v1.12.0


********************
FILE DESCRIPTION
********************


NC_045512.2.fasta - the reference genome FASTA file. Note that the sequence ID is
                    "NC_045512v2", for consistency with the UCSC Genome Browser
                    naming convention.

eden.primers.fasta - the Eden primers

eden.primers.plus_M13constant.fasta - the Eden primers, with the M13 constant sequence added 
                    
sarscov2.json - the SARS-CoV-2 config file used by Juliet (MinorSeq) for variant calling

run_juliet_per_sample.sh - template command file for mapping and variant calling

subsampled.ccs.Q20.fastq - CCS (HiFi) amplicon reads, with barcodes and Eden primers trimmed.

subsampled.mapped.bam - mapping of "subsampled.ccs.Q20.fastq" to the reference genome.

subsampled.minperc10.juliet.* - variant calling output using Juliet (minorseq).
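
If, as the file name suggests, `Q20` means the reads were filtered to a
predicted quality of at least Q20 (about 99% accuracy), the Phred relation
Q = -10 * log10(p_error) can be used to spot-check a FASTQ file. The sketch
below is an assumption about one way to summarize read quality (from the mean
per-base error probability, Phred+33 encoding); it is not necessarily how the
Q20 filter was applied, and the helper names are hypothetical.

```python
import math


def phred_to_error(q):
    """Convert a Phred quality score to an error probability: p = 10^(-q/10)."""
    return 10 ** (-q / 10)


def read_quality(qual_string, offset=33):
    """Read-level Phred quality from the mean per-base error (non-empty input)."""
    errors = [phred_to_error(ord(c) - offset) for c in qual_string]
    mean_err = sum(errors) / len(errors)
    return -10 * math.log10(mean_err)


def iter_fastq(lines):
    """Yield (read_id, sequence, quality) from well-formed 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # "+" separator line
        qual = next(it).rstrip("\n")
        yield header[1:].split()[0], seq, qual
```

For example, `[rid for rid, _, q in iter_fastq(open("subsample_20/subsampled.ccs.Q20.fastq")) if read_quality(q) < 20]`
lists any reads whose averaged per-base quality falls below Q20 under this
summary.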


********************
FILE LIST
********************

├── NC_045512.2.fasta 
├── run_juliet_per_sample.sh
├── sarscov2.json 
├── eden.primers.fasta 
├── eden.primers.plus_M13constant.fasta 
├── subsample_1000
│   ├── subsampled.ccs.Q20.fastq 
│   ├── subsampled.mapped.bam 
│   ├── subsampled.mapped.bam.bai 
│   ├── subsampled.minperc10.juliet.html 
│   ├── subsampled.minperc10.juliet.json 
│   └── subsampled.minperc10.juliet.vcf 
├── subsample_100
│   ├── subsampled.ccs.Q20.fastq 
│   ├── subsampled.mapped.bam 
│   ├── subsampled.mapped.bam.bai 
│   ├── subsampled.minperc10.juliet.html 
│   ├── subsampled.minperc10.juliet.json 
│   └── subsampled.minperc10.juliet.vcf 
└── subsample_20
    ├── subsampled.ccs.Q20.fastq 
    ├── subsampled.mapped.bam
    ├── subsampled.mapped.bam.bai 
    ├── subsampled.minperc10.juliet.html 
    ├── subsampled.minperc10.juliet.json
    └── subsampled.minperc10.juliet.vcf 
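
A quick way to confirm that a download matches the file list above is to check
each expected path. This minimal Python sketch hard-codes the names from the
list; `missing_files` is a hypothetical helper, and it only tests that each
file exists, not that its contents are intact.

```python
from pathlib import Path

# File names copied from the file list above.
TOP_LEVEL = [
    "NC_045512.2.fasta",
    "run_juliet_per_sample.sh",
    "sarscov2.json",
    "eden.primers.fasta",
    "eden.primers.plus_M13constant.fasta",
]
PER_SUBSAMPLE = [
    "subsampled.ccs.Q20.fastq",
    "subsampled.mapped.bam",
    "subsampled.mapped.bam.bai",
    "subsampled.minperc10.juliet.html",
    "subsampled.minperc10.juliet.json",
    "subsampled.minperc10.juliet.vcf",
]
SUBSAMPLES = ["subsample_1000", "subsample_100", "subsample_20"]


def missing_files(root):
    """Return the relative paths from the file list that are absent under root."""
    root = Path(root)
    expected = list(TOP_LEVEL)
    for sub in SUBSAMPLES:
        expected += [f"{sub}/{name}" for name in PER_SUBSAMPLE]
    return [p for p in expected if not (root / p).is_file()]
```

Run from the dataset directory, `missing_files(".")` should return an empty
list when every listed file is present.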

  
4. REFERENCES

[1] https://www.pacb.com/covid-19
[2] https://www.pacb.com/wp-content/uploads/Customer-Collaboration-PacBio-Compatible-Eden-Protocol-for-SARS-CoV-2-Sequencing.pdf
[3] https://www.atcc.org/en/Global/Products/VR-1986HK.aspx
[4] https://github.com/Magdoll/CoSA