Index of /public/dataset/Ecoli/egs

Name                                | Last modified    | Size
----------------------------------- | ---------------- | ----
m64004_200618_002500.subreads.bam   | 2020-06-18 18:07 | 433G
m64004_200618_002500.hifi_reads.bam | 2020-09-24 17:32 | 20G
ecoli_pbi_Jan2021_majorStrain.fasta | 2021-01-12 07:04 | 4.4M
README.html                         | 2021-02-09 07:34 | 2.6K
README.txt                          | 2021-02-09 07:34 | 2.6K

================================
# E.coli Gold Standard (EGS)

# Goal

- Share gold-standard E.coli sample data:

  - vetted reference sequence

  - PacBio sequencing data

  - documentation of biologically irreducible minor-variant
  contaminants that should be filtered out

================================
# Data Locations: Reference and Sequencing Reads

- Gold-standard E.coli sample, PacBio (PACB) sequencing data

data type     | size            | link / file
------------- | --------------- | -----------------------------------
Reference     |       4,639,002 | ecoli_pbi_Jan2021_majorStrain.fasta
CCS HiFi Reads|  21,591,167,182 | m64004_200618_002500.hifi_reads.bam
Raw Subreads  | 464,943,830,201 | m64004_200618_002500.subreads.bam

================================
# Data Methods

- PacBio Data Methods

PACB key         | value
---------------- | ----------------------------------------------------------------------
Sample           | ATCC E.coli K12 MG1655 prep WL_052920
Shearing         | 15 kb Megaruptor3, Shear Speed 37, 10 ug/column, 500 uL per column
Size selection   | BluePippin U1 12 kb to 18 kb
Library prep     | TPK-1, Express V2, 4 EM, WL_061420b
Sequencing       | Sequel II System with BINDINGKIT=101-820-500 SEQUENCINGKIT=101-826-100
Run time         | 2.9 hour pre-extension; 15 hour movie
CCS              | SMRT Link 10.0.0 Circular Consensus Sequence Analysis (ccs v5.0.0)
Alignment        | pbmm2 1.5.0 (commit v1.5.0-2-g464414e)
Alignment Params | --min-concordance-perc 70.0 --min-length 50 --preset CCS
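
The README lists the aligner and its parameters but not the full command line. The sketch below assembles one plausible invocation from the table values. The `align` subcommand and the reference/reads/output argument order follow general pbmm2 usage rather than anything stated here, and the output file name is hypothetical.

```python
import shlex

# File names taken from the data locations table.
REFERENCE = "ecoli_pbi_Jan2021_majorStrain.fasta"
HIFI_READS = "m64004_200618_002500.hifi_reads.bam"
# Hypothetical output name; the README does not name the aligned BAM.
ALIGNED_BAM = "m64004_200618_002500.hifi_reads.aligned.bam"


def pbmm2_align_command(reference, reads, output):
    """Build an argument list using the alignment parameters from the table."""
    return [
        "pbmm2", "align",
        "--preset", "CCS",
        "--min-concordance-perc", "70.0",
        "--min-length", "50",
        reference, reads, output,
    ]


command = pbmm2_align_command(REFERENCE, HIFI_READS, ALIGNED_BAM)
# Print the command for inspection instead of running it.
print(shlex.join(command))
```

The list can be passed to `subprocess.run(command, check=True)` once pbmm2 1.5.0 (or a version that still accepts these option names) is on the PATH.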

- PacBio CCS Stats

value     | statistic
--------- | -----------------------------
1,519,099 | HiFi reads
13,956    | mean read length (bases)
Q29       | median predicted read quality
8         | mean number of passes
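
As a rough sanity check, the read count and mean read length can be combined into an expected depth of coverage. This is a sketch only: it assumes the 4,639,002 figure in the data locations table is the reference length in bases, and it uses the rounded mean read length, so the result is approximate.

```python
HIFI_READS = 1_519_099        # from the CCS stats table
MEAN_READ_LENGTH = 13_956     # bases, from the CCS stats table
REFERENCE_LENGTH = 4_639_002  # assumption: reference size is given in bases

# Approximate total HiFi yield and the coverage it implies if every
# base aligned to the reference.
total_bases = HIFI_READS * MEAN_READ_LENGTH
expected_coverage = total_bases / REFERENCE_LENGTH

print(f"total HiFi bases: {total_bases:,}")
print(f"expected coverage: ~{expected_coverage:.0f}x")
```

The estimate comes out near 4570x, slightly above the mean coverage of 4557 reported under "Overall error rates and coverages", which is what one would expect if a small fraction of bases is unaligned or filtered.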

- Overall error rates and coverages

median error | mean error | median coverage | mean coverage
------------ | ---------- | --------------- | -------------
QV31.7       | QV26.0     | 4561            | 4557
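
The error rates above are given as QV values. Assuming these follow the usual Phred convention, QV = -10 * log10(p), where p is the per-base error probability (the README does not define the scale explicitly), they can be converted to error rates as in this short sketch.

```python
import math


def qv_to_error_rate(qv):
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(error_rate):
    """Convert a per-base error probability to a Phred-scaled quality value."""
    return -10 * math.log10(error_rate)


# QV values taken from the table above.
for label, qv in [("median", 31.7), ("mean", 26.0)]:
    p = qv_to_error_rate(qv)
    print(f"{label}: QV{qv} -> {p:.2e} errors per base (about 1 in {1 / p:,.0f})")
```

Under that assumption, QV30 corresponds to one error per 1,000 bases, so the median of QV31.7 is somewhat better than that and the mean of QV26.0 somewhat worse.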

================================
# Biological Minor Subspecies

- There are biological minor subspecies present in our sample.

  - Our best practices aimed for as little biological variation as
  possible. These variants appear to be biologically irreducible given
  our methods.

  - The subspecies were indicated by large error events.

  - Reads that map to these minor subspecies at these locations should
  be filtered out, as they differ from the reference sequence.

location        | size | abundance | cause
--------------- | ---- | --------- | -------------------------
1035756-1037553 | ~2k  | 20%       | prophage inversion
2343429-2343725 | ~300 | 2%        | fimB regulatory inversion
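
One simple way to apply this filter is to drop every alignment that overlaps either interval. The sketch below is a conservative simplification, not the authors' procedure: it removes all overlapping reads rather than only those from the minor subspecies. It also assumes the locations are inclusive reference coordinates in the same system as the alignment positions, since the README does not say whether they are 0-based or 1-based. The `Region` type, the function names, and the `(name, start, end)` tuple layout are hypothetical.

```python
from typing import NamedTuple


class Region(NamedTuple):
    start: int
    end: int
    cause: str


# Intervals copied from the table above (assumed inclusive).
MINOR_VARIANT_REGIONS = [
    Region(1035756, 1037553, "prophage inversion"),
    Region(2343429, 2343725, "fimB regulatory inversion"),
]


def overlaps_minor_variant(aln_start, aln_end, regions=MINOR_VARIANT_REGIONS):
    """Return True if the closed interval [aln_start, aln_end] touches any region."""
    return any(aln_start <= r.end and aln_end >= r.start for r in regions)


def filter_alignments(alignments, regions=MINOR_VARIANT_REGIONS):
    """Keep only (name, start, end) alignments that avoid every region."""
    return [
        aln for aln in alignments
        if not overlaps_minor_variant(aln[1], aln[2], regions)
    ]


# Made-up alignments for illustration only.
example = [
    ("read_a", 1_000, 15_000),          # far from both regions: kept
    ("read_b", 1_030_000, 1_036_000),   # overlaps the prophage inversion: dropped
    ("read_c", 2_343_500, 2_350_000),   # overlaps the fimB inversion: dropped
]
print([name for name, _, _ in filter_alignments(example)])
```

With real data, the start and end positions would come from the aligned BAM records, for example via a BAM-reading library.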

================================