Index of /public/dataset/Ecoli/egs
Name                                | Last modified    | Size
----------------------------------- | ---------------- | ----
README.html                         | 2021-02-09 07:34 | 2.6K
README.txt                          | 2021-02-09 07:34 | 2.6K
ecoli_pbi_Jan2021_majorStrain.fasta | 2021-01-12 07:04 | 4.4M
m64004_200618_002500.hifi_reads.bam | 2020-09-24 17:32 | 20G
m64004_200618_002500.subreads.bam   | 2020-06-18 18:07 | 433G
================================
# E.coli Gold Standard (EGS)
# Goal
- Share gold-standard E.coli sample data:
  - vetted reference sequence
  - PacBio sequencing data
  - documentation of biologically irreducible minor-variant
    contaminants to be filtered away
================================
# Data Locations: Reference and Sequencing Reads
- Gold-standard E.coli sample, PacBio (PACB) sequencing data
data type      | size (bytes)    | file
-------------- | --------------- | -----------------------------------
Reference      | 4,639,002       | ecoli_pbi_Jan2021_majorStrain.fasta
CCS HiFi Reads | 21,591,167,182  | m64004_200618_002500.hifi_reads.bam
Raw Subreads   | 464,943,830,201 | m64004_200618_002500.subreads.bam
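Because the read files are large, it can help to confirm that a download is complete before using it. The sketch below compares a local file's size with the tabulated value. Treating the table's size column as byte counts is an assumption (it agrees with the 20G and 433G shown in the directory listing), and `size_matches` and `to_gib` are illustrative helper names, not part of this dataset.

```python
import os

# Sizes copied from the table above. Interpreting them as bytes is an
# assumption based on the directory listing (4.4M, 20G, 433G).
EXPECTED_BYTES = {
    "ecoli_pbi_Jan2021_majorStrain.fasta": 4_639_002,
    "m64004_200618_002500.hifi_reads.bam": 21_591_167_182,
    "m64004_200618_002500.subreads.bam": 464_943_830_201,
}


def to_gib(n_bytes):
    """Convert a byte count to GiB (1024**3 bytes)."""
    return n_bytes / 1024**3


def size_matches(path, expected=EXPECTED_BYTES):
    """Return True if the local file's size equals the tabulated size."""
    return os.path.getsize(path) == expected[os.path.basename(path)]


for name, n_bytes in EXPECTED_BYTES.items():
    print(f"{name}: {to_gib(n_bytes):.2f} GiB")
```

A size match only shows the transfer was not truncated; it is not a checksum.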
================================
# Data Methods
- PacBio Data Methods
PACB key         | value
---------------- | ----------------------------------------------------------------------
Sample           | ATCC E.coli K12 MG1655 prep WL_052920
Shearing         | 15 kb Megaruptor3, Shear Speed 37, 10 ug/column, 500 uL per column
Size selection   | BluePippin U1 12 kb to 18 kb
Library prep     | TPK-1, Express V2, 4 EM, WL_061420b
Sequencing       | Sequel II System with BINDINGKIT=101-820-500 SEQUENCINGKIT=101-826-100
Run time         | 2.9 hour pre-extension; 15 hour movie
CCS              | SMRT Link 10.0.0 Circular Consensus Sequence Analysis (ccs v5.0.0)
Alignment        | pbmm2 1.5.0 (commit v1.5.0-2-g464414e)
Alignment Params | --min-concordance-perc 70.0 --min-length 50 --preset CCS
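As a rough illustration of how the alignment parameters above might be combined into a command line, the sketch below assembles a `pbmm2 align` invocation without running it. The flags are copied from the table. The output file name is hypothetical, and the positional order (reference, reads, output) follows pbmm2's general usage. This is not necessarily the exact command used to produce this dataset, and flag names may differ in other pbmm2 versions.

```python
import shlex

# Flags copied verbatim from the "Alignment Params" row above.
ALIGN_PARAMS = "--min-concordance-perc 70.0 --min-length 50 --preset CCS"


def pbmm2_command(reference, reads, out_bam):
    """Build (but do not run) a pbmm2 align command as an argv list."""
    return ["pbmm2", "align", reference, reads, out_bam] + shlex.split(ALIGN_PARAMS)


cmd = pbmm2_command(
    "ecoli_pbi_Jan2021_majorStrain.fasta",
    "m64004_200618_002500.hifi_reads.bam",
    "hifi_reads.aligned.bam",  # hypothetical output name
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where pbmm2 is installed.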
- PacBio CCS Stats
value     | statistic
--------- | -----------------------------
1,519,099 | HiFi reads
13,956    | mean read length (bp)
Q29       | median predicted read quality
8         | mean number of passes
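The read count and mean read length give a quick estimate of total HiFi yield and sequencing depth. The sketch below does that arithmetic. It treats the tabulated reference size (4,639,002) as an approximate genome length in bases, which is an assumption, and the mean read length is a rounded value, so the result is only a sanity check. It gives a depth of roughly 4,570x, close to the mean coverage of 4557 reported for this dataset.

```python
HIFI_READS = 1_519_099             # from the CCS stats table
MEAN_READ_LENGTH = 13_956          # rounded mean, from the CCS stats table
ASSUMED_GENOME_LENGTH = 4_639_002  # assumption: reference size taken as bases

# Total yield is approximately reads x mean read length.
total_bases = HIFI_READS * MEAN_READ_LENGTH

# Depth is approximately total yield / genome length.
approx_depth = total_bases / ASSUMED_GENOME_LENGTH

print(f"total HiFi bases: {total_bases:,}")
print(f"approximate depth: {approx_depth:.0f}x")
```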
- Overall error rates and coverage
median error | mean error | median coverage | mean coverage
------------ | ---------- | --------------- | -------------
QV31.7       | QV26.0     | 4561            | 4557
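To read the QV figures as error probabilities, the sketch below applies the standard Phred relation QV = -10 * log10(p). That the values in this README follow the Phred convention is an assumption, and the function names are illustrative. Under that assumption, QV31.7 corresponds to about 6.8 errors per 10,000 bases and QV26.0 to about 2.5 errors per 1,000 bases.

```python
import math


def qv_to_error_rate(qv):
    """Convert a Phred-scaled quality value to an error probability."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(p):
    """Convert an error probability to a Phred-scaled quality value."""
    return -10 * math.log10(p)


for label, qv in [("median", 31.7), ("mean", 26.0)]:
    print(f"{label}: QV{qv} -> error rate {qv_to_error_rate(qv):.2e}")
```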
================================
# Biological Minor Subspecies
- Minor biological subspecies are present in our sample.
- Our best practices aimed for as little biological variation as
  possible. These variants appear to be biologically irreducible given
  our methods.
- The subspecies were revealed by large error events.
- Reads that map to these minor subspecies at the locations below
  should be filtered out, as they differ from the reference
  sequence.
location        | size    | abundance | cause
--------------- | ------- | --------- | -------------------------
1035756-1037553 | ~2 kb   | 20%       | prophage inversion
2343429-2343725 | ~300 bp | 2%        | fimB regulatory inversion
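One possible first step toward the filtering described above is to flag alignments that overlap either listed location. The sketch below is a minimal interval check in plain Python. It assumes the coordinates are 1-based and inclusive on the reference, which the README does not state, and `overlapping_regions` is a hypothetical helper. Overlap alone does not show that a read carries the minor variant; it only marks candidates for closer inspection.

```python
# Locations copied from the table above; the coordinate convention
# (1-based, inclusive) is an assumption.
MINOR_VARIANT_REGIONS = [
    (1035756, 1037553, "prophage inversion"),
    (2343429, 2343725, "fimB regulatory inversion"),
]


def overlapping_regions(aln_start, aln_end, regions=MINOR_VARIANT_REGIONS):
    """Return the causes of all regions overlapped by [aln_start, aln_end]."""
    return [
        cause
        for start, end, cause in regions
        if aln_start <= end and aln_end >= start
    ]


# Example: an alignment spanning the first location is flagged,
# one far from both locations is not.
print(overlapping_regions(1030000, 1044000))
print(overlapping_regions(3000000, 3014000))
```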
================================