================================
# E.coli Gold Standard (EGS)

# GOAL

- Share gold-standard E.coli sample data:
  - vetted reference sequence
  - PacBio sequencing data
  - documentation of biologically irreducible minor variant contaminants to be filtered away

================================
# Data Locations

Reference and Sequencing Reads

- Gold standard E.coli sample sequencing PACB

data type      | size            | link / file
-------------- | --------------- | -----------------------------------
Reference      | 4,639,002       | ecoli_pbi_Jan2021_majorStrain.fasta
CCS HiFi Reads | 21,591,167,182  | m64004_200618_002500.hifi_reads.bam
Raw Subreads   | 464,943,830,201 | m64004_200618_002500.subreads.bam

================================
# Data Methods

- PacBio Data Methods PACB

key              | value
---------------- | ----------------------------------------------------------------------
Sample           | ATCC E.coli K12 MG1655 prep WL_052920
Shearing         | 15 kb Megaruptor3, Shear Speed 37, 10 ug/column, 500 uL per column
Size selection   | BluePippin U1 12 kb to 18 kb
Library prep     | TPK-1, Express V2, 4 EM, WL_061420b
Sequencing       | Sequel II System with BINDINGKIT=101-820-500 SEQUENCINGKIT=101-826-100
Run time         | 2.9 hour pre-extension; 15 hour movie
CCS              | SMRT Link 10.0.0 Circular Consensus Sequence Analysis (ccs v5.0.0)
Alignment        | pbmm2 1.5.0 (commit v1.5.0-2-g464414e)
Alignment Params | --min-concordance-perc 70.0 --min-length 50 --preset CCS

- PacBio CCS Stats

value     | statistic
--------- | -----------------------------
1,519,099 | HiFi reads
13,956    | mean read length (bp)
q29       | median predicted read quality
8         | mean number of passes

- Overall error rates and coverages

median err | mean err | median cover | mean cover
---------- | -------- | ------------ | ----------
QV31.7     | QV26.0   | 4561         | 4557

================================
# Biological Minor Subspecies

- There are biological minor subspecies present in our sample.
- Our best practices aimed for as little biological variation as possible.
  These variants appear to be biologically irreducible given our methods.
- The subspecies were indicated by large error events.
- Reads that map to these minor subspecies at these locations should be filtered out, as they differ from the reference sequence.

location        | size (bp) | abundance | cause
--------------- | --------- | --------- | -------------------------
1035756-1037553 | ~2k       | 20%       | prophage inversion
2343429-2343725 | ~300      | 2%        | fimB regulatory inversion

================================
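
As a rough illustration of the filtering suggested in the Biological Minor Subspecies section, the sketch below flags alignments that overlap either listed location. It is not part of the original analysis. The function names and the `(read_name, start, end)` tuple layout are hypothetical, and the coordinates are assumed to be 1-based and inclusive on the reference, which this document does not state. In practice the start and end positions would come from the aligned reads. Note that the sketch only marks candidate reads by position; it does not determine which reads actually carry the minor (inverted) allele.

```python
# Minor-variant locations copied from the table in this document.
# Assumption: coordinates are 1-based and inclusive on the reference.
MINOR_VARIANT_REGIONS = [
    (1035756, 1037553, "prophage inversion"),
    (2343429, 2343725, "fimB regulatory inversion"),
]


def overlapping_regions(aln_start, aln_end, regions=MINOR_VARIANT_REGIONS):
    """Return the causes of all regions overlapped by [aln_start, aln_end]."""
    return [
        cause
        for region_start, region_end, cause in regions
        if aln_start <= region_end and aln_end >= region_start
    ]


def filter_alignments(alignments, regions=MINOR_VARIANT_REGIONS):
    """Split (read_name, start, end) tuples into kept and flagged read names.

    A read is flagged if its aligned span touches any minor-variant region.
    """
    kept, flagged = [], []
    for read_name, start, end in alignments:
        if overlapping_regions(start, end, regions):
            flagged.append(read_name)
        else:
            kept.append(read_name)
    return kept, flagged


if __name__ == "__main__":
    # Hypothetical alignments, for illustration only.
    example = [
        ("read_a", 1000, 15000),         # far from both regions
        ("read_b", 1030000, 1036000),    # overlaps the prophage inversion
        ("read_c", 2343725, 2350000),    # touches the last base of the fimB region
    ]
    kept, flagged = filter_alignments(example)
    print("kept:", kept)
    print("flagged:", flagged)
```

Flagged reads could then be inspected further, or simply excluded when computing error rates against the major-strain reference.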