README (Last updated 01/04/2020)
Edited by: Gregory Concepcion (gconcepcion@pacb.com)

******** ABOUT ********

The Sequoia sempervirens genome was sequenced and assembled at PacBio in
February 2020 and provided as a gift to the community.

A 24 kb PacBio HiFi library was prepared and sequenced, and the raw HiFi data
were deposited in NCBI under BioProject PRJNA606797:
https://www.ncbi.nlm.nih.gov/bioproject/PRJNA606797

The HiFi reads used to generate the assembly can be found in the SRA:
https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP251156

The original assembly can still be found here:
https://downloads.pacbcloud.com/public/dataset/redwood2020/
with the accompanying blog post here:
https://www.pacb.com/blog/tackling-a-giant-genome/

******* ASSEMBLY UPDATE *******

The raw data was re-assembled in just over 3 days using hifiasm (v0.12-r304)
on a computer with 80 cores and 792 GB of RAM, using this command line:

  $ hifiasm -o redwood_v12 -t 80 *.fastq

This directory contains the FASTA files generated from the assembly graphs
output by hifiasm:

  redwood_v12.p_ctg.fa.gz - Primary contigs
  redwood_v12.a_ctg.fa.gz - Alternate contigs
  redwood_v12.p_utg.fa.gz - Haplotype-resolved unitigs

It also contains the haplotype-resolved unitigs mapped to the primary contigs:

  redwood_v12_p_ctg_redwood_v12_p_utg.sorted.bam

Assembly stats - 33-fold dataset:

                   p_ctg           a_ctg           p_utg
contigs           14,912          30,260          36,649
esize          6,967,288       2,047,826       4,335,655
max           43,025,233      10,784,019      25,744,016
n50            5,503,715       1,569,162       3,443,399
n90            1,568,956         324,829         860,114
n95              982,145         144,292         482,330
total_bp  35,310,121,336  15,756,956,673  50,710,320,000

******* IsoSeq *******

A companion IsoSeq dataset from the same tree was also recently generated.
Details can be found here:
https://downloads.pacbcloud.com/public/dataset/redwood2020/IsoSeq

****** Reference ******

https://www.pacb.com/blog/tackling-a-giant-genome/
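
The assembly stats reported in this README (contigs, esize, max, n50, n90,
n95, total_bp) can be recomputed from any of the FASTA files. The sketch below
is not the script that produced the table. It assumes that "esize" is the
E-size (sum of squared contig lengths divided by total length, rounded down
here) and that nX is the length of the contig at which the running total,
longest contig first, reaches X percent of total_bp. The function names are
illustrative only.

```python
import gzip
from typing import Dict, Iterable, Iterator, List


def fasta_lengths(lines: Iterable[str]) -> Iterator[int]:
    """Yield the length of each record from an iterable of FASTA lines."""
    length = None  # stays None until the first '>' header is seen
    for line in lines:
        if line.startswith(">"):
            if length is not None:
                yield length
            length = 0
        elif length is not None:
            length += len(line.strip())
    if length is not None:
        yield length


def nx(ordered: List[int], total: int, percent: int) -> int:
    """Contig length at which the running sum (longest first) reaches
    `percent` of the total; percent=50 gives N50."""
    running = 0
    for length in ordered:
        running += length
        if running * 100 >= total * percent:
            return length
    return 0


def assembly_stats(lengths: Iterable[int]) -> Dict[str, int]:
    """Summarize contig lengths using the row names from the stats table."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequences found")
    total = sum(ordered)
    return {
        "contigs": len(ordered),
        # Assumption: esize = sum(L^2) / sum(L), truncated to an integer.
        "esize": sum(n * n for n in ordered) // total,
        "max": ordered[0],
        "n50": nx(ordered, total, 50),
        "n90": nx(ordered, total, 90),
        "n95": nx(ordered, total, 95),
        "total_bp": total,
    }


def stats_for_file(path: str) -> Dict[str, int]:
    """Compute the stats for a plain or gzip-compressed FASTA file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return assembly_stats(fasta_lengths(handle))
```

For example, stats_for_file("redwood_v12.p_ctg.fa.gz") should give values
comparable to the p_ctg column, provided the definitions assumed here match
those used for the table. Only the contig lengths are held in memory, so the
approach remains practical for an assembly of this size.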