Index of /public/dataset/redwood2020/hifiasm/v12
Name Last modified Size Description
redwood_v12.a_ctg.fa.gz 2020-12-29 14:04 4.4G
redwood_v12.a_ctg.fa.stats 2020-12-29 14:04 224
redwood_v12.p_ctg.fa.gz 2020-12-29 14:09 9.8G
redwood_v12.p_ctg.fa.stats 2020-12-29 14:09 227
redwood_v12.p_utg.fa.gz 2020-12-29 14:18 14G
redwood_v12.p_utg.fa.stats 2020-12-29 14:18 225
redwood_v12.p_ctg.fa.gz.md5 2021-01-04 11:58 58
redwood_v12.p_utg.fa.gz.md5 2021-01-04 12:01 58
redwood_v12.a_ctg.fa.gz.md5 2021-01-04 12:03 58
redwood_v12_p_ctg_redwood_v12_p_utg.sorted.bam 2021-01-05 16:02 16G
redwood_v12_p_ctg_redwood_v12_p_utg.sorted.bam.bai 2021-01-05 16:04 20M
redwood_v12_p_ctg_redwood_v12_p_utg.sorted.bam.md5 2021-01-05 18:07 81
README.txt 2021-01-06 10:00 2.1K
README (Last updated 01/04/2021)
Edited by: Gregory Concepcion (gconcepcion@pacb.com)
********
ABOUT
********
The Sequoia sempervirens (coast redwood) genome was sequenced and assembled at
PacBio in February 2020 and provided as a gift to the community. A 24 kb PacBio
HiFi library was prepared and sequenced, and the raw HiFi data were deposited
in NCBI under BioProject PRJNA606797.
https://www.ncbi.nlm.nih.gov/bioproject/PRJNA606797
HiFi reads used to generate the assembly can be found in the SRA:
https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP251156
The original assembly can still be found here:
https://downloads.pacbcloud.com/public/dataset/redwood2020/
with an accompanying blog post here:
https://www.pacb.com/blog/tackling-a-giant-genome/
*******
ASSEMBLY UPDATE
*******
The raw data were re-assembled in just over 3 days with hifiasm (v0.12-r304)
on a computer with 80 cores and 792 GB of RAM, using this command line:
$ hifiasm -o redwood_v12 -t 80 *.fastq
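The stats table below refers to a 33-fold dataset. Fold coverage is the total
number of HiFi read bases divided by the genome size. A minimal sketch for
recomputing it from the FASTQ input, assuming uncompressed FASTQ with exactly
four lines per record. The genome size is not given in this README and must be
supplied by the caller; fold_coverage is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# fold_coverage (hypothetical helper): total read bases / genome size.
# Assumes uncompressed FASTQ, 4 lines per record (no wrapped sequences),
# so line 2 of every record is the read sequence.
fold_coverage() {
  local genome_size=$1
  shift
  cat -- "$@" |
    awk -v g="$genome_size" '
      NR % 4 == 2 { bases += length($0) }   # sequence lines only
      END { printf "%.1f\n", bases / g }    # fold coverage, 1 decimal
    '
}
```

Usage: fold_coverage <genome_size_in_bp> *.fastq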
This directory contains the FASTA files generated from the assembly
graphs output by hifiasm:
redwood_v12.a_ctg.fa.gz - Alternate Contigs
redwood_v12.p_ctg.fa.gz - Primary Contigs
redwood_v12.p_utg.fa.gz - Haplotype-resolved unitigs
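The README does not say exactly how the FASTA files were produced from the
graphs. A minimal sketch, assuming hifiasm's GFA graph output (for example a
hypothetical redwood_v12.p_ctg.gfa, which is not included in this directory):
each segment ("S") line carries a contig name and its sequence, so printing
those two columns as FASTA records recovers the sequences. hifiasm's own
documentation suggests a similar awk one-liner; gfa_to_fasta is a hypothetical
helper name.

```bash
#!/usr/bin/env bash
# gfa_to_fasta (hypothetical helper): write the segments of a GFA graph as FASTA.
# GFA is tab-delimited; on segment ("S") lines, column 2 is the segment
# name and column 3 its sequence. Link ("L") and other lines are skipped.
gfa_to_fasta() {
  awk -F'\t' '$1 == "S" { print ">" $2; print $3 }' "$1"
}
```

Usage: gfa_to_fasta redwood_v12.p_ctg.gfa | gzip > redwood_v12.p_ctg.fa.gz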
It also contains a copy of the haplotype-resolved unitigs mapped to the
primary contigs:
redwood_v12_p_ctg_redwood_v12_p_utg.sorted.bam - Haplotype-resolved unitigs
mapped to Primary Contigs
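The directory listing also includes .md5 checksum files for the three FASTA
files and the BAM. A minimal sketch for verifying downloads, assuming each .md5
file is in standard md5sum format (a hex digest, two spaces, then the file
name) and sits next to the file it describes; verify_md5 is a hypothetical
helper name.

```bash
#!/usr/bin/env bash
# verify_md5 (hypothetical helper): check files against their .md5 companions.
# Assumes standard md5sum format ("<hex digest>  <file name>") with the file
# name relative to the directory that holds the .md5 file.
verify_md5() {
  local md5file status=0
  for md5file in "$@"; do
    # md5sum -c opens the listed file relative to the current directory,
    # so run each check from the directory that holds the .md5 file.
    if ! ( cd "$(dirname -- "$md5file")" && md5sum -c -- "$(basename -- "$md5file")" ); then
      status=1
    fi
  done
  # Non-zero if any file was missing or had a mismatched checksum.
  return "$status"
}
```

Usage: verify_md5 redwood_v12*.md5 prints one "<file>: OK" line per verified
file and returns non-zero if any check fails.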
Assembly stats - 33-fold dataset:
                   p_ctg           a_ctg           p_utg
contigs           14,912          30,260          36,649
esize          6,967,288       2,047,826       4,335,655
max           43,025,233      10,784,019      25,744,016
n50            5,503,715       1,569,162       3,443,399
n90            1,568,956         324,829         860,114
n95              982,145         144,292         482,330
total_bp  35,310,121,336  15,756,956,673  50,710,320,000
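To recompute these numbers from one of the FASTA files, the sketch below
assumes the usual definitions: Nx is the length of the shortest contig in the
smallest set of longest contigs that together cover x% of total_bp, and esize
is the length-weighted mean contig size, sum(L^2)/sum(L), i.e. the expected
size of the contig containing a randomly chosen base. The tool that produced
the .stats files is not named here, so these definitions are assumptions and
small differences (for example in how esize is rounded) are possible.
asm_stats is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# asm_stats (hypothetical helper): summary stats for a FASTA file,
# plain or gzip-compressed. Output is tab-separated "name<TAB>value".
asm_stats() {
  # gzip -cdf decompresses .gz input and passes plain text through unchanged.
  gzip -cdf "$1" |
    # Step 1: print one sequence length per FASTA record.
    awk '/^>/ { if (n++) print len; len = 0; next }
         { len += length($0) }
         END { if (n) print len }' |
    # Step 2: sort lengths from longest to shortest.
    sort -nr |
    # Step 3: accumulate totals, then walk the sorted lengths for Nx.
    awk '{ l[NR] = $1; total += $1; sq += $1 * $1 }
         END {
           if (NR == 0) exit
           printf "contigs\t%d\n", NR
           printf "total_bp\t%.0f\n", total
           printf "max\t%.0f\n", l[1]
           # esize = sum(L^2) / sum(L), rounded to the nearest integer
           printf "esize\t%.0f\n", sq / total
           n = split("50 90 95", pct, " ")
           cum = 0; j = 1
           for (i = 1; i <= NR && j <= n; i++) {
             cum += l[i]
             # One contig can satisfy several thresholds at once.
             while (j <= n && cum >= total * pct[j] / 100) {
               printf "n%s\t%.0f\n", pct[j], l[i]
               j++
             }
           }
         }'
}
```

Usage: asm_stats redwood_v12.p_ctg.fa.gz. Only one integer per contig is
kept in memory, so the sort stays small even for the largest files;
decompression dominates the run time.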
*******
IsoSeq
*******
A companion IsoSeq dataset from the same tree was also generated recently;
details can be found here:
https://downloads.pacbcloud.com/public/dataset/redwood2020/IsoSeq
******
Reference
******
https://www.pacb.com/blog/tackling-a-giant-genome/