Index of /public/dataset/redwood2020/hifiasm/v12

Name                                                Last modified     Size
README.txt                                          2021-01-06 10:00  2.1K
redwood_v12.a_ctg.fa.gz                             2020-12-29 14:04  4.4G
redwood_v12.a_ctg.fa.gz.md5                         2021-01-04 12:03  58
redwood_v12.a_ctg.fa.stats                          2020-12-29 14:04  224
redwood_v12.p_ctg.fa.gz                             2020-12-29 14:09  9.8G
redwood_v12.p_ctg.fa.gz.md5                         2021-01-04 11:58  58
redwood_v12.p_ctg.fa.stats                          2020-12-29 14:09  227
redwood_v12.p_utg.fa.gz                             2020-12-29 14:18  14G
redwood_v12.p_utg.fa.gz.md5                         2021-01-04 12:01  58
redwood_v12.p_utg.fa.stats                          2020-12-29 14:18  225
redwood_v12_p_ctg_redwood_v12_p_utg.sorted.bam      2021-01-05 16:02  16G
redwood_v12_p_ctg_redwood_v12_p_utg.sorted.bam.bai  2021-01-05 16:04  20M
redwood_v12_p_ctg_redwood_v12_p_utg.sorted.bam.md5  2021-01-05 18:07  81

README (Last updated 01/04/2021)

Edited by: Gregory Concepcion (gconcepcion@pacb.com)

********
ABOUT
********

The Sequoia sempervirens genome was sequenced and assembled at PacBio in
February 2020 and provided as a gift to the community. A 24 kb PacBio HiFi
library was prepared and sequenced, and the raw HiFi data were deposited
in NCBI under BioProject PRJNA606797.
https://www.ncbi.nlm.nih.gov/bioproject/PRJNA606797

HiFi reads used to generate the assembly can be found in the SRA:
https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP251156

The original assembly can still be found here:

https://downloads.pacbcloud.com/public/dataset/redwood2020/

with accompanying blog post here:

https://www.pacb.com/blog/tackling-a-giant-genome/

*******
ASSEMBLY UPDATE
*******

The raw data were re-assembled in just over 3 days using hifiasm (v0.12-r304)
on a computer with 80 cores and 792 GB of RAM, using this command line:
    
    $ hifiasm -o redwood_v12 -t 80 *.fastq

This directory contains the fasta files generated from the assembly 
graphs output by hifiasm:  

redwood_v12.p_ctg.fa.gz - Primary Contigs
redwood_v12.a_ctg.fa.gz - Alternate Contigs
redwood_v12.p_utg.fa.gz - Haplotype-resolved unitigs
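
hifiasm writes its assembly graphs in GFA format, where each segment ("S")
line carries a sequence name and the sequence itself. The exact conversion
used for the files above is not recorded here; the following is a minimal
sketch of one common way to extract FASTA from such a graph. The function
name gfa_to_fasta and the input file name in the usage line are
illustrative assumptions.

```shell
# gfa_to_fasta: print every segment (S line) of a GFA file as a FASTA record.
# Illustrative helper, not necessarily the conversion used for this release.
gfa_to_fasta() {
    # GFA S lines are tab-separated: S <name> <sequence> [optional tags...]
    # Link (L) and other record types are skipped.
    awk -F'\t' '$1 == "S" { print ">" $2; print $3 }' "$1"
}

# Example (assumed graph file name, based on the -o prefix above):
#   gfa_to_fasta redwood_v12.p_ctg.gfa | gzip > redwood_v12.p_ctg.fa.gz
```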

And also a copy of the Haplotype-resolved unitigs mapped to the Primary
Contigs:

redwood_v12_p_ctg_redwood_v12_p_utg.sorted.bam - Haplotype-resolved unitigs
mapped to Primary Contigs
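
Each sequence and BAM file in this directory is accompanied by a .md5
file. Given the multi-gigabyte sizes, it is worth checking downloads
against them. The sketch below assumes the .md5 files use the standard
"<hash>  <filename>" layout written by md5sum; that layout is an
assumption, not something stated in this README. The function name
verify_md5 is illustrative.

```shell
# verify_md5: check one downloaded file against its companion .md5 file.
# Assumes <file>.md5 sits next to <file> and is in md5sum's own output format.
verify_md5() {
    file="$1"
    dir=$(dirname -- "$file")
    base=$(basename -- "$file")
    # Run inside the file's directory so the relative file name stored
    # in the .md5 file resolves correctly.
    ( cd -- "$dir" && md5sum -c -- "$base.md5" )
}

# Example:
#   verify_md5 redwood_v12.p_ctg.fa.gz
```

The function returns a non-zero status when the checksum does not match,
so it can be used directly in a download script.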

Assembly stats - 33-fold dataset:

                   p_ctg          a_ctg          p_utg
 contigs          14,912         30,260         36,649
 esize         6,967,288      2,047,826      4,335,655
 max          43,025,233     10,784,019     25,744,016
 n50           5,503,715      1,569,162      3,443,399
 n90           1,568,956        324,829        860,114
 n95             982,145        144,292        482,330
 total_bp 35,310,121,336 15,756,956,673 50,710,320,000
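
For readers who want to recompute figures like these from the FASTA
files, here is a minimal sketch. It assumes that nX is the length of the
contig at which the running total, longest contig first, reaches X% of
total_bp, and that esize is the E-size, sum(len^2) / sum(len). The tool
that produced the .stats files is not named in this README, so small
differences in rounding or definition are possible. The function names
fasta_lengths and asm_stats are illustrative.

```shell
# fasta_lengths: read FASTA on stdin, print one sequence length per line.
fasta_lengths() {
    awk '/^>/ { if (seen) print n; n = 0; seen = 1; next }
         { n += length($0) }
         END { if (seen) print n }'
}

# asm_stats: read one contig length per line on stdin, print summary stats.
asm_stats() {
    sort -rn | awk '
        { len[NR] = $1; total += $1; sq += $1 * $1 }
        END {
            if (NR == 0) exit 1
            printf "contigs %d\n", NR
            # E-size: length-weighted mean contig length (assumed definition).
            printf "esize %.0f\n", sq / total
            printf "max %.0f\n", len[1]
            n = split("50 90 95", pct, " ")
            run = 0; k = 1
            for (i = 1; i <= NR; i++) {
                run += len[i]
                # Report each Nx the first time the running sum reaches x%.
                while (k <= n && run * 100 >= total * pct[k]) {
                    printf "n%s %.0f\n", pct[k], len[i]
                    k++
                }
            }
            printf "total_bp %.0f\n", total
        }'
}

# Example:
#   gzip -dc redwood_v12.p_ctg.fa.gz | fasta_lengths | asm_stats
```

awk does its arithmetic in double precision, so on an assembly of this
size the sum of squares is only approximate; the effect on the printed
E-size should be negligible.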

*******
IsoSeq
*******

A companion IsoSeq dataset from the same tree was also recently
generated and details can be found here:

https://downloads.pacbcloud.com/public/dataset/redwood2020/IsoSeq


******
Reference
******

https://www.pacb.com/blog/tackling-a-giant-genome/