Index of /public/dataset/Kinnex-full-length-RNA

 Name                                     Last modified      Size  Description
 Parent Directory                                              -   
 DATA-Vega-UHRR2024/                      2024-10-29 13:25    -   
 DATA-Revio-UHRR2024/                     2024-10-29 06:33    -   
 DATA-RevioSPRQ-UHRR2024/                 2024-10-29 06:33    -   
 DATA-Revio-IDT-BrainUHRR/                2024-09-23 13:31    -   
 DATA-EXAMPLE/                            2024-09-16 11:42    -   
 MISC-SCRIPTS/                            2024-07-08 12:57    -   
 DATA-Revio-SCRI-Sample2-Heart-Trisomy21/ 2023-11-29 02:03    -   
 DATA-Revio-SCRI-Sample6-Cerebellum/      2023-11-28 11:09    -   
 DATA-Revio-SCRI-Sample5-Cerebellum/      2023-11-28 10:00    -   
 DATA-Revio-SCRI-Sample8-Cerebellum/      2023-11-28 04:35    -   
 DATA-Revio-SCRI-Sample4-Heart-Control/   2023-11-27 13:34    -   
 DATA-Revio-SCRI-Sample3-Heart-Control/   2023-11-27 13:27    -   
 DATA-Revio-SCRI-Sample1-Heart-Trisomy21/ 2023-11-27 13:21    -   
 REF-primers/                             2023-10-25 18:05    -   
 DATA-SQ2-UHRR-Monomer/                   2023-10-25 17:43    -   
 DATA-SQ2-UHRR/                           2023-10-24 17:05    -   
 DATA-Revio-UHRR/                         2023-10-24 16:39    -   
 DATA-Revio-HG002-1/                      2023-10-24 15:56    -   
 REF-pigeon_ref_sets/                     2022-08-23 08:58    -   
 README.txt                               2024-10-29 09:57  6.6K

********************
INTRODUCTION
********************

Last Updated: 10/29/2024

This README file describes the contents of this directory.

This is a data release for the Kinnex full-length RNA kit. The Kinnex full-length 
RNA libraries [1] were sequenced on the Sequel® II, IIe, and Revio Systems 
and processed using SMRT® Link v13.0 [2] or the command-line/Bioconda version [3].

To learn more about the Kinnex full-length RNA kit from PacBio for full-length 
RNA sequencing, read the application note [4].



********************
SAMPLE
********************

====================================
HG002
====================================
Vendor – Coriell

HG002 cells were purchased from Coriell and grown in RPMI 1640 medium with 
Glutamax, 16% FBS, and 0.5% Penicillin-Streptomycin. RNA was isolated from 
10x10^6 HG002 cells using Trizol reagent and Phasemaker tubes. RNA quality was 
assessed using a Bioanalyzer.

====================================
UHRR (Universal Human Reference RNA)
====================================
Vendor – Agilent
Part No - 740000
UHRR total RNA was purchased from Agilent and directly used for cDNA generation.


====================================
Heart and Cerebellum samples
====================================
Collaborator – Seattle Children’s Research Institute (SCRI)


Two heart samples from prenatal specimens with trisomy 21 were obtained from the 
Birth Defects Research Laboratory tissue repository. Total RNA was isolated from 
50 mg of fresh frozen tissue using the Promega Maxwell Kit.

Two control heart samples from prenatal specimens were obtained from the Birth 
Defects Research Laboratory tissue repository. Total RNA was isolated from 50 mg 
of fresh frozen tissue using the Promega Maxwell RNA Extraction Kit.

Three cerebellum samples were obtained from the Birth Defects Research Laboratory 
tissue repository. Total RNA was isolated from fresh frozen tissue sections or 
following laser capture microdissection using the Qiagen RNeasy Micro Kit as 
described in PMID:34140698.

The collaborator has granted permission to release this dataset.


********************
METHODS
********************

Library Preparation: 
Procedure & Checklist - Preparing Kinnex libraries using Kinnex full-length RNA kit [1]

Sequencing: 
Revio system with Revio polymerase kit and Revio sequencing plate, or
with Revio SPRQ polymerase kit and Revio SPRQ sequencing plate.
Sequel II and Sequel IIe systems with Sequel II binding kit 3.2 and Sequel II sequencing kit 2.0.

Run time: 
Revio – 30 hr movie
Sequel II/IIe – 30 hr movie

Analysis: 
Read segmentation and Iso-Seq workflow

   
********************
FILE DESCRIPTION
********************

Each sample contains the following folders:


========================
1-Sreads
========================

This directory contains the segmented reads (S-reads) produced by Read 
segmentation. Each S-read represents an original cDNA molecule.

segmented.bam contains the S-reads that have the expected order of MAS adapters 
and is the input file for the subsequent analyses.

1-Sreads/
├── segmented.bam
├── segmented.bam.pbi
└── segmented.summary.json


========================
2-FLNC
========================

This directory contains full-length, non-concatemer (FLNC) reads with the 
5' and 3' cDNA primers and the polyA tail removed. FLNC reads are oriented 
5'->3' based on the asymmetry of the cDNA primers.


2-FLNC/
├── flnc.bam
├── flnc.bam.pbi
├── flnc.filter_summary.json
└── flnc.report.csv  
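
As a rough illustration of how flnc.report.csv might be inspected, the Python 
sketch below summarizes a per-read report. The column names (id, strand, fivelen, 
threelen, polyAlen, insertlen, primer) are an assumption based on typical Iso-Seq 
refine output and are not stated in this README; check the header of the actual 
file before relying on them. The example rows are made up.

```python
import csv
import io
from statistics import median


def summarize_flnc_report(handle):
    """Return (read count, median polyA length, strand counts) for a report."""
    polya_lengths = []
    strand_counts = {}
    # ASSUMPTION: comma-separated file with "polyAlen" and "strand" columns.
    for row in csv.DictReader(handle):
        polya_lengths.append(int(row["polyAlen"]))
        strand_counts[row["strand"]] = strand_counts.get(row["strand"], 0) + 1
    median_polya = median(polya_lengths) if polya_lengths else None
    return len(polya_lengths), median_polya, strand_counts


# Made-up rows for illustration only; a real report would be opened with
# open("2-FLNC/flnc.report.csv", newline="").
example_report = io.StringIO(
    "id,strand,fivelen,threelen,polyAlen,insertlen,primer\n"
    "movie/1/ccs,+,30,25,28,1500,5p--3p\n"
    "movie/2/ccs,-,30,25,35,2100,5p--3p\n"
    "movie/3/ccs,+,30,25,22,900,5p--3p\n"
)
n_reads, median_polya, strand_counts = summarize_flnc_report(example_report)
```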



========================
3-ClusterMap
========================

This directory contains HQ cluster sequences produced by de novo clustering of 
FLNC reads at the isoform level. The HQ cluster sequences are then mapped to 
hg38 using pbmm2 (a minimap2 wrapper).

3-ClusterMap/
├── sample0.transcripts.bam (or clustered.bam)
├── sample0.transcripts.bam.pbi (or clustered.bam.pbi)
├── mapped.bam
└── mapped.bam.bai
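
The mapping step could be reproduced along the lines of the Python sketch below, 
which only assembles a pbmm2 command line and does not run it. The ISOSEQ preset, 
the sorting option, the thread count, and the reference file name hg38.fa are 
assumptions chosen for illustration; the exact options used for this release are 
not given in this README.

```python
import shlex


def pbmm2_align_command(reference, clustered_bam, mapped_bam, threads=8):
    """Build (but do not execute) a pbmm2 align command as an argument list."""
    return [
        "pbmm2", "align",
        "--preset", "ISOSEQ",   # ASSUMPTION: Iso-Seq alignment preset
        "--sort",               # write a coordinate-sorted BAM
        "-j", str(threads),     # ASSUMPTION: thread count is arbitrary
        reference, clustered_bam, mapped_bam,
    ]


# "hg38.fa" is a hypothetical local path to the reference FASTA.
command = pbmm2_align_command("hg38.fa", "clustered.bam", "mapped.bam")
command_line = shlex.join(command)
```

The argument list can be passed to subprocess.run(command, check=True) on a 
machine where pbmm2 is installed.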


========================
4-Collapse
========================

This directory contains the result of collapsing redundant isoforms after HQ 
cluster sequences were mapped to the genome. Each collapsed isoform has a unique 
ID `PB.X.Y` with an associated FLNC read count.

4-Collapse/
├── collapsed_transcripts.fasta 
├── collapsed_transcripts.flnc_count.txt 
├── collapsed_transcripts.gff 
├── collapsed_transcripts.group.txt
└── collapsed_transcripts.read_stat.txt 
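
The `PB.X.Y` identifiers can be split into their two numeric parts with a few 
lines of Python. The sketch below assumes that X indexes a locus and Y indexes an 
isoform within that locus; this interpretation is not stated in this README, and 
the IDs shown are made up.

```python
import re
from collections import defaultdict

PB_ID = re.compile(r"^PB\.(\d+)\.(\d+)$")


def parse_pb_id(isoform_id):
    """Split 'PB.X.Y' into the integer pair (X, Y)."""
    match = PB_ID.match(isoform_id)
    if match is None:
        raise ValueError(f"not a PB.X.Y identifier: {isoform_id!r}")
    return int(match.group(1)), int(match.group(2))


def isoforms_per_locus(isoform_ids):
    """Group isoform numbers by locus number (ASSUMPTION: X is the locus)."""
    grouped = defaultdict(list)
    for isoform_id in isoform_ids:
        locus, isoform = parse_pb_id(isoform_id)
        grouped[locus].append(isoform)
    return dict(grouped)


# Made-up identifiers for illustration only.
example_groups = isoforms_per_locus(["PB.1.1", "PB.1.2", "PB.2.1"])
```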


========================
5-Pigeon
========================


This directory contains the final set of unique transcripts: the clustered reads 
were mapped to the genome, collapsed into transcripts, and then classified and 
filtered against Gencode using pigeon. 
Read about pigeon at [3]. 

The classification.txt and junctions.txt files are the output from pigeon showing 
the per-isoform and per-junction-per-isoform classification results against 
Gencode annotation. The GFF file shows the exonic structures of the transcript isoforms. 
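
A quick way to get an overview of the classification file is to tally isoforms 
per structural category. The Python sketch below assumes a tab-delimited file 
with a header row containing a structural_category column (SQANTI-style category 
names); these details are assumptions, not taken from this README, and the rows 
are made up.

```python
import csv
import io
from collections import Counter


def count_structural_categories(handle):
    """Count isoforms per structural category in a tab-delimited table."""
    # ASSUMPTION: header row includes a "structural_category" column.
    reader = csv.DictReader(handle, delimiter="\t")
    return Counter(row["structural_category"] for row in reader)


# Made-up rows for illustration only; a real file would be opened with
# open("5-Pigeon/isoseq_classification.filtered_lite_classification.txt").
example_table = io.StringIO(
    "isoform\tstructural_category\tassociated_gene\n"
    "PB.1.1\tfull-splice_match\tGENE_A\n"
    "PB.1.2\tincomplete-splice_match\tGENE_A\n"
    "PB.2.1\tfull-splice_match\tGENE_B\n"
    "PB.3.1\tnovel_in_catalog\tGENE_C\n"
)
category_counts = count_structural_categories(example_table)
```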

The isoseq_saturation.txt file shows the rarefaction/saturation result of subsampling 
FLNC reads based on the post-filter pigeon result.
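
The idea behind a rarefaction/saturation curve can be sketched in Python as 
follows: shuffle the reads once, take increasingly large subsets, and count how 
many distinct isoforms each subset touches. This is a simplified illustration of 
the concept, not pigeon's implementation, and the read-to-isoform assignments are 
made up.

```python
import random


def saturation_curve(read_isoforms, fractions, seed=0):
    """Return (reads sampled, unique isoforms seen) for each sampling fraction.

    read_isoforms holds one isoform label per FLNC read. Subsamples are nested
    prefixes of a single shuffle, so the curve never decreases.
    """
    rng = random.Random(seed)
    shuffled = list(read_isoforms)
    rng.shuffle(shuffled)
    curve = []
    for fraction in fractions:
        n_reads = round(len(shuffled) * fraction)
        curve.append((n_reads, len(set(shuffled[:n_reads]))))
    return curve


# Made-up assignments: 6 reads for PB.1.1, 3 for PB.1.2, 1 for PB.2.1.
example_reads = ["PB.1.1"] * 6 + ["PB.1.2"] * 3 + ["PB.2.1"]
example_curve = saturation_curve(example_reads, [0.2, 0.5, 1.0])
```

A curve that flattens before the full read depth suggests that additional 
sequencing would reveal few new isoforms.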


5-Pigeon/
├── isoseq_classification.filtered.report.json  
├── isoseq_classification.filtered_lite_classification.txt 
├── isoseq_classification.filtered_lite_junctions.txt  
├── isoseq_transcripts.sorted.filtered_lite.gff 
└── isoseq_saturation.txt
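
After downloading a sample, the folder layout described above can be checked 
with a short Python sketch such as the one below. The file names are copied from 
the listings in this section; the function name and the temporary demo directory 
are hypothetical.

```python
import tempfile
from pathlib import Path

# Each tuple lists acceptable alternative names for one expected file.
EXPECTED = {
    "1-Sreads": [("segmented.bam",), ("segmented.bam.pbi",),
                 ("segmented.summary.json",)],
    "2-FLNC": [("flnc.bam",), ("flnc.bam.pbi",),
               ("flnc.filter_summary.json",), ("flnc.report.csv",)],
    "3-ClusterMap": [("sample0.transcripts.bam", "clustered.bam"),
                     ("sample0.transcripts.bam.pbi", "clustered.bam.pbi"),
                     ("mapped.bam",), ("mapped.bam.bai",)],
    "4-Collapse": [("collapsed_transcripts.fasta",),
                   ("collapsed_transcripts.flnc_count.txt",),
                   ("collapsed_transcripts.gff",),
                   ("collapsed_transcripts.group.txt",),
                   ("collapsed_transcripts.read_stat.txt",)],
    "5-Pigeon": [("isoseq_classification.filtered.report.json",),
                 ("isoseq_classification.filtered_lite_classification.txt",),
                 ("isoseq_classification.filtered_lite_junctions.txt",),
                 ("isoseq_transcripts.sorted.filtered_lite.gff",),
                 ("isoseq_saturation.txt",)],
}


def missing_files(sample_dir):
    """List expected files that are absent from a downloaded sample directory."""
    sample_dir = Path(sample_dir)
    missing = []
    for folder, entries in EXPECTED.items():
        for alternatives in entries:
            if not any((sample_dir / folder / name).is_file()
                       for name in alternatives):
                missing.append(f"{folder}/{' or '.join(alternatives)}")
    return missing


# Demo on a temporary directory that contains only 3-ClusterMap/clustered.bam.
with tempfile.TemporaryDirectory() as tmp:
    cluster_dir = Path(tmp) / "3-ClusterMap"
    cluster_dir.mkdir()
    (cluster_dir / "clustered.bam").touch()
    demo_missing = missing_files(tmp)
```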


********************
REFERENCES
********************

[1] Procedure & Checklist - Preparing Kinnex libraries using Kinnex full-length RNA kit 
https://www.pacb.com/wp-content/uploads/Procedure-checklist-Preparing-Kinnex-libraries-using-the-Kinnex-full-length-RNA-kit.pdf

[2] SMRT Link software https://pacb.com/software

[3] isoseq.how https://isoseq.how/

[4] Application note - Kinnex full-length RNA kit for isoform sequencing 
https://www.pacb.com/wp-content/uploads/Application-note-Kinnex-full-length-RNA-kit-for-isoform-sequencing.pdf



Research use only. Not for use in diagnostic procedures.

 © 2024 Pacific Biosciences of California, Inc. (“PacBio”). All rights reserved. 
The data provided in these files and the information in this document are subject 
to change without notice. PacBio assumes no responsibility for any errors or 
omissions in the files or this document. Certain notices, terms, conditions and/or 
use restrictions may pertain to your use of PacBio products and/or third-party 
products. Refer to the applicable PacBio terms and conditions of sale and to the 
applicable license terms at pacb.com/license. Pacific Biosciences, the PacBio logo, 
PacBio, Circulomics, Omniome, SMRT, SMRTbell, Iso-Seq, Sequel, Nanobind, SBB, 
Revio, Onso, Apton, and Kinnex are trademarks of PacBio. All other trademarks are 
the sole property of their respective owners.