Index of /public/dataset/MAS-Seq

Icon         Name                             Last modified     Size  Description
[PARENTDIR]  Parent Directory                                   -
[DIR]        DATA-MAS-Revio-PBMC-1/           2024-04-16 12:33  -
[DIR]        DATA-MAS-Revio-PBMC-2/           2024-04-16 12:33  -
[DIR]        DATA-MAS-SQ2-PBMC_10kcells/      2024-04-16 12:33  -
[DIR]        DATA-MAS-SQ2-PBMC_5kcells/       2024-04-16 12:33  -
[DIR]        DATA-MAS-SQ2_HG002_10kcells/     2024-04-16 12:34  -
[DIR]        DATA-Revio-Kinnex-HG002-10x5p/   2024-02-15 13:29  -
[DIR]        DATA-Revio-Kinnex-PBMC-10x3p/    2024-02-15 13:29  -
[DIR]        DATA-Revio-Kinnex-PBMC-10x5p/    2024-02-15 13:31  -
[DIR]        PLOT-scripts/                    2022-11-02 16:02  -
[DIR]        REF-10x_barcodes/                2023-02-07 14:01  -
[DIR]        REF-10x_primers/                 2022-08-29 10:43  -
[DIR]        REF-MAS_adapters/                2022-08-29 08:49  -
[DIR]        REF-pigeon_ref_sets/             2022-08-23 08:58  -
[TXT]        README.txt                       2024-02-20 09:46  7.1K

README  (Last Updated 02/20/2024)

********************
INTRODUCTION
********************

This README file describes the contents in this directory.

The dataset here contains single-cell RNA-Seq data generated 
using the MAS-Seq for 10x Single Cell 3' kit ("MAS") [1] and the 
Kinnex™ single-cell RNA kit ("Kinnex") [2].

The MAS-Seq libraries were sequenced on the Sequel® II/IIe and Revio 
systems and processed using SMRT® Link v11.1 [3] or BioConda [4].

The Kinnex libraries were sequenced on the Revio system and processed using
SMRT Link v13.1 [5].

To learn more about Kinnex, visit: https://pacb.com/kinnex


********************
SAMPLE
********************

All PBMC samples were purchased from BioIVT, either fresh or cryopreserved.

All HG002/GM24385 10k cells were purchased from Coriell.

All cDNA libraries were generated using the 10x Chromium Next GEM 
Single Cell 3’ kit (v3.1) or Single Cell 5' kit (v2) with a 10x Chromium 
Next GEM Chip G on a 10x Chromium X system.

Below is a description of the kits, systems, and samples used for each directory.

DATA-Revio-Kinnex-HG002-10x5p: Kinnex kit, Revio, HG002, 10x 5' kit
DATA-Revio-Kinnex-PBMC-10x3p : Kinnex kit, Revio, PBMC, 10x 3' kit
DATA-Revio-Kinnex-PBMC-10x5p : Kinnex kit, Revio, PBMC, 10x 5' kit

DATA-MAS-Revio-PBMC-1     : MAS-Seq kit, Revio, PBMC, 10x 3' kit
DATA-MAS-Revio-PBMC-2     : MAS-Seq kit, Revio, PBMC, 10x 3' kit
DATA-MAS-SQ2-PBMC_10kcells: MAS-Seq kit, Sequel IIe, PBMC, 10x 3' kit
DATA-MAS-SQ2-PBMC_5kcells : MAS-Seq kit, Sequel IIe, PBMC, 10x 3' kit
DATA-MAS-SQ2_HG002_10kcells: MAS-Seq kit, Sequel IIe, HG002, 10x 3' kit



********************
METHODS
********************

Library Preparation: 

Procedure & Checklist - Preparing MAS-Seq libraries using MAS-Seq for 10x Single Cell 3’ kit
or
Procedure & checklist - Preparing Kinnex libraries using Kinnex single-cell RNA kit


Sequencing: 

Sequel IIe system with Sequel II binding kit 3.2 and Sequel II sequencing kit 2.0 (4 rxn)
or
Revio system with Revio polymerase kit and Revio sequencing plate


Run time: 

Sequel II/IIe – 30 hr movie + 2 hr pre-extension + adaptive loading
Revio – 24 hr movie


Analysis: 

Read Segmentation and Single-cell Iso-Seq workflow (SL v11.1 and v13.1)
   
********************
FILE DESCRIPTION
********************

Each sample will contain the following folders:

========================
0-CCS
========================

This directory contains HiFi reads that were either produced directly on-instrument or 
generated by CCS analysis in SMRT Link. 

0-CCS/
|---- <movie>.hifi_reads.bam
|---- <movie>.hifi_reads.bam.pbi



========================
1-Sreads
========================

This directory contains segmented reads produced by Read Segmentation (or skera [6]). 
The resulting S-reads represent the original cDNA molecules. 
segmented.bam contains the S-reads that have the expected order of MAS/Kinnex primers and is the file 
used for the subsequent analyses.

1-Sreads/
|---- segmented.bam
|---- segmented.non_passing.bam


========================
2-DeduplicatedReads
========================

This directory contains deduplicated reads that have been through barcode correction 
(using a barcode whitelist) and UMI deduplication. The dedup reads are then used for the subsequent 
mapping and transcript analyses.


2-DeduplicatedReads/
├── scisoseq.5p--3p.tagged.refined.corrected.sorted.dedup.bam 
├── scisoseq.5p--3p.tagged.refined.corrected.sorted.dedup.bam.bai 
├── scisoseq.5p--3p.tagged.refined.corrected.sorted.dedup.bam.pbi 
└── scisoseq.5p--3p.tagged.refined.corrected.sorted.dedup.fasta 
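
As a quick sanity check on the dedup FASTA listed above, the following is a minimal Python 
sketch (standard library only) that counts the records and summarizes their lengths. The 
function names read_fasta and summarize_lengths are hypothetical helpers written for this 
illustration and are not part of any PacBio tool. The sketch assumes a plain, uncompressed 
FASTA file and does not interpret the record headers.

```python
from typing import Dict, Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA text stream."""
    header = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


def summarize_lengths(handle: TextIO) -> Dict[str, float]:
    """Return the record count plus min, max, and mean sequence length."""
    lengths = [len(seq) for _, seq in read_fasta(handle)]
    if not lengths:
        return {"reads": 0, "min": 0, "max": 0, "mean": 0.0}
    return {
        "reads": len(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "mean": sum(lengths) / len(lengths),
    }


# Example usage (the path is relative to a sample directory):
# with open("2-DeduplicatedReads/"
#           "scisoseq.5p--3p.tagged.refined.corrected.sorted.dedup.fasta") as fh:
#     print(summarize_lengths(fh))
```

The generator keeps only one record in memory at a time, although summarize_lengths still 
stores one integer per read.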





========================
3-CollapsedTranscripts
========================

This directory contains the total set of unique transcripts obtained by mapping the dedup reads 
to the genome, collapsing them into transcripts, and classifying and filtering the transcripts 
against Gencode using pigeon. Read about pigeon at [3]. 

The classification.txt and junctions.txt files are the outputs from pigeon showing the per-isoform and 
per-junction-per-isoform classification results against the Gencode annotation. The GFF3 file shows the 
exonic structures of the transcript isoforms. The group.txt file is an intermediate file required 
for generating the Seurat-compatible matrices in the next step, and is kept here for those who wish to 
re-generate the matrices.


3-CollapsedTranscripts/
├── scisoseq_classification.filtered_lite_classification.txt 
├── scisoseq_classification.filtered_lite_junctions.txt 
├── scisoseq.mapped_transcripts.collapse.group.txt 
└── scisoseq_transcripts.sorted.filtered_lite.gff
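
To get a first overview of the per-isoform classification file, the sketch below tallies 
isoforms by the value of one column, using only the Python standard library. It assumes the 
file is tab-delimited with a single header row, and that it has a column named 
structural_category. Both are assumptions about the file layout and should be checked against 
the actual header. The helper name count_by_column is hypothetical.

```python
import csv
from collections import Counter
from typing import TextIO


def count_by_column(handle: TextIO, column: str = "structural_category") -> Counter:
    """Count rows of a tab-delimited table by the value in one column.

    The default column name is an assumption about the classification
    file header; pass a different name if the header differs.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found; header: {reader.fieldnames}")
    return Counter(row[column] for row in reader)


# Example usage (the path is relative to a sample directory):
# with open("3-CollapsedTranscripts/"
#           "scisoseq_classification.filtered_lite_classification.txt",
#           newline="") as fh:
#     for category, n in count_by_column(fh).most_common():
#         print(category, n)
```

Raising an error when the column is missing keeps a header mismatch from being silently 
reported as an empty tally.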


========================
4-SeuratMatrix
========================

This directory contains the gene- and isoform-level count matrices compatible with common tertiary 
analysis tools such as Seurat. 

The NoNovelGenesIsoforms/ subdirectory contains only known genes (for genes_seurat/) and known+novel 
isoforms from known genes (for isoforms_seurat/). Ribo/mito genes are excluded. 

The WithNovelGenesIsoforms/ subdirectory contains both known and novel genes. Ribo/mito genes are excluded.


4-SeuratMatrix/
├── NoNovelGenesIsoforms
│   ├── cmd.sh
│   ├── genes_seurat
│   │   ├── barcodes.tsv
│   │   ├── genes.tsv
│   │   └── matrix.mtx
│   └── isoforms_seurat
│       ├── barcodes.tsv
│       ├── genes.tsv
│       └── matrix.mtx
└── WithNovelGenesIsoforms
    ├── cmd.sh
    ├── genes_seurat
    │   ├── barcodes.tsv
    │   ├── genes.tsv
    │   └── matrix.mtx
    └── isoforms_seurat
        ├── barcodes.tsv
        ├── genes.tsv
        └── matrix.mtx
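
Each genes_seurat/ or isoforms_seurat/ folder can also be inspected without Seurat. The 
following minimal Python sketch (standard library only) parses matrix.mtx as a Matrix Market 
coordinate file and sums the counts per cell barcode. It assumes that matrix rows correspond 
to the entries of genes.tsv and matrix columns to the entries of barcodes.tsv, which is an 
assumption about the layout and should be verified against the matrix dimensions. The function 
names read_mtx and counts_per_barcode are hypothetical.

```python
from pathlib import Path
from typing import Dict, List, Tuple


def read_mtx(path: Path) -> Tuple[int, int, List[Tuple[int, int, float]]]:
    """Parse a Matrix Market coordinate file.

    Returns (n_rows, n_cols, entries), where each entry is
    (row, col, value) with 0-based indices.
    """
    shape = None
    entries = []
    with path.open() as handle:
        banner = handle.readline()
        if not banner.startswith("%%MatrixMarket") or "coordinate" not in banner:
            raise ValueError("expected a Matrix Market coordinate file")
        for line in handle:
            if line.startswith("%") or not line.strip():
                continue  # comment or blank line
            fields = line.split()
            if shape is None:
                # First data line: number of rows, columns, and stored entries.
                shape = (int(fields[0]), int(fields[1]), int(fields[2]))
                continue
            # The file uses 1-based indices; convert to 0-based.
            entries.append((int(fields[0]) - 1, int(fields[1]) - 1, float(fields[2])))
    if shape is None:
        raise ValueError("missing size line")
    if len(entries) != shape[2]:
        raise ValueError("entry count does not match the size line")
    return shape[0], shape[1], entries


def counts_per_barcode(matrix_dir: Path) -> Dict[str, float]:
    """Sum all counts per barcode, assuming columns are barcodes."""
    barcodes = [b for b in (matrix_dir / "barcodes.tsv").read_text().splitlines() if b]
    _n_rows, n_cols, entries = read_mtx(matrix_dir / "matrix.mtx")
    if n_cols != len(barcodes):
        raise ValueError("matrix columns do not match barcodes.tsv")
    totals = {barcode: 0.0 for barcode in barcodes}
    for _row, col, value in entries:
        totals[barcodes[col]] += value
    return totals


# Example usage (the path is relative to a sample directory):
# totals = counts_per_barcode(Path("4-SeuratMatrix/NoNovelGenesIsoforms/genes_seurat"))
# print(len(totals), "barcodes;", sum(totals.values()), "total counts")
```

The dimension check fails early if the rows-as-features assumption does not hold for a given 
matrix, rather than assigning counts to the wrong barcodes.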


********************
REFERENCES
********************

[1] Procedure & Checklist - Preparing MAS-Seq libraries using MAS-Seq for 10x Single Cell 3’ kit
https://www.pacb.com/wp-content/uploads/Procedure-checklist-preparing-MAS-Seq-libraries-using-MAS-Seq-for-10x-single-cell-3-kit.pdf

[2] Procedure & checklist - Preparing Kinnex libraries using Kinnex single-cell RNA kit
https://www.pacb.com/wp-content/uploads/Procedure-checklist-Preparing-Kinnex-libraries-using-Kinnex-single-cell-RNA-kit.pdf

[3] SMRT Link v11.1 User Guide 
https://www.pacb.com/wp-content/uploads/SMRT_Link_User_Guide_v11.1.pdf

[4] isoseq.how https://isoseq.how/

[5] SMRT Link v13.1 User Guide 
Coming soon

[6] skera.how https://skera.how/


Research use only. Not for use in diagnostic procedures. © 
2024 Pacific Biosciences of California, Inc. (“PacBio”). 
All rights reserved. The data provided in these files and the 
information in this document are subject to change without notice. 
PacBio assumes no responsibility for any errors or omissions 
in the files or this document. Certain notices, terms, conditions 
and/or use restrictions may pertain to your use of PacBio 
products and/or third-party products. Refer to the applicable 
PacBio terms and conditions of sale and to the applicable license 
terms at pacb.com/license.  Pacific Biosciences, the PacBio logo, 
PacBio, Circulomics, Omniome, SMRT, SMRTbell, Iso-Seq, Sequel, 
Nanobind, SBB, Revio, Onso, Apton, Kinnex, and PureTarget are 
trademarks of PacBio.