Index of /public/dataset/AvianBrainTranscriptome

Name                     Last modified      Size
Parent Directory                            -
ProcessedData/           2017-09-08 15:40   -
RawDataDemultiplexed/    2017-09-08 15:38   -
RawData/                 2017-08-29 11:35   -
README.txt               2017-09-08 15:33   8.2K

README  (Last Updated 09/06/2017)

1. INTRODUCTION

	This README file describes the contents of this directory.

	The dataset released in this directory contains the raw Iso-Seq data from brain samples of two vocal learning models, Anna's Hummingbird and Zebra finch, collected on the PacBio(R) Sequel(R) system. 



2. LIBRARY PREPARATION AND SEQUENCING

	Brain tissue was collected immediately post-mortem from Anna's Hummingbird (Calypte anna) and Zebra finch (Taeniopygia guttata) by Erich Jarvis of Rockefeller University and stored in a cryogenic tube or embedded in OCT resin at -80°C. RNA was isolated from 30 mg of input material using the Qiagen RNeasy Mini Kit.
   
	The first-strand cDNA library was generated with barcoded oligo-dT in place of SMART-CDS primer IIA. Reactions were incubated at 42°C for 90 minutes followed by 70°C for 10 minutes. Four PCRs were run to optimize each sample, at 8, 10, 12, and 14 cycles; 11 cycles was determined to be optimal. cDNA was amplified for 11 cycles according to the PCR procedures in the Iso-Seq Template Preparation protocol (http://www.pacb.com/wp-content/uploads/Procedure-Checklist-Iso-Seq-Template-Preparation-Sequel-Systems.pdf).

	PCR products were purified, and DNA damage repair and end repair were performed prior to SMRTbell adaptor ligation. Primer was annealed and polymerase was bound to the template according to the binding calculator.
	
	Barcoded libraries were run on 4 Sequel 1M SMRT cells with 2.0 chemistry in August 2017.
	
	

3. DEMULTIPLEXING SUBREADS

Demultiplexing:

	This workflow depends on SMRT Analysis 5.x (http://www.pacb.com/support/software-downloads/) and csvkit (https://csvkit.readthedocs.io/en/1.0.2/).

	The 4 Sequel SMRT cell datasets were combined into a single dataset in SMRT Link 5.0.1. The Iso-Seq Classify Only Analysis Application was run on the combined dataset to demultiplex and classify CCS reads. Zebra finch has barcode 0; hummingbird has barcode 1. Primer sequences were:
	
>F0
AAGCAGTGGTATCAACGCAGAGTACATGGGG
>R0
CGCACTCTGATATGTGGTACTCTGCGTTGATACCACTGCTT
>F1
AAGCAGTGGTATCAACGCAGAGTACATGGGG
>R1
CTCACAGTCTGTGTGTGTACTCTGCGTTGATACCACTGCTT
	
	Demultiplexing requires this file from the SMRT Link job path:

"CSV_FILE=<job_dir>/tasks/pbcoretools.tasks.gather_csv-1/file.csv"

	For each primer-based barcode, ZMW IDs were identified:

# zebra finch (primer 0)
"csvgrep -c 9 -m '0' $CSV_FILE | cut -d',' -f1 | cut -d'/' -f1-2 > primer0.whitelist.txt"

# hummingbird (primer 1)
"csvgrep -c 9 -m '1' $CSV_FILE | cut -d',' -f1 | cut -d'/' -f1-2 > primer1.whitelist.txt"

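	Each ZMW carries a single barcode, so the two whitelists should not share any entries. The -m option of csvgrep appears to match a string anywhere within the column value rather than the whole value, so a quick overlap check may be worthwhile. The following is a minimal sketch, not part of the original workflow; the function name count_shared_zmws and its temporary file are hypothetical.

```shell
# Print the number of entries (lines of the form <movie>/<zmw>) that occur
# in both whitelist files. Hypothetical helper, not part of SMRT Analysis.
count_shared_zmws() {
    # comm needs sorted input; sort -u also removes duplicate lines
    sort -u "$1" > "$1.sorted.tmp"
    # comm -12 keeps only the lines common to both inputs ("-" is stdin)
    sort -u "$2" | comm -12 "$1.sorted.tmp" - | wc -l | tr -d ' '
    rm -f "$1.sorted.tmp"
}
```

	For example, "count_shared_zmws primer0.whitelist.txt primer1.whitelist.txt" would be expected to print 0 if the two barcodes were separated cleanly.
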
	ZMW IDs were split by SMRT cell, for example:
	
"grep m54006_170729_232022 primer0.whitelist.txt | cut -d'/' -f2 > primer0.m54006_170729_232022.txt"

"grep m54006_170729_232022 primer1.whitelist.txt | cut -d'/' -f2 > primer1.m54006_170729_232022.txt"

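	The same split can be repeated for all four SMRT cells and both primers. The sketch below is one possible way to loop over them, assuming the whitelist files produced above contain one <movie>/<zmw> entry per line; the movie names are taken from the Raw Data file list in section 5, and the function name split_whitelists_by_movie is hypothetical.

```shell
# Movie names as listed under "Raw Data" in this README.
MOVIES="m54006_170729_232022 m54026_170727_103805 m54200_170721_210832 m54200_170722_173443"

# For each primer and movie, write the ZMW numbers belonging to that movie
# to primer<N>.<movie>.txt (an empty file if the movie has no entries).
split_whitelists_by_movie() {
    for primer in 0 1; do
        for movie in $MOVIES; do
            # keep lines for this movie, then keep only the ZMW number
            grep "^${movie}/" "primer${primer}.whitelist.txt" \
                | cut -d'/' -f2 > "primer${primer}.${movie}.txt"
        done
    done
}
```

	Calling "split_whitelists_by_movie" in the directory holding the two whitelist files should produce eight ZMW lists, one per primer and movie.
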
	New BAM datasets were created for each barcode in each movie, for example:
	
"bamsieve --whitelist primer0.m54006_170729_232022.txt m54006_170729_232022.subreadset.xml primer0.m54006_170729_232022.subreadset.xml"
"bamsieve --whitelist primer1.m54006_170729_232022.txt m54006_170729_232022.subreadset.xml primer1.m54006_170729_232022.subreadset.xml"

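	To cover every barcode and movie, the bamsieve calls can be generated rather than typed by hand. The sketch below only prints the eight command lines so they can be reviewed before running; it assumes each whitelist is applied to the subreadset of the same movie and uses only the --whitelist option shown above. The function name print_bamsieve_commands is hypothetical.

```shell
# Movie names as listed under "Raw Data" in this README.
MOVIES="m54006_170729_232022 m54026_170727_103805 m54200_170721_210832 m54200_170722_173443"

# Print one bamsieve command per primer and movie (dry run; nothing is executed).
print_bamsieve_commands() {
    for primer in 0 1; do
        for movie in $MOVIES; do
            # arguments: whitelist file, input subreadset, output subreadset
            printf 'bamsieve --whitelist %s %s %s\n' \
                "primer${primer}.${movie}.txt" \
                "${movie}.subreadset.xml" \
                "primer${primer}.${movie}.subreadset.xml"
        done
    done
}
```

	After checking the printed lines, they could be executed with "print_bamsieve_commands | sh" in a directory containing the whitelists and subreadset files.
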
	These BAM files were then loaded into SMRT Link and combined into species-specific datasets.



4. GENERATING FULL LENGTH NON-CHIMERIC CCS READS

	From the SMRT Link Iso-Seq Classify Only job directory, we concatenated all of the "isoseq_flnc.fasta" files from the 24 pbtranscript.tasks.classify directories into a single file. Using the primer and chimera information contained in the header line of each CCS read, we split these sequences into a separate file for each species.
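
	The split by primer could be sketched as follows. This is not the script used to produce the released files; it assumes each header line carries a field of the form "primer=<n>" (for example ";primer=0;chimera=0"), which is our reading of the pbtranscript classify output and should be checked against the actual headers. The function name split_flnc_by_primer is hypothetical; the output names follow the files listed in section 5.

```shell
# Split a concatenated FLNC FASTA file ($1) into isoseq_flnc.primer_<n>.fasta,
# one file per primer index found in the header lines.
split_flnc_by_primer() {
    awk '
        /^>/ {
            out = ""    # records whose header has no primer field are skipped
            if (match($0, /primer=[0-9]+/)) {
                # RSTART + 7 skips the 7 characters of "primer="
                out = "isoseq_flnc.primer_" substr($0, RSTART + 7, RLENGTH - 7) ".fasta"
            }
        }
        # write the header and its sequence lines to the current output file
        out != "" { print > out }
    ' "$1"
}
```

	For example, after concatenating the per-task files into a single FASTA file, "split_flnc_by_primer <concatenated file>" would write isoseq_flnc.primer_0.fasta (zebra finch) and isoseq_flnc.primer_1.fasta (hummingbird) to the current directory.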

5. DESCRIPTION OF FILES

Raw Data
m54006_170729_232022.subreads.bam - BAM file of subreads for barcoded Sequel Cell  
m54006_170729_232022.subreads.bam.pbi - BAM INDEX file of subreads for barcoded Sequel Cell  
m54006_170729_232022.subreadset.xml - meta data for barcoded Sequel Cell   
m54026_170727_103805.subreads.bam - BAM file of subreads for barcoded Sequel Cell
m54026_170727_103805.subreads.bam.pbi - BAM INDEX file of subreads for barcoded Sequel Cell 
m54026_170727_103805.subreadset.xml - meta data for barcoded Sequel Cell
m54200_170721_210832.subreads.bam - BAM file of subreads for barcoded Sequel Cell
m54200_170721_210832.subreads.bam.pbi - BAM INDEX file of subreads for barcoded Sequel Cell 
m54200_170721_210832.subreadset.xml - meta data for barcoded Sequel Cell   
m54200_170722_173443.subreads.bam - BAM file of subreads for barcoded Sequel Cell
m54200_170722_173443.subreads.bam.pbi - BAM INDEX file of subreads for barcoded Sequel Cell   
m54200_170722_173443.subreadset.xml - meta data for barcoded Sequel Cell

Raw Data, Demultiplexed
primer0.m54006_170729_232022.subreads.bam -  BAM file of subreads for demultiplexed zebra finch sample  
primer0.m54006_170729_232022.subreads.bam.pbi -  BAM INDEX file of subreads for demultiplexed zebra finch sample  
primer0.m54006_170729_232022.subreadset.xml -  meta data for demultiplexed zebra finch sample   
primer0.m54026_170727_103805.subreads.bam -  BAM file of subreads for demultiplexed zebra finch sample
primer0.m54026_170727_103805.subreads.bam.pbi -  BAM INDEX file of subreads for demultiplexed zebra finch sample 
primer0.m54026_170727_103805.subreadset.xml -  meta data for demultiplexed zebra finch sample
primer0.m54200_170721_210832.subreads.bam -  BAM file of subreads for demultiplexed zebra finch sample
primer0.m54200_170721_210832.subreads.bam.pbi -  BAM INDEX file of subreads for demultiplexed zebra finch sample 
primer0.m54200_170721_210832.subreadset.xml -  meta data for demultiplexed zebra finch sample   
primer0.m54200_170722_173443.subreads.bam -  BAM file of subreads for demultiplexed zebra finch sample
primer0.m54200_170722_173443.subreads.bam.pbi -  BAM INDEX file of subreads for demultiplexed zebra finch sample   
primer0.m54200_170722_173443.subreadset.xml -  meta data for demultiplexed zebra finch sample
primer1.m54006_170729_232022.subreads.bam -  BAM file of subreads for demultiplexed hummingbird sample  
primer1.m54006_170729_232022.subreads.bam.pbi -  BAM INDEX file of subreads for demultiplexed hummingbird sample  
primer1.m54006_170729_232022.subreadset.xml -  meta data for demultiplexed hummingbird sample   
primer1.m54026_170727_103805.subreads.bam -  BAM file of subreads for demultiplexed hummingbird sample
primer1.m54026_170727_103805.subreads.bam.pbi -  BAM INDEX file of subreads for demultiplexed hummingbird sample 
primer1.m54026_170727_103805.subreadset.xml -  meta data for demultiplexed hummingbird sample
primer1.m54200_170721_210832.subreads.bam -  BAM file of subreads for demultiplexed hummingbird sample
primer1.m54200_170721_210832.subreads.bam.pbi -  BAM INDEX file of subreads for demultiplexed hummingbird sample 
primer1.m54200_170721_210832.subreadset.xml -  meta data for demultiplexed hummingbird sample   
primer1.m54200_170722_173443.subreads.bam -  BAM file of subreads for demultiplexed hummingbird sample
primer1.m54200_170722_173443.subreads.bam.pbi -  BAM INDEX file of subreads for demultiplexed hummingbird sample   
primer1.m54200_170722_173443.subreadset.xml -  meta data for demultiplexed hummingbird sample 

Processed, Demultiplexed Data
isoseq_flnc.primer_0.fasta - FASTA file of full-length non-chimeric CCS reads for zebra finch
isoseq_flnc.primer_1.fasta - FASTA file of full-length non-chimeric CCS reads for hummingbird
  
  
For Research Use Only. Not for use in diagnostic procedures.  Copyright 2017, Pacific Biosciences of California, Inc. All rights reserved. The data provided in these files is subject to change without notice and Pacific Biosciences assumes no responsibility for any errors or omissions. Certain notices, terms, conditions and/or use restrictions may pertain to your use of Pacific Biosciences data, products and/or third party products. Please refer to the applicable Pacific Biosciences Terms and Conditions of Sale and to the applicable license terms at http://www.pacificbiosciences.com/licenses.html. Pacific Biosciences, the Pacific Biosciences logo, PacBio, SMRT, SMRTbell and Iso-Seq are trademarks of Pacific Biosciences. BluePippin and SageELF are trademarks of Sage Science, Inc. NGS-go and NGSengine are trademarks of GenDx. All other trademarks are the sole property of their respective owners.