pasterprinting.blogg.se - Clc main workbench tutorial aligment to reference


Having the correct genome and annotation files is important for annotating the genes with Biomart later on.

Now that you have the genome and annotation files, you will create a genome index using the following script:

    STAR --runMode genomeGenerate --genomeDir /common/RNASeq_Workshop/Soybean/gmax_genome/ --genomeFastaFiles /common/RNASeq_Workshop/Soybean/gmax_genome/Gmax_275_v2 --sjdbGTFfile /common/RNASeq_Workshop/Soybean/gmax_genome/Gmax_275_Wm82.a2.v1.gene_exons --sjdbGTFtagExonParentTranscript Parent --sjdbOverhang 100 --runThreadN 8

You will likely have to alter this script slightly to reflect the directory that you are working in and the specific names you gave your files, but the general idea is there. Indexing the genome allows for more efficient mapping of the reads to the genome. The assembly file, annotation file, as well as all of the files created from indexing the genome can be found in /common/RNASeq_Workshop/Soybean/gmax_genome.

Now that you have your genome indexed, you can begin mapping your trimmed reads with the following script:

    STAR --genomeDir /common/RNASeq_Workshop/Soybean/gmax_genome/ --readFilesIn trimmed_SRR391535.fastq --runThreadN 8 --outSAMtype BAM SortedByCoordinate --outFileNamePrefix SRR391535

The --genomeDir flag refers to the directory in which your indexed genome is located. Mapping produces .BAM files, binary files that will be converted to raw counts in our next step. The script for mapping all six of our trimmed reads to .bam files can be found in /common/RNASeq_Workshop/Soybean/STAR_HTSEQ_mapping as the file star_soybean.sh. The .bam output files are also stored in this directory.

Convert BAM Files to Raw Counts with HTSeq:

Finally, we will use HTSeq to transform these mapped reads into counts that we can analyze with R. “-s” indicates we do not have strand-specific counts. “-r” indicates the order in which the reads were generated; for us it was by alignment position. “-t” indicates the feature from the annotation file we will be using, which in our case will be exons. “-i” indicates which attribute from the annotation file we will be using; here it is the PAC transcript ID.
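To make the mapping and counting steps concrete, here is a minimal sketch that prints a STAR mapping command and an htseq-count command for one sample. The htseq-count line is an assumption, since the tutorial describes its flags but does not show the command itself: the values -s no, -r pos and -t exon follow the flag descriptions in the text, while the attribute name pacid, the -f bam input flag and the .counts output name are guesses that should be checked against your own annotation file. The function names are hypothetical. The BAM file name follows STAR's default naming for coordinate-sorted output with the given prefix.

```shell
#!/bin/sh
# Illustrative sketch only; paths and file names mirror the tutorial's examples.

GENOME_DIR="/common/RNASeq_Workshop/Soybean/gmax_genome/"
# Annotation file name as written in the tutorial (adjust to your actual file name).
ANNOTATION="${GENOME_DIR}Gmax_275_Wm82.a2.v1.gene_exons"

# Print the STAR command that maps one sample's trimmed reads.
map_cmd() {
    printf 'STAR --genomeDir %s --readFilesIn trimmed_%s.fastq --runThreadN 8 --outSAMtype BAM SortedByCoordinate --outFileNamePrefix %s\n' "$GENOME_DIR" "$1" "$1"
}

# Name of the coordinate-sorted BAM file that STAR writes for a given prefix.
bam_name() {
    printf '%sAligned.sortedByCoord.out.bam\n' "$1"
}

# Print an htseq-count command for one sample.
# ASSUMPTIONS: the attribute name "pacid" and the .counts output name are hypothetical.
count_cmd() {
    printf 'htseq-count -f bam -s no -r pos -t exon -i pacid %s %s > %s.counts\n' "$(bam_name "$1")" "$ANNOTATION" "$1"
}

# Print both commands for the first sample so they can be reviewed before running.
map_cmd SRR391535
count_cmd SRR391535
```

Printing the commands rather than executing them keeps the sketch safe to try on a machine without STAR or HTSeq installed; once the output looks right, the printed lines can be pasted into a job script.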


Most of this will be done on the BBC server unless otherwise stated.

Page by Dister Deoss

The packages we’ll be using can be found here:

The data we will be using are comparative transcriptomes of soybeans grown at either ambient or elevated O3 levels. Each condition was done in triplicate, giving us a total of six samples we will be working with. The paper that these samples come from (which also serves as great background reading on RNA-seq) can be found here: The Bench Scientist’s Guide to Statistical Analysis of RNA-Seq Data

The samples we will be using are described by the following accession numbers: SRR391535, SRR391536, SRR391537, SRR391538, SRR391539, and SRR391541. They can be found in results 13 through 18 of the following NCBI search:

The script for downloading the .SRA files and converting them to fastq can be found in /common/RNASeq_Workshop/Soybean/Quality_Control as the file fastq-dump.sh. The fastq files themselves are also already saved to this same directory.

    module load sratoolkit/2.8.1
    fastq-dump SRR391535

Quality Control on the Reads Using Sickle:

Step one is to perform quality control on the reads using Sickle. We are using unpaired reads, as indicated by the “se” flag in the script below. The -f flag designates the input file, -o is the output file, -q is our minimum quality score and -l is the minimum read length. The trimmed output files are what we will be using for the next steps of our analysis.

    sickle se -f SRR391535.fastq -t sanger -o trimmed_SRR391535.fastq -q 35 -l 45

The script for running quality control on all six of our samples can be found in /common/RNASeq_Workshop/Soybean/Quality_Control as the file sickle_soybean.sh. The output trimmed fastq files are also stored in this directory.

For this next step, you will first need to download the reference genome and annotation file for Glycine max (soybean). The files I used can be found at the following link:

You will need to create a user name and password for this database before you download the files. Once you’ve done that, you can download the assembly file Gmax_275_v2 and the annotation file Gmax_275_Wm82.a2.v1.gene_exons.
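The download and trimming commands shown for SRR391535 apply equally to the other five accessions. As a rough sketch of how a batch script might loop over all six samples (the contents of fastq-dump.sh and sickle_soybean.sh are not shown in this tutorial, so this is an assumption and not a copy of them), the script below builds the fastq-dump and sickle commands for each accession. The function names and the DRY_RUN switch are hypothetical; by default the commands are only printed so they can be checked first.

```shell
#!/bin/sh
# Hypothetical helper, not the workshop's own script.
# On a cluster that uses environment modules you may first need:
#   module load sratoolkit/2.8.1

SAMPLES="SRR391535 SRR391536 SRR391537 SRR391538 SRR391539 SRR391541"

# Print the command that downloads one accession and converts it to fastq.
dump_cmd() {
    printf 'fastq-dump %s\n' "$1"
}

# Print the single-end sickle command for one accession,
# using the same settings as the tutorial (-t sanger, -q 35, -l 45).
trim_cmd() {
    printf 'sickle se -f %s.fastq -t sanger -o trimmed_%s.fastq -q 35 -l 45\n' "$1" "$1"
}

# Print a command, and execute it only when DRY_RUN=0 (an assumed switch).
run() {
    echo "$1"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        sh -c "$1"
    fi
}

# Download and then trim each of the six samples in turn.
for s in $SAMPLES; do
    run "$(dump_cmd "$s")"
    run "$(trim_cmd "$s")"
done
```

Running the script as-is lists twelve commands, two per sample; setting DRY_RUN=0 in the environment would execute them, provided fastq-dump and sickle are on the PATH.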
This tutorial will serve as a guideline for how to go about analyzing RNA sequencing data when a reference genome is available. We will be going through quality control of the reads, alignment of the reads to the reference genome, conversion of the files to raw counts, analysis of the counts with DESeq2, and finally annotation of the reads using Biomart.