RNA-seq Analysis with Tuxedo Tools

The RNA-seq pipeline “Tuxedo” consists of the TopHat spliced read mapper, which internally uses Bowtie or Bowtie 2 short read aligners, and several Cufflinks tools that allow for the assembly of transcripts, estimation of their abundances, and testing for differential expression and regulation in RNA-seq samples.

Environment Requirements

The pipeline is currently available on Linux and macOS systems only.
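
A script that prepares data for this workflow could check the platform up front. The following is a minimal sketch and not part of UGENE: the function name `tuxedo_supported` is hypothetical. It relies on Python's `platform.system()`, which reports `Linux` on Linux and `Darwin` on macOS.

```python
import platform

# Values returned by platform.system() on the two supported systems.
SUPPORTED_SYSTEMS = {"Linux", "Darwin"}  # "Darwin" is macOS


def tuxedo_supported(system_name=None):
    """Return True if the given (or current) OS can run the Tuxedo pipeline."""
    if system_name is None:
        system_name = platform.system()
    return system_name in SUPPORTED_SYSTEMS


if __name__ == "__main__":
    print("Tuxedo pipeline supported here:", tuxedo_supported())
```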

How to Use This Sample

If you haven’t used the workflow samples in UGENE before, look at the “How to Use Sample Workflows” section of the documentation.

Workflow Sample Location

The workflow sample “RNA-seq Analysis with Tuxedo Tools” can be found in the “NGS” section of the Workflow Designer samples.

Workflow Image

The workflows support two types of short reads: single-end and paired-end. For each read type, there are three analysis types:

  1. Full Tuxedo Pipeline - use this pipeline to analyze multiple samples with TopHat, Cufflinks, Cuffmerge, and Cuffdiff tools.
  2. Single-sample Tuxedo Pipeline - use this pipeline to analyze a single sample with TopHat and Cufflinks tools.
  3. No-new-transcripts Tuxedo Pipeline - use this pipeline to analyze multiple samples with TopHat and Cuffdiff tools only, i.e., without producing new transcripts.
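
The three analysis types differ only in which tools they run. A minimal sketch of that mapping, taken from the list above, is given here; the dictionary and function names are illustrative and do not correspond to any UGENE API.

```python
# Tools run by each analysis type, in the order listed above.
ANALYSIS_TOOLS = {
    "Full": ["TopHat", "Cufflinks", "Cuffmerge", "Cuffdiff"],
    "Single-sample": ["TopHat", "Cufflinks"],
    "No-new-transcripts": ["TopHat", "Cuffdiff"],
}


def assembles_transcripts(analysis_type):
    """True if the analysis type assembles transcripts with Cufflinks."""
    return "Cufflinks" in ANALYSIS_TOOLS[analysis_type]


def compares_samples(analysis_type):
    """True if the analysis type compares multiple samples with Cuffdiff."""
    return "Cuffdiff" in ANALYSIS_TOOLS[analysis_type]


if __name__ == "__main__":
    for name, tools in ANALYSIS_TOOLS.items():
        print(f"{name}: {' -> '.join(tools)}")
```

Every analysis type starts with TopHat; the choice between them comes down to whether new transcripts are assembled and whether several samples are compared.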

The workflow that appears depends on the selected analysis type and reads type:

  [Image: Full Tuxedo Pipeline workflow, single-end reads]
  [Image: Full Tuxedo Pipeline workflow, paired-end reads]
  [Image: Single-sample Tuxedo Pipeline workflow, single-end reads]
  [Image: Single-sample Tuxedo Pipeline workflow, paired-end reads]
  [Image: No-new-transcripts Tuxedo Pipeline workflow, single-end reads]
  [Image: No-new-transcripts Tuxedo Pipeline workflow, paired-end reads]

Workflow Wizard

All of these workflows have similar wizards. For the Full Tuxedo Pipeline analysis type and the paired-end reads type, the wizard has 7 pages.

  1. Input Data: On this page, specify the RNA-seq short reads in FASTA or FASTQ format. You can add multiple datasets with different reads.

  2. Cuffdiff Samples: On this page, divide the input datasets into samples for Cuffdiff. At least 2 samples are required, but the samples do not need to contain the same number of datasets (replicates). Cuffdiff uses the sample names as labels in the output files it produces.

  3. TopHat Settings: On this page, you can configure TopHat settings. To show additional parameters, click the + button.

    The following parameters are available:

    Parameter: Description
    Bowtie index directory: The directory with the Bowtie index for the reference sequence.
    Bowtie index basename: The basename of the Bowtie index for the reference sequence.
    Bowtie version: Specifies which Bowtie version should be used.
    Known transcript file: A set of gene model annotations and/or known transcripts.
    Raw junctions: The list of raw junctions.
    Mate inner distance: Expected (mean) inner distance between mate pairs.
    Mate standard deviation: Standard deviation for the distribution on inner distances between mate pairs.
    Library type: Specifies RNA-seq protocol.
    No novel junctions: Only look for reads across junctions indicated in the supplied GFF or junctions file. (Ignored if Raw junctions or Known transcript file is not set.)
    Max multihits: Allows up to this many alignments to the reference for a given read, and suppresses all alignments for reads with more than this many alignments.
    Segment length: Each read is cut up into segments, each at least this long. These segments are mapped independently.
    Fusion search: Turn on fusion mapping.
    Transcriptome max hits: Only align the reads to the transcriptome and report only those mappings as genomic mappings.
    Prefilter multihits: Directs TopHat to first align the reads to the whole genome and exclude multi-mapped reads based on the Max multihits option value.
    Min anchor length: The anchor length. TopHat will report junctions spanned by reads with at least this many bases on each side of the junction.
    Splice mismatches: The maximum number of mismatches that may appear in the anchor region of a spliced alignment.
    Read mismatches: Final read alignments having more than this many mismatches are discarded.
    Segment mismatches: Read segments are mapped independently, allowing up to this many mismatches in each segment alignment.
    Solexa 1.3 quals: Use for FASTQ files from pipeline 1.3 or later, where quality scores are encoded in Phred-scaled base-64.
    Bowtie -n mode: Use -n in Bowtie for initial read mapping. Read segments are always mapped using -v option.
    Bowtie tool path: The path to the Bowtie external tool.
    SAMtools tool path: The path to the SAMtools tool. (Note that the tool is available in the UGENE External Tool Package.)
    TopHat tool path: The path to the TopHat external tool in UGENE.
    Temporary directory: The directory for temporary files.

  4. Cufflinks Settings: On this page, you can configure Cufflinks settings.

    The following parameters are available:

    Parameter: Description
    Reference annotation: Tells Cufflinks to use the supplied reference annotation to estimate isoform expression.
    RABT annotation: Use the supplied reference annotation to guide Reference Annotation Based Transcript (RABT) assembly.
    Library type: Specifies RNA-seq protocol.
    Mask file: Ignore all reads that could have come from transcripts in this file.
    Multi-read correct: More accurately weight reads mapping to multiple locations in the genome.
    Min isoform fraction: Filters out transcripts with very low abundance.
    Frag bias correct: Providing Cufflinks with a multifasta file instructs it to run the bias detection and correction algorithm.
    Pre-mRNA fraction: Filters out alignments in intronic intervals implied by spliced alignments.
    Cufflinks tool path: The path to the Cufflinks external tool in UGENE.
    Temporary directory: The directory for temporary files.

  5. Cuffmerge Settings: On this page, you can modify Cuffmerge parameters.

    The following parameters are available:

    Parameter: Description
    Minimum isoform fraction: Discard isoforms with abundance below this.
    Reference annotation: Merge the input assemblies together with this reference annotation.
    Reference sequence: The genomic DNA sequences for the reference. Used to classify transfrags and exclude artifacts.
    Cuffcompare tool path: The path to the Cuffcompare external tool in UGENE.
    Cuffmerge tool path: The path to the Cuffmerge external tool in UGENE.
    Temporary directory: The directory for temporary files.

  6. Cuffdiff Settings: On this page, you can configure Cuffdiff settings.

    The following parameters are available:

    Parameter: Description
    Time series analysis: Instructs Cuffdiff to analyze the samples as a time series.
    Upper quartile norm: Normalizes by the upper quartile of the number of fragments mapping to individual loci.
    Hits norm: Instructs how to count all fragments. Choosing Compatible is generally recommended to reduce bias.
    Frag bias correct: Provides sequences to Cuffdiff for bias detection and correction.
    Multi-read correct: More accurately weight reads mapping to multiple locations in the genome.
    Library type: Specifies RNA-seq protocol.
    Mask file: Ignore all reads that could have originated from transcripts in this file.
    Min alignment count: The minimum number of alignments in a locus necessary for significance testing.
    FDR: Allowed false discovery rate used in testing.
    Max MLE iterations: Sets the maximum number of iterations during maximum likelihood estimation of abundances.
    Emit count tables: Include information about fragment counts and variances in the report.
    Cuffdiff tool path: The path to the Cuffdiff external tool in UGENE.
    Temporary directory: The directory for temporary files.
  7. Output Data: On this page, you can modify output parameters.
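
Outside the wizard, settings like these correspond to options of the command-line tools. The sketch below shows how a few of them might translate, assuming the standard TopHat options (`-r`, `--mate-std-dev`, `--library-type`, `-G`) and Cuffdiff options (`-L`, `--FDR`, `-c`). It is not UGENE's implementation: the function names, file names, and sample names are hypothetical, and the commands are only built as argument lists, not executed.

```python
def tophat_command(index_dir, index_basename, reads_1, reads_2,
                   mate_inner_dist, mate_std_dev,
                   library_type="fr-unstranded", known_transcripts=None):
    """Build a paired-end TopHat call from a few of the wizard settings."""
    cmd = ["tophat",
           "-r", str(mate_inner_dist),           # Mate inner distance
           "--mate-std-dev", str(mate_std_dev),  # Mate standard deviation
           "--library-type", library_type]       # Library type
    if known_transcripts:
        cmd += ["-G", known_transcripts]         # Known transcript file
    # The index is passed as <directory>/<basename>, followed by the
    # comma-separated lists of first-mate and second-mate read files.
    cmd += [f"{index_dir}/{index_basename}",
            ",".join(reads_1), ",".join(reads_2)]
    return cmd


def cuffdiff_command(transcripts_gtf, samples, fdr=0.05,
                     min_alignment_count=10):
    """Build a Cuffdiff call; `samples` maps a sample name to its datasets."""
    if len(samples) < 2:
        raise ValueError("Cuffdiff needs at least 2 samples")
    for name, replicates in samples.items():
        if not replicates:
            raise ValueError(f"sample {name!r} has no datasets")
    cmd = ["cuffdiff",
           "-L", ",".join(samples),        # sample names become labels
           "--FDR", str(fdr),              # FDR
           "-c", str(min_alignment_count), # Min alignment count
           transcripts_gtf]
    # One positional argument per sample; its replicates are comma-separated.
    cmd += [",".join(replicates) for replicates in samples.values()]
    return cmd


if __name__ == "__main__":
    samples = {"control": ["c1.bam", "c2.bam"], "treated": ["t1.bam"]}
    print(" ".join(cuffdiff_command("merged.gtf", samples)))
    print(" ".join(tophat_command("/data/index", "genome",
                                  ["s1_1.fastq"], ["s1_2.fastq"], 50, 20)))
```

The example samples deliberately contain different numbers of replicates, which the Cuffdiff Samples page allows; only the minimum of 2 samples is enforced.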

The work on this pipeline was supported by grant RUB1-31097-NO-12 from NIAID.