Assemble Transcripts with StringTie Element

StringTie is a fast and highly efficient assembler of RNA-Seq alignments into potential transcripts. It uses a novel network flow algorithm as well as an optional de novo assembly step to assemble and quantify full-length transcripts representing multiple splice variants for each gene locus.

Element type: stringtie

Parameters

| Parameter | Description | Default value | Parameter in Workflow File | Type |
|---|---|---|---|---|
| Reference annotations | Use the reference annotation file (in GTF or GFF3 format) to guide the assembly process (-G). The output will include expressed reference transcripts as well as any novel transcripts that are assembled. | | reference-annotations | string |
| Reads orientation | Select the NGS library type: unstranded, stranded fr-secondstrand (--fr), or stranded fr-firststrand (--rf). | Unstranded | reads-orientation | string |
| Label | Use the specified string as the prefix for the name of the output transcripts (-l). | STRG | label | string |
| Min isoform fraction | Specify the minimum isoform abundance of the predicted transcripts as a fraction of the most abundant transcript assembled at a given locus (-f). Lower-abundance transcripts are often artifacts of incompletely spliced precursors of processed transcripts. | 0.1 | min-isoform-fraction | numeric |
| Min assembled transcript length | Specify the minimum length for the predicted transcripts (-m). | 200 | min-isoform-fraction | numeric |
| Min anchor length for junctions | Junctions are filtered out unless spliced reads align across them with at least this many bases on both sides (-a). | 10 | min-anchor-length | numeric |
| Min junction coverage | At least this many spliced reads must align across a junction (-j). This number can be fractional, since some reads align in more than one place: a read that aligns in n places contributes 1/n to the junction coverage. | 1 | min-junction-coverage | numeric |
| Trim transcripts based on coverage | By default, StringTie adjusts the predicted transcript's start and/or stop coordinates based on sudden drops in coverage. Set to "False" to disable trimming (-t). | True | trim-transcripts | bool |
| Min coverage for assembled transcripts | Specify the minimum read coverage allowed for the predicted transcripts (-c). A transcript with coverage below this value is not shown in the output. This number can be fractional. | 2.5 | min-coverage | numeric |
| Min locus gap separation | Reads mapped closer than this distance are merged into the same processing bundle (-g). | 50 bp | min-locus-gap | numeric |
| Fraction covered by multi-hit reads | Maximum fraction of multi-mapped reads allowed at a locus (-M). A read that aligns in n places contributes 1/n to the coverage. | 0.95 | multi-hit-fraction | numeric |
| Skip assembling for sequences | Ignore all read alignments for the specified reference sequences (-x). Useful for skipping, for example, the mitochondrial genome. The sequence names are case sensitive. | | skip-sequences | string |
| Multi-mapping correction | Enable or disable multi-mapping correction (-u). | Enabled | multi-mapping-correction | bool |
| Verbose log | Enable detailed logging (-v). Messages are written to the UGENE log and the dashboard. | False | verbose-log | bool |
| Number of threads | Number of processing threads to use (-p). | 8 | threads | numeric |
| Output transcripts file | Primary GTF output file with the assembled transcripts. | Auto | transcripts-output-url | string |
| Enable gene abundance output | Generate a gene abundances file (-A). The file URL is passed to an output slot. | False | gene-abundance-output | bool |
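
To make the relation between the element parameters and the StringTie options more concrete, the following is a minimal Python sketch that turns a subset of the workflow-file parameter names listed above into a StringTie command line. It is an illustration only and not the code UGENE itself runs: the helper name build_stringtie_command, the orientation value strings, the abundance file name gene_abund.tab, and the example file names are assumptions. The -o option is StringTie's standard output-file option.

```python
import shlex

# Workflow-file parameter names mapped to the StringTie options quoted in
# the parameter table. Hypothetical helper data, not part of UGENE.
VALUE_FLAGS = {
    "reference-annotations": "-G",
    "label": "-l",
    "min-isoform-fraction": "-f",
    "min-anchor-length": "-a",
    "min-junction-coverage": "-j",
    "min-coverage": "-c",
    "min-locus-gap": "-g",
    "multi-hit-fraction": "-M",
    "skip-sequences": "-x",
    "threads": "-p",
}

# Assumed spelling of the orientation values; unstranded adds no option.
ORIENTATION_FLAGS = {"fr-secondstrand": "--fr", "fr-firststrand": "--rf"}


def build_stringtie_command(bam_path, params, abundance_path="gene_abund.tab"):
    """Return the StringTie argument list for one input BAM file."""
    cmd = ["stringtie", bam_path]

    # Options that take a value are emitted only when the value is set.
    for name, flag in VALUE_FLAGS.items():
        value = params.get(name)
        if value not in (None, ""):
            cmd += [flag, str(value)]

    orientation = str(params.get("reads-orientation", "unstranded")).lower()
    if orientation in ORIENTATION_FLAGS:
        cmd.append(ORIENTATION_FLAGS[orientation])

    # -t and -u switch a default behaviour off, so they are added
    # only when the corresponding parameter is explicitly False.
    if params.get("trim-transcripts", True) is False:
        cmd.append("-t")
    if params.get("multi-mapping-correction", True) is False:
        cmd.append("-u")
    if params.get("verbose-log", False):
        cmd.append("-v")
    if params.get("gene-abundance-output", False):
        cmd += ["-A", abundance_path]

    output_url = params.get("transcripts-output-url")
    if output_url:
        cmd += ["-o", output_url]
    return cmd


example = build_stringtie_command(
    "sample.bam",
    {
        "reference-annotations": "genes.gtf",
        "reads-orientation": "fr-firststrand",
        "min-coverage": 2.5,
        "trim-transcripts": False,
        "threads": 8,
    },
)
print(shlex.join(example))
```

Note that -t and -u are negative switches: leaving "Trim transcripts based on coverage" and "Multi-mapping correction" at their defaults adds nothing to the command line.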

Input/Output Ports

The element has 1 input port:

Name in GUI: Input BAM file(s)
Name in Workflow File: in
Slots:

| Slot in GUI | Slot in Workflow File | Type |
|---|---|---|
| Source URL | url | string |

And 1 output port:

Name in GUI: StringTie output data
Name in Workflow File: out
Slots:

| Slot in GUI | Slot in Workflow File | Type |
|---|---|---|
| Output URL | url | string |