Assemble Transcripts with StringTie Element

StringTie is a fast and highly efficient assembler of RNA-Seq alignments into potential transcripts. It uses a novel network flow algorithm as well as an optional de novo assembly step to assemble and quantify full-length transcripts representing multiple splice variants for each gene locus.

Element type: stringtie

Parameters

Parameter	Description	Default value	Parameter in Workflow File	Type
Reference annotations	Use the reference annotation file (in GTF or GFF3 format) to guide the assembly process (-G). The output will include expressed reference transcripts as well as any novel transcripts that are assembled.		reference-annotations	string
Reads orientation	Select the NGS library type: unstranded, stranded fr-secondstrand (--fr), or stranded fr-firststrand (--rf).	Unstranded	reads-orientation	string
Label	Use the specified string as the prefix for the name of the output transcripts (-l).	STRG	label	string
Min isoform fraction	Specify the minimum isoform abundance of the predicted transcripts as a fraction of the most abundant transcript assembled at a given locus (-f). Lower abundance transcripts are often artifacts of incompletely spliced precursors of processed transcripts.	0.1	min-isoform-fraction	numeric
Min assembled transcript length	Specify the minimum length for the predicted transcripts (-m).	200	min-isoform-fraction	numeric
Min anchor length for junctions	Junctions are filtered out unless they have spliced reads that align across them with at least this number of bases on both sides (-a).	10	min-anchor-length	numeric
Min junction coverage	There should be at least this many spliced reads that align across a junction (-j). This number can be fractional since some reads align in more than one place. A read that aligns in n places will contribute 1/n to the junction coverage.	1	min-junction-coverage	numeric
Trim transcripts based on coverage	By default, StringTie adjusts the predicted transcript’s start and/or stop coordinates based on sudden drops in coverage. Set to “False” to disable trimming (-t).	True	trim-transcripts	bool
Min coverage for assembled transcripts	Specify the minimum read coverage allowed for the predicted transcripts (-c). A transcript with lower coverage than this value is not shown in the output. This number can be fractional.	2.5	min-coverage	numeric
Min locus gap separation	Reads mapped closer than this distance are merged into the same processing bundle (-g).	50 bp	min-locus-gap	numeric
Fraction covered by multi-hit reads	Max fraction of multi-mapped reads allowed at a locus (-M). A read aligning in n places contributes 1/n to coverage.	0.95	multi-hit-fraction	numeric
Skip assembling for sequences	Ignore all read alignments for the specified reference sequences (-x). Useful for skipping mitochondrial genome, etc. Case sensitive.		skip-sequences	string
Multi-mapping correction	Enable or disable multi-mapping correction. When it is disabled, the -u option is passed to StringTie.	Enabled	multi-mapping-correction	bool
Verbose log	Enable detailed logging (-v). Messages go to UGENE log and dashboard.	False	verbose-log	bool
Number of threads	Number of processing threads to use (-p).	8	threads	numeric
Output transcripts file	Primary GTF output file with assembled transcripts.	Auto	transcripts-output-url	string
Enable gene abundance output	Generate gene abundances file (-A). The file URL is passed to an output slot.	False	gene-abundance-output	bool
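
The flags quoted in the table can be read as a mapping from workflow-file parameter names to StringTie command-line options. The following is a minimal Python sketch of that mapping, not UGENE's actual implementation: the function name build_stringtie_command and the two dictionaries are hypothetical, the input and output paths are placeholders, and only a subset of the parameters is covered. The -o option is StringTie's own flag for the output GTF file.

```python
# Hypothetical mapping from workflow-file parameter names (see the table)
# to StringTie flags that take a value. Illustrative subset only.
VALUE_FLAGS = {
    "reference-annotations": "-G",
    "label": "-l",
    "min-isoform-fraction": "-f",
    "min-anchor-length": "-a",
    "min-junction-coverage": "-j",
    "min-coverage": "-c",
    "min-locus-gap": "-g",
    "multi-hit-fraction": "-M",
    "skip-sequences": "-x",
    "threads": "-p",
}

# Library type -> strandness flag; unstranded data needs no flag.
ORIENTATION_FLAGS = {
    "unstranded": None,
    "fr-secondstrand": "--fr",
    "fr-firststrand": "--rf",
}


def build_stringtie_command(bam_path, output_gtf, params):
    """Return a StringTie argument list for the given parameter dict."""
    cmd = ["stringtie", bam_path, "-o", output_gtf]

    # Valued options are emitted only when the parameter is set.
    for name, flag in VALUE_FLAGS.items():
        value = params.get(name)
        if value not in (None, ""):
            cmd += [flag, str(value)]

    orientation = ORIENTATION_FLAGS[params.get("reads-orientation", "unstranded")]
    if orientation is not None:
        cmd.append(orientation)

    # -t and -u switch features OFF, so they appear only when the
    # corresponding boolean parameter is False.
    if not params.get("trim-transcripts", True):
        cmd.append("-t")
    if not params.get("multi-mapping-correction", True):
        cmd.append("-u")

    if params.get("verbose-log", False):
        cmd.append("-v")
    return cmd
```

Calling build_stringtie_command("reads.bam", "out.gtf", {"label": "STRG", "threads": 8}) yields an argument list that can be passed to subprocess.run. Note the inverted logic for trimming and multi-mapping correction: both are on by default, and their flags are added only to turn them off.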

Input/Output Ports

The element has 1 input port:

Name in GUI: Input BAM file(s)
Name in Workflow File: in
Slots:

Slot in GUI	Slot in Workflow File	Type
Source URL	url	string

The element has 1 output port:

Name in GUI: StringTie output data
Name in Workflow File: out
Slots:

Slot in GUI	Slot in Workflow File	Type
Output URL	url	string