Assemble Transcripts with StringTie Element
StringTie is a fast and highly efficient assembler of RNA-Seq alignments into potential transcripts. It uses a novel network flow algorithm as well as an optional de novo assembly step to assemble and quantify full-length transcripts representing multiple splice variants for each gene locus.
Element type: stringtie
Parameters
Parameter | Description | Default value | Parameter in Workflow File | Type |
---|---|---|---|---|
Reference annotations | Use the reference annotation file (in GTF or GFF3 format) to guide the assembly process (-G). The output will include expressed reference transcripts as well as any novel transcripts that are assembled. | | reference-annotations | string |
Reads orientation | Select the NGS library type: unstranded, stranded fr-secondstrand (--fr), or stranded fr-firststrand (--rf). | Unstranded | reads-orientation | string |
Label | Use the specified string as the prefix for the name of the output transcripts (-l). | STRG | label | string |
Min isoform fraction | Specify the minimum isoform abundance of the predicted transcripts as a fraction of the most abundant transcript assembled at a given locus (-f). Lower abundance transcripts are often artifacts of incompletely spliced precursors of processed transcripts. | 0.1 | min-isoform-fraction | numeric |
Min assembled transcript length | Specify the minimum length for the predicted transcripts (-m). | 200 | min-isoform-fraction | numeric |
Min anchor length for junctions | Junctions are filtered out unless they have spliced reads that align across them with at least this many bases on both sides (-a). | 10 | min-anchor-length | numeric |
Min junction coverage | There should be at least this many spliced reads that align across a junction (-j). This number can be fractional since some reads align in more than one place. A read that aligns in n places will contribute 1/n to the junction coverage. | 1 | min-junction-coverage | numeric |
Trim transcripts based on coverage | By default, StringTie adjusts the predicted transcript’s start and/or stop coordinates based on sudden drops in coverage. Set to “False” to disable trimming (-t). | True | trim-transcripts | bool |
Min coverage for assembled transcripts | Specifies the minimum read coverage allowed for the predicted transcripts (-c). A transcript with lower coverage than this value is not shown in the output. This number can be fractional. | 2.5 | min-coverage | numeric |
Min locus gap separation | Reads mapped closer than this distance are merged into the same processing bundle (-g). | 50 bp | min-locus-gap | numeric |
Fraction covered by multi-hit reads | Max fraction of multi-mapped reads allowed at a locus (-M). A read aligning in n places contributes 1/n to coverage. | 0.95 | multi-hit-fraction | numeric |
Skip assembling for sequences | Ignore all read alignments for the specified reference sequences (-x). Useful for skipping mitochondrial genome, etc. Case sensitive. | | skip-sequences | string |
Multi-mapping correction | Enables or disables multi-mapping correction. StringTie applies the correction by default; disabling it corresponds to the -u option. | Enabled | multi-mapping-correction | bool |
Verbose log | Enable detailed logging (-v). Messages go to UGENE log and dashboard. | False | verbose-log | bool |
Number of threads | Number of processing threads to use (-p). | 8 | threads | numeric |
Output transcripts file | Primary GTF output file with assembled transcripts. | Auto | transcripts-output-url | string |
Enable gene abundance output | Generate gene abundances file (-A). The file URL is passed to an output slot. | False | gene-abundance-output | bool |
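
The parameters above map onto StringTie command-line options, and a small sketch can make that mapping concrete. The Python function below is illustrative only: the function name `build_stringtie_command`, its argument names, and the orientation strings `"unstranded"`, `"fr-secondstrand"`, and `"fr-firststrand"` are assumptions made for this example, and it does not necessarily reflect how UGENE invokes the tool internally. The flags themselves are the ones listed in the table, plus StringTie's `-o` option for the output GTF file.

```python
from typing import List, Optional


def build_stringtie_command(
    bam_path: str,
    output_gtf: str,
    reference_annotations: Optional[str] = None,
    reads_orientation: str = "unstranded",  # assumed value names, see lead-in
    label: str = "STRG",
    min_isoform_fraction: float = 0.1,
    min_transcript_length: int = 200,
    min_anchor_length: int = 10,
    min_junction_coverage: float = 1,
    trim_transcripts: bool = True,
    min_coverage: float = 2.5,
    min_locus_gap: int = 50,
    multi_hit_fraction: float = 0.95,
    skip_sequences: Optional[str] = None,
    multi_mapping_correction: bool = True,
    verbose_log: bool = False,
    threads: int = 8,
    gene_abundance_path: Optional[str] = None,
) -> List[str]:
    """Return a StringTie argument list (hypothetical helper, not UGENE code)."""
    orientation_flags = {
        "unstranded": None,
        "fr-secondstrand": "--fr",
        "fr-firststrand": "--rf",
    }
    if reads_orientation not in orientation_flags:
        raise ValueError(f"unknown reads orientation: {reads_orientation}")

    cmd = ["stringtie", bam_path, "-o", output_gtf]

    # Optional string parameters are passed only when they are set.
    if reference_annotations:
        cmd += ["-G", reference_annotations]
    if orientation_flags[reads_orientation]:
        cmd.append(orientation_flags[reads_orientation])
    if skip_sequences:
        cmd += ["-x", skip_sequences]
    if gene_abundance_path:
        cmd += ["-A", gene_abundance_path]

    # Value-taking options, always passed explicitly in this sketch.
    cmd += ["-l", label]
    cmd += ["-f", str(min_isoform_fraction)]
    cmd += ["-m", str(min_transcript_length)]
    cmd += ["-a", str(min_anchor_length)]
    cmd += ["-j", str(min_junction_coverage)]
    cmd += ["-c", str(min_coverage)]
    cmd += ["-g", str(min_locus_gap)]
    cmd += ["-M", str(multi_hit_fraction)]
    cmd += ["-p", str(threads)]

    # -t and -u switch OFF behaviour that StringTie enables by default.
    if not trim_transcripts:
        cmd.append("-t")
    if not multi_mapping_correction:
        cmd.append("-u")
    if verbose_log:
        cmd.append("-v")
    return cmd


if __name__ == "__main__":
    # Example file names are placeholders.
    print(" ".join(build_stringtie_command("reads.bam", "transcripts.gtf")))
```

Note that `-t` and `-u` are "off" switches, so the sketch emits them only when the corresponding boolean parameter is set to False or Disabled.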
Input/Output Ports
The element has 1 input port:
Name in GUI: Input BAM file(s)
Name in Workflow File: in
Slots:
Slot in GUI | Slot in Workflow File | Type |
---|---|---|
Source URL | url | string |
And 1 output port:
Name in GUI: StringTie output data
Name in Workflow File: out
Slots:
Slot in GUI | Slot in Workflow File | Type |
---|---|---|
Output URL | url | string |