Gene-by-Gene Report
Task Name: gene-by-gene
Suppose you have genomes and you want to characterize them. One way to do this is to build a table showing which genes are present in each genome and which are not.
- Create a local BLAST database of your genome sequence/contigs. One database per genome.
- Create a file with sequences of the genes you want to explore. This file will be the input file for the scheme.
- Set up the location and name of the BLAST database you created for the first genome.
- Set up output files: the report location and output file with annotated (via BLAST) sequences. You might want to delete the “Write Sequence” element if you do not need output sequences.
- Run the scheme.
- Optionally, run the scheme again on the same input and output files, changing the BLAST database for each additional genome that you have.
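The repeated runs over several genomes could be scripted. The following is a minimal dry-run sketch, assuming the BLAST databases have already been created and all live in one directory. The file names, directory, and database names (genes.fa, report.txt, /path/to/blastdb, genome_A, genome_B) are hypothetical placeholders. It only prints the commands, using the in, out, blast-path and db-name parameters of this task, so they can be checked before anything is run.

```shell
#!/bin/sh
# Dry-run sketch: print one `ugene gene-by-gene` command per genome database.
# All values below are hypothetical placeholders; replace them with your own.
GENES="genes.fa"            # file with the gene sequences to look for
REPORT="report.txt"         # the same report file is reused for every run
DB_DIR="/path/to/blastdb"   # directory holding one BLAST database per genome

build_cmd() {
    # $1 is the name of the BLAST database built for one genome
    printf 'ugene gene-by-gene --in=%s --out=%s --blast-path=%s --db-name=%s\n' \
        "$GENES" "$REPORT" "$DB_DIR" "$1"
}

# One command per genome; only the database name changes between runs.
for db in genome_A genome_B; do
    build_cmd "$db"
done
```

Because exist-file is ‘Merge’ by default, reusing the same report file should accumulate the results of every run in one table. Once the printed commands look right, they can be executed by piping the script output to sh.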
As a result, you will get a report file with “Yes” and “No” fields. A “Yes” indicates that the gene is present in the genome. A “No” might indicate that the gene is not present in the genome, so it is a good idea to check every “No” sequence in the annotated output files. Open the annotated file and find the sequence named after the gene that has a “No” result.
Parameters:
- in: Input sequence file [URL datasets]
- final-name: Annotation name used to compare genes and reference genomes (using ‘blast_result’ by default) [String]
- exist-file: If a target report already exists, specify how to handle it. Merge two tables into one, overwrite, or rename the existing file (using ‘Merge’ by default) [String]
- ident: Identity between the gene sequence length and the annotation length, in percent. BLAST identity (if specified) is checked afterwards (using ‘90.0’ percent by default) [Number]
- out: Output report file [String]
- blast-out: Location of BLAST output file [String]
- search-type: Type of BLAST searches (using ‘blastn’ by default) [String]
- db-name: Name of BLAST DB [String]
- blast-path: Path to BLAST DB [String]
- expected-value: Statistical significance threshold for reporting matches against database sequences (using ‘10.0’ by default) [Number]
- gapped-aln: Perform gapped alignment (using ‘use’ by default) [Boolean]
- blast-name: Name for annotations (using ‘blast_result’ by default) [String]
- tmpdir: Directory for temporary files (using UGENE temporary directory by default) [String]
- toolpath: External tool path (using the path specified in UGENE by default) [String]
- out-type: Type of BLAST output file (using ‘XML (-m 7)’ by default) [String]
Example:
ugene gene-by-gene --in=human_T1.fa --out=human_T1_report