Find Group of Annotated Regions

The Find Group of Annotated Regions feature provides an algorithm to search for sequence regions that contain a predefined set of annotations.

Open a DNA sequence in the Sequence View. There are two ways to open the Find Group of Annotated Regions dialog:

  1. By clicking on the toolbar button:

  2. By selecting the Analyze ‣ Find annotated regions… context menu item:

Algorithm

This tool searches for regions that intersect existing annotations of a given sequence or, depending on the specified parameters, contain them completely. Consider the following example:

We have a sequence with two annotations. The annotations have different lengths and do not intersect each other. Annotation 2 is roughly four times as long as annotation 1 (41 vs. 11 bases).

Using this function, we can find a region that intersects both source annotations and captures a part of each in proportion to its length. For example, let us search for a region 25 bases long. The result is the following annotation:

As we can see, the result overlaps the first annotation by two bases and the second annotation by eight bases. This result was chosen because the second annotation is roughly four times as long as the first.
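The arithmetic of this example can be checked with a short sketch. The coordinates below are hypothetical: they are chosen only so that the annotation lengths (11 and 41 bases), the region length (25 bases) and the overlaps (2 and 8 bases) match the numbers in the text, which implies a gap of 15 bases between the two annotations. The helper `overlap` is an illustrative function, not part of the tool.

```python
def overlap(a, b):
    """Length of the intersection of two half-open intervals (start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


# Hypothetical coordinates, chosen only to match the lengths in the example.
annotation_1 = (0, 11)    # 11 bases
annotation_2 = (26, 67)   # 41 bases, starting 15 bases after annotation 1 ends
region = (9, 34)          # 25 bases: 2 + 15 (gap) + 8

print(region[1] - region[0])          # 25
print(overlap(region, annotation_1))  # 2
print(overlap(region, annotation_2))  # 8
```

The ratio of the two overlaps (2 to 8) is 1:4, close to the ratio of the annotation lengths (11 to 41).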

NOTE: A good candidate for this feature could be any file in GenBank format with a rich set of annotations. FASTA is not the best option because this format does not store annotations.

Parameters

The following parameters are available:

  • Left window - the annotations for which intersecting regions are searched.
  • Right window - the list of possible intersection regions.
  • Region size - the length of the new intersection region.
  • Result strand - select the DNA strand whose annotations will be considered in the search. If, for example, the “Complement” strand is selected, but all chosen annotations are on the direct strand, then nothing will be found.
  • Annotation must fit into region - every annotation chosen in the left window must lie entirely within the result annotation; a partial overlap of a few bases is not enough.
  • Save regions as annotations - store results as annotations.
  • Clear results - clear the result table.
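
How the "Region size" and "Annotation must fit into region" parameters interact can be illustrated with a minimal sliding-window sketch. This is not the tool's actual implementation: the function `find_regions`, its arguments, and the use of explicit half-open coordinates are assumptions made for illustration, and strand handling is omitted.

```python
def find_regions(annotations, seq_len, region_size, must_fit=False):
    """Return start positions of windows that cover every annotation.

    annotations: list of half-open (start, end) intervals (hypothetical input).
    must_fit=False: the window only has to intersect each annotation.
    must_fit=True:  each annotation has to lie entirely inside the window.
    """
    hits = []
    for start in range(0, seq_len - region_size + 1):
        end = start + region_size
        if must_fit:
            ok = all(start <= s and e <= end for s, e in annotations)
        else:
            ok = all(s < end and start < e for s, e in annotations)
        if ok:
            hits.append(start)
    return hits


# Hypothetical annotations of 11 and 41 bases on an 80-base sequence.
annotations = [(0, 11), (26, 67)]

print(find_regions(annotations, 80, 25))                 # [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(find_regions(annotations, 80, 25, must_fit=True))  # []
print(find_regions(annotations, 80, 70, must_fit=True))  # [0]
```

In this sketch a 25-base window can touch both annotations but is too short to contain them, so enabling the "must fit" condition yields no results until the region size is large enough to span both annotations.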