Tutorial: Performing Local Sequence Alignment with the Smith-Waterman Algorithm

Smith-Waterman algorithm

Here we discuss the topics most often raised by our users and show helpful ways of using UGENE, a free cross-platform genome analysis suite.

Running the Smith-Waterman Algorithm

We open a Haemophilus sequence from a sample file and intend to search for a part of this sequence. Let's take this region, which is a terminator. First, we copy the terminator sequence to the clipboard. To do this, we select the terminator annotation with a left-click, then bring up the context menu with a right-click and select "Copy, Copy annotation sequence".

Now we open the Smith-Waterman dialog box by right-clicking and selecting the "Analyse Smith Waterman" context menu items. We paste our sequence into the Pattern text field. There are several useful parameters worth dwelling on. First of all, we can choose the algorithm implementation: classic, SSE2 or CUDA (all three are available here because this PC has an SSE2-capable processor and a CUDA-compatible video card). The results are the same, but the search can run up to 30 times faster with the CUDA variant and up to 20 times faster with the SSE2 variant. We'll select CUDA.

Next, we can select or view the scoring matrix. Since we work with DNA, there is just one standard matrix. The number in each matrix cell is the score for the combination of the two corresponding bases. We can also specify the gap open and gap extension scores. The results filtering strategy is chosen with the "report results" option. If "filter intersections" is selected, the results are filtered by intersections, which simplifies working with results that contain many almost identical regions. But we want to find all the results, so we choose "None". Finally, we can specify the "similarity threshold", which is the minimal score two regions need in order to be considered similar.
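To make these parameters concrete, here is a minimal Python sketch of Smith-Waterman scoring with a scoring matrix, gap open and gap extension scores, and a percentage threshold. It is not UGENE's implementation. The matrix values (5 for a match, -4 for a mismatch), the gap scores (-10 and -1) and the function names are assumptions made for illustration, as is the way the percentage is computed (best score divided by the score of the pattern aligned to itself).

```python
def dna_matrix(match=5, mismatch=-4):
    """Build a simple DNA scoring matrix (illustrative values, not UGENE's)."""
    bases = "ACGT"
    return {(a, b): (match if a == b else mismatch) for a in bases for b in bases}


def smith_waterman(pattern, text, matrix, gap_open=-10, gap_extend=-1):
    """Return (best_score, end_in_text) of the best local alignment.

    Affine gaps: a gap of length k scores gap_open + (k - 1) * gap_extend.
    end_in_text is the 1-based position in `text` where the alignment ends.
    """
    neg_inf = float("-inf")
    m = len(text)
    prev_h = [0] * (m + 1)        # best score of an alignment ending at (i-1, j)
    prev_f = [neg_inf] * (m + 1)  # same, but ending with a gap in the text
    best, best_end = 0, 0
    for i in range(1, len(pattern) + 1):
        cur_h = [0] * (m + 1)
        cur_f = [neg_inf] * (m + 1)
        e = neg_inf               # alignment ending with a gap in the pattern
        for j in range(1, m + 1):
            e = max(cur_h[j - 1] + gap_open, e + gap_extend)
            cur_f[j] = max(prev_h[j] + gap_open, prev_f[j] + gap_extend)
            diag = prev_h[j - 1] + matrix[(pattern[i - 1], text[j - 1])]
            cur_h[j] = max(0, diag, e, cur_f[j])  # 0 restarts the local alignment
            if cur_h[j] > best:
                best, best_end = cur_h[j], j
        prev_h, prev_f = cur_h, cur_f
    return best, best_end


def percent_of_max(score, pattern, matrix):
    """Score as a percentage of the pattern aligned to itself (assumed definition)."""
    max_score = sum(matrix[(c, c)] for c in pattern)
    return 100 * score / max_score


matrix = dna_matrix()
score, end = smith_waterman("ACGT", "ACTT", matrix)
print(score, end, percent_of_max(score, "ACGT", matrix))
```

Under these assumptions, a region would be reported only if its percentage is at least the chosen similarity threshold.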

The next three panels specify what we search for and where. We can search in the sequence itself or in its translation, in the direct strand, the complement strand, or both, and we can restrict the search to a region of the sequence.
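The strand and region options can be pictured with a short Python sketch. It is only an illustration and not how UGENE is implemented: the function names are hypothetical, and an exact substring search stands in for the Smith-Waterman search to keep the example short. The point is that a complement-strand search can be done on the reverse complement of the chosen region, with hit coordinates mapped back to the direct strand.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_exact(pattern, text):
    """Stand-in search: 0-based starts of all exact occurrences of pattern."""
    hits = []
    start = text.find(pattern)
    while start != -1:
        hits.append(start)
        start = text.find(pattern, start + 1)
    return hits


def search_both_strands(pattern, sequence, region=None):
    """Search the direct and complement strands inside region (start, end).

    Returns (start, end, strand) tuples in direct-strand coordinates,
    0-based with an exclusive end.
    """
    lo, hi = region if region else (0, len(sequence))
    window = sequence[lo:hi]
    results = [(lo + s, lo + s + len(pattern), "direct")
               for s in find_exact(pattern, window)]
    for s in find_exact(pattern, reverse_complement(window)):
        # Offset s on the reverse complement ends at len(window) - s on the direct strand.
        end = len(window) - s
        results.append((lo + end - len(pattern), lo + end, "complement"))
    return results


print(search_both_strands("ACGG", "TTACGGTTCCGTAA"))
```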

The results are presented as annotations, so we need to save them to an existing or a new file. We also need to specify the group name and the annotation name for the result annotations.

View Output

When the search is done, the sequence view of the searched sequence shows new annotations. These are our results. As we can see, there are 10 results. Their regions appear in a new color in the panoramic view. Each annotation represents a region of the sequence similar to the sought-for pattern and has one qualifier. The qualifier holds the score with which the region passed.

So, we have found several regions of the Haemophilus sequence that are similar to the sought-for terminator, with a minimal score of 60%.
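For comparison, here is a rough Python sketch of what filtering the results by intersections could look like when each result is a region with a score. This is one plausible interpretation and not necessarily what UGENE does: the Hit type, the function names and the greedy rule (keep the highest-scoring region, drop anything overlapping it) are all assumptions.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """A result region; a hypothetical stand-in for an annotation with a score qualifier."""
    start: int    # 0-based, inclusive
    end: int      # exclusive
    score: float


def overlaps(a, b):
    """True if the two regions share at least one position."""
    return a.start < b.end and b.start < a.end


def filter_intersections(hits):
    """Greedily keep the best-scoring hits that do not overlap a kept hit."""
    kept = []
    for hit in sorted(hits, key=lambda h: h.score, reverse=True):
        if not any(overlaps(hit, k) for k in kept):
            kept.append(hit)
    return sorted(kept, key=lambda h: h.start)


hits = [Hit(0, 10, 90), Hit(2, 12, 85), Hit(20, 30, 70), Hit(28, 38, 75)]
print(filter_intersections(hits))
```

With the "None" option chosen in this tutorial, no such step is applied and all regions that pass the threshold are reported.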

Additional Materials

Documentation page

YouTube video