Aligning Short Reads with BWA-MEM

When you select the Tools ‣ Align to reference ‣ Align short reads item in the main menu, the Align Sequencing Reads dialog appears. Set the value of the Align short reads method parameter to BWA-MEM. The dialog looks as follows:

The following parameters are available:

Reference sequence — DNA sequence to align short reads to. This parameter is required.

Result file name — file in SAM format to write the result of the alignment into. This parameter is required.

Prebuilt index — check this box to use a prebuilt index file instead of a source reference sequence. You can also build the index manually.

SAM output — always save the output file in the SAM format (this option is disabled for BWA).

Short reads — files with the short reads to align. Each added item is a file of short DNA sequences. At least one file must be added.

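You can also configure other parameters. In the descriptions below, INT and FLOAT stand for the integer or floating-point value of the parameter being described: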

Index algorithm (-a) — algorithm for constructing BWA index.

BWA provides three algorithms for constructing the index:

  • is — the IS linear-time algorithm for constructing the suffix array. It is moderately fast but does not work with reference databases larger than 2 GB.
  • bwtsw — the algorithm implemented in BWT-SW. It works with large references such as the whole human genome.
  • div — does not work for long genomes.

Number of threads (-t) — number of threads to use for the alignment.

Min seed length (-k) — minimum seed length. Matches shorter than INT will be missed. The alignment speed is usually insensitive to this value unless it significantly deviates from 20.

Band width (-w) — band width. Essentially, gaps longer than INT will not be found. Note that the maximum gap length is also affected by the scoring matrix and the hit length, not solely determined by this option.

Dropoff (-d) — off-diagonal X-dropoff (Z-dropoff). Stop extension when the difference between the best and the current extension score is above |i-j|*A+INT, where i and j are the current positions of the query and reference, respectively, and A is the matching score. Z-dropoff is similar to BLAST’s X-dropoff except that it doesn’t penalize gaps in one of the sequences in the alignment. Z-dropoff not only avoids unnecessary extension but also reduces poor alignments inside a long good alignment.
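The stop condition described above can be written as a small predicate. This is only an illustrative sketch: the function name and the sample numbers are hypothetical, and it restates the inequality from the text rather than BWA's actual source code.

```python
def should_stop_extension(best_score, current_score, i, j, match_score, zdrop):
    """Return True when the Z-dropoff condition from the text is met.

    i, j        -- current positions in the query and the reference
    match_score -- the matching score A (option -A)
    zdrop       -- the dropoff value INT (option -d)
    """
    # The allowed score drop grows with the distance from the diagonal.
    threshold = abs(i - j) * match_score + zdrop
    return (best_score - current_score) > threshold


# On the diagonal (i == j), a drop of 110 exceeds a dropoff of 100.
print(should_stop_extension(200, 90, 50, 50, match_score=1, zdrop=100))
# 20 positions off the diagonal, the same drop is tolerated (threshold 120).
print(should_stop_extension(200, 90, 50, 70, match_score=1, zdrop=100))
```

The second call shows why a gap in one of the sequences is not penalized: moving off the diagonal raises the threshold by |i-j|*A.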

Internal seeds length (-r) — trigger reseeding for a MEM longer than minSeedLen*FLOAT. This is a key heuristic parameter for tuning performance. A larger value yields fewer seeds, which leads to faster alignment speed but lower accuracy.

Skip seeds threshold (-c) — discard a MEM if it occurs more than INT times in the genome. Alignment results are usually not very sensitive to this value.
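The three seeding options (-k, -r and -c) can each be read as a simple threshold check. The sketch below restates those checks. The function names and the sample values are hypothetical, and the code does not reproduce how BWA-MEM finds MEMs internally.

```python
def is_long_enough(match_length, min_seed_len):
    """-k: matches shorter than the minimum seed length are missed."""
    return match_length >= min_seed_len


def needs_reseeding(mem_length, min_seed_len, reseed_factor):
    """-r: reseeding is triggered for a MEM longer than minSeedLen*FLOAT."""
    return mem_length > min_seed_len * reseed_factor


def is_too_repetitive(occurrences, max_occurrences):
    """-c: a MEM is discarded if it occurs more than INT times."""
    return occurrences > max_occurrences


# Illustrative values only: min seed length 19, reseed factor 1.5,
# occurrence limit 500.
print(is_long_enough(18, 19))          # an 18 bp match is below the minimum
print(needs_reseeding(29, 19, 1.5))    # 29 > 28.5, so reseeding is triggered
print(is_too_repetitive(501, 500))     # 501 occurrences exceed the limit
```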

Drop chain threshold (-D) — drop chains shorter than a FLOAT fraction of the longest overlapping chain.

Rounds of mate rescues (-m) — perform at most INT rounds of mate rescues for each read.

Skip mate rescue (-S) — skip mate rescue.

Skip pairing (-P) — in paired-end mode, perform Smith-Waterman (SW) alignment only to rescue missing hits, but do not try to find hits that fit a proper pair.

Score for a match (-A) — matching score.

Mismatch penalty (-B) — mismatch penalty. The sequence error rate is approximately 0.75 * exp(-ln(4) * B/A), where B is the mismatch penalty and A is the matching score.
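To get a feel for how the mismatch penalty relates to the error rate, the formula can be evaluated directly. The function name below is hypothetical and the input values are examples, not recommendations.

```python
import math


def approx_error_rate(match_score, mismatch_penalty):
    """Evaluate 0.75 * exp(-ln(4) * B / A) from the text.

    match_score      -- the matching score A (option -A)
    mismatch_penalty -- the mismatch penalty B (option -B)
    """
    return 0.75 * math.exp(-math.log(4) * mismatch_penalty / match_score)


# With A = 1 and B = 4 the expression equals 0.75 / 4**4, about 0.29%.
print(f"{approx_error_rate(1, 4):.6f}")
# A smaller penalty relative to the match score implies a higher error rate.
print(f"{approx_error_rate(1, 2):.6f}")
```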

Gap open penalty (-O) — gap open penalty.

Gap extension penalty (-E) — gap extension penalty. A gap of length k costs O + k*E, where O is the gap open penalty and E is the gap extension penalty (that is, the gap open penalty is the cost of opening a zero-length gap).
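The gap cost formula is easy to check with a few numbers. The following sketch uses a hypothetical helper name and example penalties.

```python
def gap_cost(length, gap_open, gap_extend):
    """Cost of a gap of `length` bases: O + k*E, as stated in the text."""
    if length < 1:
        raise ValueError("a gap must contain at least one base")
    return gap_open + length * gap_extend


# Example penalties: O = 6, E = 1.
print(gap_cost(1, 6, 1))  # 7: even a one-base gap pays O plus one extension
print(gap_cost(3, 6, 1))  # 9
```

Note that a one-base gap costs O + E, not O alone, because the open penalty covers a zero-length gap.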

Penalty for clipping (-L) — clipping penalty. When performing SW extension, BWA-MEM keeps track of the best score reaching the end of the query. If this score is larger than the best SW score minus the clipping penalty, clipping will not be applied. Note that in this case, the SAM AS tag reports the best SW score; the clipping penalty is not deducted.
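The clipping rule above amounts to a single comparison. The sketch below is a minimal restatement of that rule with hypothetical names and example scores. It is not taken from the BWA-MEM source code.

```python
def should_clip(best_sw_score, best_to_end_score, clip_penalty):
    """Decide whether the alignment is clipped.

    best_sw_score     -- best local (SW) extension score
    best_to_end_score -- best score of an extension reaching the query end
    clip_penalty      -- the clipping penalty (option -L)
    """
    # Clipping is NOT applied when the to-end score is larger than the
    # best SW score minus the clipping penalty.
    return not (best_to_end_score > best_sw_score - clip_penalty)


print(should_clip(100, 97, 5))  # 97 > 95, so the read is aligned to its end
print(should_clip(100, 90, 5))  # 90 <= 95, so the alignment is clipped
```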

Penalty unpaired (-U) — penalty for an unpaired read pair. BWA-MEM scores an unpaired read pair as scoreRead1+scoreRead2-INT and scores a paired read as scoreRead1+scoreRead2-insertPenalty. It compares these two scores to determine whether pairing should be forced.
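The two scores that BWA-MEM compares can be computed as follows. This is a simplified sketch: the function names are hypothetical, the same read scores are assumed for both cases, and the handling of a tie is an assumption, because the text does not say how ties are resolved.

```python
def pair_scores(score_read1, score_read2, insert_penalty, unpaired_penalty):
    """Return (paired_score, unpaired_score) as described in the text."""
    total = score_read1 + score_read2
    paired = total - insert_penalty      # scoreRead1+scoreRead2-insertPenalty
    unpaired = total - unpaired_penalty  # scoreRead1+scoreRead2-INT (option -U)
    return paired, unpaired


def prefers_pairing(score_read1, score_read2, insert_penalty, unpaired_penalty):
    """True when the paired score is at least the unpaired score.

    Treating a tie as "paired" is an assumption made for this sketch.
    """
    paired, unpaired = pair_scores(
        score_read1, score_read2, insert_penalty, unpaired_penalty
    )
    return paired >= unpaired


print(pair_scores(60, 55, insert_penalty=10, unpaired_penalty=17))
print(prefers_pairing(60, 55, insert_penalty=25, unpaired_penalty=17))
```

A larger unpaired penalty lowers the unpaired score, so pairing is favored more often.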

Score threshold (-T) — do not output alignments with a score lower than this threshold. This option only affects output.

Select the required parameters and press the Start button.
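The letters shown in parentheses next to the parameter names are the corresponding options of the bwa mem command. As a rough illustration of how the dialog settings map to a command line, the sketch below assembles an argument list without running anything. The helper name, the file names, and the option values are hypothetical, and this section does not describe how the tool actually invokes BWA.

```python
def build_bwa_mem_command(reference, reads, options):
    """Build an argument list for `bwa mem`; nothing is executed.

    options maps a flag (for example "-t") to its value. Use True for
    switches without a value (such as "-S"); False or None omits the flag.
    """
    cmd = ["bwa", "mem"]
    for flag, value in options.items():
        if value is True:
            cmd.append(flag)                 # switch without a value
        elif value is not False and value is not None:
            cmd.extend([flag, str(value)])   # flag followed by its value
    cmd.append(reference)
    cmd.extend(reads)
    return cmd


# Hypothetical file names and example values.
command = build_bwa_mem_command(
    "reference.fa",
    ["reads.fq"],
    {"-t": 4, "-k": 19, "-S": True, "-P": False},
)
print(" ".join(command))
```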