Variant Calling and Effect Prediction

The workflow sample described below calls variants for an input assembly against a reference sequence using SAMtools mpileup and bcftools, and then predicts the effects of the variants using SnpEff.
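
For orientation, the SAMtools part of the workflow roughly corresponds to a classic command-line pipeline. The minimal Python sketch below only builds and prints those commands. It assumes the legacy SAMtools 0.1.x toolchain (where `bcftools view` still performs calling and `vcfutils.pl` is available), and the file names are hypothetical. The exact commands and intermediate files that UGENE runs internally may differ.

```python
import shlex


def samtools_stages(reference: str, assembly: str) -> list[list[str]]:
    """Ordered argv lists for the SAMtools part of the workflow (illustrative only)."""
    return [
        # Pile up reads against the faidx-indexed reference (-f), emitting uncompressed BCF (-u).
        ["samtools", "mpileup", "-u", "-f", reference, assembly],
        # Read BCF from stdin ("-"), call SNPs (-c), keep variant sites (-v),
        # call per-sample genotypes (-g); without -b the output is VCF.
        ["bcftools", "view", "-vcg", "-"],
        # Filter the VCF stream with varFilter's built-in thresholds.
        ["vcfutils.pl", "varFilter"],
    ]


def as_shell_pipeline(stages: list[list[str]]) -> str:
    """Quote every argument safely and chain the stages with pipes."""
    return " | ".join(shlex.join(stage) for stage in stages)


if __name__ == "__main__":
    print(as_shell_pipeline(samtools_stages("reference.fa", "assembly.bam")))
```

Running it prints `samtools mpileup -u -f reference.fa assembly.bam | bcftools view -vcg - | vcfutils.pl varFilter`. Because `shlex.join` quotes any argument that contains spaces or shell metacharacters, the printed line can be pasted into a shell as is.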

How to Use This Sample

If you haven’t used the workflow samples in UGENE before, refer to the “How to Use Sample Workflows” section of the documentation.

Workflow Sample Location

The workflow sample “Variant Calling and Effect Prediction” can be found in the “NGS” section of the Workflow Designer samples.

Workflow Image

The opened workflow appears as follows:

Workflow Wizard

The wizard consists of 7 pages:

  1. Input Reference Sequence and Assembly: Set the input reference sequence and assembly files on this page.

  2. SAMtools mpileup Parameters: You can change the SAMtools mpileup parameters here.

    The following parameters are available:

    Parameter | Description
    Count anomalous read pairs | Do not skip anomalous read pairs in variant calling (-A).
    Disable BAQ computation | Disable probabilistic realignment for the computation of base alignment quality (BAQ). BAQ is the Phred-scaled probability of a read base being misaligned; computing it helps to reduce false SNPs caused by misalignments (-B).
    Mapping quality downgrading coefficient | Coefficient for downgrading the mapping quality of reads with excessive mismatches. A value of 0 disables downgrading; the recommended value for BWA is 50 (-C).
    Max number of reads per input BAM | Maximum number of reads per input BAM file at a position (-d).
    Extended BAQ computation | Extended BAQ computation. Improves sensitivity, especially for MNPs, but may slightly reduce specificity (-E).
    BED or position list file | List of regions or sites where pileup or BCF output should be generated (-l).
    Pileup region | Generate pileup only in the specified region, e.g., chr1:100-200 (-r).
    Minimum mapping quality | Minimum mapping quality required for an alignment to be used (-q).
    Minimum base quality | Minimum base quality required for a base to be considered (-Q).
    Illumina-1.3+ encoding | Assume that base qualities are in Illumina 1.3+ encoding (-6).
    Gap extension error | Phred-scaled gap extension sequencing error probability. Reducing this value leads to longer indels (-e).
    Homopolymer errors coefficient | Coefficient for modeling homopolymer errors. Given an l-long homopolymer run, the sequencing error of an indel of size s is modeled as h*s/l, where h is this coefficient (-h).
    No INDELs | Do not perform INDEL calling (-I).
    Max INDEL depth | Skip INDEL calling if the average per-sample depth is above this value (-L).
    Gap open error | Phred-scaled gap open sequencing error probability. Reducing this value leads to more indel calls (-o).
    List of platforms for indels | Comma-separated list of platforms (as given in the @RG PL field) from which indel candidates are obtained. Collecting candidates from technologies with a low indel error rate, such as ILLUMINA, is recommended (-P).

  3. SAMtools bcftools View Parameters: You can modify the SAMtools bcftools view parameters here.

    The following parameters are available:

    Parameter | Description
    Retain all possible alternate | Retain all possible alternate alleles at variant sites. By default, unlikely alleles are discarded (-A).
    Indicate PL | Indicate that the PL field was generated by SAMtools r921 or earlier, which uses a different genotype ordering (-F).
    No genotype information | Suppress all individual genotype information (-G).
    A/C/G/T only | Skip sites where the REF field is not A, C, G, or T (-N).
    List of sites | List of sites for which information is output (-l).
    QCALL likelihood | Output the QCALL likelihood format (-Q).
    List of samples | List of samples to use. The first column gives the sample names, and the second gives the ploidy, which can only be 1 or 2 (-s).
    Min samples fraction | Skip loci where the fraction of samples covered by reads is below this value (-d).
    Per-sample genotypes | Call per-sample genotypes at variant sites (-g).
    INDEL-to-SNP Ratio | Ratio of the INDEL-to-SNP mutation rate (-i).
    Max P(ref|D) | A site is considered a variant if P(ref|D), the probability of the reference allele given the data, is below this value (-p).
    Prior allele frequency spectrum | Type of prior allele frequency spectrum: full, cond2, flat, or a file containing the error output of a previous variant calling run (-P).
    Mutation rate | Scaled mutation rate for variant calling (-t).
    Pair/trio calling | Enable pair/trio calling. For trio calling, use List of samples (-s) to configure the trio members and their order: the child first, then the father, then the mother. Valid values: “pair”, “trioauto”, “trioxd”, “trioxs” (-T).
    N group-1 samples | Number of group-1 samples, used to divide the samples into two groups for contrast SNP calling or an association test (-1).
    N permutations | Number of permutations for the association test; effective only when N group-1 samples (-1) is set (-U).
    Max P(chi^2) | Only perform permutations for sites where P(chi^2) is below this value (-X).

  4. SAMtools vcfutils varFilter Parameters: Configure the SAMtools vcfutils varFilter parameters on this page.

    The following parameters are available:

    Parameter | Description
    Log filtered | Print filtered variants into the log (-p).
    Minimum RMS quality | Minimum RMS mapping quality for SNPs (-Q).
    Minimum read depth | Minimum read depth (-d).
    Maximum read depth | Maximum read depth (-D).
    Alternate bases | Minimum number of alternate bases (-a).
    Gap size | Filter SNPs located within this many bp around a gap (-w).
    Window size | Window size for filtering adjacent gaps (-W).
    Strand bias | Minimum P-value for strand bias, given PV4 (-1).
    BaseQ bias | Minimum P-value for base quality bias (-2).
    MapQ bias | Minimum P-value for mapping quality bias (-3).
    End distance bias | Minimum P-value for end distance bias (-4).
    HWE | Minimum P-value for Hardy-Weinberg equilibrium (HWE), applied together with F<0 (-e).

  5. Change Chromosome Notation for Variations: Set how chromosome name prefixes in the variations are replaced on this page.

    The following parameters are available:

    Parameter | Description
    Replace prefixes | List of chromosome name prefixes to replace, e.g., “NC_000”. Separate multiple prefixes with semicolons.
    Replace by | Prefix to use instead, e.g., “chr”.

  6. SnpEff Parameters: Configure SnpEff parameters on this page.

    The following parameters are available:

    Parameter | Description
    Genome | Target genome. The genome data is downloaded if it is not found locally.
    Canonical transcripts | Use only canonical transcripts.
    HGVS nomenclature | Annotate using HGVS nomenclature.
    Annotate Loss of function variations | Annotate loss-of-function (LOF) variants and nonsense-mediated decay (NMD).
    Annotate TFBSs motifs | Annotate transcription factor binding site (TFBS) motifs (only available for the latest GRCh37).
    Upstream/downstream length | Size of the upstream and downstream intervals. Set it to 0 to eliminate upstream and downstream effects.

  7. Output Files Page: On this page, you can select output files.
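
Each option on the SAMtools mpileup Parameters page corresponds to one command-line flag from its table. The minimal Python sketch below shows one way to turn wizard-style settings into mpileup arguments. The dictionary keys reuse the wizard labels, while `mpileup_args` and its validation are hypothetical helpers, not UGENE's own code.

```python
# Wizard options that are simple switches: the flag is emitted only when enabled.
SWITCH_FLAGS = {
    "Count anomalous read pairs": "-A",
    "Disable BAQ computation": "-B",
    "Extended BAQ computation": "-E",
    "Illumina-1.3+ encoding": "-6",
    "No INDELs": "-I",
}

# Wizard options that take a value: the flag is followed by that value.
VALUE_FLAGS = {
    "Mapping quality downgrading coefficient": "-C",
    "Max number of reads per input BAM": "-d",
    "BED or position list file": "-l",
    "Pileup region": "-r",
    "Minimum mapping quality": "-q",
    "Minimum base quality": "-Q",
    "Gap extension error": "-e",
    "Homopolymer errors coefficient": "-h",
    "Max INDEL depth": "-L",
    "Gap open error": "-o",
    "List of platforms for indels": "-P",
}


def mpileup_args(settings: dict[str, object]) -> list[str]:
    """Translate {wizard label: value} pairs into samtools mpileup arguments."""
    args: list[str] = []
    for label, value in settings.items():
        if label in SWITCH_FLAGS:
            if value:
                args.append(SWITCH_FLAGS[label])
        elif label in VALUE_FLAGS:
            args.extend([VALUE_FLAGS[label], str(value)])
        else:
            raise ValueError(f"unknown mpileup parameter: {label!r}")
    return args


if __name__ == "__main__":
    print(mpileup_args({
        "Disable BAQ computation": True,
        "Mapping quality downgrading coefficient": 50,  # value recommended for BWA
        "Minimum mapping quality": 20,
        "No INDELs": False,  # disabled switch: no flag emitted
    }))
```

This prints `['-B', '-C', '50', '-q', '20']`. The resulting list would go after `samtools mpileup` and before the reference (`-f`) and BAM arguments.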
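
The List of samples option on the SAMtools bcftools View Parameters page expects a two-column file: a sample name, then a ploidy of 1 or 2. The minimal Python sketch below parses and validates such a file before it is given to the workflow. It assumes whitespace-separated columns and, following the SAMtools manual, treats a missing ploidy column as diploid. `parse_sample_list` is a hypothetical helper.

```python
def parse_sample_list(text: str) -> dict[str, int]:
    """Parse 'name [ploidy]' lines into {sample name: ploidy}, keeping file order."""
    samples: dict[str, int] = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # ignore blank lines
        name = fields[0]
        # Assumption (SAMtools manual): a missing second column means ploidy 2.
        ploidy = int(fields[1]) if len(fields) > 1 else 2
        if ploidy not in (1, 2):
            raise ValueError(f"line {line_no}: ploidy must be 1 or 2, got {ploidy}")
        if name in samples:
            raise ValueError(f"line {line_no}: duplicate sample {name!r}")
        samples[name] = ploidy
    return samples


if __name__ == "__main__":
    trio = "child 2\nfather 1\nmother\n"
    print(parse_sample_list(trio))
```

This prints `{'child': 2, 'father': 1, 'mother': 2}`. Because the dictionary keeps insertion order, the child, father, mother order required for trio calling can also be checked on the parsed result.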
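
To illustrate what the depth, RMS mapping quality, and Gap size options on the SAMtools vcfutils varFilter Parameters page do, here is a simplified Python sketch. It is not a port of vcfutils.pl, whose exact rules differ in detail. It assumes that read depth is in the INFO `DP` field, RMS mapping quality is in `MQ`, and indels carry the `INDEL` flag, as in bcftools 0.1.x output. The threshold values are illustrative, and `FilterSettings` and `simple_var_filter` are hypothetical names.

```python
from dataclasses import dataclass

# A VCF record reduced to the columns this sketch needs: (CHROM, POS, REF, ALT, INFO).
Record = tuple[str, int, str, str, str]


@dataclass
class FilterSettings:
    # Illustrative thresholds named after the wizard options; check your
    # vcfutils.pl version for its actual defaults.
    min_rms_mapq: int = 10       # Minimum RMS quality (-Q), applied to SNPs only
    min_depth: int = 2           # Minimum read depth (-d)
    max_depth: int = 10_000_000  # Maximum read depth (-D)
    gap_size: int = 3            # Gap size (-w): SNPs this close to an indel are dropped


def parse_info(info: str) -> dict[str, str]:
    """Split an INFO column such as 'INDEL;DP=12;MQ=45'; flags map to ''."""
    fields: dict[str, str] = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def simple_var_filter(records: list[Record], s: FilterSettings) -> list[Record]:
    """Keep records that pass simplified depth, mapping quality and gap proximity checks."""
    indels = [(chrom, pos) for chrom, pos, _, _, info in records
              if "INDEL" in parse_info(info)]
    kept: list[Record] = []
    for record in records:
        chrom, pos, _, _, info = record
        fields = parse_info(info)
        depth = int(fields.get("DP", "0"))
        if not s.min_depth <= depth <= s.max_depth:
            continue
        if "INDEL" not in fields:  # the remaining checks apply to SNPs only
            if int(fields.get("MQ", "0")) < s.min_rms_mapq:
                continue
            if any(c == chrom and abs(p - pos) <= s.gap_size for c, p in indels):
                continue
        kept.append(record)
    return kept


if __name__ == "__main__":
    calls = [
        ("chr1", 100, "A", "G", "DP=10;MQ=40"),        # SNP 2 bp from the indel below
        ("chr1", 102, "AT", "A", "INDEL;DP=12;MQ=45"),  # indel
        ("chr1", 200, "C", "T", "DP=1;MQ=50"),         # depth too low
        ("chr1", 300, "G", "A", "DP=20;MQ=5"),         # mapping quality too low
        ("chr1", 400, "T", "C", "DP=30;MQ=60"),        # passes every check
    ]
    print([pos for _, pos, _, _, _ in simple_var_filter(calls, FilterSettings())])
```

This prints `[102, 400]`. Lowering `gap_size` to 1 would also keep the SNP at position 100, because it lies 2 bp from the indel.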
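
The Change Chromosome Notation for Variations page takes a semicolon-separated list of prefixes and a replacement prefix. The minimal Python sketch below assumes the replacement is a literal prefix substitution on the CHROM column of each VCF data line, with the first matching prefix winning. UGENE's actual implementation may behave differently, and both function names are hypothetical.

```python
def rename_chromosome(name: str, prefixes: str, replacement: str) -> str:
    """Replace the first prefix from a semicolon-separated list that `name` starts with."""
    for prefix in (p.strip() for p in prefixes.split(";")):
        if prefix and name.startswith(prefix):
            return replacement + name[len(prefix):]
    return name  # no prefix matched: keep the original name


def rename_vcf_line(line: str, prefixes: str, replacement: str) -> str:
    """Rename the CHROM column of a VCF data line; header lines are returned unchanged."""
    if line.startswith("#"):
        return line
    chrom, sep, rest = line.partition("\t")
    return rename_chromosome(chrom, prefixes, replacement) + sep + rest


if __name__ == "__main__":
    print(rename_vcf_line("NC_000001\t100\t.\tA\tG", "NC_000;NT_", "chr"))
```

Under this literal-replacement assumption, the example values from the table turn `NC_000001` into `chr001`. The prefixes therefore need to be chosen so that the resulting names match those used by the selected SnpEff genome. The sketch also leaves `##contig` header lines untouched.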
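
The options on the SnpEff Parameters page correspond to standard SnpEff command-line switches (`-canon`, `-hgvs`, `-lof`, `-motif`, `-ud`). The minimal Python sketch below builds such a command. The jar path is a placeholder, `GRCh37.75` is only an example database name, and `snpeff_command` is a hypothetical helper rather than the way UGENE invokes SnpEff.

```python
from typing import Optional


def snpeff_command(
    genome: str,
    vcf_path: str,
    *,
    canonical: bool = False,              # Canonical transcripts
    hgvs: bool = False,                   # HGVS nomenclature
    lof: bool = False,                    # Annotate Loss of function variations
    motif: bool = False,                  # Annotate TFBSs motifs
    updown_length: Optional[int] = None,  # Upstream/downstream length
    jar: str = "snpEff.jar",              # placeholder path to the SnpEff jar
) -> list[str]:
    """Build an argv list for annotating a VCF file with SnpEff."""
    cmd = ["java", "-jar", jar]
    if canonical:
        cmd.append("-canon")
    if hgvs:
        cmd.append("-hgvs")
    if lof:
        cmd.append("-lof")
    if motif:
        cmd.append("-motif")
    if updown_length is not None:
        if updown_length < 0:
            raise ValueError("upstream/downstream length must be >= 0")
        cmd.extend(["-ud", str(updown_length)])  # 0 eliminates upstream/downstream effects
    cmd.extend([genome, vcf_path])
    return cmd


if __name__ == "__main__":
    print(snpeff_command("GRCh37.75", "filtered.vcf", canonical=True, updown_length=0))
```

This prints `['java', '-jar', 'snpEff.jar', '-canon', '-ud', '0', 'GRCh37.75', 'filtered.vcf']`. The options precede the genome name, which in turn precedes the input VCF, matching SnpEff's usual argument order.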