Raw ChIP-Seq Data Processing

The component for ChIP-seq data analysis is not installed by default. To use this sample, add the component via the UGENE Online Installer or, if you used an offline installer, manually configure the package. See the “Configure ChIP-seq Analysis Data” chapter of the manual.

Use this workflow sample to process raw ChIP-seq next-generation sequencing (NGS) data from the Illumina platform. The processing includes:

  • Filtration:
    • Filtering of the NGS short reads by the CASAVA 1.8 header;
    • Trimming of the short reads by quality;
  • Mapping:
    • Mapping of the short reads to the specified reference sequence (the BWA-MEM tool is used in the sample);
  • Post-filtration:
    • Filtering of the aligned short reads with SAMtools to remove reads with low mapping quality and unpaired or unaligned reads;
    • Removal of duplicate short reads.
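
As an illustration of the first filtration step, the sketch below drops reads whose CASAVA 1.8 header marks them as filtered (the second colon-separated field after the space is Y). It is a minimal sketch and not UGENE's own implementation: the function names are hypothetical, and keeping reads whose header has no CASAVA 1.8 comment field is an assumption.

```python
from typing import Iterable, Iterator, Tuple

# One FASTQ record: header, sequence, separator line, quality string.
FastqRecord = Tuple[str, str, str, str]


def is_filtered_by_casava(header: str) -> bool:
    """Return True if a CASAVA 1.8 header flags the read as filtered.

    A CASAVA 1.8 header looks like
    '@instrument:run:flowcell:lane:tile:x:y read:is_filtered:control:index',
    where is_filtered is 'Y' for reads that failed the filter and 'N' otherwise.
    """
    parts = header.split(maxsplit=1)
    if len(parts) < 2:
        # Assumption: a header without the comment field is kept as-is.
        return False
    fields = parts[1].split(":")
    return len(fields) >= 2 and fields[1] == "Y"


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Group FASTQ lines into 4-line records (assumes no blank lines)."""
    stripped = (line.rstrip("\n") for line in lines)
    # Zipping the same iterator four times yields consecutive groups of four.
    yield from zip(stripped, stripped, stripped, stripped)


def keep_passing_reads(records: Iterable[FastqRecord]) -> Iterator[FastqRecord]:
    """Yield only the records whose header is not flagged as filtered."""
    return (rec for rec in records if not is_filtered_by_casava(rec[0]))
```

For a file on disk, the same functions could be applied to an open file handle, for example keep_passing_reads(read_fastq(handle)).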

The result of the data processing is provided in the BED format. Intermediate data files from the filtration and mapping steps are also available in the output.
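
For readers who think in terms of command-line tools, the stages after read filtration can be pictured roughly as the chain of commands built below. This is only a sketch under stated assumptions: the sample documents BWA-MEM and SAMtools, but the exact commands UGENE runs are not given here, the use of bedtools for BED conversion is an assumption, and the function name, file naming scheme, and default values are hypothetical. The code only builds the argument lists and does not execute anything.

```python
from typing import List, NamedTuple, Optional


class Step(NamedTuple):
    argv: List[str]          # command and arguments
    stdout: Optional[str]    # file that would receive standard output, if any


def build_pipeline(reference: str, reads: List[str], prefix: str,
                   threads: int = 4, min_mapq: int = 30,
                   skip_flag: int = 0x4, single_end: bool = False) -> List[Step]:
    """Build an illustrative mapping and post-filtration command chain.

    All defaults are placeholders, not UGENE's defaults.
    """
    sam = f"{prefix}.sam"
    filtered = f"{prefix}.filtered.bam"
    sorted_bam = f"{prefix}.sorted.bam"
    dedup = f"{prefix}.dedup.bam"
    bed = f"{prefix}.bed"

    # 'samtools rmdup -s' treats the input as single-end reads.
    rmdup = ["samtools", "rmdup"]
    if single_end:
        rmdup.append("-s")
    rmdup += [sorted_bam, dedup]

    return [
        # Mapping: bwa mem writes SAM to standard output.
        Step(["bwa", "mem", "-t", str(threads), reference, *reads], sam),
        # Post-filtration: drop low-MAPQ reads and reads matching the skip flag.
        Step(["samtools", "view", "-b", "-q", str(min_mapq),
              "-F", str(skip_flag), "-o", filtered, sam], None),
        # Duplicate removal expects coordinate-sorted input.
        Step(["samtools", "sort", "-o", sorted_bam, filtered], None),
        Step(rmdup, None),
        # Assumption: BED conversion via bedtools, which writes to standard output.
        Step(["bedtools", "bamtobed", "-i", dedup], bed),
    ]
```

Each intermediate file name in the sketch corresponds loosely to the intermediate data files mentioned above.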

How to Use This Sample

If you haven’t used the workflow samples in UGENE before, look at the “How to Use Sample Workflows” section of the documentation.

Workflow Sample Location

The workflow sample “Raw ChIP-Seq Processing” can be found in the “NGS” section of the Workflow Designer samples.

Workflow Image

There are two versions of the workflow available. The workflow for single-end reads looks as follows:

The workflow for paired-end short reads appears as follows:

Workflow Wizard

The workflows have similar wizards. The wizard for paired-end reads has 5 pages.

  1. Input Data: On this page, you must specify the input FASTQ file(s).

  2. Pre-processing: On this page, you can modify filtration parameters.

    The following parameters are available for filtering reads and read pairs:

    Parameter | Description
    Base quality | Quality threshold for trimming.
    Reads length | Reads shorter than this length are discarded by the filter.
    Trim both ends | Whether to trim both ends of a read. Usually, set True for Sanger sequencing and False for NGS.
    3’ adapters | A FASTA file with one or multiple sequences of adapters that were ligated to the 3’ end. The adapter itself and anything that follows it are trimmed. If the adapter sequence ends with the ‘$’ character, the adapter is anchored to the end of the read and is only found if it is a suffix of the read.
    5’ adapters | A FASTA file with one or multiple sequences of adapters that were ligated to the 5’ end. If the adapter sequence starts with the ‘^’ character, the adapter is anchored. An anchored adapter must appear in its entirety at the 5’ end of the read (it is a prefix of the read). A non-anchored adapter may appear partially at the 5’ end or may occur within the read. If it is found within a read, the sequence preceding the adapter is also trimmed. In all cases, the adapter itself is trimmed.
    5’ and 3’ adapters | A FASTA file with one or multiple sequences of adapters that were ligated to the 5’ end or the 3’ end.
  3. Mapping: On this page, you must specify a reference genome and can optionally modify advanced parameters.

    The following parameters are available:

    Parameter | Description
    Reference genome | Path to the indexed reference genome.
    Number of threads | Number of threads (-t).
    Min seed length | Minimum seed length (-k).
    Band width | Band width for banded alignment (-w).
    Dropoff | Off-diagonal X-dropoff (-d).
    Internal seed length | Look for internal seeds inside a seed longer than {-k} * FLOAT (-r).
    Skip seed threshold | Skip seeds with more than INT occurrences (-c).
    Drop chain threshold | Drop chains shorter than FLOAT fraction of the longest overlapping chain (-D).
    Rounds of mate rescues | Perform at most INT rounds of mate rescues for each read (-m).
    Skip mate rescue | Skip mate rescue (-S).
    Skip pairing | Skip pairing; mate rescue is performed unless -S is also in use (-P).
    Mismatch penalty | Penalty for a mismatch (-B).
    Gap open penalty | Gap open penalty (-O).
    Gap extension penalty | Gap extension penalty; a gap of size k costs {-O} + {-E}*k (-E).
    Penalty for clipping | Penalty for clipping (-L).
    Penalty unpaired | Penalty for an unpaired read pair (-U).
    Score threshold | Minimum score to output (-T).
  4. Post-processing: On this page, you can modify post-processing parameters.

    The following parameters are available:

    Parameter | Description
    MAPQ threshold | Minimum MAPQ quality score.
    Skip flag | Skip alignments with the selected items. Select the items in the combo box to configure the bit flag. Leave all items unselected to disable filtering by this parameter.
    Region | Regions to filter. For BAM output only. Use ‘chr2’ to output the whole chr2, ‘[chr2:1000]’ to output the region of chr2 starting from position 1000, and ‘[chr2:1000-2000]’ to output the region of chr2 between 1000 and 2000, including the endpoint. To specify multiple regions, separate them with spaces (e.g. chr1 chr2 [chr3:1000-2000]).
    For single-end reads | Remove duplicates for single-end reads.
  5. Output Data: On this page, you must specify the output parameters.
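
The MAPQ threshold and Skip flag parameters of the Post-processing page can be pictured as a simple per-alignment test. The sketch below assumes the bit flag follows the standard SAM FLAG field and that an alignment is skipped when any selected bit is set; the bit values come from the SAM specification, while the member names and the keep_alignment function are hypothetical and do not necessarily match the item names shown in the wizard.

```python
from enum import IntFlag


class SamFlag(IntFlag):
    """Bits of the SAM FLAG field (values from the SAM specification)."""
    PAIRED = 0x1
    PROPER_PAIR = 0x2
    UNMAPPED = 0x4
    MATE_UNMAPPED = 0x8
    REVERSE = 0x10
    MATE_REVERSE = 0x20
    FIRST_IN_PAIR = 0x40
    SECOND_IN_PAIR = 0x80
    SECONDARY = 0x100
    QC_FAIL = 0x200
    DUPLICATE = 0x400
    SUPPLEMENTARY = 0x800


def keep_alignment(flag: int, mapq: int, min_mapq: int, skip_mask: int = 0) -> bool:
    """Return True if an alignment passes the MAPQ and skip-flag filters.

    An empty skip_mask (0) disables filtering by flag, mirroring the case
    where no items are selected.
    """
    if mapq < min_mapq:
        return False
    # Assumption: the alignment is skipped if ANY selected bit is set.
    return (flag & skip_mask) == 0


# Selecting several items combines their bits into one mask.
skip = SamFlag.UNMAPPED | SamFlag.DUPLICATE
```

With this mask, a properly paired, mapped alignment with a high MAPQ is kept, while an unmapped read or a marked duplicate is dropped regardless of its MAPQ.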