ChIP-seq Analysis with Cistrome Tools

Attention! Cistrome Tools was removed in UGENE version 42.0.

The component for ChIP-seq data analysis is not installed by default. To use this sample, add the component via the UGENE Online Installer. If you used an offline installer, configure the package manually as described in the “Configure ChIP-Seq Analysis Data” chapter of the manual (available in manuals up to UGENE version 39).

The Cistrome ChIP-seq pipeline integrated into UGENE supports the following analysis steps:

  • Peak calling and annotating
  • Motif search
  • Gene ontology analysis

ChIP-seq analysis starts with the MACS tool. CEAS then takes the peak regions and the signal wiggle file to:

  • Check chromosome enrichment
  • Identify binding significance at features (promoters, gene bodies, etc.)
  • Calculate signal aggregation at TSS/TTS or metagene bodies

Then peaks are investigated for:

  1. Conservation scores at binding sites
  2. DNA motifs at binding sites

This workflow is based on the General ChIP-seq pipeline from Cistrome on Galaxy.

How to Use This Sample

If you haven’t used workflow samples in UGENE before, see “How to Use Sample Workflows”.

Workflow Sample Location

Available in the “NGS” section of the Workflow Designer samples.

Workflow Images

Treatment tags only analysis:

Treatment and control tags analysis:

Workflow Wizard

The wizard has 7 pages for both workflow types.

Page 1: Input data

Input files for treatment and control annotations (MACS input).

Page 2: MACS

MACS parameters:

| Parameter | Description |
|---|---|
| Genome size | e.g., Human: 2,700 Mbp, Mouse: 1,870 Mbp, etc. |
| P-value | Default 0.00001. Looser: 0.001 |
| Tag size | Optional. Input 0 to auto-detect. |
| Keep duplicates | Options: auto, all, or number. |
| Use model | Use MACS paired peaks model. |
| Model fold | Fold range to build the paired peaks model. |
| Wiggle output | Store fragment pileup in wiggle format. |
| Wiggle space | Wiggle resolution (default: 10 bp). |
| Shift size | Used if no model; half of fragment size. |
| Band width | Scan width for model building. |
| Use lambda | Use local lambda model to reduce false positives. |
| Small nearby region | Region (bp) near peaks for local lambda. |
| Auto bimodal | Enable fallback to shift-size model if auto model fails. |
| Scale to large | Scale smaller sample up to match the larger one. |
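
The Shift size parameter above is described as half of the fragment size, and the genome sizes are quoted in Mbp. A minimal sketch of that arithmetic follows; the function names are hypothetical helpers for illustration and are not part of UGENE or MACS.

```python
def shift_size_from_fragment(fragment_size_bp: int) -> int:
    """Return half of the fragment size, as the Shift size row describes."""
    if fragment_size_bp <= 0:
        raise ValueError("fragment size must be a positive number of bp")
    return fragment_size_bp // 2


def genome_size_in_bp(size_mbp: float) -> int:
    """Convert a genome size given in Mbp (as in the table) to bp."""
    return int(round(size_mbp * 1_000_000))


# Example: a 200 bp fragment estimate and the human genome size from the table.
print(shift_size_from_fragment(200))  # 100
print(genome_size_in_bp(2700))        # 2700000000
```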

Page 3: CEAS

CEAS parameters:

| Parameter | Description |
|---|---|
| Gene annotations table | e.g., refGene table in SQLite format |
| Span size | TSS/TTS window for promoter/downstream (in bp) |
| Wiggle profiling resolution | Must be ≥ wiggle interval |
| Promoter intervals | 3 values or 1 (split into 3) |
| BiPromoter ranges | 2 values or 1 (split into 2) |
| Relative distance | To TSS/TTS in WIGGLE profiling |
| Gene group files | CSV with gene names in first column |
| Gene group names | Optional; comma-separated (e.g., Group 1,Group 2) |
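
To illustrate the “3 values or 1” and “2 values or 1” rules for Promoter intervals and BiPromoter ranges, the sketch below expands a single value into equal cumulative fractions (for example, 3000 becomes 1000, 2000, 3000). It assumes the splitting behaviour described in the CEAS command-line help; `expand_intervals` is a hypothetical helper, not a UGENE or CEAS function.

```python
from typing import List


def expand_intervals(text: str, parts: int) -> List[int]:
    """Parse a comma-separated interval setting.

    If exactly `parts` values are given, return them unchanged.
    If a single value is given, split it into `parts` equal cumulative steps
    (assumed behaviour, see the lead-in).
    """
    values = [int(v) for v in text.split(",") if v.strip()]
    if len(values) == parts:
        return values
    if len(values) == 1:
        total = values[0]
        return [total * i // parts for i in range(1, parts + 1)]
    raise ValueError(f"expected 1 or {parts} values, got {len(values)}")


print(expand_intervals("3000", 3))            # [1000, 2000, 3000]
print(expand_intervals("5000", 2))            # [2500, 5000]
print(expand_intervals("500,1500,4000", 3))   # [500, 1500, 4000]
```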

Page 4: Peak2Gene and GO

| Parameter | Description |
|---|---|
| Output type | Output directory |
| Official gene symbols | Output gene symbols instead of RefSeq |
| Distance | Radius (in bp) from peak center |
| Genome file | SQLite genome reference |
| Title | Used for output filenames |
| Gene universe | Define universe for GO analysis |
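
The Distance parameter is a radius measured from the peak center. The sketch below shows one way such a radius test can work. It is an illustration only: matching genes by their TSS position is an assumption, and the `Peak`, `Gene`, and `genes_near_peak` names are hypothetical, not taken from Peak2Gene.

```python
from typing import List, NamedTuple


class Peak(NamedTuple):
    chrom: str
    start: int
    end: int


class Gene(NamedTuple):
    name: str
    chrom: str
    tss: int  # transcription start site position (assumed matching point)


def genes_near_peak(peak: Peak, genes: List[Gene], distance_bp: int) -> List[str]:
    """Return names of genes whose TSS lies within distance_bp of the peak center."""
    center = (peak.start + peak.end) // 2
    return [
        g.name
        for g in genes
        if g.chrom == peak.chrom and abs(g.tss - center) <= distance_bp
    ]


peak = Peak("chr1", 1000, 1200)  # center at 1100
genes = [Gene("A", "chr1", 1500), Gene("B", "chr1", 20000), Gene("C", "chr2", 1100)]
print(genes_near_peak(peak, genes, 1000))  # ['A']
```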

Page 5: Conservation Plot

| Parameter | Description |
|---|---|
| Title | Title of figure |
| Label | Label for data in figure |
| Assembly version | Directory for phastCons scores |
| Window width | Width around binding site (bp) |
| Height | Plot height |
| Width | Plot width |

Page 6: SeqPos Motif Tool

| Parameter | Description |
|---|---|
| Genome version | UCSC database version |
| De novo motifs | Enable/disable de novo motif search |
| Motif database | Known motif collection |
| Region width | Region width for scanning motifs |
| P-value cutoff | Significance threshold for motifs |

Page 7: Output Data

MACS output:

| Parameter | Description |
|---|---|
| Output directory | Folder for MACS output |
| Name | Prefix for output files (e.g., NAME_peaks.xls, NAME_peaks.bed, etc.) |
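
The NAME_peaks.bed file mentioned above can be read like any BED file. The following is a minimal sketch of such a reader, assuming tab-separated lines with at least the three standard BED columns (chromosome, start, end) and optional name and score columns; `read_bed_peaks` is a hypothetical helper, and the sample data is made up for illustration.

```python
import io
from typing import Iterator, NamedTuple, Optional, TextIO


class BedPeak(NamedTuple):
    chrom: str
    start: int
    end: int
    name: Optional[str]
    score: Optional[float]


def read_bed_peaks(handle: TextIO) -> Iterator[BedPeak]:
    """Yield peaks from a BED stream, skipping blank, comment, and header lines."""
    for line in handle:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"not a BED line: {line!r}")
        name = fields[3] if len(fields) > 3 else None
        score = float(fields[4]) if len(fields) > 4 else None
        yield BedPeak(fields[0], int(fields[1]), int(fields[2]), name, score)


# Made-up sample content standing in for a NAME_peaks.bed file.
sample = "track name=example\nchr1\t100\t200\tpeak_1\t57.3\nchr2\t5\t50\n"
for peak in read_bed_peaks(io.StringIO(sample)):
    print(peak.chrom, peak.end - peak.start)
```

With a real file, the same function can be called on `open(path)` instead of the in-memory sample.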

CEAS output:

| Parameter | Description |
|---|---|
| Output report file | CEAS result report |
| Output annotations file | Tab-delimited file per RefSeq gene |

Conservation Plot output:

| Parameter | Description |
|---|---|
| Output file | BMP image with phastCons scores |

SeqPos motif tool output:

| Parameter | Description |
|---|---|
| Output directory | Folder for results |
| Output file name | File storing de novo found motifs |

Peak2Gene output:

| Parameter | Description |
|---|---|
| Gene annotations | Output path for gene annotation data |
| Peak annotations | Output path for peak annotation data |

Conduct GO output:

| Parameter | Description |
|---|---|
| Output directory | Folder for GO analysis results |

The work on this pipeline was supported by grant RUB1-31097-NO-12 from NIAID.