Nuclize: nucleosome-based histone modification caller

Contents

Overview
Building indexes for mapped tags
Building indexes for called peaks from other programs
Calling nucleosome peaks
Calling histone modification peaks
Comparing histone modification peaks between samples

1. Overview

NUCLIZE has five computational modules: 1) a tag indexing module (indexTag), 2) a peak indexing module (indexPeak), 3) a nucleosome peak calling module (callNuc), 4) a histone modification calling module (callHis), and 5) a differential histone modification analysis module (diffHis). Running nuclize -h prints a basic introduction to nuclize. The top-level usage message is reproduced here.

SYNOPSIS:

Usage:
         nuclize <Command> [Options]

Command: indexTag         building indexes for mapped tags prior to subsequent analysis
         indexPeak        building indexes for external nucleosome peaks
         callNuc          calling nucleosome peaks
         callHis          calling histone variants and modification peaks
         diffHis          comparing histone variants and modification peaks between samples

For more help information, please use: nuclize <Command> -h

Each module is executed as nuclize <Command> [Options]. To display a module's usage, run nuclize <Command> or nuclize <Command> -h. For example, the following command prints the full option list of indexTag.

nuclize indexTag -h

2. Building indexes for mapped tags

The indexTag module builds indexes for tags mapped by short-read aligners such as bowtie or bwa. Mapped tags can be supplied in SAM or BED format; BAM files must first be converted to SAM. Mapped reads from nucleosome samples, histone variant and modification samples, and input samples must all be indexed before they can be used by the other computational modules in NUCLIZE.
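
As a minimal sketch of this indexing step, the loop below builds one indexTag command per dataset. The sample labels, the .sam file names, and the index/ output layout are hypothetical placeholders; only the flags come from the indexTag option list. The commands are printed rather than executed so they can be reviewed first.

```shell
#!/usr/bin/env bash
# Sketch: index the three kinds of data used by the later modules.
# Labels, file names and the index/ layout are placeholders, not NUCLIZE conventions.

index_cmd() {
    # $1 = dataset label, reused for the SAM file name and the output directory
    echo "nuclize indexTag -i $1.sam -f sam -d se -o index/$1"
}

for sample in nucleosome H3K4me3 input; do
    index_cmd "$sample"    # prints the command; pipe the output to `sh` to run it
done
```

Keeping one index directory per dataset makes it easy to pass the right directory to callNuc, callHis, and diffHis later.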

SYNOPSIS:

Usage:
         nuclize indexTag [Options]

This module is designed to index mapped tags(reads).

Options: -i    String     the file path of the data to be indexed
         -f    String     the format of the input data
                          bed  - BED format file:
                                 1:chr, 2:start, 3:end, 4:name, 5:score, 6:strand(+/-)
                                 with -d pe, each name must end with /1 or /2, and the two mates must be on adjacent lines
                          sam  - SAM format:
                                 For a BAM file, please convert BAM to SAM with samtools:
                                   try: samtools view -h XXX.BAM > XXX.SAM
         -d    String     the data type of the input data
                          pe   - the input data are paired-end tags
                          se   - the input data are single-end tags
                                 [default: se]
         -o    String     the output directory for indexed data
         -m    String     the way to deal with mapped tags(only works for: -f sam):
                          unique  - Only keep uniquely mapped tags
                          one     - Only keep the first mapped tag, whether or not it is uniquely mapped
                          primary - keep the primary alignment (Filter by SAM Flag 0x100)
                          all     - keep all mapped tags
                                    [default: unique]
         -q    int        filter mapped tag by #MAPQ value in SAM file(only works for: -f sam)[Default: 10]
         -5    int        extend #bp on the 5' end(only works for: -f sam)
                          if #bp are trimmed from the tags' 5' ends during alignment.[Default: 0]
         -3    int        extend #bp on the 3' end(only works for: -f sam)
                          if #bp are trimmed from the tags' 3' ends during alignment.[Default: 0]
         -r               remove all potential PCR duplicates and "clonal reads"
                          [Default: DO NOT remove duplicates]
         -B    int        Data IO buffer size. [Default: 100000]
         -h               print this message
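
The paired-end BED requirement above (names ending in /1 or /2, mates on adjacent lines) can be checked before indexing. The following is a minimal sketch and not part of NUCLIZE: check_pe_bed is a hypothetical helper, and it assumes that mates occupy lines 1-2, 3-4, and so on, in either order.

```shell
#!/usr/bin/env bash
# check_pe_bed FILE
# Exit status 0 if every pair of consecutive lines holds the two mates of one
# fragment (same name stem, one ending in /1 and the other in /2); 1 otherwise.
# Hypothetical helper, assuming a tab-separated BED file with the name in column 4.
check_pe_bed() {
    awk -F '\t' '
        {
            n = length($4)
            suffix = substr($4, n - 1)      # last two characters of the name
            base   = substr($4, 1, n - 2)   # name without the mate suffix
            if (suffix != "/1" && suffix != "/2") { bad = 1; exit }
            if (NR % 2 == 1) {              # first line of a pair: remember it
                prev_base = base; prev_suffix = suffix
            } else if (base != prev_base || suffix == prev_suffix) {
                bad = 1; exit               # second line does not match the first
            }
        }
        END { exit (bad || NR % 2) }        # an odd line count leaves a mate unpaired
    ' "$1"
}
```

A typical use would be check_pe_bed tags.bed && nuclize indexTag -i tags.bed -f bed -d pe -o index/tags, where tags.bed and index/tags are placeholder paths.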

3. Building indexes for called peaks from other programs

The indexPeak module imports peaks (either nucleosome peaks or histone modification peaks) called by other nucleosome peak callers or ChIP-Seq peak callers. 1) For nucleosome peaks: the callHis module requires nucleosome peaks in order to call histone modification peaks. These can come from NUCLIZE's own callNuc module, run before callHis, or be imported from other programs. 2) For histone modification peaks: users can import histone modification peaks called by other programs and then compare the peaks from different samples with the diffHis module in NUCLIZE.

SYNOPSIS:

Usage:
         nuclize indexPeak [Options]

This module is designed to index imported peak data from other programs.

Options: -i    String     the file path of the data to be indexed
         -f    String     the column order for peak information: chr, start position, end position
                          example: 2:3:6, which means chr, start position, end position are located in the 2nd, 3rd and 6th columns respectively.
                          column numbering starts at 1, and the columns in the file must be separated by tabs (\t)
                          [default: 1:2:3], which means the first three columns are chr, start, end
         -o    String     the output directory for indexed peak
         -B    int        data IO buffer size. [Default: 100000]
         -h               print this message
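
To make the -f column specification concrete, the sketch below prints the chr, start, and end fields that a given specification selects from a tab-separated peak file. show_peak_cols is a hypothetical helper written for illustration; it only mirrors the description above and does not reproduce how indexPeak parses the file internally.

```shell
#!/usr/bin/env bash
# show_peak_cols SPEC FILE
# Print chr, start, end as selected by a chr:start:end column spec such as 2:3:6.
# Hypothetical preview helper; columns are 1-based and tab-separated.
show_peak_cols() {
    awk -F '\t' -v spec="$1" '
        BEGIN { split(spec, col, ":"); OFS = "\t" }   # col[1..3] = chr, start, end columns
        { print $(col[1]), $(col[2]), $(col[3]) }
    ' "$2"
}

# Example (placeholder paths): preview the first peaks, then index with the same spec.
#   show_peak_cols 2:3:6 peaks.tsv | head
#   nuclize indexPeak -i peaks.tsv -f 2:3:6 -o index/peaks
```

Previewing the selected columns first is a cheap way to catch an off-by-one column number before building the index.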

4. Calling nucleosome peaks

The callNuc module calls nucleosome peaks from indexed MNase-seq nucleosome tags.

SYNOPSIS:

Usage:
         nuclize callNuc [Options]

Options: -i    String     the directory for indexed nucleosome tags(reads) from MNase-seq.
         -o    String     the directory for output result
         -t    Int        number of threads, default 0: use all cpus
         -h               print this message
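
A minimal invocation might look like the sketch below. The directory names and the thread count are hypothetical; index/nucleosome stands for a directory previously written by indexTag. The command is printed for review rather than executed.

```shell
#!/usr/bin/env bash
# Sketch: assemble a callNuc command from placeholder paths.
NUC_INDEX=index/nucleosome   # assumed: indexTag output for the MNase-seq sample
NUC_OUT=nuc_peaks            # assumed: directory that will receive the called peaks
THREADS=4                    # 0 would use all CPUs, per the option list above

callnuc_cmd="nuclize callNuc -i $NUC_INDEX -o $NUC_OUT -t $THREADS"
echo "$callnuc_cmd"          # run it with: sh -c "$callnuc_cmd"
```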

5. Calling histone modification peaks

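The callHis module calls histone variant and modification peaks from indexed histone modification tags, indexed background (input) data, and indexed nucleosome peaks.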

SYNOPSIS:

Usage:
         nuclize callHis [Options]

Options: -e    String     The directory for indexed histone modification tags
         -b    String     The directory for indexed background data, also known as input
         -n    String     The directory for indexed nucleosome peaks
         -o    String     The output directory for result
         -t    Int        The number of threads, default 0: use all cpus
         -h               Print this message
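
The sketch below shows how the three inputs fit together in one callHis command. All directory names are hypothetical. It also assumes that the directory given to -n is either the callNuc output directory or a directory written by indexPeak, which is how the indexPeak section describes the two sources of nucleosome peaks.

```shell
#!/usr/bin/env bash
# Sketch: assemble a callHis command from placeholder paths.
HIS_INDEX=index/H3K4me3      # assumed: indexTag output for the histone modification sample
INPUT_INDEX=index/input      # assumed: indexTag output for the input (background) sample
NUC_PEAKS=nuc_peaks          # assumed: callNuc output, or an indexPeak output directory
HIS_OUT=his_peaks/H3K4me3    # assumed: directory that will receive the called peaks

callhis_cmd="nuclize callHis -e $HIS_INDEX -b $INPUT_INDEX -n $NUC_PEAKS -o $HIS_OUT -t 4"
echo "$callhis_cmd"          # run it with: sh -c "$callhis_cmd"
```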

6. Comparing histone modification peaks between samples

The diffHis module compares histone variant and modification peaks between two samples.

SYNOPSIS:

Usage:
         nuclize diffHis [Options]

Options: -d    String     The directory for called histone peaks in sample 1
         -D    String     The directory for called histone peaks in sample 2
         -e    String     The directory for indexed histone modification tags in sample 1
         -E    String     The directory for indexed histone modification tags in sample 2
         -o    String     The output directory
         -s    Int        The maximum summit offset for peaks in the two samples to be treated as the same peak.
         -1    String     The track name for the sample 1, such as Cancer_H3K9me3
                          if not assigned, the default name 'sample 1' will be used
         -2    String     The track name for the sample 2, such as Normal_H3K9me3
                          if not assigned, the default name 'sample 2' will be used
         -t    Int        Number of threads, default 0: use all cpus
         -h               Print this message
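
As an illustration, the sketch below builds a diffHis command for two samples, reusing the track names from the option list. The directory layout (his_peaks/ for called peaks, index/ for indexed tags, diff/ for results) and the -s value of 50 are hypothetical choices for the example, not documented defaults.

```shell
#!/usr/bin/env bash
# diffhis_cmd SAMPLE1 SAMPLE2
# Print a diffHis command comparing two samples; the sample labels double as track names.
# Directory layout and the -s value are placeholders chosen for this sketch.
diffhis_cmd() {
    echo "nuclize diffHis -d his_peaks/$1 -D his_peaks/$2" \
         "-e index/$1 -E index/$2 -s 50 -1 $1 -2 $2 -o diff/${1}_vs_$2 -t 4"
}

diffhis_cmd Cancer_H3K9me3 Normal_H3K9me3   # prints the command; pipe to `sh` to run it
```

Lowercase options (-d, -e, -1) always refer to sample 1 and their uppercase or numbered counterparts (-D, -E, -2) to sample 2, so deriving both sets from the same labels keeps the pairing consistent.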