This lesson is still being designed and assembled (Pre-Alpha version)

Peak calling with MACS2

Overview

Teaching: 0 min
Exercises: 0 min
Questions
  • Key question (FIXME)

Objectives
  • First learning objective. (FIXME)

For this lesson we’re doing to do some peak calling using some software called macs2. Let’s keep using the best practices we learned in the last lesson and run these commands on a “compute node” rather than the “login node”.

qlogin
module load macs2/2.1.2
macs2 callpeak --help

Let’s map the ChIP samples first. We need to make an output directory for the files, then run macs2 with the -t, -c -n, -f, -g, -q and --outdir parameters

mkdir peak_calls
mkdir peak_calls/macs2
macs2 callpeak -t aligned_reads/G1E_ChIP_Gata2_0h.chr11.sorted.bam \
-c aligned_reads/G1E_ChIP_Input_0h.chr11.sorted.bam -n G1E_ChIP_Gata2_0h \
-f BAM -g 1.87e9 -q 0.1 --outdir peak_calls/macs2
ls -l peak_calls/macs2

Once macs2 has finished running, we should have four new files in our output directory.

-rw-r--r-- 1 user group  88633 May 12 12:06 G1E_ChIP_Gata2_0h_model.r
-rw-r--r-- 1 user group 135241 May 12 12:06 G1E_ChIP_Gata2_0h_peaks.narrowPeak
-rw-r--r-- 1 user group 151285 May 12 12:06 G1E_ChIP_Gata2_0h_peaks.xls
-rw-r--r-- 1 user group  93538 May 12 12:06 G1E_ChIP_Gata2_0h_summits.bed

We can see that the “.narrowPeak” file is in BED format with a few extra columns. You can read about this file format at UCSC.

head peak_calls/macs2/G1E_ChIP_Gata2_0h_peaks.narrowPeak
chr11	3104926	3105658	G1E_ChIP_Gata2_0h_peak_1	998	.	11.03180	103.14955	99.81662	416
chr11	3105932	3106226	G1E_ChIP_Gata2_0h_peak_2	61	.	3.02292	8.60469	6.17999	159
chr11	3201865	3202318	G1E_ChIP_Gata2_0h_peak_3	105	.	3.96565	13.13776	10.59181	374
chr11	3279535	3280209	G1E_ChIP_Gata2_0h_peak_4	628	.	16.61709	65.97417	62.87525	400
chr11	3291562	3291949	G1E_ChIP_Gata2_0h_peak_5	102	.	4.20686	12.83781	10.29883	159
chr11	3313151	3313768	G1E_ChIP_Gata2_0h_peak_6	1158	.	24.21939	119.31017	115.88039	324
chr11	3379576	3379870	G1E_ChIP_Gata2_0h_peak_7	55	.	5.12864	7.99220	5.58688	90
chr11	3434737	3435288	G1E_ChIP_Gata2_0h_peak_8	175	.	9.95468	20.25646	17.57889	272
chr11	3446439	3446733	G1E_ChIP_Gata2_0h_peak_9	46	.	4.69043	6.99010	4.62133	264
chr11	4086358	4086652	G1E_ChIP_Gata2_0h_peak_10	49	.	4.89368	7.31591	4.93340	69
macs2 callpeak -t aligned_reads/G1E_ChIP_Gata1_24h.chr11.sorted.bam \
-c aligned_reads/G1E_ChIP_Input_24h.chr11.sorted.bam -n G1E_ChIP_Gata1_24h \
-f BAM -g 1.87e9 -q 0.1 --outdir peak_calls/macs2
wc -l peak_calls/macs2/*.narrowPeak
   9 peak_calls/macs2/G1E_ChIP_Gata1_24h_peaks.narrowPeak
1548 peak_calls/macs2/G1E_ChIP_Gata2_0h_peaks.narrowPeak
1557 total

The Gata1 file has only 9 peaks. We should examine them somehow to see what’s going on. We can upload them to the UCSC genome browser, but first we need to download them to our local computer, which we can do using scp. We need the absolute path to the file we need to download, which we can get using a combination of ls and pwd.

Once we have the absolute path, we can open another terminal and download the files like this:

scp username@cluster.com:/home/user/chip-tutorial/peak_calls/macs2/*.narrowPeak Downloads/

Then we copy and paste the data into UCSC. Open the genome browser and go to “add custom tracks”. We need to paste our data into the text box, with a track line first. For Gata1, this might look like:

track type=narrowPeak name="Gata1-ChIP" description="ChIP for Gata1 in 24h induced G1E cells"

We can see the reason macs2 hasn’t called many peaks for Gata1 is that it looks very similar to the control, input track (i.e. it’s not a very good ChIP).

Now we can peak call the ATAC-seq datasets. We need to make two changes to the command for these samples because there’s no control track and because they are paired end files.

macs2 callpeak -t aligned_reads/G1E_ATAC_0h_rep1.chr11.sorted.bam \
-n G1E_ATAC_0h_rep1 -f BAMPE -g 1.87e9 -q 0.1 --outdir peak_calls/macs2

Let’s write a batch script to peak call all of the ATAC-seq files. First log out of the compute node using Ctrl+D

Peak calling the rest of the ATAC-seq files.

Open a new bash script called peak_call_atac.sh and use the template below to write a script for submission to the cluster that will peak call all of the ATAC-seq files.

#$ -cwd
#$ -m ea

for ____
do
  macs2 callpeak ____ -n $(basename $bam | cut -d "." -f 1)
done

Solution

#$ -cwd
#$ -m ea

for bam in aligned_reads/G1E_ATAC_*.sorted.bam
do
 macs2 callpeak -t $bam -n $(basename $bam | cut -d "." -f 1) \
 -f BAMPE -g 1.87e9 -q 0.1 --outdir peak_calls/macs2
done

Key Points

  • First key point. Brief Answer to questions. (FIXME)