This lesson is still being designed and assembled (Pre-Alpha version)

Visualising aligned reads

Overview

Teaching: 0 min
Exercises: 0 min
Questions
  • Key question (FIXME)

Objectives
  • First learning objective. (FIXME)

To make sure we’re all working with the same set of files, lets download some bam files and overwrite the ones we made in the last lesson:

cd ..
rm -rf chip-tutorial/aligned_reads
wget https://rob.beagrie.com/media/chip-tutorial/chip-tutorial-files2.tar.gz
tar zxvf chip-tutorial-files2.tar.gz
ls -l chip-tutorial/aligned_reads
-rw-r--r-- 1 user group  84893348 May  5 15:27 G1E_ATAC_0h_rep1.chr11.bam
-rw-r--r-- 1 user group 104745615 May  5 15:27 G1E_ATAC_0h_rep2.chr11.bam
-rw-r--r-- 1 user group 104157142 May  5 15:27 G1E_ATAC_30h_rep1.chr11.bam
-rw-r--r-- 1 user group   9254482 May  5 15:27 G1E_ATAC_30h_rep2.chr11.bam
-rw-r--r-- 1 user group  48952433 May  5 15:27 G1E_ChIP_Gata1_24h.chr11.bam
-rw-r--r-- 1 user group  20428179 May  5 15:27 G1E_ChIP_Gata2_0h.chr11.bam
-rw-r--r-- 1 user group  47261386 May  5 15:27 G1E_ChIP_Input_0h.chr11.bam
-rw-r--r-- 1 user group  37927946 May  5 15:27 G1E_ChIP_Input_24h.chr11.bam

We now need to make a new file that tells us how many reads there are mapping to any given genomic position. One very common type of such file is a BigWig. We can make these using a set of software called deeptools, which we need to load on the cluster just as we did for bowtie2:

module avail
module load deeptools/3.0.1
bamCoverage --help

We can get a detailed description of the options on the bamCoverage help page.

The command we’ll want to use to map our ChIP and ATAC-seq files is:

bamCoverage -b aligned_reads/G1E_ATAC_0h_rep1.bam --binSize 1 --blackListFileName annotation/mm10-blacklist.ENCSR636HFF.v2.bed --normalizeUsing RPKM --extendReads -o bigwigs/G1E_ATAC_0h_rep1.bw
'aligned_reads/G1E_ATAC_0h_rep1.bam' does not appear to have an index. You MUST index the file first!

Deeptools requires an index file. We can generate one of these with samtools index

samtools index aligned_reads/G1E_ATAC_0h_rep1.bam
[E::hts_idx_push] unsorted positions
samtools index: "aligned_reads/G1E_ATAC_0h_rep1.bam" is corrupted or unsorted

We get another error, this time because samtools index only works on a file where the reads are sorted by their chromosomal position. We can sort the files using samtools sort

samtools sort aligned_reads/G1E_ATAC_0h_rep1.bam > aligned_reads/G1E_ATAC_0h_rep1.sorted.bam
samtools index aligned_reads/G1E_ATAC_0h_rep1.sorted.bam
ls -lh aligned_reads/
-rw-r--r-- 1 user group  81M May  4 17:06 G1E_ATAC_0h_rep1.bam
-rw-r--r-- 1 user group  75M May  5 10:15 G1E_ATAC_0h_rep1.sorted.bam
-rw-r--r-- 1 user group  95K May  5 10:17 G1E_ATAC_0h_rep1.sorted.bam.bai
-rw-r--r-- 1 user group 100M May  4 16:58 G1E_ATAC_0h_rep2.bam
-rw-r--r-- 1 user group 100M May  4 17:11 G1E_ATAC_30h_rep1.bam
-rw-r--r-- 1 user group 8.9M May  4 17:12 G1E_ATAC_30h_rep2.bam
-rw-r--r-- 1 user group  47M May  4 16:43 G1E_ChIP_Gata1_24h.chr11.bam
-rw-r--r-- 1 user group  20M May  4 16:44 G1E_ChIP_Gata2_0h.chr11.bam
-rw-r--r-- 1 user group  46M May  4 16:45 G1E_ChIP_Input_0h.chr11.bam
-rw-r--r-- 1 user group  37M May  4 16:46 G1E_ChIP_Input_24h.chr11.bam

Now that we have sorted and indexed our bam file, we can finally run bamCoverage:

bamCoverage -b aligned_reads/G1E_ATAC_0h_rep1.sorted.bam --binSize 1 \
--blackListFileName annotation/mm10-blacklist.ENCSR636HFF.v2.bed \
--normalizeUsing RPKM --extendReads -o bigwigs/G1E_ATAC_0h_rep1.bw -r chr11:32180000-32345000
ls -l bigwigs/

Sorting and indexing files in a loop (part 1)

Write a pattern to match just the unsorted .bam files

ls -l _______

Expected output:

-rw-r--r-- 1 user group  84893348 May  4 17:06 aligned_reads/G1E_ATAC_0h_rep1.chr11.bam
-rw-r--r-- 1 user group 104745615 May  4 16:58 aligned_reads/G1E_ATAC_0h_rep2.chr11.bam
-rw-r--r-- 1 user group 104157142 May  4 17:11 aligned_reads/G1E_ATAC_30h_rep1.chr11.bam
-rw-r--r-- 1 user group   9254482 May  4 17:12 aligned_reads/G1E_ATAC_30h_rep2.chr11.bam
-rw-r--r-- 1 user group  48952433 May  4 16:43 aligned_reads/G1E_ChIP_Gata1_24h.chr11.bam
-rw-r--r-- 1 user group  20428179 May  4 16:44 aligned_reads/G1E_ChIP_Gata2_0h.chr11.bam
-rw-r--r-- 1 user group  47261386 May  4 16:45 aligned_reads/G1E_ChIP_Input_0h.chr11.bam
-rw-r--r-- 1 user group  37927946 May  4 16:46 aligned_reads/G1E_ChIP_Input_24h.chr11.bam

Solution

ls -l aligned_reads/G1E_*.chr11.bam

Sorting and indexing files in a loop (part 2)

Use the pattern you just wrote to write a loop that uses the cut command to print out the name of the bam file without the “.bam” part.

for bamfile in <PATTERN FROM PART 1>
do
  echo $bamfile
  echo $bamfile | cut -d ____ -f ____
done

Expected output:

aligned_reads/G1E_ATAC_0h_rep1.chr11.bam
aligned_reads/G1E_ATAC_0h_rep1.chr11
aligned_reads/G1E_ATAC_0h_rep2.chr11.bam
aligned_reads/G1E_ATAC_0h_rep2.chr11
aligned_reads/G1E_ATAC_30h_rep1.chr11.bam
aligned_reads/G1E_ATAC_30h_rep1.chr11
aligned_reads/G1E_ATAC_30h_rep2.chr11.bam
aligned_reads/G1E_ATAC_30h_rep2.chr11
aligned_reads/G1E_ChIP_Gata1_24h.chr11.bam
aligned_reads/G1E_ChIP_Gata1_24h.chr11
aligned_reads/G1E_ChIP_Gata2_0h.chr11.bam
aligned_reads/G1E_ChIP_Gata2_0h.chr11
aligned_reads/G1E_ChIP_Input_0h.chr11.bam
aligned_reads/G1E_ChIP_Input_0h.chr11
aligned_reads/G1E_ChIP_Input_24h.chr11.bam
aligned_reads/G1E_ChIP_Input_24h.chr11

Solution

for bamfile in aligned_reads/G1E_*.chr11.bam
do
  echo $bamfile
  echo $bamfile | cut -d "." -f 1,2
done

Sorting and indexing files in a loop (part 3)

Using your answers to parts 1 & 2 and the “command substitution” syntax $() we learned in the [shell find lesson][shell-find] to print the original bam file and its corresponding sorted bam file.

for bamfile in <PATTERN FROM PART 1>
do
  echo $bamfile $(echo $bamfile | cut -d ____ -f ____).sorted.bam
done

Expected output:

aligned_reads/G1E_ATAC_0h_rep1.chr11.bam aligned_reads/G1E_ATAC_0h_rep1.chr11.sorted.bam
aligned_reads/G1E_ATAC_0h_rep2.chr11.bam aligned_reads/G1E_ATAC_0h_rep2.chr11.sorted.bam
aligned_reads/G1E_ATAC_30h_rep1.chr11.bam aligned_reads/G1E_ATAC_30h_rep1.chr11.sorted.bam
aligned_reads/G1E_ATAC_30h_rep2.chr11.bam aligned_reads/G1E_ATAC_30h_rep2.chr11.sorted.bam
aligned_reads/G1E_ChIP_Gata1_24h.chr11.bam aligned_reads/G1E_ChIP_Gata1_24h.chr11.sorted.bam
aligned_reads/G1E_ChIP_Gata2_0h.chr11.bam aligned_reads/G1E_ChIP_Gata2_0h.chr11.sorted.bam
aligned_reads/G1E_ChIP_Input_0h.chr11.bam aligned_reads/G1E_ChIP_Input_0h.chr11.sorted.bam
aligned_reads/G1E_ChIP_Input_24h.chr11.bam aligned_reads/G1E_ChIP_Input_24h.chr11.sorted.bam

Solution

for bamfile in aligned_reads/G1E_*.chr11.bam
do
  echo $bamfile $(echo $bamfile | cut -d "." -f 1,2).sorted.bam
done

Sorting and indexing files in a loop (part 4)

Using your answers to parts 1, 2 & 3 write a for loop that sorts and indexes all of the bam files.

for ______________
do
  samtools sort ___________
  samtools index __________
done

Solution

for bamfile in aligned_reads/G1E_*.chr11.bam
do
  samtools sort $bamfile > $(echo $bamfile | cut -d "." -f 1,2).sorted.bam
  samtools index $(echo $bamfile | cut -d "." -f 1,2).sorted.bam
done

And since we’ve now sorted and indexed all our bam files, we should be able to make bigwigs for them all using a for loop:

for bam in aligned_reads/G1E_*.sorted.bam
do
  bamCoverage -b $bam --binSize 1 --blackListFileName annotation/mm10-blacklist.ENCSR636HFF.v2.bed \
  --normalizeUsing RPKM --extendReads -o $(echo $bam | cut -d "." -f 1,2).bw -r chr11:32180000-32345000
  mv $(echo $bam | cut -d "." -f 1,2).bw bigwigs/
done
*ERROR*: library is not paired-end. Please provide an extension length.
mv: cannot stat `aligned_reads/G1E_ChIP_Gata2_0h.chr11.bw': No such file or directory

We get an error message because the ChIP samples are not paired end, so we can’t automatically extend the fragments, we have to give an estimated fragment size:

for bam in aligned_reads/G1E_ChIP*.sorted.bam
do
  bamCoverage -b $bam --binSize 1 --blackListFileName annotation/mm10-blacklist.ENCSR636HFF.v2.bed \
  --normalizeUsing RPKM --extendReads 150 -o $(echo $bam | cut -d "." -f 1,2).bw -r chr11:32180000-32345000
  mv $(echo $bam | cut -d "." -f 1,2).bw bigwigs/
done
ls -l bigwigs/
-rw-r--r-- 1 user group 24116 May  5 15:01 G1E_ATAC_0h_rep1.chr11.bw
-rw-r--r-- 1 user group 28656 May  5 15:02 G1E_ATAC_0h_rep2.chr11.bw
-rw-r--r-- 1 user group 28260 May  5 15:02 G1E_ATAC_30h_rep1.chr11.bw
-rw-r--r-- 1 user group  4031 May  5 15:02 G1E_ATAC_30h_rep2.chr11.bw
-rw-r--r-- 1 user group 38176 May  5 15:07 G1E_ChIP_Gata1_24h.chr11.bw
-rw-r--r-- 1 user group 24922 May  5 15:07 G1E_ChIP_Gata2_0h.chr11.bw
-rw-r--r-- 1 user group 33703 May  5 15:07 G1E_ChIP_Input_0h.chr11.bw
-rw-r--r-- 1 user group 27264 May  5 15:08 G1E_ChIP_Input_24h.chr11.bw

Key Points

  • First key point. Brief Answer to questions. (FIXME)