So, I was interestd in running the software peakranger bcp recently, but
stumbeled into a rather irritating probelm. I kept getting the classic message
Segmentation fault (core dumped). Upon looking around for the solution to
this problem I found nothing. So I thought I would document briefly what I did
to fix it.
program version: 1.18 Data files: File format: bam Sample file: chip_rep1_BWA.sorted.bam Control file: input_rep1_BWA.sorted.bam Qualities: P value cut off: 0.0001 sliding window size: 500 Read extension length: 200 Output: Regions: test_region.bed HTML reports: Disabled(--report not specified) Reads statistics: Treatment reads +: 12002431 Treatment reads -: 11987889 Average read length: 30 Control reads +: 11301505 Control reads -: 11298535 Average read length: 30 Verifying reads... Warning: B73V4_ctg162 only contains reads in the positive strand. The chromosome is removed. Warning: B73V4_ctg162 only contains reads in the positive strand. The chromosome is removed. Reads statistics after correction: Treatment reads +: 12002424 Treatment reads -: 11987889 Control reads +: 11301503 Control reads -: 11298535 Calling peaks... Segmentation fault (core dumped)
Upon investigating this further, I decided to see if the issue was coming from the scaffold reported in the error, B73V4_ctg162. To see check this I first analysed how many reads were aligning to this contig in the maize genome using using a quick samtools command:
#Command: samtools idxstats chip_B73_ear_H3K36me3_rep1_bowtie2algn.sorted.bam | grep B73V4_ctg162 #Output: B73V4_ctg162 40634 0 0
So, there were ZERO reads aligning to this contig. Making me think the software might be chocking on this scaffold. But after removing this scaffold from the SAM file, I was still being greated by
Segmentation fault (core dumped) . So, in order to try again I took a more drastic measure removed all scaffolds that had CTG in them. I did this after checking the gff3 file for all scaffolds/contigs, and coming to the conclusion that none of them are particularly gene rich and/or interesting for my questions. I removed ALL of these scaffolds using the following command:
grep -v "B73V4_ctg*" chip_rep1_bowtie2algn.sam > chip_rep1_bowtie2algn.noctg_scaff.sam && grep -v "B73V4_ctg*" chip_rep1_bowtie2algn.sam > chip_rep1_bowtie2algn.noctg_scaff.sam
I eventuall re-ran this software using these generated SAM (I didn’t want to take the extra step of going from
samtools view | grep _v back to sam) files, and was eventually rewarded with:
program version: 1.18 Data files: File format: sam Sample file: chip_rep1_bowtie2algn.noctg_scaff.sam Control file: chip_input_rep1_bowtie2algn.noctg_scaff.sam Qualities: P value cut off: 0.0001 sliding window size: 500 Read extension length: 200 Output: Regions: test_sam_no162_region.bed HTML reports: Disabled(--report not specified) Reads statistics: Treatment reads +: 20131985 Treatment reads -: 20126418 Average read length: 76 Control reads +: 15762557 Control reads -: 15771271 Average read length: 76 Verifying reads... Reads statistics after correction: Treatment reads +: 20131985 Treatment reads -: 20126418 Control reads +: 15762557 Control reads -: 15771271 Calling peaks... Discovered 2693 regions in 1. Discovered 1222 regions in 10. Discovered 2065 regions in 2. Discovered 1902 regions in 3. Discovered 1879 regions in 4. Discovered 2085 regions in 5. Discovered 1538 regions in 6. Discovered 1445 regions in 7. Discovered 1649 regions in 8. Discovered 1396 regions in 9. Discovered 3 regions in Mt. Discovered 0 regions in Pt. Total regions discovered: 17877
Success! It appears that those nasty scaffolds/contigs were proving to be quite troublesome for this software. But, this isn’t an ideal solution. We’re still casting out genomic information that I would prefer to keep for analysis. So… There has to be a smarter way of doing this. Maybe pin pointing the exact scaffolds that were failing and why? But in my case, this got me the desired results I needed.