Recently, I was interested to find out what fraction of genome is expressed. This can be computed quite easily from RNA-Seq alignments using bedtools and some simple scripting. Note, it’s crucial to use -split parameter in bedtools genomecov to report correct coverage from split alignments. Otherwise, the introns that are spliced-out will be treated as expressed regions.
# get genomic regions with at least n reads aligned n=3 ref=genome for f in *.bam; do echo `date` $f; # get genomic intervals covered by n reads bedtools genomecov -split -ibam $f -g $ref.fa.fai -dz | awk -F'\t' 'BEGIN {OFS = FS} {if ($3>='$n') {print $1,$2,$2+1}}' | bedtools merge > $f.${n}reads.bed; # report bp covered by n reads awk '{sum+=$3-$2} END {print sum}' $f.${n}reads.bed done; date