Alternative splicing produces an array of transcripts from an individual gene. Some of these alternative transcripts are expressed in a tissue-specific fashion and may play different roles in the cell.
Recently, I became interested in identifying isoform switching from RNA-Seq data, that is, detecting changes in the most expressed isoform between conditions. As I couldn’t find any ready-made solution, I have written my own script for this task (fpkm2major.py). The script simply parses the isoforms.fpkm_tracking output from cufflinks and reports genes with evidence of isoform switching to a tab-delimited output file.
For each gene, the number of transcripts, the cumulative expression, the major isoforms for each condition and each major isoform are reported. Genes that are lowly expressed in a given condition are coded with `0` (--minFPKM < 1), while genes without a clear major isoform are marked with `-1` (
fpkm2major.py -v -i sample1/isoforms.fpkm_tracking -o major_isoforms.txt
# combine data from multiple samples
paste sample1/isoforms.fpkm_tracking <(cut -f10- sample2/isoforms.fpkm_tracking) \
| fpkm2major.py -v -o major_isoforms2.txt
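# visualise using cummeRbund
library(cummeRbund)
cuff <- readCufflinks("sample1")
# expression plot for isoforms of given gene
geneid <- "GENE_ID"  # placeholder: substitute a gene reported in major_isoforms.txt
expressionPlot(isoforms(getGene(cuff, geneid)))
# paste0 avoids the space that paste() would put into the file name
dev.copy(svg, paste0(geneid, ".sample1.svg"), width=12, height=12); dev.off()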
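To make the idea of a major isoform more concrete, here is a minimal Python sketch of the kind of logic described above. It is not the code of fpkm2major.py, only an illustration under stated assumptions: the function names `major_isoforms` and `switching_genes` are made up for this example, the transcript id is assumed to be in the first column, every column named `FPKM` is treated as one condition (as produced by the `paste` command above), and `min_frac` is an assumed cut-off for calling a "clear" major isoform. Only the `0` and `-1` codes and the FPKM threshold of 1 come from the description above.

```python
import csv
import io
from collections import defaultdict


def major_isoforms(handle, min_fpkm=1.0, min_frac=0.5):
    """Return {gene_id: [major isoform call per condition]}.

    Codes follow the post: 0 = gene below min_fpkm in that condition,
    -1 = no clear major isoform. min_frac is an ASSUMED cut-off, not
    necessarily the criterion used by fpkm2major.py.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    gene_col = header.index("gene_id")
    # Each column named FPKM (or <sample>_FPKM) is treated as one condition.
    fpkm_cols = [i for i, name in enumerate(header)
                 if name == "FPKM" or name.endswith("_FPKM")]

    genes = defaultdict(list)  # gene_id -> [(transcript_id, [fpkm, ...]), ...]
    for row in reader:
        fpkms = [float(row[i]) for i in fpkm_cols]
        genes[row[gene_col]].append((row[0], fpkms))

    result = {}
    for gene, isoforms in genes.items():
        calls = []
        for c in range(len(fpkm_cols)):
            total = sum(values[c] for _, values in isoforms)
            top_id, top_values = max(isoforms, key=lambda iso: iso[1][c])
            if total <= 0 or total < min_fpkm:
                calls.append(0)        # gene lowly expressed in this condition
            elif top_values[c] / total < min_frac:
                calls.append(-1)       # no isoform clearly dominates
            else:
                calls.append(top_id)
        result[gene] = calls
    return result


def switching_genes(calls_by_gene):
    """Keep genes whose called major isoform differs between conditions."""
    return {gene: calls for gene, calls in calls_by_gene.items()
            if len({c for c in calls if c not in (0, -1)}) > 1}


# Toy table with made-up values, two conditions.
demo = io.StringIO(
    "tracking_id\tgene_id\tFPKM\tFPKM\n"
    "tx1\tgeneA\t9.0\t1.0\n"
    "tx2\tgeneA\t1.0\t9.0\n"
    "tx3\tgeneB\t5.0\t6.0\n"
    "tx4\tgeneB\t0.2\t0.1\n"
)
calls = major_isoforms(demo)
print(switching_genes(calls))  # {'geneA': ['tx1', 'tx2']}
```

In this toy input, geneA switches from tx1 to tx2 between the two conditions, while geneB keeps tx3 as its major isoform and is therefore not reported.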