I’ve been asked to help analyse data from an RNA editing experiment. Typically, such experiments consist of DNA and RNA sequencing. The purpose is to identify the differences between the DNA template and the resulting RNA product.
The canonical RNA editing enzyme, ADAR, deaminates adenosine to inosine (A-to-I editing). Inosine is subsequently read as guanine by the sequencer. Thus the identification of RNA editing can be reduced to genotyping the DNA and RNA samples and finding the differences between the two.
Importantly, RNA editing can be promiscuous, meaning that multiple As are changed into Is in a given region. This makes alignment challenging. In addition, you may miss most of the edited sites due to quality filtering (many SNPs clustered together) if you genotype your libraries with standard tools/pipelines, e.g. GATK. Finally, RNAseq is more error-prone than DNAseq, due to the low fidelity of reverse transcriptase, so you may want to account for that as well.
Ideally, you want stranded RNAseq, so that only A>G editing is observed. In reality, you may also have to analyse unstranded RNAseq libraries. In that case you expect both A>G and T>C edited sites, from transcripts of genes on the Watson (+) and Crick (-) strand, respectively.
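To illustrate why clustered edited sites tend to disappear under standard filtering, here is a minimal Python sketch of a cluster filter. It is a simplification, not GATK's actual implementation: the function name `flag_clustered` and the default values of `window` and `cluster_size` are my own illustrative assumptions.

```python
def flag_clustered(positions, window=35, cluster_size=3):
    """Return positions that belong to a group of at least `cluster_size`
    variants spanning fewer than `window` bases.

    `window` and `cluster_size` are illustrative defaults (assumptions),
    not values taken from any particular pipeline.
    """
    positions = sorted(positions)
    flagged = set()
    # Slide over every run of `cluster_size` consecutive variants.
    for i in range(len(positions) - cluster_size + 1):
        group = positions[i:i + cluster_size]
        if group[-1] - group[0] < window:
            flagged.update(group)
    return flagged


# Three edited sites close together are flagged, the isolated one is kept.
clustered = flag_clustered([100, 110, 120, 500])
```

With a filter like this, a hyper-edited region is exactly what gets thrown away, which is why I would not rely on such filtering when looking for RNA editing.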
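The strand logic can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names are hypothetical and are not part of any existing tool.

```python
# Base complements, used to translate a change seen on the reference (+)
# strand into the change on the transcribed strand.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def change_on_transcript_strand(ref, alt, strand):
    """Express a reference-strand substitution on the transcript strand."""
    if strand == "-":
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def is_candidate_a_to_i(ref, alt, stranded):
    """Check whether a substitution is compatible with A-to-I editing.

    For stranded libraries the change is assumed to be already expressed
    on the transcript strand, so only A>G qualifies. For unstranded
    libraries the change is on the reference strand, so T>C qualifies too.
    """
    change = f"{ref}>{alt}"
    if stranded:
        return change == "A>G"
    return change in ("A>G", "T>C")


# A T>C change on the reference is an A>G change for a Crick (-) strand gene.
example = change_on_transcript_strand("T", "C", "-")
```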
DNA/RNAseq alignment
For the alignment part, I’ve been using BWA MEM (DNAseq) and STAR (RNAseq). If you use STAR, you should reassign mapping qualities (mapQ); the command below changes mapQ 255 to 60. It’s also recommended to split reads at N CIGAR operations. Both can be accomplished with GATK:
java -jar ~/src/GATK/GenomeAnalysisTK.jar -T SplitNCigarReads -R REF.fa -I SAMPLE.bam -o SAMPLE.split.bam -rf ReassignOneMappingQuality -RMQF 255 -RMQT 60 -U ALLOW_N_CIGAR_READS > SAMPLE.split.log 2>&1 &
Finally, you may want to realign both DNAseq and RNAseq around indels using GATK IndelRealigner.
RNA editing identification from BAM files
I’ve written a program, bam2RNAediting.py (currently REDiscover), to detect RNA editing in a set of DNAseq and RNAseq experiments. It’s at an early stage of development, but it’s already functional. It can be run simply by:
bam2RNAediting.py -d bwamem/DNAseq*.bam -r star/RNAseq*.bam > bam2RNAediting.txt
Quality control
A simple quality control for an RNA editing experiment is to look at the enrichment of A>G (and T>C) edited sites over other changes. If RNA editing is present in your sample, you would expect to find quite a strong enrichment of A>G (and T>C) changes between DNAseq and RNAseq. This is indeed what I’ve found in my samples.
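This check could be sketched in Python as below. It assumes you have already extracted the differences as (DNA base, RNA base) pairs. The function names and the toy input are hypothetical, and the sketch does not parse the output of bam2RNAediting.py.

```python
from collections import Counter


def substitution_spectrum(sites):
    """Count each substitution type, e.g. 'A>G', from (ref, alt) pairs."""
    return Counter(f"{ref}>{alt}" for ref, alt in sites if ref != alt)


def editing_fraction(counts, editing_types=("A>G", "T>C")):
    """Fraction of all observed changes compatible with A-to-I editing."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(counts[t] for t in editing_types) / total


# Toy input (made up for illustration): 8 of 10 changes are A>G or T>C.
toy_sites = [("A", "G")] * 6 + [("T", "C")] * 2 + [("C", "T"), ("G", "A")]
spectrum = substitution_spectrum(toy_sites)
fraction = editing_fraction(spectrum)
```

As a naive baseline, if changes were spread evenly over the 12 possible substitution types, A>G and T>C together would account for about 0.17 of them. For stranded libraries you would pass `editing_types=("A>G",)` instead.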
Let me know how it works for you. I’m looking forward to your feedback!