Today I needed to annotate a set of RefSeq IDs in a .bed file with gene names.
First, I looked for a way to get gene names for RefSeq IDs. I found a simple solution on BioStars:
mysql --user=genome -N --host=genome-mysql.cse.ucsc.edu -A -D danRer10 \
  -e "select name,name2 from refGene" > refSeq2gene.txt
Second, I wrote a simple Python script that puts the gene name into the .bed file in place of the score, which I don’t need at all:
#!/usr/bin/env python
# Add gene name instead of score to BED file
# USAGE: cat bed | bed2gene.py refSeq2gene.txt > with_gene_names.bed
import sys

fn = sys.argv[1]

# Load the RefSeq accession -> gene name mapping (two tab-separated columns)
refSeq2gene = {}
with open(fn) as f:
    for l in f:
        refSeq, gene = l.rstrip('\n').split('\t')
        refSeq2gene[refSeq] = gene
sys.stderr.write(" %s accessions loaded!\n" % len(refSeq2gene))

# Replace the score (5th column) with the gene name, or "-" if the ID is unknown
for l in sys.stdin:
    ldata = l.rstrip('\n').split('\t')
    chrom, s, e, refSeq = ldata[:4]
    ldata[4] = refSeq2gene.get(refSeq, "-")
    sys.stdout.write("\t".join(ldata) + "\n")
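The script above expects every line to have at least five columns and no header lines. If your .bed file has `track`/`browser` lines or only four columns, a slightly more defensive variant could look like the sketch below. The function name `annotate_bed`, the `missing` parameter and the accessions in the example are made up for illustration; they are not part of the original script or of any real annotation.

```python
import io

def annotate_bed(bed_lines, refseq2gene, missing="-"):
    """Yield BED lines with column 5 (score) replaced by the gene name."""
    for line in bed_lines:
        line = line.rstrip("\n")
        # Pass blank lines and track/browser/comment headers through untouched
        if not line or line.startswith(("track", "browser", "#")):
            yield line
            continue
        fields = line.split("\t")
        # Pad short records (e.g. BED4) so that column 5 exists
        while len(fields) < 5:
            fields.append(missing)
        # Column 4 holds the RefSeq ID; unknown IDs get the placeholder
        fields[4] = refseq2gene.get(fields[3], missing)
        yield "\t".join(fields)

# Made-up accessions and gene names, for illustration only
mapping = {"NM_000001": "geneA"}
bed = io.StringIO(
    "chr1\t100\t200\tNM_000001\t0\t+\n"
    "chr1\t300\t400\tNM_999999\n"
)
result = list(annotate_bed(bed, mapping))
for row in result:
    print(row)
```

The first record gets `geneA` in place of its score, and the second (four columns, unknown ID) is padded and gets `-`.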
I hope some of you will find it useful.