Installing Gene Cluster on Ubuntu

Gene Cluster is a program for clustering. I wanted to use it to analyse gene expression data. However, I had problems during installation under Ubuntu 14.04. This is how I solved it:

# install dependencies: Motif libraries
sudo apt-get install libxext-dev libmotif-dev

Get GeneCluster3.0 source code and unpack it.

# configure to install in local dir
./configure --prefix=`pwd` --program-prefix=gene_ && make && make install

# add install dir to ~/.bashrc

Get gene names for set of RefSeq IDs

Today I needed to annotate set of RefSeq IDs in .bed file with gene names.
Firstly, I was looking for a way to get gene names for RefSeq IDs. I have found simple solution on BioStars.

mysql --user=genome -N --host=genome-mysql.cse.ucsc.edu -A -D danRer10 \
  -e "select name,name2 from refGene" > refSeq2gene.txt

Secondly, I’ve written simple Python script to add the gene name to .bed file in place of score which I don’t need at all.

#!/usr/bin/env python
# Add gene name instead of score to BED file
 
# USAGE: cat bed | bed2gene.py refSeq2gene.txt > with_gene_names.bed
 
import sys
 
fn = sys.argv[1]
refSeq2gene = {}
for l in open(fn):
    refSeq, gene = l[:-1].split('\t')
    refSeq2gene[refSeq] = gene
     
sys.stderr.write(" %s accesssions loaded!\n"%len(refSeq2gene))
     
for l in sys.stdin:
    ldata = l[:-1].split('\t')
    chrom, s, e, refSeq = ldata[:4]
    if refSeq in refSeq2gene:
        ldata[4] = refSeq2gene[refSeq]
    else:
        ldata[4] = "-"
    sys.stdout.write("\t".join(ldata)+"\n")

Hope, some will find it useful.