Connecting to MySQL without a password prompt

If you are (like me) annoyed by typing the password at every mysql login, you can skip it. It also makes programmatic access to any MySQL db easier, as no password prompting is necessary 🙂
Create a `~/.my.cnf` file:

[client]
user=username
password="pass"
 
[mysql]
user=username
password="pass"

Then log in without the `-p` parameter:

mysql -h host -u username dbname

If you want to use the `~/.my.cnf` file in MySQLdb, just connect like this:

import MySQLdb
# user and password are read from ~/.my.cnf; host and port are placeholders
cnx = MySQLdb.connect(host=host, port=port, read_default_file="~/.my.cnf")
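
Once connected, you can query without any password prompt; a minimal sketch:

cur = cnx.cursor()
cur.execute("SELECT VERSION()")  # credentials came from ~/.my.cnf
print(cur.fetchone())
cnx.close()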

Get gene names for a set of RefSeq IDs

Today I needed to annotate a set of RefSeq IDs in a .bed file with gene names.
First, I was looking for a way to get gene names for RefSeq IDs. I found a simple solution on BioStars.

mysql --user=genome -N --host=genome-mysql.cse.ucsc.edu -A -D danRer10 \
  -e "select name,name2 from refGene" > refSeq2gene.txt

Second, I wrote a simple Python script to add the gene name to the .bed file in place of the score column, which I don’t need at all.

#!/usr/bin/env python
# Add gene name instead of score to BED file

# USAGE: cat in.bed | bed2gene.py refSeq2gene.txt > with_gene_names.bed

import sys

fn = sys.argv[1]
# load RefSeq ID -> gene name mapping
refSeq2gene = {}
for l in open(fn):
    refSeq, gene = l[:-1].split('\t')
    refSeq2gene[refSeq] = gene

sys.stderr.write(" %s accessions loaded!\n" % len(refSeq2gene))

# replace the score column with the gene name
# (assumes BED lines with at least 5 columns, i.e. the score is present)
for l in sys.stdin:
    ldata = l[:-1].split('\t')
    chrom, s, e, refSeq = ldata[:4]
    if refSeq in refSeq2gene:
        ldata[4] = refSeq2gene[refSeq]
    else:
        ldata[4] = "-"
    sys.stdout.write("\t".join(ldata) + "\n")
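
For example, with the placeholder mapping above, a BED line like `chr1 100 200 NM_XXXXXX 0 +` would come out as `chr1 100 200 NM_XXXXXX gene1 +`.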

Hope someone will find it useful.

Progress of long processes in BASH

You can view the progress of a long-running process in UNIX using `pv` or `bar`. With `pv`, you can even report the progress of multiple stages of your pipeline.

This is very useful for tracing the progress of a large database dump/restore (`-N` names each `pv` instance, `-c` keeps their outputs from overwriting each other):

pv -cN gzip backup.sql.gz | gzip -d | pv -cN mysql | mysql
  mysql: 799MiB 0:06:30 [1.68MiB/s] [ <=> ]
   gzip: 173MiB 0:06:30 [ 250kiB/s] [=> ] 4% ETA 2:25:09

Fetching from MySQL in batches

Today I encountered problems fetching a large MySQL table. Simply, the server was running out of memory due to the table size.

I knew you can fetch a subset of data from a MySQL table using LIMIT. A standard `LIMIT N` returns the first N rows of your table.

However, there is a second way of calling LIMIT, `LIMIT S, N`, where S is the (0-based) offset of the starting row and N is the number of rows to return. Thus, you can easily plug it into a for loop to fetch a large MySQL table in, for example, 1-million-row batches:

step=1000000
t=large_table_name
user=mysql_username
pass=mysql_pass
db=database_name
# get row count
c=`mysql -u$user -p$pass $db -N -A -e "select count(*) from $t"`
echo `date` $t "with $c rows";
# write table header (mysql prints the column names first when -N is off)
header=`mysql -u$user -p$pass $db -A -e "select * from $t limit 1" | head -1`
echo "$header" | gzip > $t.txt.gz
# get table in batches; LIMIT offsets are 0-based, so start at 0
for i in `seq 0 $step $c`; do
  cmd="select * from $t limit $i, $step"
  echo -en "\r$i / $c"
  mysql -u$user -p$pass $db -A -N -e "$cmd" | gzip >> $t.txt.gz
done
echo ""

Hope someone will find it useful :)