Geotag your photos for free

Today, a slightly different topic. I was looking for a way to geotag my photos. At first, I wanted to send the GPS position from my phone to my camera via Wi-Fi, but that turned out to be impossible…

Then I considered buying a dedicated GPS dongle, but these are quite expensive and rather impractical: they work only with a given manufacturer, sometimes only with a given camera model, and you have to carry (and recharge) yet another gadget (sic!).

Recently I realised there is a much simpler solution. During my trips, I enable GPS tracking on my phone (e.g. Endomondo) and shoot photos as usual. Back home, I export the GPS track from the Endomondo website (More options –> Export –> .gpx file) and update the photos with geotags (e.g. GottenGeography or digiKam under Linux).
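For the curious, the exported .gpx file is just timestamped XML; geotagging tools match each photo's EXIF timestamp against the nearest track point. A minimal sketch (the coordinates are made up, and the commented exiftool command is an illustrative command-line alternative to the GUI tools above):

```shell
# Minimal GPX track of the kind Endomondo exports (coordinates invented).
# Geotaggers match each photo's EXIF timestamp to the nearest <trkpt>.
cat > track.gpx <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<gpx version="1.1" creator="example">
  <trk><trkseg>
    <trkpt lat="52.2297" lon="21.0122"><time>2014-07-01T10:00:00Z</time></trkpt>
    <trkpt lat="52.2300" lon="21.0130"><time>2014-07-01T10:01:00Z</time></trkpt>
  </trkseg></trk>
</gpx>
EOF
# With exiftool installed, a whole photo directory can be tagged in one go:
# exiftool -geotag track.gpx photos/
```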

Randomisation of the paired-end read order in FastQ

I’ve been playing with Python, trying to randomise the order of paired-end (PE) reads in FastQ files. After a very unsuccessful afternoon (my Python implementation needed 10 minutes to randomise 1M PE reads!), I decided to try Bash.
The Bash-based solution is simple and efficient (12 seconds for 1M PE reads):

paste <(zcat test.1.fq.gz) <(zcat test.2.fq.gz) | paste - - - - | shuf | awk -F'\t' '{OFS="\n"; print $1,$3,$5,$7 > "random.1.fq"; print $2,$4,$6,$8 > "random.2.fq";}'

The first paste interleaves the two files column-wise, the second paste joins each four-line FastQ record onto a single row, shuf shuffles the rows, and awk splits them back into two files. Note OFS="\n" (not OFS=FS): the four fields of each record have to be written back as four separate lines, not tab-separated.

If you are interested in a random subset of your FastQ file(s), e.g. 100K read pairs, you can specify it with shuf -n 100000.
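A toy end-to-end demo of the subsampling variant (the tiny input files are generated on the spot just for illustration; only GNU coreutils are needed):

```shell
# Create two tiny gzipped FastQ files with 4 paired reads (demo data only).
for i in 1 2 3 4; do
  printf '@read%s/1\nACGT\n+\nIIII\n' "$i"
done | gzip > test.1.fq.gz
for i in 1 2 3 4; do
  printf '@read%s/2\nTGCA\n+\nIIII\n' "$i"
done | gzip > test.2.fq.gz

# Same pipeline as above, but 'shuf -n 2' keeps only 2 random read pairs.
paste <(zcat test.1.fq.gz) <(zcat test.2.fq.gz) | paste - - - - \
  | shuf -n 2 \
  | awk -F'\t' '{OFS="\n"; print $1,$3,$5,$7 > "sub.1.fq"; print $2,$4,$6,$8 > "sub.2.fq"}'
```

Because both output files are written from the same shuffled stream, the pairing between sub.1.fq and sub.2.fq is preserved.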

For large FastQ files it’s good to follow the progress of the randomisation. This can be done by plugging pv into the pipeline. Additionally, the output files can be gzipped on the fly, saving lots of disk I/O. Finally, reads can be sampled/randomised from more than one library (reads1_1/2 and reads2_1/2), as follows:

pv -cN zcat reads1_1.fastq.gz reads2_1.fastq.gz | zcat | paste - <(zcat reads1_2.fastq.gz reads2_2.fastq.gz) | paste - - - - | pv -cN shuf | shuf | pv -cN awk | awk -F'\t' '{OFS="\n"; print $1,$3,$5,$7 | "gzip > random_1.fq.gz"; print $2,$4,$6,$8 | "gzip > random_2.fq.gz";}'

Speeding up TAR.GZ compression with PIGZ

Most of you have probably noticed that TAR.GZ compression isn’t very fast. Recently, during a routine system backup, I realised TAR.GZ is limited not by disk read/write speed, but by GZIP compression (98% of the computation):

time sudo tar cpfz backup/ubuntu1404.tgz --one-file-system /

real 6m20.999s
user 6m1.800s
sys  0m19.043s

GZIP in its standard implementation is bound to a single CPU, while most modern computers can run 4–8 threads concurrently. But there are also multi-threaded implementations of GZIP, e.g. PIGZ. I decided to install PIGZ and plug it into TAR as follows:

sudo apt-get install lbzip2 pigz

time sudo tar cpf backup/ubuntu1404.pigz.tgz --one-file-system --use-compress-program=pigz /

real 1m43.693s
user 8m34.168s
sys  0m20.243s

As you can see, TAR.GZ using PIGZ on a 4-core i7-4770K (running 8 threads) is about 3.7 times faster than standard GZIP (1m44s vs 6m21s)! And you still get a standard TAR.GZ archive as output :)

The same applies to BZIP2 compression using LBZIP2.
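The --use-compress-program hook accepts any compressor. A small sketch of the mechanism (file names invented; gzip stands in here only so the example runs anywhere — swap in lbzip2 for parallel BZIP2, exactly as pigz was swapped in above):

```shell
# Demo of tar's --use-compress-program hook. gzip is used here because it is
# always available; substitute lbzip2 (or pigz) for the parallel compressors.
mkdir -p demo && echo "hello" > demo/file.txt
tar cpf demo.tgz --use-compress-program=gzip demo/

# The equivalent backup command with lbzip2 would look like:
# sudo tar cpf backup/ubuntu1404.tar.bz2 --one-file-system --use-compress-program=lbzip2 /
```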