# cowsay
sudo apt-get install fortune-mod cowsay
fortune | cowsay
 ________________________________________
/ Don't let your mind wander -- it's too \
\ little to be let out alone.            /
 ----------------------------------------
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||

# starwars
telnet towel.blinkenlights.nl
Autoresize the Ubuntu guest screen in VirtualBox
Having to manually resize the screen of an Ubuntu guest under VirtualBox is frustrating. There is a simple solution:
sudo apt-get install virtualbox-guest-dkms virtualbox-guest-utils virtualbox-guest-x11
After restart, the screen resolution will adjust automatically to the size of the VirtualBox window.
Geotag your photos for free
Today, a slightly different topic. I was looking for a way of geotagging my photos. At first, I wanted to share the GPS position from my phone with my camera via Wi-Fi, but that turned out to be impossible…
Then, I considered buying a dedicated GPS dongle, but these are quite expensive and rather impractical: they work only with a given manufacturer, sometimes only with a given camera model, and you have to carry (and recharge) yet another gadget (sic!).
Recently I realised there is a much simpler solution. During my trips, I enable GPS tracking on my phone (e.g. Endomondo) and shoot photos as usual. After coming back home, I download the GPS track from the Endomondo website (More options –> Export –> .gpx file) and update the photos with geotags (e.g. GottenGeography or digiKam under Linux).
A portable version of the ETE toolkit
Randomisation of the paired-end read order in FastQ
I’ve been playing with Python, trying to randomise the order of paired-end (PE) reads in FastQ files. After a very unsuccessful afternoon (my Python implementation took 10 minutes to randomise 1M PE reads!), I decided to try BASH.
The BASH-based solution is simple and efficient (12 seconds for 1M PE reads):
paste <(zcat test.1.fq.gz) <(zcat test.2.fq.gz) | paste - - - - | shuf | awk -F'\t' '{OFS="\n"; print $1,$3,$5,$7 > "random.1.fq"; print $2,$4,$6,$8 > "random.2.fq";}'
If you are interested in a random subset of your FastQ file(s), e.g. 100K read pairs, you can specify it with shuf -n 100000.
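As a minimal, self-contained sketch of such subsampling (the toy FastQ files and output names below are made up for illustration):

```shell
#!/bin/bash
set -e
# Generate two tiny paired FastQ files (4 read pairs) just for the demo.
printf '@r%d/1\nACGT\n+\nIIII\n' 1 2 3 4 > test.1.fq
printf '@r%d/2\nTGCA\n+\nIIII\n' 1 2 3 4 > test.2.fq
gzip -f test.1.fq test.2.fq

# Interleave the pairs, flatten each record to one line, draw 2 random pairs,
# then split the mates back out (OFS="\n" restores the 4-line FastQ records).
paste <(zcat test.1.fq.gz) <(zcat test.2.fq.gz) | paste - - - - \
  | shuf -n 2 \
  | awk -F'\t' '{OFS="\n"; print $1,$3,$5,$7 > "sample.1.fq"; print $2,$4,$6,$8 > "sample.2.fq";}'
```

Pairing is preserved because both mates of a pair travel together on a single line through shuf.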
For large FastQ files it’s good to follow the progress of the randomisation. This can be done by plugging pv into the pipeline. Additionally, the output files can be gzipped on the fly, saving lots of disk I/O. Finally, reads can be sampled/randomised from more than one library (reads1_1/2 and reads2_1/2), as follows:
pv -cN zcat reads1_1.fastq.gz reads2_1.fastq.gz | zcat | paste - <(zcat reads1_2.fastq.gz reads2_2.fastq.gz) | paste - - - - | pv -cN shuf | shuf | pv -cN awk | awk -F'\t' '{OFS="\n"; print $1,$3,$5,$7 | "gzip > random_1.fq.gz"; print $2,$4,$6,$8 | "gzip > random_2.fq.gz";}'
Speeding up TAR.GZ compression with PIGZ
Most of you have probably noticed that TAR.GZ compression isn’t very fast. Recently, during a routine system backup, I realised that TAR.GZ is limited not by disk read/write speed but by GZIP compression (98% of the computation time):
time sudo tar cpfz backup/ubuntu1404.tgz --one-file-system /

real	6m20.999s
user	6m1.800s
sys	0m19.043s
GZIP in its standard implementation is bound to a single CPU core, while most modern computers can run 4-8 threads concurrently. But there are also multi-core implementations of GZIP, e.g. PIGZ. I decided to install PIGZ and plug it into TAR as follows:
sudo apt-get install lbzip2 pigz
time sudo tar cpf backup/ubuntu1404.pigz.tgz --one-file-system --use-compress-program=pigz /

real	1m43.693s
user	8m34.168s
sys	0m20.243s
As you can see, TAR.GZ using PIGZ on a 4-core i7-4770K (using 8 threads) is about 3.7 times faster than with standard GZIP (1m44s vs 6m21s of real time)! And you still get a standard TAR.GZ archive as output :)
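A quick way to convince yourself that the output really is a plain TAR.GZ is a toy sketch like this (the directory and archive names are made up; it falls back to standard gzip when pigz isn't installed, just so the sketch runs anywhere):

```shell
set -e
# Build a tiny archive via tar's --use-compress-program option,
# using pigz if available, otherwise plain gzip.
GZ=$(command -v pigz || command -v gzip)
mkdir -p demo && echo "hello" > demo/file.txt
tar cpf demo.tgz --use-compress-program="$GZ" demo
# Plain tar+gzip can list/extract it: it is a standard .tgz archive.
tar tzf demo.tgz
```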
The same applies to BZIP2 compression using LBZIP2.
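Analogously, a minimal sketch for LBZIP2 (names made up; it falls back to standard bzip2 when lbzip2 isn't installed):

```shell
set -e
# Parallel bzip2 compression via tar's --use-compress-program option.
BZ=$(command -v lbzip2 || command -v bzip2)
mkdir -p demo2 && echo "hello" > demo2/file.txt
tar cpf demo2.tbz2 --use-compress-program="$BZ" demo2
# The result is a regular tar.bz2, readable by plain tar+bzip2.
tar tjf demo2.tbz2
```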