On handy docker images

Motivated by successful stripping problematic dependencies from Redundans, I have decided to generate smaller Docker image, starting with Alpine Linux image (2Mb / 5Mb after downloading) instead of Ubuntu (49Mb / 122Mb). Previously, I couldn’t really rely on Alpine Linux, because it was impossible to make these problematic dependencies running… But now it’s whole new world of possibilities 😉

There are very few dependencies left, so I have started… (You can find all the commands below).

  1. First, I have check what can be installed from package manager.
    Only Python and Perl.

  2. Then I have checked if any of binaries are working.
    For example, GapCloser is provided as binary. It took me some time to find source code…
    Anyway, none of the binaries worked out of the box. It was expected, as Alpine Linux is super stripped…

  3. I have installed build-base in order to be able to build things.
    Additionally, BWA need zlib-dev.

  4. Alpine Linux doesn’t use standard glibc library, but musl-libc (you can read more about differences between the two), so some programmes (ie. BWA) may be quite reluctant to compile.
    After some hours of trying & thanks to the help of mp15, I have found a solution, not so complicated 🙂

  5. I have realised, that Dockerfile doesn’t like standard BASH brace expansion, that is working otherwise in Docker Alpine console…
    so ls *.{c,h} should be ls *.c *.h

  6. After that, LAST and GapCloser compilation were easy, relatively 😉

Below, you can find the code from Docker file (without RUN commands).

apk add --update --no-cache python perl bash wget build-base zlib-dev
mkdir -p /root/src && cd /root/src && wget http://downloads.sourceforge.net/project/bio-bwa/bwa-0.7.15.tar.bz2 && tar xpfj bwa-0.7.15.tar.bz2 && ln -s bwa-0.7.15 bwa && cd bwa && \
cp kthread.c kthread.c.org && echo "#include <stdint.h>" > kthread.c && cat kthread.c.org >> kthread.c && \
sed -ibak 's/u_int32_t/uint32_t/g' `grep -l u_int32_t *.c *.h` && make && cp bwa /bin/ && \
cd /root/src && wget http://liquidtelecom.dl.sourceforge.net/project/soapdenovo2/GapCloser/src/r6/GapCloser-src-v1.12-r6.tgz && tar xpfz GapCloser-src-v1.12-r6.tgz && ln -s v1.12-r6/ GapCloser && cd GapCloser && make && cp bin/GapCloser /bin/ && \
cd /root/src && wget http://last.cbrc.jp/last-744.zip && unzip last-744.zip && ln -s last-744 last && cd last && make && make install && \
cd /root/src && rm -r last* bwa* GapCloser* v* 

# SSPACE && redundans in /root/srt
cd /root/src && wget -q http://www.baseclear.com/base/download/41SSPACE-STANDARD-3.0_linux-x86_64.tar.gz && tar xpfz 41SSPACE-STANDARD-3.0_linux-x86_64.tar.gz && ln -s SSPACE-STANDARD-3.0_linux-x86_64 SSPACE && wget -O- -q http://cpansearch.perl.org/src/GBARR/perl5.005_03/lib/getopts.pl > SSPACE/dotlib/getopts.pl && \
wget --no-check-certificate -q -O redundans.tgz https://github.com/lpryszcz/redundans/archive/master.tar.gz && tar xpfz redundans.tgz && mv redundans-master redundans && ln -s /root/src/redundans /redundans && rm *gz

apk del wget build-base zlib-dev 
apk add libstdc++

After building & pushing, I have noticed that Alpine-based image is slightly smaller (99Mb), than the one based on Ubuntu (127Mb). Surprisingly, Alpine-based image is larger (273Mb) than Ubuntu-based (244Mb) after downloading. So, I’m afraid all of these hours didn’t really bring any substantial reduction in the image size.

Conclusion?
I was very motivated to build my application on Alpine Linux and expected substantial size reduction. But I’d say that relying on Alpine Linux image doesn’t always pay off in terms of smaller image size, forget about production time… And this I know from my own experience.
But maybe I didn’t something wrong? I’d be really glad for some advices/comments!

Nevertheless, stripping a few dependencies from my application (namely Biopython, numpy & scipy), resulted in much more compact image even using Ubuntu-based image (127Mb vs 191Mb; and 244Mb vs 440Mb after downloading). So I think this is the way to go 🙂

Using docker for application development

I found Docker super useful, but going through a manual is quite time consuming. Here, very stripped manual to create your first image and push it online 🙂

# install docker
wget -qO- https://get.docker.com/ | sh
 
# add your user to docker group
sudo usermod -aG docker $USER
 
# check if it's working
docker run docker/whalesay cowsay "hello world!"
 
# create an account on https://hub.docker.com
# and login
docker login -u $USER --email=EMAIL
 
# run image
docker run -it ubuntu
 
# make some changes ie. create user, install needed software etc
 
# finally open new terminal & commit changes (SESSIONID=HOSTNAME)
docker commit SESSIONID $USER/image:version
 
# mount local directory `pwd`/test as /test in read/write mode
docker run -it -v `pwd`/test:/test:rw $USER/image:version some command with arguments
 
# push image
docker push $USER/image:version

From now, you can get your image from any other machine connected to Internet by executing:

docker run -it $USER/image:version
# ie. redundans image
docker run -it -w /root/src/redundans lpryszcz/redundans:v0.11b ./redundans.py -v -i test/{600,5000}_{1,2}.fq.gz -f test/contigs.fa -o test/run1
 
# you can create alias latest, then version can be skipped on running
docker tag lpryszcz/redundans:v0.11b lpryszcz/redundans:latest
docker push lpryszcz/redundans:latest
 
docker run -it lpryszcz/redundans

You can add info about your repository at https://hub.docker.com/r/$USER/image/