Running Jupyter as a public service

Some time ago, I wrote about setting up IPython as a public service. Today, I’ll write about setting up Jupyter, the IPython descendant that, besides Python, supports tons of other languages and frameworks.

The Jupyter notebook will run under a separate user, so your personal files are safe, but it will not run as a system service. Therefore, you will need to restart it after every system reboot. I recommend running it in a SCREEN session, so you can easily log in to the server and check the Jupyter state.

  1. Install & set up Jupyter
    sudo apt-get install build-essential python-dev
    sudo pip install jupyter
    
    # create new user
    sudo adduser jupyter
     
    # login as new user
    su jupyter
    
    # make sure to add `unset XDG_RUNTIME_DIR` to ~/.bashrc
    # otherwise you'll encounter: OSError: [Errno 13] Permission denied: '/run/user/1003/jupyter'
    echo 'unset XDG_RUNTIME_DIR' >> ~/.bashrc
    source ~/.bashrc
    
    # generate ssl certificates
    mkdir ~/.ssl
    # 2048-bit key; 1024-bit RSA keys are considered too weak nowadays
    openssl req -x509 -nodes -days 999 -newkey rsa:2048 -keyout ~/.ssl/mykey.key -out ~/.ssl/mycert.pem
    
    # generate config
    jupyter notebook --generate-config
    
    # generate pass and checksum
    ipython -c "from IPython.lib import passwd; passwd()"
    # enter your password twice, save it and copy password hash
    ## Out[1]: 'sha1:[your hashed password here]'
     
    # add to ~/.jupyter/jupyter_notebook_config.py
    c.NotebookApp.ip = '*'
    c.NotebookApp.open_browser = False
    c.NotebookApp.port = 8881
    c.NotebookApp.password = u'sha1:[your hashed password here]'
    c.NotebookApp.certfile = u'/home/jupyter/.ssl/mycert.pem'
    c.NotebookApp.keyfile = u'/home/jupyter/.ssl/mykey.key'
    
    # create a directory for notebook files, e.g. ~/Public/jupyter
    mkdir -p ~/Public/jupyter && cd ~/Public/jupyter
     
    # start notebook server
    jupyter notebook
    
  2. Add kernels
     You can add multiple kernels to Jupyter. Here I’ll cover the installation of some:

    • Python
      sudo pip install ipykernel
      
      # if you wish to use matplotlib, make sure to add to 
      # ~/.ipython/profile_default/ipython_kernel_config.py
      c.InteractiveShellApp.matplotlib = 'inline'
      
    • BASH kernel
      sudo pip install bash_kernel
      sudo python -m bash_kernel.install
      
    • Perl
      This didn’t work for me :/

      sudo cpan Devel::IPerl

    • IRkernel
      Follow this tutorial.

    • Haskell
      sudo apt-get install cabal-install
      git clone http://www.github.com/gibiansky/IHaskell
      cd IHaskell
      ./ubuntu-install.sh
      

Then, just navigate to https://YOURDOMAIN.COM:8881/, accept the self-signed certificate and enjoy!
Alternatively, you can obtain a certificate from Let’s Encrypt.
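
The password-hash step above produces a string of the form sha1:SALT:DIGEST. Here is a minimal sketch of how such a string can be built and checked using only the standard library. It assumes the legacy salted-SHA1 layout (the digest of the password followed by the salt), and `make_hash` and `check_hash` are hypothetical names of mine, not part of Jupyter. For a real server, keep using the passwd() helper that ships with your installation.

```python
import hashlib
import hmac
import secrets


def make_hash(password, salt=None):
    """Return 'sha1:SALT:DIGEST' (assumed layout: sha1(password + salt))."""
    if salt is None:
        salt = secrets.token_hex(6)  # 12 random hex characters
    digest = hashlib.sha1(
        password.encode("utf-8") + salt.encode("ascii")
    ).hexdigest()
    return ":".join(("sha1", salt, digest))


def check_hash(hashed, password):
    """Check a password against a 'sha1:SALT:DIGEST' string."""
    try:
        algorithm, salt, digest = hashed.split(":", 2)
    except ValueError:
        return False  # not in the expected three-part form
    if algorithm != "sha1":
        return False
    expected = hashlib.sha1(
        password.encode("utf-8") + salt.encode("ascii")
    ).hexdigest()
    # constant-time comparison of the two hex digests
    return hmac.compare_digest(expected, digest)
```

The salt is why hashing the same password twice gives two different strings, both of which still verify.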

Using existing domain encryption aka Apache proxy
If your domain already uses HTTPS, you may consider running Jupyter on localhost and redirecting all incoming (already encrypted) traffic to a particular port on localhost (as suggested by @shebang).

# enable Apache mods
sudo a2enmod proxy proxy_http proxy_wstunnel && sudo service apache2 restart

# add to your Apache config
    <Location "/jupyter" >
        ProxyPass http://localhost:8881/jupyter
        ProxyPassReverse http://localhost:8881/jupyter
    </Location>
    <Location "/jupyter/api/kernels/" >
        ProxyPass        ws://localhost:8881/jupyter/api/kernels/
        ProxyPassReverse ws://localhost:8881/jupyter/api/kernels/
    </Location>

# update your Jupyter config (~/.jupyter/jupyter_notebook_config.py)
c.NotebookApp.ip = 'localhost'
c.NotebookApp.open_browser = False
c.NotebookApp.port = 8881
c.NotebookApp.base_url = '/jupyter'
c.NotebookApp.password = u'sha1:[your hashed password here]'
c.NotebookApp.allow_origin = '*'

Note that it’s crucial to add the Apache proxy for kernels (/jupyter/api/kernels/). Otherwise you won’t be able to use terminals, and the browser will report: failed: Error during WebSocket handshake: Unexpected response code: 400.

On handy docker images

Motivated by successfully stripping problematic dependencies from Redundans, I decided to build a smaller Docker image, starting from the Alpine Linux image (2 MB / 5 MB after downloading) instead of Ubuntu (49 MB / 122 MB). Previously, I couldn’t really rely on Alpine Linux, because it was impossible to get these problematic dependencies running… But now it’s a whole new world of possibilities 😉

There are very few dependencies left, so I have started… (You can find all the commands below).

  1. First, I checked what could be installed from the package manager.
    Only Python and Perl.

  2. Then I checked whether any of the binaries worked.
    For example, GapCloser is provided as a binary. It took me some time to find the source code…
    Anyway, none of the binaries worked out of the box. This was expected, as Alpine Linux is super stripped-down…

  3. I installed build-base in order to be able to build things.
    Additionally, BWA needs zlib-dev.

  4. Alpine Linux doesn’t use the standard glibc library, but musl libc (you can read more about the differences between the two), so some programmes (e.g. BWA) may be quite reluctant to compile.
    After some hours of trying & thanks to the help of mp15, I found a solution that was not so complicated 🙂

  5. I realised that the Dockerfile doesn’t like standard BASH brace expansion, which otherwise works in the Docker Alpine console…
    so ls *.{c,h} should be ls *.c *.h

  6. After that, compiling LAST and GapCloser was relatively easy 😉
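
The brace-expansion pitfall from step 5 comes down to the shell. bash rewrites `*.{c,h}` into `*.c *.h` before running the command, while the plain `/bin/sh` that RUN uses by default leaves the braces untouched. Here is a minimal Python sketch of that rewriting. `expand_braces` is a hypothetical helper of mine, not part of Docker or bash, and it handles only non-nested braces.

```python
import re

# one {...} group that contains at least one comma and no nested braces
_BRACES = re.compile(r"\{([^{}]*,[^{}]*)\}")


def expand_braces(pattern):
    """Expand non-nested {a,b,...} groups the way bash would, left to right."""
    match = _BRACES.search(pattern)
    if match is None:
        return [pattern]  # nothing to expand
    head, tail = pattern[:match.start()], pattern[match.end():]
    expanded = []
    for alternative in match.group(1).split(","):
        # the tail may contain further groups, so recurse on each variant
        expanded.extend(expand_braces(head + alternative + tail))
    return expanded


print(" ".join(expand_braces("*.{c,h}")))
```

In a Dockerfile, the simplest fix is still to write the expanded form by hand, as in step 5.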

Below, you can find the code from the Dockerfile (without the RUN commands).

apk add --update --no-cache python perl bash wget build-base zlib-dev
mkdir -p /root/src && cd /root/src && wget http://downloads.sourceforge.net/project/bio-bwa/bwa-0.7.15.tar.bz2 && tar xpfj bwa-0.7.15.tar.bz2 && ln -s bwa-0.7.15 bwa && cd bwa && \
cp kthread.c kthread.c.org && echo "#include <stdint.h>" > kthread.c && cat kthread.c.org >> kthread.c && \
sed -ibak 's/u_int32_t/uint32_t/g' `grep -l u_int32_t *.c *.h` && make && cp bwa /bin/ && \
cd /root/src && wget http://liquidtelecom.dl.sourceforge.net/project/soapdenovo2/GapCloser/src/r6/GapCloser-src-v1.12-r6.tgz && tar xpfz GapCloser-src-v1.12-r6.tgz && ln -s v1.12-r6/ GapCloser && cd GapCloser && make && cp bin/GapCloser /bin/ && \
cd /root/src && wget http://last.cbrc.jp/last-744.zip && unzip last-744.zip && ln -s last-744 last && cd last && make && make install && \
cd /root/src && rm -r last* bwa* GapCloser* v* 

# SSPACE && redundans in /root/src
cd /root/src && wget -q http://www.baseclear.com/base/download/41SSPACE-STANDARD-3.0_linux-x86_64.tar.gz && tar xpfz 41SSPACE-STANDARD-3.0_linux-x86_64.tar.gz && ln -s SSPACE-STANDARD-3.0_linux-x86_64 SSPACE && wget -O- -q http://cpansearch.perl.org/src/GBARR/perl5.005_03/lib/getopts.pl > SSPACE/dotlib/getopts.pl && \
wget --no-check-certificate -q -O redundans.tgz https://github.com/lpryszcz/redundans/archive/master.tar.gz && tar xpfz redundans.tgz && mv redundans-master redundans && ln -s /root/src/redundans /redundans && rm *gz

apk del wget build-base zlib-dev 
apk add libstdc++

After building & pushing, I noticed that the Alpine-based image is slightly smaller (99 MB) than the Ubuntu-based one (127 MB). Surprisingly, the Alpine-based image is larger (273 MB) than the Ubuntu-based one (244 MB) after downloading. So, I’m afraid all of these hours didn’t really bring any substantial reduction in image size.

Conclusion?
I was very motivated to build my application on Alpine Linux and expected a substantial size reduction. But I’d say that relying on the Alpine Linux image doesn’t always pay off in terms of smaller image size, not to mention the time needed to get everything building… And this I know from my own experience.
But maybe I did something wrong? I’d be really glad for any advice/comments!

Nevertheless, stripping a few dependencies from my application (namely Biopython, numpy & scipy) resulted in a much more compact image, even when using the Ubuntu-based image (127 MB vs 191 MB; and 244 MB vs 440 MB after downloading). So I think this is the way to go 🙂
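
To put the sizes quoted above side by side, here is a small sketch that turns each pair into a relative change. The numbers are the ones from this post; the `reduction` helper and the labels are just my naming.

```python
def reduction(before_mb, after_mb):
    """Percentage by which after_mb is smaller than before_mb (negative = larger)."""
    return 100.0 * (before_mb - after_mb) / before_mb


# (size before, size after) in MB, as reported above
comparisons = {
    "Ubuntu -> Alpine, pushed": (127, 99),
    "Ubuntu -> Alpine, downloaded": (244, 273),
    "Ubuntu, dependencies stripped, pushed": (191, 127),
    "Ubuntu, dependencies stripped, downloaded": (440, 244),
}

for label, (before, after) in comparisons.items():
    print("{}: {:+.1f}%".format(label, reduction(before, after)))
```

Switching the base image saves roughly 22% when pushed but costs about 12% after downloading. Stripping the dependencies saves roughly 34% and 45% respectively.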

On simplifying dependencies

Lately, to make Redundans more user-friendly, I have simplified its dependencies by replacing Biopython, numpy, scipy and SQLite with some (relatively) simple functions or modules.

Here, I will just focus on replacing Biopython, particularly SeqIO.index_db, with FastaIndex. You may ask yourself why I have invested time in reinventing the wheel. I’m a big fan of Biopython, yet it’s a huge project and some solutions are not optimal or require problematic dependencies. This is the case with SeqIO.index_db, which relies on SQLite3. Here again, I’m a big fan of SQLite, yet building Biopython with SQLite enabled proved not to be very straightforward on non-standard systems or for less experienced users. Besides, on some NFS setups, the SQLite3 db cannot be created at all.

Ok, let’s start from the basics. SeqIO.index_db allows random access to sequence files, so for example you can rapidly retrieve any entry from a very large file. This is achieved by storing the ID and position of each entry from a particular file in a database (SQLite3 db). Then, if you want to retrieve a particular record, SeqIO.index_db checks whether this record is present in the SQLite3 db, retrieves the record’s position in the file and reads only a small chunk of this file, instead of parsing the entire file every time you want to get some record(s).
A similar feature is offered by samtools faidx, but in this case the coordinates of each entry are stored in a tab-delimited .fai file (more info about .fai). This format can be easily read & written by any programme, so I decided to use it. In addition, I realised that samtools faidx is flexible enough that you can add additional columns to the .fai without disrupting its functionality, but more about that later…
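
To make the random-access idea concrete, here is a minimal sketch of a .fai-style index in plain Python. It is not the FastaIndex code. It only assumes the five standard .fai columns (name, length, byte offset of the first base, bases per line, bytes per line) and a well-formed FastA file; `build_fai` and `fetch` are hypothetical names.

```python
def build_fai(fasta_path):
    """Return one (name, length, offset, linebases, linewidth) tuple per record."""
    rows = []
    name, length, offset, linebases, linewidth = None, 0, 0, 0, 0
    pos = 0  # byte position of the start of the current line
    with open(fasta_path, "rb") as handle:
        for line in handle:
            if line.startswith(b">"):
                if name is not None:
                    rows.append((name, length, offset, linebases, linewidth))
                name = line[1:].split()[0].decode()
                length, offset = 0, pos + len(line)  # sequence starts on next line
                linebases = linewidth = 0
            elif name is not None:
                bases = len(line.rstrip(b"\r\n"))
                if linebases == 0:  # first sequence line defines the layout
                    linebases, linewidth = bases, len(line)
                length += bases
            pos += len(line)
    if name is not None:
        rows.append((name, length, offset, linebases, linewidth))
    return rows


def fetch(fasta_path, row, start, end):
    """Return bases start..end (1-based, inclusive) by seeking, not parsing."""
    _, length, offset, linebases, linewidth = row
    start, end = max(1, start), min(length, end)
    if start > end:
        return ""
    first = offset + (start - 1) // linebases * linewidth + (start - 1) % linebases
    last = offset + (end - 1) // linebases * linewidth + (end - 1) % linebases
    with open(fasta_path, "rb") as handle:
        handle.seek(first)
        chunk = handle.read(last - first + 1)
    # drop the line breaks that fall inside the requested window
    return chunk.replace(b"\n", b"").replace(b"\r", b"").decode()
```

Writing each tuple as one tab-separated line gives a file with the same columns as samtools faidx output. The seek arithmetic relies on every line of a record, except the last, having the same length.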

In Redundans, I’ve been using SeqIO.index_db during assembly reduction (fasta2homozygous.py). Besides storing the index, I’ve also been generating statistics for every FastA file, like the number of contigs, cumulative size, N50, N90, GC and so on. I realised that these two can be easily combined by extending the .fai with four additional columns, storing the number of occurrences of A, C, G & T in every sequence. Such a .fai is compatible with samtools faidx and provides a very easy way of calculating a bunch of statistics about the file.
I’ve implemented all of this in FastaIndex. Besides being a dependency-free & very handy indexer, it can also be used as an alternative to samtools faidx to retrieve sequences from large FastA files.

# retrieve bases from 20 to 60 from NODE_2
./FastaIndex.py -i test/run1/contigs.fa -r NODE_2_length_7674_cov_46.7841_ID_3:20-60
>NODE_2_length_7674_cov_46.7841_ID_3
CATAGAACGACTGGTATAAGCCAAACATGACCCATTGTTGC
#Time elapsed: 0:00:00.014243

samtools faidx test/run1/contigs.fa NODE_2_length_7674_cov_46.7841_ID_3:20-60
>NODE_2_length_7674_cov_46.7841_ID_3:20-60
CATAGAACGACTGGTATAAGCCAAACATGACCCATTGTTGC

Using docker for application development

I find Docker super useful, but going through the manual is quite time-consuming. Here is a very stripped-down manual for creating your first image and pushing it online 🙂

# install docker
wget -qO- https://get.docker.com/ | sh
 
# add your user to docker group
sudo usermod -aG docker $USER
 
# check if it's working
docker run docker/whalesay cowsay "hello world!"
 
# create an account on https://hub.docker.com
# and log in (recent Docker releases no longer accept the --email flag)
docker login -u $USER
 
# run image
docker run -it ubuntu
 
# make some changes, e.g. create a user, install the needed software etc.
 
# finally open new terminal & commit changes (SESSIONID=HOSTNAME)
docker commit SESSIONID $USER/image:version
 
# mount local directory `pwd`/test as /test in read/write mode
docker run -it -v `pwd`/test:/test:rw $USER/image:version some command with arguments
 
# push image
docker push $USER/image:version

From now on, you can get your image on any other machine connected to the Internet by executing:

docker run -it $USER/image:version
# e.g. the redundans image
docker run -it -w /root/src/redundans lpryszcz/redundans:v0.11b ./redundans.py -v -i test/{600,5000}_{1,2}.fq.gz -f test/contigs.fa -o test/run1
 
# you can create the alias `latest`; then the version can be skipped when running
docker tag lpryszcz/redundans:v0.11b lpryszcz/redundans:latest
docker push lpryszcz/redundans:latest
 
docker run -it lpryszcz/redundans

You can add info about your repository at https://hub.docker.com/r/$USER/image/
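
The `latest` alias works because an image reference without a tag is treated as if it ended in `:latest`. Here is a small sketch of that convention. `split_image_ref` is a hypothetical helper, not a Docker API, and it ignores digest references (`image@sha256:...`).

```python
def split_image_ref(ref):
    """Split 'user/image:version' into (repository, tag); tag defaults to 'latest'."""
    last_slash = ref.rfind("/")
    colon = ref.rfind(":")
    # only a colon after the last '/' separates a tag; an earlier one
    # belongs to a registry address such as localhost:5000
    if colon > last_slash:
        return ref[:colon], ref[colon + 1:]
    return ref, "latest"


for ref in ("lpryszcz/redundans:v0.11b", "lpryszcz/redundans", "ubuntu"):
    print(ref, "->", split_image_ref(ref))
```

This is also why `docker run -it lpryszcz/redundans` only works after the `latest` tag has been pushed.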

Conflicting config for htop on machines sharing same /home directory

My friend spotted a problem with the htop configuration. Simply, when htop was executed on two different Ubuntu releases (10.04 and 14.04), the config was reset.
After some investigation, we found that 10.04 stores the htop config in ~/.htoprc, while 14.04 stores it in ~/.config/htop/htoprc. It was enough to remove one of them and link it to the other, as below:

rm .htoprc
ln -s .config/htop/htoprc .htoprc
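
If you need to apply the same fix to several accounts, the two commands above can be wrapped in a few lines of Python. This is a sketch under the same assumption as the shell version, namely that ~/.config/htop/htoprc is the copy to keep. `link_legacy_htoprc` is a hypothetical name.

```python
import os


def link_legacy_htoprc(home):
    """Replace HOME/.htoprc with a relative symlink to HOME/.config/htop/htoprc."""
    legacy = os.path.join(home, ".htoprc")
    modern = os.path.join(home, ".config", "htop", "htoprc")
    if not os.path.exists(modern):
        raise FileNotFoundError(modern)  # nothing to link to
    if os.path.lexists(legacy):
        os.remove(legacy)  # same as `rm .htoprc`
    # a relative target keeps working if /home is mounted elsewhere
    os.symlink(os.path.join(".config", "htop", "htoprc"), legacy)
    return legacy
```

Calling `link_legacy_htoprc(os.path.expanduser("~"))` mirrors running the two shell commands from your home directory.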

Connecting to MySQL without passwd prompt

If you are (like me) annoyed by providing a password at every mysql login, you can skip it. It also makes programmatic access to any MySQL db easier, as no password prompt is necessary 🙂
Create `~/.my.cnf` file:

[client]
user=username
password="pass"
 
[mysql]
user=username
password="pass"

And login without `-p` parameter:

mysql -h host -u username dbname

If you want to use `~/.my.cnf` file in MySQLdb, just connect using this:

import MySQLdb
host, port = "localhost", 3306  # placeholders; adjust to your server
cnx = MySQLdb.connect(host=host, port=port, read_default_file="~/.my.cnf")
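
Since `~/.my.cnf` holds a plain-text password, it is worth making the file readable only by its owner. Here is a minimal sketch that writes such a file with mode 0600 and reads the [client] section back with the standard library. `write_my_cnf` and `read_client_options` are hypothetical helpers, and `configparser` copes only with simple key=value option files like the one above.

```python
import configparser
import os


def write_my_cnf(path, user, password):
    """Write a minimal [client] option file that only the owner can read."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write('[client]\nuser={}\npassword="{}"\n'.format(user, password))
    os.chmod(path, 0o600)  # also tighten a file that already existed


def read_client_options(path):
    """Return the [client] options as a dict, without surrounding quotes."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(os.path.expanduser(path))  # a missing file is silently skipped
    if not parser.has_section("client"):
        return {}
    return {key: value.strip('"') for key, value in parser.items("client")}
```

On an existing file, `chmod 600 ~/.my.cnf` from the shell does the same.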

Easy citation in LibreOffice / OpenOffice with Mendeley

Creating a reference list is always a nightmare. Mendeley and its handy LibreOffice / OpenOffice plugin may be of great help to many. It was for me. Below, I’ll describe how to get it working.

# get & install mendeley from https://www.mendeley.com/download-mendeley-desktop/

# check version of your mendeley
#  Help > About Mendeley Desktop

# clone repo and build plugin
git clone git@github.com:Mendeley/openoffice-plugin.git
cd openoffice-plugin/
python build.py 1.15.2 false

# add to LibreOffice
#  Tools > Extension Manager > Add...
#   and look for `Mendeley-1.15.2.oxt`

After restarting OpenOffice / LibreOffice, you should see a new toolbar. Note that Mendeley has to be running for the plugin to work.

What’s great about this plugin is that you can adjust the citation style in just a few clicks via `Choose Citation Style`. There is quite an extensive database of predefined citation styles, so adjusting the reference style to your favourite journal will take just a few seconds 🙂
More info about the plugin on GitHub.

Download youtube videos

A more weekend-oriented tip today ;)
I have found downloading videos from YouTube super easy with youtube-dl.

# install it
sudo apt-get install youtube-dl

# you can fetch many videos at once
youtube-dl LINK1 [LINK2 LINK3]

This tool provides many useful options, like changing the audio/video compression format and quality. You can read more here.