Convert xls table into abstract book PDF

I had to generate Abstract book for #NGSchool2016 (). I had spreadsheet generated by Google Forms with all necessary information. I could copy-paste all entries and format it later on, but I found LaTeX more robust for the task.
As I had already LaTeX template, the only missing part was conversion of .xls to .tex. Thus I have written simple script,, that generate .tex file based on table from .xls file.
This script, among many other things, convert utf into LaTeX escape characters. depends on xlrd and utf8tolatex (from pylatexenc/latexencode, but this is given as single file)

# install dependencies
sudo apt-get install python-xlrd
# generate tex

# generate pdf

Output pdf.

Github push fails due to large files

Lately, I have had lots of problems with pushing large files to github. I am maintaining compilation of materials and software deposited by other people, so cannot control the size of files… and this makes push to fail often.

git push
remote: error: GH001: Large files detected. You may want to try Git Large File Storage -
remote: error: Trace: 6f0f7f66995a394598595375954732db
remote: error: See for more information.
remote: error: File chip_seq/reads/sox2_chip.fastq.gz is 109.69 MB; this exceeds GitHub's file size limit of 100.00 MB

To remove large files from commit, execute

git filter-branch -f --index-filter 'git rm --cached --ignore-unmatch chip_seq/reads/sox2_chip.fastq.gz'
git push

To add large files using git-lfs, execute

# tract by git lfs files larger than 50MB, skipping those in .git folder
find . -type f -size +50M ! -iwholename "*.git*" | rev | cut -f1 -d'/' | rev | xargs git lfs track
git add --all . && git commit -m "final" && git push origin

Make sure that your file are smaller than 2GB, otherwise your push will fail again 😉

Then, to before pull in another machine, make sure to install git-lfs

git lfs install
git pull

Stream audio & video from webcam using VLC

Yesterday, I’ve posted about streaming webcam image to www using motion. This solution, although very simple, has many limitations, lack of sound, usage of high bandwidth and low image quality, just to mention a few. In a way, motion stream is just a set of jpeg files.
In order to solve all of these, I have spend quite some time playing with VLC, an open source cross-platform multimedia player, that is able to transcode and stream audio & video.
Streaming can be started from graphical interface, just go to:

Media >> Stream… >> Capture Device, select your devices, Add HTML destination (ie. :8080/webcam.ogg), select Video-Theora + Vorbis (OGG) profile & press Stream.

You stream will be available at: http://localhost:8081/webcam.ogg

But normally, using command line is preferred under Linux:

vlc v4l2:// :input-slave=alsa:// :v4l2-standard=1 :v4l2-dev=/dev/video0 :v4l2-width=1280 :v4l2-height=720 :sout="#transcode{vcodec=theo,vb=2000,acodec=vorb,ab=128,channels=2,samplerate=44100}:http{dst=:8081/webcam.ogg}" -I dummy

Initially, I had problem with streaming sound along with video. Adding, `:input-slave=alsa:// :v4l2-standard=1` solved this. You can try another values for `:v4l2-standard` ie. 0, 1 or 2, depending which microphone you want to use.

Above command will stream HD video (1280×720) in .ogg format (natively suported by most browsers) @ ~2Mbps (2000kbps). If you have slower connection, you can change `vb=2000` to `vb=1000` (~1Mbps) and play with lower resolutions. You can check available resolutions of your camera by:

lsusb -v | egrep -B10 'Width|Height'

This stream, however, is available to everyone. To limit it only to localhost, you can use iptables:

sudo iptables -A INPUT -p tcp -s localhost --dport 8081 -j ACCEPT && sudo iptables -A INPUT -p tcp --dport 8081 -j DROP && vlc v4l2:// :input-slave=alsa:// :v4l2-standard=1 :v4l2-dev=/dev/video0 :v4l2-width=1280 :v4l2-height=720 :sout="#transcode{vcodec=theo,vb=2000,acodec=vorb,ab=128,channels=2,samplerate=44100}:http{dst=:8081/webcam.ogg}" -I dummy

Now, you can create apache2 proxy, similarly to previous post:

# install apache2-utils
sudo apt install apache2-utils
# setup new user & passwd
sudo htpasswd -c /etc/apache2/.htpasswd webcam

# configure apache2 - add to your VirtualHost config
    # webcam
    <Location "/webcam.ogg">
        ProxyPass http://localhost:8081/webcam.ogg
        ProxyPassReverse http://localhost:8081/webcam.ogg
        # htpasswd
        AuthType Basic
        AuthName "Restricted Content"
        AuthUserFile /etc/apache2/.htpasswd
        Require valid-user

Streaming image from webcam through www

Willing to stream image from your webcam through Internet? Nothing easier with Ubuntu!

# install
sudo apt-get install motion

# create config file
mkdir ~/.motion && gedit ~/.motion/motion.conf

# define the port and motion settings
webcam_port 8081
webcam_localhost on
# increase maxrate & quality
webcam_maxrate 30
webcam_quality 90
# slow down the stream to 1 frame per second if no motion
webcam_motion on

# run motion

You can find the stream at http://localhost:8081/.

If you wish to stream it publicly, I recommend at least basic HTTP based authentication.

# install apache2-utils
sudo apt install apache2-utils

# setup new user & passwd
sudo htpasswd -c /etc/apache2/.htpasswd webcam

# configure apache2 - add to your VirtualHost config
    # webcam
    <Location "/cam">
        # proxy
        ProxyPass http://localhost:8081/
        ProxyPassReverse http://localhost:8081/        
        # htpasswd
        AuthType Basic
        AuthName "Restricted Content"
        AuthUserFile /etc/apache2/.htpasswd
        Require valid-user

Now, image from your webcam will be accessible at http://YOURDOMAIN.COM/cam

Finally, you can configure motion to run only when you are away.

Inspired by gist.

Malformed column reporting and joining in BASH by paste or awk

I’ve spent some hours trying to figure out, why the heck my scripts using awk and paste are returning malformed output. Simply, lines were wrongly pasted together, some columns were missing, while some were malformed… and in case of awk, trying to print columns in unsorted order (ie. column #3 before column #2 awk '{print $3,$2}') was producing malformed output.
After some time, I have realised it was due to windows-like new line escape \r\n, instead of standard Linux-like \n (of course I got this file from third party using Windows…).

Below, you can find more details.

# first, let's create dummy files containing 4 lines and 5 columns, each line ending with \r\n
python -c "with open('wrong.tsv','w') as out: out.write(''.join('line%s\t%s\r\n'%(i, '\t'.join('column%s'%j for j in range(1,5))) for i in range(1,4)))"
# and ending just with \n
python -c "with open('correct.tsv','w') as out: out.write(''.join('line%s\t%s\n'%(i, '\t'.join('column%s'%j for j in range(1,5))) for i in range(1,4)))"

# now let's paste wrong and correct files
paste wrong.tsv wrong.tsv
line1	line1n1	column1	column2	column3	column4
line2	line2n1	column1	column2	column3	column4
line3	line3n1	column1	column2	column3	column4

paste correct.tsv correct.tsv
line1	column1	column2	column3	column4	line1	column1	column2	column3	column4
line2	column1	column2	column3	column4	line2	column1	column2	column3	column4
line3	column1	column2	column3	column4	line3	column1	column2	column3	column4

# can you see the difference?

Simply, \r is interpreted as return to the beginning of the line in Unix, thus pasting lines containing such character will fail.
In order to convert files containing \r\n into Unix style \n, simply execute:

# replaces file and creates backup: inputfile.bak
sed -i.bak 's/\r$//' inputfile

# creates outputfile with correct formatting
tr -d '\r' < inputfile > outputfile

You can read more on new-line escape characters at Wikipedia.

Running Jupyter as public service

Some time ago, I’ve written about setting up IPython as a public service. Today, I’ll write about setting up Jupyter, IPython descendant, that beside Python supports tons of other languages and frameworks.

Jupyter notebook will be running in separate user, so your personal files are safe, but not as system service. Therefore, you will need to restart it upon system reboot. I recommend running it in SCREEN session, so you can easily login into the server and check the Jupyter state.

  1. Install & setup Jupyter
  2. #
    sudo apt-get install build-essential python-dev
    sudo pip install jupyter
    # create new user
    sudo adduser jupyter
    # login as new user
    su jupyter
    # make sure to add `unset XDG_RUNTIME_DIR` to ~/.bashrc
    # otherwise you'll encounter: OSError: [Errno 13] Permission denied: '/run/user/1003/jupyter'
    echo 'unset XDG_RUNTIME_DIR' >> ~/.bashrc
    source ~/.bashrc
    # generate ssl certificates
    mkdir ~/.ssl
    openssl req -x509 -nodes -days 999 -newkey rsa:1024 -keyout ~/.ssl/mykey.key -out ~/.ssl/mycert.pem
    # generate config
    jupyter notebook --generate-config
    # generate pass and checksum
    ipython -c "from IPython.lib import passwd; passwd()"
    # enter your password twice, save it and copy password hash
    ## Out[1]: 'sha1:[your hashed password here]'
    # add to ~/.jupyter/
    c.NotebookApp.ip = '*'
    c.NotebookApp.open_browser = False
    c.NotebookApp.port = 8881
    c.NotebookApp.password = u'sha1:[your hashed password here]'
    c.NotebookApp.certfile = u'/home/jupyter/.ssl/mycert.pem'
    c.NotebookApp.keyfile = u'/home/jupyter/.ssl/mykey.key'
    # create some directory for notebook files ie. ~/Public/jupyter
    mkdir -p ~/Public/jupyter && cd ~/Public/jupyter
    # start notebook server
    jupyter notebook
  3. Add kernels
  4. You can add multiple kernels to Jupyter. Here I’ll cover installation of some:

    • Python
    • sudo pip install ipykernel
      # if you wish to use matplotlib, make sure to add to 
      # ~/.ipython/profile_default/
      c.InteractiveShellApp.matplotlib = 'inline'
    • BASH kernel
    • sudo pip install bash_kernel
      sudo python -m bash_kernel.install
    • Perl
    • This didn’t worked for me:/

      sudo cpan Devel::IPerl
    • IRkernel
    • Follow this tutorial.

    • Haskell
    • sudo apt-get install cabal-install
      git clone
      cd IHaskell

Then, just navigate to https://YOURDOMAIN.COM:8881/, accept self-signed certificate and enjoy!
Alternatively, you can obtain certificate from Let’s encrypt.

Using existing domain encryption aka Apache proxy
If your domain is already HTTPS, you may consider setting up Jupyter on localhost and redirect all incoming traffic (already encrypted) to particular port on localhost (as suggested by @shebang).

# enable Apache mods
sudo a2enmod proxy proxy_http proxy_wstunnel && sudo service apache2 restart

# add to your Apache config
    <Location "/jupyter" >
        ProxyPass http://localhost:8881/jupyter
        ProxyPassReverse http://localhost:8881/jupyter
    <Location "/jupyter/api/kernels/" >
        ProxyPass        ws://localhost:8881/jupyter/api/kernels/
        ProxyPassReverse ws://localhost:8881/jupyter/api/kernels/
    <Location "/jupyter/api/kernels/">
        ProxyPass        ws://localhost:8881/jupyter/api/kernels/
        ProxyPassReverse ws://localhost:8881/jupyter/api/kernels/

# update you Jupyter config (~/.jupyter/
c.NotebookApp.ip = 'localhost'
c.NotebookApp.open_browser = False
c.NotebookApp.port = 8881
c.NotebookApp.base_url = '/jupyter'
c.NotebookApp.password = u'sha1:[your hashed password here]'
c.NotebookApp.allow_origin = '*'

Note, it’s crucial to add Apache proxy for kernels (/jupyter/api/kernels/), otherwise you won’t be able to use terminals due to failed: Error during WebSocket handshake: Unexpected response code: 400 error.

Enable HTTPS for your domains in 5 minutes & for free!

For a while, I’ve been thinking about encryption domains, like this one. But cost & complications associated with enabling SSL encryption prohibited me to do so…
Today, I’ve realised, Let’s encrypt, new certificate authority, that is completely free, automated and open, makes SSL encryption super easy!
Try it yourself (this if for Ubuntu 14.04 & Apache, for another system configuration check

sudo apt-get install git

sudo git clone /opt/letsencrypt
cd /opt/letsencrypt
sudo ./letsencrypt-auto --apache -d DOMAIN1 -d DOMAIN2

# setup weekly cron autorenewal on Monday at 2:30
sudo crontab -e
# and paste `30 2 * * 1 /opt/letsencrypt/letsencrypt-auto renew >> /var/log/le-renew.log`

If you wish to redirect all traffic domain through HTTPS, do following:

# enable mod_rewrite engine in apache2
sudo a2enmod rewrite

# add to your apache conf file
    # redirect to HTTPS
    RewriteEngine on
    RewriteCond %{HTTPS} off [OR]
    RewriteCond %{HTTP_HOST} ^YOUR_DOMAIN\.COM*
    RewriteRule ^(.*)$ https://YOUR_DOMAIN.COM/$1 [L,R=301]

# reload apache2 configuration
sudo service apache2 reload


Inspired by digitalocean.
Thanks to @sheebang for underlining the importance of renewing the certificates!

Working with large binary files in git

Git is great, there is no doubt about that. Being able to revert any changes and recover lost data is simply priceless. But recently, I have started to be concerned about the size of some of my repositories. Some, especially those containing changing binary files, were really large!!!
You can check the size of your repository by simple command:

git count-objects -vH

Here, git Large File Storage (LSF) comes into action. Below, I’ll describe how to install and mark large binary files, so they are not uploaded as a whole, but only relevant chunks of changed binary file is uploaded.

  1. Installation of git-lfs
  2. # add packagecloud repo
    curl -s | sudo bash
    # install git-lsf
    sudo apt-get install git-lfs 
    # end enable it
    git lfs install
  3. Marking and commiting binary file
  4. # mark large binary file
    git lfs track some.file
    # add, commit & push changes
    git add some.file
    git commit -m "some.file as LSF"
    git push origin master