Investigate & reduce the size of Drupal sqlite3 database

Today while performing regular Drupal update and backup, I’ve realised Drupal sqlite3 database sites/default/files/.ht.sqliteis over 440 Mb! I found it peculiar, as our website isn’t storing that much information and the size grew significantly since last time I’ve looked it up couple of months ago. I’ve decided to investigate what’s eating up so much DB space.

Investigate what’s eating up space within your sqlite3 db

There is super useful program called sqlite3_analyzer. This program analyses your database file and reports what’s actually taking your disk space. You can download it from here (download precompiled sqlite3-tools). Note, under Linux you’ll likely need to install 32bit-libraries ie. under Ubuntu/Debian execute

sudo apt install libc6-i386 lib32stdc++6 lib32gcc1 lib32ncurses5 lib32z1  

Once you have the program, simply execute sqlite3_analyzer DB_NAME | less and the program will produce detailed report about your DB space consumption. For me it looked like that:

Can you spot how much space the actual data is taking? Yes, only 4.7% (20k pages). And what’s taking most of the space? Freelist.

Quick googling taught me, that freelist is simply empty space left after deletes or data moving. You may ask, why isn’t it cleaned up later? You see, having entire database with all tables in one file is very handy, but troublesome. Every time given table is edited, the space that is freed isn’t used, but rather marked as freelist. And those regions get cleaned up only when vacuumcommand is issued. This should happen automatically from time-to-time if auto vacuum is enabled. I couldn’t know why isn’t it working by default with Drupal…

Reduce the size of sqlite3 DB file

Nevertheless, I’ve decided to perform vacuummanually. Of course I’ve backed-up the db, just in case (you should always do that!). But sqlite3 .ht.sqlite vacuum returned Error: no such collation sequence: NOCASE_UTF8. At this point, I though maybe simple DB dump and recovery would solve my problem – after all that’s more or less what happens under the hood when you perform vacuum.

sqlite3 .ht.sqlite.bck .dump > db.sql
sqlite3 .ht.sqlite < db.sql

DB recovered after dump was indeed smaller (16 Mb), but it was missing some tables (sqlite3 .ht.sqlite .tables). Interestingly, when I’ve investigated the schema of the missing tables (sqlite3 .ht.sqlite.bck .schema block_content), I’ve realised that all of those contain NOCASE_UTF8 in table schema. I found that really peculiar! After further googling and rather lengthy reading, I’ve realised NOCASE_UTF8 is invalid in sqlite3, but it can be replaced simply with NOCASE.

Replace DB schema directly on sqlite3 db

In the brave (and firstly stupid I though) attempt, I’ve decided just to replace wrong statements directly on the DB file using sed (sed 's/NOCASE_UTF8/NOCASE/g' .ht.sqlite.bck > .ht.sqlite). As expected, the database file got corrupted. This is because all tables location are stored internally in the same file, so truncating some text from the DB file isn’t the wisest idea as I’ve expected. Then, I’ve decided to replace NOCASE_UTF8, but keeping the same size of the statement after replacement using white spaces. To my surprise it worked & allowed me to reduce the size of DB from 440 to 30 Mb 🙂

sed 's/NOCASE_UTF8/NOCASE     /g' .ht.sqlite.bck > .ht.sqlite
sqlite3 .ht.sqlite vacuum
-rw-rw-r--  1 lpryszcz www-data  32638976 Feb 28 13:57 .ht.sqlite
-rw-rw-r-- 1 lpryszcz www-data 451850240 Feb 28 13:45 .ht.sqlite.bck

Finally, to make sure, that there is no data missing between old and new, reduced DB, you can use sqldiff .ht.sqlite .ht.sqlite.bck. It’ll simply report all SQL command that will transform one DB into another and nothing if DB contain identical information.

Hopefully replacing NOCASE_UTF8 with NOCASE will allow auto vacuum to proceed as expected on the Drupal DB in the future!

EDIT: The db failed after update to drupal v8.7.6

Lately, I’ve updated drupal and discovered this morning the drupal db file to be corrupted Error: no such collation sequence: NOCASE_UTF8. This is because in the latest update, drupal rebuilt table definitions and NOCASE_UTF8 came back which causes sqlite vacuum crashing again. The solution is very simple, just recover your db from backup and remove replace NOCASE_UTF8 with NOCASE .

sed -i.bck 's/NOCASE_UTF8/NOCASE     /g' .ht.sqlite

Using docker for application development

I found Docker super useful, but going through a manual is quite time consuming. Here, very stripped manual to create your first image and push it online 🙂

# install docker
wget -qO- https://get.docker.com/ | sh
 
# add your user to docker group
sudo usermod -aG docker $USER
 
# check if it's working
docker run docker/whalesay cowsay "hello world!"
 
# create an account on https://hub.docker.com
# and login
docker login -u $USER --email=EMAIL
 
# run image
docker run -it ubuntu
 
# make some changes ie. create user, install needed software etc
 
# finally open new terminal & commit changes (SESSIONID=HOSTNAME)
docker commit SESSIONID $USER/image:version
 
# mount local directory `pwd`/test as /test in read/write mode
docker run -it -v `pwd`/test:/test:rw $USER/image:version some command with arguments
 
# push image
docker push $USER/image:version

From now, you can get your image from any other machine connected to Internet by executing:

docker run -it $USER/image:version
# ie. redundans image
docker run -it -w /root/src/redundans lpryszcz/redundans:v0.11b ./redundans.py -v -i test/{600,5000}_{1,2}.fq.gz -f test/contigs.fa -o test/run1
 
# you can create alias latest, then version can be skipped on running
docker tag lpryszcz/redundans:v0.11b lpryszcz/redundans:latest
docker push lpryszcz/redundans:latest
 
docker run -it lpryszcz/redundans

You can add info about your repository at https://hub.docker.com/r/$USER/image/

Installing new version of Python without root

UPDATE 2019: If you have access to sudo, I’d definitely recommend installing python through system manager and using venv i.e.

# install python3.7 
sudo apt install python3.7-venv python3.7-dev
# create virtual environment using python3.7
python3.7 -m venv py37
# activate environment
source py37/bin/activate
# check version
python --version # python -c 'import sys; print(sys.version_info)'
# install new package
pip install numpy
# leave virtual environment
deactivate

Some time ago I was recommending to use Python virtual environment to install local version of Python packages. However this will not solve the issue of outdated version Python in the server your are working in. Here, pythonbrew may be help for you.

# install pythonbrew to ~/.pythonbrew
curl -kL http://xrl.us/pythonbrewinstall | bash

# add to ~/.bashrc to automatically activate pythonbrew
[[ -s "$HOME/.pythonbrew/etc/bashrc" ]] &amp;amp;amp;amp;amp;amp;amp;amp;&amp;amp;amp;amp;amp;amp;amp;amp; source "$HOME/.pythonbrew/etc/bashrc"                                                         

# open new terminal tab (Ctrl+Shift+T) or window (Ctrl+Shift+N)

# install python 2.7.10
pythonbrew install 2.7.10

# and enable the new version
pythonbrew switch 2.7.10

# from now on, you can enjoy the version of your choice and install dependencies
which python
#/home/.../.pythonbrew/pythons/Python-2.7.10/bin/python
python --version
#Python 2.7.10
which pip
#/home/.../.pythonbrew/pythons/Python-2.7.10/bin/pip

Using pybedtools under Python 2.6

I encountered an import error from collections import OrderedDict, while using pybedtools under Python 2.6. It took me some time to find a workaround… I posted it below.

# install dependencies and pybedtools
pip install cython ordereddict
pip install pybedtools

# edit file ... lib/python2.6/site-packages/pybedtools/contrib/venn_maker.py
# comment import from collections and add try-except
#from collections import OrderedDict
try:                                 
    from collections import OrderedDict
except ImportError:
    from ordereddict import OrderedDict

Virtual environment (venv) in Python

Working on the machines with no root access is sometimes annoying, especially if you depend on multiple Python packages and your distro is somehow outdated… You may find Python virtual environment (venv) very useful.
First, you need to create venv directory structure (this is done only once):

mkdir -p ~/src/venv
cd ~/src/venv
virtualenv py27

Then you can open new BASH terminal and activate your venv by:

source ~/src/venv/py27/bin/activate

After that, you can install / upgrade any packages using pip / easy_install (even including PIP ) ie.

pip install --upgrade pip
pip install --upgrade scipy

Insipired by python-guide.

Apache2 reading from sshfs share

Today, I have encountered problems trying to read data from sshfs share in apache2. I was getting 403 Forbidden error. It turned out you need to enable other_user in sshfs, so other users than the one mounting the share can access the data, as apache2 is using www-data user.

# uncomment last line of /etc/fuse.conf
# Allow non-root users to specify the allow_other or allow_root mount options.
user_allow_other
 
# enable other_user and read access by non-root
sudo chmod a+r /etc/fuse.conf
 
# remount
sudo umount DESTINATION
sshfs -o allow_other SHARE DESTINATION

Inspired by serverfault and unix.stackexchange.

Easy install to specific folder on machines without root

I’m using more & more remote machines in which I don’t have admin rights, so usual sudo easy_install isn’t working. But there is simple work around by installing Python packages to non-default directory.

# prepare new Python packages directory
mkdir -p ~/src/python/site-packages

# edit ~/.bashrc so Python knows where to look for packages
export PYTHONPATH=$PYTHONPATH:~/src:~/src/python:~/src/python/site-packages

# install packages
easy_install --install-dir /home/USER/src/python/site-packages PACKAGE_NAME

# ie for pysam installation you should do
easy_install --install-dir /home/lpryszcz/src/python/site-packages cython pyrex pysam