Reducing the size of a large git repository

The GitHub repository of the #NGSchool website has grown to over 5GB. I wanted to reduce the size and simplify this repository, but this task turned out to be quite complicated. Instead, I decided to leave the current repo as is (and will probably remove it soon) and start a new repo for the existing version. I could do that because I don’t care about versions earlier than the one I’m currently using. Here is a short how-to:

  1. Push all changes and remove the .git folder
    git push origin master
    rm -rI .git

  2. Rename the existing repo
    Settings > Repository name > RENAME

  3. Start a new repository using the old repo name
    There is no need to create any files, as they all already exist.

  4. Init your local repo and add the new remote
    git init
    git remote add origin git@github.com:USER/REPO

  5. Commit changes and push
    git add --all . && git commit -m "fresh" && git push origin master
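
Since removing .git throws away all local history, it may be worth confirming first that nothing is left uncommitted or unpushed. Below is a minimal sketch of such a check. The function name `nothing_to_lose` is hypothetical, and it assumes the remote-tracking branch is `origin/master`, as in the steps above.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the working tree is clean and
# every local commit is already on the remote-tracking branch.
# Assumes the branch to compare against is origin/master unless given.
nothing_to_lose() {
    remote_branch="${1:-origin/master}"

    # Any uncommitted or untracked file makes the porcelain output non-empty.
    if [ -n "$(git status --porcelain)" ]; then
        echo "uncommitted changes present" >&2
        return 1
    fi

    # List commits reachable from HEAD but not from the remote-tracking branch;
    # fail as well if that branch cannot be resolved locally.
    unpushed=$(git rev-list "${remote_branch}..HEAD" 2>/dev/null) || {
        echo "cannot compare against ${remote_branch}" >&2
        return 1
    }
    if [ -n "$unpushed" ]; then
        echo "unpushed commits present" >&2
        return 1
    fi
    return 0
}
```

It could be used as `nothing_to_lose && rm -rI .git`. Note that `origin/master` only reflects the state of the remote at the last fetch or push, so this is a sanity check rather than a guarantee.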
    

As a result, my new repo is below 1GB, which is much better than the previous 5GB.

GitHub push fails due to large files

Lately, I have had lots of problems pushing large files to GitHub. I maintain a compilation of materials and software deposited by other people, so I cannot control the size of the files, and this often makes the push fail.

git push
remote: error: GH001: Large files detected. You may want to try Git Large File Storage - https://git-lfs.github.com.
remote: error: Trace: 6f0f7f66995a394598595375954732db
remote: error: See http://git.io/iEPt8g for more information.
remote: error: File chip_seq/reads/sox2_chip.fastq.gz is 109.69 MB; this exceeds GitHub's file size limit of 100.00 MB

To remove the large file from your commits, execute

git filter-branch -f --index-filter 'git rm --cached --ignore-unmatch chip_seq/reads/sox2_chip.fastq.gz'
git push
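
To catch such files before they are committed at all, the working tree can be scanned for anything above the limit. This is a minimal sketch: the function name `find_large_files` is hypothetical, and it assumes that the `M` unit of `find` (MiB) matches closely enough how GitHub measures the 100 MB limit reported in the error above.

```shell
#!/bin/sh
# Hypothetical helper: print files larger than a limit (in MiB, default 100),
# skipping anything inside a .git directory.
find_large_files() {
    dir="${1:-.}"
    limit_mib="${2:-100}"
    # -size +NM matches files whose size, rounded up to whole MiB, exceeds N.
    find "$dir" -type f -size "+${limit_mib}M" ! -path '*/.git/*'
}
```

Running `find_large_files . 100` before `git add` lists the files that would trigger GH001, so they can be left out or handed over to git-lfs.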

To add large files using git-lfs, execute

# track with git lfs all files larger than 50MB, skipping those in the .git folder
find . -type f -size +50M ! -iwholename "*.git*" | rev | cut -f1 -d'/' | rev | xargs git lfs track
# add, commit and push everything, including the updated .gitattributes
git add --all . && git commit -m "final" && git push origin

Make sure that your files are smaller than 2GB, otherwise your push will fail again 😉

Then, before pulling on another machine, make sure to install git-lfs

git lfs install
git pull

Working with large binary files in git

Git is great, there is no doubt about that. Being able to revert any changes and recover lost data is simply priceless. But recently, I have started to be concerned about the size of some of my repositories. Some, especially those containing changing binary files, were really large!!!
You can check the size of your repository with a simple command:

git count-objects -vH
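
The command above reports only the total size. To see which files are responsible for it, the objects in the history can be listed by size. This is a minimal sketch using standard git plumbing; the function name `largest_blobs` is hypothetical, and paths containing newlines are not handled.

```shell
#!/bin/sh
# Hypothetical helper: print the N largest blobs in the whole history
# as "<size in bytes> <path>", largest first (default N=10).
largest_blobs() {
    count="${1:-10}"
    # rev-list prints "<object id> <path>" for every object reachable from any ref;
    # cat-file adds the type and size, and %(rest) carries the path through.
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        sed -n 's/^blob //p' |
        sort -n -r |
        head -n "$count"
}
```

Calling `largest_blobs 20` inside a repository shows the twenty biggest file versions ever committed, which are the natural candidates for git-lfs.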

Here, git Large File Storage (LFS) comes into action. Below, I’ll describe how to install it and mark large binary files, so that the repository itself keeps only small pointer files, while the actual file contents are stored separately on the LFS server.

  1. Installation of git-lfs
    # add packagecloud repo
    curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash

    # install git-lfs
    sudo apt-get install git-lfs

    # and enable it
    git lfs install

  2. Marking and committing a binary file
    # mark large binary file
    git lfs track some.file

    # add, commit & push changes (.gitattributes stores the tracking rule)
    git add .gitattributes some.file
    git commit -m "some.file as LFS"
    git push origin master
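
To double-check that a file really is routed through LFS before committing it, the git attributes can be queried. This is a minimal sketch: the function name `is_lfs_tracked` is hypothetical, and it relies on `git lfs track` writing a `filter=lfs` rule into .gitattributes.

```shell
#!/bin/sh
# Hypothetical helper: succeed if git attributes route the given path through
# the "lfs" filter, which is what `git lfs track` sets up in .gitattributes.
is_lfs_tracked() {
    # check-attr prints "<path>: filter: <value>"; the value is "lfs" for tracked paths.
    case "$(git check-attr filter -- "$1")" in
        *": filter: lfs") return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, `is_lfs_tracked some.file && echo "some.file goes through LFS"`. After the commit, `git lfs ls-files` lists the files that are actually stored as LFS objects.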