
Ever try committing a 100 MB file into git? 100% CPU for tens of seconds. Game over.

Git is a great idea for this, but in practice it performs terribly.



It only gets worse as you keep adding more binary files. After doing this for a while your repo will not work unless you pack it, which takes a horrifically long time, if it succeeds at all: my git was 'surprisingly' segfaulting when trying to acquire 16 GB of RAM at once during the pack operation. (I was trying to keep RAW photos synced; git stopped working when there were about 4 GB of files stored.)

Git is made and optimized for text files, so it's no big surprise that it doesn't cope very well with binaries.


Sure, if you do this the naive way (map files onto the git repo 1:1).

I don't know how this project is doing it, but adding a buffering/sharding layer for large files would alleviate this problem at the cost of worse delta compression (among other things).

That said, I still don't think git is necessarily the best platform to base a Dropbox alternative upon, but the commit resource hogging is very fixable.
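
To make the sharding idea concrete, here's a rough Python sketch (nothing to do with how this project actually works; the 64 MB shard size and the manifest format are just placeholder choices):

  import subprocess

  CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB per shard; an arbitrary choice

  def store_sharded(path):
      """Store `path` as a series of git blobs instead of one huge one."""
      manifest = []
      with open(path, "rb") as f:
          while True:
              piece = f.read(CHUNK_SIZE)
              if not piece:
                  break
              # `git hash-object -w --stdin` writes a blob and prints its SHA-1,
              # so git never has to handle the whole file as a single object.
              sha = subprocess.run(
                  ["git", "hash-object", "-w", "--stdin"],
                  input=piece, capture_output=True, check=True,
              ).stdout.decode().strip()
              manifest.append(sha)
      return manifest  # commit this small list in place of the big file

You'd commit the manifest instead of the file itself and reassemble on checkout. The worse delta compression I mentioned comes from the fixed boundaries: a one-byte insertion shifts every later shard, so versions stop sharing objects.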


It's unclear to me how buffering/sharding could fix this. The bytes still need to get into the repo, and that's still expensive and time-consuming even if it's chunked or asynchronous.


Look at bup, a backup system based on git. They face similar problems. I was at a talk by one of the developers; it was very interesting and very technical about how they solve some of these problems, especially for large files.

https://www.youtube.com/watch?v=u_rOi2OVvwU https://github.com/apenwarr/bup


git-annex is a nice extension for this kind of use case. Metadata is stored in git, and file contents are stored separately.

http://git-annex.branchable.com/
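
Very roughly, the split works like this (a toy sketch, not git-annex's actual key format, directory layout, or commands; the `.large-objects` store name is made up): content goes into a hash-addressed store, and git only ever tracks a tiny pointer file.

  import hashlib, os, shutil

  STORE = ".large-objects"  # hypothetical content store; git-annex's real layout differs

  def annex_style_add(path):
      """Move the file's content into the store, leave a pointer for git to track."""
      h = hashlib.sha256()
      with open(path, "rb") as f:
          for block in iter(lambda: f.read(1 << 20), b""):
              h.update(block)  # hash in 1 MB blocks so big files stay out of RAM
      digest = h.hexdigest()
      os.makedirs(STORE, exist_ok=True)
      shutil.move(path, os.path.join(STORE, digest))
      with open(path, "w") as pointer:
          pointer.write(digest + "\n")  # this tiny file is what actually gets committed
      return digest

Commits and history stay small because git only ever sees the pointer; the heavy content can live wherever you like.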


bup uses the git repository format (packfiles) but splits big files into chunks to avoid this problem. It also does de-duplication at the chunk level.

https://github.com/apenwarr/bup/#readme https://github.com/apenwarr/bup/blob/master/DESIGN#L92
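
The DESIGN doc linked above explains the "hashsplitting". As a toy illustration of the idea (the window size, mask, and checksum here are simplifications, not bup's actual parameters): chunk boundaries are chosen by a rolling checksum over the content, so inserting a few bytes near the start of a file only disturbs nearby chunks, and identical chunks are stored once.

  import hashlib

  WINDOW = 64            # bytes in the rolling window
  MASK = (1 << 13) - 1   # boundary pattern; controls typical chunk size
  MAX_CHUNK = 1 << 20    # cap so degenerate data (e.g. all zeros) still splits

  def chunks(data):
      """Yield content-defined chunks of `data` using a toy rolling sum."""
      start, rolling = 0, 0
      for i in range(len(data)):
          rolling += data[i]
          if i >= WINDOW:
              rolling -= data[i - WINDOW]  # slide the window forward
          if (rolling & MASK) == MASK or i + 1 - start >= MAX_CHUNK:
              yield data[start:i + 1]
              start = i + 1
      if start < len(data):
          yield data[start:]

  def store(data, objects):
      """Store each chunk once, keyed by hash; return the file's chunk list."""
      keys = []
      for c in chunks(data):
          k = hashlib.sha1(c).hexdigest()
          objects.setdefault(k, c)  # chunk-level de-duplication happens here
          keys.append(k)
      return keys

Two versions of a file that share most of their content end up sharing most of their chunk objects, which is where the chunk-level de-duplication comes from.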


A couple of days ago I attended a talk about bup that might be interesting to you (really technical).

https://www.youtube.com/watch?v=u_rOi2OVvwU


I have this same problem with Mercurial. Is there a better alternative for source control of large files?


I hear Perforce is the go-to solution for people who really need this. Or dedicated media asset management solutions.


Perforce is expensive ($740 per user), but I can't imagine using anything else. Just about every major game studio uses it, to the best of my knowledge.


Just be aware that Perforce is a server-based system and not really on par with Git/Mercurial. I'm forced to use it at work, and while it works great for storing large binaries (images, movies, etc.), it sucks for daily use, IMHO. At least compared to a modern DVCS.


We recently finished migrating a >6-year-old project from Perforce to git, splitting out a horribly bloated thirdparty/ directory full of binaries into its own git submodule, to eventually be replaced by something sane(r) like Maven or Ivy. I hear Subversion is Good Enough for large file dumps, and I'd rather use that through a git interface than its own. Having used them, I think p4 and git-p4 are both pretty awful. Of course, the general response to source control of large files is "don't do that!"


The extension "kbfiles" may do what you want:

http://kiln.stackexchange.com/questions/1873/how-do-you-use-...


You don't want kbfiles - you want to install hg 2.0 and just use the largefiles extension included with the download. It's based on kbfiles; the Kiln folks helped upstream it.



