Open-source Dropbox alternative powered by Git (github.com/bazaarlabs)
142 points by ibrahimcesar on Jan 1, 2012 | 57 comments


There's some mention here of hosting this on GitHub and Bitbucket. I can't imagine Git-as-a-service providers would be thrilled with this kind of application; it would completely hammer their services compared to the occasional commits and pulls that happen with a codebase.

They will probably update their terms if they don't already preclude it. Or maybe they will embrace it with a premium pricing plan. I certainly wouldn't count on hosting this on Bitbucket's free plan.


Ever tried committing a 100 MB file to git? 100% CPU for tens of seconds. Game over.

Git is a great idea for this, but in practice it performs terribly.
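Part of where that CPU time goes: before anything is stored, git runs the complete file contents (plus a small "blob <size>" header) through SHA-1 and then zlib-deflates them as a loose object. A minimal Python sketch of that object format, assuming the classic SHA-1 object format; it reproduces git's hashing rule, not git's code, and the compression level is an assumption.

```python
import hashlib
import zlib


def git_blob_id(data: bytes) -> str:
    """SHA-1 object id git gives a blob: hash of b"blob <size>\\0" + content."""
    header = b"blob %d\x00" % len(data)
    # The entire file content goes through SHA-1 every time it is added.
    return hashlib.sha1(header + data).hexdigest()


def loose_object_bytes(data: bytes, level: int = 1) -> bytes:
    """zlib-deflate header + content, as git does when writing a loose object.

    The level here is an assumption; git's actual level comes from its
    core.compression / core.looseCompression settings.
    """
    header = b"blob %d\x00" % len(data)
    return zlib.compress(header + data, level)


if __name__ == "__main__":
    print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

For a 100 MB file that means 100 MB through SHA-1 and another full pass through zlib on every add, before any delta compression happens at pack time.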


It only gets worse as you keep adding binary files. After a while your repo won't work unless you pack it, which takes a horrific amount of time, if it succeeds at all: my git was 'surprisingly' segfaulting when trying to allocate 16 GB of RAM at once during a pack operation. (I was trying to keep RAW photos synced; git stopped working when there were about 4 GB of files stored.)

Git is designed and optimized for text files, so it's no big surprise that it doesn't cope very well with binaries.


Sure, if you do this the naive way (map files onto the git repo 1:1).

I don't know how this project is doing it, but adding a buffering/sharding layer for large files would alleviate this problem at the cost of worse delta compression (among other things).

That said, I still don't think git is necessarily the best platform to base a Dropbox alternative upon, but the commit resource hogging is very fixable.
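One way to read "buffering/sharding layer": split each large file into fixed-size shards, store every shard as its own small object, and commit a manifest listing them in order. A minimal sketch, assuming a hypothetical 4 MiB shard size and hypothetical function names (nothing here is taken from gitdocs):

```python
from __future__ import annotations

import hashlib
from pathlib import Path

SHARD_SIZE = 4 * 1024 * 1024  # hypothetical shard size (4 MiB)


def shard_file(src: Path, shard_dir: Path, shard_size: int = SHARD_SIZE) -> list[str]:
    """Split src into fixed-size shards stored under shard_dir.

    Each shard is named by its SHA-256, so identical shards are written once.
    Returns the manifest: the ordered list of shard names.
    """
    shard_dir.mkdir(parents=True, exist_ok=True)
    manifest: list[str] = []
    with src.open("rb") as f:
        for chunk in iter(lambda: f.read(shard_size), b""):
            name = hashlib.sha256(chunk).hexdigest()
            path = shard_dir / name
            if not path.exists():
                path.write_bytes(chunk)
            manifest.append(name)
    return manifest


def reassemble(manifest: list[str], shard_dir: Path, dest: Path) -> None:
    """Concatenate the shards listed in manifest back into dest."""
    with dest.open("wb") as out:
        for name in manifest:
            out.write((shard_dir / name).read_bytes())
```

The cost mentioned above shows up directly: inserting a single byte near the start of a file shifts every later shard boundary, so nearly all shards change and little is shared between versions.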


It's unclear to me how buffering/sharding would fix this. The bytes still need to get into the repo. That's still expensive and time-consuming even if it is chunked or asynchronous.


Look at bup, a backup system based on git. They face similar problems. I was at a talk by one of the developers: very interesting and very technical about how they solve some of the problems, especially with large files.

https://www.youtube.com/watch?v=u_rOi2OVvwU https://github.com/apenwarr/bup


git-annex is a nice extension for this kind of use case. Metadata are stored in git, and file contents are stored separately.

http://git-annex.branchable.com/
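The split described above (metadata in git, content elsewhere) can be illustrated with a small pointer file that gets committed while the real bytes go into a separate content-addressed store. This is a hypothetical sketch only: git-annex's actual key format and symlink layout differ, and the function names and "sha256:" pointer format are made up for illustration.

```python
from __future__ import annotations

import hashlib
import shutil
from pathlib import Path


def add_to_store(path: Path, store: Path) -> str:
    """Hypothetical annex-style add: move content into store, leave a pointer.

    The small pointer file is what you would commit to git; the content
    lives in `store`, outside git's object database.
    """
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):  # hash in 1 MiB blocks
            h.update(block)
    digest = h.hexdigest()
    store.mkdir(parents=True, exist_ok=True)
    target = store / digest
    if target.exists():
        path.unlink()  # content already stored; drop the duplicate copy
    else:
        shutil.move(str(path), str(target))
    path.write_text(f"sha256:{digest}\n")
    return digest


def read_from_store(pointer: Path, store: Path) -> bytes:
    """Resolve a hypothetical pointer file back to the stored content."""
    digest = pointer.read_text().strip().split(":", 1)[1]
    return (store / digest).read_bytes()
```

Git then only ever sees a tiny text file per large file, so commits and packs stay cheap no matter how big the photos get.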


bup uses the git repository format (packfiles) but splits big files into chunks to avoid this problem. It also does de-duplication at the chunk level.

https://github.com/apenwarr/bup/#readme https://github.com/apenwarr/bup/blob/master/DESIGN#L92
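The idea behind that kind of splitting is content-defined chunking: a rolling hash over the last few dozen bytes decides where chunks end, so an insertion only disturbs the chunks around it and the rest still de-duplicate. bup uses its own rsync-style rolling checksum (see its DESIGN doc); this sketch swaps in a plain polynomial rolling hash, and the window, base, cut rule, and min/max chunk sizes are hypothetical choices, not bup's.

```python
from __future__ import annotations

import hashlib

# All parameters are hypothetical choices for illustration, not bup's values.
WINDOW = 48                      # bytes covered by the rolling hash
BASE = 0x01000193                # odd multiplier for the polynomial hash
MOD = 1 << 32
BITS = 12                        # cut roughly once every 2**12 bytes on random data
TARGET = (1 << BITS) - 1
MIN_CHUNK = 512
MAX_CHUNK = 64 * 1024


def chunk_boundaries(data: bytes) -> list[tuple[int, int]]:
    """Return (start, end) offsets of content-defined chunks."""
    drop = pow(BASE, WINDOW, MOD)  # weight of the byte leaving the window
    chunks: list[tuple[int, int]] = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        h = (h * BASE + byte) % MOD
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * drop) % MOD  # slide the window forward
        length = i + 1 - start
        # Cut when the top BITS bits are all ones, within the size limits.
        if (length >= MIN_CHUNK and (h >> (32 - BITS)) == TARGET) or length >= MAX_CHUNK:
            chunks.append((start, i + 1))
            start = i + 1
    if start < len(data):
        chunks.append((start, len(data)))
    return chunks


def store_chunks(data: bytes, objects: dict[str, bytes]) -> list[str]:
    """Store data as content-addressed chunks and return the ordered chunk ids.

    Chunks already present in `objects` are not stored again (de-duplication).
    """
    ids: list[str] = []
    for start, end in chunk_boundaries(data):
        piece = data[start:end]
        oid = hashlib.sha1(piece).hexdigest()
        objects.setdefault(oid, piece)
        ids.append(oid)
    return ids
```

Storing a file, then the same file with a few bytes prepended, should add only about one new chunk to `objects`, whereas fixed-size splitting would change nearly every chunk.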


A couple of days ago I attended a talk about bup; it might be interesting to you (really technical).

https://www.youtube.com/watch?v=u_rOi2OVvwU


I have this same problem with mercurial. Is there a better alternative for source control of large files?


I hear Perforce is the go-to solution for people who really need this, or dedicated media asset management solutions.


Perforce is expensive ($740 per user) but I can't imagine using anything else. Just about every major game studio uses it to the best of my knowledge.


Just be aware that Perforce is a server-based system and not really on par with Git/Mercurial. I'm forced to use it at work, and while it works great for storing large binaries (images, movies, etc.), it sucks for daily use, IMHO. At least compared to a modern DVCS.


We recently finished migrating a >6-year-old project from Perforce to git, splitting out a horrible bloated thirdparty/ directory full of binaries into its own git submodule, to eventually be replaced by something sane(r) like Maven or Ivy. I hear Subversion is Good Enough for large file dumps, and I'd rather use that through a git interface than its own. Having used them, I think p4 and git-p4 are both pretty awful. Of course, the general response to source control of large files is "don't do that!"


The extension "kbfiles" may do what you want:

http://kiln.stackexchange.com/questions/1873/how-do-you-use-...


You don't want kbfiles; you want to install hg 2.0 and just use the largefiles extension included with the download. It's based on kbfiles, and the Kiln folks helped upstream it.


Just curious: why are there so many Dropbox alternatives being posted lately? Is re-doing an already elegant solution really a top priority for people? I haven't looked at this post at all (so I'm not trying to make any judgement on it); I'm just curious why so many people are interested in making Dropbox alternatives lately.


I'd assume the short version is that Dropbox:

1) Isn't free

2) Is accessible to $stateAuthority if they come knocking

3) Makes your data access reliant on a third party who may or may not be up in an emergency or:

4) Could disappear at any time and take your data with them

So basically, the same problems that affect every cloud solution provider ever.


I'd be interested in which `free` you are talking about. Beer or speech (or both)?

Because for me, the prime reason I refuse to use Dropbox[1] is that it's unfree software in the "free as in speech" sense. I won't run software on my computer that needs internet access and transmits files when I cannot verify what the program does, on top of my personal policy of reducing my use of unfree software to an absolute, unavoidable minimum.

[1] I'm studying CS, and it's really annoying how ubiquitous Dropbox has become among students. I stopped counting after about three dozen people tried to get me to use Dropbox.


Mostly as in beer. You have to cough up cash to get a respectable quota.

Personally, I think you're being rather paranoid on the whole free software (as in speech) thing, but hey, whatever works for you :)


18.2GB for free isn't enough?

I guess not when companies like box.net are giving out 50GB for free. =/


I've been looking for an alternative mostly because of #1. It's actually quite expensive for a small team. I have yet to find anything that works as well as DropBox.

I don't agree with #4. All your data is stored locally; if Dropbox goes away, the data just stops syncing. (You would lose the version history, but that only covers 30 days, so it's not very useful.)


Unless you have pack-rat, in which case you have unlimited history: http://www.dropbox.com/help/113


1) You can get up to 18.2GB of storage for free

2) This is a problem with just about any service unless you use something based in another country, but then you play by that government's rules

3) That third party is Amazon S3/EBS/EC2 which, if used correctly, provides a solution thousands of times better than you can with the same budget (redundant datacenters, high availability, etc)

4) If they do disappear, you still have all your data on your computers with Dropbox installed.


I'm a Dropbox user, although I like the GP's standpoint. For practical reasons I don't refuse to use it like he does.

Now, help me understand point 1) please.

I assume you're talking about the 'get someone to sign up' referral bonus? That would be very shady in my world, and it's probably part of why the GP gets annoyed by people at his university sifting through the masses to find those who haven't signed up yet and could earn them a small bonus.

It's spamming your own network.


Or use a $100 AdWords coupon to run a campaign with your affiliate URL to the Dropbox signup page instead of spamming people.


One reason may be that it actually has interesting engineering aspects. Another may have to do with some sort of self-validation: "Hey, Dropbox is a massive success; can I build something like this?" That's what usually happens with popular services like Twitter and GitHub: they've all had numerous "me too" attempts.


The alternatives try to address the biggest problem of Dropbox: the Dropbox company knows the content of my files. That in turn means that various US agencies can browse and analyze my files without my knowledge, my approval, or a warrant.


Encryption seems like a much better solution to this problem than reinventing the entire Dropbox wheel.


The problem is to get the simplicity of Dropbox (the filesystem is the UI) into your encryption system.

Sure, you can just put a Truecrypt disk image in a Dropbox folder, but you're missing the granularity of Dropbox (e.g. to sync certain files to my SSD) and that granularity also affords you protection against race conditions (if both machines have the encrypted drive mounted and are writing, sounds like trouble).

Not an insolvable problem by any means, but not one that has yet been solved either.


You could use EncFS, but then it isn't cross-platform friendly.


Neither is TrueCrypt. I'd love to use it with Dropbox on my Android phone, but that's just not possible. I use EncFS on Dropbox for certain files that I know I won't need to access on my phone. That's an annoying compromise, though. I'd much rather use EncFS for the entire thing.


For example https://spideroak.com/ (not affiliated, not a user).

Unfortunately, the one time I installed it, the user experience and interface were miserable.


I think it's part of a general push by the bleeding edge away from hosted cloud solutions and towards controlling bits of the stack themselves. Dropbox is a good candidate for replacement because storing a few personal documents to sync with your various devices sounds like something you should be doing for yourself, unlike, say, email, which is manifestly harder to do right.


Good point.


Just want to add a quick comment about our goals for gitdocs. Obviously this was never intended to be a 'Dropbox killer'; in fact, at Miso we actually use both: Dropbox for videos, large binary files, and business and legal files.

Instead, we use gitdocs for storing our "docs": task lists, wiki, planning, collaborative design, note taking, code snippets, etc. And the gitdocs web front-end (http://imgur.com/eaTTY) is optimized for that, since it renders wiki pages (formatted Markdown/Textile) and has full code syntax highlighting, file search, revision history and a rich text editor.


Dropbox wins on UX, which is where FOSS fails almost every time. A quick way to tell whether such a project will fail: check whether it mentions anything about the technologies used. Making it obvious to use and making it "just work" is what's needed.

Too bad that is actually the hard part.


For what definition of "fail"?

I think it's pretty obvious that this isn't intended, at least in its current incarnation, to be a Dropbox killer. It doesn't need to pander to the same audience.

"dares to mention technologies used" is only a killer for a subset of all possible audiences. For example, knowing that Git is hacked together with C and bash doesn't make it any less useful to me (i.e., someone who isn't afraid of computers).


Look at SparkleShare's website and see what you think.


The problem is setting up the Git server. Even if you use GitHub (and then you have to pay if you don't want your files open to the public), creating the repo, adding the keys, understanding what to put in the Address field, etc. is still extremely difficult for someone with the knowledge (and patience) of the average person.


For synchronization of your personal files, check out Unison.

http://www.cis.upenn.edu/~bcpierce/unison/


Seconded. I've used Unison to sync across PCs and between different OSes for many, many years; it just works.


Not really an alternative till it supports clients for all sorts of platforms and mobile devices IMHO


So, Windows support then? And most mobile devices will be able to access the files through the browser, no need for a client to do that, like with Dropbox.

I love the idea and the hackiness of it all. Love it!


Git clients exist for iOS, Android, Windows, and Linux. There's 98% of the computing world right there.


It's neat because it can be used on top of git without any additional effort, but it'd be more interesting to see something that leverages a server-side git post-push hook to notify the other clients about changes. Depending on the polling interval, gitdocs either has an increased probability of causing merge conflicts or does a crazy number of git pulls.


Actually, it uses file system events so the pushes occur instantly. However, the pulls are done with polling.
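The polling trade-off raised above is easy to put in rough numbers. A back-of-the-envelope sketch, assuming each client polls on a fixed interval and remote changes land at uniformly random times (the helper names are hypothetical, not gitdocs code):

```python
SECONDS_PER_DAY = 24 * 60 * 60


def pulls_per_day(interval_s: float, clients: int = 1) -> float:
    """Total pulls per day when each client polls once every interval_s seconds."""
    return clients * SECONDS_PER_DAY / interval_s


def mean_pull_delay_s(interval_s: float) -> float:
    """Average wait before a client pulls a remote change.

    Assumes changes land at a uniformly random moment between polls,
    so the expected wait is half an interval (the worst case is a full one).
    """
    return interval_s / 2


if __name__ == "__main__":
    for interval in (5, 30, 300):
        print(f"{interval:>4}s: {pulls_per_day(interval, clients=10):>8.0f} pulls/day "
              f"for 10 clients, ~{mean_pull_delay_s(interval):.1f}s average delay")
```

A 30-second interval means 2,880 pulls per client per day; stretching it cuts the load on the server but widens the window in which an unpulled remote change and a local edit can collide.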


I don't see any reason they can't add this later. Get something that works first, then make it better.


The real problem with this is that Git is awful at versioning binary / large files.



The README suggests using bitbucket.org, which I didn't realise had unlimited storage, even on the free plan for up to 5 users?

So if the CPU issue can be resolved for committing large files, could it be used as a backup for your warez etc?


I think Lipsync - https://github.com/philcryer/lipsync - might be better, Git isn't great for lots of large binary files.


A comparable alternative is never going to be free for everyone. Someone has to pay the Amazon fees for S3 use. Isn't it true that the few Dropbox users who do pay for it support all the ones who don't?


Hah, I've done something similar to this, except it uses SVN. Cross-platform too! But you know what sucked? Performance. SVN is slooowwwww when it comes to binary files.

You can check it out here.


This was posted here a few weeks ago, but at the time it was Mac-only. Now it works on Mac, Linux, and Windows. I assume this is the reason for the re-post.


It reminds me again that Dropbox is hard to beat.


I had the same idea right after Dropbox came out, but I dropped it when I thought about needing clients for every popular OS.



