> you caan get most of the useful git-like time-travel semantics (modulo schema ...

gen220 · on June 16, 2020

Heyo, thank you for responding! I totally agree that the commit, PR, merge semantics are powerful and valuable, and you can’t get that easily with existing databases.

I guess in my imaginary perfect world, you don’t need to use commit, PR, merge semantics to make normal edits. You can use existing bitemporal/transaction history ideas in online updates, appends, and deletes, and then you have a higher-level abstraction for offline(-only?) branching and merging.

I guess what I’m saying is that I don’t totally buy the idea that you need git “all the way down”, especially if it gets in the way of performance. But maybe I’m just used to “the old way”, and I’ll cringe reading this in 20 years. :)

Major kudos for this project, it’s super impressive. I’m excited to see how it grows!

jacques_chester · on June 16, 2020

I have a hunch that some form of multi-temporalism, or perhaps more fancily some kind multi-lattice-ism, as the secret model underlying all of these.

But putting that probably-wrong-because-I'm-a-dilettante wankery to one side: a serious advantage of bitemporalism is that most layfolk understand how clocks and calendars work. This is less true of git commits.

gen220 · on June 16, 2020

I think I agree with you. Git uses logical ordering rather than temporal ordering, and their concept of a primary key is rather nebulous.

We've been able to train up "regular folk" on bitemporalism at our company. The distinction between valid and transaction history takes a little while, but it sticks. Git has two or three such conceptual sticking points (staging vs committing, branching & merging, remote vs local).

zachmu · on June 16, 2020

Hey, thanks for the kudos!

FWIW we are absolutely working on modes of using the product that "just work" like a normal database without any explicit commits / merges. In that case commits would happen automatically (maybe one for each write transaction?) and you'd get all the history "for free." We aren't sure how useful that is, though. Definitely looking for customers interested in this use case to guide us on their needs.

gen220 · on June 16, 2020

I work at an insurance company, where it's extremely important that we have a way of persisting (1) what happened when [transaction history], and (2) how our opinions of what happened when changed over time [valid history]. Basically, you can think of (2) as a materialized view of (1). But it all lives in one database.

There is much business value to be extracted out of this idea, because most companies (1) do not have the ability to cheaply (i.e. ad-hoc) introspect this way (2) would gain a lot of value from this ability. My hunch is that most companies out there are interested in answering that set of questions (referred to as bitemporal data, in the literature).

This is the space that TimescaleDB seems to be competing in, although we don't use them (we use our own extension of Postgres, that's only a few hundred lines of PL-pgSQL, plus some helper libraries in Go/Python).

In my perspective, I think git-style semantics would be very powerful as a layer above an existing bitemporal system. We've implemented similar systems, where records get a semver-style version in their bitemporal key, so we can see how different producer logic versions minted different records over time. It's be really cool to have branching versions, and to be able to rebase them -- this is something our system doesn't currently support.

Anyways, hope our data point is useful for you all. Happy to share more if you have questions about anything.

LukeEF · on June 19, 2020

Thanks for this - really interesting (Luke from Terminus here). We have engaged plenty with some reinsurance folk around some query use cases (recursive stuff to better understand risk), but that was before we did a lot of the git-like stuff. Can see how this might be powerful on a historical view in insurance.

nyanpasu64 · on June 16, 2020

Have you heard of patch-theory-based VCS systems (Pijul is unstable, I heard Darcs is slower)? Instead of a chain of independent states (commits), each state is a set of patches (some of which depend on others). Pijul stores an internal structure that's more complex than "a series of chars/lines".

This eliminates complex rebasing topologies, and Git not recognizing that old and rebased commits are the same. But I'm not sure if it works well in practice. And it doesn't extend to SQL data sets yet.

I don't know if you're interested in this or not, but I just wanted to mention it.

xiaodai · on June 16, 2020

Dolt doesn't work for large data right?

zachmu · on June 16, 2020

Depends what you mean by "work" and "large," I guess. We have repos in the 250 GB range that definitely push the limits of functionality, but everything still works (pull, push, merge, clone, etc., and queries return the right results, if slowly).

xiaodai · on June 17, 2020

"Slow" is the issue. I think it's going to be too slow for ppl's liking.