
I think I've written about it here before, but I imported ≈1 TB of logs into DuckDB (which compressed them enough to fit in my laptop's RAM) and was done with my analysis before the data science team had even ingested everything into their Spark cluster.
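For reference, here's a minimal sketch of that kind of workflow with DuckDB's Python API (the paths, file format, and column names are invented, not the parent's actual setup):

  import duckdb

  con = duckdb.connect()  # in-memory database; DuckDB's columnar storage compresses well
  # read_csv_auto infers the schema; the glob pulls in every log shard
  con.execute("""
      CREATE TABLE logs AS
      SELECT * FROM read_csv_auto('logs/*.csv.gz')
  """)
  # typical ad-hoc query once everything is in RAM
  print(con.execute("""
      SELECT status, count(*) AS n
      FROM logs
      GROUP BY status
      ORDER BY n DESC
  """).fetchall())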

(On the other hand, I wouldn't really want the average business analyst walking around with all our customer data on their laptop all the time. And by the time you have a proper ACL system with audit logs and some nice way to share analyses that updates in real time as new data is ingested, the Big Data Solution™ probably has a lower TCO...)



> And by the time you have ... the Big Data Solution™ probably has a lower TCO...

I doubt it. The common Big Data Solutions manage to have a very high TCO, and the smallest share of it goes to hardware and software. Most of the cost comes from reliability engineering and UI issues (because managing that "proper ACL" that doesn't fit your business is a hell of a problem that nobody will get right).


> ...managing that "proper ACL" that doesn't fit your business is a hell of a problem that nobody will get right...

I'm not sure there is a way to get this right unless there is programmatic integration with the org chart, plus the ability to describe (and parse) in a declarative language the organizational rules of who has access to what, when, and under what authorization. Otherwise it has been, in my experience, an exercise in watching massive amounts of toil: manually translating between the source of truth (SOT) of the org chart and all the other applications, mediated by many manual approval policies and procedures. And at every client where I've proposed this, I've been denied that programmatic access for integration.
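To make that concrete, here's a toy sketch of what declarative rules over a programmatic org-chart feed could look like. Every name, team, and dataset below is invented:

  from dataclasses import dataclass

  @dataclass(frozen=True)
  class Person:
      id: str
      team: str
      manager: str | None

  def chain_of_command(p, org):
      # walk the reporting line up toward the root of the org chart
      while p.manager is not None:
          p = org[p.manager]
          yield p

  # declarative rules: (predicate over the org chart, dataset granted)
  RULES = [
      (lambda p, org: p.team == "fraud-analytics", "payments.transactions"),
      (lambda p, org: any(m.team == "data-platform" for m in chain_of_command(p, org)),
       "warehouse.raw_logs"),
  ]

  def grants(p, org):
      return {dataset for rule, dataset in RULES if rule(p, org)}

  org = {x.id: x for x in [
      Person("cto", "exec", None),
      Person("dp-lead", "data-platform", "cto"),
      Person("ana", "fraud-analytics", "dp-lead"),
  ]}
  print(grants(org["ana"], org))  # granted via her team and via her reporting line

The point is that when the org chart changes, the grants change with it, with no ticket queue in between.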

A lot of sites try to avoid this by designing ACLs around certain activity or data domains, because those are more stable than organizations, but this breaks down at the fine-grained levels of the ACLs, so the benefits of this approach are capped.

I'd love to hear how others solve this in large (10K+ staff) organizations that frequently reshuffle their teams.


You probably didn't do joins on your dataset, for example, because DuckDB OOMs on them when they don't fit in memory.
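For what it's worth, DuckDB does expose knobs for larger-than-memory work; whether a given join actually spills to disk or OOMs depends on the version and the query. A sketch, with the limit and path as assumptions:

  import duckdb

  con = duckdb.connect()
  con.execute("SET memory_limit = '24GB'")         # cap below physical RAM
  con.execute("SET temp_directory = '/tmp/duck'")  # where operators may spill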



