
I don't think it would. The room should be called "Cloud Native Databases", as it was called last year. This is the CFP they issued on the FOSDEM mailing list:

---

We're excited to announce the Databases devroom at FOSDEM 2026. The purpose of this devroom is to discuss developing databases that are designed to take advantage of the cloud native architecture in order to meet the demands of modern applications.

Instead of focusing on a single technology or community, our goal is to bring open source database communities and developers together to share their experience, learn from their work, and foster collaboration on challenges ahead.

Suggested Topics:
- Evolution of new database architecture for cloud native environments (e.g., to address scalability, elasticity, reliability)
- Integrating AI/ML workloads with distributed databases
- How to ease/streamline database migration (especially from legacy databases)
- Achieving database upgrades without downtime
- How to benchmark and tune distributed databases
- Query optimization for cloud native databases
- Tools/methodologies for monitoring, observability, optimizations, testing, etc.
- Working with streaming data
- Stories from end-users (both successes and failures)
- Security and compliance for cloud native databases
- State of open-source licenses and database communities
- Emerging trends/use cases

These are just suggestions and we encourage you to be creative with your proposals! You can also view accepted talks from the 2025 devroom at https://archive.fosdem.org/2025/schedule/track/databases/.

---

I don't think the traditional talks that were scheduled at the PostgreSQL or MySQL and Friends DevRooms would be welcome there.


QuestDB. It is built for finance workloads (it can also be used for other time-series data, like energy or aerospace, but it has heavy optimizations for common finance data patterns), it is very performant, it has been in use for years at large finance entities, and it is Apache 2.0.

Full disclosure: I am a Developer Advocate at QuestDB.

The Open Source edition does not limit commercial use or the size of the machine you can install it on, as per the Apache 2.0 license terms.

If you want more enterprise-y options, like single sign-on or RBAC, there is an Enterprise edition. But Open Source is as performant as the Enterprise version. Enterprise also offers things like replication and TLS on all endpoints, which can be somewhat replicated in Open Source with manual sharding or proxies.


QuestDB is just a DB. What people get wrong about KDB is that it's so much more. It's a programming language with DB capabilities. You can use it for real-time streaming, as an in-memory DB, and as an on-disk DB. You can build your entire analytics on top of that. There's nothing else out there that lets you build an entire framework/platform with just one single tech stack.


QuestDB does not even support nanosecond resolution, so it's not quite suitable for financial funds.


Hi! I am the author of that post :)

Exactly my thinking. It is alpha, and I am sure it will be vastly improved, yet...


By the way, I presented this repository at the FOSDEM conference just a few days ago. If you want more context about what it does, or to see a walk-through of the different components, the video is available at https://fosdem.org/2024/schedule/event/fosdem-2024-2871-inge...


Thank you! (original article writer here)

DuckDB is awesome. A couple of comments here. First of all, this is totally my fault, as I didn't explain it properly.

I am trying to simulate performance for streaming ingestion, which is the typical use case for QuestDB, Clickhouse, and Timescale. The three of them can do batch processing as well, but they shine when data is coming in at high throughput in real time. So, while the data is presented as CSV (for compatibility reasons), I am reading line after line and sending the data from a Python script using the streaming-ingestion API exposed by those databases.
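For reference, this is roughly what that streaming path looks like with the QuestDB Python client; a minimal sketch, where the table, column names, and CSV file are made up for illustration (ClickHouse and Timescale have their own client APIs):

```python
# Minimal sketch: read a CSV line by line and stream each row into QuestDB
# using the official questdb Python client.
# Table, columns, and file name are illustrative, not from the article.
import csv

from questdb.ingress import Sender, TimestampNanos

conf = "http::addr=localhost:9000;"  # assumes a local QuestDB instance

with Sender.from_conf(conf) as sender, open("trades.csv") as f:
    for row in csv.DictReader(f):
        sender.row(
            "trades",
            symbols={"symbol": row["symbol"]},
            columns={"price": float(row["price"]),
                     "amount": float(row["amount"])},
            at=TimestampNanos.now(),
        )
    sender.flush()  # the client also flushes automatically in batches
```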

I guess the equivalent in DuckDB would be writing data via INSERTS in batches of 10K records, which I am sure would still be very performant!
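Something along these lines; a rough sketch, with hypothetical table/column names and a 10K batch size:

```python
# Rough sketch: batched INSERTs into DuckDB, emulating streaming arrival
# by buffering 10K rows at a time. Names are illustrative only.
import csv

import duckdb

con = duckdb.connect("trades.duckdb")
con.execute(
    "CREATE TABLE IF NOT EXISTS trades (ts TIMESTAMP, symbol VARCHAR, price DOUBLE)"
)

batch = []
with open("trades.csv") as f:
    for row in csv.DictReader(f):
        batch.append((row["ts"], row["symbol"], float(row["price"])))
        if len(batch) == 10_000:
            con.executemany("INSERT INTO trades VALUES (?, ?, ?)", batch)
            batch.clear()
if batch:  # flush the remainder
    con.executemany("INSERT INTO trades VALUES (?, ?, ?)", batch)
```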

The three databases in the article have more efficient methods for ingesting batch data (in QuestDB's case, the COPY keyword, or even importing directly from a Pandas dataframe using our Python client, would be faster than ingesting streaming data). I know Clickhouse and Timescale can also ingest CSV data way faster than sending streaming inserts.
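For example, the dataframe path is roughly this with our Python client (again just a sketch, with illustrative names):

```python
# Sketch of batch ingestion from a Pandas dataframe with the questdb
# Python client; table, column, and file names are illustrative.
import pandas as pd

from questdb.ingress import Sender

df = pd.read_csv("trades.csv", parse_dates=["ts"])

with Sender.from_conf("http::addr=localhost:9000;") as sender:
    # at= selects the dataframe column used as the designated timestamp
    sender.dataframe(df, table_name="trades", at="ts")
```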

But that's not the typical use case, or the point of the article, as removing duplicates in batch is way easier than in streaming. I should have made that clearer and will probably update it, so thank you for your feedback.

Other than that, I ran the batch experiment you mention on the box I used for the article (I had actually already done it in the past), and the performance I am getting is 37 seconds, which is slower than your numbers. The reason is that we are using a cloud instance with an EBS drive, and those are slower than a local SSD on your laptop.

You can use local drives on AWS and other cloud providers, but those are way more expensive and have no persistence or snapshots, so they are not ideal for a database (we sometimes use them for large one-off imports, to store the original CSV and speed up reads while writing to the EBS drive).

Actually, one of the claims from DuckDB's entourage is that "big data is dead". With the power in a developer's laptop today you can do things faster than with the cloud, at unprecedented scale. DuckDB is designed to run locally on a data scientist's machine rather than on a remote server (of course there is MotherDuck if you want to go remote, but then you are adding latency and a proprietary layer on top).

Once again thank you for your feedback!


Hi. Sorry if my query offended you.

I basically executed literally what Clickhouse recommends in their deduplication guide: https://clickhouse.com/docs/en/guides/developer/deduplicatio....

Of course you can also materialize with aggregations, or just use a GROUP BY, or even force an OPTIMIZE of the table. But my point is that you don't really get exactly-once guarantees. Whoever is querying that table needs to be aware that a `SELECT * FROM tb` might contain duplicates and needs to adapt their queries accordingly.
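To make that concrete, here is a sketch using the clickhouse-connect Python client against a hypothetical ReplacingMergeTree table tb(key, value, version); both the client choice and the table are assumptions for illustration:

```python
# Sketch: querying a hypothetical ReplacingMergeTree table `tb`.
# A plain SELECT may still return duplicates until background merges run;
# FINAL or an aggregating query deduplicate at query time, at extra cost.
import clickhouse_connect

client = clickhouse_connect.get_client(host="localhost")

raw = client.query("SELECT * FROM tb").result_rows            # may contain duplicates
deduped = client.query("SELECT * FROM tb FINAL").result_rows   # deduplicated
latest = client.query(
    "SELECT key, argMax(value, version) AS value FROM tb GROUP BY key"
).result_rows
```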


I believe there are 0 people working with CH and ReplacingMergeTree who don't know that they have to use FINAL or a GROUP BY in order to get non-duplicate data. It's mentioned on the table engine page, in their knowledge base, everywhere.

Also, I have not recently seen anyone recommend against it. That might have been the case a few years ago, but the performance of FINAL has improved and it's faster than the alternatives. People obviously suggest using plain MergeTrees, but if there is no alternative, ReplacingMergeTree is the way to go.


I enjoyed reading about how technical decisions are made to make databases smarter and able to optimise without relying on users fine-tuning obscure configuration parameters.

