And you honestly still haven’t addressed the main point - you are literally usin...

vjerancrnjak · 2026-01-21T08:39:53

ClickHouse is column-based storage. I can also apply delta compression, where gapless timestamp columns cost essentially nothing to store. I can apply Gorilla as well and get good compression on irregular columns. I am aware of Redshift's AZ64 columns, and they are a letdown.
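A minimal Python sketch of why gapless timestamps are nearly free under delta-style codecs, assuming a regular 1-second interval. Delta-of-delta encoding (the idea behind ClickHouse's DoubleDelta codec and the timestamp half of Gorilla) turns such a column into almost all zeros, which any general-purpose compressor squeezes away. This is an illustration of the technique, not ClickHouse's actual codec; `zlib` stands in for the real bit-packing.

```python
import struct
import zlib


def delta_of_delta(ts):
    """Encode as [first value, first delta, then second-order differences]."""
    if len(ts) < 2:
        return list(ts)
    out = [ts[0], ts[1] - ts[0]]
    prev_delta = ts[1] - ts[0]
    for prev, cur in zip(ts[1:], ts[2:]):
        delta = cur - prev
        out.append(delta - prev_delta)  # 0 whenever the interval is unchanged
        prev_delta = delta
    return out


def decode(encoded):
    """Invert delta_of_delta by re-accumulating the deltas."""
    if len(encoded) < 2:
        return list(encoded)
    ts = [encoded[0], encoded[0] + encoded[1]]
    delta = encoded[1]
    for dd in encoded[2:]:
        delta += dd
        ts.append(ts[-1] + delta)
    return ts


def packed_size(values):
    # Pack as little-endian signed 64-bit ints, then compress with zlib.
    raw = struct.pack(f"<{len(values)}q", *values)
    return len(zlib.compress(raw, 9))


# One day of gapless, 1-second timestamps (epoch seconds).
start = 1768984793
ts = [start + i for i in range(86_400)]
encoded = delta_of_delta(ts)

print("plain column, compressed bytes:  ", packed_size(ts))
print("delta-of-delta, compressed bytes:", packed_size(encoded))
print("nonzero values after the header: ", sum(1 for v in encoded[2:] if v != 0))
```

Irregular intervals produce small nonzero residuals instead of zeros, which is where variable-length bit packing (as in Gorilla) earns its keep.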

I can change the sort order, as with Redshift's sort keys, to improve compression and compute. Redshift does not exploit its sort-key configuration nearly as much as it could.
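A small Python sketch of why sort order matters for compression, using hypothetical telemetry rows (`device_id`, `timestamp`, `status`) generated with a fixed seed. It counts the runs a run-length encoding of the `device_id` column would need, first in arrival order and then after sorting by `(device_id, timestamp)`, which plays the role of a sort key here. Real engines combine this with other codecs; the run count is only a proxy for compressibility.

```python
import random


def run_count(column):
    """Number of runs, i.e. entries an RLE encoding of this column would need."""
    runs = 0
    prev = object()  # sentinel that never equals a real value
    for value in column:
        if value != prev:
            runs += 1
            prev = value
    return runs


random.seed(42)
# Hypothetical telemetry from 50 devices, arriving interleaved in time order.
rows = [
    (random.randrange(50), t, random.choice(["ok", "ok", "ok", "warn"]))
    for t in range(10_000)
]

by_arrival = rows
by_sort_key = sorted(rows, key=lambda r: (r[0], r[1]))  # like a (device_id, ts) sort key

for name, table in [("arrival order", by_arrival), ("sorted", by_sort_key)]:
    devices = [r[0] for r in table]
    print(f"{name:14s} device_id runs: {run_count(devices)}")
```

After sorting, the `device_id` column collapses to one run per device, and within each device the timestamps become monotonic again, so delta-style codecs also work better.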

My own assessment is that I'm extremely skilled at making any kind of DB system yield to my will and get it to its limits.

I have never used Redshift, ClickHouse or Snowflake with 1-by-1 inserts. I mentioned S3 consumers: a library or a service optimized to work well with S3's autoscaling, respecting SlowDown responses (something Redshift itself is incapable of) and achieving enormous download rates. Some of the consumers I've used completely saturate the 200 Gbps network limit of some EC2 instance types. These consumers cannot be used in a 1-by-1 setting. The whole point is an insanely fast pipeline with batched processing that interleaves network downloads with CPU compute, so that any data repackaging and compression is negligible compared to the download. You can then predict how long ingestion will take just from your peak download speed, because the actual compute is fully optimized and pipelined.
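The pipelining idea above can be sketched in Python with threads and a bounded queue. This is a toy under stated assumptions: `fetch_object` is a hypothetical stand-in for an S3 GET (a real consumer would use an S3 client, back off on SlowDown / HTTP 503, and spread requests across prefixes), and `process` is a hypothetical CPU step that just compresses each batch. Downloader threads keep the network busy while the main thread processes completed batches, and the bounded queue applies backpressure. In CPython the GIL limits how much pure-Python compute overlaps, so a production version would use processes or a compiled language.

```python
import queue
import threading
import time
import zlib


def fetch_object(key: str) -> bytes:
    # Hypothetical stand-in for an S3 GET: simulated latency plus a payload.
    time.sleep(0.01)
    return (key * 1000).encode()


def process(batch: list[bytes]) -> int:
    # Hypothetical CPU-side repackaging: compress the concatenated batch.
    return len(zlib.compress(b"".join(batch)))


def pipeline(keys, n_downloaders=8, batch_size=16, max_in_flight=64):
    """Download concurrently, process in batches; return (objects, compressed bytes)."""
    key_q: queue.Queue = queue.Queue()
    for k in keys:
        key_q.put(k)
    # Bounded queue: downloaders block when compute falls behind (backpressure).
    downloaded: queue.Queue = queue.Queue(maxsize=max_in_flight)
    done = object()  # sentinel marking the end of the stream

    def downloader():
        while True:
            try:
                k = key_q.get_nowait()
            except queue.Empty:
                return
            downloaded.put(fetch_object(k))

    workers = [threading.Thread(target=downloader) for _ in range(n_downloaders)]
    for w in workers:
        w.start()

    def closer():
        # Signal the consumer once every downloader has finished.
        for w in workers:
            w.join()
        downloaded.put(done)

    threading.Thread(target=closer).start()

    n_objects, total, batch = 0, 0, []
    while True:
        item = downloaded.get()
        if item is done:
            break
        n_objects += 1
        batch.append(item)
        if len(batch) == batch_size:
            total += process(batch)  # compute overlaps with ongoing downloads
            batch = []
    if batch:
        total += process(batch)
    return n_objects, total


if __name__ == "__main__":
    t0 = time.perf_counter()
    n, size = pipeline([f"part-{i:05d}" for i in range(200)])
    print(n, "objects,", size, "compressed bytes in", round(time.perf_counter() - t0, 2), "s")
```

With enough downloaders in flight, wall time tracks the download side rather than the compute side, which is the property that makes ingestion time predictable from peak download speed.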

Now, it might just be that Redshift has bugs and I should report them, but in my experience AWS has not reacted quickly to any of the reports I've made.

I disagree; it's not a me problem. After everything I've written, I am a bit surprised that you're still implying I want OLTP and am using the wrong tool for the job. There are just some tools I would never pick because they don't work as advertised, and Redshift is one of them. There are much better in-memory compute engines that work directly with S3, and you can build any kind of trash low-value pipeline with them. If you reach the memory limits of your compute system, there are much better compute engine + storage combos than Redshift. My belief is that Redshift is purely a nontechnical choice.

Now, to steelman you, if you're saying:

* data warehouse as managed service,

* cost efficiency via guardrails,

* scale by policy, not by expertise,

* optimize for nontechnical teams,

* hide the machinery,

* use AWS-native bloated, slow or expensive glue (Glue, Athena, Kinesis, DMS),

* predictable monthly bill,

* preventing S3 abuse,

* preventing runaway parallelism,

* avoiding noisy-neighbor incidents (either by protecting me or protecting AWS infra),

* intentionally constrained to satisfy all of the above,

then yes, I agree, I am definitely using the wrong tool. But as I said, if the value proposition is nontechnical, I do not really care about that.

raw_anon_1111 · 2026-01-21T14:12:38

> My own assessment is that I'm extremely skilled at making any kind of DB system yield to my will and get it to its limits.

Yes, and according to my assessment I’m also very good in bed and extremely handsome.

But there is an existence proof: you are running into issues, yet millions of people use AWS services and know how to pick the right tool for the job.

I’m not defending Redshift for your use case; I’m saying you didn’t do your research and you did absolutely everything wrong. From my cursory research of ClickHouse, I probably would have chosen that too for your use case.

vjerancrnjak · 2026-01-21T18:37:28

I did not do anything wrong. I had no choice about Redshift; the instructions came from above. I made it work really well for what it can do, and I was surprised how much it sucks even when its own data is inside it and it only has to do compute. As a completely closed system, it's not impressive at all. It has absolutely shameful group-by SQL and completely inefficient sort-key and compression semantics, and it absolutely can't attach itself to Kinesis directly without costing you insane amounts of money. As you already know, Redshift is not a live service (you won't connect directly to it and expect good performance); it's primarily a parallel compute engine.

Your assessment of me is flawed. You haven't really shown any kind of low-level expertise in how these systems actually work; you've just name-dropped OLTP and OLAP as if that means anything at all. What is Timescale (now TigerData), OLTPOLAPBLAPBLAP? If someone tells you to use Timescale, you have to figure out how to use it and make the system yield to your will. If the system sucks, it yields harder; if the system is well designed, it's absolutely beautiful. For example, I would never use Timescale either, yet you can go to their page and see unicorns using it. I have no idea why, but let them have their fun. There are successful companies using Elasticsearch for IoT telemetry, so who am I to argue? I wouldn't do that either.

There's nothing wrong with using PostgreSQL for time-series data; you just need to know how to use it. At some point, scaling-wise, it will fail, but you're deciding on tradeoffs.

So yes, my assessments have a good track record, not only of myself but of others as well. I am extremely open to any kind of precise criticism, I have been wrong a bazillion times, and I take part in these kinds of passionate discussions on the internet because I know I can absolutely be convinced of the other side. Otherwise, I would have quit a long time ago.