Is there really a market for these kinds of relational tables?
I created a system to support my custom object store in which the metadata tags are stored in key-value stores. I can use them to build relational tables and query them just as you would the conventional row stores used by many popular database engines.
My 'columnar store database' can handle many thousands of columns within a single table. So far I have only tested it up to 10,000 columns, but it should handle many more.
I can get sub-second query times against it running on a single desktop. I haven't promoted this feature because no one I have talked to about it has had a compelling use for it.
A concrete case where this comes up is multi-omics research. A single study routinely combines ~20k gene expression values, 100k–1M SNPs, thousands of proteins and metabolites, plus clinical metadata — all per patient.
Today, this data is almost never stored in relational tables. It lives in files and in-memory matrices, and a large part of the work is repeatedly rebuilding wide matrices just to explore subsets of features or cohorts.
In that context, a “wide table” isn’t about transactions or joins — it’s about having a persistent, queryable representation of a matrix that already exists conceptually. Integration becomes “load patients”, and exploration becomes SELECT statements.
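For illustration only (the table and column names here are hypothetical, not anything you actually have to support), the kind of exploration I mean would boil down to queries like:

```sql
-- Hypothetical wide table: one row per patient, one column per feature
-- (clinical fields, gene expression, SNPs, proteins, metabolites, ...).
SELECT patient_id,
       age,
       diagnosis,
       expr_TP53,
       expr_BRCA1,
       snp_rs429358,
       prot_CRP
FROM   omics_wide
WHERE  diagnosis = 'TNBC'
  AND  expr_TP53 > 2.5;
```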
I’m not claiming this fits every workload, but based on how much time is currently spent on data reshaping in multi-omics, I’m confident there is a real need for this kind of model.
Interesting. Are you willing to try out some 'experimental' software?
As I indicated in my previous post, I have a unique kind of data management system that I have built over the years as a hobby project.
It was originally designed to be a replacement for conventional file systems. It is an object store in which you can store millions or billions of files in a single container and attach metadata tags to each one. Searches for data can be based on these tags. I had to design a whole new kind of metadata manager to handle these tags.
Since thousands or millions of different kinds of tags can be defined, each with thousands or millions of unique values, the whole system started to look like a very wide, sparse relational table.
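As a very rough sketch of what I mean (illustration only; the table and tag names are made up, and my actual storage is a set of per-tag key-value stores rather than a single SQL table), each tag behaves like one sparse column, and pulling a few tags back together gives you a narrow slice of that wide table:

```sql
-- Illustration only: the tag metadata modelled as a single
-- (object_id, tag, value) mapping instead of the real per-tag stores.
CREATE TABLE object_tags (
    object_id BIGINT,
    tag       TEXT,
    value     TEXT
);

-- Each distinct tag behaves like one sparse column; re-assembling a few
-- of them per object yields a narrow slice of the very wide table.
SELECT object_id,
       MAX(CASE WHEN tag = 'author'    THEN value END) AS author,
       MAX(CASE WHEN tag = 'mime_type' THEN value END) AS mime_type,
       MAX(CASE WHEN tag = 'project'   THEN value END) AS project
FROM   object_tags
GROUP  BY object_id;
```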
I found that I could use the individual 'columnar stores' I had built to also construct conventional database tables. I was actually surprised at how well it worked when I started benchmarking it against popular database engines.
I would test my code by downloading and importing various public datasets and then doing analytics against that data. My system does both analytic and transactional operations pretty well.
Most of the datasets had only a few dozen columns, and many had millions of rows, but I didn't find any with over a thousand columns.
As I said before, I had previously only tested it up to 10,000 columns. But since reading your original question, I have started experimenting with much larger numbers of columns.
After tweaking the code, I got it to create tables with up to a million columns and add some random test data to them. A 'SELECT *' query against such a table can take a long time, but queries that return only a few dozen of the columns run very fast.
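To give a concrete feel for the difference (hypothetical table and column names again), the slow and fast cases look like this:

```sql
-- Slow: has to touch every one of the roughly one million column stores.
SELECT * FROM wide_test;

-- Fast: only the column stores actually named in the query are read,
-- no matter how many columns the table defines.
SELECT col_000042, col_512000, col_999999
FROM   wide_test
WHERE  col_000042 IS NOT NULL
LIMIT  100;
```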
How many patients were represented in your dataset? I assume that most rows did not have a value in every column.