>But in reality looks like parsing CSV string works faster than bloated and overengineered parquet format with libs.
Anecdotally having worked with large CSVs and large on-disk Parquet datasets, my experience is the opposite of yours. My DuckDB queries operate directly on Parquet on disk and never load the entire dataset, and is always much faster than the equivalent operation on CSV files.
I think your experience might be due to -- what it sounds like -- parsing the entire CSV into memory first (CREATE TABLE) and then processing after. That is not an apples-to-apples comparison because we usually don't do this with Parquet -- there's no CREATE TABLE step. At most there's a CREATE VIEW, which is lazy.
I've seen your comments bashing Parquet in DuckDB multiple times, and I think you might be doing something wrong.
> I think your experience might be due to -- what it sounds like -- parsing the entire CSV into memory first (CREATE TABLE) and then processing after. That is not an apples-to-apples
original discussion was about CSV vs parquet "reader" part, so this is exactly apple to apple testing, easy to benchmark and I stand my ground. What you are doing downstream, it is another question which is not possible to discuss because no code for your logic is available.
> I've seen your comments bashing Parquet in DuckDB multiple times, and I think you might be doing something wrong.
like running one command from DuckDB doc.
Also, I am not "bashing", I just state that CSV reader is faster.
Anecdotally having worked with large CSVs and large on-disk Parquet datasets, my experience is the opposite of yours. My DuckDB queries operate directly on Parquet on disk and never load the entire dataset, and is always much faster than the equivalent operation on CSV files.
I think your experience might be due to -- what it sounds like -- parsing the entire CSV into memory first (CREATE TABLE) and then processing after. That is not an apples-to-apples comparison because we usually don't do this with Parquet -- there's no CREATE TABLE step. At most there's a CREATE VIEW, which is lazy.
I've seen your comments bashing Parquet in DuckDB multiple times, and I think you might be doing something wrong.