
Why is .parquet better than protobuf?


Parquet is columnar storage, which is much faster for queries that only touch a few fields. With protobuf you typically deserialize row by row, which has a performance cost: you have to deserialize the whole message and can't get just the field you want.

So, if you want to query a giant collection of protobufs, you end up reading and deserializing every record. With Parquet, you get much closer to only reading what you need.
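
To make the difference concrete, here's a toy sketch in plain Python. It is not real Parquet or protobuf: JSON blobs stand in for per-record serialized messages, packed arrays stand in for column chunks, and every name in it (`records`, `row_blobs`, `columns`, the two functions) is made up for illustration.

```python
import json
import struct

# Hypothetical sample data: 1,000 records with three fields each.
records = [{"id": i, "name": f"user{i}", "score": i * 1.5} for i in range(1000)]

# Row-oriented layout (stand-in for a file of serialized messages):
# each record is one blob, so reading one field means decoding the whole record.
row_blobs = [json.dumps(r).encode() for r in records]


def sum_scores_row_oriented(blobs):
    """Sum the 'score' field by decoding every record in full."""
    bytes_read = 0
    total = 0.0
    for blob in blobs:
        bytes_read += len(blob)
        total += json.loads(blob)["score"]  # also decodes id and name
    return total, bytes_read


# Column-oriented layout (stand-in for a columnar file):
# each field is stored contiguously, so one column can be read on its own.
n = len(records)
columns = {
    "id": struct.pack(f"<{n}q", *(r["id"] for r in records)),
    "name": "\n".join(r["name"] for r in records).encode(),
    "score": struct.pack(f"<{n}d", *(r["score"] for r in records)),
}


def sum_scores_column_oriented(cols, count):
    """Sum the 'score' field by reading only the score column."""
    blob = cols["score"]  # id and name are never touched
    return sum(struct.unpack(f"<{count}d", blob)), len(blob)


row_total, row_bytes = sum_scores_row_oriented(row_blobs)
col_total, col_bytes = sum_scores_column_oriented(columns, n)
print(f"row scan:    total={row_total}, bytes read={row_bytes}")
print(f"column scan: total={col_total}, bytes read={col_bytes}")
```

On this made-up data the column scan reads 8,000 bytes (1,000 packed doubles), while the row scan has to read and decode every record in full to get the same answer. Real Parquet adds compression, encodings, and row-group statistics on top, so treat this only as the basic idea.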


Thank you.


Parquet ~= Dremel, for those who are up on their Google stack.

Dremel was pretty revolutionary when it went into production at Google in 2006 (the paper describing it was published in 2010) - you could run ad-hoc analyses in seconds that previously would've taken a couple of days of coding & execution time. Parquet is awesome for the same reasons.



