
Why do you assume it's an "outside" view? I was freelancing for many years on projects with "Data Scientist" / "Data Engineer" in my job title, having been hired by managers who heard about "Big Data" at an event by KPMG or Accenture or whatever.

And I don't understand why you're reading "boutique bit-optimized C++ crap" into "most basic and obvious optimizations".

One of those most basic and obvious optimizations is to avoid reading a dataframe into memory in its entirety when the math you want to do on top of it can actually be done as a running accumulator while reading the data from a stream. This is possible in 90% of realistic use cases, but the fraction of software written back then that took advantage of this was shockingly small. Solving the problem by buying more machines, chopping the dataframe into smaller pieces, and farming out the payload through Hadoop had management buy-in. Yet, for some reason, doing the sane thing, namely rewriting poorly written software, didn't.
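To make the "running accumulator" idea concrete, here is a minimal sketch, assuming a CSV input with a hypothetical numeric column named `amount`. It computes count, mean, and sample variance in constant memory using Welford's online algorithm. That algorithm is chosen here for illustration and is not something the comment prescribes. Only one row is held in memory at a time, so the table never has to fit in RAM.

```python
import csv
import io
from typing import TextIO


class RunningStats:
    """Welford's online algorithm: count, mean and variance in O(1) memory."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        # Fold one value into the accumulator without storing it.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        # Sample variance; undefined (NaN) for fewer than two values.
        return self._m2 / (self.n - 1) if self.n > 1 else float("nan")


def stream_column_stats(f: TextIO, column: str) -> RunningStats:
    """Read a CSV row by row and accumulate stats for one numeric column.

    `column` is a hypothetical column name used for illustration.
    """
    stats = RunningStats()
    for row in csv.DictReader(f):
        stats.update(float(row[column]))
    return stats


if __name__ == "__main__":
    data = io.StringIO("id,amount\n1,2.0\n2,4.0\n3,6.0\n")
    s = stream_column_stats(data, "amount")
    print(s.n, s.mean, s.variance)  # 3 4.0 4.0
```

The same pattern works with a file handle opened on a multi-gigabyte CSV. Memory use stays flat regardless of file size, which is the point: no cluster is needed for an aggregate that can be computed in a single pass.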
