
Now you have to consider the cost of your whole team learning how to use AWK instead of SQL. Then you do the TCO calculation and revert to the BigQuery solution.


Not necessarily. I always try to write to disk first, usually in a rotating compressed format if possible. Then, based on something like a queue, cron, or inotify, other tasks occur, such as processing and database logging. You still end up at the same place, and this approach works really well with tools like jq when the raw data is in jsonl format.

The only time this becomes an issue is when the data needs to be processed as close to real-time as possible. In those instances, I still tend to log the raw data to disk in another thread.
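
A minimal sketch of that pattern, assuming newline-delimited JSON and hypothetical file/field names:

    # append raw events to disk first (rotate/compress separately)
    echo '{"user":"alice","ms":42}' >> events.jsonl

    # later, a cron- or inotify-triggered task aggregates with jq:
    # slurp the file into an array and sum the "ms" field
    jq -s 'map(.ms) | add' events.jsonl

The downstream task can just as easily load the same file into a database.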


For someone who is comfortable with SQL, we are talking minutes to hours to figure out awk well enough to read it or use it.


I have been using SQL for decades and I am not comfortable with awk, nor do I intend to become so. There are better tools.


It is not only about whether people can figure out awk. It is also about how supportable the solution is. SQL provides many features built specifically for complex querying, and it is far more accessible to most people - you can't reasonably expect your business analysts to do complex analysis in awk.

Not only that, it provides a useful separation from the storage format, so the same query can run against a flat file exposed as a table by Apache Drill, a file on S3 exposed by Athena, data in an actual database table, and so on. The flexibility is terrific.
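
As a sketch of that separation (assuming a stock Apache Drill install with its default dfs storage plugin, and a hypothetical headerless /tmp/events.csv), the flat file is queryable with no load step at all:

    -- Drill exposes headerless CSV fields as the `columns` array
    SELECT columns[0] AS user_id, COUNT(*) AS hits
    FROM dfs.`/tmp/events.csv`
    GROUP BY columns[0];

Point the FROM clause at S3 via Athena, or at a real table, and the query barely changes.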


With the exception of regexes (which any programmer or data analyst ought to develop some familiarity with anyway), you can describe the entirety of AWK on a few sheets of paper. It's a versatile, performant, and enduring data-handling tool that is already installed on all your servers. You would be hard-pressed to find a better investment in technical training.
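
The whole language is one pattern-action loop over lines and fields. A sketch against a hypothetical space-delimited access log:

    # for each line: if field 3 (the status code) is 500, run the action;
    # the END block runs once after the last line
    awk '$3 == 500 { n++ } END { print n " server errors" }' access.log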


No, if you want SQL, you install PostgreSQL on the single machine.

Why would you use BigQuery just to get SQL?
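
It's a one-time setup; a sketch, assuming a local PostgreSQL install and a hypothetical events.csv:

    createdb analytics
    psql analytics -c "CREATE TABLE events (user_id text, ms int);"
    # \copy streams the file through the client, no server file access needed
    psql analytics -c "\copy events FROM 'events.csv' CSV HEADER"
    psql analytics -c "SELECT user_id, avg(ms) FROM events GROUP BY user_id;"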


sqlite cli
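
Even less setup, if the data fits in one file; a sketch assuming sqlite3 3.32+ (for the --csv flag) and a hypothetical events.csv with a header row:

    # .import into a nonexistent table uses the header row as column names
    sqlite3 :memory: <<'EOF'
    .import --csv events.csv events
    SELECT user_id, COUNT(*) FROM events GROUP BY user_id;
    EOF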


About $20/month for ChatGPT or a similar copilot, which they really should be reaching for independently anyhow.


And since the data scientist cannot verify that the very complex AWK output is 100% equivalent to his SQL query, he ends up relying on the GPT output for business-critical analysis.


Only if your testing frameworks are inadequate. But I believe you may be missing or mistaken about how code generation successfully integrates into a developer's and data scientist's workflow.

Why not take a few days to get familiar with AWK, a skill which will last a lifetime? Like SQL, it really isn't so bad.


It is easier to write complex queries in SQL than in AWK. I know both AWK and SQL, and I find SQL much easier for complex data analysis: JOINs, subqueries, window functions, and so on. Of course, your mileage may vary, but I think most data scientists will be much more comfortable with SQL.
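
To make that concrete, here is the same per-user average both ways, assuming a hypothetical two-column input of user and latency:

    -- SQL: declare the grouping, let the engine do the work
    SELECT user_id, AVG(ms) FROM events GROUP BY user_id;

    # awk: build the same aggregation by hand with associative arrays
    awk '{ sum[$1] += $2; n[$1]++ }
         END { for (u in sum) print u, sum[u] / n[u] }' events.txt

Add a join or a window function and the gap widens considerably.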


Many people have noted that when using LLMs for things like this, the person’s ultimate knowledge of the topic ends up less than it otherwise would have been.

This effect then leaves the person reliant on the LLM to answer every question, and less capable of working through the topic’s more complex issues.

$20/month is a siren’s call to introduce such a dependency into critical systems.



