towelpluswater's comments

Yep, we’re here. I think lots of us just don’t often post.


Modular’s Mojo is the best-funded effort, with plenty of respected people behind it, for making an alternative possible.


Check out Hidet [1]. Not as well funded, but it delivers Python-based ML acceleration with GPU support (unlike Mojo).

[1] https://github.com/hidet-org/hidet


I mean they’re building the labeled dataset right now by having creators label it for them.

I would suspect this helps make moderation models better at estimating confidence levels for AI-generated content that isn’t labeled as such (i.e. for deception).

Surprised we aren’t seeing more of this in labeling datasets for this new world (outside of captchas)


Think it’s “vibes” based search (aka dense embedding similarity search)
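For anyone unfamiliar with the term, here’s a rough sketch of what dense embedding similarity search boils down to: rank items by cosine similarity between a query vector and each item’s vector. The 3-dimensional vectors, the item names, and the `search` helper are all made up for illustration; a real system would get its vectors from an embedding model and likely use an approximate nearest-neighbor index.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vec, index, top_k=2):
    """Return the top_k item names most similar to query_vec (brute force)."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:top_k]]


# Hypothetical toy "embeddings"; real ones have hundreds of dimensions.
TOY_INDEX = {
    "cozy rainy-day jazz": [0.9, 0.1, 0.0],
    "upbeat workout mix": [0.1, 0.9, 0.1],
    "mellow coffee shop playlist": [0.8, 0.2, 0.1],
}

# A made-up query vector standing in for something like "relaxed background music".
results = search([1.0, 0.0, 0.0], TOY_INDEX)
print(results)
```

The point is that nothing is matched on keywords; the two "relaxed" items rank first only because their vectors point in roughly the same direction as the query.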


I’ve never understood why people want a more verbose version of SQL.

I think what people really want is business rules and data cleaning and schema discovery.

If you had to use English against multiple source systems and tons of joins, the sentence would run to paragraphs.

Where I think there’s value is in using something like a data catalog to label business rules against a data warehouse, tied to dashboard queries and other common ones.

But that’s a hard problem, the model is unique to every customer, and it’s always changing.


Combining schema discovery and a data catalog seems like it might be a hard problem requiring a lot of LLM prompt engineering gymnastics, but maybe I underestimate the state of the art.


This is a fantastic write-up and a great parallel to where we’re headed.


This is a really great idea and use case. It also makes a ton of sense as a pilot use case for this type of open source project given extensions are smaller in scope.

I mean even having it document a best draft of what the extension code is doing would be awesome.

Unless it’s made into an extension and then you have a recursive hell.


Bought a copy! Your posts and newsletter content have been such a huge inspiration for me throughout 2023. Good luck, this is a huge effort!


thanks for the kind words!


I think the bigger problem is that replication/ingestion (i.e. what Fivetran does) has come to represent 'ELT'. Likely by design.

And you don't need that pesky transformation part.

Except you really do, when you get beyond having a source system or two.


I'd love an invite if you still have any.

