
towelpluswater · on March 10, 2025

Yep, we’re here. I think there are lots of us who don’t post often.

towelpluswater · on April 9, 2024

Modular's Mojo is the best funded and is full of respected players, which gives it the best shot at making an alternative possible.

pavelstoev · on April 9, 2024

Check out Hidet [1]. It's not as well funded, but it delivers Python-based ML acceleration with GPU support (unlike Mojo).

[1] https://github.com/hidet-org/hidet

towelpluswater · on March 19, 2024

I mean they’re building the labeled dataset right now by having creators label it for them.

I suspect this helps make moderation models better at estimating confidence levels for AI-generated content that isn’t labeled as such (i.e., content meant to deceive).

I'm surprised we aren’t seeing more of this approach to labeling datasets for this new world (outside of CAPTCHAs).

towelpluswater · on Feb 29, 2024

I think it’s “vibes”-based search (aka dense embedding similarity search).
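A minimal sketch of what dense embedding similarity search means, assuming the documents and the query have already been turned into vectors by some embedding model. The toy three-dimensional vectors and the names `cosine_similarity` and `dense_search` are hypothetical; the idea is to rank documents by how close their vectors are to the query vector rather than by keyword overlap.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def dense_search(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k documents whose embeddings are most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical toy "embeddings"; a real system would get these from an embedding model.
docs = {
    "cozy cabin": [0.9, 0.1, 0.0],
    "beach house": [0.1, 0.9, 0.1],
    "mountain lodge": [0.8, 0.2, 0.1],
}
query = [0.85, 0.15, 0.05]
print(dense_search(query, docs, k=2))
```

No keyword has to match: the "vibes" of the query are whatever direction its vector points in, and the nearest neighbors win.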

towelpluswater · on Feb 29, 2024

I’ve never understood why people want a more verbose version of SQL.

I think what people really want is business rules, data cleaning, and schema discovery.

If you had to use English against multiple source systems with tons of joins, the "sentence" would run to paragraphs.

Where I think there’s value is in using something like a data catalog to label business rules against a data warehouse, tied to dashboard queries and other common ones.

But that’s a hard problem, the model is unique to every customer, and it's always changing.

staticautomatic · on Feb 29, 2024

Combining schema discovery with a data catalog seems like a hard problem that would require a lot of LLM prompt-engineering gymnastics, but maybe I'm underestimating the state of the art.

towelpluswater · on Feb 16, 2024

This is a fantastic write-up and a great parallel to where we’re headed.

towelpluswater · on Feb 5, 2024

This is a really great idea and use case. It also makes a ton of sense as a pilot use case for this type of open-source project, given that extensions are smaller in scope.

I mean, even having it write a best-effort draft documenting what the extension code is doing would be awesome.

Unless it’s made into an extension itself, and then you have recursive hell.

towelpluswater · on Jan 29, 2024

Bought a copy! Your posts and newsletter content have been such a huge inspiration for me throughout 2023. Good luck, this is a huge effort!

rasbt · on Jan 30, 2024

Thanks for the kind words!

towelpluswater · on Dec 30, 2023

I think the bigger problem is that replication/ingestion (i.e., what Fivetran does) has come to represent 'ELT', likely by design.

And you don't need that pesky transformation part.

Except you really do, when you get beyond having a source system or two.
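A minimal sketch of why the transformation step matters once there is more than one source. Everything here is hypothetical: the two replicated tables, their column names, the `normalize_email` business rule, and the `transform` function are made up for illustration. A naive join on the raw email columns matches nothing; a small conforming step lines the records up.

```python
# Rows as they might land after pure replication ("EL"): two hypothetical source
# systems describing the same customers with different column names and formats.
crm_rows = [
    {"CustomerEmail": "Ada@Example.com ", "FullName": "Ada Lovelace"},
    {"CustomerEmail": "grace@example.com", "FullName": "Grace Hopper"},
]
billing_rows = [
    {"email_addr": "ada@example.com", "mrr_cents": 4900},
    {"email_addr": "GRACE@EXAMPLE.COM", "mrr_cents": 9900},
]


def normalize_email(raw: str) -> str:
    """Assumed business rule: emails match case-insensitively, ignoring surrounding whitespace."""
    return raw.strip().lower()


def transform(crm: list[dict], billing: list[dict]) -> list[dict]:
    """The 'T' in ELT: conform keys and column names, then join across sources."""
    revenue = {normalize_email(r["email_addr"]): r["mrr_cents"] for r in billing}
    out = []
    for r in crm:
        key = normalize_email(r["CustomerEmail"])
        # .get() leaves mrr_cents as None for customers missing from billing
        out.append({"email": key, "name": r["FullName"], "mrr_cents": revenue.get(key)})
    return out


# Joining on the raw values finds no overlap at all.
raw_overlap = {r["CustomerEmail"] for r in crm_rows} & {r["email_addr"] for r in billing_rows}
print("raw overlap:", raw_overlap)

for row in transform(crm_rows, billing_rows):
    print(row)
```

With one source this step barely exists; every additional system adds another set of these rules, which is exactly the part replication alone doesn't cover.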

towelpluswater · on Sept 22, 2023

I'd love an invite if you still have any.