meowkit's comments | Hacker News

Zero-knowledge proof schemes

Applied ZKPs are being actively worked on in the blockchain sphere.


“A zero-knowledge rollup (zk-rollup) is a layer-2 scaling solution that moves computation and state off-chain into off-chain networks while storing transaction data on-chain on a layer-1 network (for example, Ethereum). State changes are computed off-chain and are then proven as valid on-chain using zero-knowledge proofs.”
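To make the primitive concrete, here is a minimal sketch of a Schnorr-style proof of knowledge of a discrete log in Python (toy parameters of my choosing; production rollups use SNARK/STARK systems, but the core idea of proving a statement without revealing the witness is the same):

    import hashlib, secrets

    # Toy safe-prime group: p = 2q + 1, and g = 4 generates the order-q
    # subgroup. Real systems use ~256-bit groups; these are tiny on purpose.
    p, q, g = 2039, 1019, 4

    def challenge(y, t):
        # Fiat-Shamir: derive the challenge by hashing the transcript.
        h = hashlib.sha256(f"{g}:{y}:{t}".encode()).digest()
        return int.from_bytes(h, "big") % q

    def prove(x):
        # Prover knows x with y = g^x mod p and proves it without revealing x.
        y = pow(g, x, p)
        r = secrets.randbelow(q)   # one-time nonce
        t = pow(g, r, p)           # commitment
        s = (r + challenge(y, t) * x) % q
        return y, (t, s)

    def verify(y, proof):
        t, s = proof
        # g^s == t * y^c holds iff the prover knew x; x itself never travels.
        return pow(g, s, p) == (t * pow(y, challenge(y, t), p)) % p

    y, proof = prove(secrets.randbelow(q))
    assert verify(y, proof)

A rollup does the same dance at scale: the "statement" is an entire batch of state transitions, and the L1 contract plays the verifier.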

We are not talking about inference.

The prompts and responses are used as training data. Even if your provider allows you to opt out, they are still tracking your usage telemetry and using that to gauge performance. If you don't own the storage and compute, then you are training the tools that will be used to oppress you.

Incredibly naive comment.


> The prompts and responses are used as training data.

They show a clear pop-up where you choose whether or not to allow your data to be used for training. If you don't choose to share it, it's not used.

I mean I guess if someone blindly clicks through everything and clicks "Accept" without clicking the very obvious slider to turn it off, they could be caught off guard.

Assuming everyone who uses Claude is training their LLMs is just wrong, though.

Telemetry data isn't going to extract your codebase.


"If you don't choose to share it, it's not used"

I am curious where your confidence that this is true comes from.

Besides lots of GPUs, training data seems to be the most valuable asset AI companies have. That sounds like a strong incentive to secretly use it anyway. Who would really know, if the pipelines are set up so that only very few people are aware of this?

And if it comes out, "oh gosh, one of our employees made a mistake".

And they already admitted to training on pirated content. So maybe they learned their lesson... maybe not, as they are still making money and want to continue to lead the field.


My confidence comes from the following:

1. There are good, ethical people working at these companies. If you were going to train on customer data that you had promised not to train on there would be plenty of potential whistleblowers.

2. The risk involved in training on customer data that you are contractually obliged not to train on is higher than the value you can get from that training data.

3. Every AI lab knows that the second it comes out that they trained on paying customer data after saying they wouldn't, those paying customers will leave for their competitors (and sue them in the bargain).

4. Customer data isn't actually that valuable for training! Great models come from carefully curated training data, not from just pasting in anything you can get your hands on.

Fundamentally I don't think AI labs are stupid, and training on paid customer data that they've agreed not to train on is a stupid thing to do.


1. The people working for these companies are already demonstrably ethically flexible enough to pirate any publicly accessible training data they can get their hands on, including but not limited to ignoring the license information in every repo on GitHub. I'm not impressed with any of these clowns and I wouldn't trust them to take care of a potted cactus.

2. The risk of using "illegal" training data is irrelevant, because no GenAI vendors have been meaningfully punished for violating copyright yet, and in the current political climate they don't expect to be anytime soon. Even so,

3. Presuming they get caught red-handed using personal data without permission (which, given the nature of LLMs, would be extremely challenging for any individual customer to prove definitively), they may lose customers, and customers may try to sue, but you can expect those lawsuits to take years to work their way through the courts; long after these companies IPO, employees get their bag, and it all becomes someone else's problem.

4. The idea of using carefully curated datasets is popular rhetoric, but absolutely does not reflect how the biggest GenAI vendors do business. See (1).

AI labs are extremely shortsighted, sloppy, and demonstrably do not care a single iota about the long term when there's money to be made in the short term. Employees have gigantic financial incentives to ignore internal malfeasance or simple ineptitude. The end result is, if anything, far worse than stupidity.


There is an important difference between openly training on scraped web data and license-ignored data from GitHub and training on data from your paying customers that you promised you wouldn't train on.

Anthropic had to pay $1.5bn after being caught downloading pirated ebooks.


So Anthropic had to pay less than 1% of their valuation despite approximately their entire business being dependent on this and similar piracy. I somehow doubt their takeaway from that is "let's avoid doing that again".


Two things:

First: Valuations are based on expected future profits.

For a lot of companies, 1% of valuation is ~20% of annual profit (that's a P/E ratio of 20: earnings are 5% of the valuation, and 1% / 5% = 20%); for fast-growing companies, or companies where the market is anticipating growth, it can be a lot higher. Weird outlier example here, but consider that if Tesla was fined 1% of its valuation (1% of 1.5 trillion = 15 billion), that would be most of the last four quarters' profit on https://www.macrotrends.net/stocks/charts/TSLA/tesla/gross-p...

Second: Part of the Anthropic case was that many of the books they trained on were ones they'd purchased and destructively scanned, not just pirated. The courts found this use was fine, and Anthropic had already done this before being ordered to: https://storage.courtlistener.com/recap/gov.uscourts.cand.43...


Their main takeaway was that they should legally buy paper books, chop the spines off and scan those for training instead.


Every single point you made is contradicted by the observed behavior of the AI labs. If any of those factors were going to stop them from training on data they legally can't, they would have done so already.


> I am curious where your confidence that this is true comes from.

My confidence comes from working in big startups and big companies with legal teams. There's no way the entire company is going to gather all of the engineers and everyone around, have them code up a secret system to consume customer data into a secret part of the training set, and then have everyone involved keep quiet about it forever.

The whistleblowing and leaking would happen immediately. We've already seen LLM teams leak and have people try to whistleblow over things that aren't even real, like the Google engineer who thought LaMDA had become sentient a few years ago (lol). OpenAI had a public meltdown when the employees disagreed with Sam Altman's management style.

So my question to you is: What makes you think they would do this? How do you think they'd coordinate the teams to keep it all a secret and only hire people who would take this secret to their grave?


"There's no way the entire company is going to gather all of the engineers and everyone around, have them code up a secret system "

No, that is why I wrote

"Who would really know, if the pipelines are set up in a way, that only very few people are aware of this?" (Typo fixed)

There is no need for everyone to know. I don't know their processes, but I can think of ways to involve only the very few people who need to know.

The rest just work on everything else: some with data, where they don't need to know where it came from; some on UI; some on scaling up; and so on. They all don't need to know that DB XYZ comes from a dark source.


> I am curious where your confidence that this is true comes from.

We have a legally binding contract with Anthropic, checked and vetted by our lawyers, who are annoying because they actually READ the contracts and won't let us use services with suspicious clauses in them, unless we can make amendments.

If they're found to be in breach of said contract (which is what every paid user of Claude signs), Anthropic is going to be the target of SO FUCKING MANY lawsuits even the infinite money hack of AI won't save them.


Are you referring to the standard contract/terms of use, or does your company have a special contract with them?


Usually we have the standard contract if Legal approves.

We have stopped using major services because of their TOS wording, Midjourney being one.


> Besides lots of GPUs, training data seems to be the most valuable asset AI companies have. That sounds like a strong incentive to secretly use it anyway. Who would really know, if the pipelines are set up so that only very few people are aware of this?

Could be, but it's a huge risk the moment any lawsuit happens and the "discovery" process starts. Or whistleblowers.

They may well take that risk, they're clearly risk-takers. But it is a risk.


Eh, they're all using copyrighted training data from torrent sites anyway. If the government was gonna hold them accountable for this, it would have happened already.



The piracy was found to be unlawful copyright infringement.

The training was OK, but the piracy wasn't; they were held accountable for that.


the US no longer has any form of rule of law

so there's no risk


> the US no longer has any form of rule of law

AI threads really bring out the extreme hyperbole and doomerism.


The USA is a mess that's rapidly getting worse, but it has not yet fallen that far.


Spotify only streams 16-bit lossless as far as I have seen (though they claim 24-bit in this post). Might require artists to re-upload the audio?

Tidal had many more 24-bit options when I did an A/B comparison.

The dynamic range difference is very material on quality sound setups.

As a side note, Bluetooth (at least for AirPods) only does 16-bit!


There's no benefit to more than 16 bits. 16-bit allows for a dynamic range of 96 dB, and no music is mastered anywhere near that dynamic range.

24-bit helps in production pipelines for mixing, but for end user playback it's pointless.
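The arithmetic behind those figures is easy to check (a quick sketch; the dynamic range of N-bit PCM is 20 * log10(2^N), roughly 6.02 dB per bit):

    from math import log10

    # Dynamic range of an N-bit PCM signal: 20 * log10(2**N) dB.
    for bits in (16, 24):
        print(f"{bits}-bit: {20 * log10(2 ** bits):.1f} dB")
    # 16-bit: 96.3 dB
    # 24-bit: 144.5 dB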


Maybe pointless, but if provided why not?


By the same logic, if pointless, why?


As quoted from the OP.

> 24-bit helps in production pipelines for mixing, but for end user playback it's pointless.

If you have two versions of something, where one is better than the other and the resource cost is more or less the same, it makes more sense to provide the better one than the worse.

Maybe the end user takes an interest in mixing/production, in which case they already have the higher-quality version to work with, without the faff of having to obtain the better-quality works separately. The end user won't know the difference, and the new apprentice has a copy they can work with.

That's not a loss, that's a benefit even if pointless to the end user.


> Maybe the end-user takes interest in mixing/production for which they then have the higher version allowing them to work with without the faff of having to obtain the better quality works.

16-bit is enough for mixing. 24-bit (or, even better, 32-bit float) is useful _within_ the mixing pipeline, so you don't need to care if one of the steps results in clipping, as long as the final result is within bounds.
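A small numpy sketch of that point (the gain values here are made up for illustration):

    import numpy as np

    x = np.linspace(-1.0, 1.0, 8, dtype=np.float32)  # full-scale signal

    hot = x * 4.0  # an intermediate step boosts well past full scale

    # Stored as 16-bit integers, the overs saturate and the damage is permanent:
    clipped = np.clip(hot * 32767.0, -32768, 32767).astype(np.int16)
    assert clipped.max() == 32767

    # Stored as floats, the overs survive; pull the gain back before the
    # final render and nothing is lost:
    assert np.allclose(hot / 4.0, x)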


Because it's a complete waste of bandwidth.


Your example assumes there would be sufficient liquidity on that bet. The existing platforms aren't houses or market makers that just provide functionally infinite liquidity on any bet. The “win” criterion in this example is so specific that verification becomes its own problem.

In theory a fun example, but practically it doesn’t play out the way you’re describing.


Where's the whooshing joke JPEG when you need it?


Not everyone reading these discussions is going to be expecting humor; some will take any commentary affirming their prior indoctrination at face value.


That sounds like a YP, not an MP, though. Everyone jokes about the sarcasm font being hard to use, but the printed word has been around for a long time, much longer than the internet, yet the sarcasm-font complaint has only ever been an internet thing.


Books were notoriously bad at two-way communication, though.


That is not where trust in the dollar comes from.

It comes from stability. Predictability.

Courts and law enforcement certainly provide these things, but they are not required. The inherent design of blockchains makes them trustworthy (an oversimplified statement), which is even better.


Blockchains don't, and can't, solve for the risk of the off-chain component of an exchange.

The transactions aren't atomic, so someone is taking on counterparty risk. One of government's prime responsibilities is dealing with that risk, no matter the currency in question.


The prediction algorithms are so good that indirect behaviors and data can be informative.

You might also be profiled by Google and bucketed into a group of similar people who do leak their data. They also went to this website, and their YT recommendations became a signal to inform your own.

Not claiming any certainty here, just possible ideas.


Tinfoil hat comes off for now, then. Thank you :)


I was a bit peeved by the title, but I think it's a fair use of clickbait, as the article has a lot of little details about acoustics in humans that I was unfamiliar with (e.g. a link to a primer on the transduction implementation of cochlear cilia).

But yeah, there is a strict-vs-colloquial collision here.


Google "zero-knowledge proofs" for verification.


https://docs.ens.domains/learn/protocol/

Supporting DNS fully should be possible, but organizing the other decentralized services (compute, storage) is the hard part.


The name service is easy; Namecoin did it more efficiently than ENS a decade ago.

The decentralized services need not be attached to a blockchain, due to the resource constraints. But there are examples like Filecoin and such.

