Some of this exists already in pockets (Common Crawl, The Pile, RedPajama are all volunteer/open efforts). I suppose there's no equivalent of the "edit this page and see the impact" loop we have with Wikipedia. Contributing to an open dataset has no feedback loop if the training infrastructure that would consume it is closed... seems like a feedback problem.
Agree that this makes it unlikely we see frontier training data open-sourced, but that's a separate problem from software and infrastructure transparency, which has none of those constraints. The training stack, the parallelism decisions, and documented failure modes are engineering knowledge, and there's no principled reason they don't ship.
The framing here is undersold in the broader discourse: "open weights" trades on a reproducibility it doesn't deliver. What you have is closer to a compiled binary than to source code. You can run it, you can diff it against other binaries, but you cannot, in any meaningful sense, reproduce or extend it from first principles.
This matters because the legitimacy of open source rests on the reproducibility claim. "Open weights" borrows that legitimacy (the assumption that scrutiny is possible, that no single actor has a moat, that iteration is democratised). Truly democratised iteration would crack open the training stack and let you generate intelligence from scratch.
But how useful is source code if it takes millions of dollars to compile? At that point, if you do need to make changes, it probably makes more sense to edit the precompiled binary. Even the original developers are doing binary edits in most cases.
I agree that open weight models should not be considered open source, but I also think the entire definition breaks down under the economics of LLMs.
There are lots of reasons to read through source code you never edit or recompile: security audits, interoperability, learning from its techniques, etc. And I think many of those same ideas apply to seeing the training data of an LLM. It will help you understand quickly (without as much experimentation) what it's likely to be good at, where its biases may be, where some kind of supplement (transfer learning? RAG? whatever) might be needed. And the why.
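Even a crude pass over a training-data manifest tells you something about likely strengths and biases before you run a single experiment. A toy sketch (the manifest format here is invented for illustration) that counts source domains:

```python
from collections import Counter
from urllib.parse import urlsplit

# Hypothetical manifest: one source URL per training document.
manifest = [
    "https://en.wikipedia.org/wiki/Transformer",
    "https://github.com/example/repo/README.md",
    "https://en.wikipedia.org/wiki/Attention",
    "https://stackoverflow.com/q/1234",
]

# Which domains dominate hints at what the model has seen most of.
domain_counts = Counter(urlsplit(url).netloc for url in manifest)
print(domain_counts.most_common(2))
```

Real audits would slice by language, topic, and date too, but even this level of summary narrows down where supplements like RAG might be needed.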
If you are unable to run the multimillion-dollar training run, then any kind of security audit of the training code is absolutely meaningless, because you have no way to verify that the weights were actually produced by that code.
Also, the source code/binary analogy breaks down really fast, considering that the training process is non-deterministic. Even if you are able to run the training, you get different weights than those the model developers released, and then... then what?
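To make the verification gap concrete: if training were bit-reproducible, auditing would in principle reduce to re-running the pipeline and comparing a digest of your weights against the released ones. A toy sketch of just that final comparison step (the function name and byte strings are made up):

```python
import hashlib

def weights_digest(weight_bytes: bytes) -> str:
    """Digest of a serialized weight file; equal digests would mean
    a bit-identical reproduction of the released model."""
    return hashlib.sha256(weight_bytes).hexdigest()

# Only with fully deterministic training does this comparison mean
# anything; with nondeterminism, a mismatch proves nothing.
released = weights_digest(b"released-weights-blob")
retrained = weights_digest(b"released-weights-blob")
print(released == retrained)
```

Nondeterministic training breaks exactly this step: a differing digest is the expected outcome, so it cannot distinguish "honest but noisy" from "trained on different code".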
I probably shouldn't have led with that example because yeah, reproducible (and cheap) builds would be best for security audits. But I wouldn't say it's absolutely meaningless. At least it can guide your experimentation, and if results start differing radically from what you'd expect from the training data, that raises interesting questions.
If you're going through the effort to be open source, you can probably set up fixed batch sizes and a deterministic ordering of batches without too much more effort. At least I hope it's not super hard.
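The data-ordering half of that is genuinely cheap. A minimal sketch (function name and seed are hypothetical) of seeded, run-to-run stable batching; note this fixes only the data schedule, not kernel-level nondeterminism, which is the harder part:

```python
import random

def deterministic_batches(n_examples, batch_size, seed=1234):
    """Yield batches of example indices in a run-to-run stable order."""
    rng = random.Random(seed)          # private RNG, no global state
    indices = list(range(n_examples))
    rng.shuffle(indices)               # same seed -> same shuffle
    for i in range(0, n_examples, batch_size):
        yield indices[i:i + batch_size]

# Two runs with the same seed produce the identical batch schedule:
print(list(deterministic_batches(10, 3)) == list(deterministic_batches(10, 3)))
```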
I have zero actual experience in training models, but in general, when parallelizing work, there can be fundamental nondeterminism (e.g., some race conditions) that is tolerated because recording or reproducing it would be prohibitively expensive.
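One concrete source of that kind of nondeterminism is floating-point reduction order: parallel workers combine partial sums in whatever order they finish, and float addition isn't associative. A self-contained illustration, using two fixed orders as stand-ins for two nondeterministic runs:

```python
# Float addition is not associative, so a parallel sum whose combine
# order varies run to run can legitimately produce different results.
vals = [0.1] * 10 + [1e16, -1e16]

forward = 0.0
for v in vals:            # one possible reduction order
    forward += v

backward = 0.0
for v in reversed(vals):  # another order, same inputs
    backward += v

print(forward, backward)  # the two orders disagree
```

In the forward pass the ~1.0 of accumulated 0.1s is absorbed by 1e16's rounding granularity; in the backward pass the big values cancel first and the 1.0 survives.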
Agree, this feels like a distinction that needs formalising...
Passive transparency: training data and a technical report that tell you what the model learned and why it behaves the way it does. Useful for auditing, AI safety, and interoperability.
Active transparency: being able to actually reproduce and augment the model. For that you need the training stack, curriculum, loss weighting decisions, hyperparameter search logs, synthetic data pipeline, RLHF/RLAIF methodology, reward model architecture, what behaviours were targeted and how success was measured, unpublished evals, known failure modes. The list goes on!
I'd also add training checkpoints to the list for active transparency. I think the Olmo models do a decent job, but it would be cool to see it for bigger models and for ones that are closer to state-of-the-art in terms of both architecture and algorithms.
Security audits, etc., are possible because the binary closely implements what the source code says.
In this case, you have no idea what the weights are going to "do" from looking at the source materials (the training data and algorithm) without actually running the training on the data.
Compute costs are falling fast, and training is getting cheaper. GPT-2 now costs pocket change to train, and tuning >1T-parameter models now costs pocket change too. If it were transparent what costs went into the weights, they could be commodified and stripped of bloat. Instead, the hidden cost is building infrastructure that was never tested at scale by anyone other than the original developers, who shipped no documentation of where it fails. Unlike compute, this hidden cost doesn't commodify on its own.
Yeah, the costs are definitely a factor and prohibitive for completely replicating an open source model. Still, there are a lot of useful things that can be done cheaply, including fine-tuning, interpretability work, and other deeper investigations into the model that can't happen without the infrastructure.
The training methods are largely published in their open research papers, though arguably some open-weight companies are less open with the exact details.
Realistically a model will never be "compiled" 1:1. Copyrighted data is almost certainly used, and even _if_ one could somehow download the petabytes of training data, it's quite likely the model would come out differently.
The article seems to be talking more about the difficulties of fine-tuning models, though: a setup problem that likely exists in all research, and in many larger OSS projects as they get more complicated.
> "Open weights" borrows the legitimacy of open source
I don't really see how open-weights models need to borrow any legitimacy. They are valuable artifacts being given away that can be used, tested and repurposed forever. Fully open models like the OLMo series and Nvidia's Nemotron are much more valuable in some contexts, but they haven't quite cracked the level of performance that the best open-weights models are hitting. And I think that's why most startups are reaching for Chinese base LLMs when they want to tune custom models: the performance is better and they were never going to bother with pretraining anyway.
Personally, I love this theory. The thought of natural assembly and selection at the level of black holes is alluring. Not sure what the Black Mirror Hypothesis (https://curtjaimungal.substack.com/p/when-you-fall-into-a-bl...) would have to say about this, though.
I've been calling out the similarity of the work done by Barbour, Turok, Farnes & Petit for a long time, and the latest developments by Turok's team vindicate this intuition. It is now very close to Jean-Pierre Petit's Janus model. Curt Jaimungal announced he'd interview him soon.
They were one of the earliest to adopt Bitcoin and Monero payments. If they didn't convert all those payments immediately to cash, they're probably sitting pretty right now.
They also have a partnership with Tailscale that shouldn't be undersold.
I'm not sure how much it adds to their bottom line per sale, but my corp was using the Mullvad VPN add-on for Tailscale to do global testing by our developers.
I.e., "is something blocked, do we detect GeoIP properly", etc.
> there's definitely been a lottery win or a series A
We have neither won the lottery nor taken on outside investment. We've been growing for years, and we've reached a point where we can afford campaigns like this. It is an interesting experiment by our marketing team. Still, I think people on HN overestimate the cost of campaigns like this.
The National Library of Scotland has an impressive web tool that allows you to overlay detailed historical Ordnance Survey (OS) maps (dating back to 1841 for this instance) onto modern satellite imagery. The interface lets you adjust transparency and blend between past and present landscapes.