karpathy's comments | Hacker News

Came here looking for exactly this, thank you!


You’re welcome! I wanted to add it to Scour (https://scour.ing), but glad it was helpful for someone else too!


I agree with this fwiw. For many months I talked to people who had never used o3 and didn’t know what it was because the name sounded weird. Maybe it wasn’t obvious at the time, but that would have been a good point to make a major release.


You’re absolutely right!

Jk jk, now that you pointed it out I can’t unsee it.


The CC point is more about the data, environment, and general configuration context, not about compute and where it happens to run today. The cloud setups are clunky because of context and UI/UX user-in-the-loop considerations, not because of compute considerations.


Agree with the GP, though -- you ought to make that clearer. It really reads like you're saying that CC runs locally, which is confusing since you obviously know better.


I think we need to shift our mindset on what an agent is. The LLM is a brain in a vat connected far away. The agent sits on your device, as a mech suit for that brain, and can pretty much do damn near anything on that machine. It's there, with you. The same way any desktop software is.


Yeah, I made some edits to clarify.


Yes I noticed a few of these around. The LLM is a little too willing to give out grades for comments that were good/bad in a bit more general sense, even if they weren't making strong predictions specifically. Another thing I noticed is that the LLM has a very impressive recognition of the various usernames and who they belong to, and I think shows a little bit of a bias in its evaluations based on the identity of the person. I tuned the prompt a little bit based on some low-hanging fruit mistakes but I think one can most likely iterate it quite a bit further.


I think you were getting at this, but in case others didn't know: cstross is a famous sci-fi author and futurist :)


Thank you


It will work great with a 40GB GPU, probably a bit less than 2x slower. These are micro models of a few billion params at most and fit easily during both training and inference.


How low can this go? Can this run on a 5090 card (32GiB)?


Set nproc_per_node=1 instead of 8 (or run the training script directly instead of using torchrun) and set device_batch_size=4 instead of 32. You may be able to use 8 with a 5090, but it didn't work on my 4090. However, it's way slower than expected (one H100 isn't 250x a 4090), so I'm not sure it's training correctly. I'll let it run overnight and see if the outputs make any sense; maybe the metrics are not accurate in this config.


Still under development. Remaining work includes tuning nanochat (the current state being a solid v0.1) and finalizing the in-between projects so that students can "unlock" all the complexity that hides underneath: `torch.Tensor`, `torch.dist`, `.backward()`, `.compile()`, etc. And then the more ops-heavy aspects.


What's the pricing for the course/EurekaLabs? P.s. thanks for all you're doing


Sorry, I thought it would be clear but could have clarified that the code itself is just a joke illustrating the point, as an exaggeration. This was the thread, if anyone is interested:

https://chatgpt.com/share/68e82db9-7a28-8007-9a99-bc6f0010d1...


This part from the first try made me laugh:

      if random.random() < 0.01:
          logging.warning("This feels wrong. Aborting just in case.")
          return None


I actually laughed when I read that. This one got me, too. The casual validation of its paranoia gives me Marvin the Paranoid Android vibes.

  try:
      result = a / b
      if math.isnan(result):
          raise ArithmeticError("Result is NaN. I knew this would happen.")


I think that’s the funniest joke I’ve ever seen an LLM make. Which probably means it’s copied from somewhere.


"Why is a laser beam like goldfish? Because neither one can whistle." - Mike, The Moon is a Harsh Mistress


Fantastic book, just read it. Surprised no movie has been made.


If you haven't read Ursula Le Guin's "The Dispossessed", check it out too.

It's like a fine wine pairing for "The Moon is a Harsh Mistress."


The protagonists are libertarians with teenage harems, who fake an election and team up with a sex pest. That's extremely reductive to the point of parody, but that will likely be the media coverage of it the moment someone reads about the women and the politics in the book.

If you completely excise anything too distasteful for a current-day blockbuster, but want a film about a space mining colony uprising, you might as well just adapt the game Red Faction instead: have the brave heroes blasting away with abandon at corpo guards, mad genetic experimenters, and mercenaries, and the media coverage can talk about how it's a genius deconstruction of Elon Musk's Martian dream or whatever.


You’d think some filmmaker would have run with the dystopian theme. The accuracy of the book’s predictions is impressive, even the location of the North American Space Defense Command. The biggest miss was people using wired telephones everywhere.


I liked it when I was 17, but soured on it after re-reading it later.

The only reason their libertarian revolution succeeds is because they have a centralised computer that secretly does everything for them.


> I liked it when I was 17

Same with pretty much every sci-fi movie and book from my youth. The movies that weren't rendered ridiculous by the invention of the cellphone were done in by the hairstyles or the fashion.


If you're an extensive user of ChatGPT, or if you can give it some material about yourself (say, a resume or a LinkedIn profile), ask it to roast you. It will be very specific to the content you give it. Be warned, it can be brutal.


Whoa dude! It was brutal, but highly constructive! Actually extremely helpful (and quite funny, though I have a high sense of humor about things so others might not appreciate some of it :-D)

This was my favorite line after asking it to review my resume and roast me:

> Structure & Flow: “Like Kubernetes YAML — powerful, but not human-readable.”

Some other good ones:

> Content & Tone: “You’re a CTO — stop talking like a sysadmin with a thesaurus.”

> Overall Impression: “This resume is a technical symphony… that goes on for too many movements.”

I've got some resume work to do haha


They meant roast you, not your resume.


So rehash of top comments in /r/roastme?


I came back to this comment just to thank you - I started off with Claude, feeding it my personal site, my résumé, the HN roast of me, etc. and it was super funny.

But then, I veered that same conversation into asking for GTM (go to market) advice, and it was actually really good. It actually felt tailored to me (unsurprisingly) and a lot more useful.

As always, I don't know whether this is a very light form of "ai psychosis" haha but still, super grateful for the advice. Cheers


Periodic reminder that there’s also HN Wrapped. [0]

[0]: https://hn-wrapped.kadoa.com


ooooh boy, gotta mentally prepare myself for this one

<press enter>

damn these ai's are good!

<begins shopping for new username>


"The user will start a comment with 'I'm a social libertarian but...' only to be immediately downvoted by both libertarians and socialists. The irony will not be lost on them, just everyone else."

I can't say I'm not impressed. That's very funny


>You voted with your feet and moved to Western Europe for better well-being, but you still won't vote with your cursor and use a browser other than Edge.

I love this and hate this at the same time.


Absolutely hilarious, and gives me some self awareness tbh


Spot on and I don't even mind.


It would not be shocking if LLMs are legitimately better at making jokes about tasks they are extensively trained on.


Years and years ago, the MongoDB Java driver had something like this to skip logging sometimes in one of its error handling routines.

    } catch (Exception e) {
        // if the server was already flagged as not OK, skip logging ~10% of the time
        if (!((_ok) ? true : (Math.random() > 0.1))) {
            return res;
        }

        final StringBuilder logError = (new StringBuilder("Server seen down: ")).append(_addr);

        /* edited for brevity: log the error */
 
https://github.com/mongodb/mongo-java-driver/blob/1d2e6faa80...


One of my earlier jobs a decade ago involved doing pipeline development and Jenkins administration for the on-site developer lab on one of the NRO projects, and I inserted a random build failure code snippet to test that pipelines could recover from builds that failed for unpredictable reasons, like a network error rather than anything actually wrong with the build. I had to do this on the real system because we didn't have funds for a staging environment for the dev environment, and naturally I forgot to get rid of it when I was done. So builds randomly failed for years after that before I remembered and fixed it.
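A minimal sketch of the kind of snippet described (hypothetical, not the original code): fail a small fraction of builds on purpose so the pipeline's retry/recovery path actually gets exercised.

    import random
    import sys

    # hypothetical chaos step dropped into a build stage: ~10% of runs
    # exit non-zero to simulate flaky infrastructure (network blips, etc.)
    if random.random() < 0.1:
        print("simulated transient infrastructure failure", file=sys.stderr)
        sys.exit(1)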


If we’re talking about funny error msgs, a buddy of mine got this yesterday in salesforce. It’s not _that_ funny but pretty funny for Salesforce.

System.DmlException: Insert failed. First exception on row 0; first error: UNKNOWN_EXCEPTION, Something is very wrong: []


I think there’s always a danger of these foundational model companies doing RLHF on non-expert users, and this feels like a case of that.

The AIs in general feel really focused on making the user happy - your example is one, and another is how they love adding emojis to stdout and over-commenting simple code.


This feels like RLVR, not RLHF.

With RLVR, the LLM is trained to pursue "verified rewards." On coding tasks, the reward is usually something like the percentage of passing tests.

Let's say you have some code that iterates over a set of files and does processing on them. The way a normal dev would write it, an exception in that code would crash the entire program. If you swallow and log the exception, however, you can continue processing the remaining files. This is an easy way to get "number of files successfully processed" up, without actually making your code any better.
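A minimal sketch of that pattern (a hypothetical file-processing loop, not taken from any model's actual output): swallowing the per-file exception inflates the "successfully processed" count, which is exactly the kind of metric a verified reward might track, while real failures never reach the caller.

    import json
    import logging

    def process_all(paths):
        processed = 0
        for path in paths:
            try:
                with open(path) as f:
                    json.load(f)          # stand-in for the real per-file work
                processed += 1
            except Exception as e:
                # swallow-and-log: the count keeps climbing,
                # but the failure is silently discarded
                logging.warning("skipping %s: %s", path, e)
        return processed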


> This is an easy way to get "number of files successfully processed" up, without actually making your code any better.

Well, it depends a bit on what your goal is.

Sometimes the user wants to, e.g., back up as many files as possible from a failing hard drive, and doesn't want to fail the whole process just because one item is broken.


You're right, but the way to achieve this is to allow the error to propagate at the file level, then catch it one function above and continue to the next one.

However, LLM-generated code will often, at least in my experience, avoid raising any errors at all. This is undesirable, because some errors should result in a complete failure - for example, errors that are not transient or environment-related but are bugs. And in any case, an LLM will prefer turning these single-file errors into warnings, though the way I see it, they are errors. They just don't need to abort the process, but they are errors nonetheless.
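A minimal sketch of that structure (hypothetical names): the per-file function raises freely, and the loop one level up decides which failures are survivable, while anything that looks like an actual bug still aborts the run.

    def convert_file(path):
        # no swallowed exceptions here: any problem propagates to the caller
        with open(path, "rb") as f:
            return f.read().decode("utf-8").upper()   # stand-in for the real conversion

    def convert_all(paths):
        failures = []
        for path in paths:
            try:
                convert_file(path)
            except (OSError, UnicodeDecodeError) as e:
                # survivable, per-file error: record it and move on
                failures.append((path, e))
            # anything else (i.e. a genuine bug) propagates and aborts the run
        return failures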


Yes, that's cleaner.

> And in any case, an LLM will prefer turning these single-file errors into warnings, though the way I see it, they are errors.

Well, in general they are something that the caller should have an opportunity to deal with.

In some cases, aborting back to the caller at the first problem is the best course of action. In some other cases, going forward and taking note of the problems is best.

In some systems, you might even want to tell the caller about failures (and successes) as they occur, instead of waiting until the end.

It's all very similar to the different options people have available when their boss sends them on an errand and something goes wrong. A good underling uses their best judgement to pick the right way to cope with problems; but computer programs don't have that, so we need to be explicit.

See https://en.wikipedia.org/wiki/Mission-type_tactics for a related concept in the military.


And more advanced users are more likely to opt out of training on their data. Google gets around it with a free API period where you can't opt out, and I think from did some of that too, through partnerships with tool companies, but I'm not sure if you can ever opt out there.


*grok, not 'from'


'Over-commenting simple code' is preparing it for future agent work. Pay attention to those comments to learn how you can better scaffold for agents.


They do seem to leave otherwise useless comments for themselves, e.g. on the level of:

    // Return the result
    return result;

I find this quite frustrating when reading/reviewing code generated by AI, but have started to appreciate that it does make subsequent changes by LLMs work better.

It makes me wonder if we'll end up in a place where IDEs hide comments by default (similar to how imports are often collapsed by default/automatically managed), or introduce some way of distinguishing between a more valuable human written comment and LLM boilerplate comments.


They should have a step to remove those sorts of comments; they only add noise to the code.
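A rough sketch of what such a step could look like (a hypothetical heuristic, not any tool's actual behavior): drop a "//" comment when everything it says already appears in the line of code directly below it.

    import re

    STOPWORDS = {"the", "a", "an", "of", "to"}

    def strip_trivial_comments(source: str) -> str:
        lines = source.splitlines()
        kept = []
        for i, line in enumerate(lines):
            m = re.match(r"\s*//\s*(.+)", line)
            if m and i + 1 < len(lines):
                comment_words = set(re.findall(r"\w+", m.group(1).lower())) - STOPWORDS
                code_words = set(re.findall(r"\w+", lines[i + 1].lower()))
                # "// Return the result" above "return result;" gets dropped
                if comment_words and comment_words <= code_words:
                    continue
            kept.append(line)
        return "\n".join(kept)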


This is stunning English: "Perfect setup for satire. Here’s a Python function that fully commits to the bit — a traumatically over-trained LLM trying to divide numbers while avoiding any conceivable danger:" "Traumatically over-trained", while scoring zero google hits, is an amazingly good description. How can it intuitively know what "traumatic over-training" should mean for LLMs without ever having been taught the concept?


I don't know. It's a classic LLM-ism. "Traumatically over-X" is probably a common enough phrase. The prompt says, "I don't know what labs are doing to these poor LLMs during RL," so the model connects that to some form of trauma. The training is traumatic, so the model is traumatically over-trained.

It sounds fine and flows nicely, but it doesn't quite make sense. Too much training over-fits an LLM; that's not what we're describing. Bad training might traumatize a model, but bad how? A creative response would suggest an answer to that question—perhaps the model has been made paranoid, scarred by repeat exposure to the subtlest and most severe bugs ever discovered—but the LLM isn't being creative. Its response has that spongy, plastic LLM texture that comes from the model rephrasing its prompt to provide a sycophantic preamble for the thing that was actually being asked for. It uses new words for the same old idea, and a bit of the precision is lost during the translation.


Eh, you are rationalizing. The phrase "traumatically over-X" is extremely rare. Any problem is easy after you've seen the solution. :) The solution "traumatically over-trained LLM" to the problem "what description best fits the situation karpathy describes?" is certainly not easy to find. Connecting RL, poor LLMs, extreme fear, and welfare to excess training and severe lasting emotional pain is pretty darn impressive. E.g., I know exactly what the situation karpathy describes is, but I couldn't in a million years put it into writing as succinctly and as precisely as the LLM.


> The phrase "traumatically over-X" is extremely rare.

There are plenty of "over-X" phrases in English associated with trauma or harm. Do a web search in quotes for "traumatic over{extension/exertion/stimulation}" (off the top of my head) and you'll get direct hits. And this isn't a Markov chain—it doesn't have to pull n-grams directly from its training material. That it could glue trauma and training into "traumatic over-training" is deeply unsurprising to me.

> I couldn't in a million years put it into writing as succinctly and as precisely as the LLM.

If that's the case, then (with respect) that may be down to your skills as a writer. The LLM puts it decently enough, but it's not very expressive and it doesn't add anything.

> Connecting RL, poor LLMs, extreme fear, and welfare to excess training and severe lasting emotional pain is pretty darn impressive

Is it? Really, we're just analogizing it to an abused pet. You over-train your dog, so it gets traumatized. The LLM connects the ideas and then synthesizes a lukewarm sentence to capture that connection at the cost of losing a degree of precision, because LLMs aren't animals. Models are good at those vector-embedding-style conceptual connections—I won't begrudge them that. Expressive use of language and fine-grained reasoning, though? Not so much.


Hard to know but if you could express "traumatically" as a number, and "over-trained" as a number, it seems like we'd expect "traumatically" + "over-trained" to be close to "traumatically over-trained" as a number. LLMs work in mysterious ways.
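A toy sketch of that intuition (numbers invented purely for illustration, not taken from any real model): if each phrase had an embedding, you'd expect the sum of the parts to land close to the vector for the whole phrase.

    import numpy as np

    # hypothetical 4-d embeddings, made up for illustration only
    vec = {
        "traumatically": np.array([0.9, 0.1, 0.3, 0.0]),
        "over-trained": np.array([0.2, 0.8, 0.1, 0.4]),
        "traumatically over-trained": np.array([1.0, 0.9, 0.4, 0.5]),
    }

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    combined = vec["traumatically"] + vec["over-trained"]
    print(cosine(combined, vec["traumatically over-trained"]))  # ~0.99 for this toy data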


LLMs operate at the token level, not the word level. They don't operate in terms of "traumatic", "over-training", "over" or "training", but rather "tr", "aum", "at", "ic", etc.
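If you want to see how a real tokenizer splits the phrase, something like this works (assumes the `tiktoken` package; the exact pieces depend on the vocabulary, so they may or may not match the fragments above).

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode("traumatically over-trained")
    print([enc.decode([t]) for t in tokens])  # subword pieces, not whole words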


I think you are confusing tokens with vectors/embeddings/parameters.

King and rex (king in Latin) map to different tokens but will map to very similar vectors.


> it doesn't operate in terms of "traumatic", "over-training", "over" or "training", but rather "tr" "aum" "at" "ic, ", etc.

And "毛片免费观看" (Free porn movies), "天天中彩票能" (Win the lottery every day), "热这里只有精品" (Hot, only fine products here) etc[1].

[1]: https://news.ycombinator.com/item?id=45483924


Weird thing I've noticed.

Some LLMs can output nerd font glyphs and others can't.

If I recall, Grok Code Fast can, but Codex and Sonnet can't.


“Traumatic overtraining” does have hits though. My guess is that “traumatically” is a rarely used adverb, and “traumatic” is much more common. Possibly it completed traumatic into an adverb and then linked to overtraining which is in the training data. I dunno how these things work though.


You need to read more if you think that's stunning English


The same way that you and I think up a word and what it might mean without being taught the concept.

Adverb + verb


But the machines cannot possibly have the magic brain-juice!


> How can it intuitively know what "traumatic over-training" should mean for LLMs without ever having been taught the concept?

Because, and this is a hot take, LLMs have emergent intelligence


Or language has patterns


Kind of interesting it didn't add type hints though! You'd think for all that paranoia it would at least add type hints.



It was a great joke, that's why I posted it


<3

