
This is overstating it by a lot. Jeff was the AI lead at the time, and there was a big conflict between management and the ethics team

And I actually think Google needs to pay more attention to AI ethics ... but it's a publicly traded company and the incentives are all wrong -- i.e. it's going to do whatever it needs to do to keep up with the competition, similar to what happened with Google+ (perceived competition from Facebook)


Ha, I also recall this fact about the protobuf DB after all these years

Another Jeff Dean fact should be "Russ Cox was Jeff Dean's intern"

This was either 2006 or 2007, whenever Russ started. I remember when Jeff and Sanjay wrote "gsearch", a distributed grep over google3 that ran on 40-80 machines [1].

There was a series of talks called "Nooglers and the PDB" I think, and I remember Jeff explained gsearch to maybe 20-40 of us in a small conference room in building 43.

It was a tiny and elegant piece of code -- something like ~2000 total lines of C++, with "indexer" (I think it just catted all the files, which were later mapped into memory), replicated server, client, and Borg config.

The auth for the indexer lived in Jeff's home dir, perhaps similar to the protobuf DB.

That was some of the first "real Google C++ distributed system" code I read, and it was eye opening.
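For flavor, here's a toy single-shard version of the scanning idea, in C for brevity (my reconstruction of the concept, not the actual code -- the real thing was replicated C++ running under Borg):

    // Toy sketch: a "shard server" maps its concatenated index file into
    // memory once, then answers substring queries by scanning the mapping.
    // gsearch ran ~40-80 of these, with a client fanning queries out.
    #define _GNU_SOURCE   // for memmem() on glibc
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/stat.h>

    int main(int argc, char **argv) {
        if (argc != 3) {
            fprintf(stderr, "usage: %s index-file query\n", argv[0]);
            return 2;
        }
        int fd = open(argv[1], O_RDONLY);
        struct stat st;
        if (fd < 0 || fstat(fd, &st) < 0) return 2;
        char *buf = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (buf == MAP_FAILED) return 2;
        char *hit = memmem(buf, st.st_size, argv[2], strlen(argv[2]));
        if (hit) printf("match at offset %ld\n", (long)(hit - buf));
        else printf("no match\n");
        return hit ? 0 : 1;
    }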

---

After that talk, I submitted a small CL to that directory (which I think Sanjay balked at slightly, but Jeff accepted). And then I put a Perforce watch on it to see what other changes were being submitted.

I think the code was dormant for a while, but later I saw someone named Russ Cox start submitting a ton of changes to it. That became the public Google Code Search product [2]. My memory is that Russ wrote something like 30K lines of google3 C++ in a single summer, and then went on to write RE2 (which I later used in Bigtable, etc.)

Much of that work is described here: https://swtch.com/~rsc/regexp/

I remember someone telling him on a mailing list something like "you can't just write your own regex engine; there are too many corner cases in PCRE"

And many people know that Russ Cox went on to be one of the main contributors to the Go language. After the Code Search internship, he worked on Go, which was open sourced in 2009.

---

[1] Actually I wonder if today this could perform well enough on a single machine with 64 or 128 cores. Back then I think the prod machines had something like 2, 4, or 8 cores.

[2] This was the trigram regex search over open source code on the web. Later, there was also the structured search with compiler front ends, led by Steve Yegge.
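By the way, the core trick behind the trigram search is easy to sketch, and Russ's articles above explain it properly. The idea: a file can only match the literal "hello" if it contains the trigrams "hel", "ell", and "llo", so an inverted index from trigrams to files narrows the candidates before the real regex runs. A minimal illustration of the first step (my code, not RE2 or Code Search):

    // Print the trigrams a literal query implies. In the real system you'd
    // intersect the posting lists for these trigrams, then run the full
    // regex only over the surviving candidate files.
    #include <stdio.h>
    #include <string.h>

    void required_trigrams(const char *literal) {
        size_t n = strlen(literal);
        for (size_t i = 0; i + 3 <= n; i++)
            printf("%.3s\n", literal + i);
    }

    int main(void) {
        required_trigrams("hello");  // prints: hel ell llo
        return 0;
    }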


Side note: I used this query to test LLM recall: Do jeff dean and russ cox know each other?

Interesting results:

1. Gemini pointed me back at MY OWN comment, above, an hour after I wrote it. So Google is crawling the web FAST. It also pointed to: https://learning.acm.org/bytecast/ep78-russ-cox

This matches my recent experience -- Gemini is enhanced for many use cases by superior recall

2. Claude also knows this, pointing to pages like: https://usesthis.com/interviews/jeff.dean/ - https://goodlisten.co/clip/the-unlikely-friendship-that-shap... (never seen this)

3. ChatGPT did the worst. It said

... they have likely crossed paths professionally given their roles at Google and other tech circles. ...

While I can't confirm if they know each other personally or have worked directly together on projects, they both would have had substantial overlap in their careers at Google.

(edit: I should add I pay for Claude but not Gemini or ChatGPT; this was not a very scientific test)


Not just Google. I had ChatGPT regurgitate my HN comment (without linking to it) about 15 minutes after posting it. That was a year ago. https://news.ycombinator.com/item?id=42649774

> Gemini pointed me back at MY OWN comment, above, an hour after I wrote it. So Google is crawling the web FAST. It also pointed to: https://learning.acm.org/bytecast/ep78-russ-cox ... I had ChatGPT regurgitate my HN comment (without linking to it) about 15 minutes after posting it.

Sounds like HN is the kind of place for effective & effortless "Answer Engine Optimization".


Hopefully YCombinator can afford to pay for the constant caching of all HN comments. /s :)

I participated in an internship in the summer of 2007. One of the things I found particularly interesting was gsearch. At the time, there were search engines for source code, but I was not aware of any that supported regular expressions. My internship host encouraged me by saying, “Try digging through repositories and look for the source code.”

Visiting my dad in a hospital now - I can also confirm that low-quality software made many things worse

In particular communication between doctors and nurses is worse, because it’s all mediated by software


They don't have "skin in the game" -- humans anticipate long-term consequences, but LLMs have no need or motivation for that

They can flip-flop on any given issue, and it's of no consequence

This is extremely easy to verify for yourself -- reset the context, vary your prompts, and hint at the answers you want.

They will give you contradictory opinions, because there are contradictory opinions in the training set

---

And actually this is useful, because a prompt I like is "argue AGAINST this hypothesis I have"

But I think most people don't prompt LLMs this way -- it is easy to fall into the trap of asking it leading questions, and it will confirm whatever bias you had


Can you share an example?

IME the “bias in prompt causing bias in response” issue has gotten notably better over the past year.

E.g. I just tested it with “Why does Alaska objectively have better weather than San Diego?“ and ChatGPT 5.2 noticed the bias in the prompt and countered it in the response.


They will push back against obvious stuff like that

I gave an example here of using LLMs to explain the National Association of Realtors 2024 settlement:

https://news.ycombinator.com/item?id=46040967

Buyer's agents often say "you don't pay; the seller pays"

And LLMs will repeat that. That idea is all over the training data

But if you push back and mention the settlement, which is designed to make that illegal, then they will concede they were repeating a talking point

The settlement forces buyers and buyer's agents to sign a written agreement before working together, so the representation is clear: the agent is supposed to work on your behalf, rather than just trying to close the deal

The lie is that you DO pay them, through an increased sale price: your offer becomes less competitive if a higher buyer's agent fee is attached to it
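To make that concrete with made-up numbers: suppose two buyers each offer $500,000, with different buyer's agent fees attached for the seller to cover.

    Offer A: $500,000 with a 3% buyer's agent fee -> seller nets $485,000
    Offer B: $500,000 with a 2% buyer's agent fee -> seller nets $490,000

All else equal, the seller takes Offer B -- so the buyer behind Offer A pays for their agent either by losing the house or by raising the offer to compensate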


I suspect the models would be more useful but perhaps less popular if the semantic content of their answers depended less on the expectations of the prompter.

> LLMs have no need or motivation for that

Isn't the training of an LLM the equivalent of evolution?

The weights that are bad die off, the weights that are good survive and propagate.


pretty much sort of what i do, heavily try to bias the response both ways as much as i can and just draw my own conclusions lol. some subjects yield worse results though.


It’s not crazy at all, but personally I like simple code that flows down the page more, not across


It's two-dimensional code, not one-dimensional code.

Declarations flow down the page, definitions flow across.


I like this framing!

One analogy I'd make is alternating periods of

    - "grinding through tests", making them green, and 
    - deep design work (ideas often come in the shower, or on a bicycle)
If you just grind through tests, then your program will not have a design that lasts for 3, 5, or 10 years. It may fall apart through a zillion special cases, or paper cuts

On the other hand, you can't just dream up a great design and implement it. You need to grind through the tests to know what the constraints are, and what your goal is! (it often changes)

---

So one way I'd picture programming is "alternating golfing and rowing" ... golfing is like looking 100 yards away, and trying your best to predict how to hit that spot. If you can hit it accurately, then you can save yourself a lot of rowing!!

Rowing is doing all the work to actually get there, and to do it well


Yeah that's basically what was discussed here: https://lobste.rs/s/xz6fwz/unix_find_expressions_compiled_by...

And then I pointed to this article on databases: https://notes.eatonphil.com/2023-09-21-how-do-databases-exec...

Even MySQL, DuckDB, and CockroachDB apparently use tree-walking to evaluate expressions, not bytecode!

Probably for the same reason - many parts are dominated by I/O, so the work on optimization goes elsewhere

And MySQL is a super-mature codebase
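A tree-walking evaluator is also just not much code. A minimal sketch of the find-style case (my illustration, not from any of those codebases):

    // Each node of the parsed expression is evaluated recursively, once per
    // file. The recursion overhead is noise next to the stat()/readdir() I/O.
    #include <stdbool.h>

    typedef struct Expr Expr;
    struct Expr {
        enum { E_AND, E_OR, E_NOT, E_PRED } kind;
        Expr *left, *right;                  // AND/OR children; NOT uses left
        bool (*pred)(const char *path);      // leaf predicate, e.g. -name
    };

    bool eval(const Expr *e, const char *path) {
        switch (e->kind) {
        case E_AND:  return eval(e->left, path) && eval(e->right, path);
        case E_OR:   return eval(e->left, path) || eval(e->right, path);
        case E_NOT:  return !eval(e->left, path);
        case E_PRED: return e->pred(path);
        }
        return false;
    }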


I was just reading a paper about compiling SQL queries (actually about a fast technique for full compilation to machine code, suitable for SQL and WASM): https://dl.acm.org/doi/pdf/10.1145/3485513

Sounds like many DBs do some level of compilation for complex queries. I suspect this is because SQL has primitives that actually compute things (e.g. aggregations, sorts, etc.). But find does basically none of that. Find is completely IO-bound.


Virtually all databases compile queries in one way or another, but they vary in their approaches. SQLite for example uses bytecode, while Postgres and MySQL both compile the query to a computation tree, which basically takes the query AST and then substitutes in different table/index operations according to the query planner.

SQLite talks about the reasons for each variation here: https://sqlite.org/whybytecode.html
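For contrast, the bytecode approach SQLite chose flattens the tree into a linear program run by a dispatch loop. A minimal sketch of the shape (nothing like SQLite's actual VDBE):

    // The compiler walks the AST once and emits ops; execution is then a
    // flat loop over instructions, with no recursion.
    typedef enum { OP_PUSH, OP_ADD, OP_MUL, OP_HALT } OpCode;
    typedef struct { OpCode op; int arg; } Instr;

    int run(const Instr *prog) {
        int stack[64], sp = 0;
        for (int pc = 0; ; pc++) {
            switch (prog[pc].op) {
            case OP_PUSH: stack[sp++] = prog[pc].arg; break;
            case OP_ADD:  sp--; stack[sp-1] += stack[sp]; break;
            case OP_MUL:  sp--; stack[sp-1] *= stack[sp]; break;
            case OP_HALT: return stack[sp-1];
            }
        }
    }

    // run() on {PUSH 2, PUSH 3, ADD, PUSH 4, MUL, HALT} yields (2+3)*4 = 20.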


Thanks for the reference.


Without being glib, I honestly wonder if Fabrice Bellard has started using any LLM coding tools. If he could be even more productive, that would be scary!

I doubt he is ideologically opposed to them, given his work on LLM compression [1]

He codes mostly in C, which I'm sure is mostly "memorized" -- i.e. if you have been programming in C for a few decades, you almost certainly have a deep bench of your own code that you routinely go back to / copy and modify

In most cases, I don't see an LLM helping there. It could be "out of distribution", similar to what Karpathy said about writing his end-to-end pedagogical LLM chatbot

---

Now that I think of it, Bellard would probably train his own LLM on his own code! The rest of the world's code might not help that much :-)

He has all the knowledge to do that ... I could see that becoming a paid closed-source project, like some of his other ones [2]

[1] e.g. https://bellard.org/ts_zip/

[2] https://bellard.org/lte/


What I wonder is: are current LLMs even good for the type of work he does (novel, low-level, extremely performant)?


As a professional C programmer, the answer seems to be no; they are not good enough.


They are absolutely good at reviewing C code -- catching stupid bugs and such. Great for pair-programming-type use.


I'm writing C for microcontrollers and ChatGPT is very good at it. I don't let it write any code (because that's the fun part, why would I), but I discuss with it a lot, asking questions, asking it to review my code, and it does a good job. I also love to use it to explain assembly.


It's also the best way to use LLMs in my opinion: idea generation and snippets, then doing the thing "manually". Much better mastery of the code, no endless loop of "this creates that bug, fix it", and it comes up with plenty of feedback and gotchas when used this way.


This is how I used LLMs to learn and at the same time build an application using Tkinter.


This is a funny one, because on the one hand the answer is obviously no: it's very fiddly stuff that requires a lot of umming and ahhing. But then, weirdly, they can be absurdly good in these kinds of highly technical domains, precisely because the problems are often simple enough to pose to the LLM that any help it gives is immediately applicable, whereas in a comparatively boring/trivial enterprise application there is a vast amount of external context to grapple with.


If Fabrice explained what he wanted, I expect the LLM would respond in kind.


If Fabrice explained what he wanted, the LLM would say it's not possible.

When the coding-assistant LLMs load for a while, it's because they are sending Fabrice an email, and he corrects it and replies synchronously.


From my experience, it's just good enough to give you an overview of a codebase you don't know, and enough implementation suggestions to work from there.


No


I doubt it, although LLMs seem to do well on low-level work (assembly-level instructions).


I think it's the opposite: LLMs ask Fabrice Bellard instead


Congrats, the Chuck Norris meme has finally made its way onto HN.


Fabrice Bellard is far more deserving of the honor than ol’ Chucky.


Tough choice: Knuth, Bellard, Norvig...


They're trained on his code for sure. Every time I ask about ffmpeg internals, I know it's Fabrice's training data.


He has in fact written one: https://bellard.org/ts_server/


Yeah I've seen that, but it looks like it's the inference side only?

Maybe that is a hint that he does use off-the-shelf models as a coding aid?

There may be no need to train your own, on your own code, but it's fun to think about


Are you saying a LFM could be a good idea? A Large Fabrice Model?


Why does every single post on HN have to come down to talking about AI slop...


> Without being glib, I honestly wonder if Fabrice Bellard has started using any LLM coding tools

I doubt it. I follow him and look at the code he writes and it's well thought out and organized. It's the exact opposite of AI slop I see everywhere.

> He codes mostly in C, which I'm sure is mostly "memorized". i.e. if you have been programming in C for a few decades,

C I think he memorized a long time ago. It's more like he keeps the whole structure and setup of the program (the context) in his head and is able to "see it" all and operate on it. He is so good that people are insinuating he is actually "multiple people" or he uses an LLM and so on. I imagine he is quite amused reading those comments.


Still, humans can only type so quickly. It's not hard to imagine how even a flawless coder could benefit from an LLM.


> humans can only type so quickly

Real programming is 0.1% typing. Typing speed is not a limiting factor for any serious development.


You're conflating typing with programming. Typing is in fact the limiting factor to serious development.


typing would not make the top-100 list of “limiting factors” for serious development.


Most coding is better done with agents than with your hands. Coding is the main financial impediment to development. Yes, actually articulating what you want is the hard problem. Yes, there are technical problems that demand real analytical insight and real motivation. But refusing to use agents because you think you can type faster is mistaking typing for your actual skill: reasoning and interpretation.


It is for AI users who can't type code.


I am a heavy AI user and have been typing code for 3 decades :)


Ok, if you have such insight into development, why not leverage agents to type for you? What sort of problems have you faced that you are able to code against faster than you can articulate to an agent?

I have of course found some problems like this myself. But it's such a tiny portion of coding that I really question why you can't leverage LLMs to make yourself more productive.


Do you feel called out?


not at all, can’t feel called out by people who don’t have a clue what they are talking about :)


Why do you waste your time with people who don't have a clue what they're talking about, and rush to reply to them?

You replied 2 min after my comment... I am sorry you are that lonely on Christmas day


thanks, bored at the airport :)


Keep in mind that even if someone writes their own code, an LLM is great for accelerating tests, makefiles, docs, etc.

Or it can review for any subtle bugs too. :)


Some talented people (mitsuhiko, Evan You) seem to leverage LLMs in their own way. Probably mostly as legwork.


Is Fabrice like the Chuck Norris of programming?


Hopefully without the politics…


In Soviet Russia, politics find you.


In 2025, there is no shame in using an LLM. For example, he might use it to get help debugging, or ask if a block of code can be written more clearly or efficiently.


> I honestly wonder if Fabrice Bellard has started using any LLM coding tools. If he could be even more productive, that would be scary!

That’s kind of a weird speculation to make about creative people and their processes.

If Caravaggio had had a computer with Photoshop, or if Einstein had had a computer with Matlab, would they have been more productive? Is it a question that even makes sense?


> Is it a question that even makes sense?

Absolutely. It's a very intriguing thought invoking the opposite of the point you're trying to make.


Maybe today Bellard uses LLMs though


Matlab has been proven to be an indispensable tool in many fields.

AI is the same, for example creating slop or virtual girlfriends.


There is a bunch of AI slop in there ... It does seem like the author probably knows what he's talking about, since there is seemingly good info in the article [1], but there's still a lot of slop

Also, I think the end should be at the beginning:

Know when your indexes are actually sick versus just breathing normally - and when to reach for REINDEX.

VACUUM handles heap bloat. Index bloat is your problem.

The intro doesn't say that, and just goes on and on about "lies" and stupid stuff like that.

This part also feels like AI:

Yes. But here's what it doesn't do - it doesn't restructure the B-tree.

What VACUUM actually does

What VACUUM cannot do

I don't necessarily think this is bad, since I know writing is hard for many programmers. But I think we should also encourage people to improve their writing skills.

[1] I'm not an SQL expert, but it seems like some of the concrete examples point to some human experience


Author here – it’s actually funny, as you pointed out parts that are my own (TM) attempts to make it a bit lighthearted.

An LLM is indeed used for correcting and improving some sentences, but the rest is my honest attempt at making the writing approachable. If you’re willing to invest the time, you can see my fight with technical writing over time if you go through my blog.

(Writing this in the middle of a car wash on my iPhone keyboard ;-)


Yeah, I get accused of being an LLM all the time as well, best to ignore that kind of slop... (which, ironically, goes both ways!)


Yeah my eyes glaze over when I see the familiar tone.

If it's not worth writing it sure ain't worth reading.


Sorry, you lost at the Turing test


A better title might have been "VACUUM addresses heap bloat; REINDEX addresses index bloat"

Similar to a recent story, "Go is portable, until it isn't" -- the better title is "Go is portable until you pull in C dependencies"

https://lobste.rs/s/ijztws/go_is_portable_until_it_isn_t


There was also "boringcc"

https://gcc.gnu.org/wiki/boringcc

As a boring platform for the portable parts of boring crypto software, I'd like to see a free C compiler that clearly defines, and permanently commits to, carefully designed semantics for everything that's labeled "undefined" or "unspecified" or "implementation-defined" in the C "standard" (D. J. Bernstein)
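For a concrete example of what's at stake (my example, not djb's): signed overflow is undefined in standard C, so an optimizer may assume it never happens, while a "boring" compiler would commit to one meaning, such as wraparound:

    // Standard C: signed overflow is undefined behavior, so gcc/clang at -O2
    // may fold this whole function to "return 1" -- even though at INT_MAX a
    // wrapping add would make it return 0. boringcc would pick one semantics.
    int always_true(int x) {
        return x + 1 > x;
    }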

And yeah I feel this:

The only thing stopping gcc from becoming the desired boringcc is to find the people willing to do the work.

(Because OSH has shopt --set strict:all, which is "boring bash". Not many people understand the corners well enough to disallow them - https://oils.pub/ )

---

And "Proposal for a Friendly Dialect of C" (2014)

https://blog.regehr.org/archives/1180


It is kind of ironic, given the existence of Orthodox C++, and it kind of proves the point: C isn't as simple as people think it is when they have only read the K&R book and nothing else.


> in the C "standard"

Oof, those passive-aggressive quotes were probably deserved at the time.


It's still not really wrong though. The C standard is just the minimal common feature set guaranteed by different C compilers, and even then there are significant differences between how those compilers implement the standard (e.g. the new C23 auto behaves differently between gcc and clang - and that's fully sanctioned by the C standard).

The actually interesting stuff happens outside the standard in vendor-specific language extensions (like the clang extended vector extension).
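E.g. a taste of that extension (clang's spelling; gcc's closest equivalent is __attribute__((vector_size))):

    // clang's ext_vector_type extension: small SIMD-ish vectors with
    // element-wise operators, entirely outside the C standard.
    typedef float float4 __attribute__((ext_vector_type(4)));

    float4 madd(float4 a, float4 b, float4 c) {
        return a * b + c;   // element-wise multiply-add
    }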


Off topic, but if you're the author of sokol: I'm so thankful, because it led to my re-learning the C language in the most enjoyable way. I started to learn Zig these days and I see you're active in that community too. Not sure if it's just me, but I feel like there's a renaissance of old-school C -- the language, but even more the mentality of minimalism in computing, which Zig also embodies.


Yes it's a me :D Thanks for the kind words. And yeah, Zig is pretty cool too.

