My take on the difference between now and then is “effort”. All of the things mentioned above are now effortless, but the door to “effort” remains open as it always has been. Take the first point, for example. Those little black boxes of AI can be significantly demystified by, say, watching a bunch of videos (https://karpathy.ai/zero-to-hero.html) and spending at least 40 hours of hard cognitive effort learning about it yourself. We used to purchase software or write it ourselves before it became effortless to get it for free in exchange for ads, and then a subscription once we grew tired of ads or were tricked by a bait and switch. You can also argue that it has never been easier to write your own software than it is today.
Hostile operating systems. Take the effort to switch to Linux.
Undocumented hardware: there is far more open source hardware out there today, and back in the day it was fun to reverse engineer hardware. Now we just expect it to be open because we can’t be bothered to put in the effort anymore.
Effort gives me agency. I really like learning new things and so agentic LLMs don’t make me feel hopeless.
I’ve worked in the AI space and I understand how LLMs work in principle. But we don’t know the magic contained within a model after it’s been trained. We understand how to design a model, and how models work at a theoretical level, but we cannot know how well one will perform at inference until we test it. So much of AI research is just trial and error, with different dials repeatedly tweaked until we get something desirable. So no, we don’t understand these models in the same way we might understand how a hashing algorithm works. Or a compression routine. Or an encryption cipher. Or any other hand-programmed algorithm (see the small sketch at the end of this comment).
I also run Linux. But that doesn’t change how the two major platforms behave, or the fact that, as software developers, we have to support those platforms.
Open source hardware is great but it’s not in the same league of price and performance as proprietary hardware.
Agentic AI doesn’t make me feel hopeless either. I’m just describing what I’d personally define as a “golden age of computing”.
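Here is what I mean, as a trivial sketch of my own (not from any particular project): a hand-written hash is fully specified, so identical inputs always give identical outputs, and the reason can be read straight out of the algorithm. Nothing comparable tells you up front how a trained model will respond to a given prompt.

    # Toy illustration: a hand-written algorithm is fully specified, so the same
    # input always maps to the same output, with no testing campaign required to
    # know that in advance.
    import hashlib

    a = hashlib.sha256(b"the same input").hexdigest()
    b = hashlib.sha256(b"the same input").hexdigest()
    assert a == b  # always holds, by construction of the algorithm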
but isn't this like a lot of other CS-related "gradient descent"?
when someone invents a new scheduling algorithm or a new concurrent data structure, it's usually based on hunches and empirical results (benchmarks) too. nobody sits down and mathematically proves their new linux scheduler is optimal before shipping it. they test it against representative workloads and see if there is uplift.
we understand transformer architectures at the same theoretical level we understand most complex systems. we know the principles, we have solid intuitions about why certain things work, but the emergent behavior of any sufficiently complex system isn't fully predictable from first principles.
that's true of operating systems, distributed databases, and most software above a certain complexity threshold.
No. Algorithm analysis is much more sophisticated and well defined than that. Most algorithms are deterministic, and it is relatively straightforward to identify their complexity, O(·). Even for nondeterministic algorithms we can evaluate asymptotic performance under different categories of input. We know a lot about how an algorithm will perform under a wide variety of input distributions regardless of determinism. In the case of schedulers and other critical concurrency algorithms, performance is well known before release. There is a whole subfield of computer science dedicated to it. You don't have to "prove optimality" to know a lot about how an algorithm will perform. What's missing in neural networks is the why and how of inputs propagating through the network during inference. It is a black box of understandability: under a great deal of study, but still very poorly understood.
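To make that concrete with a toy sketch of my own (not anything from this thread): for a classical algorithm, the worst-case bound can be stated and checked before the code ever meets real data.

    # Toy sketch: binary search over n sorted items needs at most
    # ceil(log2(n)) + 1 probes, and the same input always produces the same
    # output. The bound is known analytically, before any benchmarking.
    import math

    def binary_search(sorted_items, target):
        probes = 0
        lo, hi = 0, len(sorted_items) - 1
        while lo <= hi:
            mid = (lo + hi) // 2
            probes += 1
            if sorted_items[mid] == target:
                return mid, probes
            elif sorted_items[mid] < target:
                lo = mid + 1
            else:
                hi = mid - 1
        return -1, probes

    n = 1_000_000
    _, probes = binary_search(list(range(n)), n - 1)
    assert probes <= math.ceil(math.log2(n)) + 1  # the analytic bound holds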
i agree w/ the complexity analysis point, but does that theoretical understanding actually translate to real world deployment decisions in either subfield? knowing an algorithm is O() tells you surprisingly little about whether it'll actually outperform alternatives on real hardware with real cache hierarchies, branch predictors, and memory access patterns. same thing with ML (just with the very different nature of GPU hw): both subfields have massive graveyards of "improvements" that looked great on paper (or in controlled environments) but never made it into production systems. arxiv is full of architecture tweaks showing SOTA on some benchmark, and the same w/ novel data structures/algorithms that nobody ever uses at scale.
I think you missed the point. Proving something is optimal is a much higher bar than just knowing how the hell the algorithm gets from inputs to outputs in a reasonable way. Even concurrent systems and algorithm bounds under input distributions have well established ways to evaluate them. There is literally no theoretical framework for how a neural network churns out answers from inputs, other than the most fundamental "matrix algebra". Big O, Theta, Omega, and asymptotic performance are all sound theoretical methods to evaluate algorithms. We don't have anything even that good for neural networks.
>Those little black boxes of AI can be significantly demystified by, for example, watching a bunch of videos (https://karpathy.ai/zero-to-hero.html) and spending at least 40 hours of hard cognitive effort learning about it yourself.
That's like saying you can understand humans by watching some physics or biology videos.
Except it's not. Traditional algorithms are well understood because they're deterministic formulas. We know what the output is if we know the input. The surprises that happen with traditional algorithms are when they're applied in non-traditional scenarios as an experiment.
Whereas with LLMs, we get surprised even when using them in an expected way. This is why so much research happens investigating how these models work even after they've been released to the public. And it's also why prompt engineering can feel like black magic.
I think the historical record pushes back pretty strongly on the idea that determinism in engineering is new. Early computing basically depended on it. Take the Apollo guidance software in the 60s. Those engineers absolutely could not afford "surprising" runtime behavior. They designed systems where the same inputs reliably produced the same outputs because human lives depended on it.
That doesn't mean complex systems never behaved unexpectedly, but the engineering goal was explicit determinism wherever possible: predictable execution, bounded failure modes, reproducible debugging. That tradition carried through operating systems, compilers, finance software, avionics, etc.
What is newer is our comfort with probabilistic or emergent systems, especially in AI/ML. LLMs are deterministic mathematically, but in practice they behave probabilistically from a user's perspective (a tiny sketch of why is below), which makes them feel different from classical algorithms.
So I'd frame it less as "determinism is new" and more as "we're now building more systems where strict determinism isn't always the primary goal."
Going back to the original point, getting educated on LLMs will help you demystify some of the non-determinism but as I mentioned in a previous comment, even the people who literally built the LLMs get surprised by the behavior of their own software.
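For what it's worth, here is a minimal sketch of where the user-facing randomness comes from (my own toy illustration, assuming NumPy, not anyone's production decoder): the forward pass is deterministic arithmetic, but the decoding step is usually sampled, so the same prompt can yield different outputs from run to run unless you pin everything down with greedy decoding.

    # Toy decoding step: greedy decoding is reproducible, temperature sampling
    # is not, even though the logits themselves are computed deterministically.
    import numpy as np

    rng = np.random.default_rng()

    def next_token(logits, temperature=1.0):
        logits = np.asarray(logits, dtype=float)
        if temperature == 0.0:
            return int(np.argmax(logits))            # greedy: same prompt, same token
        scaled = logits / temperature
        probs = np.exp(scaled - scaled.max())
        probs /= probs.sum()
        return int(rng.choice(len(probs), p=probs))  # sampled: varies between runs

    logits = [2.0, 1.5, 0.3]        # made-up scores for a three-token vocabulary
    print(next_token(logits, 0.0))  # always 0
    print(next_token(logits, 1.0))  # usually 0, sometimes 1 or 2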
That’s some epic goal post shifting going on there!!
We’re talking about software algorithms. Chemical and biomedical engineering are entirely different fields, as are psychology, gardening, and Morris dancing.
Yeah. Which any normal person would take to mean “all technologies in software engineering” because talking about any other unrelated field would just be silly.
We know why they work, but not how. SotA models are an empirical goldmine; we are learning a lot about how information and intelligence organize themselves under various constraints. This is why new papers are published every single day that further explore the capabilities and inner workings of these models.
Ok, but the art and science of understanding what we're even looking at is actively being developed. What I said stands, we are still learning the how. Things like circuits, dependencies, grokking, etc.
The UV light source that ASML uses for its lithography machine is technology that was acquired in the 2013 acquisition of San Diego-based Cymer. The tech stack for the advanced light source dates back to the "EUV LLC Initiative," led by DARPA in the mid-1990s.
US origins of the key technology give the US a veto here regardless of ASML's Dutch headquarters.
Yes, the mechanism is effectively EAR/FDPR. You can check (https://en.wikipedia.org/wiki/Extreme_ultraviolet_lithograph...) for a more detailed history, but the EUV technology ASML further developed and commercialized (no small feat) was licensed from a US-government-created consortium through which the government maintains its IP rights.
I wonder if this is the same reason why Microsoft's Remote SSH plugin on VS Code is so flaky even with a decent internet connection. Every couple of months I try to give it another go and give up due to the poor keyboard latency I inevitably experience. And the slow reconnects whenever I glance away from my computer monitor briefly. This is on a fiber connection with a 20ms ping to the remote machine.
You surely mean the latency in its embedded terminal and not the code editor, right? I use VSCode’s remote SSH specifically so that code editing doesn’t suck. It really does not.
That’s an interesting speculation, but I’m inclined to believe their official reasoning. (That being they just didn’t really care about the format and/or went with whatever Chrome said at first. A year or so later they changed their mind and said they wanted an implementation in a memory-safe language, which prompted the JXL team to work on it.)
Yes and yes. I make open source software because I fundamentally enjoy the act of learning something new and then applying that knowledge by making something. I publish it for the ego boost only. I am as likely to be irritated by contributions as to be excited by them. My day job contributions are up for scrutiny, but the personal projects I publish on GitHub are my island, my sovereign ground. As exciting as PR interest is, sometimes I don’t really want someone to paint over my painting. It’s mine, after all.

I obviously don’t speak for all open source contributors, but I don’t want compensation. If someone wants to fork my work and turn it into a community, they are free to do so as a result of my licensing choice. If the first few contributions I receive are pleasant and someone takes over, then that is great too. My point is that not all creators are aggregators. Leave us alone and stop complaining. We gave it away for free, after all.
vLLM needs to perform similar operations to an operating system. If you write an operating system in Python you will have scope for many 40% improvements all over the place and in the end it won’t be Python anymore, at least under the hood it won’t be.
It's not about the Python at all. The optimization techniques operate on a completely different level: the level of the chip and/or hardware platform, and they come down to finding ways to utilize it as fully as possible by exploiting the intrinsic details of its limitations.
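Roughly what I mean, as a toy sketch (my own illustration assuming PyTorch is available, not vLLM's actual code): the win comes from how the work is shaped for the hardware, not from which host language issues the calls. Batching many small matmuls into one call keeps the accelerator busy; driving them one at a time from the interpreter leaves it waiting.

    # Toy sketch: same arithmetic, very different hardware utilization depending
    # on how the work is batched for the device.
    import torch

    def per_sequence_scores(q, k):
        # One small matmul per sequence, launched from a Python loop: the device
        # idles between launches while the interpreter sets up the next call.
        return [q[i] @ k[i].T for i in range(q.shape[0])]

    def batched_scores(q, k):
        # The same math expressed as one batched GEMM, letting the hardware
        # saturate its compute units and memory bandwidth in a single launch.
        return torch.bmm(q, k.transpose(1, 2))

    q = torch.randn(64, 128, 64)  # (batch, seq_len, head_dim) -- made-up sizes
    k = torch.randn(64, 128, 64)
    assert torch.allclose(torch.stack(per_sequence_scores(q, k)),
                          batched_scores(q, k), atol=1e-5)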
> “Those often require estimates in much the same way they're required from Star Trek's engineers: so the people with main character syndrome have something to dramatically ignore or override to prove their dominance over the NPCs and material reality.”
Thanks. It was hard won. I spent maybe a decade naively thinking that if we just made software methods that worked in service of stated business goals and values, they'd get adopted and we'd all live happily ever after.
It took me a long time to come to grips with the POSIWID [1] version of the purpose of planning and estimates. One of the things that really blew my mind is Mary Poppendieck's story about how they built the Empire State Building on time and under budget even though they didn't have it fully designed when they started. [2] Different, more effective approaches are not only possible, they exist. But they can no longer win out, and I think it's because of the rise of managerialism, the current dominant ideology and culture of big business. [3]
Thanks for the links. To the limit of my influence I try to protect my team from distractions, be fluid about methodology (constant agile churn can be depressing), limit the toxicity of pull requests, and to spend as much time with them as I can. A happy team is a productive team. Oh and I try not to work with leaders who obsess over Gantt charts. To me estimates are more about trust and respect rather than metrics and velocity. It has to be the right kind of company though.
Further: In The Next Generation, when Scotty shows up, he mentions to Geordi that he always padded his estimates because he knew Kirk would do things like that.
Looks like they didn’t meet the minimum crew requirement on this one.