2025 has been a wild year for agentic coding models. Cutting-edge models in January 2025 don't hold a candle to cutting-edge models in December 2025.
Just the jump from Sonnet 3.5 to 3.7 to 4.5, and then Opus 4.5, has been pretty massive in terms of holistic reasoning and deep knowledge, as well as better procedural and architectural adherence.
GPT-5 Pro convinced me to pay $200/mo for an OpenAI subscription. Regular 5.2 models, and 5.2 codex, are leagues better than GPT-4 when it comes to solving problems procedurally, using tools, and holding deep discussions of scientific, mathematical, philosophical, and engineering problems.
Models have increasingly long context windows, especially some Google models. OpenAI has released very good image models, and strong editing-focused image models have arrived more broadly. Predictably improving multimodal inference is unlocking many cool near-term possibilities.
Additionally, we have seen some incredible open-source and open-weight models released this year, some of them fully usable commercially without restriction. More and more smaller TTS/STT projects are in active development as well, with a few notable releases this year.
Honestly, the landscape at the end of the year is impressive. There has been great work all over the place, almost too much to keep up with. I'm very interested in the Genie models and a few others.
To give an idea:
At the beginning of the year, I was mildly successful at getting coding models to make changes in some of my codebases, but the more esoteric problems were out of reach. Progress in general was deliberate and required a lot of manual intervention.
By comparison, in the last week I've prototyped six applications at levels that would take me days to weeks individually, often developing several at once, monitoring agentic workflows and intervening only when necessary. I rely on long preproduction phases, with architectural discussions and the development of documentation, requirements, and SDDs... plus detailed code review and refactoring processes to ensure adherence to constraints. I'm morphing from a very busy solo developer into a very busy product manager.
> By comparison, in the last week I've prototyped six applications at levels that would take me days to weeks individually [...]
I don't doubt that the models have got better, but you can go back two or three years and find people saying the exact same stuff about the latest models back then.
I don't think that's true of three years ago - that's taking us back into GPT-3 territory.
And two years ago we were mostly still stuck with GPT-4, which had an 8,000-token input context limit; it was very challenging to get real coding work done with that.
Easy enough to prove, though: find some examples of people saying that 2-3 years ago and I shall concede the point!
GPT-4 was released in March 2023, so it pretty clearly comes under the heading of “two or three years” ago. It’s only three months shy of its third birthday.
I see that 2023 LinkedIn has (deservedly) gone down your memory hole, but it is very easy to find innumerable examples of people saying this kind of thing:
> Just the jump from Sonnet 3.5 to 3.7 to 4.5, and then Opus 4.5, has been pretty massive in terms of holistic reasoning and deep knowledge, as well as better procedural and architectural adherence.
I don't really agree. Aside from how it handled frontend code, changes in Sonnet did not truly impact my overall productivity (from Sonnet 3.7 to 4 to 4.5; I did not try 3.5). Opus 4.5/Codex 5.2 are when the changes truly happened for me (and I'm still a bit distrustful of Codex 5.2, but I use it basically to help me during PRs).
That's fine. Maybe you're holding it wrong, or maybe your work is too esoteric/niche/complex for newer models to be bigger productivity boosters. Some of mine certainly is, I get that. But for other stuff, these newer models are incredible productivity boosters.
I also chat with these models for long hours about deep, complicated STEM subjects and am very impressed with the level of holistic knowledge and wisdom compared to models a year ago. And the abstract math story has gotten sooooo much better.
There is a vast amount of complexity involved in rolling things from scratch today in this fractured ecosystem and providing the same experience for everyone.
Sometimes, the reduction of development friction is the only reason a product ends up in your hands.
I say this as someone whose professional toolkit includes Docker, Python, and Electron; not necessarily my tools of choice, but I'm one guy trying to build a lot of things and life is short. This is not a free lunch, and the optimizer within me screams out whenever performance is left on the table, but everything is a tradeoff. And I'm always looking for better tools, and keep my eyes on projects such as Tauri.
I've been running either Qubes OS or KVM/QEMU based VMs as my desktop daily driver for 10 years. Nothing runs on bare metal except for the host kernel/hypervisor and virt stack.
I've achieved near-native performance for intensive activities like gaming, music and visual production. Hardware acceleration is kind of a mess, but using tricks like GPU passthrough for multiple cards, dedicated audio cards, and block device passthrough, I can achieve great latency and performance.
One benefit of this is that my desktop acts as a mainframe, and streaming machines to thin clients is easy.
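To give an idea of what the passthrough side looks like in practice, here's a rough sketch using the libvirt Python bindings to attach a GPU to a guest; the domain name and PCI address are made up for illustration, and the card has to already be bound to vfio-pci on the host:

    # Sketch: attach a host GPU (already bound to vfio-pci) to a guest via
    # libvirt's PCI hostdev passthrough. Domain name and PCI address are
    # hypothetical; substitute your own (see `lspci -D` for addresses).
    import libvirt

    GPU_HOSTDEV_XML = """
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
      </source>
    </hostdev>
    """

    conn = libvirt.open("qemu:///system")        # local hypervisor
    try:
        dom = conn.lookupByName("gaming-vm")     # hypothetical guest name
        # Attach to both the live domain and its persistent config.
        dom.attachDeviceFlags(
            GPU_HOSTDEV_XML,
            libvirt.VIR_DOMAIN_AFFECT_LIVE | libvirt.VIR_DOMAIN_AFFECT_CONFIG,
        )
    finally:
        conn.close()

Audio cards and USB controllers attach the same way with different hostdev XML, and block devices go in as <disk> elements; most of the latency tuning tends to live in CPU pinning and hugepages rather than in the attach step itself.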
My model for a long time has been not to trust anything I run, and this allows me to keep both my own and my clients' work reasonably safe from a drive-by NPM install or something of that caliber.
Now that I also use an Apple Silicon MacBook as a daily driver, I very much miss the comfort of a fully virtualized system. I do stream virtual machines in from my mainframe. But the way Tahoe is shaping up, I might soon put Asahi on this machine and go back to a fully virtualized system.
I think this is the ideal way to do things; however, it will need to operate mostly transparently to the end user, or they will quickly get security fatigue. The sacrifices involved today are not for those who lack patience.
I think it's fine if you do it for yourself. It's a bit of a poor man's Linux-turned-microkernel solution. In fact, I work like this too, and this extends to my Apple Silicon Mac. The separation does have big security advantages, especially when different pieces of hardware are exclusively passed to the different, closed-off "partitions" of the system and the layer orchestrating everything is as minimal as it gets, or at least as guarded against the guests as it gets.
What worries me is when this model escalates from being cobbled together by a system administrator with limited resources to being baked into the design of software: the appropriation of the hypervisor layer by software developers who are reluctant to untangle the mess they've created at the user/kernel boundary of their program, and who instead start building on top of hardware virtualization for "security", only to go on and pollute the hypervisor as the level of host OS access proves insufficient. This is beautifully portrayed by the first XKCD you've linked. I don't want to lose the ability to securely run VMs as the interface between the host and guest OSes grows just as unmanageable as that of Linux and BSD system calls, and as new software starts demanding that I let it use the entirety of it, just as some already insists that I let it run as root because privilege dropping was never implemented.
If you develop software, you should know what kind of operating system access it needs to function and sandbox it appropriately, using the operating system's sandboxing facilities, not the tools reserved for system administrators.
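To make that concrete, here's one rough shape this can take on Linux (one option among many; the wrapped command, paths, and directory names below are illustrative assumptions): wrap the untrusted step in bubblewrap so it gets a read-only view of the system, a writable project directory, and no network, without ever touching the hypervisor.

    # Sketch: run an untrusted command inside a bubblewrap sandbox built on
    # unprivileged user namespaces. Paths and the wrapped command are
    # illustrative, not a one-size-fits-all policy.
    import subprocess

    def run_sandboxed(command, project_dir):
        bwrap = [
            "bwrap",
            "--ro-bind", "/usr", "/usr",        # read-only system
            "--symlink", "usr/lib", "/lib",
            "--symlink", "usr/lib64", "/lib64",
            "--symlink", "usr/bin", "/bin",
            "--proc", "/proc",
            "--dev", "/dev",
            "--tmpfs", "/tmp",
            "--bind", project_dir, "/work",     # only the project is writable
            "--chdir", "/work",
            "--unshare-all",                    # no network, PIDs, IPC, ...
            "--die-with-parent",
        ]
        return subprocess.run(bwrap + command, check=True)

    # e.g. a drive-by `npm install` can only touch the bound project dir;
    # add "--share-net" above if the step genuinely needs registry access.
    run_sandboxed(["npm", "install"], "/home/me/projects/example")

That kind of least-privilege boundary is something a developer can ship themselves, which leaves the hypervisor free for the administrator's own partitioning.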
I'm not talking about an IBM mainframe. The definition Google gives me for mainframe is `a large high-speed computer, especially one supporting numerous workstations or peripherals`, which is exactly what my machine is.
mainframe (noun)
main·frame | ˈmān-ˌfrām
1: a large, powerful computer that can handle many tasks concurrently and is usually used commercially
2 (dated): a computer with its cabinet and internal circuits, especially when considered separately from any peripherals connected to the computer
The features you list are great to have, but my setup fits the first definition of mainframe as described. If you feel this definition is not specific enough, email Merriam-Webster and don't bother me about it.
Webster is wrong. A mainframe is not a generic high-performance computer (that would be HPC). A mainframe is a very specific kind of high-performance computer.
I repeat: I understand that mainframe has a specific meaning to many people, especially those who work on traditional mainframes, but I would rather you and the other user email both Google and Merriam-Webster about their wrong definitions, and not bother me about it. I will correct my usage once they have updated the definition to your standards.
M2 MBP here. Definitely skipping Tahoe. Sequoia is already just terrible, not only is the UX clunky and hostile, but Apple seems to have flat out broken its Bluetooth and networking stacks in multiple ways, and in general the system is extremely unstable.
Best hardware around, but at this point I might even take W11 over this locked down mess. At least Asahi support is decent these days.
And I'm tired of paying for things that should be stock, such as proper window and mouse management, or reasonable fan control so that the keyboard doesn't burn my fingers under moderate workloads.
Your suggestion is to not use the platform as intended, and to understand the source code of the extension. That advice is not actionable by non-technical people and does not help mitigate mass surveillance.
Ok, should we just use the provided 'app' and assume things are fine? FAANG or whoever take our privacy and security very seriously, you know!
The only reasonable approach is to view the code that is run on your system, which is possible with an extension script, and not possible with whatever non-technical people are using.
I don't know what point you're trying to make, but I already expect OpenAI to maintain records of my usage of their service. I do not however want other parties to be privy to this data, especially without my knowledge or consent.
A friend and I worked on a startup together that did this back when only the GPT-3 API was available. We sucked up everything we could think of, including HN and traditionally opaque sources such as Telegram.
Remote: Yes
Willing to relocate: No
Technologies: JavaScript/TypeScript/Web, Python, C, Full-stack, AI/ML
Email: hello @ bad-software dot com
GitHub: https://github.com/soulofmischief
Full-stack JavaScript-focused engineer/entrepreneur with extensive experience building scalable single-page apps in a range of frameworks, web3, multiplayer web games, transformers and other neural networks, and much more. Product-oriented, with executive and leadership experience, comfortable in both autonomous and collaborative settings, and a capable Linux sysadmin. I know how to ship.
I've focused for several years now on agentic and generative work, such as simulating networked, LLM-augmented embodied agents and interface research; there's too much to list here, but I'm always happy to talk more over email. While the rest of the industry is just starting to latch onto agentic systems, I can offer years of experience and product insight.
Available for consulting, projects, anything web or agentic.