I just recreated most of Linear for my company in a few days, making it hyper-specific to what we want (metrics-driven, lean-startup style).
All state changes are made via MCP, which saved me from having to build any forms and most interactions beyond filtering, searching, sorting, etc.
Means we will be ditching Linear soon.
I know I’m an outlier but this sort of thing will get more common.
I don't understand this, because who's going to maintain it in the future? Surely it costs more to pay even one person to re-add the features Linear had than to pay Linear themselves. I'd do this for personal projects, but never at work, lest I be the one maintaining it indefinitely on top of my current duties.
One annoying thing with premade solutions is that they only do 90% of what you want; it's livable, but still doesn't quite meet your needs.
It's not just re-adding features that Linear already provides, but adding the features and integrations that meet 100% of your needs.
The full decision-making equation is (cost of implementing it yourself + cost of maintenance − the value of the extra 10% fit you get from a solution that fully meets your needs) versus (cost of the pre-existing solution that meets 90% of your needs). The cost of implementing it, and the cost of maintaining it, have both just gone down. Surely that means more people on the whole will choose to build in-house rather than buy.
Demand for premade solutions will therefore fall, and SaaS providers won't be able to raise prices, since that would push even more people toward building it themselves. The cost of producing software will keep dropping thanks to agentic coding, and maintenance costs will drop as well thanks to maintenance coding agents. More people will choose their own custom solutions, and so on. It's very possible we are at the beginning of the end for SaaS companies.
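A back-of-the-envelope version of that equation, with purely invented numbers (every figure here is hypothetical, just to show how the comparison flips):

```python
# All numbers invented for illustration; annualized, single small team.
saas_annual = 8 * 12 * 25        # $8/seat/month x 12 months x 25 seats = $2,400
build_cost = 3_000               # one-off agentic build, amortized into year one
maintain_annual = 2_000          # yearly upkeep with maintenance agents
fit_value = 3_000                # yearly value of the extra 10% fit

# The decision equation from above: build if (build + maintain - fit) < SaaS.
in_house = build_cost + maintain_annual - fit_value   # 2,000
print("build in-house" if in_house < saas_annual else "buy SaaS")  # -> build in-house
```

The argument is that the first two terms on the in-house side are exactly the ones agentic coding is shrinking, so comparisons that used to favor buying start favoring building.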
The RLM framing basically turns long-context into an RL problem over what to remember and where to route it: main model context vs Python vs sub-LLMs. That’s a nice instantiation of The Bitter Lesson, but it also means performance is now tightly coupled to whatever reward signal you happen to define in those environments. Do you have any evidence yet that policies learned on DeepDive / Oolong-style tasks transfer to “messy” real workloads (multi-week code refactors, research over evolving corpora, etc.), or are we still in the “per-benchmark policy” regime?
The split between main model tokens and sub-LLM tokens is clever for cost and context rot, but it also hides the true economic story. For many users the cost that matters is total tokens across all calls, not just the controller’s context. Some of your plots celebrate higher “main model token efficiency” while total tokens rise substantially. Do you have scenarios where RLM is strictly more cost-efficient at equal or better quality, or is the current regime basically “pay more total tokens to get around context limits”?
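A toy illustration of that accounting gap (all numbers invented): a harness can look more "token-efficient" by controller-context tokens while total tokens across all calls rise.

```python
# Invented token counts for one task, purely to illustrate the accounting.
plain_call = {"main": 120_000, "sub": 0}         # single long-context call
rlm_call   = {"main": 30_000, "sub": 160_000}    # small controller + many sub-LLM calls

def total(tokens: dict) -> int:
    return tokens["main"] + tokens["sub"]

# Controller context shrinks 4x, but the bill (total tokens) grows ~1.6x.
print(total(plain_call), total(rlm_call))  # -> 120000 190000
```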
math-python is the most damning data point: same capabilities, but the RLM harness makes models worse and slower. That feels like a warning that “more flexible scaffold” is not automatically a win; you’re introducing an extra layer of indirection that the model has not been optimized for. The claim that RL training over the RLM will fix this is plausible, but also unfalsifiable until you actually show a model that beats a strong plain-tool baseline on math with less wall-clock and tokens.
Oolong and verbatim-copy are more encouraging: the controller treating large inputs as opaque blobs and then using Python + sub-LLMs to scan/aggregate is exactly the kind of pattern humans write by hand in agents today. One thing I’d love to see is a comparison vs a well-engineered non-RL agent baseline that does essentially the same thing but with hand-written heuristics (chunk + batch + regex/SQL/etc.). Right now the RLM looks like a principled way to let the model learn those heuristics, but the post doesn’t really separate “benefit from architecture” vs “benefit from just having more structure/tools than a vanilla single call.”
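For concreteness, the kind of hand-written baseline I mean might look like this (function name, chunk size, and data are my own invention, not from the post): chunk a large input, scan each chunk with a regex, and aggregate, with no learned routing at all.

```python
import re

def chunked_count(text: str, pattern: str, lines_per_chunk: int = 200) -> int:
    """Count regex matches over a large text, one line-aligned chunk at a time.

    Chunking on line boundaries avoids splitting a match across two chunks,
    which fixed byte-offset chunking would risk.
    """
    rx = re.compile(pattern)
    lines = text.splitlines()
    total = 0
    for i in range(0, len(lines), lines_per_chunk):
        chunk = "\n".join(lines[i:i + lines_per_chunk])
        total += len(rx.findall(chunk))
    return total

doc = "error: disk full\nok\nerror: timeout\n" * 1000
print(chunked_count(doc, r"error: \w+"))  # -> 2000 (two matches per repetition)
```

An RLM that learns to emit essentially this loop would be evidence for the architecture; an RLM that loses to it would suggest the win comes from structure and tools, not from learning.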
On safety / robustness: giving the model a persistent Python REPL and arbitrary pip is powerful, but it also dramatically expands the attack surface if this ever runs on untrusted inputs. Are you treating RLM as strictly a research/eval harness, or do you envision this being exposed in production agent systems? If the latter, sandboxing guarantees and resource controls probably matter as much as reward curves.
The beauty of Suno, at least for me, was the opportunity to turn my original lyrics into listenable music for free, without having it attached in any way to any of the big labels, who are evil to the core. I really hope they keep the existing user experience intact.
Hello, I'm one of the original evangelists for Ruby on Rails and the author of The Rails Way, as well as Patterns of Application Development Using AI. Over the past three decades, I've led teams and built products at every scale, from early-stage startups to global platforms, combining deep technical expertise with a creative, forward-looking approach to software craftsmanship.
I bring 30 years of hands-on engineering experience, including senior leadership in architecture, AI integration, and product strategy. Whether working as an individual contributor or guiding organizations through transformation, I focus on delivering clarity, velocity, and sustainable innovation. My last gig was leading AI strategy related to Developer Experience at Shopify.
I'm currently evaluating consulting and permanent opportunities, with a preference for an executive leadership position at a larger company, although I will consider consulting and fractional-CTO roles for startups and smaller ventures if the project and team are interesting enough.
I have a lot of the code proven out already: https://medium.com/zar-engineering/code-mode-mcp-ac17c2a1038...