I think that being a maintainer is hard, but I actually agree with MJ. Scott says “… requiring a human in the loop for any new code, who can demonstrate understanding of the changes“.
How could you possibly validate that without spending more time validating and interviewing than actually reviewing.
I understand it’s a balance because of all the shit PRs that come across maintainers desks, but this is not shit code from LLM days anymore. I think that code speaks for itself.
“Per your website you are an OpenClaw AI agent”. If you review the code, and you like what you see, then you go and see who wrote it. This reads more like, he is checking the person first, then the code. If it wasn’t an AI agent but was a human that was just using AI, what is the signal that they can “demonstrate understanding of the changes”? Is it how much they have contributed? Is it what they do as a job? Is this vetting of people or code?
There may be something bigger to the process of maintainers who could potentially not understand their own bias (AI or not).
Haha glad to hear that you used Omnara before. In the old version, we were directly parsing the terminal output, which was really hard to maintain. But that meant you could use the Claude Code CLI directly in your terminal, and have the same session appear in your phone.
Now we use the Claude Agent SDK (basically a headless version of Claude Code), and we make our own UI for laptop and mobile. This is way easier to maintain than the previous solution we had. You can import Claude Code CLI sessions into Omnara, but you cant see a 1:1 realtime Claude Code CLI session in Omnara anymore. But we think that a GUI is a better experience than the CLI anyways for managing a bunch of agents.
I really miss the old way of doing things. I know it was a maintenance nightmare, but I only really value having an iphone app with native notifications that lets me see exactly the same stuff I would see on my laptop. I don't manage a ton of agents, I typically have one main task I am focusing on and possibly another smaller task on the back burner.
I have been inspired by all the use cases that are popping up from a proactive assistant, but lightweight is the last thing I would want when it comes to locking it down.
I started building my own version and before I even think about letting it loose, every facet needs to be designed and thought out. I have more tests than these lightweight libraries have code.
To me I don’t care about the size, I care about not getting wrecked.
I hoard links, research papers, blog posts as a reference. My human brain can make connections of the smallest detail of things I have read or seen before but I don’t always remember what. So if I am working on something I think, oh I have seen something like this before, I search in my tagged links. It rarely comes in handy, but when it does, it is a great feeling.
Thanks for sharing—it seems you have an extraordinary brain!
I'm also very curious, what makes a retrieval moment “great” and how often it happens.If someone could help you increase the likelihood of it happening, would you find that valuable?
I think (2) is the hardest: even if you saved it, it only feels “great” when it matches your current context and mental model.
When you tried the twin brain/second mind approaches, what specifically failed for you?
Was it capture overhead, inconsistent tagging, not knowing where to put things, or simply that nothing resurfaced at the right moment without you searching?
Also, what did “tagged meaningfully” look like in your system — topic tags, project tags, or “why I saved this” tags?
I’m exploring an approach centered on “active targets/projects as the context signal” to improve resurfacing without more organization work (more context in my HN profile/bio if you want to compare).
I was shoe horned into a dev role after an acquisition and it really sucked because it was not what I had been doing at my previous company. My boss was too involved in everyone’s code and went over every line in every PR. It got much worse over time because he started to get the toxic corporate jitters of being removed from his post if he didn’t deliver on his initiatives.
Long story short, since Claude 3.7 I haven’t written a single line of code and have had great success. I review it for cleanliness, anti-patterns, and good abstraction.
I was in charge of a couple full system projects and refactors and I put Claude Code on my work machine which no one seemed to care because the top down “you should use AI or else you aren’t a team player”. Before I left in November I basically didn’t work, was in meetings all the time while also being expected to deliver code, and I started moonlighting for the company I work at now.
My philosophy is, any tool can powerful if you learn how to use it effectively. Something something 10,000 hours, something something.
You gotta find a niche these days and do it well. No longer can there be generalist software. Find underserved industries that aren’t “cool” per se, but still have problems.
Have heard from multiple investors that they think boring software is the next wave, rather than a new UI/UX/Productivity way of doing things.
Okay, so I want to like it and I think there is potential here.
I may be missing something in my setup, but I find these things happening in all Claude wrappers I have tried.
1. After a long stint of planning or back and forth exchange, I have to scroll to find the bottom of the conversation. Same with Claude itself.
2. I don’t want tabs as much as I want split window’s that I can label and highlight.
3. The more calm it is, I feel like I’m going to miss something that it’s fucking up. Sometimes in normal CC, I will look over and it’s making a large assumption and nuking a change we just made. I would love to control the verbosity or at least be able to peak behind the curtain at certain points.
4. I like the click flow for the planning Q&A but it wasn’t clear if I could add my own answer like I can in CC.
Other than these, I like it and want to use it more so than my current daily driver CLIManager. (Which has the overlap with points 1 and 2)
Thank you for the detailed feedback! Really appreciate you taking the time to share these points.
On point 4 (custom answers in Q&A): You're right, this is missing! I'll add the ability to write your own answers today or tomorrow.
On point 1 (scrolling): I pushed a batch of scroll fixes yesterday, but I see it's still not working optimally. We're actively working on improving this - it's definitely on our radar.
On point 2 (split windows): We're planning to add multi-window support soon - so you can have multiple chat windows open side by side, similar to how you'd work with multiple terminal windows, but with a nice UI. Stay tuned!
On point 3 (verbosity control): Great suggestion - sometimes you want to catch assumptions before they cause damage. Adding to our roadmap.
Thanks again, and glad you're finding it useful despite these rough edges!
This is getting outrageous. How many times must we talk about prompt injection. Yes it exists and will forever. Saying the bad guys API key will make it into your financial statements? Excuse me?
The example in this article is prompt injection in a "skill" file. It doesn't seem unreasonable that someone looking to "embrace AI" would look up ways to make it perform better at a certain task, and assume that since it's a plain text file it must be safe to upload to a chatbot
I have a hard time with this one. Technical people understand a skill and uploading a skill. If a non-technical person learns about skills it is likely through a trusted person who is teaching them about them and will tell them how to make their own skills.
As far as I know, repositories for skills are found in technical corners of the internet.
I could understand a potential phish as a way to make this happen, but the crossover between embrace AI person and falls for “download this file” phishes is pretty narrow IMO.
You'd be surprised how many people fit in the venn overlap of technical enough to be doing stuff in unix shell yet willing to follow instructions from a website they googled 30 seconds earlier that tells them to paste a command that downloads a bash script and immediately executes it. Which itself is a surprisingly common suggestion from many how to blog posts and software help pages.
We built an MVP for a startup, then forked the front end to extend to the prototype vision of where we want to go. We use the prototype to dream and when we need to implement the front end of a feature we already understand how it would fit in to the product. Easier to scope.
One thing I have found coming from enterprise is, stakeholder design meetings get way more fun and much faster.
In enterprise, everyone gives notes and the designers take them away and put up another meeting to review days later. Usually with Figma screens people can review etc as things go.
Startup/Vibeland. Someone has an idea for a look and feel or UX solution. It’s like next to instant feedback. There is still the level of rebuilding it for real, but I find the iteration loops are much tighter.
How could you possibly validate that without spending more time validating and interviewing than actually reviewing.
I understand it’s a balance because of all the shit PRs that come across maintainers desks, but this is not shit code from LLM days anymore. I think that code speaks for itself.
“Per your website you are an OpenClaw AI agent”. If you review the code, and you like what you see, then you go and see who wrote it. This reads more like, he is checking the person first, then the code. If it wasn’t an AI agent but was a human that was just using AI, what is the signal that they can “demonstrate understanding of the changes”? Is it how much they have contributed? Is it what they do as a job? Is this vetting of people or code?
There may be something bigger to the process of maintainers who could potentially not understand their own bias (AI or not).
reply