so how would you eval your own claude.md? Each context is unique to the project, team, and personal root claude.md. Do you just take a given task and ask it to redo the same one over and over against a known solution? Do you just keep using it and "feel" whether or not it's working? How is that different from what everyone is already doing?
The review eval tests the language, activation, etc. of skills. I guess you could quickly move it all into a skill and then run an eval on that if you're using Tessl. This checks whether the way you write the instructions is being well understood by the agent.
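For what it's worth, the "redo the same task against a known solution" idea can at least be made repeatable instead of vibes-based. A minimal sketch in Python — `run_agent` is a hypothetical stand-in for however you actually invoke the agent with a given claude.md, stubbed out here so the example runs on its own:

```python
def run_agent(claude_md: str, task: str) -> str:
    # Hypothetical stand-in: invoke the agent with this claude.md and task,
    # and return its final answer. Stubbed so this sketch is self-contained.
    return "42"

def passed(output: str, expected: str) -> bool:
    # Simplest possible check; a real eval would run tests or a grader.
    return output.strip() == expected.strip()

def eval_claude_md(claude_md: str, cases: list[tuple[str, str]], runs: int = 5) -> float:
    # Run each (task, known_solution) pair several times to smooth out
    # nondeterminism, and report the overall pass rate.
    results = []
    for task, expected in cases:
        for _ in range(runs):
            results.append(passed(run_agent(claude_md, task), expected))
    return sum(results) / len(results)

rate = eval_claude_md("Always answer tersely.", [("What is 6*7?", "42")])
print(rate)  # 1.0 with the stub above
```

That still doesn't answer whether the pass rate transfers to your real day-to-day tasks, which I think is the harder part of the question.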
Is it just me, or is vibe coding only useful for greenfield projects with minimal complexity? Seems like those projects collapse once enough complexity has built up.
I've tried to vibe code small stuff a few times, but there's not one success story. After about 2-4 hours, I'd hit a wall and ultimately throw the project away, because salvaging it wasn't worth the manual programming effort it would have required (avoiding that effort being why I tried vibe coding it in the first place).
I think vibe coding might be more successful for people doing things an experienced developer can do in their sleep with a few lines of code in Django or something. Something a non-programmer might have previously done with some no-code tool.
I'm someone who hated leetcode-style interviews for the longest time, but I'm starting to come around on them. I get that this style of question is easy to game, but I still think they have _some_ value. The point of these questions was supposed to be to test your ability to problem-solve and come up with a good solution given the tools you knew. That being said, I don't think every company should be using this type of question in their interviews. I think leetcode-style questions should be reserved for companies that are pushing the boundary of the industry, since they're exploring uncharted territory and need people who can come up with unique solutions to problems no one really knows how to solve. I think most companies would be fine with some kind of pairing problem, since most people are solving engineering problems rather than computer science problems. But none of this matters, since we all know that even if we went that direction as an industry, the business people would fuck it up somehow anyways.
> reserved for companies that are pushing the boundary of the industry
In a world where every company believes (or wants to believe) that it's doing some ground-breaking, bleeding-edge work (look at any tech company blog and you'll only find hyped technologies in there), I don't think one can expect companies to fairly assess whether they're really doing such work.