The one time I trusted Claude to write a full class - with excessive rounds of prompts and revisions - it introduced a very subtle bug that I would never have made myself, and that took a couple of months to show up.
Fixing the bug required a root-and-branch overhaul of the class, and ended up taking more time in aggregate.
And that’s the problem: just like with self-driving cars, if it isn’t right 100% of the time you are worse off because you think it’s ok to take your hand off the wheel when it very much is not.
We’ll get full autonomy in cars before we get an LLM that can write production code reliably, and we’re still very far from that.
> get an LLM that can write production code reliably
It depends on what the user wants the code to do, and how important it is.
For example, an average, non-technical user could use this to generate a script to sort out their email, or a script for automation in MS Office VBA.
Just because it's not perfect doesn't mean it isn't useful, or that it won't improve. Tom Scott's video[0] makes a very good argument: we don't know where we are on the technology curve.
I mean systems-critical infrastructure code, where the error tolerance is zero.
For one off scripts or sketching a concept quickly it’s good enough, and for language reference it’s generally useful.
However, one thing I’ve noticed with Claude in particular is it tends to overweight the top answers in stack overflow.
The problem there is that the top answer is rarely the best answer - it tends to be overly verbose, whereas the best answer is usually the second one, which just tells you what function to call.
On multiple occasions I’ve had Claude answer a simple prompt with horribly verbose and complicated code.
Then I’ll say “what about this single call?” (e.g. the type of SO answer that gets the second-most votes), and it says “You’re right! That’s a much better answer”.
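To make that concrete, here's a made-up illustration of the pattern (reading a file, with a hypothetical "config.txt"): the verbose top-answer style versus the single call the second answer usually gives you.

    # Made-up illustration: verbose "top answer" style vs. the single call.
    from pathlib import Path

    # Verbose version: correct, but far longer than it needs to be.
    lines = []
    with open("config.txt", "r", encoding="utf-8") as f:
        for line in f:
            lines.append(line.rstrip("\n"))
    text = "\n".join(lines)

    # The "what about this single call?" version.
    text = Path("config.txt").read_text(encoding="utf-8")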
Likewise, I suggest anybody take the domain they are most knowledgeable in and pepper their LLM of choice with lots of questions to see how much it knows.
You’ll get a feel for how much you can trust it in other domains - which is “not much”.
Which is quite niche, and you'd be correct that no one would trust GPT-generated code blindly for that!
But this is a spectrum, and while I think today's GPT models don't quite make it there, I'd argue that we're closer to success here than with self-driving cars - mainly due to the larger tolerance for bad code, rather than actual tech improvements.
My problem is that even if I do that, I'm not convinced it's making me any faster. When it gets it right and I compare the time to writing it myself, I'd estimate it's maybe 20% faster. But when it gets it wrong after a few prompts and I have to write it myself anyway, it's more like 20% slower. Those two seem to average out, but then at the p90 it gets things subtly wrong in a way where I accept the code and then spend twice as much time reviewing and adjusting it as I would have spent doing it myself in the first place. So I'm not convinced it's making me any faster; if anything it feels like it's the same or a bit slower. And unlike with a junior engineer, there's no ROI on this time investment, since it's just as likely to get it wrong again the next time.
The only place I've noticed a pronounced speed-up is when I use other languages I'm not super familiar with. AI can more easily help me translate concepts from languages I do know better, and then a good old Google search is often enough to fill in the rest of the blanks for me to be reasonably productive in a way that I wouldn't be without AI.
I think it depends on the problem domain. I have to implement a lot of throwaway ideas quickly, and LLMs are really useful there.
For instance, say I want to plot a complicated Matplotlib diagram. It takes me 10+ minutes and many context switches to get the syntax right (I don't use Matplotlib enough to have all the args at the tips of my fingers). I also don't know everything Matplotlib is able to do - I haven't read the entire docs. Fortunately, LLMs have, and they get me into the right ballpark in 10-20 seconds. I usually want to try maybe 10-15 plots before settling on something, and LLMs definitely get me there much faster.
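For a sense of the boilerplate involved, here's a minimal sketch of the sort of plot I mean (data and styling are made up purely for illustration):

    import numpy as np
    import matplotlib.pyplot as plt

    x = np.linspace(0, 10, 200)
    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(8, 6))

    # Top panel: a signal, plus a secondary y-axis for its cumulative sum.
    ax1.plot(x, np.sin(x), label="sin(x)", color="tab:blue")
    ax1.set_ylabel("signal")
    ax1.legend(loc="upper right")
    ax1b = ax1.twinx()
    ax1b.plot(x, np.cumsum(np.sin(x)), color="tab:orange", alpha=0.6)
    ax1b.set_ylabel("cumulative")

    # Bottom panel: a histogram with an annotated mean line.
    samples = np.random.default_rng(0).normal(size=1000)
    ax2.hist(samples, bins=40, color="tab:green", alpha=0.7)
    ax2.axvline(samples.mean(), linestyle="--", color="black")
    ax2.annotate("mean", xy=(samples.mean(), 30), xytext=(1.5, 60),
                 arrowprops=dict(arrowstyle="->"))
    ax2.set_xlabel("value")

    fig.tight_layout()
    plt.show()

Remembering twinx, annotate's arrowprops, and the rest is exactly the kind of thing I'd otherwise be looking up.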
I think if you have a clear idea of what you want to do, and how to do it, then maybe the time savings are not compelling. But if you're in a space where you're ideating and groping at an idea, LLMs can significantly cut down the iteration time and even open up new channels of inquiry that you didn't know existed.
They're primarily generative assistants. Using them to implement ideas in production is probably a secondary use.
I have a friend who recently built a SaaS from the ground up using only Replit and Claude (Claude is integrated into Replit).
He's non-technical: never built a React app, never built with Supabase, never built with Firebase (for auth), never coded a single Stripe flow.
100x might be an understatement. He built it from nothing with minimal knowledge of React, Tailwind, Supabase, Postgres, Stripe, and Firebase using Claude.
(He knew what all of the building blocks were, but had no technical coding knowledge at all.)
He legit has paying customers after under a month, and is just running it directly via Replit (not even hosted externally).
Once you have paying customers, you can hire an actual developer.
Claude is not 100x for typical software work, but the biggest gains come from precisely the 'non-typical' work that was previously impossible.
Imagine a domain expert who knows a niche super well, with all the weird edge cases and untapped demand. Hiring a developer for it doesn't work because:
1. The communication costs are too high; the developer won't know the business niche deeply enough to make a good product.
2. The niche is not profitable enough to risk hiring a developer.
Now LLMs allow the solo non-technical founder to make an MVP app and take it to market to test, for very little cost and risk. Sure, the app is not really extensible and may have to be heavily rewritten to expand and maintain, but hiring a developer at that point is a much lower-risk task.
It doesn't even reduce developer employment this way, since a ton more niche use cases are being opened up and becoming profitable enough to support developers.
As a very senior dev who has worked in a good number of startups, including YC-backed ones, I can tell you that the hardest part is rarely technical for most SaaS. It's actually validating the idea, the market, and the business model.
If he can get 10 or 20 paying customers, then it's easy to find money to fix or scale the code.
If you can't see the problem, it doesn't matter for an early startup; the only things that matter are what users complain about or request, and getting more users to pay you. Everything can be fixed later, once the idea itself has been validated.
Data breaches, financial errors, and the like are a death sentence for an early-stage company. That’s the type of error I’m talking about, not some flaky CSS.
The death knell is not solving a problem in the first place. Almost everything else is negotiable if you solve a valuable problem that people are willing to pay you to solve.
Probably loads the whole thing into Claude’s massive context window and asks it to make alterations.
I do similar things on a smaller scale with codebases in ChatGPT all the time. Half the time I still need to make small tweaks, but it’s increased my productivity tremendously.
I just wrapped up a project where I had to do a bunch of work with audio, which I’d never done before. I wouldn’t say 100x, but working at night over a handful of weeks I did what would’ve taken me months to teach myself, and some of it I probably never would have figured out. I could prototype on an easy library and go “rewrite it with librosa instead”, or “nah, I don’t really like this, let’s just do such and such with ffmpeg, that would work and probably be faster, right?”, or “find a way to do this funky thing with torchaudio and a bunch of file I/O in memory”, and then ask “how much money would I save on GCP egress if I did such and such?” It’s not always right on the first try, but omg does it save a shitload of time and energy.
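For flavor, here's a minimal sketch of the librosa side of that kind of prototyping (the filename and parameters are placeholders, not the actual project):

    import numpy as np
    import librosa

    # Load a clip at its native sample rate ("clip.wav" is a placeholder).
    y, sr = librosa.load("clip.wav", sr=None)

    # Mel spectrogram, converted from power to decibels.
    S = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128, fmax=8000)
    S_db = librosa.power_to_db(S, ref=np.max)

    print(f"{len(y) / sr:.1f}s of audio -> mel spectrogram of shape {S_db.shape}")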
I agree that it makes sense for this style of learning but only if you already know the fundamentals and are knowledgeable enough to know the right questions to ask.
>"Claude's extensive context window has also transformed their approach to handling large codebases. When the 200K context window was released, Hedley notes they "ripped out the entire RAG and just put it in the context window instead and it went from 60 percent accuracy to 98."
RAG = Retrieval Augmented Generation
Related:
What is retrieval-augmented generation, and what does it do for generative AI?:
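As a rough sketch of the difference in that quote - retrieval picks a few chunks that look relevant, while "just put it in the context window" sends everything - here is some illustrative pipeline code. call_model and the "src" directory are hypothetical stand-ins, and the keyword scoring is deliberately naive:

    from pathlib import Path

    def call_model(prompt: str) -> str:
        # Hypothetical stand-in for whatever LLM API is actually used.
        raise NotImplementedError

    # Read every Python file under a placeholder "src" directory.
    files = {p: p.read_text() for p in Path("src").rglob("*.py")}

    def answer_with_rag(question: str, top_k: int = 5) -> str:
        # Retrieval-augmented: score files by naive keyword overlap and
        # prompt with only the top few.
        scored = sorted(files.items(),
                        key=lambda kv: sum(w in kv[1] for w in question.split()),
                        reverse=True)
        context = "\n\n".join(text for _, text in scored[:top_k])
        return call_model(context + "\n\nQuestion: " + question)

    def answer_with_full_context(question: str) -> str:
        # "Ripped out the entire RAG": if the codebase fits in the
        # context window, just send all of it.
        context = "\n\n".join(f"# {path}\n{text}" for path, text in files.items())
        return call_model(context + "\n\nQuestion: " + question)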
A good deal of software development is clarifying the required specification. This alone takes a lot of work and a lot of coding! If you don't know the nitty-gritty of what you really need, you can't get it at 100x or even at 10x speed.
When it comes to financial accounting work, there is just no room for buggy or sloppy code. The customer will go away forever at the first instance of being billed incorrectly, and will also seek a refund through their credit card.
Update: I think this is an ad by Anthropic