Hacker News | cube2222's comments

I’ve gone through this series of videos earlier this year.

In the past I’ve gone through many “educational resources” about deep neural networks - books, coursera courses (yeah, that one), a university class, the fastai course - but I don’t work with them at all in my day to day.

This series of videos was by far the best, most “intuition building”, highest signal-to-noise ratio, and least “annoying” content to get through. Could of course be that his way of teaching just clicks with me, but in general - very strong recommend. It’s the primary resource I now recommend when someone wants to get into lower level details of DNNs.


Karpathy has a great intuitive style, but sometimes it's too dumbed down. If you come from adjacent fields, it can drag a bit, but it's always entertaining.

>Karpathy has a great intuitive style, but sometimes it's too dumbed down

As someone who has tried some teaching in the past, it's basically impossible to teach to an audience with a wide array of experience and knowledge. I think you need to define your intended audience as narrowly as possible, teach them, and just accept that more knowledgeable folk may be bored and less knowledgeable folk may be lost.


When I was an instructor for courses like "Intro to Programming", this was definitely the case. The students ranged from "have never programmed before" to "I've been writing games in my spare time", but because it was a prerequisite for other courses, they all had to do it.

Teaching the class was a pain in the ass! What seemed to work was to do the intro stuff, and periodically throw a bone to the smartasses. Once I had them on my side, it became smooth sailing.


I think this is where LLM-assisted education is going to shine.

An LLM is the perfect tool to fill the little gaps that you need to fill to understand that one explanation that's almost at your level, but not quite.


Spacelift | Remote (Europe) | Full-time | Senior Software Engineer | $80k-$110k+ (can go higher)

We're a VC-funded startup (recently raised $51M Series C) building an infrastructure orchestrator and collaborative management platform for Infrastructure-as-Code – from OpenTofu, Terraform, Terragrunt, CloudFormation, Pulumi, Kubernetes, to Ansible.

On the backend we're using 100% Go with AWS primitives. We're looking for backend developers who like doing DevOps'y stuff sometimes (because in a way it's the spirit of our company), or have experience with the cloud native ecosystem. Ideally you'd have experience working with an IaC tool, e.g. Terraform, Pulumi, Ansible, CloudFormation, Kubernetes, or SaltStack.

Overall, it's a deeply technical product - we're trying to build something customers love to use, and we have a lot of happy customers. We promise interesting work, the ability to open source parts of the project which don't give us a business advantage, as well as healthy working hours.

If that sounds like fun to you, please apply at https://careers.spacelift.io/jobs/3006934-software-engineer-...

You can find out more about the product we're building at https://spacelift.io and also see our engineering blog for a few technical blog posts of ours: https://spacelift.io/blog/engineering

Additionally, we're hiring for a new product we're building, Flows. Mostly the same requirements and tech stack, without the devops bits. You can see a demo of Flows and apply for it here: https://careers.spacelift.io/jobs/6438380-product-software-e...


> If AI coding is so great and is going to take us to 10x or 100x productivity

That seems to be a strawman here, no? Sure, there exist people/companies claiming 10x-100x productivity improvements. I agree it's bullshit.

But the article doesn't seem to be claiming anything like this - it's showing the use of vibe-coding for a small personalized side-project, something that's completely valid, sensible, and a perfect use-case for vibe-coding.


That’s really cool, and a great use-case for vibe coding!

I’ve been vibe-coding a personalized outliner app in Rust based on gpui and CRDTs (loro.dev) over the last couple of days - something just for me, and in large part just to explore the problem space - and so far it’s been very nice and fun.

Especially exploring multiple approaches, because exploring an approach just means leaving the laptop working unattended for an hour and then seeing the result.

Often I would have it write up a design doc with todos for a feature I wanted based on its exploration, and then just kick off a bash for loop that launches Claude with “work on phase $i” (with some extra boilerplate instructions), which would keep it occupied for a while.
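A rough sketch of that loop (the phase count, prompt wording, and flags are my assumptions, not the exact commands used - `claude -p` is Claude Code's non-interactive mode):

    # Illustrative only: DESIGN.md, the phase count, and the prompt wording are made up.
    for i in 1 2 3 4; do
      claude -p "Read DESIGN.md and work on phase $i. Follow the existing patterns and run the tests when done."
    done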


I agree with you as far as project size for vibe-coding goes - as in often not even looking at the generated code.

But I have no issues with using Claude Code to write code in larger projects, including adapting to existing patterns - it’s just not vibe coding. I architect the modules, and I know more or less exactly what I want the end result to be. I review all code in detail to make sure it’s precisely what I want. You just have to write good instructions and manage the context well (give it sample code to reference, have agent.md files for guidance, etc.)


> I know more or less exactly what I want the end result to be

This is key.

And this is also why AI doesn't work that well for me. I don't know yet how I want it to work. Part of the work I do is discovering this, so it can be defined.


I've found this to be the case as well. My typical workflow is:

1. Have the ai come up with an implementation plan based on my requirements

2. Iterate on the implementation plan / tweak as needed, and write it to a markdown file

3. Have it implement the above plan based on the markdown file.

On projects where we split up the task into well defined, smaller tickets, this works pretty well. For larger stuff that is less well defined, I do feel like it's less efficient, but to be fair, I am also less efficient when building this stuff myself. For both humans and robots, smaller, well defined tickets are better for both development and code review.


Yeah, this exactly. And if the AI wanders in confusion during #3, it means the plan isn’t well-defined enough.

There actually is a proper term for this: LLM-assisted coding/engineering. Unfortunately it has been pushed aside by the fake influencer & PR term "vibe coding", which conflates coding with unknowledgeable people just jerking the slot machine.

Sounds like so much work just not to write it yourself.

Getting it right definitely takes some time and finesse, but when it works you spend 30 minutes to get 4-24+ hours' worth of code.

And usually that code contains at least one or two insights you would not normally have considered, but that make perfect sense given the situation.


I checked out codex after the glowing reviews here around September / October and it was, all in all, a letdown (this was writing greenfield modules in a larger existing codebase).

Codex was very context efficient, but also slow (though I used the highest thinking effort), and didn’t adapt to the wider codebase almost at all (even if I pointed it at the files to reference / get inspired by). Lots of defensive programming, hacky implementations, not adapting to the codebase style and patterns.

With Claude Code and starting each conversation by referencing a couple existing files, I am able to get it to write code mostly like I would’ve written it. It adapts to existing patterns, adjusts to the code style, etc. I can steer it very well.

And now with the new cheaper, faster Opus it’s also quite an improvement. If you kicked off Sonnet with a long list of constraints (e.g. 20), it would often ignore many. Opus is much better at “keeping more in mind” while writing the code.

Note: yes, I do also have an agent.md / claude.md. But I also heavily rely on warming the context up with some context dumping at conversation starts.


All codex conversations need to be caveated with the model used, because it varies significantly. Codex requires very little tweaking, but you do need to select the highest-thinking model if you’re writing code, and I recommend the highest-thinking NON-code model for planning. That’s really it; it takes task time up to 5-20m, but it’s usually great.

Then I ask Opus to take a pass and clean up to match codebase specs and it’s usually sufficient. Most of what I do now is detailed briefs for Codex, which is…fine.


I will jump between a ChatGPT window and a VSCode window with the Codex plugin. I'll create an initial prompt in ChatGPT, which will ask the coding agent to audit the current implementation, then draft an implementation plan. The plan bounces between Chat and Codex about 5 times, with Chat telling Codex how to improve. Then Codex implements and creates an implementation summary, which I give to Chat. Chat then asks for a couple of fixes, and then it's done.

Why non-thinking model? Also 5-20 minutes?! I guess I don’t know what kind of code you are writing but for my web app backends/frontends planning takes like 2-5 minutes tops with Sonnet and I have yet to feel the need to even try Opus.

I probably write overly detailed starting prompts but it means I get pretty aligned results. It does take longer but I try to think through the implementation first before the planning starts.

In my experience sonnet > opus, so it’s no surprise you don’t “need” opus. They charge a premium on sonnet now instead.

Yeah, I uninstalled and reinstalled with homebrew, and it’s working well now.


Just to provide another datapoint - I tried codex in September / October after seeing the glowing reviews here, and it was, all in all, a huge letdown.

It seemed to be very efficient context-wise, but at the same time it made precise context management much harder.

Opus 4.5 is quite a magnificent improvement over Sonnet 4.5, in CC, though.

Re TFA - I accidentally discovered the new LSP support 2 days ago on a side project in Rust, and it’s working very well.


I was lukewarm about codex when I tried it 2-3 months ago, but just recently tried it again last week, running it against Claude Code, both of them working from the same todo list to build a docusign-like web service. I was using loops of "Look at the todo list and implement the next set of tasks" for the prompt (my prompt was ~3 sentences, but basically saying that):

    - Codex required around 30 passes on that loop, Claude did it in ~5-7.
    - I thought Codex's was "prettier", but both were functional.
    - I dug into Claude's result in more depth, and had to fix ~5-10 things.
    - Codex I didn't dig into testing quite as deeply, but it seemed to need less fixing.  Still not sure if that is because of a more superficial view.
    - Still a work in progress, have not completed a full document signing workflow in either.


Similar experience and timeline with codex, but I tried it again last week and it's gotten much better in the interim. Codex with 5.2 does a good job at catching (numerical) bugs that Opus misses. I've been comparing them and there's not a clear winner - GPT 5.2 misses things Opus finds and vice versa. But claude-code is still a much better experience and keeps getting better; codex is following, just a few months behind.


Another anecdote/datapoint. Same experience. It seems to mask a lot of bad model issues by not talking much and overthinking stuff. The experience turns sour the more one works with it.

And yes +1 for opus. Anthropic delivered a winner after fucking up the previous opus 4.1 release.


What are some of the use cases for Claude Code + LSP ? What does LSP support let you do, or do better, that Claude Code couldn't do by itself ?


I checked the codex source code a few months ago and the implementation was very basic compared to opencode.


It's so nice that skills are becoming a standard, they are imo a much bigger deal long-term than e.g. MCP.

Easy to author (at its most basic, just a markdown file), context efficient by default (only preloads the yaml front-matter, can lazy load more markdown files as needed), can piggyback on top of existing tooling (for instance, instead of the GitHub MCP, you just make a skill describing how to use the `gh` cli - see the sketch at the end of this comment).

Compared to purpose-tuned system prompts they don't require a purpose-specific agent, and they also compose (the agent can load multiple skills that make sense for a given task).

Part of the effectiveness of this is that AI models are heavy enough that running a sandbox VM for them on the side is likely irrelevant cost-wise, so now the major chat UI providers all give the model such a sandboxed environment - which means skills can also contain Python and/or JS scripts. Again, much simpler, more straightforward, and more flexible than e.g. requiring the target to expose remote MCPs.

Finally, you can use a skill to tell your model how to properly approach using your MCP server - which previously often required either long prompting, or a purpose-specific system prompt, with the cons I've already described.
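To make the `gh` CLI example above concrete, a minimal skill might look roughly like this (the front-matter fields and wording are an illustrative sketch, not an exact spec):

    ---
    name: github-cli
    description: How to inspect issues and open pull requests with the gh CLI. Load when the user asks for GitHub operations.
    ---

    # GitHub via the gh CLI

    - Check authentication first with `gh auth status`.
    - List open PRs with `gh pr list`; view one with `gh pr view <number>`.
    - After pushing a branch, open a PR with `gh pr create --fill`.

Only the name and description get preloaded into context; the body is read when the skill is actually needed.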


On top of everything you've described, one more advantage is that you can use the agents themselves to edit / improve / add to the skills. One easy one to do is something like "take the key points from this session and add the learnings as a skill". It works both on good sessions with new paths/functionality and on "bad" sessions where you had to hand-hold the agent. And they're pretty good at summarising and extracting tidbits. And you can always skim the files and do quick edits.

Compared to MCPs, this is a much faster and more approachable flow to add "capabilities" to your agents.


I think taking key points from a session and making a new skill is less useful than "precaching" by disseminating the key findings and updating related or affected skills, eliminating the need for a new skill (in most cases).

On the other hand, from a pure functional coding appeal, new skills that don't have leaking roles can be more atomic and efficient in the long run. Both have their pros/cons.


Add reinforcement learning to figure out which skills are actually useful, and you're really cooking.


DSPy with GEPA should work nicely, yeah. Haven't tried yet but I'll add it to my list. I think a way to share within teams is also low-hanging fruit in this space (outside of just adding them to the repo). Something more org-generic.


> DSPy with GEPA should work nicely

I think that would be a really really interesting thing to do on a bunch of different tasks involving developer tooling (e.g. git, jj, linters, etc.)


Combine that with retrying the same task again but with the improved skills in some sort of train loop that learns to bake in the skills natively to obviate the need for them.

The path to recursive self-improvement seems to be emerging.


Perhaps you could help me.

I'm having a hard time figuring out how I could leverage skills in a medium-sized web application project.

It's Python, PostgreSQL, Django.

Thanks in advance.

I wonder if skills are more useful for non-CRUD-like projects. Maybe data science and DevOps.


There’s nothing super special about it - it’s just handy if you have some instructions that you don’t need the AI to see all the time, but that you’d like it to have available for specific things.

Maybe you have a custom auth backend that needs an annoying local proxy setup before it can be tested - you don’t need all of those instructions in the primary agents.md bloating the context on every request; a skill lets you separate them so they’re only accessed when needed.

Or if you have a complex testing setup and a multi-step process for generating realistic fixtures and mocks: the AI maybe only needs some basic instructions on how to run the tests 90% of the time, but when it’s time to make significant changes it needs info about your whole workflow and philosophy.

I have a django project with some hardcoded constants that I source from various third party sites, which need to be updated periodically. Originally that meant sitting down, visiting a few websites, and copy-pasting identifiers from them. As AI got better at web search I was able to put together a prompt that did pretty well at compiling them. With a skill I can have the AI find the updated info, update the code itself, and give it some little test scripts to validate that it did everything right.


Thanks. I think I could use skills as "instructions I might need but I don't want to clutter AGENTS.md with them".


Yes exactly. Skills are just sub agents.md files + an index. The index tells the agent about the content of the .md files and when to use them. Just a short paragraph per file, so it's token efficient and doesn't take much of your context.

Poor man's "skills" is just manually managing and adding different .md files to the context.

Importantly every time you instruct the agent to do something correctly that it did incorrectly before, you ask it to revise a relevant .md file/"skill", so it has that correction from now on. This is how you slowly build up relevant skills. Things start out as sections in your agents.md file, and then graduate to a separate file when they get large enough.


Yes, but also: because skills are a semi-special construct, agents are better at leveraging them when needed, and you can easily tap into them explicitly (e.g. “use the PR skill to open a PR”).


You could, for example, create a skill to access your database for testing purposes and pass in your table specifications, so that the agent can easily retrieve data for you on the fly.


I made a small MCP script for the database with 3 tools:

- listTables

- getTableSchema

- executeQuery (blocks destructive queries, like anything containing DROP, DELETE, etc.)

I wouldn't trust textual instructions to prevent LLMs from dropping a table.
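For reference, a rough sketch of that kind of script, assuming the official MCP Python SDK's FastMCP helper and an SQLite database (the tool names, keyword blocklist, and database path are illustrative, not the exact implementation):

    # Rough sketch only - the blocklist mirrors the "blocks destructive queries" idea above.
    import re
    import sqlite3

    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("db")
    # Opening the database read-only is a stronger guarantee than keyword filtering alone.
    conn = sqlite3.connect("file:app.db?mode=ro", uri=True)

    DESTRUCTIVE = re.compile(r"\b(DROP|DELETE|TRUNCATE|ALTER|UPDATE|INSERT)\b", re.IGNORECASE)

    @mcp.tool()
    def list_tables() -> list[str]:
        """List all tables in the database."""
        rows = conn.execute("SELECT name FROM sqlite_master WHERE type='table'")
        return [name for (name,) in rows]

    @mcp.tool()
    def get_table_schema(table: str) -> str:
        """Return the CREATE statement for a table."""
        row = conn.execute(
            "SELECT sql FROM sqlite_master WHERE type='table' AND name=?", (table,)
        ).fetchone()
        return row[0] if row else f"No such table: {table}"

    @mcp.tool()
    def execute_query(sql: str) -> list[tuple]:
        """Run a query, refusing anything that looks destructive."""
        if DESTRUCTIVE.search(sql):
            raise ValueError("Destructive queries are not allowed")
        return conn.execute(sql).fetchall()

    if __name__ == "__main__":
        mcp.run()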


That's why I give the LLM a readonly connection


This is much better than MCP, which also stuffs every session's precious context with potentially irrelevant instructions.


They could just make MCPs dynamically loaded in the same way, no?


It is still worse as it consumes more context giving instructions for custom tooling whereas the LLM already understands how to connect to and query a read-only SQL service with standard tools


Oooooo, woah, I didn't really "get it" before - thanks for spelling it out a bit. Just thought of some crazy cool experiments I can run if that is true.


it’s also for (typically) longer context you don’t always want the agent to have in its context. if you always want it in context, use rules (memories)

but if it’s something more involved or less frequently used (perhaps some debugging methodology, or designing new data schemas) skills are probably a good fit


Skills are not useful for single-shot cases. They are for: cross-team standardization (for LLM generated code), and reliable reusability of existing code/learnings.


Skills are the Matrix scene where Neo learns kung fu. Imagine they are a database of specialized knowledge that an agent can instantly tap into _on demand_.

The key here is “on demand”. Not every agent or conversation needs to know kung fu. But when they do, a skill is waiting to be consumed. This basic idea is “progressive disclosure”, and it composes nicely to keep context windows focused. E.g. I have a metabase skill to query analytics. Within that I conditionally refer to how to generate authentication if they aren't authenticated. If they are authenticated, that information need not be consumed.

Some practical “skills”: writing tests, fetching Sentry info, using Playwright (a lot of local MCPs are just flat out replaced by skills), submitting a PR according to team conventions (e.g. run lint, review code for X, title matches format, etc.)


Could you explain more about your metabase skill and how you use it? We use metabase (and generally love it) and I’m interested to hear about how other people are using it!


It's really just some rules around auth, some precached lookups (e.g. databases with IDs and which to use), and some explanations around models and where to find them. Everything else it pretty much knows on its own.


Nice analogy!


I can't claim credit. I'm pretty sure I've seen Anthropic themselves use it in the original explainers.


There can be a Django template skill, for example, which is just a markdown file that reminds the LLM of the syntax of Django templates and best practices for them. It could also have an included script that the LLM can use to test a single template file.


So a skill is effectively use case / user story / workflow recipe caching


The general idea is not very new, but the current chat apps have added features that are big enablers.

That is, skills make the most sense when paired with a Python script or CLI that the skill uses. Nowadays most of the AI model providers have code execution environments that the models can use.

Previously, you could only use such skills with locally running agent CLIs.

This is imo the big enabler, which may totally mean that “skills will go big”. And yeah, having implemented multiple MCP servers, I think skills are a way better approach for most use-cases.


I like the focus on Python CLI tools, using the standard argparse module, and writing good help and self-documentation.

You can develop skills incrementally, starting with just one md file describing how to do something, and no code at first.

As you run through it for the first several times, testing and debugging it, you accumulate a rich history of prompts, examples, commands, errors, recovery, backing up and branching. But that chat history is ephemeral, so you need to scoop it up and fold it back into the md instructions.

While the experience is still fresh in the chat, have it uplift knowledge from the experience into the md instructions: refine the instructions with more details, give concrete examples of input and output, add more detailed and explicit instructions, handle exceptions and prerequisites, etc.

Then after you have a robust reliable set of instructions and examples for solving a problem (with branches and conditionals and loops to handle different conditions, like installing prerequisite tools, or checking and handling different cases), you can have it rewrite the parts that don't require "thought" into python, as a self documenting cli tool that an llm, you, and other scripts can call.

It's great to end up with a tangible well documented cli tool that you can use yourself interactively, and build on top of with other scripts.

Often the whole procedure can be rewritten in Python, in which case the md instructions only need to explain how to use the Python CLI tool you've generated, which cli.py --help will fully document.
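For illustration, a minimal self-documenting CLI of that kind might look like this (the tool's purpose and options are made-up placeholders, not anything from the workflow above):

    # cli.py - hypothetical example of a self-documenting argparse tool.
    import argparse
    import json
    import sys

    def main() -> int:
        parser = argparse.ArgumentParser(
            description="Normalize a JSON file: sort keys and re-indent.",
            epilog="Example: python cli.py data.json --indent 2",
        )
        parser.add_argument("path", help="path to the JSON file to normalize")
        parser.add_argument("--indent", type=int, default=2,
                            help="indentation width (default: 2)")
        args = parser.parse_args()

        with open(args.path) as f:
            data = json.load(f)
        json.dump(data, sys.stdout, indent=args.indent, sort_keys=True)
        print()
        return 0

    if __name__ == "__main__":
        sys.exit(main())

An LLM (or a person, or another script) can then discover everything it needs from python cli.py --help.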

But if it requires a mix of llm decision making or processing plus easily automated deterministic procedures, then the art is in breaking it up into one or more cli tools and file formats, and having the llm orchestrate them.

Finally you can take it all the way into one tool, turn it outside in, and have the python cli tool call out to an llm, instead of being called by an llm, so it can run independently outside of cursor or whatever.

It's a lot like a "just in time" compiler from md instructions to python code.

Anyone can write up (and refine) this "Self Optimizing Skills" approach in another md file of meta instructions for incrementally bootstrapping md instructions into Python CLIs.


MCP servers are really just skills paired with Python scripts - it's not really that different; MCP just lets you package them together for distribution.


But then those work only locally - not in the web UIs - unless you make it a remote MCP, and then it’s back to being something somewhat different.

Skills also have a nicer way of working with the context by default (and in the main web UIs), with their overview-driven lazy loading.

