d4rkp4ttern's comments

Indeed. Over a few days of iterations I built this TUI for fast full-text search of Claude Code or Codex sessions, using Ratatui (and Tantivy for the full-text search index). I would never have dreamed of this pre-coding-agents.

https://pchalasani.github.io/claude-code-tools/tools/aichat/...


Attention is all everyone wants.

I think there’s a level beyond 8: not reviewing AI-generated code.

There’s a lot of discussion about whether to let AI write most of your code (which, at least in some circles, is largely settled by now), but when I see hype-posts about “AI is writing almost all of our code”, the question I’m most curious about is: how much of that AI-written code are they actually reviewing?


Related: I used the amazing 100M-parameter Pocket-TTS [1] model to make a stop-hook-based voice plugin [2] that lets Claude Code give a short voice update whenever it stops. The hook quietly inserts nudges for Claude Code to end its response with a short speakable summary, and if Claude Code forgets, the hook falls back to a headless agent to create the summary.

Getting it to work well was trickier than I expected: FFmpeg pipe streaming for low-latency playback, a three-hook injection strategy because the agent forgets instructions mid-turn, mkdir-based locks to queue concurrent voice updates from multiple sessions, and /tmp sentinel files to manage async playback state and prevent infinite loops.
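
Aside on the mkdir locks: mkdir is atomic at the filesystem level (it either creates the directory or fails), which makes it a cheap cross-process mutex. A minimal Python sketch of the idea, with a hypothetical lock path and macOS's say command standing in for the actual TTS playback:

    import os, time, subprocess

    LOCK_DIR = "/tmp/voice-plugin.lock"    # hypothetical path, not the plugin's actual one

    def speak(text: str) -> None:
        # os.mkdir either creates the directory or raises FileExistsError,
        # atomically, so it doubles as a lock across processes.
        while True:
            try:
                os.mkdir(LOCK_DIR)
                break
            except FileExistsError:
                time.sleep(0.1)            # another session is speaking; wait our turn
        try:
            subprocess.run(["say", text])  # stand-in for the real playback pipeline
        finally:
            os.rmdir(LOCK_DIR)             # release so queued sessions can proceed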

[1] Pocket-TTS: https://github.com/kyutai-labs/pocket-tts

[2] Claude-code voice plugin: https://pchalasani.github.io/claude-code-tools/plugins-detai...


I use the open-source Handy [1] app with Parakeet V3 for STT when talking to coding agents, and I’ve yet to see anything that beats this setup in terms of speed/accuracy. I get near-instant transcription, and the slight accuracy drop is immaterial when talking to AIs that can “read between the lines”.

I tried incorporating this Voxtral C implementation into Handy but got very slow transcriptions on my M1 Max MacBook 64GB.

[1] https://github.com/cjpais/Handy

I’ll have to try the other implementations mentioned here.


Handy is great, but I wish the STT were real-time instead of batch.

There’s a tradeoff here. If you want streaming output, you lose the opportunity to clean it up in post-processing: removing filler words, removing stutters, or any other AI-based cleanup.
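
To make that concrete, here's a toy Python sketch of the kind of cleanup batch mode allows (illustrative regexes, not Handy's actual pipeline):

    import re

    def clean_transcript(text: str) -> str:
        # Drop common filler words (illustrative list only).
        text = re.sub(r"\b(um+|uh+|you know)\b[,.]?\s*", "", text, flags=re.I)
        # Collapse immediate word repetitions ("the the the" -> "the").
        text = re.sub(r"\b(\w+)(\s+\1\b)+", r"\1", text, flags=re.I)
        return text.strip()

    print(clean_transcript("So um the the the fix is uh simple"))
    # -> "So the fix is simple"

A streaming transcriber has to commit words before it sees this surrounding context, which is exactly what you give up.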

The macOS built-in dictation streams in real time and also does some cleanup, but it does awkward things, like showing the streaming text at the bottom of the screen. Also, I don’t think it’s as accurate as Parakeet V3, and there’s a 1-2 second startup lag after hitting the dictation shortcut, which kills it for me.


I feel like this is a solvable problem. If the engine emits an errant word that should be replaced, why not correspondingly emit backspaces and rewrite the word?

I feel like this is the best of both worlds.

Perhaps a little janky with backspaces, but still technically feasible.
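
A toy sketch of that in Python, treating stdout as the text sink (a real dictation integration would need the platform's own delete/replace mechanism rather than literal backspaces):

    import sys, time

    def emit(s: str) -> None:
        sys.stdout.write(s)
        sys.stdout.flush()

    def correct(old: str, new: str) -> None:
        # Backspace over the errant word, blank it out (in case the
        # replacement is shorter), backspace again, then write the fix.
        n = len(old)
        emit("\b" * n + " " * n + "\b" * n + new)

    emit("recognize speech")    # streaming model's first hypothesis
    time.sleep(1)
    correct("speech", "beach")  # model revises as more audio arrives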


Have you tried Hex?

https://github.com/kitlangton/Hex

Faster than Handy, and uses way less memory.


Indeed, it's extremely fast; it's now my go-to for STT on macOS. I made a PR to allow a single-tap toggle hotkey instead of double-tap. Unlike Handy, which aims to be multi-platform, Hex is macOS-native and leverages CoreML + the Apple Neural Engine for far speedier transcription.

Nice, will try, thanks!

Same here. I haven’t found an ASR/STT/transcription setup that beats Parakeet V3 on the speed/accuracy tradeoff: transcription is extremely fast (near instant for a couple of sentences, 1-3 seconds for long ramblings), and the slight accuracy drop relative to heavier/slower models is immaterial for the use case of talking to AIs that can “read between the lines” (terminal coding agents, etc.).

I use Parakeet V3 in the excellent open-source Handy [1] app. I tried incorporating the C implementation mentioned by others into Handy, but it was significantly slower. Speed is absolutely critical for good STT UX.

[1] https://github.com/cjpais/Handy


Can you use Handy exclusively via the CLI if you have a file to feed it?

Not sure about that

Not currently

This sounds very promising. Using multiple CC instances (or a mix of CLI agents) across tmux panes has long been a workflow of mine, where agents use the tmux-cli [1] skill/tool to delegate to and collaborate with other agents, or review/debug/validate each other's work.

This new orchestration feature makes that much more useful, since the agents share a common task list and the main agent coordinates across them.

[1] https://github.com/pchalasani/claude-code-tools?tab=readme-o...


Yeah, I've been using your tools for a while. They've been nice.

I didn’t see that, but I do get a lot of stutters (words or syllables repeated 5+ times); not sure if it’s a model problem or a post-processing issue in the Handy app.

I’m curious about this too. On my M1 Max MacBook I use the Handy app with Parakeet V3 and get near-instant transcription; accuracy is slightly lower than with slower Whisper models, but that drop is immaterial when talking to CLI coding agents, which is where I find the most use for this.

https://github.com/cjpais/Handy


Since llama.cpp's llama-server recently added support for the Anthropic Messages API, running Claude Code against several recent open-weight local models is now very easy. The messy part is figuring out which llama-server flags to use, chat template included. I've collected all of that setup info in my claude-code-tools [1] repo, for Qwen3-Coder-next, Qwen3-30B-A3B, Nemotron-3-Nano, GLM-4.7-Flash, etc.

Among these, I had lots of trouble getting GLM-4.7-Flash to work (failed tool calls, etc.), and even when it works, it runs at very low tok/s. The Qwen3 variants, on the other hand, perform very well speed-wise. For local work on sensitive documents these models are excellent; for serious coding, not so much.

One caveat missed in most instructions: you have to set CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1 in your ~/.claude/settings.json, otherwise CC's telemetry pings exhaust local ports and cause total network failure.
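
Concretely, the relevant fragment of ~/.claude/settings.json looks something like this (the base URL line assumes llama-server on its default port 8080; adjust as needed):

    {
      "env": {
        "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1",
        "ANTHROPIC_BASE_URL": "http://localhost:8080"
      }
    }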

[1] claude-code-tools local LLM setup: https://github.com/pchalasani/claude-code-tools/blob/main/do...

