Hacker News | bisonbear's comments

Intuitively this makes sense, but in my experience a more realistic workflow is a main-agent-to-sub-agent delegation pattern instead of straight 7x-ing token costs.

By delegating to sub-agents (e.g. for brainstorming or review), you can break out of local maxima while not using quite as many more tokens.

Additionally, when doing any sort of complex task, I do research -> plan -> implement -> review, clearing context after each stage. In that case, would I want to make 7x research docs, 7x plans, etc.? Probably not. Instead, a more prudent use of tokens might be to have Claude do research + planning, and have Codex do a review of that plan prior to implementation.


Yes, understandable.

The question is which multi-agent architecture, hierarchical or competitive, yields the best results under some task/time/cost constraints.

In general, our sense is that competitive is better when you want breadth and uncorrelated solutions. Or when the failure modes across agents are unknown (which is always, right now, but may not be true forever).


> straight 7x-ing token costs

You are probably right, but my work pays for as many tokens as I want, which opens up a bunch of tactics that otherwise would be untenable.

I stick with sub-agent approaches outside of work for this reason though, so it's a more than fair point.


Maybe an evolution-based approach does make sense: 3x instead, and over time drop the least effective agents, replacing them with new ones, even chosen at random.

Edit: And this is why you should read the article before you post!
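In code, that drop-and-replace loop is tiny. A rough sketch, where `score_agent` and `random_agent` are stand-ins for however you'd actually evaluate and source agents:

```python
def evolve(agents, score_agent, random_agent, rounds=5, drop=1):
    """Each round: rank agents by score, cull the lowest scorers,
    and refill the pool with fresh (here: randomly chosen) agents."""
    pool = list(agents)
    for _ in range(rounds):
        pool.sort(key=score_agent, reverse=True)
        pool = pool[:len(pool) - drop]                    # drop the least effective
        pool.extend(random_agent() for _ in range(drop))  # replace with random picks
    return pool
```

With a stable scoring function the top performers survive every round, so the pool converges on the best agents seen so far while still sampling new ones.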


Yes indeed, you get a big lift out of running just the few top agents.

We run big ensembles because we are doing a lot of analysis over the system, etc.


curious how this is different from claude-mem?

https://github.com/thedotmack/claude-mem


great question

claude-mem uses a compaction approach. It records session activity, compresses it, and injects summaries into future sessions. Great for replaying what happened.

A-MEM builds a self-evolving knowledge graph. Memories aren’t compressed logs. They’re atomic insights that automatically link to related memories and update each other over time. Newer memories impact past memories.

For example: if Claude learns “auth uses JWT” in session 1, then learns “JWT tokens expire after 1 hour” in session 5, A-MEM links these memories and updates the context on both. The older memory now knows about expiration. With compaction, these stay as separate compressed logs that don’t talk to each other.
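A toy sketch of that linking behavior (not A-MEM's actual implementation, just the idea): each new memory is linked bidirectionally to any existing memory it overlaps with, so the older memory is updated to "know about" the newer one.

```python
from dataclasses import dataclass, field

@dataclass
class Memory:
    text: str
    keywords: set
    links: list = field(default_factory=list)  # indices of related memories

class MemoryGraph:
    def __init__(self):
        self.memories = []

    def add(self, text, keywords):
        """Insert an atomic memory and bidirectionally link it to every
        existing memory sharing a keyword, so older memories are updated
        by newer ones instead of sitting as isolated logs."""
        new_idx = len(self.memories)
        mem = Memory(text, set(keywords))
        for idx, old in enumerate(self.memories):
            if old.keywords & mem.keywords:
                old.links.append(new_idx)  # the old memory now knows about the update
                mem.links.append(idx)
        self.memories.append(mem)
        return new_idx
```

Running the JWT example through this: adding "auth uses JWT" in session 1 and "JWT tokens expire after 1 hour" in session 5 leaves each memory holding a link to the other, which is exactly what a compressed per-session log can't do.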


pretty cool. I've been testing claude/codex head to head, looks like you pass their security audit

would be cool to see/extend the ttl on the transcripts

https://agentexports.com/v/g92c990f0cfb9962a#lClk4hHKdmv52Nx... https://agentexports.com/v/ga07365c8abedbd2a#5boQrM0ZUz78LIF...


Right on, appreciate you pointing out those minor issues

TTLs can be extended if you configure them that way!


Just to add on: you can also use gists as a storage backend.


I've experimented with something similar - my flow is to have the subagents "initialize" a persona for the task at hand, and then have the main thread simulate a debate between the personas. Not sure if it's the best approach but it's helpful to get a diversity of perspectives on an issue
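The debate loop itself is simple to sketch; here `ask` is a placeholder for however you invoke a persona-initialized sub-agent, and each speaker sees the transcript so far so later turns can rebut earlier ones:

```python
def debate(personas, question, ask, rounds=2):
    """Main-thread simulation: each persona responds in turn, with
    access to the running transcript, for a fixed number of rounds.
    `ask(persona, transcript, question)` stands in for a sub-agent call."""
    transcript = []
    for _ in range(rounds):
        for persona in personas:
            reply = ask(persona, list(transcript), question)
            transcript.append((persona, reply))
    return transcript
```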


I posted this to reddit and got a bunch of great additional reading recs from the community there! https://www.reddit.com/r/ArtificialInteligence/comments/1pyr...


You are the post author? What would you say led you to be interested in this?


Yep! Honestly it was just a random rabbit hole - I recently read Lee Smolin's Time Reborn (highly recommend btw, super fascinating read) and was curious as to what his more recent work was about, which led me to The Autodidactic Universe paper. With the AI hype train full steam ahead, the paper felt newly relevant, especially as it seems that we're starting to hit plateaus for model intelligence and looking to other areas (e.g. world models) for further advancement in the field.


Looking forward to reading this.


Checked out the tool and think it's a cool idea! One piece of feedback though - I actually feel like the inverse product would be more helpful for me. What I mean is replacing ~95% of English text with words (Chinese in my case) that I can understand, and leaving the remaining ~5% (words I definitely don't know) in English.

At least for me, there's large value in consuming bigger volumes of Chinese to get me used to pattern-matching on the characters, as opposed to only reading a smaller amount of harder characters that I'm less likely to actually encounter


That makes a lot of sense; it really highlights the differences in learning stages. My current tool is primarily designed for intermediate language learners who have already learned some basic words but are still in the 'accumulation phase': their main bottleneck is vocabulary size, so they need to see new words frequently.

It sounds like you are at a more advanced stage of learning Chinese: you have moved past simple vocab building and are focusing on flow and fluent reading. For your use case, that 'inverse' approach (Chinese with English safety nets) is definitely superior for pattern-matching. It's a different problem set, but a very valid one.

Appreciate your feedback.


That's a really cool concept. Naively replacing words might work, but sometimes the context is needed. Maybe a model like Gemini 2.5 Flash Lite would be fast enough but still maintain better context awareness?


I have personally had success using Kimi for Chinese creative writing, making the same assumption: that Moonshot, as a Chinese company, has more/better Mandarin pretraining data.


As a fellow Mandarin learner - this is super cool! Intuitively makes a lot of sense for the "full immersion" component of language. I love to see exciting uses of AI for language learning like this instead of just more slop generation :)

I haven't dug into the github repo but I'm curious if by "guided decoding" you're referring to logit bias (which I use), or actual token blocking? Interested to know how this works technically.

(shameless self plug) I've actually been solving a similar problem for Mandarin learning - but from the comprehensible input side rather than the dictionary side:

https://koucai.chat - basically AI Mandarin penpals that write at your level

My approach uses logit bias to generate n+1 comprehensible input (essentially artificially raising the probability of the tokens that correspond to the user's vocabulary). Notably I didn't add the concept of a "regeneration loop" (otherwise there would be no +1 in n+1) but think it's a good idea.
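For anyone curious about the mechanics: an OpenAI-style `logit_bias` parameter maps token IDs to an additive bias, so boosting the learner's known vocabulary boils down to building a token-id-to-bias dict. A rough sketch, where `tokenize` is a stand-in for whatever tokenizer matches your model:

```python
def build_logit_bias(known_words, tokenize, bias=4.0):
    """Map every token ID appearing in the learner's known vocabulary to a
    positive bias, nudging generation toward comprehensible output.
    Unknown-word tokens are left unbiased rather than blocked, which is
    what preserves the '+1' in n+1 comprehensible input."""
    logit_bias = {}
    for word in known_words:
        for token_id in tokenize(word):
            logit_bias[token_id] = bias
    return logit_bias
```

The resulting dict is what you'd pass as the `logit_bias` field of a chat completion request; note that OpenAI clamps bias values to the [-100, 100] range, where the extremes effectively ban or force a token.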

Really curious about the grammar issues you mentioned - I also experimented with the idea of an AI-enhanced dictionary (given that the free chinese-english dictionary I have is lacking good examples) but determined that the generated output didn't meet my quality standards. Have you found any models that handle measure words reliably?


Good question; however, I don't think these are necessarily mutually exclusive.

I have repeatable workflows that harness the benefits of multiple agents. Repeatable workflows drive consistent results for single agents. Using multiple agents allows you to fully explore the problem space.

An example of using these concepts harmoniously would be creating a custom slash command that spawns sub-agents that each have custom prompts, causing them to do more exploration. The commands + agent prompts make the flow repeatable + improvable
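The fan-out half of that flow is easy to sketch; `run_agent` here is a placeholder for however you actually invoke a sub-agent (CLI call, API request, etc.), and the prompts are just illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical exploration prompts, one per sub-agent
EXPLORER_PROMPTS = [
    "Propose an unconventional approach and defend it.",
    "Find the riskiest assumption in the plan and attack it.",
    "Optimize purely for simplicity; ignore performance.",
]

def fan_out(task, run_agent):
    """Spawn one sub-agent per custom prompt and collect their answers
    in submission order. `run_agent(prompt, task)` stands in for however
    a sub-agent is actually invoked."""
    with ThreadPoolExecutor(max_workers=len(EXPLORER_PROMPTS)) as pool:
        futures = [pool.submit(run_agent, p, task) for p in EXPLORER_PROMPTS]
        return [f.result() for f in futures]
```

A slash command then just becomes a fixed entry point that calls this with the task text, which is what makes the flow repeatable and improvable: tweak the prompt list, rerun, compare.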


I've been exploring the "AI as conversation partner for immersion" use case for a project I'm building and find it pretty helpful for a few reasons:

1. Effectively infinite engaging comprehensible input at your level
2. Fantastic way to practice new vocabulary and grammar patterns (AI can provide correction for mistakes)
3. Somewhat fun - if you view chat as a choose your own adventure, the experience becomes more interesting


I just opened ChatGPT's voice mode and mocked up the worst-accented English I could muster, asking for tips on pronunciation.

ChatGPT just told me that my pronunciation was perfect over and over. It's transcribing audio into text and has no sense of the details needed to improve conversational skills.


I've tried speaking Danish to ChatGPT and asking it very simple questions. I even tried using complete words and pronouncing them properly (inb4 kamelåså [1]), but it didn't help. I didn't manage to have it transcribe a single sentence properly.

[1]https://www.youtube.com/watch?v=s-mOy8VUEBk


I believe you, but I'm surprised it doesn't do Danish. It manages Cantonese though, which I think is fairly niche (Google translate doesn't support it).


I’m pretty sure the point is to have a conversation with someone (something) who is speaking correctly.

As another poster here noted, the effect of error correction is nowhere near the effect of having correct input. (See the “comprehensible input” poster.)

