Hacker News | past | comments | ask | show | jobs | submit | eclark's comments

No, it's far from trivial, for three reasons.

The first is the hidden information: you don't know your opponents' hole cards; that is to say, everyone in the game has a different information set.

The second is that there's a variable number of players in the game at any time. Heads-up games are closer to solved, and mid-ring games have had some decent attempts made, but full-ring with 9 players is hard, and academic papers on it are sparse.

The third is the number of potential actions. In no-limit games there are a lot of possible actions, since you can bet in small decimal increments of a big blind. Betting 4.4 big blinds could be correct and profitable, while betting 4.9 big blinds could be losing, so there's a lot to explore.


Text-trained LLMs are likely not a good solution for optimal play; just as in chess, the position changes too much, there's too much exploration, and too much accuracy is needed.

CFR is still the best, however, like chess, we need a network that can help evaluate the position. Unlike chess, the hard part isn't knowing a value; it's knowing what the current game position is. For that, we need something unique.
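For reference, the core update inside CFR is regret matching; here's a minimal sketch of just that step (textbook arithmetic, not code from rs-poker or any particular solver):

```python
def regret_matching(regrets: list[float]) -> list[float]:
    """One regret-matching step, the heart of CFR: play each action in
    proportion to its positive cumulative regret; uniform if none are
    positive."""
    positives = [max(r, 0.0) for r in regrets]
    total = sum(positives)
    if total == 0:
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in positives]

# Cumulative regrets for fold / call / raise at one information set:
print(regret_matching([10.0, -5.0, 30.0]))  # [0.25, 0.0, 0.75]
```

Full CFR accumulates these regrets over many traversals of the game tree; the hard part mentioned above is knowing what position you're actually in at each information set.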

I'm pretty convinced that this is solvable. I've been working on rs-poker for quite a while. Right now we have a whole multi-handed arena implemented, and a multi-threaded counterfactual framework (multi-threaded, with no memory fragmentation and good cache coherency).

With BERT and some clever sequence encoding we can create a powerful agent. If anyone is interested, my email is: elliott.neil.clark@gmail.com


They would need to lie, which they can't currently do. To play at our current best, our approximation of optimal play involves ranges: thinking about your hand as being any one of a number of holdings, then imagining the combinations of those hands and deciding what you would do with each. That process of exploration by imagination doesn't work with an eager LLM using a huge encoded context.


I don't think this analysis matches the underlying implementation.

The width of the models is typically wide enough to "explore" many possible actions, score them, and let the sampler pick the next action based on the weights. (Whether a given trained parameter set will be any good at it is a different question.)

The number of attention heads for the context is similarly quite high.

And, as a matter of mechanics, the core neuron formulation (dot product input and a non-linearity) excels at working with ranges.


No, the widths are not wide enough to explore. The number of possible game states can easily explode beyond the number of atoms in the universe, especially with deep stacks and small big blinds.

For example, consider computing the counterfactual tree for 9-way preflop. Each of the 9 players has up to 6 different times they can be asked to act (seat 0 bets 1, seat 1 min-raises, seat 2 calls, back to seat 0 who min-raises, with seat 1 calling and seat 2 min-raising, etc.). Each of those decision points offers check, fold, bet the min, raise the min (starting blinds of 100 are pretty high already), raise one more than the min, raise two more than the min, ... raise all-in (with up to a million chips).

(1,000,000.00 - 999,900.00) ^ (6 times per round) ^ (9 players). That's just for preflop; then come the flop, turn, river, and showdown. Now imagine that we also have to simulate which cards each player holds and the order the board cards arrive on the streets (which greatly changes the value of the pot).
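To make the explosion concrete, here's a back-of-the-envelope count; the branching factor and depth are illustrative assumptions, not a real game tree:

```python
def sequences(branching: int, actions_per_player: int, players: int) -> int:
    """Count action sequences if every decision point offered `branching`
    choices and each player acted `actions_per_player` times in the round."""
    return branching ** (actions_per_player * players)

# Even a tiny abstraction -- fold / call / just 3 bet sizes = 5 choices,
# up to 6 actions per player, 9 players -- is astronomical:
n = sequences(branching=5, actions_per_player=6, players=9)
print(f"{n:.3e}")  # 5^54, about 5.6e37 sequences for preflop alone
```

The real no-limit action space has hundreds of legal bet sizes per node, so this dramatically undercounts.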

As for LLMs being great at range statistics, I would point you to the latest research from UChicago: text-trained LLMs are horrible at multiplication. Try getting any of them to multiply an arbitrary number by e or pi. https://computerscience.uchicago.edu/news/why-cant-powerful-...

Don't get me wrong, though. Masked attention and sequence-based context models are going to be critical to machines solving hidden-information problems like this. Large language models trained on the web crawl and the stack with text input will not be those models, though.


Why would they need to lie? Where's the lying in Poker?

(Ignore for a moment that LLMs can lie just fine.)

What you are describing is exploring a range of counterfactuals. That's not lying.


Early-game bluffs are essentially lies that you tell through the rest of the streets. In order to keep your opponents from knowing when you have premium starting hands, you're required to play some ranges, sometimes, as if they were a different range. E.g., 10% of the time, I will bluff and act like I have AK, KK, AA, QQ. On the next street, I will need to continue that; otherwise, it becomes unprofitable (opponents only need to wait one bet to know if I am bluffing). I have to evolve the lie as well. If cards come out that make my story more or less likely/profitable/possible, then I need to adjust the lie, not revert to the truth or the opponent's truth.

To see that LLMs aren't capable of this, I present all of the prompt jailbreaks that rely on repeated admonitions. And that makes sense if you think about the training data. There's not a lot of human writing that takes a fact and then confidently asserts the opposite as data mounts.

LLMs produce the most likely response from the input embeddings. Almost always, the easiest continuation is a next token that agrees with the other tokens in the sequence. The problem in poker is that a good number of the tokens in the sequence are masked and/or controlled by a villain who is actively trying to deceive.

Also, notice that I'm careful to say LLMs and not to generalize to all attention-head + MLP models, as attention with softmax and dot product is a good universal function approximator. It's the large-language-model part that makes these models a poor fit for poker: human text doesn't have a latent space where poker has been written about often and thoroughly enough to be solved in there.


I wouldn't call a bluff a lie, in the sense that you can honestly tell anyone who asks about your general policy around bluffing, and that would not diminish how well your bluffs work. Contrast that with lying, where going around saying "Oh, yeah, I tend to lie around 10% of the time" would backfire quite a bit.

In game theory, the point of bluffing is not so much to make money from your bluff directly, but to mask when you are playing a genuinely good hand.
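That masking role of bluffs falls out of a standard indifference calculation; a quick sketch (textbook game theory, illustrative only):

```python
def gto_bluff_fraction(bet: float, pot: float) -> float:
    """Fraction of a betting range that should be bluffs so the caller is
    indifferent: solve P(bluff) * (pot + bet) = (1 - P(bluff)) * bet."""
    return bet / (pot + 2 * bet)

# With a pot-sized bet, one bet in three should be a bluff:
print(gto_bluff_fraction(bet=100, pot=100))  # 0.333...
```

Bluffing less than this frequency means opponents can profitably fold everything; more, and they can profitably call everything.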

> [...] it's required to play some ranges, sometimes as if they were a different range; [...]

Why the mental gymnastics? Just say what the optimal play for 'some ranges' is, and then play that. The extra indirection in explanation might be useful for human intuition, but I'm not sure the machine needs that dressing up.

> LLMs produce the most likely response from the input embeddings. [...]

If I wanted my LLM to play poker, I would ask it to suggest probabilities for what to play next, and then sample from those, instead of using the LLM's next-token sampler to directly pick the action.

(But I'm not sure that's what the original article is doing.)
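A minimal sketch of that decoupling; the action distribution here is a made-up stand-in for whatever the LLM would output:

```python
import random

# Hypothetical per-decision distribution suggested by a model:
policy = {"fold": 0.15, "call": 0.55, "raise_3bb": 0.30}

def sample_action(policy: dict[str, float], rng: random.Random) -> str:
    """Sample from the model-suggested probabilities with an external
    sampler, instead of letting next-token sampling commit to an action."""
    actions, weights = zip(*policy.items())
    return rng.choices(actions, weights=weights, k=1)[0]

print(sample_action(policy, random.Random(7)))
```

The model only has to produce a sensible distribution once; the unpredictability comes from the external sampler, not the temperature of the decoder.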

> The problem in poker is that a good amount of the tokens in the sequence are masked and/or controlled by a villain who is actively trying to deceive.

> Human text doesn't have a latent space that's written about enough and thoroughly enough to have poker solved in there.

I agree with both. Though it's still a fun exercise to pit contemporary off-the-shelf LLMs against each other here.

And perhaps add a purpose-built poker bot to the mix as a benchmark. And also try with and without access to an external random sampler (like I suggested above), or with and without the ability to run freshly written Python code.


>They would need to lie, which they can't currently do

They lie better than most people lol.


I am the author/maintainer of rs-poker ( https://github.com/elliottneilclark/rs-poker ). I've been working on algorithmic poker for quite a while. This isn't the way to do it. LLMs would need to be able to do math, lie, and be random, none of which they are currently capable of.

We know how to compute the best moves in poker (it's computationally challenging; the more choices and players present, the harder it gets, which is why most attempts only even try heads-up).

With all that said, I do think there's a way to use attention and BERT to solve poker (when trained on non-text sequences). We need a better corpus of games and some training time on unique models. If anyone is interested, my email is elliott.neil.clark @ gmail.com


Why wouldn't something like an RL environment allow them to specialize in poker playing, gaining those skills as necessary to increase score in that environment?

E.g. given a small code execution environment, it could use some secure random generator to pick between options, it could use a calculator for whatever math it decides it can't do 'mentally', and they are very capable of deception already, even more so when the RL training target encourages it.
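A sketch of what such a tool call could delegate; the function names and numbers are illustrative, not any real harness's API:

```python
import secrets
from fractions import Fraction

def pot_odds(pot: int, bet: int) -> Fraction:
    """Exact price to call: bet / (pot + bet) -- the equity needed to
    break even on a call."""
    return Fraction(bet, pot + bet)

def mixed_choice(actions: list[str], pct: list[int]) -> str:
    """Pick with OS-level randomness so the mixed strategy can't be
    predicted from the sampler. Weights are integer percentages."""
    r = secrets.randbelow(sum(pct))
    for action, w in zip(actions, pct):
        if r < w:
            return action
        r -= w
    return actions[-1]

print(pot_odds(pot=150, bet=50))  # 1/4: need >25% equity to call
print(mixed_choice(["call", "raise"], [70, 30]))
```

Exact fractions sidestep the LLM-arithmetic problem, and `secrets` sidesteps any predictability in the model's own sampling.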

I'm not sure why you couldn't train an LLM to play poker quite well with a relatively simple training harness.


> Why wouldn't something like an RL environment allow them to specialize in poker playing, gaining those skills as necessary to increase score in that environment?

I think an RL environment is needed to solve poker with an ML model. I also think that, like chess, you need the model to do some approximate work. General-purpose LLMs trained on a text corpus are bad at math, bad at accuracy, and struggle to stay on task while exploring.

So a purpose-built model with a purpose-built exploring harness is likely needed. I've built the basis of an RL-like environment and the basis of learning agents in Rust for poker. Next steps to come.


> None of which are they currently capable

what makes you say this? modern LLMs (the top players in this leaderboard) are typically equipped with the ability to execute arbitrary Python and regularly do math + random generations.

I agree it's not an efficient mechanism by any means, but I think a fine-tuned LLM could play near GTO for almost all hands in a small ring setting


To play GTO currently, you need to play hand ranges. (For example, when looking at a hand I would think: I could have AKs-ATs, QQ-99, and she/he could have JT-98s, 99-44, so my next move will act like I have strength and they don't, because the board doesn't contain any low cards.) We have to do this because if you always bet 4x pot when you have aces, the opponents will always know your hand strength directly.
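To make "hand ranges" concrete, here's a toy expansion of tokens like QQ or AKs into actual hole-card combos (an illustrative parser; real range notation, like the AKs-ATs spans above, is richer than this):

```python
from itertools import combinations

RANKS = "23456789TJQKA"
SUITS = "cdhs"
DECK = [r + s for r in RANKS for s in SUITS]

def combos(spec: str) -> list[tuple[str, str]]:
    """Enumerate hole-card combos for a single token: 'QQ' (pair),
    'AKs' (suited), 'AKo' (offsuit), 'AK' (both)."""
    r1, r2 = spec[0], spec[1]
    want = {r1} if r1 == r2 else {r1, r2}
    out = []
    for a, b in combinations(DECK, 2):
        if {a[0], b[0]} != want:
            continue
        if r1 != r2 and len(spec) == 3:
            suited = spec[2] == "s"
            if suited != (a[1] == b[1]):
                continue
        out.append((a, b))
    return out

print(len(combos("QQ")), len(combos("AKs")), len(combos("AKo")))  # 6 4 12
```

Range-based reasoning means weighting your play over all of those combos at once, rather than the two cards you actually hold.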

LLMs aren't capable of this deception. They can't be told that they have one thing, pretend they have something else, and then revert to ground truth. Their eager nature with large contexts leads to them getting confused.

On top of that, there's a lot of precise math. In no-limit the bets are not capped, so you can bet 9.2 big blinds in a spot. That could be profitable because your opponents will call and lose (e.g., the players willing to pay that price sometimes have hands that you can beat). However, betting 9.8 big blinds might be enough to scare off those hands. So there's a lot of probability math with multiplication.

Deep math with multiplication and accuracy is not the forte of LLMs.
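A toy EV model of that sizing trade-off; every probability below is a made-up illustrative number, and the point is only the shape of the calculation:

```python
def bet_ev(bet: float, pot: float, call_prob: float,
           win_when_called: float) -> float:
    """EV of betting: folds win the pot outright; when called we win
    pot + bet with probability win_when_called, else lose the bet."""
    ev_called = win_when_called * (pot + bet) - (1 - win_when_called) * bet
    return (1 - call_prob) * pot + call_prob * ev_called

# Bigger bets get called less often, and only by stronger hands:
small = bet_ev(bet=9.2, pot=10, call_prob=0.40, win_when_called=0.55)
big = bet_ev(bet=9.8, pot=10, call_prob=0.30, win_when_called=0.35)
print(small > big)  # True: the smaller bet earns more in this toy model
```

A solver has to run this kind of arithmetic accurately across thousands of sizings and board runouts, which is exactly where text-trained models fall down.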


Agreed. I tried it on a simple game of exchanging colored tokens from a small set of recipes, challenging it to start with two red and end up with four white, for instance. It failed. It would make one or two correct moves, then either hallucinate a recipe, hallucinate the resulting set of tiles after a move, or just declare itself done!


If you could, theoretically, make a LLM that could actually excel at poker would that mean that it is good at lying to people?


> lie

LLMs are capable of lying. ChatGPT / gpt-5 is RL'd not to lie to you, but a base model RL'd to lie would happily do it.


I think 'Batteries Included' would interest you, then. Like this, it's installable on AWS. It's a whole platform: PaaS + AI + more, built on open source. Kubernetes is at the core, but with tons of automation and UI. Dev environments are Kubernetes in Docker (Kind-based).

- https://github.com/batteries-included/batteries-included/
- https://www.batteriesincl.com/


> The feature is integrated with DJ software and hardware platforms AlphaTheta

They called out AlphaTheta, so here's hoping that it is. That would make my decision to move off of Spotify for personal streaming even easier


While I was at FB (it wasn't Meta then), I saw what a superpower the infrastructure there is. Product engineers build things at scale in days. While I was there, I got to be tech lead for several different teams (2x distributed DBs, 1x Dev Efficiency, 1x Ads), some of which are called out by name here.

Shout out to the HBase and ZippyDB teams! This is the first public acknowledgment that ZippyDB was converged upon.

It's also super cool to see the Developer Efficiency pushes called out. 10,000 services pushed daily, or on every commit, is so impressive.

When I left FB, I couldn't find anything close. So, I'm building the infra that I was missing as a startup. Batteries Included. https://www.batteriesincl.com/ https://github.com/batteries-included/batteries-included/


Good luck to you. Maybe you’ll be the next StatSig.


Thanks! They have built an impressive business and tool.


I work on a startup where the entire self-hosted SaaS is permissively licensed.

https://github.com/batteries-included/batteries-included https://www.batteriesincl.com/ https://www.batteriesincl.com/LICENSE-1.0

I started the company because I wanted to give the infrastructure team that FAANG companies have to smaller enterprises. Most of the best infrastructure is open source but too complicated to use or maintain. So we've built a full platform that will run on any Kubernetes cluster, giving a company a push-button infrastructure with everything built on open source. So you get Heroku with single sign-on and no CLI needed. Or you get a full RAG stack with model hosting on your EKS cluster.

Since most of the services and projects we're building on top of are open source, we wanted to give the code to the world while being sustainable in the long term as a team. I had also been a part of Cloudera, and I had seen the havoc that open core wreaked on the long-term success of Hadoop. So I wanted something different for licensing. We ended up with a license that somewhat resembles the FSL but fixes its major (in my opinion) problem: we don't use the competing-use clause, instead opting for a total-install-size requirement.

I'm happy to chat with anyone about this; my email is in my profile. Good luck, and I hope it works for you.


I was at a conference where this was presented by John https://www.amazon.com/Journey-Profound-Knowledge-Altered-In...

It’s a fun little eye-opener that starts conversations. I wish more of those conversations ended up moving decision-makers.


Open source has a lot of issues:

- The anchoring principle. Once you have established that the software is free, humans expect it to be free forever, and all related things are judged against that initial impression's price bucket. Humans will never want to pay for it later; it's judged worthless.

- Open-core and closed-source add-on and support models have misaligned incentives. The community wants things to be easy to use and opinionated, while the OSS company wants to include as many customers as possible who need or want help with their niche choices.

- Sustainability is awful. If you start an open source project, you're either going to burn yourself and the community out, or you'll need funding to do it as a day job. So, if you want the project not to stop early, you need money to pay developers to make the software better.

- Larger companies want something opinionated, but rarely what's good for most of the community. So eventually, when big tech/big industry is paying for the developers who work on the project, there comes a point where the large company wants its cake and the community is hostage. Do that enough times, and the large company forks internally and the community fractures or withers away.

Source: I was at Cloudera while the Big Data craze took off. Then, I did open source for large tech.


Hey Elliott, long time. I think the key issue is with the assumption that anything beyond the license will happen. Any assumption that there's another (moral?) contract is wrong. If the OSS is free, then the product can't be the OSS. Any unaligned incentives can put the community in conflict, which was common in the Apache ecosystem. So the problem is not with OSS itself, but with the sustainability assumptions around some OSS efforts.


Hey Cosmin long time!

I agree the contract should be clear up front. Changing expectations later is a big problem. People want to give away the software for a while, using it as a loss leader to get attention, while not being honest about their later need for money to fund the ongoing concern.

I tried to write a little bit about that in my post here: https://www.batteriesincl.com/posts/fairsource

I was starting Batteries Included and had been writing it in Elixir. I want to give back to the community, show how to use Phoenix/LiveView, be transparent about what users are running, etc. However, I also know that if this is going to work long-term, I cannot give it away to everyone forever. So it's better to be honest about things as early as possible.

We paid a very smart lawyer to draft the best compromise we could as early as possible.

https://www.batteriesincl.com/LICENSE-1.0

This means we can develop in the open here: https://github.com/batteries-included/batteries-included while also giving it away long term and still being honest that this will require some long-term revenue stream. That revenue stream will come from the companies using it on larger installs.


> Humans will never want to pay for it later. It's judged worthless.

The latter sentence absolutely does not follow from the first. To illustrate with an example: Is Linux (the kernel, or any GNU/Linux distribution) worthless?

Plus, we should also remember that most people who use non-gratis software, don't pay for it; they just copy it. The most common examples are probably Microsoft Windows and Office.


Linux powers just about every major datacenter in the world. Every ML model was trained on Linux. However, if you tried to build a company as powerful and successful as Microsoft on top of it, you would fail.

Red Hat is the only company that has really made a living off of Linux. Even then, their contracts are orders of magnitude smaller than what the exact same customers pay Microsoft.

Linux is successful and remarkable, and every company sees the value in having it around, so there's a shared mutual need. However, that doesn't mean anyone can turn it into more than a business that's barely scraping by.


To put this in terms of why:

When MS rolls up, they say we are charging for your usage of the MS database, Office, Outlook, Microsoft Windows 11, and the security promises. They are explicit that developing with and on Microsoft allows you access to the ecosystem. So the total bill is high, but part of that bill is a gateway into everyone else using Office, Outlook, Excel, Visual Studio, or SharePoint. The world runs on Excel and MS enterprise sales know that. They are negotiating a contract for one-of-a-kind software and access to the world of MS.

Red Hat rolls up saying they want to charge you. They don't get to say that if you don't pay, the company will lose access to the software or the ecosystem. They don't get to say they are gatekeepers to other Linux users. Red Hat can't claim to be providing the database or the development environment; everyone thinks those are free. If you stop paying Red Hat, you can probably find an almost package-for-package-compatible alternative in a rolling release (source: watched that happen multiple times, CentOS et al.). So instead, Red Hat sells a contract for service, support, and indemnity. Those are great products, and Red Hat will continue for a long time. They will just have very different staying power when contracts are renewed, and very different revenue growth.

It's not how I want it to be, just how I see it.

Source: Worked at MS and have friends who are former Red Hat.


The other unfortunate fact is that because Red Hat is a bigger brand, its developers hold undue sway in the community and have a tendency to offload work onto others, while also resisting suggestions and particular contributions.

Happens every time they mess with a critical system and change a standard.


Red Hat can be said to have foisted GNOME and systemd upon us. They don't have control, but their sway has been enough to put us in this pair of holes, and we're far from getting out of them.


> Humans will never want to pay for it later. It's judged worthless.

I see a classic business issue here: people are usually resistant to price increases, and the classic example is starting with a free product and then asking them to pay even a small amount.

And unfortunately, the world is constantly changing; in a large number of cases, inputs become more expensive (for example, with inflation), and the business has to compensate for those costs somehow, which in many cases just means raising prices (or turning something free into something paid).

So most businesses constantly deal with this issue.

Have you talked about this with a business analyst?

