
The creator admits it early on -- it's measuring rarity based on the specific notation everyone uses, which greatly influences the classification of rarity.

Fundamentally, every chess move is a piece moving from a source square to a destination square, including:

- castling as a king move with a distance greater than 1

- pawn moves to the 8th or 1st rank with the additional datum of a new piece

- en passant is the same as a regular pawn capture, it just requires the victim pawn to have moved two squares previously.
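
As a rough illustration of the list above, here is a minimal Python sketch of a move as "source, destination, plus an optional new piece". The names `Move`, `square` and `is_castling` are made up for this example and not taken from any engine, and the castling test assumes standard chess, where the king moves exactly two files.

```python
from dataclasses import dataclass
from typing import Optional

FILES = "abcdefgh"


def square(name: str) -> int:
    """Map a square name like 'e4' to an index 0..63 (a1 = 0, h8 = 63)."""
    return (int(name[1]) - 1) * 8 + FILES.index(name[0])


@dataclass(frozen=True)
class Move:
    src: int                         # origin square, 0..63
    dst: int                         # destination square, 0..63
    promotion: Optional[str] = None  # 'q', 'r', 'b' or 'n' when a pawn reaches the last rank

    def is_castling(self, piece: str) -> bool:
        # Castling shows up as a king moving two files sideways.
        return piece == "k" and abs(self.src % 8 - self.dst % 8) == 2


# White castles kingside: the king goes e1 -> g1.
castle = Move(square("e1"), square("g1"))
# A pawn promotes on e8: same shape, plus the new piece.
promo = Move(square("e7"), square("e8"), promotion="q")
# En passant looks like any other diagonal pawn move, here e5 -> d6.
en_passant = Move(square("e5"), square("d6"))
```

Nothing in the structure marks a capture, a check or en passant; all of that has to be derived from the board the move is applied to.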

Algebraic notation also carries a fair amount of arbitrary, extraneous detail, even though it drops the source square when the move is unambiguous.

For example, the capture (x), check (+) and checkmate (#) symbols are all unnecessary, given that the previous state of the board is always known. A special symbol for an en passant capture is likewise unnecessary, and indeed there isn't one.

I was initially hoping to get some insight on e.g. which pairs of squares had the fewest moves for a given piece etc.

That being said, I thoroughly enjoyed the video. It was beautifully illustrated and explained everything clearly.



"I was initially hoping to get some insight on e.g. which pairs of squares had the fewest moves for a given piece etc."

This may not be quite what you were asking for, but it's close, and has the advantage that I can link it right now. Tom7's Elo World chess video has where pieces start and end up, and their survival rates, as a chart: https://youtu.be/DpXy041BIlA?si=Zdh6Rh6mekatp2-q&t=815


He did find a move that occurred exactly once, and showed the specific game it came from. He also showed many moves that occurred zero times across every game played on lichess.org.

So, depending on your definition either could reasonably qualify. Which you pick as the rarest is simply an arbitrary definition.

You could consider different notions, but you run into the issue of defining what is unambiguous. For example, you could say e2 to e4 is unambiguous for a given game state, but that would imply game state must be included in the definition of a move. Defining the minimum game state needed for an unambiguous move would be a video of its own.


What occurred once is not what chess programmers would call a “move”, but rather what chess players would call a “move”.

His definition of a move is one ply of algebraic notation. From a chess programming perspective algebraic notation is just a data format and doesn’t have any greater significance.

In programming terms a move is a data structure that allows you to derive one position from another according to the rules of chess.

In Stockfish a move is a 14 bit number, the first 5 bits are the destination square, the next 5 bits are the origin square, the next 2 bits are the promotion piece, and the last 2 bits are the move type (normal, promotion, en passant or castle)


I was wondering how Stockfish encodes the destination and origin square in 5 bits each. I think it doesn't; at least the code on GitHub uses 6 bits each, which gives you 64 possible values, so that works out fine:

https://github.com/official-stockfish/Stockfish/blob/master/...

That's 16 bits. Thank you for sending me down that (small) rabbit hole!


Yes you are correct, I had 2^5 = 64 in my head for some reason.


Comparing games with that notation alone would make moving your queen e2-e3 in game A the same as moving a pawn e2-e3 in game B.

I don’t think anyone would actually agree those are the same move.

What Stockfish considers a move includes the full board state; it simply doesn’t need to pass that information around internally. That removes the ambiguity, but it means there’s a great number of moves that have only occurred once.

Also, it’s 6bit + 6bit + 2bit + 2bit = 16 bits which isn’t an arbitrary number. There’s no need to actually encode that something is a promotion because it can be inferred from the board state, but there’s zero cost to pass an extra bit around so it’s included anyway.
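
For anyone who wants to poke at the 6 + 6 + 2 + 2 layout, here is a minimal Python sketch of the packing. It assumes the field order given earlier in the thread (destination in the lowest bits, then origin, then promotion piece, then move type). The constant names and their numeric values are assumptions for illustration, not Stockfish's actual identifiers.

```python
# 2-bit move type (values are an assumption for this sketch)
NORMAL, PROMOTION, EN_PASSANT, CASTLING = range(4)
# 2-bit promotion piece (only meaningful when the type is PROMOTION)
KNIGHT, BISHOP, ROOK, QUEEN = range(4)


def pack(dst: int, src: int, promo: int = KNIGHT, kind: int = NORMAL) -> int:
    """Pack a move into 16 bits: dst (6) | src (6) | promo (2) | kind (2)."""
    return dst | (src << 6) | (promo << 12) | (kind << 14)


def unpack(move: int) -> tuple:
    """Return (dst, src, promo, kind) from a packed 16-bit move."""
    return (move & 0x3F, (move >> 6) & 0x3F, (move >> 12) & 0x3, (move >> 14) & 0x3)


# e2-e4 with a1 = 0 numbering: e2 is square 12, e4 is square 28.
e2e4 = pack(dst=28, src=12)
```

With 6 bits per square every value from 0 to 63 fits, and the largest packed move is exactly `0xFFFF`, which is why the total lands on 16 bits.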


Some sources do write ep after en passant captures. As you point out, it’s no more redundant than notating checks.


Notating checks is not even redundant; it can disambiguate which piece is to move without additional information (e.g. Rac1 and Rhc1; only one of them might give a discovered check, so Rc1+ could then be an unambiguous notation where the check is not redundant). The PGN spec is clear that SAN disambiguates legal moves and not pieces (if moving one of those rooks would put yourself in check, you should not disambiguate when you move the other one), but I don't know whether it considers the check part of the move for those purposes.


I see what you mean obviously, but neither of those moves could possibly give a discovered check, right? If the rook starts in the corner of the board, nothing can hide behind it or attack from behind it.


Point, I should have written Rbc2, not Rac1.


It'd be a particularly cool position if, say, these three moves were legal and distinct:

- Re4

- Re4+

- Re4#

EDIT

Ok, this seems to do it:

8/3Q4/4R3/6pp/2R2Pk1/6P1/4R1K1/3B4 w - - 0 1

Annoyingly, Lichess uses the rank or file notation in all cases.


I looked it up; PGN 8.2.3.5 says:

> Neither the appearance nor the absence of either a check or checkmating indicator is used for disambiguation purposes. This means that if two (or more) pieces of the same type can move to the same square the differences in checking status of the moves does not allieviate the need for the standard rank and file disabiguation described above. (Note that a difference in checking status for the above may occur only in the case of a discovered check.)
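
The "standard rank and file disambiguation" that the quote refers to tries the file first, then the rank, then the full square. A minimal Python sketch of that procedure, with a hypothetical helper name and the assumption that the caller has already filtered `rivals` down to pieces whose move to the destination is legal:

```python
def disambiguator(origin: str, rivals: list[str]) -> str:
    """Return the SAN disambiguation text for the piece on `origin`.

    `rivals` holds the origin squares of the other pieces of the same type
    that can legally move to the same destination. Whether any of the moves
    gives check is deliberately not an input.
    """
    if not rivals:
        return ""                 # no other candidate, nothing to add
    if all(r[0] != origin[0] for r in rivals):
        return origin[0]          # the file letter is enough
    if all(r[1] != origin[1] for r in rivals):
        return origin[1]          # fall back to the rank digit
    return origin                 # both fail: use the full square


# Rooks on b1 and h1 can both reach c1: the move is written Rbc1.
san = "R" + disambiguator("b1", ["h1"]) + "c1"
```

Since check status never enters the function, Rc1+ cannot stand in for Rbc1+ even when only one of the two rooks would give a discovered check.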


Right, you'd need to look at board state transitions as opposed to move notation.

I'd imagine remarkably foolish moves from board states that only quite sophisticated users would get to would be up there


Presumably there are a mind-bogglingly-huge number of unique board state transitions. It's virtually impossible for the same game of chess to be played twice, except for silly scholar's-mate type games. Almost every single game in a chess database will have many unique board state transitions.


I bet it's way less than you think - orders of magnitude.

What could happen versus what does happen are entirely different.

Doing some algebraic permutations computation here would be like claiming a 50,000 letter English document has 27^50,000 possibilities.

I mean no, there's words, and they only go in certain orders and there's all these rules.

Here's another approach: humans are pretty lousy when remembering large amounts of anything so let's say there's in practice, only been 100,000 unique games played over and over. Without the help of a computer or careful tabulation, I'm pretty sure no human would realize it because no human can remember 100,000 unique games.

Anyways, it's worth digging into the data to see what the variation really is. I bet the 90th percentile is embarrassingly small with a long tail that's far shorter than most think.

edit: so I actually took 7.7 million games from https://www.ficsgames.org/download.html and did some basic processing on them, just to see if I'll eat crow on this one. These are ostensibly people with Elo ratings over 2000, which is pretty decent.

Going in, I was expecting a uniqueness level to be something like 50-70%. Actual percentage of unique games over 7.7 million? 98.7%.

Alright fine.

Although I could try to do 1 billion games, I expected the distribution to be readily visible around 7 million.

One artifact of the data: I made the games as compact as possible, which may introduce some ambiguity. A game might look like this:

    a3a5b3b6Bb2Bb7c4e6Nc3Nf6d4Be7Nf3O-Oe3c5d5exd5Nxd5d6Nxe7+Qxe7Bd3d5O-Odxc4Bxc4Nc6Qe2Rad8Rad1Ne4h3Nd6Ba6Bxa6Qxa6Qb7Qd3f6Qc3Rfe8Rd3Ne4Qc4+Kh8Rxd8Nxd8Rd1Qe7Qd5Nf7b4axb4axb4cxb4Qc6h6Rd7Qf8Qxb6Rd8Rxd8Qxd8Qxb4Qe8Bd4Kg8Qc4Nd6Qd5Kh8Nh4Kh7Qb3Qe4f3Qe6Qd3+f5e4g6exf5Nxf5Nxf5Qxf5Qxf5gxf5Bf2Kg6g4fxg4hxg4h5Kg2hxg4fxg4Ne5Kg3Nf7Kh4Ne5Be3Nxg41/2-1/2

Given this we can just run uniq with incrementing numbers and find out how things increase. I'm doing this on a pretty old laptop (3rd gen Intel) so excuse me for cutting things a bit short.
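
The prefix counting described above can be sketched in Python roughly as follows. This is not the shell pipeline that was actually run; the function name and the tiny sample list are made up, and the duplicate figure is computed as 1 - unique / total.

```python
def prefix_stats(games: list[str], max_len: int) -> list[tuple]:
    """For each prefix length n (longest first), count distinct n-character
    prefixes and the fraction of games that duplicate an earlier prefix."""
    total = len(games)
    rows = []
    for n in range(max_len, 0, -1):
        unique = len({g[:n] for g in games})   # games shorter than n count whole
        rows.append((n, unique, 1 - unique / total))
    return rows


# Four toy "games" in the same compact format, one of them an exact repeat.
sample = ["e4e5Nf3", "e4e5Nc3", "d4d5c4", "e4e5Nf3"]
for n, unique, dup in prefix_stats(sample, 6):
    print(n, unique, dup)
```

On the real data this would be run over one compact string per game, which is memory-hungry at millions of games but keeps the logic easy to check.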

prefix length in characters / unique prefixes / fraction of duplicates

         99 7470034 0.039416039621657628
         98 7462496 0.040385363441781119
         97 7454437 0.041421683508957363
         96 7446241 0.042475620631500677
         95 7437517 0.043597454142611958
         94 7428583 0.044746291899176449
         93 7419379 0.045929849399894973
         92 7409325 0.047222709798876217
         91 7399016 0.048548361067336399
         90 7388091 0.049953224789125783
         89 7376939 0.051387278814333581
         88 7364995 0.052923177422393386
         87 7352785 0.054493281408027117
         86 7340118 0.056122151775432672
         85 7326671 0.057851323625950024
         84 7312421 0.059683754567414482
         83 7297272 0.061631789397747494
         82 7281670 0.063638076243272224
         80 7247260 0.068062914748240111
         79 7228770 0.07044057426456829
         78 7209233 0.072952869233227302
         77 7187947 0.075690070989017588
         76 7166719 0.07841981442939705
         75 7144539 0.081271977115830896
         74 7119854 0.084446261873027284
         73 7094951 0.087648579608837096
         72 7068178 0.091091363720824936
         71 7039566 0.094770627867995505
         70 7009607 0.098623104961001351
         69 6978335 0.10264442288391196
         68 6944221 0.10703119826195528
         67 6909408 0.11150785919986417
         66 6871309 0.1164070722832925
         65 6831848 0.12148142718723132
         64 6790088 0.12685141428305979
         63 6744878 0.13266504255419009
         62 6698251 0.13866088518630681
         61 6648086 0.14511168505848671
         60 6594883 0.15195314634822232
         59 6540926 0.15889156573829932
         58 6481469 0.16653723917595897
         57 6419070 0.17456122923325301
         56 6355249 0.18276807660975847
         55 6282195 0.19216221064468775
         54 6209896 0.20145925798763076
         53 6133584 0.21127234360201919
         52 6048154 0.22225792783565479
         51 5963420 0.23315401228435984
         50 5870519 0.24510030469790289
         49 5770633 0.25794480975187595
         48 5668750 0.27104611232094422
         47 5558836 0.28518013439112821
         46 5443035 0.30007117547551587
         45 5323607 0.31542861845637304
         44 5192507 0.33228698311784588
         43 5064129 0.34879532132158775
         42 4925937 0.36656565792950735
         41 4777302 0.38567887708631909
         40 4626169 0.40511331817237839
         39 4465105 0.42582480288508218
         38 4301802 0.44682420429097458
         37 4126391 0.46938059333470927
         36 3948981 0.49219403707682896
         35 3770310 0.5151696348833128
         34 3581328 0.5394711411415466
         33 3387960 0.5643366503548165
         32 3202702 0.58815928132701434
         31 3008299 0.61315788289287476
         30 2810171 0.63863548833641626
         29 2617450 0.66341779875536144
         28 2413377 0.68965988152851743
         26 2044415 0.73710531205655982
         25 1849362 0.76218749819167997
         24 1674682 0.78464988674290859
         23 1499682 0.80715342462054207
         22 1324053 0.82973784664289008
         21 1167329 0.84989124361622848
         20 1005747 0.87066933880105002
         19 862832 0.88904701374837569
         18 733254 0.90570966192613567
         17 599187 0.9229495579983682
         16 489782 0.93701812692123954
         15 395141 0.94918816879710877
         14 300280 0.9613865008348812
         13 233128 0.97002168698093183
         12 167915 0.97840753392729818
         11 115118 0.98519678700915769
         10 78474 0.98990889924908909
         9 46588 0.9940091724420389
         8 26581 0.99658190548385495
         7 14316 0.99815908200996462
         6 5390 0.99930689103336889
         5 2659 0.99965807481590496
         4 408 0.9999475346088339
         3 180 0.99997685350389731
         2 20 0.99999742816709969
         1 9 0.9999988426751949

It would be interesting to see later, when I crunch larger numbers on a more capable machine, if these distributions generally hold. Of course they won't; it's not possible. But I'm wondering if it's greater than what the Shannon limit would predict.

An ancillary analysis would be to compare it to the possible legal permutations at a character count although this would of course require a board and rule model.

I would expect those percentages to decrease as the length increases and perhaps such a function can give more predictive heft to the actual "language" of chess in practice


It's also worth noting that unique string != board state.

Proof: Both black and white could move the left rook pawn as their first move and right rook pawn as the second move.

Now reset the board and do right rook first and left rook second. Same board state, different game string.
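
The same proof as a minimal Python sketch. The board model is deliberately naive (a dict from square to piece letter, no legality checks, no captures) and the helper names are invented for the example.

```python
def start_position() -> dict:
    """Standard starting position as {square: piece letter}."""
    board = {}
    for i, f in enumerate("abcdefgh"):
        board[f + "1"] = "RNBQKBNR"[i]
        board[f + "2"] = "P"
        board[f + "7"] = "p"
        board[f + "8"] = "rnbqkbnr"[i]
    return board


def play(moves: list[str]) -> dict:
    """Apply simple non-capturing moves written as origin + destination."""
    board = start_position()
    for mv in moves:
        board[mv[2:]] = board.pop(mv[:2])
    return board


game_a = ["a2a3", "a7a6", "h2h3", "h7h6"]   # left rook pawns first
game_b = ["h2h3", "h7h6", "a2a3", "a7a6"]   # right rook pawns first

same_board = play(game_a) == play(game_b)
same_string = "".join(game_a) == "".join(game_b)
```

Here `same_board` comes out true while `same_string` is false, so counting unique strings overcounts unique positions.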

In practice the number of unique board states is strictly smaller than the number of unique move strings, but given how far off I was on my first assessment ... I wonder if we're talking about another < 2% hit.

All of this depends on an actual engine that can process the notation. There are apparently lots of options for PGN.

I'd also like to develop a heatmap based on statistical analysis. I'd imagine this would not only be way less than equally distributed but there'd be no way to slice the data to make it appear equally distributed


Funny enough, en passant is the only capture that takes a piece not in the destination square.


It would be fun to have a chess variant where en passant applied to every move:

- you play bishop a3×e7 taking my queen

- I reply with bishop a7×c5, taking your bishop en passant, getting my queen back (your bishop got taken before it reached my queen)

- you reply with knight a4×b6, taking my bishop while it’s en route to intercept your bishop that took my queen. You get back your bishop, it does end up on e7, and I do lose my queen again

- I reply, taking your knight while it moves through a5. Your bishop dies again, I get back my queen.

- etc.

For knight’s moves, I think you’d have to either make a hard rule as to what square they move over, or let the player say how they moved on every move. Also, two pieces could be taken in one move (a piece on the target square and the knight that just hopped over it)

Standard chess already has some of this in the rules for castling. There, you aren’t allowed to move your king through a position where it would be attacked by an enemy piece. That’s like saying it can be taken en passant.



> I reply, taking your knight while it moves through a5

Knights don't move through intermediate spaces. They are unlike every other chess piece in that regard.


That's arguable. The motivation for the rule, and especially the name of the rule, suggest the pawn is not all the way there yet.


In what sense is the pawn not all the way there? It occupies the square, prevents any other piece from occupying the square, can deliver check or checkmate from the square, and can be captured on the square.


The OP refers to the fact that “en passant” is French for “in passing”, so the move sort-of refers to the idea that the pawn takes the other pawn while it is passing through the third or sixth row, as if the capture starts while the previous move still is in progress.

Also, the pawn can’t deliver checkmate, can it, if it can be taken en passant? It probably is possible to construct a position where taking en passant would bring the king into check in another way, but in those cases, the en passant move isn’t possible.


> Also, the pawn can’t deliver checkmate, can it, if it can be taken en passant? It probably is possible to construct a position where taking en passant would bring the king into check in another way, but in those cases, the en passant move isn’t possible.

I believe I've managed to construct a situation where this is the case. The key is that the pawn that would be able to take en passant is being pinned (e.g. by a queen or rook) with the king directly behind it, such that the pawn cannot perform any captures. Then, you just need to make sure all of the squares adjacent to the king are threatened, and finally actually put the king into check via a pawn advancing two squares.

Technically, the c4 pawn cannot be taken en passant (i.e. this is an illegal move) because it would expose the black king to a different check. But I think this is in the spirit of your question.

Contrived game (which you should be able to import into https://www.chess.com/analysis or https://lichess.org/analysis):

1. e4 d6 2. e5 Kd7 3. h4 Kc6 4. Rh3 d5 5. a4 d4 6. Raa3 Kc5 7. Rhg3 Kb4 8. Rg4 Qe8 9. Rag3 Qd8 10. Rg5 Kxa4 11. Rd3 Kb5 12. Ra3+ Kb6 13. d3 Kb5 14. Be2 Kc5 15. Bg4 Qd5 16. Rb3 Qe4+ 17. Ne2 Qxd3 18. Qxd3 Kd5 19. Rb6 a6 20. Nbc3+ Kc5 21. Na4+ Kd5 22. c4#

Or, if you just want to see the end state: blob:https://www.chess.com/20486c7e-a582-41ef-9f1a-bb6fdea2ca36 (seems to work in Chrome, not in Firefox)

(also: https://lichess.org/editor/rnb2bnr/1pp1pppp/pR6/3kP1R1/N1Pp2...)


Idioms do not have to be interpreted as literal, so I don't understand why anyone would think this. As I understand it:

https://lichess.org/editor/rnbqkbnr/pp3ppp/8/3pp3/2pPP3/4K3/...

the King is never in check, for the purposes of the game. The piece removed isn't in the destination square.


It's because pawns used to be able to move only one square. En passant was created when they were allowed to move two squares. It sort of pretends that the pawn only moved one square, which is why you can only do it immediately after the pawn's first move, capturing it where the pawn "should" be.


In the sense that a pawn that's in the perfect position can strike while it is "passing", but if that doesn't happen then it finishes the move and it's too late, the opportunity is gone forever.


Sure, but we don't have the same rule for a rook or bishop "passing" through a square where a pawn could strike it in passing.

It's just its own unique rule, born from a period of transition between pawns only being able to take one step and being able to take two.


Though castling does take all kinds of strikes in passing into account for the king.


True. It's as if any piece could kill the king in transition.

And yet other pieces can't kill that pawn in passing. Only another pawn.

We shouldn't try to find too much logic.


Clearly there's some Heisenberg uncertainty principle where the pawn occupies both the third (or sixth) and fourth (or fifth) rank, in a kind of superposition that only an opponent pawn situated in a certain position would be able to observe.


I think the logic is based on pragmatism. A different piece has a chance of capturing the pawn later, but a pawn would never be able to since it can't go backwards.


That's a good way to look at it.

Practically, game engines model this with an extra state variable in the tree indicating the capturable square.
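
A minimal Python sketch of that extra state variable, with invented names (`Position`, `ep_square`, `make_move`) and a naive dict board with no legality checking. The en passant target field in a FEN string (the second `-` in `w - - 0 1`) plays the same role.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Position:
    board: dict                      # {square: piece letter}, 'P' white pawn, 'p' black pawn
    ep_square: Optional[str] = None  # square a pawn just skipped over, if any


def make_move(pos: Position, src: str, dst: str) -> Position:
    board = dict(pos.board)
    piece = board.pop(src)
    if piece in "Pp" and dst == pos.ep_square:
        # En passant: the victim stands on the destination file, origin rank.
        del board[dst[0] + src[1]]
    board[dst] = piece
    ep = None                        # the chance expires after one move
    if piece in "Pp" and abs(int(dst[1]) - int(src[1])) == 2:
        # Double step: remember the square that was passed over.
        ep = dst[0] + str((int(src[1]) + int(dst[1])) // 2)
    return Position(board, ep)


pos = Position({"e5": "P", "d7": "p", "a2": "P"})
after_push = make_move(pos, "d7", "d5")          # ep_square becomes "d6"
captured = make_move(after_push, "e5", "d6")     # removes the pawn on d5
missed = make_move(after_push, "a2", "a3")       # any other move clears the flag
```

Because the flag lives in the position rather than in the move, two positions with identical piece placement can still differ in which moves are legal.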



