Thank you so much for this feedback! Indeed, this was confusing in the notebook. I pushed a small commit to make it clearer that the non-determinism comes from the probabilistic nature of the environment dynamics (and not because the agent chooses a different action by mistake).
As a side note, I initially meant to go through the notebook in a video and fill the gaps in the text with my voice. But since I haven't had time for that, I am fixing those gaps first :) Thanks again!