I modeled part of my company's business problem as a MAB problem, which cut 10% off our biggest cost and, just as importantly, gave us an automated truth signal that showed what was, and wasn't, working in several of our features. As with any tool, finding the right place to use RL concepts is a big deal. I think one thing often missed in a classroom setting is showing more real-world examples of where powerful ideas can be used. Talking about optimal policies is great, but if you don't help people understand where those ideas can be applied, then it is just a bunch of fun math. (which is often a good enough reason on its own :)
For those not in the know, "MAB" is short for Multi-Armed Bandit [1], which is a decision-making framework that is often discussed in the broader context of reinforcement learning.
In my limited understanding, MAB problems are simpler than those tackled by Deep Reinforcement Learning (DRL), because bandit problems typically involve no state. However, I have no idea how they scale in practical applications, and would love to know more about the business problem you mentioned.
There are often times when you have n possible providers of service y, each with strengths and weaknesses. If you have some ultimate truth signal (like follow-on costs, which are linked to quality; that is what I used), then you can model the providers as the arms of a bandit and use something like UCB1 to choose which one to use. If you then apply this to every individual customer, you end up learning the optimal vendor for each customer, which gives you higher efficiency than picking a single 'best all-around' vendor for all customers. So the pattern here is: if you have n_service_providers, n_customers, and a value signal to optimize, then MAB may be the place to go for some quick gains. Of course, if you have a huge space to explore instead of just n_service_providers (for instance, if you want to model combinations of choices), using something like a neural network to learn a value function over that space is also a great way to go.
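To make the per-customer pattern concrete, here is a minimal sketch, not the actual system described above. It keeps one independent UCB1 bandit per customer, with the providers as arms. UCB1 picks the arm maximizing mean_reward + sqrt(2 ln n / n_i), where n is the total number of pulls and n_i the pulls of arm i. Since UCB1 assumes rewards in [0, 1], the sketch maps follow-on cost to reward as 1 - cost / max_cost; that normalization, the `max_cost` parameter, and the names `UCB1` and `PerCustomerRouter` are all illustrative assumptions.

```python
import math
import random


class UCB1:
    """UCB1 over a fixed set of arms (here: hypothetical service providers)."""

    def __init__(self, arms):
        self.arms = list(arms)
        self.counts = {a: 0 for a in self.arms}   # times each arm was chosen
        self.means = {a: 0.0 for a in self.arms}  # running mean reward per arm
        self.total = 0                            # total choices across all arms

    def select(self):
        # Try every arm once before relying on the confidence bound.
        for a in self.arms:
            if self.counts[a] == 0:
                return a
        # UCB1 score: empirical mean plus exploration bonus sqrt(2 ln n / n_i).
        return max(
            self.arms,
            key=lambda a: self.means[a]
            + math.sqrt(2 * math.log(self.total) / self.counts[a]),
        )

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.total += 1
        # Incremental update of the running mean.
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


class PerCustomerRouter:
    """One independent UCB1 bandit per customer (illustrative, hypothetical)."""

    def __init__(self, providers, max_cost):
        self.providers = list(providers)
        self.max_cost = max_cost  # assumed upper bound used to normalize cost
        self.bandits = {}

    def _bandit(self, customer):
        if customer not in self.bandits:
            self.bandits[customer] = UCB1(self.providers)
        return self.bandits[customer]

    def choose(self, customer):
        return self._bandit(customer).select()

    def record(self, customer, provider, follow_on_cost):
        # Lower follow-on cost is better: clamp to [0, max_cost], map to [0, 1].
        clamped = min(max(follow_on_cost, 0.0), self.max_cost)
        reward = 1.0 - clamped / self.max_cost
        self._bandit(customer).update(provider, reward)


if __name__ == "__main__":
    # Simulated data (made up): each customer has a different cheapest provider.
    true_cost = {"acme": {"A": 20, "B": 50, "C": 80},
                 "globex": {"A": 80, "B": 50, "C": 20}}
    router = PerCustomerRouter(["A", "B", "C"], max_cost=100.0)
    rng = random.Random(0)
    for _ in range(2000):
        for customer, costs in true_cost.items():
            p = router.choose(customer)
            router.record(customer, p, costs[p] + rng.uniform(-5, 5))
    for customer in true_cost:
        print(customer, router.bandits[customer].counts)
```

Keeping a separate bandit per customer is what lets each customer converge to its own best vendor instead of a single global winner. The trade-off is that every customer pays its own exploration cost, so customers with very few interactions learn slowly.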