I modeled part of my company's business problem as a MAB problem and saved my co...

smokel · on May 6, 2024

For those not in the know, "MAB" is short for Multi-Armed Bandit [1], which is a decision-making framework that is often discussed in the broader context of reinforcement learning.

In my limited understanding, MAB problems are simpler than those tackled by Deep Reinforcement Learning (DRL), because typically there is no state involved in bandit problems. However, I have no idea about their scale in practical applications, and would love to know more about said business problem.

[1] https://en.wikipedia.org/wiki/Multi-armed_bandit

jmward01 · on May 6, 2024

There are often times when you have n possible providers of service y, each with strengths and weaknesses. If you have some ultimate truth signal (like follow on costs which are linked to quality, which was what I used) then you can model the providers as bandits and use something like UCB1 to choose which to use. If you then apply this to every individual customer what you end up doing is learning the optimal vendor for each customer which gives you a higher efficiency than had you picked just one 'best all around' vendor for all customers. So the pattern here is: If you have n_service_providers and n_customers and a value signal to optimize then maybe MAB is the place to go for some possible quick gains. Of course if you have a huge state space to explore instead of just n_service_providers, for instance you want to model combinations of choices, using something like a NN to learn the state space value function is also a great way to go.