Avinash Royyuru

@roy100

Machine learning: preventing unintended consequences

August 23rd 2017

TGIF, I’m heading home from work and looking forward to the weekend. I find a seat on the train, bite into my sandwich and open Candy Crush. I haven’t played it since Monday. I have been stuck at level 76 and had given up.

I start playing and one of three things can happen:

1. This tough and once challenging level is now easy to pass on the first attempt
2. Yep, still difficult. I lose three times in a row
3. I play once and lose. The second time, it’s much easier and I win, woo!

In the first case, I feel like the rules have changed. Because I lost several times, “they made it easier” when I came back so that I keep playing the game.

In the second one, either the level configuration is unsolvable (even with perfect play) or the level is solvable and I don’t play a perfect game. Either way, I’m a bit frustrated.

The third case is like the first one, with an increasing win-rate gradient. The level gets easier each time I play it — I feel cheated.

Players are well aware that this is a game and someone else sets the rules. Yet, they may not take it too kindly that “they” are messing with the game.

When machine learning is used to change game difficulty dynamically, drastic swings in difficulty can occur. Ensuring a good, consistent player experience is, ahem, difficult.

The Contract

Games have an implicit contract with players. Players know that they are operating within someone else’s rulebook — usually the game designers’ — and games don’t abuse this trust. This is especially tricky in games that are inherently probabilistic, like casino games and match-3 games. Game designers have a tough balancing act to perform between the ‘right’ amount of difficulty to present to the player and preserving their trust.

Dynamic game difficulty can feel like changing the rules of the game. Such changes can break trust with players and hurt retention, the very thing dynamic difficulty was meant to improve. No one likes to be manipulated.

Drastic changes in difficulty lead to a bad user experience.

It is tempting to anticipate unintended model behavior and change the model itself to avoid it, for example by leaving examples of the undesirable behavior out of the training data. In general, this is not a good idea: it can cause unintended consequences in other predictions, and those are much harder to detect.

Policy Layer

One approach is to introduce a policy layer with rules to prevent situations like this. A policy to prevent a wide variance in difficulty could be:

‘Don’t vary win rate over a 7-day period by more than 10% for a user’
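A policy like this can be written as a small check that runs before a model's output is served. The sketch below is one possible reading, not a definitive implementation: it assumes "10%" means 10 percentage points between the highest and lowest win rate served in the trailing 7 days, and the names `violates_variance_policy`, `history` and `proposed` are hypothetical.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(days=7)  # the "7-day period" from the policy
MAX_SPREAD = 0.10           # "10%", read here as 10 percentage points (assumption)


def violates_variance_policy(history, proposed, now):
    """Return True if serving `proposed` would break the policy.

    history:  list of (timestamp, win_rate) pairs already served to one user.
    proposed: win rate the model wants to serve next, between 0 and 1.
    """
    # Only win rates served within the trailing window count.
    recent = [rate for ts, rate in history if now - ts <= WINDOW]
    rates = recent + [proposed]
    return max(rates) - min(rates) > MAX_SPREAD


now = datetime(2017, 8, 25)
history = [
    (datetime(2017, 8, 10), 0.70),  # older than 7 days, ignored
    (datetime(2017, 8, 21), 0.30),
    (datetime(2017, 8, 23), 0.35),
]
print(violates_variance_policy(history, 0.38, now))  # spread of 0.08, allowed
print(violates_variance_policy(history, 0.55, now))  # spread of 0.25, violation
```

Other readings of the rule are just as plausible, such as a relative 10% change or a cap on the average win rate. The point is that the rule stays a few lines long and is easy to audit.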

The policy layer is only used to prevent an undesirable user experience; it is not a substitute for model predictions. If the output of the model violates a policy, the output is discarded and a fallback is used instead. The policy layer may feel like arid sentinels of the kingdom who pull up the drawbridge at the first sign of threat. However, instrumenting the policy layer gives us important information. Measuring policy violations and the circumstances leading up to each violation helps inform future product decisions and policies.
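The discard-and-fallback flow, along with the instrumentation, might look something like the sketch below. Everything in it is an assumption made for illustration: the `max_step_policy` rule, the `context` fields, and the choice of fallback are all hypothetical, and a real system would send violations to a metrics pipeline rather than an in-memory list.

```python
from collections import Counter

violation_counts = Counter()  # instrumentation: violations per policy name
violation_log = []            # circumstances leading up to each violation


def max_step_policy(context, output):
    """Hypothetical policy: difficulty may move at most 1 step at a time."""
    return abs(output - context["last_difficulty"]) <= 1


POLICIES = {"max_step": max_step_policy}


def apply_policy_layer(context, model_output, fallback):
    """Return the model output unless a policy rejects it; then use the fallback."""
    for name, is_allowed in POLICIES.items():
        if not is_allowed(context, model_output):
            # Record the violation and its circumstances before discarding.
            violation_counts[name] += 1
            violation_log.append(
                {"policy": name, "context": dict(context), "rejected": model_output}
            )
            return fallback
    return model_output


context = {"user": "u1", "last_difficulty": 5}
print(apply_policy_layer(context, 6, fallback=5))  # within policy, model output kept
print(apply_policy_layer(context, 9, fallback=5))  # rejected, fallback served
print(violation_counts["max_step"])                # violations recorded so far
```

The layer never edits the model's output. It either passes it through or replaces it wholesale, which keeps the rules separate from the model and makes every violation easy to count.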

The models, along with the policy layer, inform core product experience and are responsible for meeting product goals such as retention and monetization. The policy layer and the models are independent of each other and can be updated separately. This gives us additional flexibility to control product behavior.

Policies encode product decisions.

It is important to keep the policy layer light. It can quickly devolve into a complex set of hand-crafted rules, and before you know it, the policy layer, not the models, is responsible for most of the product decisions.

The policy layer is the smile on the barista’s face as they hand you the cappuccino. The coffee comes from machine learning. The smile makes customers feel good, but the coffee is your core product. You need a healthy balance of both to keep customers coming back.

At Product ML, we believe that all products in the future will be dynamic. We’re building a platform that redefines product management and user experience, starting with dynamic difficulty in games using machine learning.
