A naive first step towards AI. It’s pretty cool.
Do you remember that vintage video game Pong? Just two pads and a ball that wasn’t actually round? It looked a bit like this.
Starting positions of the three main objects in Pong. RetroFun.
Some years ago, DeepMind built a super cool model that could play 7 different Atari classics. What they did was awesome mainly because they used raw pixel data. This is a very purist, Deep Learning approach that I find fascinating: it skips the handcrafted feature engineering phase of traditional Machine Learning. With the right tools, it is more powerful than building handcrafted features.
This breakthrough got me wondering: “This super deep, complex model can actually play a video game and outperform humans. But! Can a very simple model be enough to play decently against a human?”
Back when I had this idea, my skills were not enough to bring it to code, but things are different now, and thanks to amazing Python I could tackle it in a little under seven days.
Let’s think simple. A very simple game: Pong. A very simple, if not the simplest, model: Linear Regression. Good. It actually seemed pretty straightforward:
Model: ElasticNet Linear Regression (a fancy way of saying a linear combination of Lasso and Ridge regularization)
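For the curious, this is (up to scikit-learn’s exact scaling) the objective ElasticNet minimizes, where ρ is the mix between the L1 (Lasso) and L2 (Ridge) penalties (the `l1_ratio` in scikit-learn):

```latex
\min_{w} \;\; \frac{1}{2n} \lVert y - Xw \rVert_2^2
\;+\; \alpha \rho \lVert w \rVert_1
\;+\; \frac{\alpha (1 - \rho)}{2} \lVert w \rVert_2^2
```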
In my head this sounded like a classical supervised learning problem. So, how should I go about it?
Simple? Yes and no. I did go through all of that, but there were some plot twists along the way that I found interesting and worth sharing.
While requesting the raw pixel data through pygame is quite easy, gathering the right data required a bit more thought. After gathering raw data out of a few games, I set out to train the linear regression. Even though the model was ‘big’ (flattened images of 400 x 400 pixels as input), the cool SciKit-Learn module was up for the challenge of fitting the data and even doing cross-validation. But, as I would realize after releasing it in the dynamic environment, this first model would not be enough to make a smart opponent. The opponent would actually move very little from its original position. Why? The answer lies in the following plot twists.
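To give a sense of what that looked like, here is a minimal sketch of grabbing the rendered frame from pygame, flattening it, and fitting a cross-validated ElasticNet on the collected frames and pad positions. The function and file names are made up for illustration, not the actual repo code:

```python
import numpy as np
import pygame
from sklearn.linear_model import ElasticNetCV

def grab_frame(screen):
    """Flatten the current 400 x 400 frame into a single feature vector."""
    rgb = pygame.surfarray.array3d(screen)   # (width, height, 3) RGB array
    gray = rgb.mean(axis=2)                  # collapse the colour channels
    return gray.astype(np.float32).ravel()   # shape: (400 * 400,)

# X: one flattened frame per row, y: my pad's vertical position in that frame,
# both accumulated while a human plays a few games (file names are made up).
X = np.load("frames.npy")
y = np.load("pad_positions.npy")

model = ElasticNetCV(cv=5)   # picks the regularization strength by cross-validation
model.fit(X, y)
```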
Given that the first model did not succeed (I wasn’t expecting it to, but I wasn’t expecting it to basically stay put in its initial position either!), I conditioned the data gathering a little bit. You see, raw data and the purist approach are fine, but not for these traditional ML techniques; you have to do a bit of the work for them. So, these are the tricks I implemented to help it out:
These conditions seemed fair enough that I could tell myself I wasn’t building handcrafted features, but also enough help that the model could play decently. While the first point may have been true, the latter wasn’t. The model would, again, stay very close to its initial position in the dynamic environment! What’s going on?! This really confused me since, in the static environment, outside of the game, the mean absolute error wasn’t big enough to keep the pad from hitting the ball.
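For context, the ‘static’ check I’m referring to is just scoring the model on held-out frames, something along these lines (a sketch, assuming the `model`, `X` and `y` from the earlier snippet):

```python
from sklearn.model_selection import cross_val_score

# "Static" evaluation: mean absolute error on held-out frames, outside the game.
# A small error here still says nothing about what happens once the model is
# the one producing the frames it predicts from.
scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error")
print("Cross-validated MAE:", -scores.mean())
```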
In ML you always hope that the answer/variable you are trying to predict is ‘inside’ the data you use as predictors or independent variables (if that weren’t the case, why use them at all?). And while that might be true, sometimes, like this time, you don’t want to provide it in a way that actually biases your model. That was my case up to this point: the variable to predict was the position of the pad, and the ‘answer’ was always encoded in the data, namely in the pixels provided. So when deployed in the dynamic environment (predict position, draw frame, predict position, draw frame…) this caused the model to stay really close to the initial position. It’s kind of counter-intuitive, but it makes sense why the pad wouldn’t move enough to bounce the ball away.
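Concretely, the dynamic environment is a feedback loop like the one sketched below (`move_pad_to` and `redraw` are hypothetical stand-ins for the real game functions). Since the pad’s own pixels are the single best predictor of the pad’s position, the model essentially learns to predict “stay where you already are”:

```python
def play_closed_loop(model, screen, n_frames=1000):
    """Closed-loop play: each prediction is drawn into the very frame
    the next prediction is made from."""
    for _ in range(n_frames):
        frame = grab_frame(screen)                      # pixels still include the pad
        pad_y = model.predict(frame.reshape(1, -1))[0]  # "where should the pad be?"
        move_pad_to(pad_y)                              # hypothetical game call
        redraw(screen)                                  # the prediction becomes part of the next input
```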
By removing the pixels that included both pads from the input data, ceteris paribus, the model would move more ‘freely’ in the dynamic environment. Success! Well, not quite yet: it still didn’t play well enough for you to play against it and not beat it like you would a one-year-old.
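The masking itself can be as blunt as blanking out the screen edges where the pads are drawn, before flattening the frame. A sketch, where the pad width and its position at the left and right edges are assumptions rather than the actual game constants:

```python
PAD_WIDTH = 10  # assumed pad width in pixels, not the real game constant

def strip_pads(frame_2d):
    """Blank out the pixels occupied by both pads so the target position
    is no longer encoded in the input (applied before flattening)."""
    cleaned = frame_2d.copy()
    cleaned[:PAD_WIDTH, :] = 0    # left edge of the screen (opponent's pad)
    cleaned[-PAD_WIDTH:, :] = 0   # right edge of the screen (my pad)
    return cleaned
```

Dropping those feature columns from X entirely works just as well; the point is that the model can no longer simply read its own position back out of the pixels.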
After thinking about what else the model needed to play fairly well (hit the ball at most angles), I realized that from a single frame it’s difficult (more like impossible, even for you) to tell whether the ball is moving towards or away from the end zone. This was forcing the model to make a best guess, but since it’s basically learning from previous behavior (my behavior), it can only do so much.
Away or towards the pad?
What does the model need to capture that sense of direction? More data, sequential data: windows of frames. Instead of feeding a single frame to the model, I transformed the dataset into a sequential one, providing the model with three consecutive frames per training example.
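Building those windows is just a matter of stacking each frame with the two that precede it, something like this (a sketch over the same flattened frames as before):

```python
import numpy as np

def to_windows(frames, positions, window=3):
    """Stack each frame with the ones that precede it; the target is the pad
    position at the last frame of the window."""
    X_seq, y_seq = [], []
    for i in range(window - 1, len(frames)):
        X_seq.append(np.concatenate(frames[i - window + 1 : i + 1]))
        y_seq.append(positions[i])
    return np.array(X_seq), np.array(y_seq)

# Each training example is now three consecutive flattened frames glued
# together, which gives a linear model enough context to tell which way
# the ball is moving.
X_seq, y_seq = to_windows(list(X), y, window=3)
```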
This really hit the nail on the head: this last conditioning of the data made the model work well enough for my purposes, namely to demonstrate that a simple model can play against a human. It’s a bit choppy, but it does the job. As you can imagine, I’m the one on the right, playing against the guy on the left.
Me vs a Linear Regression
This was a very interesting and complete problem from a Data Science / Machine Learning point of view: it touched data gathering, feature conditioning, model training, and deployment in a dynamic environment.
You can find the code here to play, gather data, train a model, and play against an artificial player that imitates your Pong style!
PS: PRs are welcome (: