This all started when I was asked to speak at an AI FinTech forum in July. It represented a great opportunity for me to talk about Machine Box to a new audience. But there was a problem. I don’t know anything about financial technology!

My first thought was, “Google machine learning use cases in fintech”. So I did. The results were mostly about anomaly detection and fraud prevention. Great use cases for machine learning, but a bit of a solved problem. Given that this was a forum on AI in financial technology, I figured there would already be lots of talks from experts in anomaly detection.

One of Machine Box’s key use cases is face detection using only a single photo of a face to train upon. We already have a couple of customers using Facebox to verify people, so I figured I’d draw a link between that and securing in-person credit card transactions.

(Image: https://www.webroi.ca/blog/tag/boring-industry/)

BOOORRINNNNGG!

Yes, face recognition can be used to secure transactions, and I did end up talking about how you could accomplish this using Facebox and other methods, but I didn’t think it really spoke to the point of simplifying machine learning with better tools, which is what we’re all about.

If you follow my posts, then you know that I frequently use predicting the stock market as a prime example of how not to use machine learning. The stock market is a highly complex, multi-dimensional monstrosity of interdependencies. Not a good use case to try machine learning on.

But… what if you could predict the stock market with machine learning?

The first step in tackling something like this is to simplify the problem as much as possible. I decided to make it a two-class problem: given some input, the market either goes up or down. And I limited the market to the Dow Jones Industrial Average.

What would make a good input? I decided, somewhat arbitrarily, upon news headlines. I would train a classification model on as many news headlines for a given day as I could, using natural language processing to sort each day into one of two classes: the market went up after the headlines, or the market stayed the same or went down.

Now comes the most difficult part: gathering the data. VERY fortunately, a quick Google search revealed this excellent dataset. It is a giant table of news headlines, labeled with the Dow Jones’ performance that day.

So, 5 minutes into the process, I had a glorious dataset and a plan. Next came the execution. Because my developer skills are extremely limited, I decided to make life easy on myself and use this tool, which iterates through folders looking for labeled data and then trains Classificationbox automatically. But in order to run it on files and folders, I had to first convert the dataset into many tiny text files containing the text of the headlines, and put them into folders labeled either 0 or 1, indicating upward or downward movement in the Dow Jones. This is the script I used to perform that task (probably unnecessary for real developers).

Modifying this script for the dataset took about 10 minutes. Running it took less than 30 seconds.

The next step is to run Textclass on the folder with all the data in it. This will train Classificationbox with a random selection of 80% of the data. It will then verify the model with the remaining 20%.

The result: 54% accuracy. Normally, an accuracy that low means your model isn’t useful. You need something like 80% to get to a place where the model starts to make sense for use in the real world.
But when I told a room full of financial people that the model had a 54% accuracy, I expected a chuckle. Instead, I got very straight faces. A few seconds later, someone said, somewhat under their breath, “You could sell only 4%.” (In trading, even a 4% edge over a coin flip is apparently worth money.)

Perhaps. But my conclusion was that news headlines can’t predict the Dow Jones, at least not with the dataset I had. I highly recommend you give it a try and see what results you get. Maybe I made an error in my script!
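If you want to try the conversion step yourself, here is a minimal sketch in Go of the kind of script involved. To be clear, this is not the script mentioned above; it assumes the dataset is a CSV shaped like Kaggle’s Combined_News_DJIA.csv (a Date column, a 0/1 Label column, then one column per headline), and the input filename is an assumption.

```go
// csv2folders.go: a rough sketch (not the original script) that splits a headlines CSV
// into tiny text files under data/0 and data/1, the folder layout described above.
// Assumed CSV layout: Date, Label (0 or 1), then one column per headline.
package main

import (
	"encoding/csv"
	"fmt"
	"log"
	"os"
	"path/filepath"
	"strings"
)

func main() {
	f, err := os.Open("Combined_News_DJIA.csv") // assumed input filename
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	rows, err := csv.NewReader(f).ReadAll()
	if err != nil {
		log.Fatal(err)
	}

	for i, row := range rows {
		if i == 0 || len(row) < 3 {
			continue // skip the header row and any malformed lines
		}
		label := row[1] // "1" = Dow went up, "0" = stayed the same or went down
		dir := filepath.Join("data", label)
		if err := os.MkdirAll(dir, 0755); err != nil {
			log.Fatal(err)
		}
		// Write one tiny text file per day containing that day's headlines.
		text := strings.Join(row[2:], "\n")
		name := filepath.Join(dir, fmt.Sprintf("%s.txt", row[0])) // file named after the date
		if err := os.WriteFile(name, []byte(text), 0644); err != nil {
			log.Fatal(err)
		}
	}
}
```

Once the data/0 and data/1 folders exist, pointing the folder-based training tool at the data directory handles the 80/20 train-and-validate split described above.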