Diving deep into Supervised Learning

This is Part 2 of the ongoing series Machine Learning with JavaScript. Here's Part 1.

It's kNN time. kNN stands for k-Nearest-Neighbours, which is a Supervised Learning algorithm. It can be used for classification as well as regression problems. First, we are gonna say hello to kNN, but if you want, you can skip ahead to the code.

GitHub Repository: Machine Learning with JS

How does the kNN algorithm work?

kNN decides the class of a new data point by majority vote. It looks at the data point's k nearest neighbors and assigns it the class that most of those neighbors belong to. If the neighbors of a new data point are as follows, NY: 7, NJ: 0, IN: 4, then the class of the new data point will be NY.

Let's say you work at a Post Office and your job is to organize and distribute letters among Postmen so as to minimize the number of trips to the different neighborhoods. And since we are just imagining stuff, we can assume that there are only seven different neighborhoods. This is a kind of classification problem. You need to divide the letters into classes, where classes here refer to Upper East Side, Downtown Manhattan, and so on.

If you love wasting time and resources, you might give one letter from every neighborhood to each Postman, and hope that they meet each other in the same neighborhood and discover your corrupt plan. That's the worst kind of distribution you could achieve.

On the other hand, you could organize the letters based on which addresses are close to each other. You might start with "If it's within a three-block range, give it to the same Postman." That number of nearest blocks is where k comes from. You can keep increasing the number of blocks until you hit an efficient distribution. That's the most efficient value of k for your classification problem.

So, based on some parameter(s), like the address of the house here, you classified whether a letter belongs to Downtown Manhattan, Times Square, et cetera.
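To make the majority-vote idea concrete, here is a minimal from-scratch sketch in plain JavaScript. It is not how ml-knn is implemented; it assumes numeric feature vectors and Euclidean distance, and the names `euclidean` and `knnPredict`, as well as the toy points, are hypothetical.

```javascript
// Minimal kNN classifier sketch (not the ml-knn implementation).
// Assumes numeric feature vectors and Euclidean distance.

// Euclidean distance between two equal-length numeric arrays.
function euclidean(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

// Hypothetical helper: predicts the label of `point` by majority vote
// among its k nearest neighbors in trainX (features) / trainY (labels).
function knnPredict(trainX, trainY, point, k) {
  // Pair each training point's distance with its label, keep the k closest.
  const nearest = trainX
    .map((x, i) => ({ dist: euclidean(x, point), label: trainY[i] }))
    .sort((p, q) => p.dist - q.dist)
    .slice(0, k);

  // Count votes per label among the k nearest neighbors.
  const votes = new Map();
  for (const { label } of nearest) {
    votes.set(label, (votes.get(label) || 0) + 1);
  }

  // Return the label with the most votes. Map keeps insertion order,
  // so on a tie the label whose closest member is nearest wins.
  let best = null;
  let bestCount = -1;
  for (const [label, count] of votes) {
    if (count > bestCount) {
      best = label;
      bestCount = count;
    }
  }
  return best;
}

// Toy example: two clusters of 2-D points.
const trainX = [[1, 1], [1, 2], [2, 1], [8, 8], [8, 9], [9, 8]];
const trainY = ['NY', 'NY', 'NY', 'NJ', 'NJ', 'NJ'];
console.log(knnPredict(trainX, trainY, [2, 2], 3)); // NY
console.log(knnPredict(trainX, trainY, [7, 8], 3)); // NJ
```

Raising k makes each prediction consult more neighbors, just like widening the block range in the Post Office example.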
(I am not good with names, so.)

kNN in practice | Code

As we did in the last tutorial, we are going to use ml.js's KNN module to train our kNearestNeighbors classifier. Every Machine Learning problem needs data, and we are gonna use the Iris dataset in this tutorial. The Iris dataset consists of the petal and sepal length and width of 3 different types of irises (Setosa, Versicolour, and Virginica), along with a field signifying their respective type.

Step 1. Install the libraries

$ yarn add ml-knn csvtojson prompt

Or if you like npm:

$ npm install ml-knn csvtojson prompt

[ml-knn](https://github.com/mljs/knn): k Nearest Neighbors
[csvtojson](https://github.com/Keyang/node-csvtojson): Parse data
[prompt](https://github.com/flatiron/prompt): To allow user prompts for predictions

Step 2. Initialize the library and load the Data

The Iris dataset is provided by the University of California, Irvine and is available on the UC Irvine Machine Learning Repository. However, because of the way it's organized, you are gonna have to copy the content in the browser (Select All | Copy) and paste it into a file named iris.csv. You can name it whatever you want, except that the extension must be .csv.

Now, initialize the library and load the data. I am assuming you already have an empty npm project set up; if you are not familiar with that, running npm init in an empty folder will create one.

The header names are used for visualization and understanding. They will be removed later. Also, seperationSize is used to split the data into training and test datasets.

Cool, eh? We imported the csvtojson package, and now we are going to use its fromFile method to load the data. (Since our data doesn't have a header row, we are providing our own header names.)

We are pushing each row to the data variable, and when the process is done, we are setting seperationSize to 0.7 times the number of samples in our dataset. Note that if the training set is too small, the classifier may not perform as well as it would with a larger set.
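The article loads iris.csv with csvtojson's fromFile method. As a dependency-free sketch of the same idea, the snippet below parses a few inline rows in the iris.csv format into objects keyed by our own header names, then computes seperationSize as 0.7 times the number of samples. The `parseCsv` helper and the inline rows are hypothetical stand-ins for the real file and for csvtojson, and it assumes simple unquoted values.

```javascript
// Dependency-free stand-in for loading iris.csv with csvtojson.
// Assumption: rows are plain comma-separated values with no quoting.

// Our own header names, because iris.csv has no header row.
const names = ['sepalLength', 'sepalWidth', 'petalLength', 'petalWidth', 'type'];

// Hypothetical helper: turns CSV text into an array of row objects.
function parseCsv(text, headers) {
  return text
    .trim()
    .split(/\r?\n/) // handle both LF and CRLF line endings
    .filter((line) => line.trim() !== '') // skip blank lines
    .map((line) => {
      const values = line.split(',');
      const row = {};
      // Values stay strings here; converting them is a later step.
      headers.forEach((h, i) => {
        row[h] = values[i];
      });
      return row;
    });
}

// A few sample rows in the iris.csv format (the real file has 150).
const csvText = `5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
`;

const data = parseCsv(csvText, names);
// Use 70% of the samples for training and the rest for testing.
const seperationSize = 0.7 * data.length;
console.log(data[0]);
console.log(seperationSize);
```

With the full 150-row file, the same split leaves 45 samples for the test set.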
Since our dataset is sorted with respect to types (console.log it to confirm), the shuffleArray function is used to, well, shuffle the dataset to allow splitting. (If you don't shuffle, you might end up with a model which works fine for the first two classes, but fails with the third.) I got shuffleArray from an answer over at Stack Overflow.

Step 3. Dress Data (yet again)

Our data is organized as follows:

{sepalLength: '5.1', sepalWidth: '3.5', petalLength: '1.4', petalWidth: '0.2', type: 'Iris-setosa'}

There are two things we need to do to our data before we serve it to the kNN classifier:

Turn the String values into floats (parseFloat).
Turn the type into numbered classes. (Computers like numbers, you know?)

If you are not familiar with Sets, they are just like their mathematical counterparts: they can't have duplicate elements, and their elements do not have an index (as opposed to Arrays). Passing an Array to the Set constructor drops the duplicates, and a Set can easily be converted back to an Array using the spread operator.

Step 4. Train your model and then test it

Data has been dressed, wands at the ready: Expelliarmus!

The train method takes two mandatory arguments: the input data, such as the Petal Length and Sepal Width, and its actual class, such as Iris-setosa. It also takes an optional options parameter, which is just a JS object that can be passed to tweak the internal parameters of the algorithm. We are passing the value of k as an option. The default value of k is 5.

Now that our model has been trained, let's see how it performs on the test set. Mainly, we are interested in the number of misclassifications that occur (that is, the number of times it predicts the input to be something, even though it's actually something else).

To calculate the error, we use the humble for-loop to loop over the test set and check whether the predicted output is equal to the actual output. If it's not, that's a misclassification.
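The shuffling and dressing steps, plus the misclassification count from Step 4, can be sketched without any libraries. This is a minimal sketch, not the author's code: shuffleArray here is a standard Fisher-Yates shuffle (it may differ from the exact Stack Overflow snippet the article used), and `dressData`, `countMisclassifications`, and the sample rows are hypothetical.

```javascript
// Sketch of the Step 3 preprocessing plus the misclassification count.
// Assumption: rows look like the parsed CSV rows above (all strings).

// In-place Fisher-Yates (Durstenfeld) shuffle; returns the same array.
function shuffleArray(array) {
  for (let i = array.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [array[i], array[j]] = [array[j], array[i]];
  }
  return array;
}

// Hypothetical helper: turns string rows into numeric features (X),
// numbered classes (y), and the list of class names (types).
function dressData(rows) {
  // The Set keeps each type once; spreading it gives an Array whose
  // indexes serve as the numbered classes.
  const types = [...new Set(rows.map((row) => row.type))];
  const X = rows.map((row) => [
    parseFloat(row.sepalLength),
    parseFloat(row.sepalWidth),
    parseFloat(row.petalLength),
    parseFloat(row.petalWidth),
  ]);
  const y = rows.map((row) => types.indexOf(row.type));
  return { X, y, types };
}

// Counts how often a predicted class differs from the actual class.
function countMisclassifications(predicted, actual) {
  let errors = 0;
  for (let i = 0; i < actual.length; i++) {
    if (predicted[i] !== actual[i]) errors++;
  }
  return errors;
}

const rows = [
  { sepalLength: '5.1', sepalWidth: '3.5', petalLength: '1.4', petalWidth: '0.2', type: 'Iris-setosa' },
  { sepalLength: '7.0', sepalWidth: '3.2', petalLength: '4.7', petalWidth: '1.4', type: 'Iris-versicolor' },
  { sepalLength: '6.3', sepalWidth: '3.3', petalLength: '6.0', petalWidth: '2.5', type: 'Iris-virginica' },
  { sepalLength: '4.9', sepalWidth: '3.0', petalLength: '1.4', petalWidth: '0.2', type: 'Iris-setosa' },
];

// Shuffle a copy first, because iris.csv is sorted by type.
const shuffled = shuffleArray(rows.slice());
const { X, y, types } = dressData(shuffled);
console.log(types[y[0]] === shuffled[0].type); // true
console.log(countMisclassifications([0, 1, 1, 0], [0, 1, 2, 0])); // 1
```

Shuffling the rows before dressing keeps each feature vector paired with its class; shuffling X and y separately would break that pairing.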
Step 5. (Optional) Start Predicting

It's time to have some prompts and predictions. Feel free to skip this step if you don't want to test out the model on new input.

Step 6. Boom-shaw-shey-Done.

If you followed the steps, your index.js is ready. Go fire up a terminal and run node index.js.

$ node index.js
Test Set Size = 45 and number of Misclassifications = 2
prompt: Sepal Length: 1.7
prompt: Sepal Width: 2.5
prompt: Petal Length: 0.5
prompt: Petal Width: 3.4
With 1.7,2.5,0.5,3.4 -- type = 2

Well done. That's your kNN algorithm at work, classifying like a charm.

All the code is on GitHub: machine-learning-with-js

A huge aspect of the kNN algorithm is the value of k, and it is referred to as a hyperparameter. Hyperparameters are, and I paraphrase from an answer on Quora, a "kind of parameters that cannot be directly learned from the regular training process. These parameters express 'higher-level' properties of the model such as its complexity or how fast it should learn. They are called hyperparameters."

In our Post Office example, k defines how many blocks in the neighborhood of the address should be considered to classify it.

I am working on the ml-knn module, and hopefully the process of choosing k will be automated pretty soon.

If you are kinda excited and want to see what this can do, you can go to the UC Irvine Machine Learning Repository and use your classifier on a different dataset. (That repository has hundreds of them.)

To get the latest articles in this series, keep an eye on my profile, or you could cut yourself some slack and follow me.

PS: If you liked it, hit the green button to let others know about how powerful JS is and why it shouldn't be lagging behind when it comes to Machine Learning. Thanks for reading!
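Since k is a hyperparameter, one simple way to choose it by hand is to try a few values and keep the one with the fewest misclassifications on held-out data. The sketch below does that with a from-scratch classifier on a hypothetical 1-D toy dataset; it is a common manual approach, not necessarily what ml-knn will automate. With real data, a separate validation set (or cross-validation) keeps the test set untouched.

```javascript
// Sketch: compare several values of k on held-out data and keep the one
// with the fewest misclassifications. Toy data and names are hypothetical.

// Euclidean distance between two equal-length numeric arrays.
function distance(a, b) {
  return Math.sqrt(a.reduce((sum, ai, i) => sum + (ai - b[i]) ** 2, 0));
}

// Majority vote among the k nearest training points.
function predict(trainX, trainY, point, k) {
  const nearest = trainX
    .map((x, i) => ({ d: distance(x, point), label: trainY[i] }))
    .sort((p, q) => p.d - q.d)
    .slice(0, k);
  const votes = new Map();
  for (const { label } of nearest) votes.set(label, (votes.get(label) || 0) + 1);
  // Pick the label with the most votes (ties go to the label seen first).
  let best = null;
  for (const [label, count] of votes) {
    if (best === null || count > votes.get(best)) best = label;
  }
  return best;
}

// Count held-out points whose predicted label differs from the actual one.
function misclassifications(trainX, trainY, testX, testY, k) {
  let errors = 0;
  for (let i = 0; i < testX.length; i++) {
    if (predict(trainX, trainY, testX[i], k) !== testY[i]) errors++;
  }
  return errors;
}

const trainX = [[0], [1], [2], [2.5], [6], [7], [8]];
const trainY = ['A', 'A', 'A', 'B', 'B', 'B', 'B']; // 2.5 is a "noisy" B
const testX = [[2.4], [1.2], [6.4]];
const testY = ['A', 'A', 'B'];

const results = {};
let bestK = null;
for (const k of [1, 3, 7]) {
  results[k] = misclassifications(trainX, trainY, testX, testY, k);
  if (bestK === null || results[k] < results[bestK]) bestK = k;
}
console.log(results); // errors per k: 1 -> 1, 3 -> 0, 7 -> 2
console.log(bestK); // 3
```

Here k = 1 is fooled by the single noisy point, while k = 7 lets the larger class outvote everything; k = 3 sits in between.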