Hello. In the first article of this series, I will examine the KNIME data science software. I will also explain how to install the KNIME application, which is written in Java and built on Eclipse. With it you can build data science and artificial intelligence applications through workflow-style visual programming, and it can be set up on widely used GNU/Linux systems. I will review its usage with a machine learning application and save the application I prepared in a Github repo.

KNIME

KNIME Analytics is a data science environment written in Java. It supports visual programming in the form of a workflow built from various nodes, so a data mining application can be developed even without advanced coding knowledge. It has a very diverse and rich plugin center and is used quite often in academia as well. It is an extensible data science platform where user-created scripts and code can be used alongside visual programming.

KNIME is cross-platform software that can be installed on many operating systems. You can click the link to download it. In this article, I will review KNIME for GNU/Linux distributions only.

Configuration

You can extract the tar.gz archive of the KNIME application to the desired location (the target directory must already exist).

~$ tar xvf knime-analytic-?.?.?.tar.gz -C /opt/knime

All libraries and plugins for KNIME are available in this directory. There is a binary file named knime, and you can start the application using it. But it needs java to work.

JAVA 11

KNIME Analytics works with Java 11 and higher. For this reason, you can install either the openjdk-11 or the openjdk-latest package for KNIME. Depending on the distribution you use, you can easily download and install these JDKs from the main repository.
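Before installing a JDK, it may be worth checking whether the java already on your PATH meets the Java 11 requirement. The following is only a minimal sketch: the function names `java_major_version`, `installed_java_version`, and `java_is_sufficient` are hypothetical helpers introduced here for illustration, and the parsing assumes the usual `version "x.y.z"` format printed by `java -version`.

```shell
#!/bin/sh
# Hypothetical helper: extract the major version from a Java version string.
# Handles the legacy "1.8.0_292" scheme and the modern "11.0.2" scheme.
java_major_version() {
    version=$1
    case $version in
        1.*) version=${version#1.} ;;   # legacy scheme: 1.8.0_292 -> 8.0_292
    esac
    # Drop everything from the first non-digit character onward.
    echo "${version%%[!0-9]*}"
}

# Hypothetical helper: print the version string of the java found on PATH.
# java -version writes to stderr, hence the 2>&1 redirection.
installed_java_version() {
    java -version 2>&1 | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1
}

# Succeeds only if the given version string is Java 11 or newer.
java_is_sufficient() {
    major=$(java_major_version "$1")
    [ -n "$major" ] && [ "$major" -ge 11 ]
}
```

A call such as `java_is_sufficient "$(installed_java_version)" && echo "Java is new enough for KNIME"` would then report whether a separate JDK installation is needed at all. If no java is installed, the version string is empty and the check simply fails.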
On RHEL-based distributions, for example:

~$ sudo dnf install java-11-openjdk   # or java-latest-openjdk; use yum instead of dnf on older releases

On Debian-based ones:

~$ sudo apt install openjdk-11-jdk   # or default-jre

Another option I can suggest here is the Liberica JDK builds published by BellSoft. These are open-source and freely distributed JDK builds. You can download and install the JDK in several variants, such as the standard and full packages. You can find installation scripts or archives [on this page](https://bell-sw.com/pages/downloads/#mn) for many operating systems. For KNIME Analytics, the standard JDK 11 LTS is sufficient. For Linux, we need to download the tar.gz package and extract it to a directory. Then the JAVA_HOME and PATH variables are updated and the Java installation is complete.

~$ mkdir -p ~/liberica-jdks
~$ tar xvf /path/to/downloaded-jdk.tar.gz -C ~/liberica-jdks/
~$ export JAVA_HOME=~/liberica-jdks/jdk-11.?.?
~$ export PATH=$PATH:$JAVA_HOME/bin
## after this
~$ java -version

The JDK is available in this session. If you want to use it from this location all the time, you should add the export lines for JAVA_HOME and PATH to your .bashrc file, or to .zshrc if you are using zsh.

After the Java installation, it is sufficient to go to the directory where KNIME is installed and run the ./knime script. If you have more than one JDK installed on your system, you can explicitly give the path to JDK 11 with ./knime -vm /path/to/jdk-11/bin.

If you want to create a desktop entry for this:

[Desktop Entry]
Type=Application
Name=KNIME Analytics Platform
Comment=Data Science Environment
Exec=/path/to/knime-folder/knime_4.?.?/knime
Icon=/path/to/knime-folder/knime_4.?.?/icon.svg
Categories=Development;Science;
Terminal=false

You can create a file named ~/.local/share/applications/knime.desktop and save the entry there. If you have a problem with the Exec command, you can instead give it a shell script that runs the ./knime script.
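The desktop entry can also be written from the shell instead of a text editor. This is a minimal sketch rather than an official KNIME installation step: the function name `write_knime_desktop_entry` and its two arguments are hypothetical, and it assumes the KNIME directory contains the knime launcher and an icon.svg file, as in the entry above.

```shell
#!/bin/sh
# Hypothetical helper: write a desktop entry for KNIME.
#   $1 = directory containing the knime launcher and icon.svg (assumed layout)
#   $2 = applications directory (optional), defaults to the per-user location
write_knime_desktop_entry() {
    knime_dir=$1
    apps_dir=${2:-"$HOME/.local/share/applications"}

    # Create the applications directory if it does not exist yet.
    mkdir -p "$apps_dir" || return 1

    # Comment= and Terminal= are the key names used by the
    # Desktop Entry specification (keys are case-sensitive).
    cat > "$apps_dir/knime.desktop" <<EOF
[Desktop Entry]
Type=Application
Name=KNIME Analytics Platform
Comment=Data Science Environment
Exec=$knime_dir/knime
Icon=$knime_dir/icon.svg
Categories=Development;Science;
Terminal=false
EOF
}
```

Calling `write_knime_desktop_entry "/path/to/knime-folder/knime_4.?.?"` with your real installation path writes the file to ~/.local/share/applications/knime.desktop. The second argument is there mainly so the function can be tried out against a scratch directory first.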
If you are using Wayland as the display server, you may have problems with the application. For this reason, you can add the line export GDK_BACKEND=x11 to your shell configuration file (.bashrc, .zshrc, or the equivalent for whatever shell you are using), or put it inside the script file that starts KNIME. If you're already using X11, you won't have any problems. For example, a shell script that starts KNIME, and that the Exec line of ~/.local/share/applications/knime.desktop can point to, may look like this:

#!/bin/bash
export GDK_BACKEND=x11
# run from the KNIME directory, or replace ./knime with the full path
./knime -vm /path/to/java-jdk-11/bin

Extension Installations

The KNIME application has many extensions. Its large user base and its extensibility are the most important factors that put it ahead of other data mining applications. You can open the interface by selecting Install KNIME Extensions... from the File menu in the application. You can install an add-on by searching for it. For example, you can install the KNIME twitter connector plugin, which is used to connect to the Twitter API and pull data. It will be installed with the necessary dependency packages. The application will need to be restarted before it can load new plugins.

Then you can search for the node you want in the Node Repository section and add it to your project.

Machine Learning Application with KNIME

In this section, a ready-made dataset will be used to prepare a machine learning application. A data mining workflow will be created with KNIME nodes using the HCV Data Dataset from the UCI Machine Learning Repository. The project is created with new project from the File menu.

I add my file to the project by searching for file reader or csv reader in the Node Repository section. Similarly, I search for the operations I want to perform, add them to my project, and run them; at the end of each operation, I feed the output into the next node as its input.

When I right-click on a node, I configure the operation and then run it.
When I right-click again, I can see the output the node produced in the same menu. The Normalize node below outputs the normalized table and the model.

The operations performed in this application can be seen as follows.

In the application, the data in the CSV file was first read with the csv reader node, and categorical values such as sex were given to the category to number node to be converted into numbers. The output of this node is given to the column filter node to remove the Id and the categorical sex columns. The resulting dataset is given to the Normalize node. The dataset with 0-1 normalization was given to the Partitioning node to be divided into 70% training and 30% testing. A model was created by giving the training partition to the RProp MLP Learner node. This model was tested by giving it, together with the test dataset, to the MultiLayerPerceptron Predictor node. Finally, the Scorer node turned the predictor's output into a table with the confusion matrix and the score values related to performance.

Our perceptron model was tested with 1 hidden layer and 10 neurons, and the confusion matrix and performance values were examined. It appears to be a very successful model, but note that the dataset was not raw data: with preprocessing applied to the dataset, the model reached 98% accuracy. Preprocessing is an important factor for model performance. If you build a model with the raw dataset, you will see that it cannot reach such high performance.

The preprocessing applied to this dataset: review with R; sample dataset review with SMOTE; outlier cleaning with R boxplot; second-time dataset balancing with SMOTE.

You can find this project and its dataset in the repo linked below.

Resources and Project

Github repo
HCV Data Dataset

Originally published here.