Data science now ranks among the most in-demand skills for any data-heavy company (which, let's face it, is most companies). There are myriad tools to help you create, ingest, process, analyze, and store vast quantities of data. As these options have multiplied, so have the companies that promise to make setting up and using these tools easier.
Skymind combines open-source and custom tooling to create two main toolchains that integrate well with other frequently used tools in the data science tool bag.
Eclipse Deeplearning4j is a distributed deep learning library written for the JVM, usable from any JVM-based language (with Python APIs as well), and aimed primarily at large-scale business users.
It consists of several sub-libraries, each with many features of its own, including ND4J for numerical computing on the JVM, DataVec for data ingestion and transformation, and RL4J for reinforcement learning.
Available as a self-hosted community edition and as a SaaS option, the Skymind Intelligence Layer (SKIL) is a comprehensive offering that's hard to cover in its entirety, as it's a flexible platform with multiple inputs, outputs, and combinations.
Unsurprisingly, it supports Deeplearning4j as well as TensorFlow, Keras, DataVec, ND4J, and RL4J, which cover a mixture of importing, normalization, and reinforcement learning.
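To give a flavor of the ingestion and normalization side, here's a rough DataVec sketch that reads a CSV file and standardizes its features. The file path, column layout, and class count are invented for illustration:

```java
import org.datavec.api.records.reader.RecordReader;
import org.datavec.api.records.reader.impl.csv.CSVRecordReader;
import org.datavec.api.split.FileSplit;
import org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;
import org.nd4j.linalg.dataset.api.preprocessor.DataNormalization;
import org.nd4j.linalg.dataset.api.preprocessor.NormalizerStandardize;

import java.io.File;

public class CsvIngestSketch {
    public static void main(String[] args) throws Exception {
        // Read a CSV file, skipping one header line (path is illustrative)
        RecordReader reader = new CSVRecordReader(1);
        reader.initialize(new FileSplit(new File("data/iris.csv")));

        // Batch size 32; assumes an integer class label (0-2) in column 4
        DataSetIterator iter = new RecordReaderDataSetIterator(reader, 32, 4, 3);

        // Standardize features to zero mean and unit variance
        DataNormalization normalizer = new NormalizerStandardize();
        normalizer.fit(iter);
        iter.reset();
        iter.setPreProcessor(normalizer);
    }
}
```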
The differences between the community and enterprise editions come at the platform level: the community edition allows a limited number of models and workspaces, which suits a single developer but isn't ideal for teams. It also lacks the somatic feature for controlling robotic sensors.
If you're familiar with data science and machine learning, some of the concepts mentioned above will already be familiar. But for those of you new to the field, or wondering how Skymind implements them, here's a rough breakdown.
Skymind requires Red Hat Linux, but for experimental purposes it works well in a Docker container, started with the following command:
docker run --rm -it -p 9008:9008 -p 8080:8080 skymindops/skil-ce bash /start-skil.sh
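Once the container starts, the SKIL UI should then be reachable in the browser on the mapped port (http://localhost:9008 with the command above).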
Every project is contained within a workspace, which you create with Skymind's GUI (there's also a CLI tool).
Each workspace contains neural network model and data pipeline experiments, plus a Zeppelin notebook that helps you test which setups work best on a specific problem.
Create an experiment by clicking the New Experiment button.
Zeppelin could warrant an entire article in itself, but in summary, it’s an interactive notebook that allows you to experiment with code and see instant results. An initial run might take some time, but results, including logging and error information, appear below the code.
It's at this point that you start to use the DL4J API to configure and work with the neural network. As you might expect, the options available are numerous, including how it reads, normalizes, and transforms the data, and which optimization algorithms it uses in the process. Whatever options you choose, the result should be a final model, trained on your data source(s), that you then save to SKIL.
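As a rough illustration, here's what configuring, training, and saving a small network with the DL4J API can look like. The dataset (MNIST as a stand-in), layer sizes, and hyperparameters are purely illustrative, and inside SKIL you'd typically run this from the workspace's Zeppelin notebook rather than a standalone class:

```java
import org.deeplearning4j.datasets.iterator.impl.MnistDataSetIterator;
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.deeplearning4j.util.ModelSerializer;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;
import org.nd4j.linalg.learning.config.Adam;
import org.nd4j.linalg.lossfunctions.LossFunctions;

import java.io.File;

public class TrainAndSaveSketch {
    public static void main(String[] args) throws Exception {
        // MNIST as a stand-in data source: 28x28 images, 10 classes
        DataSetIterator train = new MnistDataSetIterator(64, true, 12345);

        // Illustrative two-layer network; real choices depend on your problem
        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
                .seed(12345)
                .updater(new Adam(1e-3))
                .list()
                .layer(new DenseLayer.Builder().nIn(28 * 28).nOut(128)
                        .activation(Activation.RELU).build())
                .layer(new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                        .nIn(128).nOut(10)
                        .activation(Activation.SOFTMAX).build())
                .build();

        MultiLayerNetwork model = new MultiLayerNetwork(conf);
        model.init();
        model.fit(train); // one pass over the data; call repeatedly for more epochs

        // Persist the trained model; in SKIL you'd save it to the experiment instead
        ModelSerializer.writeModel(model, new File("trained-model.zip"), true);
    }
}
```

The builder is where most of the knobs mentioned above live: you can swap the updater, add layers, or change activations without touching the training loop.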
You might find it useful at this point to visualize the data for an overview of its patterns. SKIL provides a handful of options to consider, including TensorBoard.
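If you're training with DL4J directly, the library also ships its own browser-based training UI that you can attach alongside whatever SKIL provides. Here's a minimal sketch (note this is DL4J's own visualization, not TensorBoard itself):

```java
import org.deeplearning4j.ui.api.UIServer;
import org.deeplearning4j.ui.stats.StatsListener;
import org.deeplearning4j.ui.storage.InMemoryStatsStorage;

// Start DL4J's training UI (served at http://localhost:9000 by default)
UIServer uiServer = UIServer.getInstance();
InMemoryStatsStorage statsStorage = new InMemoryStatsStorage();
uiServer.attach(statsStorage);

// 'model' is the MultiLayerNetwork from the earlier example; the listener
// streams score and parameter statistics to the UI during fit()
model.setListeners(new StatsListener(statsStorage));
```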
With a model defined, you can use SKIL to serve it via a REST API endpoint and mark your models as deployable or still in an experimental stage. As part of this step, you can also expose the ETL (Extract, Transform, Load) process.
With the model deployed, it's now up to you how you use it, most likely with a client application (and SKIL is particularly good with JVM-based clients) that queries the model live and reacts accordingly to the results.
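As a sketch of what that client side might look like, here's a plain Java 11 HTTP call against a served model. The endpoint path and JSON payload shape below are hypothetical; substitute the URL and schema your SKIL deployment page gives you:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class PredictionClientSketch {
    public static void main(String[] args) throws Exception {
        // Hypothetical endpoint; the real URL comes from your SKIL deployment page
        String endpoint = "http://localhost:9008/endpoints/my-deployment/model/my-model/default/predict";
        String payload = "{\"data\": [0.12, 0.48, 0.91]}"; // illustrative feature vector

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(endpoint))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(payload))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // prediction returned by the served model
    }
}
```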
I've only scratched the surface of what's possible with the Skymind platform, so for more details, I recommend digging into the documentation. For a little more insight into the company, their origin story, and industry trends, I interviewed their CEO, Chris Nicholson.
Originally published at dzone.com.