Today we don’t need many more data scientists! When we talk about data science, it literally means experimenting with data. Data is growing enormously, and frankly, most of the time we don’t know what could come out of it. Imagine Facebook’s analytics team running a complex query over all network interactions: they really have no idea which factors matter most in whether a user plays a video, but they run the query to find out and then test their hypotheses.
But that’s a problem for Google and Facebook. What about the rest of us?
Most companies don’t have a platform to analyze their data, no connected data, no interface. So how are they going to use their most precious asset, their data?
I believe we need more engineering and development to build tools for other companies too. So yes to data engineering! But the term “data engineer” is used for connecting data management platforms, scaling, disaster recovery and so on. It doesn’t reflect the need to develop and prototype data products focused on creating business value.
So I want to call our team members “Data Developers”.
Words are a strong medium that cuts through the noise, and picking them carefully will help you build your community around what you want to accomplish. We distance ourselves from science because so much science is produced every second in academic communities around the world, and I believe there’s much to DO. I also don’t pick “data engineer” because that term is taken!
A Data Developer’s main concern is using an “Analytic Sphere” to provide analytic insights, either as an “Analytic Service” or as a visualization. This requires thorough knowledge of many computer science and software programming topics. In this respect, a Data Developer is similar to a data scientist, who must be familiar with math, AI, statistics and software development. The difference lies in their point of view and how they tackle problems.
A “Data Developer” always aims to build a product, not an experiment. With the current wave of new data analytics platforms and tools, there’s a great repository of solutions and frameworks accessible to everyone. Some people might go for experiments, but a Data Developer aims to build products on top of this repository. Lately the open-source community has contributed a great deal to taking software to the next level, and one of its greatest influences has been bringing universities, businesses and industry closer than ever. A Data Developer believes that much has been done on the scientific and academic side to push back the barriers of science, but that much more can be done to put these scientific achievements into practice, and neglecting this opportunity means missing great “could-be life moments”. A Data Developer is curious to see what happens when those scientific achievements come to life.
So, in practice, a Data Developer builds data products from the bottom up: Platform Architecture, Connecting Data, Accessing Analytics.
A Data Developer is a data engineer who is also concerned with business value and engaged with the user interface. A Data Developer understands business requirements and is not limited to concerns about storing, retrieving and moving data. A Data Developer has tools to evaluate new ideas in their production line. What a Data Developer builds is called a Data Product, so a Data Developer needs to understand the dynamics of product development and should carefully choose each feature or analytic and its representation. In this sense, a Data Developer is very concerned with how data is used, in order to deliver the best experience of using analytics.
A Data Developer builds prototypes and develops them incrementally.
An “Analytic Sphere” is like a toolbox for a Data Developer. Each Analytic Sphere has three main layers:
Platform Architecture, Connecting Data, Accessing Analytics
Platform Architecture: This is the core of the Analytic Sphere. It stores data and runs processing on it. It could be in-memory or persistent storage, running on CPUs or GPUs.
Connecting Data: This layer loads data and transforms it into the best possible shape for the core. This is where you find deep learning and AI applications operating on the data to extract meaning or to validate and clean it. This layer can be a web crawler or a data flow, batch or stream. It also gathers feedback on analytics (such as recommendations) for further analytics or for personalizing analytics.
Accessing Analytics: This is the surface of the Analytic Sphere. It provides the “Analytic Service”, that is, the analytics as a web service for other applications. It can be a distributed queue, a REST service or just a file, e.g. Excel or PowerPoint. It also visualizes the data for direct access through desktop, web, mobile or VR.
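To make the three layers more concrete, here is a minimal sketch of an Analytic Sphere in Python. It is only an illustration under simple assumptions: the core is an in-memory list of video events, the connecting layer validates and cleans raw records, and the access layer returns results as JSON for other applications. All class and method names (PlatformCore, DataConnector, AnalyticService, plays_per_video and so on) are hypothetical and not part of any existing product or library.

```python
"""Hypothetical sketch of an "Analytic Sphere" with its three layers.

All names here are illustrative assumptions, not an existing API.
"""
import json
from collections import defaultdict
from typing import Iterable


class PlatformCore:
    """Platform Architecture: stores data and runs processing on it (in-memory here)."""

    def __init__(self) -> None:
        self._events: list[dict] = []

    def store(self, event: dict) -> None:
        self._events.append(event)

    def plays_per_video(self) -> dict[str, int]:
        # A simple processing job over the stored events: count "play" actions per video.
        counts: dict[str, int] = defaultdict(int)
        for event in self._events:
            if event["action"] == "play":
                counts[event["video"]] += 1
        return dict(counts)


class DataConnector:
    """Connecting Data: validates, cleans and loads raw records into the core."""

    def __init__(self, core: PlatformCore) -> None:
        self.core = core

    def load(self, raw_records: Iterable[dict]) -> int:
        loaded = 0
        for record in raw_records:
            # Validate: skip records missing a video id or an action.
            if not record.get("video") or not record.get("action"):
                continue
            # Clean: normalize the action label before storing it.
            self.core.store({"video": record["video"],
                             "action": record["action"].strip().lower()})
            loaded += 1
        return loaded


class AnalyticService:
    """Accessing Analytics: exposes results to other applications (as JSON here)."""

    def __init__(self, core: PlatformCore) -> None:
        self.core = core

    def plays_per_video_json(self) -> str:
        return json.dumps(self.core.plays_per_video(), sort_keys=True)


if __name__ == "__main__":
    core = PlatformCore()
    connector = DataConnector(core)
    service = AnalyticService(core)
    raw = [
        {"video": "v1", "action": "Play "},
        {"video": "v2", "action": "play"},
        {"video": "v1", "action": "pause"},
        {"video": "", "action": "play"},  # invalid: dropped by the connector
        {"video": "v1", "action": "PLAY"},
    ]
    print(connector.load(raw))             # 4 records loaded
    print(service.plays_per_video_json())  # {"v1": 2, "v2": 1}
```

In a real data product each layer would be swapped for something heavier (a database or GPU cluster for the core, a crawler or stream pipeline for the connector, a REST endpoint or dashboard for the service), but keeping the layers behind separate interfaces is what lets a Data Developer start with a prototype like this and replace one layer at a time.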
A Data Developer will not spend time tuning an SVM algorithm to run 2% faster or be 2% more precise, but aims to build and connect all the layers of a data product.
A Data Developer will not build a competitor to Map/Reduce, but uses M/R to see how it can deliver business value, or where it doesn’t help and what the alternative is.
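As an illustration of “using M/R” rather than reinventing it, the sketch below expresses a business question (which videos get played most?) in the Map/Reduce programming model, simulated in a single Python process. It assumes a hypothetical log format of “user,video,action” lines; the function names map_phase and reduce_phase are illustrative only and do not belong to Hadoop or any other framework. Trying the idea on a small sample like this is one cheap way to judge whether the model fits the question before moving it to a cluster.

```python
from collections import Counter
from functools import reduce
from typing import Iterable, Iterator


def map_phase(log_lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Map: emit (video_id, 1) for every play event in a hypothetical 'user,video,action' line."""
    for line in log_lines:
        _user, video, action = line.strip().split(",")
        if action == "play":
            yield video, 1


def reduce_phase(pairs: Iterable[tuple[str, int]]) -> Counter:
    """Reduce: sum the emitted counts per key (video id)."""
    def combine(acc: Counter, pair: tuple[str, int]) -> Counter:
        key, value = pair
        acc[key] += value
        return acc

    return reduce(combine, pairs, Counter())


logs = ["u1,v1,play", "u2,v1,play", "u1,v2,pause", "u3,v2,play"]
print(reduce_phase(map_phase(logs)).most_common())  # [('v1', 2), ('v2', 1)]
```

On a real cluster the framework would also shuffle and group the mapped pairs by key across machines; here the single Counter plays that role.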
A Data Developer is not concerned with fail-over in Hadoop, at least in the first steps of development, but is familiar with the concept and guides users through their options.
Miras is building teams in different areas to deliver value by incorporating Big Data. We need a clear definition of responsibilities for each person, as well as a clear definition for recruiting, so we came up with this term to bring focus and build a clear vision. We thought this might be a useful term for others too. Let us know if you use it.