8 Lessons For Building Data Companies On Solid Ground

Written by sbalnojan | Published 2023/09/20
Tech Story Tags: startup-advice | startup-lessons | business-strategy | data-companies | building-a-data-company | apache-spark | firebolt | esteban-sosnik

TL;DR: Successful data companies need to get a lot of things right. They need to build on solid ground. Solid ground can mean marrying research and tech; it often means leveraging unique approaches to data, solving systemic problems instead of isolated ones, going all in on data acquisition, and building explanations before the product.

"Startups die by suicide, not by homicide." - Paul Graham (probably)

Startups fail all the time, for a variety of reasons.

But that doesn't mean nothing else matters. It just means that if you don't kill your startup yourself, you have time to figure out the other stuff: the stuff you need to get right.

There are lessons hidden inside the stories of successful data companies, lessons about the data market and data products.

You can learn about financing, and about running startups in general, almost anywhere. The following eight lessons, however, are specific to the data market, whether you are building a Databricks-like infrastructure company, a shop built on generative AI, or whatever else you have up your sleeve that relies heavily on data.

Here are the lessons:

(1) Marry research and tech

For its first decade, Google built its flagship product on a single academic publication about the so-called "PageRank" algorithm.

Databricks was founded straight out of university by the creators of Apache Spark, as a commercial wrapper around it. Apache Spark itself was an impressive research project in parallel computing at UC Berkeley.

Both companies married smart, deep research with technology. Both kept pushing scientific boundaries while building out a business, and both keep profiting from their scientific advantage over the competition.

It's the strategy companies like Firebolt are trying to follow.

If you're building a new product in the data space, don't build what others are building. Build a competitive advantage by basing your product on smart research and by making that research available to people.

Note: This only works if you keep the edge, which means you have to keep pushing it. It's not clear that Firebolt is doing that, but it is clear that Google and Databricks continually push forward with new scientific research that propels them ahead.

(2) Leverage unique approaches to data

Esteban Sosnik, an investor in companies like Mainstay, TeachFX, and many other data-powered education startups, makes a clear point:

"Simply building an API to GPT-3 or another out-of-the-box model, and adding some UX wrapper around it, is not a strong differentiator or value creator."

Esteban has a few suggestions for unique approaches to data:

"For startups, it’s important to consider how you can access unique, specific data sets. One possibility is to capture it in real time as people use your product, and include a “human in the loop” to ensure that the data is accurate and valuable to your needs. Some companies use a mix of different data models to fine-tune their output.

If the largest part of the value your product offers is derived from openly available data OR models, you're not going to make it. Larger competitors will be able to crunch that data faster, scale those models quicker, and integrate them into existing products and sell them to their existing user base.

You need to find a unique angle, as unique as possible in terms of data and models.

This applies to all kinds of data startups. If you build notebook company number 1000 with "easy SQL cells" bringing "analytics to everyone", well, you're number 1000 on my list of alternatives.

(3) Go fast on data acquisition

Even once you find your unique angle on data, you need to speed up your data acquisition as much as possible.

If you don't get data fast enough, you won't be able to build out your product, and you won't get funding. It's as simple as that.

Eddie Pease, co-founder of the failed AI startup PharmaForesight, describes it in plain terms:

" Raising money for AI startups is hard[...]First, AI startups typically take longer to get off the ground than SaaS startups. AI algorithms rely on data and large data holders are typically big companies. As discussed above, getting any sort of access to data held by large companies is time-consuming. Even when you have access to data, you not only need to focus on business development and your software platform (like in a SaaS startup) but also your AI algorithm." (Eddie Pease, Three crucial lessons for launching an AI startup)

Think that only applies to AI startups? Think again. If your shiny new data infrastructure tool can't connect to lots of data sources and let its users acquire data in a unique and easy way, no one will be able to use it.

(4) Build explanations first

No human, no AI. If human users won't adopt your solution, you don't need to build it.

That's why it makes sense to focus on explainability first, and only later on the depth of your model itself.

There's a reason notebook startups are all over the place: they let data analysts provide more than just shiny dashboards. They let them tell a story and add explainability.

It's the reason Amazon's most powerful tool is "What others bought" and why Netflix added a "because this movie is similar to Moana" title to its recommendations.

This is true across the board, and Eddie Pease re-emphasises it for AI startups:

"A good rule of thumb is that the more important each individual prediction, the more important explainability."


(5) Don't wait for a finished product

Yes, yes, we all know you should prototype. But that's not what this lesson is about.

Even though you should go fast on data acquisition and build explanations first, you still don't have to build anything before you start selling.

The company Datahut started selling before finishing its product.

"By the time we had finished talking to 20 people, we had a clear idea of what potential customers wanted. Out of sheer luck, one of our connections revealed his business’s big problem and luckily we knew how to help him. This turned out to be our first big opportunity. This happened on the 18th day of starting sales and I still remember waking up my co-founders to share the good news. We closed that deal promising the customer that we will deliver a solution in two months and we did it in half the time, working day and night." (Bootstrapping a Tech Startup)

Actually selling your product is the easiest way to convince investors your idea is worth something.

(6) Take the complexity out of data

Data is complex and is going to stay complex. You need to take that complexity out of it for your customers.

Datahut did this by building excellent and scalable customer service.

"We always maintained a high customer retention rate by offering customer support whenever they needed it. In the initial days, the founders themselves attended to customer support and later brought in a team. Even after two years, the founders still handle all the major accounts. If you are providing good service, you build a loyal customer base. We have accounts which started at a few hundred dollars a month and grown to really big $$’s over time.

Always exceed the customer’s expectation when it comes to customer support. You get three things in return: customer retention, referrals, and upsells." (Bootstrapping a Tech Startup)

There are lots of other ways to take out complexity, but great customer support is the easiest one to go with. And you should.

You can add great docs, videos, easier user interfaces etc. later on.

(7) Own the complete pipe

The startup Kite is one of the big losses of 2022. Kite offered an AI-powered coding assistant.

While the company itself argues that it was 10 years too early to the game and that the technology isn't ready yet, the existence and success of GitHub Copilot could make a different case.

The one big difference between Kite and GitHub Copilot is not their models.

It is that GitHub owns the whole pipe of what they are trying to do.

GitHub owns Codespaces, GitHub owns the repositories. GitHub already integrates with every single dev tool out there. Kite didn't.

GitHub literally owns the complete pipe of going from "here's data on code" to "here's my better code running in production!". Kite didn't own anything.

That doesn't mean GitHub Copilot will be successful, but it does mean that its benefits translate directly into better, faster-running software for its users.

Kite failed because it tried to build one big, great piece of technology to solve one completely valueless task, "getting code suggestions", whereas GitHub continually works on "getting great code into a running production system".

(8) Solve systemic problems, not just problems

On top of owning the complete pipe, GitHub also focuses on a systemic problem, not just one tiny piece of the puzzle.

The data world moves fast and is brand new compared to almost any other body of knowledge out there.

Don't waste your time solving "your own little problems" and bringing that solution to market. Think big and solve grand, systemic problems.

As Esteban Sosnik puts it:

"While there are many exciting use cases and possibilities for AI, successful companies will be those that solve big hairy problems. Claiming to be “AI for X” may undermine the complexity of the problem you are trying to solve. As we’ve seen from the rollercoasters of crypto, metaverse and VR, adoption hinges on whether or not the product delivers better experiences and outcomes. More simply put: How is AI making lives better?"

How do you feel about your company? Are you solving for systems? Building on cutting-edge research? There is a reason many startups go under, but with these lessons under your belt, I hope you have a better chance of building a brighter future for us all.


Written by sbalnojan | Head of Marketing @ Meltano | Data PM | “Data Mesh in Action” | Three Data Point Thursday
Published by HackerNoon on 2023/09/20