SiaSearch is a Berlin-based AI startup on a mission to accelerate computer vision application development. In this Slack AMA with the SiaSearch team, we discuss computer vision technology, the future of autonomous vehicles, and the SiaSearch data management platform.

This discussion occurred in Slogging's official #amas channel and has been edited for readability.

Note: SiaSearch will be holding a free session at NVIDIA #GTC21 on April 13. For more details, check out: https://www.siasearch.io/blog/nvidia-gtc21

Hey everyone! My name is Mark Pfeiffer and I’m the Co-Founder and CTO of SiaSearch (https://www.siasearch.io/), a Berlin-based AI startup that provides a data management tool for engineers working on self-driving cars and other computer vision applications. Our Head of Product Armaghan Khan and I will be hosting an AMA later today, April 8 at 12pm MT. We’re eager to hear any questions you might have about:
🖼 Computer vision model development
🤖 Autonomous vehicles
💾 Data bottlenecks in machine learning
💾 Data selection & curation
Looking forward to an interactive session!
Mark Pfeiffer Apr 8, 2021, 7:02 AM

Hey Mark! Thanks to you and the team for doing this AMA on Slogging. To start us off, why is a separate tool built just for data management purposes necessary? Also, is this tool targeted just at engineers working on computer vision applications or is that just the largest side of the market?
Limarc Ambalina Apr 8, 2021, 3:16 PM

First of all, thanks for having us! Really looking forward to this session and we’ll try to cover the questions as fast as possible! 🙂
Mark Pfeiffer Apr 8, 2021, 6:02 PM

That’s a very important question. Of course, there’s a lot of ML tooling around. Many tools, however, focus on model training, versioning, monitoring, deployment, or annotation.
In most real-world applications, though, we’ve seen that data selection accounts for a larger share of actual model performance than small model adjustments or extensive hyperparameter tuning. With SiaSearch we therefore wanted to provide a tool that makes it as easy as possible for users to select the right data for the right applications and build better models in less time. Currently the tool is quite tailored to the computer vision use case. A lot of other ML applications deal with more structured data, which is also challenging but easier to handle and select. For computer vision applications, data selection is particularly hard because the content of images is hard to access, so a lot of manual screening is required to select the right data. Currently we see the largest value add of SiaSearch in this application and focus on that with our team!
Mark Pfeiffer Apr 8, 2021, 6:10 PM

Hello Mark 👋 One thing I've read about with ML/AI is that it is difficult to trace back why a model came up with its final verdict or weights. It can be hard to debug the layers/network to know why, as an example, the image of a speed limit sign with tape over part of it made the model determine the number it saw. I imagine users want to know which parameters to tweak (without too much trial and error) to nudge the probabilities in the right direction. Is this something your software addresses or is this an ongoing challenge in the field?
richard-kubina Apr 8, 2021, 3:58 PM

Thanks for the question, Richard! In general, in ML we need to get two elements right: (1) the model and (2) the data. With SiaSearch we focus heavily on the latter. We still focus on model performance, though. However, instead of analyzing which model elements to tune, we try to make it easy for the user to select the right data. With SiaSearch you can easily figure out under which conditions the model still has problems.
With these insights you can then adapt training datasets to improve overall model performance. To add an example for your question: you might realize that your model has trouble detecting traffic lights under sunny conditions while it works well in the dark or in rain. This is an interesting insight and tells you that you should probably get more annotated data of sunny intersections with traffic lights. So, in summary, improving model performance is a core element of SiaSearch, but we look at it from an I/O perspective rather than at raw model weights.
Mark Pfeiffer Apr 8, 2021, 6:22 PM

Hey Mark! Wow cool stuff - thanks for doing this AMA! 👏 I'm curious about how you got to that product market fit for your software. Which came first - the data management tool or the drive to help power the development of autonomous vehicles? Curious to hear a little more about SiaSearch's origin story.
Natasha Nel Apr 8, 2021, 4:20 PM

Great question! Mark and I both worked in the domain of self-driving and faced this data management challenge first hand: Mark at the self-driving lab at Berkeley and later during his PhD, and I in consulting projects with big German automotive companies. The goal was definitely to get these systems to work better, but working with the raw data turned out to be very complex and manual, which became a big bottleneck for improving models. For data-driven development, of which computer vision and self-driving are a subset, there are just very few tools so far that make the work of developers simple and easy, and we wanted to change that. We envision a future where building data-driven products is as easy as building software today. And I believe the industry is currently taking on a similar perspective. During a recent conference, Andrew Ng urged developers and companies to take on a more data-centric approach to ML. One of the big challenges in doing so is better tooling, and that is our mission!
🛠 🛠 🛠
Clemens Viernickel Apr 8, 2021, 6:14 PM
https://scale.com/events/transform/videos/big-data-to-good-data?validation=big-data-to-good-data

Hello Mark! Thanks for the AMA. An autonomous driving tool is a really helpful one, but can you give a little more insight into how exactly it works and what the basic principle behind it is?
radhikaa kapoor Apr 8, 2021, 4:33 PM

Thanks for the question, Radhikaa. Applications like autonomous driving, robotics, and aerial imagery produce tons of raw data. (Fun fact: an autonomous vehicle can produce up to 15 TB of data per hour.) Here’s how SiaSearch helps manage this raw data:
1. Intelligent algorithms are applied to extract useful information, e.g. whether the car was making a turn, what the weather was like, how many people were in view
2. This information (which we call metadata) is populated into a proprietary database which allows super fast queries on petabyte-scale data
3. To make it super easy for the user, an SDK and a GUI are provided, where they can easily search, select, and visualize data as needed
You can dive into more depth and also experience the product for yourself here: https://www.siasearch.io/product and https://public.sia-search.com/
Armaghan Khan Apr 8, 2021, 6:18 PM

Your application within retail is quite a new concept for me, but I find it very interesting. Would love to know more about how you are able to improve consumer experiences.
Katarina Apr 8, 2021, 5:39 PM

Hi Katarina! Retail is indeed a super interesting use case. While SiaSearch isn’t directly used in a consumer-facing role, it does empower the emerging self-checkout technology (similar to Amazon Go). The most popular approach to self-checkout involves the use of multiple cameras. Using the video feeds, the self-checkout software stack recognizes inventory and buyers and can associate the two. Naturally these algorithms need data to be trained, which is where SiaSearch comes in.
Using our product, a developer can easily get a subset of situations, e.g. a buyer fetching a yoghurt pack from the refrigerator. They can use this subset to train the right model and improve its performance more quickly.
Armaghan Khan Apr 8, 2021, 6:25 PM

Armaghan Khan, you said: "Intelligent algorithms are applied to extract useful information e.g. whether the car was making a turn, what was the weather like, how many people were in view". That's super interesting. So in a way, SiaSearch can provide some of the initial annotation itself without the need for human annotators? If so, I'd see that as a huge value-add. Have you been marketing it as both an automatic annotation platform + data management platform?
Limarc Ambalina Apr 8, 2021, 6:27 PM

Great question! Yes, the algorithms can indeed be used to auto-annotate data, but we don’t see this as a replacement for high-quality, low-error human annotations. The auto-tagging, as we call it, is a step before the human annotation which helps to make the job of the annotator faster and simpler.
Armaghan Khan Apr 8, 2021, 6:34 PM

For example, you get 100 hours of video recording from a car and you are interested in left turns. There are two ways to go about it:
1. Without SiaSearch: send all data for human annotation, i.e. time and cost intensive
2. With SiaSearch: extract the left turns and only get those portions annotated, i.e. faster and cheaper
Armaghan Khan Apr 8, 2021, 6:35 PM

Great observation though. There are lots of synergies with data annotation, which is why we’ll soon add this to our offering as well!
Clemens Viernickel Apr 8, 2021, 6:36 PM

Ah I get it. So the tool itself provides more of an automated filtering mechanism (which is incredibly useful of course), meaning if you want to annotate stop signs, the tool can return all of the video frames that have a stop sign in them, but we still need a human annotator to actually draw the bounding box around the stop sign. Am I sort of understanding that correctly?
Limarc Ambalina Apr 8, 2021, 6:39 PM

Precisely!
Clemens Viernickel Apr 8, 2021, 6:40 PM

You can think of this as a cycle: (1) train the model, (2) identify model failures, (3) find better data to fix those failures, (4) annotate, then back to training the model. SiaSearch helps with 2 and 3, which we sometimes summarize as training data management.
Clemens Viernickel Apr 8, 2021, 6:42 PM

Sorry, I don't mean to hijack this AMA and ask all the questions, but I did content writing for a year and a half in the machine learning/training data space, so a lot of this is coming back to me and reigniting my interest. So let's say we do steps 1 - 3. For step 4, does SiaSearch have a built-in data annotation tool or does the engineer need to then import that data into a separate tool for annotating? If not, is that a feature you're looking to add in the future or have you purposefully stayed away from that feature so as not to compete with the already existing tools?
Limarc Ambalina Apr 8, 2021, 6:44 PM

Haha, great to go deeper there! So far, we connect to many common annotation companies via API. This still makes it easy for the developer to go from the data they collected in SiaSearch to triggering annotation. However, step 4 is definitely a feature we’re looking to add going forward!
Clemens Viernickel Apr 8, 2021, 6:47 PM

So going in a more speculative direction... SiaSearch can automate the filtering of data. We also have some early-stage tools that can automate some data annotation tasks. But as you said before, we still can't beat the low error rate of human annotation. Since your company has worked to solve the data filtering problem, how long do you think it'll be before we are able to solve the data annotation problem? When do you think we'll have algorithms that can annotate data as well as humans can?
Now that the training data industry has become quite huge, with millions around the world contributing to data annotation projects, I imagine the answer to that question could change the entire industry.
Limarc Ambalina Apr 8, 2021, 6:55 PM

That’s kind of the million dollar question 🙂
Clemens Viernickel Apr 8, 2021, 7:01 PM

That’s an interesting question. Of course it would be ideal to automate the whole process, but if we already had models that can annotate, then the major part of finding such models would already be done, right? So I think there will always be some human labor required. Of course we can use models which have no real-time requirements for annotation, but ultimately a human will be more precise. So we really have to focus on building the right tooling in order to use human labor as efficiently as possible.
Mark Pfeiffer Apr 8, 2021, 7:01 PM

When will autonomous vehicles become reality in India?
Afifa Apr 8, 2021, 7:01 PM

Afifa, isn’t the answer to this question always “Next year”? 😉
Mark Pfeiffer Apr 8, 2021, 7:02 PM

I think the past couple of years have taught us to be careful with estimates in that domain, but we’re working hard to make it happen as soon as possible! Let us know if you’re working on a self-driving project in India, we might be able to help!
Clemens Viernickel Apr 8, 2021, 7:02 PM

Mark Pfeiffer, I'm excited about it. Hope for the best.
Afifa Apr 8, 2021, 7:06 PM

Thanks everyone for all the great questions! If you have any more coming up, don’t hesitate to reach out to us either here or contact me at mark@siasearch.io! Also, if you wanna try out SiaSearch, you can sign up for our research version at https://www.siasearch.io/open-data.
Mark Pfeiffer Apr 8, 2021, 7:06 PM

Thanks Mark Pfeiffer, Clemens Viernickel, and Armaghan Khan for joining us here today! We wish you the best of luck throughout the rest of 2021.
Limarc Ambalina Apr 8, 2021, 7:07 PM
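
The left-turn example from the discussion (tag the raw recordings with metadata, then query the tags so that only the relevant portions go to annotators) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not SiaSearch's SDK or database: the `Segment` record, its field names, and the `select` helper are all hypothetical, and the tags are hard-coded here where a real system would extract them automatically.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One tagged slice of a recorded drive. All field names are hypothetical."""
    drive_id: str
    start_s: float
    end_s: float
    maneuver: str         # e.g. "left_turn", "straight"
    weather: str          # e.g. "sunny", "rain"
    num_pedestrians: int

    @property
    def duration_s(self) -> float:
        return self.end_s - self.start_s


def select(segments, **conditions):
    """Return the segments whose metadata equals every key=value condition."""
    return [
        s for s in segments
        if all(getattr(s, key) == value for key, value in conditions.items())
    ]


# Hand-written metadata standing in for automatically extracted tags.
segments = [
    Segment("drive_01", 0.0, 40.0, "straight", "sunny", 0),
    Segment("drive_01", 40.0, 52.0, "left_turn", "sunny", 2),
    Segment("drive_02", 10.0, 25.0, "left_turn", "rain", 0),
    Segment("drive_02", 25.0, 90.0, "straight", "rain", 1),
]

# Only the left turns would be sent on for human annotation.
left_turns = select(segments, maneuver="left_turn")
total_s = sum(s.duration_s for s in segments)
selected_s = sum(s.duration_s for s in left_turns)
print(f"{len(left_turns)} segments, {selected_s:.0f}s of {total_s:.0f}s to annotate")
```

Conditions can be combined, for example `select(segments, maneuver="left_turn", weather="rain")`, which mirrors the idea of narrowing a dataset to the situations where a model still struggles.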


How Data Selection Impacts Model Performance: An AMA with SiaSearch
