. Let’s take, for instance, the following list of strings:

Context matters

If we assume that all items in the list above have the same semantic value, what exactly is it? The obvious answer is “geographical place”, right? Let’s look at it in a wider context.

Now we know that we were looking at a list of surnames, not cities. We had no way of telling the difference without extending the context. In this particular case, a larger sample size would barely help, because nearly any city name can also be someone’s surname, and a lot of cities are named after people.

In normal situations, our brain doesn’t even notice how tricky this kind of distinction is because, as humans, we rarely operate without a rich context. NLP techniques get some of the context required for token classification from the text that surrounds a particular token, and use word embeddings to make a “best guess” at what the word might mean. When we analyze columns of data (let’s say, from a CSV file), we don’t have a sentence to derive context from. Instead, we have other columns and, more often than not, other files.

datuum.ai technology uses all the context available in a data source, taking into account the full set of columns, their order, the full set of files/tables, etc., in order to determine the semantic type of the data. It took a lot of time and countless iterations to come up with a neural network architecture and feature set that would let us achieve this. There are still a lot of improvements ahead. And this is just the tip of the iceberg: one rather simple problem of the many we are solving to make our platform work.

Thanks to Dmytro Zhuk, Founder & CTO at @datuum, for the story! Previously published here.
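
To make the point about column context concrete, here is a minimal toy sketch in Python. It is not datuum.ai's method, which relies on a neural network; it is a hand-written rule that only shows how the same values can receive different labels depending on the neighboring columns. The hint sets, column names, sample values, and the `guess_semantic_type` function are all hypothetical and chosen purely for illustration.

```python
# Toy illustration only: a hand-written rule, not datuum.ai's neural network.
# Every name, hint set, and sample value below is a hypothetical assumption.

# Neighboring column names that hint at each candidate semantic type.
CONTEXT_HINTS = {
    "surname": {"first_name", "date_of_birth", "email"},
    "city": {"street", "zip_code", "country"},
}


def guess_semantic_type(table, column):
    """Guess the type of `column` from the names of the other columns.

    `table` maps column names to lists of values. The values of the target
    column are deliberately ignored: they are assumed to be ambiguous.
    """
    neighbors = set(table) - {column}
    # Score each candidate type by how many of its hints appear nearby.
    scores = {
        label: len(hints & neighbors) for label, hints in CONTEXT_HINTS.items()
    }
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    (best, top), (_, runner_up) = ranked[0], ranked[1]
    # With no context, or a tie, the values alone cannot settle the question.
    return best if top > runner_up else "ambiguous"


# The same strings are both real surnames and real city names.
values = ["Washington", "Lincoln", "Jackson"]

people = {"first_name": ["Ann", "Bob", "Eve"], "col_3": values}
addresses = {"street": ["1 Main St", "2 Oak Ave", "3 Elm Rd"], "col_3": values}
alone = {"col_3": values}

for name, table in [("people", people), ("addresses", addresses), ("alone", alone)]:
    print(name, "->", guess_semantic_type(table, "col_3"))
```

The target column carries an uninformative name (`col_3`) on purpose, so the only signal left is the surrounding columns, which mirrors the situation described in the article.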
