Getting Started with the Weaviate Vector Search Engine

Written by semi-technologies | Published 2020/05/19

TLDR: Weaviate is an open-source, GraphQL-based search graph with a built-in embedding mechanism. It indexes your data based on context rather than keywords alone. In this article, you will learn within 10 minutes how to use Weaviate to build your own semantic search engine. The easiest way to get started is by running the Docker Compose setup. You will then create a schema for a photo search engine, add data through the RESTful API, and query the data through the GraphQL interface.

Everybody who works with data in any way, shape, or form knows that one of the most important challenges is finding the correct answers to your questions. There is a whole set of excellent (open-source) search engines available, but there is one thing they can’t do: search and relate data based on context.
Weaviate is an open-source, GraphQL-based search graph with a built-in embedding mechanism.
Before we get started, some further reading while exploring Weaviate.

Getting Started with Weaviate

Let’s look at the following data object that one might store in a search engine:
{
    "title": "African bush elephant",
    "photoUrl": "https://en.wikipedia.org/wiki/African_bush_elephant"
}
You can retrieve the data object from any search engine by searching for “elephant” or “african”. But what if you want to search for “animal”, “savanna” or “trunk”?
This is the problem the Weaviate search graph solves: because of its built-in natural language model, it indexes your data based on context rather than keywords alone.
In this article, you will learn within 10 minutes how to use Weaviate to build your own semantic search engine and how this GraphQL query:
{
  Get{
    Things{
      Photo(
        explore: {
          concepts: ["animal with a trunk"]
        }
        limit:1
      ){
        title
        photoUrl
      }
    }
  }
}
Will result in the following response:
{
  "data": {
    "Get": {
      "Things": {
        "Photo": [
          {
            "photoUrl": "https://upload.wikimedia.org/wikipedia/commons/b/bf/African_Elephant_%28Loxodonta_africana%29_male_%2817289351322%29.jpg",
            "title": "African bush elephant"
          }
        ]
      }
    }
  },
  "errors": null
}
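As a rough sketch of how a client might read such a response, the Python snippet below pulls the photos out of the nested data.Get.Things.Photo path and treats a non-null "errors" field as a failure. The function name photos_from_response is made up for this illustration and is not part of any Weaviate client.

```python
import json


def photos_from_response(body):
    """Return the list of Photo results from a GraphQL response body (a JSON string).

    Hypothetical helper for illustration; raises RuntimeError when the
    response carries a non-null "errors" field.
    """
    payload = json.loads(body)
    if payload.get("errors"):
        raise RuntimeError(f"GraphQL errors: {payload['errors']}")
    return payload["data"]["Get"]["Things"]["Photo"]


# The response shown above, shortened to the title for readability.
example = (
    '{"data": {"Get": {"Things": {"Photo": '
    '[{"title": "African bush elephant"}]}}}, "errors": null}'
)
titles = [photo["title"] for photo in photos_from_response(example)]
print(titles)
```

Raising on a populated "errors" field keeps a failed query from being mistaken for an empty result.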
If you want to learn more (outside this article), you can watch this FOSDEM video or this interview at Google Cloud’s Slack Chat. All documentation for Weaviate can be found here. You can also sign up for the update newsletter here and follow the development on GitHub here (and while you are there, don’t forget to become a stargazer 😉🙏).

Running Weaviate

The easiest way to get started with Weaviate is by running the Docker Compose setup.
In this demo we will be using the English version of Weaviate (other languages are available), which you can run with the following commands:
# Download the Weaviate configuration file
$ curl -O https://raw.githubusercontent.com/semi-technologies/weaviate/0.22.7/docker-compose/runtime/en/config.yaml
# Download the Weaviate docker-compose file
$ curl -O https://raw.githubusercontent.com/semi-technologies/weaviate/0.22.7/docker-compose/runtime/en/docker-compose.yml
# Run Docker Compose
$ docker-compose up
When Weaviate is running, you can check that it is up with the following command:
$ curl http://localhost:8080/v1/meta
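If you would rather check from a script than from the shell, a minimal Python sketch using only the standard library could look like this. The helper name weaviate_is_up and its opener parameter (there only so the function can be exercised without a running server) are assumptions of this example, not part of Weaviate.

```python
import json
import urllib.request


def weaviate_is_up(base_url="http://localhost:8080", opener=urllib.request.urlopen):
    """Return True if GET <base_url>/v1/meta answers with HTTP 200 and a JSON body.

    Hypothetical helper: connection failures, HTTP errors and non-JSON bodies
    all count as "not up".
    """
    try:
        with opener(f"{base_url}/v1/meta", timeout=5) as response:
            json.load(response)  # raises ValueError if the body is not JSON
            return response.status == 200
    except (OSError, ValueError):
        # urllib's URLError and HTTPError are subclasses of OSError.
        return False
```

Calling weaviate_is_up() with no arguments targets the same http://localhost:8080 address as the curl command above.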
We will create a mini search engine for photos in the following three steps:
  1. Create a Weaviate schema.
  2. Add data to Weaviate.
  3. Query the data with Weaviate's GraphQL interface.

Create a Weaviate Schema

The first thing you need to do when working with Weaviate is create a schema. Weaviate makes a distinction between “things” and “actions”, which roughly corresponds to the distinction between nouns (things) and verbs (actions); in this getting started guide, we will only work with things. The schema will later be used when querying and exploring your dataset. As a rule of thumb, you use the RESTful API to add data and the GraphQL API to fetch data.
The schema is in graph format, meaning that you can create (huge) networks (i.e., knowledge graphs) of your data if you so desire, but if you are building a simple search engine, one class with a few properties can already be enough.
You can learn more about creating a schema here. But for now, we will dive in and create a super simple schema for a photo dataset.
In the example below, we are going to use the command line to add a schema, but you can also use the Python library, Postman, or any other way you like to send out HTTP requests.
$ curl \
  --header "Content-Type: application/json" \
  --request POST \
  --data '{
    "class": "Photo",
    "description": "A photo",
    "vectorizeClassName": false,
    "keywords": [],
    "properties": [
        {
            "dataType": [
                "string"
            ],
            "name": "title",
            "description": "Title of the Photo",
            "vectorizePropertyName": false,
            "index": true
        }, {
            "dataType": [
                "string"
            ],
            "name": "photoUrl",
            "description": "URL of the Photo",
            "vectorizePropertyName": false,
            "index": false
        }
    ]
  }' \
  http://localhost:8080/v1/schema/things
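For readers who prefer a script over curl, the same request can be sketched in Python with only the standard library. The function names below (string_property, photo_class, schema_request) are invented for this example; the payload simply mirrors the JSON from the curl command, and the request is prepared but not sent.

```python
import json
import urllib.request


def string_property(name, description, index):
    """Build one string property in the shape used by the curl example."""
    return {
        "dataType": ["string"],
        "name": name,
        "description": description,
        "vectorizePropertyName": False,
        "index": index,
    }


def photo_class():
    """Build the Photo class definition from the curl example."""
    return {
        "class": "Photo",
        "description": "A photo",
        "vectorizeClassName": False,
        "keywords": [],
        "properties": [
            string_property("title", "Title of the Photo", index=True),
            string_property("photoUrl", "URL of the Photo", index=False),
        ],
    }


def schema_request(class_definition, base_url="http://localhost:8080"):
    """Prepare (but do not send) the POST request that creates a thing class."""
    return urllib.request.Request(
        url=f"{base_url}/v1/schema/things",
        data=json.dumps(class_definition).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = schema_request(photo_class())
# Sending is left to the caller, for example: urllib.request.urlopen(request)
```

Keeping the payload builder separate from the request makes it easy to inspect or reuse the class definition before anything is sent to the server.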
You can now examine the class like this:
$ curl http://localhost:8080/v1/schema
# or with jq
$ curl http://localhost:8080/v1/schema | jq .
Let’s examine the JSON object to understand what we just added to Weaviate (learn more in the documentation):
Let’s add another class that represents a user and the photos this user owns.
$ curl \
  --header "Content-Type: application/json" \
  --request POST \
  --data '{
    "class": "User",
    "description": "A user",
    "keywords": [],
    "properties": [
        {
            "dataType": [
                "string"
            ],
            "name": "name",
            "description": "Name of the user"
        }, {
            "dataType": [
                "Photo"
            ],
            "name": "ownsPhotos",
            "description": "Photos this user owns",
            "cardinality": "many"
        }
    ]
  }' \
  http://localhost:8080/v1/schema/things
We now have a super simple graph in which the User class points to the Photo class through the ownsPhotos reference.
Now let's populate Weaviate with some data!

Adding Data

Like creating classes, adding data is done through the RESTful API. Advanced users can use the batch import functionality or the available Python library. For this example, we are going to add one user and two photos manually.
# Add the elephant
$ curl \
    --header "Content-Type: application/json" \
    --request POST \
    --data '{
        "class": "Photo",
        "schema": {
            "title": "African bush elephant",
            "photoUrl": "https://upload.wikimedia.org/wikipedia/commons/b/bf/African_Elephant_%28Loxodonta_africana%29_male_%2817289351322%29.jpg"
        }
    }' \
    http://localhost:8080/v1/things
Make sure to save the UUID that is returned as a result(!)
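One way to keep track of the returned UUID in a script is sketched below in Python. It assumes that the response body is a JSON object carrying the UUID in an "id" field; both helper names and the sample response are invented for this illustration.

```python
import json
import uuid


def thing_payload(class_name, properties):
    """Build the JSON body for POST /v1/things, as in the curl example."""
    return {"class": class_name, "schema": properties}


def uuid_from_response(body):
    """Extract the UUID of a freshly created thing from a response body.

    Assumption: the response is a JSON object with the UUID under "id".
    uuid.UUID raises ValueError if the value is not a well-formed UUID.
    """
    return str(uuid.UUID(json.loads(body)["id"]))


elephant = thing_payload(
    "Photo",
    {
        "title": "African bush elephant",
        "photoUrl": "https://upload.wikimedia.org/wikipedia/commons/b/bf/African_Elephant_%28Loxodonta_africana%29_male_%2817289351322%29.jpg",
    },
)

# Made-up response body, for illustration only.
sample_response = '{"class": "Photo", "id": "b81e530f-f8db-41b6-910f-0469c8b7884e"}'
elephant_id = uuid_from_response(sample_response)
```

Validating the value with uuid.UUID catches a malformed or missing identifier at the point where it is saved rather than later, when the reference is set.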
# Add Brad Pitt
$ curl \
    --header "Content-Type: application/json" \
    --request POST \
    --data '{
        "class": "Photo",
        "schema": {
            "title": "Brad Pitt at the 2019 premiere of Once Upon a Time in Hollywood",
            "photoUrl": "https://upload.wikimedia.org/wikipedia/commons/4/4c/Brad_Pitt_2019_by_Glenn_Francis.jpg"
        }
    }' \
    http://localhost:8080/v1/things
Make sure to save the UUID that is returned in this result as well. We will use both UUIDs to add the two photos to the user.
# First, add the user
$ curl \
    --header "Content-Type: application/json" \
    --request POST \
    --data '{
        "class": "User",
        "schema": {
            "name": "John Doe"
        }
    }' \
    http://localhost:8080/v1/things
We can now add the photos to the user by setting references (Weaviate uses the term "beacon"; learn more about setting graph references in the documentation). Make sure to use the UUIDs that were returned for your photos and your user.
$ curl \
    --header "Content-Type: application/json" \
    --request PUT \
    --data '[{
        "beacon": "weaviate://localhost/things/b81e530f-f8db-41b6-910f-0469c8b7884e"
    }, {
        "beacon": "weaviate://localhost/things/127c8bcb-99bf-4c8d-94d4-f67cd2323548"
    }]' \
    http://localhost:8080/v1/things/0b70b628-377b-4b4d-85c8-89b0dacd4209/references/ownsPhotos
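Because the beacons and the target URL are assembled from UUIDs, a typo is easy to make by hand. The following Python sketch builds both from the saved UUIDs and rejects malformed ones; the helper names beacon and references_url are hypothetical, and the formats are copied from the curl command above.

```python
import uuid


def beacon(thing_id, host="localhost"):
    """Build a beacon entry pointing at a thing.

    uuid.UUID raises ValueError when thing_id is not a well-formed UUID.
    """
    return {"beacon": f"weaviate://{host}/things/{uuid.UUID(thing_id)}"}


def references_url(owner_id, property_name, base_url="http://localhost:8080"):
    """Build the URL of the reference property on the owning thing."""
    return f"{base_url}/v1/things/{uuid.UUID(owner_id)}/references/{property_name}"


# The example UUIDs from the curl command; replace them with your own.
photo_ids = [
    "b81e530f-f8db-41b6-910f-0469c8b7884e",
    "127c8bcb-99bf-4c8d-94d4-f67cd2323548",
]
body = [beacon(photo_id) for photo_id in photo_ids]
url = references_url("0b70b628-377b-4b4d-85c8-89b0dacd4209", "ownsPhotos")
```

The resulting body and url correspond to the --data argument and the target address of the PUT request.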
You can now validate the added data via:
$ curl http://localhost:8080/v1/things 
# or with jq
$ curl http://localhost:8080/v1/things | jq .

Query Data

Now that we have all the data in, we are getting to the juicy part of Weaviate: search. Searching is done with GraphQL. You can learn more about all the possible functions in the documentation, or you can get into the nitty-gritty details of the Weaviate GraphQL API by reading this HackerNoon article.
But for now, we are going to keep it simple.
You can use any GraphQL client you like, but to play around with the available queries, you can use the Weaviate Playground. If you go to the Playground, fill in http://localhost:8080/v1/graphql as the location and click “GraphQL Querying” in the right-hand corner.
To find the photo of the elephant you can do the following:
{
  Get{
    Things{
      Photo(
        explore: {
          concepts: ["animal"]
        }
        limit:1
      ){
        title
        photoUrl
      }
    }
  }
}
And to find the photo of Brad Pitt, you can search for:
{
  Get{
    Things{
      Photo(
        explore: {
          concepts: ["actor"]
        }
        limit:1
      ){
        title
        photoUrl
      }
    }
  }
}
The model can even relate concepts to each other; this query also finds the photo of Brad Pitt:
{
  Get{
    Things{
      Photo(
        explore: {
          concepts: ["angelina", "jolie"]
        }
        limit:1
      ){
        title
        photoUrl
      }
    }
  }
}
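The three queries above differ only in their list of concepts, so they can be generated from one template. Here is a minimal Python sketch, assuming plain-text concepts and the conventional GraphQL-over-HTTP envelope (a JSON object with a "query" key posted to the /v1/graphql endpoint). The function names explore_query and graphql_body are made up for this example.

```python
import json


def explore_query(concepts, limit=1):
    """Build the Get/Things/Photo explore query for a list of concept strings."""
    # json.dumps produces a double-quoted, escaped list such as ["a", "b"],
    # which for plain-text concepts is also a valid GraphQL list of strings.
    concept_list = json.dumps(list(concepts))
    return (
        "{ Get { Things { Photo("
        f"explore: {{ concepts: {concept_list} }} limit: {int(limit)}"
        ") { title photoUrl } } } }"
    )


def graphql_body(query):
    """Wrap a query string in a JSON envelope, encoded as bytes for an HTTP POST."""
    return json.dumps({"query": query}).encode("utf-8")


for concepts in (["animal"], ["actor"], ["angelina", "jolie"]):
    print(explore_query(concepts))
```

The generated queries are written on a single line; GraphQL ignores the whitespace differences, so they are equivalent to the formatted versions shown above.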
There are many more semantic filters that you can play around with! Check out the documentation for filters here and keep exploring!

More information about Weaviate

If this article piqued your interest, you can find some links below so that you can get started with Weaviate today!
By Bob van Luijt - Co-Founder & CEO at SeMI Technologies

Written by semi-technologies | We maintain the open source Vector Search Engine Weaviate
Published by HackerNoon on 2020/05/19