
Jina, a Deep Learning-Powered Search Framework, Can Help You Build Your Neural Search

by Alex C-G, May 2nd, 2021

Too Long; Didn't Read

Jina is an open-source, deep learning-powered search framework for building cross-modal and multi-modal search systems on the cloud. It lets you build a search engine for any kind of data, using any kind of data as the query. Jina's Streamlit component is a front end for end users, so it doesn't deal with indexing: it parses out the useful information (e.g. text or image matches) from Jina's results and displays it to the user. Streamlit offers flexibility and, because it's written in Python, data scientists can get up to speed quickly.


Do you ever think, “Darn this stupid cloud. Why can’t there be an easier way to build a neural search on it?”

Well, if you have, this article is for you. I'm going to walk through how to use Jina's new Streamlit component to build a neural search front end for text or images. Want to jump right in? Check out our text search app or image search app, and here's the component's repo.

Why use Jina to build a neural search?

Jina is an open-source, deep learning-powered search framework for building cross-modal and multi-modal search systems (e.g. text, images, video, audio) on the cloud. Essentially, it lets you build a search engine for any kind of data, using any kind of data as the query.

So you could build your own text-to-text search engine à la Google, a text-to-image search engine à la Google Images, a video-to-video search engine, and so on. Companies like Facebook, Google, and Spotify build these kinds of searches with state-of-the-art AI tools such as the DistilBERT language model and the FAISS and Annoy vector similarity search libraries.

Why use Streamlit with Jina?

I was a big fan of Streamlit before I even joined Jina. I used it on a project generating terrible Star Trek scripts, which later turned into a front end for text generation with Transformers. So I'm over the moon to be using this cool framework to build something for our users.

Building a Streamlit component helps data scientists, machine learning enthusiasts, and all the other developers in the Streamlit community build cool stuff powered by neural search. Streamlit offers flexibility and, because it's written in Python, data scientists can get up to speed quickly.

Out of the box, the streamlit-jina component supports text-to-text and image-to-image search, but since Jina can search any kind of data using any kind of data, there's plenty more to add to the component!

How does it work?

Every Jina project includes two Flows:

  • Indexing: for breaking down and extracting rich meaning from your dataset using neural network models
  • Querying: for taking a user input and finding matching results

  1. Our Streamlit component is a front end for end-users, so it doesn't worry about the indexing part.
  2. Admin spins up a Jina Docker image:
    docker run -p 45678:45678 jinahub/app.example.wikipedia-sentences-30k:0.2.9-1.0.1
  3. User enters a query into the Streamlit component (currently either a text input or an image upload) and clicks "Search".
  4. The input query is wrapped in JSON and sent to Jina's query API.
  5. The query Flow does its thing and returns results in JSON format (along with lots of metadata).
  6. The component parses out the useful information (e.g. text or image matches) and displays it to the user.
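Steps 4 and 5 boil down to an HTTP POST against the query Flow. The sketch below shows the idea with only Python's standard library. The payload layout (the `top_k`, `mode` and `data` keys) is an assumption modelled on Jina 1.x-era REST examples, and `build_search_payload` and `send_query` are hypothetical helper names, not part of streamlit-jina, so check what your own Flow expects before relying on it.

```python
import json
import urllib.request

# JSON content type, mirroring the headers dict the component uses
HEADERS = {"Content-Type": "application/json"}


def build_search_payload(query: str, top_k: int = 10) -> bytes:
    """Wrap a text query in a JSON request body.

    ASSUMPTION: these field names mirror Jina 1.x-era REST examples;
    your Flow may expect a different layout.
    """
    payload = {"top_k": top_k, "mode": "search", "data": [query]}
    return json.dumps(payload).encode("utf-8")


def send_query(endpoint: str, query: str, top_k: int = 10) -> dict:
    """POST the query to a running query Flow and return the decoded JSON reply."""
    request = urllib.request.Request(
        endpoint,
        data=build_search_payload(query, top_k),
        headers=HEADERS,
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only inspects the request body; call send_query() once a Flow is listening,
    # e.g. send_query("http://0.0.0.0:45678/api/search", "neural search", top_k=5)
    print(build_search_payload("what is a neural network?", top_k=5).decode("utf-8"))
```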

Example code

Let's look at our text search example since it's easier to see what's going on there:

import streamlit as st
from streamlit_jina import jina
st.set_page_config(page_title="Jina Text Search",)

endpoint = "http://0.0.0.0:45678/api/search"

st.title("Jina Text Search")
st.markdown("You can run our [Wikipedia search example](https://github.com/jina-ai/examples/tree/master/wikipedia-sentences) to test out this search")

jina.text_search(endpoint=endpoint)

As you can see, the above code:

  • Imports streamlit and streamlit_jina
  • Sets the REST endpoint for the search
  • Sets the page title
  • Displays some explanatory text
  • Displays the Jina text search widget with endpoint defined

You can also pass other parameters to the Jina Streamlit widgets: top_k to set the number of results you want back, and hidden to hide certain widgets.

Behind the scenes

The source code for our module is just one file, __init__.py. Let's look at the high-level functionality for our text search example for now:

Set configuration variables

import streamlit as st

# Every query is sent to Jina as JSON
headers = {
    "Content-Type": "application/json",
}

# Set default endpoint in case the user doesn't specify an endpoint
DEFAULT_ENDPOINT = "http://0.0.0.0:45678/api/search"

Render component

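class jina:
    @staticmethod
    def text_search(endpoint=DEFAULT_ENDPOINT, top_k=10, hidden=None):
        hidden = hidden or []  # avoid sharing a mutable default list between calls
        # st.container() replaces st.beta_container() from older Streamlit releases
        container = st.container()
        with container:
            # Let the user edit the endpoint unless it's hidden
            if "endpoint" not in hidden:
                endpoint = st.text_input("Endpoint", endpoint)

            query = st.text_input("Enter query")

            # top_k caps the slider; it starts at half that value (at least 1)
            if "top_k" not in hidden:
                top_k = st.slider("Results", 1, top_k, max(1, top_k // 2))

            button = st.button("Search")

            if button:
                # text.process.json() is defined elsewhere in __init__.py
                matches = text.process.json(query, top_k, endpoint)
                st.write(matches)

        return container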

In short, the jina.text_search() method:

  • Creates a Streamlit container to hold everything, using sane defaults for any parameters not specified
  • Presents any widgets that aren't set to hidden to the user
  • [User types query]
  • [User clicks button]
  • Sends the query to the Jina API and returns the results
  • Displays the results in the component
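The "returns the results" step means digging the matches out of Jina's verbose JSON reply. The sketch below assumes a reply laid out as `{"search": {"docs": [{"matches": [{"text": ...}, ...]}]}}`, modelled on Jina 1.x-era REST output; `extract_text_matches` is a hypothetical helper rather than the component's own code, and the sample reply is made up for illustration.

```python
from typing import Any, Dict, List


def extract_text_matches(response: Dict[str, Any]) -> List[str]:
    """Pull the matched sentences out of a query Flow's JSON reply.

    ASSUMPTION: matches live under response["search"]["docs"][0]["matches"],
    each with a "text" field. Adjust the keys to match your Flow's output.
    """
    docs = response.get("search", {}).get("docs", [])
    if not docs:
        return []  # no query document came back, so there is nothing to show
    return [match.get("text", "") for match in docs[0].get("matches", [])]


# Made-up reply trimmed to the fields we read; real replies carry much more metadata
sample_response = {
    "search": {
        "docs": [
            {
                "text": "neural search",
                "matches": [
                    {"text": "Neural search uses deep learning.", "score": {"value": 0.91}},
                    {"text": "Search engines index documents.", "score": {"value": 0.72}},
                ],
            }
        ]
    }
}

for sentence in extract_text_matches(sample_response):
    print(sentence)
```

Keeping the parsing in its own function means a list of plain strings is all that reaches st.write(), which is what the component displays.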

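Our method's parameters are:

  • endpoint: the Jina REST endpoint that queries are sent to (defaults to DEFAULT_ENDPOINT)
  • top_k: the maximum number of results the user can request with the slider (defaults to 10)
  • hidden: a list of widgets to hide from the user, e.g. "endpoint" or "top_k"

jina.text_search() calls upon several other methods, all of which you can find in __init__.py. For image search there are some additional ones: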

  • image.encode.img_base64() encodes a query image to base64 and wraps it in JSON before passing it to the Jina API
  • Jina's API returns matches in base64 format. The image.render.html() method wraps these in <IMG> tags so they'll display nicely
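Both image helpers can be sketched with Python's standard library. This is not the component's implementation: `encode_image_query` and `render_matches_html` are hypothetical names, and sending images as base64 data URIs inside a `data` list (with the same `top_k`/`mode` keys) is an assumption about what the query Flow accepts.

```python
import base64
import json
from typing import List


def encode_image_query(image_bytes: bytes, mime_type: str = "image/png", top_k: int = 10) -> str:
    """Base64-encode an uploaded image and wrap it in a JSON search request.

    ASSUMPTION: the query Flow accepts images as data URIs inside "data";
    this payload layout is illustrative, not the component's exact format.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    data_uri = f"data:{mime_type};base64,{encoded}"
    return json.dumps({"top_k": top_k, "mode": "search", "data": [data_uri]})


def render_matches_html(matches: List[str]) -> str:
    """Wrap base64 data URIs returned by the Flow in <img> tags for display."""
    return "".join(f'<img src="{uri}">' for uri in matches)


# The 8-byte PNG signature stands in for a real uploaded image
fake_png = b"\x89PNG\r\n\x1a\n"
request_body = encode_image_query(fake_png, top_k=3)
print(request_body)
print(render_matches_html(["data:image/png;base64,AAAA"]))
```

In a Streamlit app, one way to show the resulting HTML string is st.markdown(page_html, unsafe_allow_html=True).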

Use it in your project

In your terminal:

Create a new folder with a virtual environment and activate it. This will prevent conflicts between your system libraries and your individual project libraries:

mkdir my_project
cd my_project
virtualenv env
source env/bin/activate

Install the Streamlit and Streamlit-Jina packages:

pip install streamlit streamlit-jina

Index your data in Jina and start a query Flow. Alternatively, use a pre-indexed Docker image:

docker run -p 45678:45678 jinahub/app.example.wikipedia-sentences-30k:0.2.9-1.0.1

Create your app.py:

import streamlit as st
from streamlit_jina import jina
st.set_page_config(page_title="Jina Text Search",)

endpoint = "http://0.0.0.0:45678/api/search" # This is Jina's default endpoint. If your Flow uses something different, switch it out

st.title("Jina Text Search")

jina.text_search(endpoint=endpoint)

Run Streamlit:

streamlit run app.py

And there you have it – your very own text search!

For image search, simply swap out the text code above for our image example code and run a Jina image search Docker image (like our Pokemon example).

What to do next

Thanks for reading the article! I'm looking forward to hearing what you think about the component. If you want to learn more about Jina and Streamlit, here are some helpful resources:

A big thank you!

Major thanks to Randy Zwitch, TC Ricks and Amanda Kelly for their help getting our component live. And thanks to all my colleagues at Jina for building the backend that makes this happen!

Previously published at https://blog.streamlit.io/streamlit-jina-neural-search/