Do you ever think, “Darn this stupid cloud. Why can’t there be an easier way to build a neural search on it?”
Well, if you have, this article is for you. I’m going to walk through how to use Jina's new Streamlit component to search text or images to build a neural search front end. Want to jump right in? Check out our text search app or image search app, and here's the component's repo.
Jina is an open-source deep learning-powered search framework for building cross-/multi-modal search systems (e.g. text, images, video, audio) on the cloud. Essentially, it lets you build a search engine for any kind of data with any kind of data.
So you could build your own text-to-text search engine ala Google, a text-to-image search engine ala Google Images, a video-to-video search engine, and so on. Companies like Facebook, Google, and Spotify build these searches powered by state-of-the-art AI-powered models like FAISS, DistilBERT, and Annoy.
I was a big fan of Streamlit before I even joined Jina. I used it on a project to create terrible Star Trek scripts that later turned into a front-end for text generation with Transformers. So I'm over the moon to be using this cool framework to build something for our users.
Building a Streamlit component helps the data scientists, machine learning enthusiasts, and all the other developers in the Streamlit community build cool stuff powered by neural search. It offers flexibility and, being written in Python, it can be easier for data scientists to get up to speed.
Out of the box, the streamlit-jina component has text-to-text and image-to-image search, but Jina offers a rich search experience for any kind of data with any kind of data so there's plenty more to add to the component!
Every Jina project includes two Flows:
Indexing: for breaking down and extracting rich meaning from your dataset using neural network models
Querying: for taking a user input and finding matching results
docker run -p 45678:45678 jinahub/app.example.wikipedia-sentences-30k:0.2.9-1.0.1
Example code
Let's look at our text search example since it's easier to see what's going on there:
import streamlit as st
from streamlit_jina import jina
st.set_page_config(page_title="Jina Text Search",)
endpoint = "http://0.0.0.0:45678/api/search"
st.title("Jina Text Search")
st.markdown("You can run our [Wikipedia search example](https://github.com/jina-ai/examples/tree/master/wikipedia-sentences) to test out this search")
jina.text_search(endpoint=endpoint)
As you can see, the above code:
For the Jina Streamlit widgets, you can also pass in other parameters to define the number of results you want back or if you want to hide certain widgets.
The source code for our module is just one file,
__init__.py
. Let's just look at the high-level functionality for our text search example for now:Set configuration variables
headers = {
"Content-Type": "application/json",
}
# Set default endpoint in case user doesn't specify and endpoint
DEFAULT_ENDPOINT = "http://0.0.0.0:45678/api/search"
Render component
class jina:
def text_search(endpoint=DEFAULT_ENDPOINT, top_k=10, hidden=[]):
container = st.beta_container()
with container:
if "endpoint" not in hidden:
endpoint = st.text_input("Endpoint", endpoint)
query = st.text_input("Enter query")
if "top_k" not in hidden:
top_k = st.slider("Results", 1, top_k, int(top_k / 2))
button = st.button("Search")
if button:
matches = text.process.json(query, top_k, endpoint)
st.write(matches)
return container
In short, the
jina.text_search()
method:Our method's parameters are:
jina.text_search()
calls upon several other methods, all of which can find in __init__.py
. For image search there are some additional ones:image.encode.img_base64()
encodes a query image to base64 and wraps it in JSON before passing to Jina APIimage.render.html()
method wraps these in <IMG>
tags so they'll display nicelyIn your terminal:
Create a new folder with a virtual environment and activate it. This will prevent conflicts between your system libraries and your individual project libraries:
mkdir my_project
virtualenv env
source env/bin/activate
Install the Streamlit and Streamlit-Jina packages:
pip install streamlit streamlit-jina
Index your data in Jina and start a query Flow. Alternatively, use a pre-indexed Docker image:
docker run -p 45678:45678 jinahub/app.example.wikipedia-sentences-30k:0.2.9-1.0.1
Create your app.py:
import streamlit as st
from streamlit_jina import jina
st.set_page_config(page_title="Jina Text Search",)
endpoint = "http://0.0.0.0:45678/api/search" # This is Jina's default endpoint. If your Flow uses something different, switch it out
st.title("Jina Text Search")
jina.text_search(endpoint=endpoint)
Run Streamlit:
streamlit run app.py
And there you have it – your very own text search!
For image search, simply swap out the text code above for our image example code and run a Jina image (like our Pokemon example.)
Thanks for reading the article and looking forward to hearing what you think about the component! If you want to learn more about Jina and Streamlit here are some helpful resources:
Major thanks to Randy Zwitch, TC Ricks and Amanda Kelly for their help getting our component live. And thanks to all my colleagues at Jina for building the backend that makes this happen!
Previously published at https://blog.streamlit.io/streamlit-jina-neural-search/