CocoIndex now officially supports Qdrant! This integration combines a high-performance Rust 🦀 stack with real-time ETL into a vector store.

CocoIndex is an open-source ETL framework that turns data AI-ready, with real-time incremental processing for performance and low latency on source updates. https://github.com/cocoindex-io/cocoindex/

Qdrant is a leading open-source vector database designed to handle high-dimensional vectors for performance and massive-scale AI applications. https://github.com/qdrant/qdrant

Exporting data to a Qdrant collection is simple. The Qdrant spec takes the following fields:

collection_name (type: str, required): The name of the collection to export the data to.
grpc_url (type: str, optional): The gRPC URL of the Qdrant instance. Defaults to http://localhost:6334/.
api_key (type: str, optional): An API key to authenticate requests with.

Before exporting, you must create a collection whose vector name matches the vector field name in CocoIndex, and set setup_by_user=True during export. For example:

```python
# doc_embeddings is a collector created inside a flow definition.
doc_embeddings.export(
    "doc_embeddings",
    cocoindex.storages.Qdrant(
        collection_name="cocoindex",
        grpc_url="https://xyz-example.cloud-region.cloud-provider.cloud.qdrant.io:6334/",
        api_key="<your-api-key-here>",
    ),
    primary_key_fields=["id_field"],
    # Tells CocoIndex the collection is set up by you, so it does not create it.
    setup_by_user=True,
)
```
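Why must the names match? A useful mental model, which is our simplification rather than a description of CocoIndex's internals, is that each collected row becomes one Qdrant point: the primary-key field becomes the point ID, a field whose name matches a named vector in the collection is stored as that vector, and the remaining fields are kept as payload. The hypothetical row_to_point helper below sketches that mapping in plain Python; it is not part of CocoIndex or the Qdrant client, and all field values are toy data.

```python
import uuid
from typing import Any


def row_to_point(
    row: dict[str, Any], primary_key: str, vector_names: set[str]
) -> dict[str, Any]:
    """Hypothetical helper: split one collected row into a Qdrant-style point.

    Assumed mapping: primary key -> point id, fields named after the
    collection's named vectors -> vectors, all other fields -> payload.
    """
    # A collection vector with no matching row field is the name mismatch to avoid.
    missing = vector_names - set(row)
    if missing:
        raise KeyError(f"no field for Qdrant vector(s): {sorted(missing)}")
    return {
        "id": row[primary_key],
        "vector": {name: row[name] for name in vector_names},
        "payload": {
            key: value
            for key, value in row.items()
            if key != primary_key and key not in vector_names
        },
    }


# Toy row shaped like the rows collected in the example flow.
row = {
    "id": str(uuid.uuid4()),  # stands in for cocoindex.GeneratedField.UUID
    "filename": "intro.md",
    "location": "0-120",
    "text": "CocoIndex exports to Qdrant.",
    "text_embedding": [0.1, 0.2, 0.3],  # toy 3-dimensional vector
}
point = row_to_point(row, primary_key="id", vector_names={"text_embedding"})
print(sorted(point["payload"]))  # ['filename', 'location', 'text']
```

If the collection's vector were named, say, embedding while the flow collects text_embedding, the helper raises KeyError, which mirrors why the two names have to agree.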
🚀 Getting started (with example code!) in fewer than 50 lines of Python: https://github.com/cocoindex-io/cocoindex/tree/main/examples/text_embedding_qdrant

```python
import cocoindex

# text_to_embedding is not shown here: in the full example linked above it is a
# helper that turns a text chunk into an embedding vector. Define it before this flow.


@cocoindex.flow_def(name="TextEmbedding")
def text_embedding_flow(
    flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope
):
    """
    Define an example flow that embeds text into a vector database.
    """
    # Read the files under ./markdown_files as source documents.
    data_scope["documents"] = flow_builder.add_source(
        cocoindex.sources.LocalFile(path="markdown_files")
    )

    # Collector that accumulates one row per chunk for export.
    doc_embeddings = data_scope.add_collector()

    with data_scope["documents"].row() as doc:
        # Split each Markdown document into overlapping chunks.
        doc["chunks"] = doc["content"].transform(
            cocoindex.functions.SplitRecursively(),
            language="markdown",
            chunk_size=2000,
            chunk_overlap=500,
        )

        with doc["chunks"].row() as chunk:
            chunk["embedding"] = text_to_embedding(chunk["text"])
            doc_embeddings.collect(
                id=cocoindex.GeneratedField.UUID,
                filename=doc["filename"],
                location=chunk["location"],
                text=chunk["text"],
                # 'text_embedding' is the name of the vector we've created the Qdrant collection with.
                text_embedding=chunk["embedding"],
            )

    # Export the collected rows to the pre-created "cocoindex" collection.
    doc_embeddings.export(
        "doc_embeddings",
        cocoindex.storages.Qdrant(
            collection_name="cocoindex", grpc_url="http://localhost:6334/"
        ),
        primary_key_fields=["id"],
        setup_by_user=True,
    )
```
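The SplitRecursively step above is configured with chunk_size=2000 and chunk_overlap=500. To make those two numbers concrete, here is a deliberately simplified sketch: the hypothetical sliding_chunks function cuts text into fixed-size character windows in which neighboring chunks share chunk_overlap characters. This is a swapped-in technique, not how SplitRecursively works: the real function splits recursively along boundaries suited to the given language, so its chunks vary in length, and the unit it measures sizes in is worth checking in the CocoIndex documentation.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Hypothetical fixed-window chunker illustrating chunk_size and chunk_overlap.

    Each chunk is at most chunk_size characters, and each chunk after the
    first starts chunk_overlap characters before the previous chunk ended.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    start = 0
    while True:
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):  # this window reached the end
            break
        start += step
    return chunks


doc = "".join(str(i % 10) for i in range(5000))  # toy 5,000-character document
chunks = sliding_chunks(doc, chunk_size=2000, chunk_overlap=500)
print(len(chunks), [len(c) for c in chunks])  # 3 [2000, 2000, 2000]
```

With a 1,500-character stride, the three windows start at 0, 1,500 and 3,000, and each adjacent pair shares 500 characters. That overlap keeps text that straddles a chunk boundary intact in at least one chunk, as long as it is shorter than the overlap.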

We are constantly improving and adding new examples and blogs. Please drop a star at our GitHub repo https://github.com/cocoindex-io/cocoindex for the latest updates!

This story contains new, firsthand information uncovered by the writer.

The code in this story is for educational purposes. The readers are solely responsible for whatever they build with it.
