CocoIndex now provides native support for Kuzu as a target graph data store. Together, they form a high-performance knowledge graph stack with real-time updates.

What is Kuzu

Kuzu is a graph database designed to be fast, scalable, and easy to use. We love Kuzu because it is highly performant, lightweight, and open source.

CocoIndex is an ultra-performant, real-time data transformation framework. With its dataflow programming model, CocoIndex simplifies building and maintaining knowledge graphs whose sources are continuously updated. You can read the official CocoIndex documentation for Property Graph Targets here.

We understand that preparing data is highly use-case specific, and there is no one-size-fits-all solution. We take a composable approach: instead of building everything ourselves, we provide native plugins that embrace the ecosystem and, by standardizing the interface, make it easy to plug in and swap any module, exactly like LEGO. If you are using CocoIndex to build your knowledge graph, you can use Kuzu as the target graph data store.

How to map to Kuzu in CocoIndex

The GraphDB interface in CocoIndex is standardized: if you are already using Neo4j, you only need to switch the configuration to export to Kuzu, as shown below.

CocoIndex exports to Kuzu through the Kuzu API server. You can bring up a Kuzu API server locally by running:

```shell
# Database files persist on the host in KUZU_DB_DIR
KUZU_DB_DIR=$HOME/.kuzudb
# Host port 8123 maps to the API server's port 8000 inside the container
KUZU_PORT=8123
docker run -d --name kuzu -p ${KUZU_PORT}:8000 -v ${KUZU_DB_DIR}:/database kuzudb/api-server:latest
```
Then, in your CocoIndex flow, add the Kuzu connection spec:

```python
import cocoindex

# Register the Kuzu connection; the URL uses the host port mapped above (KUZU_PORT=8123)
kuzu_conn_spec = cocoindex.add_auth_entry(
    "KuzuConnection",
    cocoindex.storages.KuzuConnection(
        api_server_url="http://localhost:8123",
    ),
)
```
What does it look like to build an indexing flow with CocoIndex + Kuzu

The CocoIndex knowledge graph example that has received the most love builds a knowledge graph with an LLM, and there is a detailed step-by-step blog about it. In that project, we process a list of documents and use an LLM to extract relationships between the concepts in each document.

We will generate two kinds of relationships from the documents:

- Relationships between subjects and objects. E.g., "CocoIndex supports Incremental Processing".
- Mentions of entities in a document. E.g., "core/basics.mdx" mentions CocoIndex and Incremental Processing.

The code is available here. The indexing flow for Kuzu looks like this:

- Ingest the documents into CocoIndex.
- Process the documents. For each document:
  - Map document nodes: use an LLM to generate a summary, and map the documents to graph nodes in Kuzu.
  - Map relationship nodes: use an LLM to extract relationships, and export the relationships to Kuzu.
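To make the two kinds of relationships listed above concrete, here is a plain-Python sketch of the data a single document could contribute. It does not use CocoIndex. The sample triple, the `doc_graph_rows` helper, and the assumption that a document "mentions" every entity appearing as a subject or object of its extracted relationships are illustrative choices, not the example project's code.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One subject-predicate-object relationship extracted from a document."""

    subject: str
    predicate: str
    object: str


def doc_graph_rows(filename: str, triples: list[Triple]):
    """Split one document's extraction into the two kinds of graph data.

    Hypothetical helper. Assumption: a document mentions every entity that
    appears as the subject or object of one of its relationships.
    """
    relationships = [(t.subject, t.predicate, t.object) for t in triples]
    mentioned: list[str] = []
    for t in triples:
        for entity in (t.subject, t.object):
            if entity not in mentioned:  # keep first-seen order, no duplicates
                mentioned.append(entity)
    mentions = [(filename, entity) for entity in mentioned]
    return relationships, mentions


rels, mentions = doc_graph_rows(
    "core/basics.mdx",
    [Triple("CocoIndex", "supports", "Incremental Processing")],
)
print(rels)      # [('CocoIndex', 'supports', 'Incremental Processing')]
print(mentions)  # [('core/basics.mdx', 'CocoIndex'), ('core/basics.mdx', 'Incremental Processing')]
```

Deduplicating mentions per document keeps one "mentions" edge per entity even when the entity shows up in several relationships.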
Notably, it takes only about 200 lines of Python to build a production-ready knowledge graph, including class definitions, prompts, and configs.

To see how relationship extraction works, start with the Python class used for structured extraction:

```python
import dataclasses


# Output schema for LLM extraction: one instance per extracted relationship
@dataclasses.dataclass
class Relationship:
    """
    Describe a relationship between two entities.
    Subject and object should be Core CocoIndex concepts only, should be nouns. For example, `CocoIndex`, `Incremental Processing`, `ETL`, `Data` etc.
    """

    subject: str
    predicate: str
    object: str
```
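With `output_type=list[Relationship]`, the model's answer has to end up as typed objects. CocoIndex handles that step for you; the sketch below only illustrates the idea in plain Python, assuming the model returns a JSON array of objects with `subject`, `predicate`, and `object` keys. The `parse_relationships` helper and the sample JSON are hypothetical.

```python
import dataclasses
import json


@dataclasses.dataclass
class Relationship:
    """Describe a relationship between two entities."""

    subject: str
    predicate: str
    object: str


def parse_relationships(raw: str) -> list[Relationship]:
    """Convert a JSON array (hypothetical LLM output) into Relationship objects.

    Only the dataclass's declared fields are kept, and each must be a
    non-empty string; anything else raises ValueError.
    """
    field_names = [f.name for f in dataclasses.fields(Relationship)]
    result = []
    for item in json.loads(raw):
        values = {name: item.get(name) for name in field_names}
        if not all(isinstance(v, str) and v.strip() for v in values.values()):
            raise ValueError(f"incomplete relationship: {item!r}")
        result.append(Relationship(**{k: v.strip() for k, v in values.items()}))
    return result


sample = '[{"subject": "CocoIndex", "predicate": "supports", "object": "Incremental Processing"}]'
print(parse_relationships(sample))
# [Relationship(subject='CocoIndex', predicate='supports', object='Incremental Processing')]
```

Driving validation from `dataclasses.fields` keeps the schema in one place: adding a field to the class automatically makes it required in the parsed output.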
If you have a predefined ontology, you can skip entity extraction and use your existing entities instead.

Call a transformation in the flow to extract the relationships from each document:

```python
with data_scope["documents"].row() as doc:
    # Extract relationships from the document with an LLM
    doc["relationships"] = doc["content"].transform(
        cocoindex.functions.ExtractByLlm(
            llm_spec=cocoindex.LlmSpec(
                # Supported LLM: https://cocoindex.io/docs/ai/llm
                api_type=cocoindex.LlmApiType.OPENAI,
                model="gpt-4o",
            ),
            # The LLM output is parsed into a list of Relationship instances
            output_type=list[Relationship],
            instruction=(
                "Please extract relationships from CocoIndex documents. "
                "Focus on concepts and ignore examples and code. "
            ),
        )
    )
```
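The LLM call is the expensive step in this flow, which is why incremental processing matters. The sketch below is not CocoIndex's implementation; it is a minimal plain-Python illustration of the general idea, assuming results are cached under a hash of the extraction settings plus the document content, so an unchanged document is never sent to the model twice. `CachedExtractor` and `fake_llm` are hypothetical.

```python
import hashlib
from typing import Callable


class CachedExtractor:
    """Memoize an expensive extraction function by a (settings, content) hash."""

    def __init__(self, extract: Callable[[str], list], settings: str):
        self.extract = extract
        self.settings = settings  # e.g. model name + instruction text
        self.cache: dict[str, list] = {}
        self.calls = 0  # how many times the underlying function actually ran

    def _key(self, content: str) -> str:
        # Changing either the settings or the content yields a new key
        data = f"{self.settings}\x00{content}".encode("utf-8")
        return hashlib.sha256(data).hexdigest()

    def __call__(self, content: str) -> list:
        key = self._key(content)
        if key not in self.cache:
            self.calls += 1
            self.cache[key] = self.extract(content)
        return self.cache[key]


def fake_llm(content: str) -> list:
    # Stand-in for a real LLM call: one "relationship" per non-empty line
    return [line for line in content.splitlines() if line.strip()]


extractor = CachedExtractor(fake_llm, settings="gpt-4o|Please extract relationships")
extractor("CocoIndex supports Incremental Processing")
extractor("CocoIndex supports Incremental Processing")  # cache hit, no new call
extractor("CocoIndex exports to Kuzu")                  # new content, new call
print(extractor.calls)  # 2
```

Because the cache key covers only the content and the extraction settings, not the export target, switching from Neo4j to Kuzu in this toy model would hit the cache for every document.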
You can use CocoInsight to verify each extracted relationship, and then collect the relationships with the entity_relationship collector:

```python
# Runs inside the `with data_scope["documents"].row() as doc:` block above;
# entity_relationship is a collector created earlier in the flow (e.g. data_scope.add_collector())
with doc["relationships"].row() as relationship:
    # Relationship between two entities
    entity_relationship.collect(
        id=cocoindex.GeneratedField.UUID,
        subject=relationship["subject"],
        object=relationship["object"],
        predicate=relationship["predicate"],
    )
```
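Note that `collect` only declares which rows should exist; the flow never says how to write them to the target. As a conceptual sketch, and explicitly not CocoIndex's internal implementation, a framework can work out the writes by diffing the rows exported on the previous run against the rows collected now, keyed by primary key. The `plan_changes` helper and the sample rows are hypothetical.

```python
def plan_changes(previous: dict[str, dict], current: dict[str, dict]):
    """Diff two {primary_key: row} snapshots into creates, updates, and deletes."""
    creates = {k: row for k, row in current.items() if k not in previous}
    updates = {
        k: row
        for k, row in current.items()
        if k in previous and previous[k] != row
    }
    deletes = sorted(k for k in previous if k not in current)
    return creates, updates, deletes


previous = {
    "r1": {"subject": "CocoIndex", "predicate": "supports", "object": "Incremental Processing"},
    "r2": {"subject": "CocoIndex", "predicate": "is", "object": "ETL framework"},
    "r3": {"subject": "Kuzu", "predicate": "stores", "object": "Data"},
}
current = {
    "r1": {"subject": "CocoIndex", "predicate": "supports", "object": "Incremental Processing"},
    "r2": {"subject": "CocoIndex", "predicate": "is", "object": "Data transformation framework"},
    "r4": {"subject": "Kuzu", "predicate": "is", "object": "Graph database"},
}
creates, updates, deletes = plan_changes(previous, current)
print(sorted(creates), sorted(updates), deletes)  # ['r4'] ['r2'] ['r3']
```

Row `r1` is unchanged and produces no write; only the differences reach the target.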
CocoIndex follows a dataflow programming model. Rather than defining data operations such as creations, updates, or deletions, developers only need to focus on transformations or formulas based on source data. The framework takes care of the data operations, such as when to create, update, or delete.

Once you have collected the relationships, you can map them directly to Kuzu:

```python
# Export the collected relationships to Kuzu as Entity -[RELATIONSHIP]-> Entity
entity_relationship.export(
    "entity_relationship",
    cocoindex.storages.Kuzu(
        connection=kuzu_conn_spec,  # registered with add_auth_entry above
        mapping=cocoindex.storages.Relationships(
            rel_type="RELATIONSHIP",
            # `subject` becomes the `value` field of the source Entity node
            source=cocoindex.storages.NodeFromFields(
                label="Entity",
                fields=[
                    cocoindex.storages.TargetFieldMapping(
                        source="subject", target="value"
                    ),
                ],
            ),
            # `object` becomes the `value` field of the target Entity node
            target=cocoindex.storages.NodeFromFields(
                label="Entity",
                fields=[
                    cocoindex.storages.TargetFieldMapping(
                        source="object", target="value"
                    ),
                ],
            ),
        ),
    ),
    primary_key_fields=["id"],
)
```
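To see the shape of what the export above produces, here is a plain-Python sketch: each collected row yields a source and a target `Entity` node whose `value` comes from `subject` and `object`, plus a `RELATIONSHIP` edge between them. Two assumptions, not confirmed by this article: nodes with the same `value` are merged into one node, and the fields not used for nodes (`id`, `predicate`) travel on the edge. `to_graph` is a hypothetical helper, not a CocoIndex API.

```python
def to_graph(rows: list[dict]):
    """Map collected relationship rows to Entity nodes and RELATIONSHIP edges.

    Hypothetical helper. Assumes Entity nodes are identified by `value`, so a
    subject or object that appears in several rows becomes a single node.
    """
    nodes: dict[str, dict] = {}  # value -> node properties
    edges: list[dict] = []
    for row in rows:
        for value in (row["subject"], row["object"]):
            nodes.setdefault(value, {"label": "Entity", "value": value})
        edges.append({
            "rel_type": "RELATIONSHIP",
            "id": row["id"],                # primary key of the relationship
            "source": row["subject"],       # subject -> source Entity.value
            "target": row["object"],        # object -> target Entity.value
            "predicate": row["predicate"],  # remaining field kept on the edge
        })
    return nodes, edges


rows = [
    {"id": "1", "subject": "CocoIndex", "predicate": "supports", "object": "Incremental Processing"},
    {"id": "2", "subject": "CocoIndex", "predicate": "exports to", "object": "Kuzu"},
]
nodes, edges = to_graph(rows)
print(sorted(nodes))  # ['CocoIndex', 'Incremental Processing', 'Kuzu']
print(len(edges))     # 2
```

"CocoIndex" appears in both rows but yields a single node, which is what lets the two relationships meet in the graph.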
Amazingly, while working on this Kuzu example, I had a flow that I had previously run locally with Neo4j, and exporting it to Kuzu was instant. CocoIndex is built on incremental processing, so if you have already run a flow and only switch targets, the intermediate transformation results can be reused.

To run Kuzu Explorer, an open-source UI for Kuzu, first bring down the Kuzu API server. Then start Kuzu Explorer with the following command:

```shell
# Bring down the Kuzu API server started earlier (container name: kuzu)
docker stop kuzu

# Reuses KUZU_DB_DIR from the API server step; MODE=READ_ONLY starts the explorer in read-only mode
KUZU_EXPLORER_PORT=8124
docker run -d --name kuzu-explorer -p ${KUZU_EXPLORER_PORT}:8000 -v ${KUZU_DB_DIR}:/database -e MODE=READ_ONLY kuzudb/explorer:latest
```
You can then access the explorer at http://localhost:8124 and run a Cypher query to explore the graph:

```cypher
MATCH p=()-->() RETURN p
```

We are constantly improving, and more features and examples are coming soon. If this article is helpful, please drop us a star ⭐ on GitHub to help us grow.

Thanks for reading!
