Most companies sit on an ocean of meeting notes and treat them like static text files. But inside those documents are decisions, tasks, owners, and relationships: essentially an untapped knowledge graph that is constantly changing. We just published a full walkthrough showing how to turn meeting notes in Drive into a live-updating Neo4j knowledge graph using CocoIndex + LLM extraction. If this resonates, star https://github.com/cocoindex-io/cocoindex

Meeting notes are goldmines of organizational intelligence. They capture decisions, action items, participant information, and the relationships between people and tasks. Yet most organizations treat them as static documents, searchable only through basic text search. Imagine instead being able to query your meetings like a database: "Who attended meetings where the topic was 'budget planning'?" or "What tasks did Sarah get assigned across all meetings?"

This is where knowledge graphs shine. By extracting structured information from unstructured meeting notes and building a graph representation, you unlock powerful relationship-based queries and insights that would be impossible with traditional document storage.

In this post, we'll explore a practical CocoIndex example that demonstrates exactly this: building a knowledge graph of meetings from Markdown documents stored in Google Drive, powered by LLM-based extraction, and persisted in Neo4j. The source code is open sourced and available at Meeting Notes Graph Code.

The Problem: Unstructured Meeting Data at Enterprise Scale

Even by a conservative estimate, 80% of enterprise data resides in unstructured files, stored in data lakes that accommodate heterogeneous formats. Organizations hold 62-80 million meetings per day in the US.
At enterprise scale, document processing complexity explodes beyond the simple "write once, read many" model. Meeting note management involves three brutal challenges:

- Scale problem: Enterprise meeting corpora span tens of thousands to millions of documents distributed across departments, teams, and fiscal years. Processing this volume naively means re-running expensive LLM calls and graph mutations on unchanged data.
- Mutation problem: Meeting notes are mutable state, not immutable logs. Attendees fix typos in names, tasks get reassigned between people, decisions get revised as context changes. A single 10-person team can generate dozens of document edits weekly.
- Fragmentation problem: Meeting data doesn't live in one canonical source. It's scattered across Google Drive, Notion, SharePoint, Confluence, email threads, and Slack canvases. Building a unified graph requires incremental sync from heterogeneous sources without full re-ingestion.

The math is punishing: in a 5,000-person enterprise with a conservative 1% daily document churn, you're looking at 50+ documents changing per day, 1,500+ per month.
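To make the churn math concrete, here's a quick back-of-the-envelope calculation, assuming roughly one active meeting document per employee (an assumption for illustration, not a figure from the example):

```python
# Back-of-the-envelope churn estimate for the 5,000-person scenario above.
employees = 5_000
docs = employees           # assume ~1 active meeting document per employee
daily_churn_rate = 0.01    # conservative: 1% of documents edited per day

docs_changed_per_day = round(docs * daily_churn_rate)
docs_changed_per_month = docs_changed_per_day * 30

print(docs_changed_per_day)    # 50
print(docs_changed_per_month)  # 1500
```

Every one of those changed documents would otherwise mean a fresh LLM extraction and a round of graph writes.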
Without incremental processing, you either:

- Burn LLM budget re-extracting unchanged content (unsustainable cost)
- Accept stale graphs that lag reality by days or weeks (unacceptable for operational use)

Architecture Overview

The pipeline follows a clear data flow with incremental processing built in at every stage:

Google Drive (documents, with change tracking)
→ Identify changed documents
→ Split into meetings
→ Extract structured data with LLM (only for changed documents)
→ Collect nodes and relationships
→ Export to Neo4j (with upsert logic)

Prerequisites

- Install Neo4j and start it locally
  - Default local browser: http://localhost:7474
  - Default credentials used in this example: username neo4j, password cocoindex
- Configure your OpenAI API key
- Prepare Google Drive:
  - Create a Google Cloud service account and download its JSON credential
  - Share the source folders with the service account email
  - Collect the root folder IDs you want to ingest
  - See Setup for Google Drive for details
Environment

Set the following environment variables:

```shell
export OPENAI_API_KEY=sk-...
export GOOGLE_SERVICE_ACCOUNT_CREDENTIAL=/absolute/path/to/service_account.json
export GOOGLE_DRIVE_ROOT_FOLDER_IDS=folderId1,folderId2
```

Notes:

- GOOGLE_DRIVE_ROOT_FOLDER_IDS accepts a comma-separated list of folder IDs
- The flow polls recent changes and refreshes periodically

Let's break down each component.

Flow Definition Overview

Add source and collector

```python
@cocoindex.flow_def(name="MeetingNotesGraph")
def meeting_notes_graph_flow(
    flow_builder: cocoindex.FlowBuilder, data_scope: cocoindex.DataScope
) -> None:
    """
    Define an example flow that extracts triples from files and builds a knowledge graph.
    """
    credential_path = os.environ["GOOGLE_SERVICE_ACCOUNT_CREDENTIAL"]
    root_folder_ids = os.environ["GOOGLE_DRIVE_ROOT_FOLDER_IDS"].split(",")

    data_scope["documents"] = flow_builder.add_source(
        cocoindex.sources.GoogleDrive(
            service_account_credential_path=credential_path,
            root_folder_ids=root_folder_ids,
            recent_changes_poll_interval=datetime.timedelta(seconds=10),
        ),
        refresh_interval=datetime.timedelta(minutes=1),
    )
```

The pipeline starts by connecting to Google Drive using a service account. CocoIndex's built-in source connector handles authentication and provides incremental change detection. The recent_changes_poll_interval parameter means the source checks for new or modified files every 10 seconds, while refresh_interval determines how often the entire flow re-runs (every minute).

This is one of CocoIndex's superpowers: incremental processing with automatic change tracking.
Instead of reprocessing all documents on every run, the framework:

- Lists files from Google Drive with their last modified time
- Identifies only the files that have been added or modified since the last successful run
- Skips unchanged files entirely
- Passes only changed documents downstream

The result? In an enterprise with 1% daily churn, only 1% of documents trigger downstream processing. Unchanged files never hit your LLM API, never generate Neo4j queries, and never consume compute resources.

Add collector

```python
meeting_nodes = data_scope.add_collector()
attended_rels = data_scope.add_collector()
decided_tasks_rels = data_scope.add_collector()
assigned_rels = data_scope.add_collector()
```

The pipeline then collects data into specialized collectors for different entity types and relationships:

- Meeting Nodes: store the meeting itself with its date and notes
- Attendance Relationships: capture who attended meetings and whether they were the organizer
- Task Decision Relationships: link meetings to decisions (tasks that were decided upon)
- Task Assignment Relationships: assign specific tasks to people

Process each document
Extract meetings

```python
with data_scope["documents"].row() as document:
    document["meetings"] = document["content"].transform(
        cocoindex.functions.SplitBySeparators(
            separators_regex=[r"\n\n##?\ "], keep_separator="RIGHT"
        )
    )
```

Meeting documents often contain multiple meetings in a single file. This step splits documents on Markdown headers (## or #) preceded by a blank line, treating each section as a separate meeting. keep_separator="RIGHT" means the separator (the header) is kept with the segment to its right, preserving context.

Extract meeting

Define Meeting schema

```python
@dataclass
class Person:
    name: str

@dataclass
class Task:
    description: str
    assigned_to: list[Person]

@dataclass
class Meeting:
    time: datetime.date
    note: str
    organizer: Person
    participants: list[Person]
    tasks: list[Task]
```

The LLM uses the schema of these dataclasses as its "extraction template," automatically returning structured data that matches the Python types. This gives the LLM direct guidance on what information to extract and in what shape. It is far more reliable than asking an LLM for free-form output, from which we could not recover the structured information needed to build a knowledge graph.
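Returning to the SplitBySeparators step above: the effect of its regex can be approximated in plain Python with re.split and a lookahead, so each header stays attached to the segment on its right. This is only a loose stand-in for keep_separator="RIGHT" (the blank line before the header is dropped here rather than kept):

```python
import re

notes = (
    "# Weekly Sync\nDiscussed roadmap.\n\n"
    "## Budget Planning\nAgreed on Q3 numbers.\n\n"
    "## Hiring\nTwo offers out."
)

# Split before "# " or "## " headers that follow a blank line; the lookahead
# keeps each header with the segment to its right.
meetings = re.split(r"\n\n(?=##?\ )", notes)
for m in meetings:
    print(m.splitlines()[0])
# → "# Weekly Sync", "## Budget Planning", "## Hiring"
```

Each resulting segment then flows through the LLM extraction step as an independent meeting.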
Extract and collect relationships

```python
with document["meetings"].row() as meeting:
    parsed = meeting["parsed"] = meeting["text"].transform(
        cocoindex.functions.ExtractByLlm(
            llm_spec=cocoindex.LlmSpec(
                api_type=cocoindex.LlmApiType.OPENAI, model="gpt-5"
            ),
            output_type=Meeting,
        )
    )
```

Importantly, this step also benefits from incremental processing. Since ExtractByLlm is a heavy step, we keep its output in a cache; as long as the inputs (input text, model, output type definition) are unchanged, we reuse the cached output without re-running the LLM.

Collect relationships

```python
meeting_key = {"note_file": document["filename"], "time": parsed["time"]}
meeting_nodes.collect(**meeting_key, note=parsed["note"])
attended_rels.collect(
    id=cocoindex.GeneratedField.UUID,
    **meeting_key,
    person=parsed["organizer"]["name"],
    is_organizer=True,
)
with parsed["participants"].row() as participant:
    attended_rels.collect(
        id=cocoindex.GeneratedField.UUID,
        **meeting_key,
        person=participant["name"],
    )
with parsed["tasks"].row() as task:
    decided_tasks_rels.collect(
        id=cocoindex.GeneratedField.UUID,
        **meeting_key,
        description=task["description"],
    )
    with task["assigned_to"].row() as assigned_to:
        assigned_rels.collect(
            id=cocoindex.GeneratedField.UUID,
            **meeting_key,
            task=task["description"],
            person=assigned_to["name"],
        )
```

Collectors in CocoIndex act like in-memory buffers: you declare collectors for different categories (meeting nodes, attendance, tasks, assignments), then as you process each document you "collect" relevant entries.

This block collects nodes and relationships from parsed meeting notes to build a knowledge graph in Neo4j:

- Person → Meeting (ATTENDED): links participants (including organizers) to the meetings they attended.
- Meeting → Task (DECIDED): links meetings to tasks or decisions that were made.
- Person → Task (ASSIGNED_TO): links tasks back to the people responsible for them.

Map to graph database

Overview

We will be creating a property graph with the following nodes and relationships. To learn more about property graphs, please refer to CocoIndex's Property Graph Targets documentation.
Map Meeting Nodes

```python
meeting_nodes.export(
    "meeting_nodes",
    cocoindex.targets.Neo4j(
        connection=conn_spec, mapping=cocoindex.targets.Nodes(label="Meeting")
    ),
    primary_key_fields=["note_file", "time"],
)
```

This uses CocoIndex's Neo4j target to export data to a graph database. The mapping=cocoindex.targets.Nodes(label="Meeting") part tells CocoIndex: "Take each row collected in meeting_nodes and map it to a node in the Neo4j graph, with label Meeting."

primary_key_fields=["note_file", "time"] tells CocoIndex which fields uniquely identify a node. That way, if the same meeting (same note_file and time) appears in different runs or updates, it maps to the same node, avoiding duplicates.

What "node export" means in the CocoIndex → Neo4j context:

| Collector rows | Graph entities |
| --- | --- |
| Each collected row (a meeting with its fields) | One node in Neo4j with label Meeting |
| Fields of that row | Properties of the node (e.g. note_file, time, note) |

Declare Person and Task Nodes

```python
flow_builder.declare(
    cocoindex.targets.Neo4jDeclaration(
        connection=conn_spec,
        nodes_label="Person",
        primary_key_fields=["name"],
    )
)
flow_builder.declare(
    cocoindex.targets.Neo4jDeclaration(
        connection=conn_spec,
        nodes_label="Task",
        primary_key_fields=["description"],
    )
)
```

The declare(...) method on flow_builder lets you pre-declare node labels that may appear as source or target nodes in relationships, even if you don't have an explicit collector exporting them as standalone node rows.
Neo4jDeclaration is the specification for such declared nodes: you give it the connection, the node label (type), and the primary_key_fields that uniquely identify instances of that node.

For example, with the Person declaration you tell CocoIndex: "We expect Person-labeled nodes to exist in the graph. They will be referenced in relationships (e.g. a meeting's organizer or attendees, or a task's assignee), but we don't have a dedicated collector exporting Person rows." By declaring Person, CocoIndex will handle deduplication: multiple relationships referencing the same name will map to the same Person node in Neo4j (because name is the primary key).

How declaration works with relationships and export logic:

- When you later export relationship collectors (e.g. ATTENDED, DECIDED, ASSIGNED_TO), those relationships reference nodes of type Person or Task. CocoIndex needs to know how to treat those node labels so it can create or match the corresponding nodes properly. declare(...) gives CocoIndex that knowledge.
- CocoIndex handles matching and deduplication of nodes by checking primary-key fields. If a node with the same primary key already exists, it reuses it rather than creating a duplicate.
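This upsert-by-primary-key behavior corresponds to Cypher MERGE semantics. As a rough illustration, here's how a primary-key-driven MERGE statement for a node might be assembled; the helper below is hypothetical and not part of CocoIndex, which generates its own statements:

```python
def build_node_merge(label: str, primary_keys: list[str], extra_props: list[str]) -> str:
    """Build a Cypher MERGE that matches on primary keys and sets other properties."""
    key_part = ", ".join(f"{k}: ${k}" for k in primary_keys)
    set_part = ", ".join(f"n.{p} = ${p}" for p in extra_props)
    query = f"MERGE (n:{label} {{{key_part}}})"
    if set_part:
        query += f" SET {set_part}"
    return query

query = build_node_merge("Meeting", ["note_file", "time"], ["note"])
print(query)
# MERGE (n:Meeting {note_file: $note_file, time: $time}) SET n.note = $note
```

MERGE matches an existing node on the key properties or creates it if absent, which is exactly the "reuse rather than duplicate" behavior described above.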
Map ATTENDED Relationship

```python
attended_rels.export(
    "attended_rels",
    cocoindex.targets.Neo4j(
        connection=conn_spec,
        mapping=cocoindex.targets.Relationships(
            rel_type="ATTENDED",
            source=cocoindex.targets.NodeFromFields(
                label="Person",
                fields=[
                    cocoindex.targets.TargetFieldMapping(
                        source="person", target="name"
                    )
                ],
            ),
            target=cocoindex.targets.NodeFromFields(
                label="Meeting",
                fields=[
                    cocoindex.targets.TargetFieldMapping("note_file"),
                    cocoindex.targets.TargetFieldMapping("time"),
                ],
            ),
        ),
    ),
    primary_key_fields=["id"],
)
```

This call ensures that ATTENDED relationships, i.e. "Person → Meeting" (organizer or participant → the meeting), are explicitly encoded as edges in the Neo4j graph:

- It links Person nodes with Meeting nodes via ATTENDED relationships, enabling queries like "which meetings did Alice attend?" or "who attended meeting X?"
- By mapping Person and Meeting nodes consistently (using unique keys), it ensures a clean graph with no duplicate persons or meetings.
- Because relationships get unique IDs and are exported with consistent keys, the graph remains stable across incremental updates: re-runs won't duplicate edges or nodes.

Map DECIDED Relationship

```python
decided_tasks_rels.export(
    "decided_tasks_rels",
    cocoindex.targets.Neo4j(
        connection=conn_spec,
        mapping=cocoindex.targets.Relationships(
            rel_type="DECIDED",
            source=cocoindex.targets.NodeFromFields(
                label="Meeting",
                fields=[
                    cocoindex.targets.TargetFieldMapping("note_file"),
                    cocoindex.targets.TargetFieldMapping("time"),
                ],
            ),
            target=cocoindex.targets.NodeFromFields(
                label="Task",
                fields=[
                    cocoindex.targets.TargetFieldMapping("description"),
                ],
            ),
        ),
    ),
    primary_key_fields=["id"],
)
```

This call ensures that DECIDED relationships, i.e. "Meeting → Task", are explicitly encoded as edges in the Neo4j graph.
- It links Meeting nodes with Task nodes via DECIDED relationships, enabling queries like "Which tasks were decided in meeting X?" and "From which meeting did task Y originate?"
- By mapping Meeting and Task nodes consistently (using note_file + time for meetings and description for tasks), it prevents duplicate task or meeting nodes in the graph.
- Because relationships have unique IDs and are exported with consistent keys, the graph remains stable across incremental updates: re-running the pipeline won't create duplicate edges or nodes.
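Conceptually, a relationship export resolves both endpoint nodes by their keys and then upserts the edge between them. The Cypher below is hand-written for exposition and is only a sketch of that idea, not the statements CocoIndex actually emits:

```python
# Hypothetical sketch of the upsert a DECIDED relationship corresponds to:
# merge both endpoint nodes by their primary keys, then merge the edge.
cypher = (
    "MERGE (m:Meeting {note_file: $note_file, time: $time}) "
    "MERGE (t:Task {description: $description}) "
    "MERGE (m)-[r:DECIDED {id: $id}]->(t)"
)

# Parameters for one collected row (sample values).
params = {
    "note_file": "team-notes.md",
    "time": "2024-05-02",
    "description": "Draft budget",
    "id": "a-generated-uuid",
}
print(cypher)
```

Because every clause is a MERGE keyed on stable identifiers, replaying the same row is a no-op rather than a duplicate.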
Map ASSIGNED_TO Relationship

```python
assigned_rels.export(
    "assigned_rels",
    cocoindex.targets.Neo4j(
        connection=conn_spec,
        mapping=cocoindex.targets.Relationships(
            rel_type="ASSIGNED_TO",
            source=cocoindex.targets.NodeFromFields(
                label="Person",
                fields=[
                    cocoindex.targets.TargetFieldMapping(
                        source="person", target="name"
                    ),
                ],
            ),
            target=cocoindex.targets.NodeFromFields(
                label="Task",
                fields=[
                    cocoindex.targets.TargetFieldMapping(
                        source="task", target="description"
                    ),
                ],
            ),
        ),
    ),
    primary_key_fields=["id"],
)
```

This takes all the task assignment data you collected in assigned_rels, i.e. which person is responsible for which task, and explicitly encodes task ownership in the graph, linking people to the tasks they are responsible for:

- It enables queries like "Which tasks is Alice assigned to?" and "Who is responsible for task X?"
- By using consistent node mappings (name for Person, description for Task), it prevents duplicate person or task nodes.
- Unique IDs on relationships ensure the graph remains stable across incremental updates: re-running the flow won't create duplicate edges.

The Resulting Graph

After running this pipeline, your Neo4j database contains a rich, queryable graph.

Nodes:

- Meeting: represents individual meetings, with properties like date and notes
- Person: represents individuals involved in meetings
- Task: represents actionable items decided in meetings

Relationships:

- ATTENDED: connects people to meetings they attended
- DECIDED: connects meetings to tasks that were decided
- ASSIGNED_TO: connects people to tasks they're responsible for

Importantly, the final export to the knowledge graph is also incremental. CocoIndex only mutates the graph for nodes or relationships that have changed; everything else is a no-op. This avoids unnecessary churn on the target database and minimizes the cost of target write operations.

Run

Build/update the graph

Install dependencies:

```shell
pip install -e .
```

Update the index (run the flow once to build/update the graph):

```shell
cocoindex update main
```

Browse the knowledge graph

Open Neo4j Browser at http://localhost:7474.
Sample Cypher queries:

```cypher
// All relationships
MATCH p=()-->() RETURN p

// Who attended which meetings (including the organizer)
MATCH (p:Person)-[:ATTENDED]->(m:Meeting) RETURN p, m

// Tasks decided in meetings
MATCH (m:Meeting)-[:DECIDED]->(t:Task) RETURN m, t

// Task assignments
MATCH (p:Person)-[:ASSIGNED_TO]->(t:Task) RETURN p, t
```

CocoInsight

I used CocoInsight (free beta now) to troubleshoot the index generation and understand the data lineage of the pipeline. It connects to your local CocoIndex server with zero pipeline data retention. Start CocoInsight:

```shell
cocoindex server -ci main
```

Then open the UI at https://cocoindex.io/cocoinsight.

Key CocoIndex Features Demonstrated

This example showcases several powerful CocoIndex capabilities, each critical for enterprise deployment:

1. Incremental Processing with Change Detection

Changes to only a few meeting notes files trigger re-processing of just those files, not the entire document set. This dramatically reduces:

- LLM API costs (99%+ reduction for typical 1% daily churn)
- Compute resource consumption
- Database I/O and storage operations
- Overall pipeline execution time

In large enterprises, this transforms knowledge graph pipelines from an expensive luxury into cost-effective standard practice.

2. Data Lineage and Observability

CocoIndex tracks data transformations step by step.
You can see where every field in your Neo4j graph came from, tracing back through LLM extraction, collection, and mapping. This becomes critical when meeting notes are edited: you can identify which changes propagated to the graph, and when.

3. Declarative Data Flow

The entire pipeline is defined declaratively in Python, without complex plumbing. The framework handles scheduling, error recovery, state management, and change tracking automatically. This reduces development time and operational burden compared to building incremental ETL logic from scratch.

4. Schema Management and Idempotency

CocoIndex automatically manages the Neo4j schema based on your data transformations, creating nodes and relationships on the fly while enforcing primary key constraints for data consistency. Primary key fields ensure that document edits, section deletions, and task reassignments update existing records rather than creating duplicates, which is essential for maintaining data quality in large, evolving document sets.

5. Real-time Update Capability

By switching the execution mode from batch to live, the pipeline continuously monitors Google Drive for changes and updates your knowledge graph in near real time. The moment a meeting note is updated, edited, or a section is deleted, the graph reflects those changes within the next polling interval.

Beyond Meeting Notes: The Incremental Graph Pattern for Enterprise Data

This source → extract → graph pipeline isn't meeting-specific; it's a general pattern for any high-churn document corpus.
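The change-detection core of this pattern can be sketched in a few lines of Python (a hypothetical helper, not CocoIndex's internals): hash each document and reprocess only those whose hash changed since the last run.

```python
import hashlib


def changed_docs(docs: dict[str, str], seen_hashes: dict[str, str]) -> list[str]:
    """Return the doc ids whose content hash differs from the last run."""
    changed = []
    for doc_id, text in docs.items():
        digest = hashlib.sha256(text.encode()).hexdigest()
        if seen_hashes.get(doc_id) != digest:
            changed.append(doc_id)
            seen_hashes[doc_id] = digest  # remember for the next run
    return changed


state: dict[str, str] = {}
docs = {"2024-01-05.md": "notes v1", "2024-01-12.md": "notes"}

# First run: everything is new, so everything is processed.
assert changed_docs(docs, state) == ["2024-01-05.md", "2024-01-12.md"]

# One document is edited; only it is reprocessed on the next run.
docs["2024-01-05.md"] = "notes v2"
assert changed_docs(docs, state) == ["2024-01-05.md"]
```

The expensive steps (LLM extraction, graph mutation) then run only on the returned ids; everything else is a no-op.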
Here's where the same architecture delivers immediate ROI:

Research & IP Management

- Parse internal research papers, patents, and technical docs into concept graphs
- Track citation networks, author collaborations, and methodology evolution
- Incremental benefit: papers get revised pre-publication, abstracts get rewritten, and citations change as related work emerges

Support Intelligence

- Extract issues, resolutions, affected customers, and product components from ticket systems
- Build graphs linking recurring problems to root causes and KB articles
- Incremental benefit: tickets mutate constantly through status changes, reassignments, solution updates, and customer follow-ups

Email Thread Intelligence

- Parse millions of email threads into decision graphs: who decided what, when, and why
- Surface hidden knowledge: the answer to "What was the rationale for deprecating API v2?" lives in a 2022 email thread
- Incremental benefit: threads continue, forwards add context, replies modify positions

Regulatory & Compliance Graphs

- Extract requirements from policies, regulations, audit reports, and standards docs
- Map requirement dependencies: which controls cascade when GDPR Article 17 changes
- Incremental benefit: regulations get amended, internal policies get versioned, audit findings trigger doc updates

Market Intelligence

- Ingest competitor press releases, SEC filings, news articles, and product announcements
- Build graphs of competitor products, partnerships, hiring patterns, and market positioning
- Incremental benefit: news flows constantly, filings get amended, partnerships evolve

Contract & Legal Document Analysis

- Extract entities, obligations, dates, and dependencies from contracts and agreements
- Track amendment chains, renewal dates, party changes, and obligation fulfillment
- Incremental benefit: contracts get amended via riders, parties get acquired or renamed, terms get renegotiated

Codebase Documentation Graphs

- Parse READMEs, architecture docs, API specs, and inline comments into knowledge graphs
- Link code modules to architectural decisions, dependencies, and responsible teams
- Incremental benefit: docs drift, code refactors cascade through doc updates, ownership changes

Clinical Trial & Research Data

- Extract protocols, adverse events, patient cohorts, and outcomes from trial documentation
- Build graphs linking interventions to outcomes, patients to cohorts, and papers to trials
- Incremental benefit: protocols get amended, safety reports accumulate, publications reference evolving data

The pattern holds: any document corpus with more than 1% monthly churn and complex entity relationships benefits from incremental graph construction over batch reprocessing.

Summary

The combination of CocoIndex's incremental processing, LLM-powered extraction, and Neo4j's graph database creates a powerful system for turning unstructured meeting notes into queryable, actionable intelligence. In enterprise environments where document volumes reach millions and change rates run into the thousands daily, incremental processing isn't a nice-to-have; it's essential for cost-effective, scalable knowledge graph operations.
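A back-of-the-envelope calculation makes the point; the corpus size and churn rate here are illustrative assumptions, in line with the 1% daily churn discussed earlier:

```python
# Illustrative numbers: a 100k-document corpus with 1% daily churn.
docs = 100_000
daily_churn = 0.01

batch_calls = docs                            # full reprocessing touches every doc
incremental_calls = int(docs * daily_churn)   # incremental touches only changed docs
savings = 1 - incremental_calls / batch_calls

print(incremental_calls, f"{savings:.0%}")  # 1000 99%
```

At this scale, incremental processing replaces 100,000 daily LLM calls with 1,000, which is where the 99%+ cost reduction comes from.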
Rather than drowning in plain-text documents or constantly reprocessing the entire corpus, organizations can explore meeting data as a connected graph, uncovering patterns and relationships invisible in static documents, without the prohibitive costs of full reprocessing.

This example demonstrates a broader principle: modern data infrastructure combines AI, databases, and intelligent orchestration. CocoIndex handles orchestration with change detection and incremental processing, LLMs provide intelligent understanding, and Neo4j provides efficient relationship querying. Together, they form a foundation for knowledge extraction at enterprise scale.

Whether you're managing meetings, research, customer interactions, or any other text-heavy domain, this pattern (source → detect changes → split → extract → collect → export) provides a reusable template for building knowledge graphs that scale with your data while remaining cost-effective as volumes and change rates grow.

Support CocoIndex ❤️

If this example was helpful, the easiest way to support CocoIndex is to give the project a ⭐ on GitHub: https://github.com/cocoindex-io/cocoindex. Your stars help us grow the community, stay motivated, and keep shipping better tools for real-time data ingestion and transformation.