Data modeling in Elasticsearch is not as obvious as it is when dealing with relational databases. Unlike traditional relational databases that rely on data normalization and SQL joins, Elasticsearch requires alternative approaches for managing relationships.

There are four common workarounds to managing relationships in Elasticsearch:

Application-side joins
Data denormalization
Nested field types and nested queries
Parent-child relationships

In this blog, we’ll discuss how you can design your data model to handle relationships using the nested field type and parent-child relationships. We’ll cover the architecture, performance implications, and use cases for these two techniques.

Nested Field Types and Nested Queries

Elasticsearch supports nested structures, where objects can contain other objects. Nested field types are JSON objects within the main document, which can have their own distinct fields and types. These nested objects are treated as separate, hidden documents that can only be accessed using a nested query.

Nested field types are well-suited for relationships where data integrity, close coupling, and hierarchical structure are important. These include one-to-one and one-to-many relationships where there is one main entity, for example, a person and their multiple addresses and phone numbers represented within a single document.

With nested field types, Elasticsearch stores the parent document and its nested objects in the same Lucene block and segment. This can result in faster query speeds as the relationship is contained within a single document.

Example of Nested Field Type and Nested Query

Let’s look at an example of a blog post with comments. We want to nest the comments below the blog post so they can be easily queried together in the same document.

{
  "post_id": "1",
  "title": "Introduction to Elasticsearch Data Modeling",
  "content": "Exploring various data modeling options in Elasticsearch.",
  "comments": [
    {
      "comment_id": "101",
      "text": "Great overview of data modeling!"
    },
    {
      "comment_id": "102",
      "text": "Looking forward to more content."
    }
  ]
}

Note that the comments field must be mapped with the nested type for Elasticsearch to index each comment as a nested object; otherwise it is stored as a regular object field.

Benefits of Nested Field Types and Nested Queries

The benefits of nested object relationships include:

Data is stored in the same Lucene block and segment: Storing nested objects in the same Lucene block and segment leads to faster queries because the data is collocated.

Data integrity: Because the relationships are maintained within the same document, nested queries return accurate matches.

Document data model: Easy for developers familiar with the NoSQL data model, where you are querying documents and the nested data within them.

Drawbacks of Nested Field Types and Nested Queries

Update inefficiency: Updates, inserts, and deletes on any part of a document with nested objects require reindexing the entire document, which can be memory-intensive, especially if the documents are large or updates are frequent.

Query performance with large nested fields: If you have documents with particularly large nested fields, this can have a performance implication, because the search request retrieves the entire document.

Multiple levels of nesting can become complex: Running queries across nested structures with multiple levels can still become complex. That’s because queries may involve nested queries within nested queries, leading to less readable code.

Parent-Child Relationships

In parent-child mapping, documents are organized into parent and child types. Each child document has a direct association with a parent document. This relationship is established through a specific field value in the child document that matches the parent's ID. The parent-child model adopts a decentralized approach where parent and child documents exist independently.

Parent-child joins are suitable for one-to-many or many-to-many relationships between entities. Imagine an application where you want to create relationships between companies and contacts, and want to search for companies and contacts as well as contacts at specific companies.
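
To make the companies-and-contacts scenario more concrete, here is a minimal Python sketch of the two search bodies such an application might build: one for contacts at a specific company, and one for companies that have a matching contact. It only constructs dictionaries and does not connect to a cluster. The relation names `company` and `contact`, the fields `name` and `title`, and both helper functions are assumptions made for illustration; `has_parent` and `has_child` are the Elasticsearch query types used with parent-child documents.

```python
import json


def contacts_at_company(company_name: str) -> dict:
    """Search body returning contacts (children) whose parent company matches by name."""
    return {
        "query": {
            "has_parent": {
                "parent_type": "company",  # hypothetical parent relation name
                "query": {"match": {"name": company_name}},  # "name" is an assumed field
            }
        }
    }


def companies_with_contact_title(title: str) -> dict:
    """Search body returning companies (parents) with at least one matching contact."""
    return {
        "query": {
            "has_child": {
                "type": "contact",  # hypothetical child relation name
                "query": {"match": {"title": title}},  # "title" is an assumed field
            }
        }
    }


if __name__ == "__main__":
    # Print the bodies as JSON so they can be pasted into a search request.
    print(json.dumps(contacts_at_company("Acme"), indent=2))
    print(json.dumps(companies_with_contact_title("engineer"), indent=2))
```

Keeping the query bodies in small functions like these is only one possible design; it makes the direction of each join (child to parent, or parent to child) explicit at the call site.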
Elasticsearch makes parent-child joins performant by keeping track of which parents are connected to which children and having both entities reside on the same shard. By localizing the join operation, Elasticsearch avoids the need for extensive inter-shard communication, which can be a performance bottleneck.

Example of Parent-Child Relationships

Let’s take the example of a parent-child relationship for blog posts and comments. Each blog post, i.e., the parent, can have several comments, i.e., the children. To create the parent-child relationship, let’s first define the index mapping as follows:

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "post_id": {
        "type": "keyword"
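      },
      "join_field": {
        "type": "join",
        "relations": {
          "post": "comment"
        }
      }
    }
  }
}

Here the join field is named join_field (the name is arbitrary); it declares post as the parent relation and comment as its child. A parent document would be a post, indexed with the document ID 1 and marked as a post through the join field:

{
  "post_id": "1",
  "title": "Introduction to Elasticsearch Data Modeling",
  "content": "Exploring various data modeling options in Elasticsearch.",
  "join_field": "post"
}

The child document would then be a comment whose join field names its relation and the ID of its parent. It has to be indexed with the routing parameter set to the parent's ID (routing=1 here) so that it is stored on the same shard as the post.

{
  "comment_id": "101",
  "text": "Great overview of data modeling!",
  "post_id": "1",
  "join_field": {
    "name": "comment",
    "parent": "1"
  }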
}

Benefits of Parent-Child Relationships

The benefits of parent-child modeling include:

Resembles relational data model: In parent-child relationships, the parent and child documents are separate and are linked by a unique parent ID. This setup is closer to a relational database model and can be more intuitive for those familiar with such concepts.

Update efficiency: Child documents can be added, modified, or deleted without affecting the parent document or other child documents. This is particularly beneficial when dealing with a large number of child documents that require frequent updates. Note that associating a child document with a different parent is a more complex process, as the new parent may be on another shard.

Better suited for heterogeneous children: Since child documents are stored separately, they may be more memory and storage-efficient, especially in cases where there are many child documents with significant size differences.

Drawbacks of Parent-Child Relationships

The drawbacks of parent-child relationships include:

Expensive, slow queries: Joining separate documents at query time adds computational work during query execution, impacting performance. Elasticsearch notes that parent-child queries can be 5-10x slower than querying nested objects.

Mapping overhead: Parent-child relationships can consume more memory and cache resources. Elasticsearch maintains a map of parent-child relationships, which can grow large and consume significant memory, especially with a high volume of documents.

Shard size management: Since both parent and child documents reside on the same shard, there's a potential risk of uneven data distribution across the cluster. Some shards might become significantly larger than others, especially if there are parent documents with many children. This can lead to challenges in managing and scaling the Elasticsearch cluster.

Reindexing and cluster maintenance: If you need to reindex data or change the sharding strategy, the parent-child relationship can complicate this process. You'll need to ensure that the relationship integrity is maintained during such operations. Routine cluster maintenance tasks, such as shard rebalancing or node upgrades, may become more complex. Special care must be taken to ensure that parent-child relationships are not disrupted during these processes.

Elastic, the company behind Elasticsearch, recommends that you try application-side joins, data denormalization and/or nested objects before going down the path of parent-child relationships.

Feature Comparison of Nested Queries and Parent-Child Relationships

The table below provides a recap of the characteristics of nested field types and queries and parent-child relationships to compare the data modeling approaches side by side.

| | Nested field types and nested queries | Parent-child relationships |
|---|---|---|
| Definition | Nests an object within another object | Links parent and child documents together |
| Relationships | One-to-one, one-to-many | One-to-many, many-to-many |
| Query speed | Generally faster than parent-child relationships as the data is stored in the same block and segment | Generally 5-10x slower than nested objects as parent and child documents are joined at query time |
| Query flexibility | Less flexible than parent-child queries as it limits the scope of the querying to within the bounds of each nested object | Offers more flexibility in querying as parent or child documents can be queried together or separately |
| Data updates | Updating nested objects requires reindexing the entire document | Updating child documents is easier as it does not require all documents to be reindexed |
| Management | Simpler management since everything is contained within a single document | More complex to manage due to separate indexing and maintaining of relationships between parent and child documents |
| Use cases | Store and query complex data with multiple levels of hierarchy | Relationships where there are few parents and many children, like products and product reviews |

Alternatives to Elasticsearch for Relationship Modeling

While Elasticsearch provides several workarounds to SQL-style joins, including nested queries and parent-child relationships, it's established that these models do not scale well. When designing for applications at scale, it may make sense to consider an alternative approach with native SQL join capabilities, such as Rockset.

Rockset is a search and analytics database that's designed for SQL search, aggregations, and joins on any data, including deeply nested JSON data. As data is streamed into Rockset, it is encoded in the database’s core data structures used to store and index the data for fast retrieval. Rockset indexes the data in a way that allows for fast queries, including joins, using its SQL-based query optimizer. As a result, there is no upfront data modeling required to support SQL joins.

One of the challenges with Elasticsearch is how to preserve the relationship in an efficient manner when data is updated. One of the reasons is that Elasticsearch is built on Apache Lucene, which stores data in immutable segments, resulting in entire documents needing to be reindexed on update.
Rockset uses RocksDB, a key-value store open-sourced by Meta and built for data mutations, to be able to efficiently support field-level updates without needing to reindex entire documents.

Comparing Elasticsearch and Rockset Using a Real-World Example

Let’s compare the parent-child relationship approach in Elasticsearch with a SQL query in Rockset.

In the parent-child relationship example above, we modeled posts with multiple comments by creating two document types:

posts, the parent document type
comments, the child document type

We used a unique identifier, the parent ID, to establish the relationship between the parent and child documents. At query time, we use the Elasticsearch DSL to retrieve comments for a specific post.

In Rockset, the data containing posts would be stored in one collection, a table in the relational world, while the data containing comments would be stored in a separate collection. At query time, we would join the data together using a SQL query.

Here are the two approaches side-by-side:

Parent-Child Relationships in Elasticsearch

The requests below assume the blog index maps a join field, here called join_field, with post as the parent relation and comment as the child. Each comment is indexed with routing set to the parent's ID so that it is stored on the same shard as its post.

PUT /blog/_doc/1
{
  "title": "Elasticsearch Modeling",
  "content": "A post about data modeling in Elasticsearch",
  "join_field": "post"
}

PUT /blog/_doc/2?routing=1
{
  "text": "Great post!",
  "join_field": { "name": "comment", "parent": "1" }
}

PUT /blog/_doc/3?routing=1
{
  "text": "I learned a lot from this.",
  "join_field": { "name": "comment", "parent": "1" }
}

To retrieve a post by its title together with its comments, you would need to create a query as follows. The has_child clause matches only posts that have at least one comment, and inner_hits returns three comments per post by default, so set its size if you need more.

GET /blog/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "Elasticsearch Modeling" } },
        {
          "has_child": {
            "type": "comment",
            "query": { "match_all": {} },
            "inner_hits": {
              "_source": ["text"],
              "name": "comments"
            }
          }
        }
      ]
    }
  }
}

SQL in Rockset

To then query this data, you just need to write a simple SQL query.

SELECT p.title, p.content, c.text
FROM posts p
JOIN comments c ON p.post_id = c.post_id
WHERE p.post_id = 1;

If you have multiple data sets that need to be joined for your application, then Rockset is more straightforward and scalable than Elasticsearch. It also simplifies operations as you do not need to remodel your data, manage updates, or run reindex operations.

Managing Relationships in Elasticsearch

This blog provided an overview of nested field types and nested queries, and of parent-child relationships in Elasticsearch, with the goal of helping you determine the best data modeling approach for your workload.

Nested field types and queries are useful for one-to-one or one-to-many relationships where the relationship is maintained within a single document. This is considered to be a simpler and more scalable approach to relationship management.

The parent-child relationship model is better suited for one-to-many or many-to-many relationships but comes with increased complexity, especially as the relationships need to be contained within a specific shard.

If one of the primary requirements of your application is modeling relationships, it may make sense to consider Rockset. Rockset simplifies data modeling and offers a more scalable approach to relationship management using SQL joins. You can compare and contrast the performance of Elasticsearch and Rockset by starting a free trial with $300 in credits today.

This writer has a vested interest be it monetary, business, or otherwise, with 1 or more of the products or companies mentioned within.

Data Modeling in Elasticsearch: Using Nested Queries and Parent-Child Relationships
