Unlocking the Power of JSON in MySQL: Tips and Tricks

Written by akankov | Published 2023/09/11
Tech Story Tags: mysql | json | json-schema | performance | indexes | json-performance-comparison | migrate-from-text-to-json | json-in-mysql

TLDR In this article, I will explain how and why you should use the JSON data type instead of TEXT.via the TL;DR App

JSON data type was first introduced in MySQL 5.7.8 in August 2015. It was quite a while ago, but I still see many people using the TEXT data type to store JSON data. In this article, I will explain how and why you should use the JSON data type instead of TEXT, show how to migrate from TEXT to JSON, and demonstrate a few tips and tricks with JSON data type.

Why use JSON data type instead of TEXT

The first and most obvious reason is that JSON data type provides validation for JSON data. It checks that each JSON value is valid before inserting it into the database.

The second and most important reason is that the JSON data type is optimized for storing JSON data. It checks that each JSON value is valid and stores each element in an optimized binary format that allows quick read access to JSON documents.

The third reason is that the JSON data type allows you to create indexes using JSON values. This is a massive advantage if you need to query JSON data.

The fourth reason is that the JSON data type allows you to use JSON functions to manipulate JSON data.

Performance comparison

Strictly speaking, it is possible to use JSON functions on TEXT data. But it is not recommended. Let's compare the speed of JSON functions on TEXT and JSON data types and use examples from a previous article. But instead of comparing old JSON columns and current JSON columns, we compare JSON and TEXT.

Let's start with TEXT.

CREATE TABLE t_with_text
(
    id       INT PRIMARY KEY AUTO_INCREMENT,
    json_col LONGTEXT,
    name     VARCHAR(100) AS (json_col ->> '$.name'),
    age      INT AS (json_col -> '$.age')
);

INSERT INTO t_with_text(json_col)
VALUES (JSON_OBJECT('name', 'Joe', 'age', 24,
                    'data', REPEAT('x', 10 * 1000 * 1000))),
       (JSON_OBJECT('name', 'Sue', 'age', 32,
                    'data', REPEAT('y', 10 * 1000 * 1000))),
       (JSON_OBJECT('name', 'Pete', 'age', 40,
                    'data', REPEAT('z', 10 * 1000 * 1000))),
       (JSON_OBJECT('name', 'Jenny', 'age', 27,
                    'data', REPEAT('w', 10 * 1000 * 1000)));

INSERT INTO t_with_text(json_col)
SELECT json_col
FROM t_with_text;
INSERT INTO t_with_text(json_col)
SELECT json_col
FROM t_with_text;

UPDATE t_with_text
SET json_col = JSON_SET(json_col, '$.age', age + 1); -- 16 rows affected in 4 s 418 ms

And now JSON.

CREATE TABLE t_with_json
(
    id       INT PRIMARY KEY AUTO_INCREMENT,
    json_col LONGTEXT,
    name     VARCHAR(100) AS (json_col ->> '$.name'),
    age      INT AS (json_col -> '$.age')
);

INSERT INTO t_with_json(json_col)
VALUES (JSON_OBJECT('name', 'Joe', 'age', 24,
                    'data', REPEAT('x', 10 * 1000 * 1000))),
       (JSON_OBJECT('name', 'Sue', 'age', 32,
                    'data', REPEAT('y', 10 * 1000 * 1000))),
       (JSON_OBJECT('name', 'Pete', 'age', 40,
                    'data', REPEAT('z', 10 * 1000 * 1000))),
       (JSON_OBJECT('name', 'Jenny', 'age', 27,
                    'data', REPEAT('w', 10 * 1000 * 1000)));

INSERT INTO t_with_json(json_col)
SELECT json_col
FROM t_with_json;
INSERT INTO t_with_json(json_col)
SELECT json_col
FROM t_with_json;

UPDATE t_with_json
SET json_col = JSON_SET(json_col, '$.age', age + 1); -- 16 rows affected in 1 s 59 ms

As you can see, the JSON column performs much faster than TEXT.

How to migrate from TEXT to JSON

In order to migrate from TEXT to JSON, you need to create a new column with JSON data type, copy data from TEXT column to JSON column, and drop TEXT column.

ALTER TABLE `users`
    ADD COLUMN `data_json` JSON AFTER `data`;

UPDATE `users`
SET `data_json` = `data`;

ALTER TABLE `users`
    DROP COLUMN `data`;

It looks pretty simple, but there are few things to consider.

First - TEXT column can contain invalid JSON data. In this case, you need to fix invalid JSON data before migrating to JSON data type.

In order to select invalid JSON data, you can use the following query:

SELECT *
FROM `users`
WHERE JSON_VALID(`data`) = 0;

This query will return all rows where the data column contains invalid JSON data.

Second - the amount of data stored in JSON columns might be huge. In this case, you need to use a tool like pt-online-schema-change to avoid locking the table.

After migrating to JSON data type, it is possible to check your data against JSON schema. In order to do this, you need to create a JSON schema and use JSON_SCHEMA_VALID function.

SET @schema = '{
  "type": "object",
  "required": [
    "firstName",
    "lastName",
    "age"
  ],
  "properties": {
    "age": {
      "type": "integer",
      "minimum": 0
    },
    "firstName": {
      "type": "string",
      "minLength": 1,
      "maxLength": 120
    },
    "lastName": {
      "type": "string",
      "minLength": 1,
      "maxLength": 120
    }
  }
}
';

SELECT *
FROM `users`
WHERE JSON_SCHEMA_VALID(@schema, `data_json`) = 0;

JSON_SCHEMA_VALID works only on valid JSON documents. If you have invalid JSON documents, you need to fix them first.

Indexing JSON data

MySQL allows you to create indexes on JSON data. In order to create an index on JSON data, you need to specify the path to the JSON value.

CREATE INDEX `idx_users_data_json_firstName`
    ON `users` ((CAST(`data_json` ->> '$.firstName' AS CHAR(120))));

You can also create a virtual column and create an index on it.

ALTER TABLE `users`
    ADD COLUMN `firstName` VARCHAR(120) AS (`data_json` ->> '$.firstName') VIRTUAL;

CREATE INDEX `idx_users_firstName`
    ON `users` (`firstName`);

Multi-Valued Indexes

MySQL 8.0.17 introduced multi-valued indexes. Multi-valued indexes allow you to create indexes on JSON arrays.

CREATE INDEX `idx_users_data_json_tags`
    ON `users` ((CAST(`data_json` ->> '$.tags' AS CHAR(255) array)));

I can't say multi-valued indexes work well right now. We stumbled upon a bug while trying to use them. During the investigation, we found that a few other bugs were reported with similar but different problems. Here is that bug:

CREATE TABLE explain_bug
(
    project_id INT  NULL,
    json_data  JSON NULL,
    INDEX project_id_json_data (project_id, (CAST(json_data -> '$' AS CHAR(255) ARRAY)))
) COLLATE = utf8mb4_unicode_ci;

INSERT INTO explain_bug (project_id, json_data)
VALUES (42, '[]');

SELECT *
FROM explain_bug
WHERE project_id = 42; -- nothing is returned

SELECT *
FROM explain_bug IGNORE INDEX (project_id_json_data)
WHERE project_id = 42; -- expected result is returned                                               

Here is a couple more related bugs: https://bugs.mysql.com/bug.php?id=108659 https://bugs.mysql.com/bug.php?id=109333

So multi-valued indexes are not production-ready, but it is a nice feature to have and I hope it will be fixed soon.

JSON functions

MySQL provides a lot of functions to work with JSON data. You can find the full list of JSON functions here. Let's check a few of them that I use most often.

JSON_EXTRACT

JSON_EXTRACT function allows you to extract a value from a JSON document. It takes at least two arguments: JSON document and a path to the value.

SELECT JSON_EXTRACT('{"id": 42, "name": "Joe"}', '$.name'); -- "Joe"

There is a shorthand for JSON_EXTRACT function. You can use -> operator instead of JSON_EXTRACT function.

SELECT '{"id": 42, "name": "Joe"}' -> '$.name'; -- "Joe"

It is also possible to extract multiple values from a JSON document.

SELECT JSON_EXTRACT('{"id": 42, "name": "Joe"}', '$.name', '$.id'); -- ["Joe", 42]

JSON_SET

JSON_SET function allows you to set a value in a JSON document. It takes at least three arguments: JSON document, a path to the value, and a new value.

SELECT JSON_SET('{"id": 42, "name": "Joe"}', '$.name', 'Sue'); -- {"id": 42, "name": "Sue"}

It is possible to update multiple values in one call. For that you need to specify a path and a new value for each value you want to update.

SELECT JSON_SET('{"id": 42, "name": "Joe"}', '$.name', 'Sue', '$.id', 43); -- {"id": 43, "name": "Sue"}

JSON_REMOVE

JSON_REMOVE function allows you to remove a value from a JSON document. It takes at least two arguments: JSON document and a path to the value.

SELECT JSON_REMOVE('{"id": 42, "name": "Joe"}', '$.name'); -- {"id": 42}

It is possible to remove multiple values in one call. For that, you need to specify a path for each value you want to remove.

SELECT JSON_REMOVE('{"id": 42, "name": "Joe"}', '$.name', '$.id'); -- {}

JSON_MERGE_PRESERVE

JSON_MERGE_PRESERVE function allows you to merge multiple JSON documents into one. It takes at least two JSON documents as arguments:

SELECT JSON_MERGE_PRESERVE('{"id": 42, "name": "Joe"}',
                           '{"id": 43, "name": "Sue"}'); -- {"id": [42, 43], "name": ["Joe", "Sue"]}

Conclusion

JSON data type is a great way to store JSON data in MySQL. It provides validation, allows you to create indexes, and use JSON functions to manipulate JSON data. It is a great alternative to TEXT data type.


Written by akankov | Convert coffee into code )
Published by HackerNoon on 2023/09/11