JSON data type was first introduced in MySQL 5.7.8 in August 2015. It was quite a while ago, but I still see many people using the TEXT data type to store JSON data. In this article, I will explain how and why you should use the JSON data type instead of TEXT, show how to migrate from TEXT to JSON, and demonstrate a few tips and tricks with JSON data type.
The first and most obvious reason is that JSON data type provides validation for JSON data. It checks that each JSON value is valid before inserting it into the database.
The second and most important reason is that the JSON data type is optimized for storing JSON data. It checks that each JSON value is valid and stores each element in an optimized binary format that allows quick read access to JSON documents.
The third reason is that the JSON data type allows you to create indexes using JSON values. This is a massive advantage if you need to query JSON data.
The fourth reason is that the JSON data type allows you to use JSON functions to manipulate JSON data.
Strictly speaking, it is possible to use JSON functions on TEXT data. But it is not recommended. Let's compare the speed of JSON functions on TEXT and JSON data types and use examples from a previous article. But instead of comparing old JSON columns and current JSON columns, we compare JSON and TEXT.
Let's start with TEXT.
CREATE TABLE t_with_text
(
id INT PRIMARY KEY AUTO_INCREMENT,
json_col LONGTEXT,
name VARCHAR(100) AS (json_col ->> '$.name'),
age INT AS (json_col -> '$.age')
);
INSERT INTO t_with_text(json_col)
VALUES (JSON_OBJECT('name', 'Joe', 'age', 24,
'data', REPEAT('x', 10 * 1000 * 1000))),
(JSON_OBJECT('name', 'Sue', 'age', 32,
'data', REPEAT('y', 10 * 1000 * 1000))),
(JSON_OBJECT('name', 'Pete', 'age', 40,
'data', REPEAT('z', 10 * 1000 * 1000))),
(JSON_OBJECT('name', 'Jenny', 'age', 27,
'data', REPEAT('w', 10 * 1000 * 1000)));
INSERT INTO t_with_text(json_col)
SELECT json_col
FROM t_with_text;
INSERT INTO t_with_text(json_col)
SELECT json_col
FROM t_with_text;
UPDATE t_with_text
SET json_col = JSON_SET(json_col, '$.age', age + 1); -- 16 rows affected in 4 s 418 ms
And now JSON.
CREATE TABLE t_with_json
(
id INT PRIMARY KEY AUTO_INCREMENT,
json_col LONGTEXT,
name VARCHAR(100) AS (json_col ->> '$.name'),
age INT AS (json_col -> '$.age')
);
INSERT INTO t_with_json(json_col)
VALUES (JSON_OBJECT('name', 'Joe', 'age', 24,
'data', REPEAT('x', 10 * 1000 * 1000))),
(JSON_OBJECT('name', 'Sue', 'age', 32,
'data', REPEAT('y', 10 * 1000 * 1000))),
(JSON_OBJECT('name', 'Pete', 'age', 40,
'data', REPEAT('z', 10 * 1000 * 1000))),
(JSON_OBJECT('name', 'Jenny', 'age', 27,
'data', REPEAT('w', 10 * 1000 * 1000)));
INSERT INTO t_with_json(json_col)
SELECT json_col
FROM t_with_json;
INSERT INTO t_with_json(json_col)
SELECT json_col
FROM t_with_json;
UPDATE t_with_json
SET json_col = JSON_SET(json_col, '$.age', age + 1); -- 16 rows affected in 1 s 59 ms
As you can see, the JSON column performs much faster than TEXT.
In order to migrate from TEXT to JSON, you need to create a new column with JSON data type, copy data from TEXT column to JSON column, and drop TEXT column.
ALTER TABLE `users`
ADD COLUMN `data_json` JSON AFTER `data`;
UPDATE `users`
SET `data_json` = `data`;
ALTER TABLE `users`
DROP COLUMN `data`;
It looks pretty simple, but there are few things to consider.
First - TEXT column can contain invalid JSON data. In this case, you need to fix invalid JSON data before migrating to JSON data type.
In order to select invalid JSON data, you can use the following query:
SELECT *
FROM `users`
WHERE JSON_VALID(`data`) = 0;
This query will return all rows where the data
column contains invalid JSON data.
Second - the amount of data stored in JSON columns might be huge. In this case, you need to use a tool like pt-online-schema-change
to avoid locking the table.
After migrating to JSON data type, it is possible to check your data against JSON schema. In order to do this, you need to create a JSON schema and use JSON_SCHEMA_VALID
function.
SET @schema = '{
"type": "object",
"required": [
"firstName",
"lastName",
"age"
],
"properties": {
"age": {
"type": "integer",
"minimum": 0
},
"firstName": {
"type": "string",
"minLength": 1,
"maxLength": 120
},
"lastName": {
"type": "string",
"minLength": 1,
"maxLength": 120
}
}
}
';
SELECT *
FROM `users`
WHERE JSON_SCHEMA_VALID(@schema, `data_json`) = 0;
JSON_SCHEMA_VALID
works only on valid JSON documents. If you have invalid JSON documents, you need to fix them first.
MySQL allows you to create indexes on JSON data. In order to create an index on JSON data, you need to specify the path to the JSON value.
CREATE INDEX `idx_users_data_json_firstName`
ON `users` ((CAST(`data_json` ->> '$.firstName' AS CHAR(120))));
You can also create a virtual column and create an index on it.
ALTER TABLE `users`
ADD COLUMN `firstName` VARCHAR(120) AS (`data_json` ->> '$.firstName') VIRTUAL;
CREATE INDEX `idx_users_firstName`
ON `users` (`firstName`);
MySQL 8.0.17 introduced multi-valued indexes. Multi-valued indexes allow you to create indexes on JSON arrays.
CREATE INDEX `idx_users_data_json_tags`
ON `users` ((CAST(`data_json` ->> '$.tags' AS CHAR(255) array)));
I can't say multi-valued indexes work well right now. We stumbled upon a bug while trying to use them. During the investigation, we found that a few other bugs were reported with similar but different problems. Here is that bug:
CREATE TABLE explain_bug
(
project_id INT NULL,
json_data JSON NULL,
INDEX project_id_json_data (project_id, (CAST(json_data -> '$' AS CHAR(255) ARRAY)))
) COLLATE = utf8mb4_unicode_ci;
INSERT INTO explain_bug (project_id, json_data)
VALUES (42, '[]');
SELECT *
FROM explain_bug
WHERE project_id = 42; -- nothing is returned
SELECT *
FROM explain_bug IGNORE INDEX (project_id_json_data)
WHERE project_id = 42; -- expected result is returned
Here is a couple more related bugs: https://bugs.mysql.com/bug.php?id=108659 https://bugs.mysql.com/bug.php?id=109333
So multi-valued indexes are not production-ready, but it is a nice feature to have and I hope it will be fixed soon.
MySQL provides a lot of functions to work with JSON data. You can find the full list of JSON functions here. Let's check a few of them that I use most often.
JSON_EXTRACT
function allows you to extract a value from a JSON document. It takes at least two arguments: JSON document and a path to the value.
SELECT JSON_EXTRACT('{"id": 42, "name": "Joe"}', '$.name'); -- "Joe"
There is a shorthand for JSON_EXTRACT
function. You can use ->
operator instead of JSON_EXTRACT
function.
SELECT '{"id": 42, "name": "Joe"}' -> '$.name'; -- "Joe"
It is also possible to extract multiple values from a JSON document.
SELECT JSON_EXTRACT('{"id": 42, "name": "Joe"}', '$.name', '$.id'); -- ["Joe", 42]
JSON_SET
function allows you to set a value in a JSON document. It takes at least three arguments: JSON document, a path to the value, and a new value.
SELECT JSON_SET('{"id": 42, "name": "Joe"}', '$.name', 'Sue'); -- {"id": 42, "name": "Sue"}
It is possible to update multiple values in one call. For that you need to specify a path and a new value for each value you want to update.
SELECT JSON_SET('{"id": 42, "name": "Joe"}', '$.name', 'Sue', '$.id', 43); -- {"id": 43, "name": "Sue"}
JSON_REMOVE
function allows you to remove a value from a JSON document. It takes at least two arguments: JSON document and a path to the value.
SELECT JSON_REMOVE('{"id": 42, "name": "Joe"}', '$.name'); -- {"id": 42}
It is possible to remove multiple values in one call. For that, you need to specify a path for each value you want to remove.
SELECT JSON_REMOVE('{"id": 42, "name": "Joe"}', '$.name', '$.id'); -- {}
JSON_MERGE_PRESERVE
function allows you to merge multiple JSON documents into one. It takes at least two JSON documents as arguments:
SELECT JSON_MERGE_PRESERVE('{"id": 42, "name": "Joe"}',
'{"id": 43, "name": "Sue"}'); -- {"id": [42, 43], "name": ["Joe", "Sue"]}
JSON data type is a great way to store JSON data in MySQL. It provides validation, allows you to create indexes, and use JSON functions to manipulate JSON data. It is a great alternative to TEXT data type.