What Is the Difference Between Enum, Varchar and Int Data Types in MySQL

Written by akankov | Published 2022/08/12
Tech Story Tags: mysql | enum | varchar | database | sql-database | database-design | software-development | software-engineering

TLDRThe 'varchar' and 'enum' fields are different types of data types. They are different in terms of performance, easiness to use, and support for feature changes. An integer is not human-readable, and you have to keep the mapping between numbers and words in your code. The correct order of options in the fields should match your expected sorting behavior. The same story with enums takes the same byte consumption as with an intintintint, and it might bring some problems to beginners.via the TL;DR App

Quite often in my practice, I see people use varchar fields with a small and limited number of available options.

Let's check out today what field data type is better and will be the best approach for your project.

Let's imagine us designing a database for a CMS. Just pretend we don't have any of them available at every corner, and we need to create the best example.

To begin with our database, we will start with the main table post. I'm now omitting the author field because this is our blog, and we don't have any plans to add additional authors except for us ;)

CREATE TABLE post (
    id         INT          NOT NULL AUTO_INCREMENT,
    title      VARCHAR(255) NOT NULL,
    status     ENUM ('archived', 'deleted', 'draft', 'published') DEFAULT 'draft',
    content    TEXT         NOT NULL,
    created_at DATETIME     NOT NULL                              DEFAULT CURRENT_TIMESTAMP,
    updated_at DATETIME     NOT NULL                              DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (id),
    KEY (status)
)
    ENGINE = InnoDB;

I can imagine three ways how to describe the status field.

The first one is in the example above.

status VARCHAR(255) NOT NULL default 'draft'

The next one will be enum:

status ENUM ('archived', 'deleted', 'draft', 'published') DEFAULT 'draft',

And the last one is an integer:

status TINYINT(1) UNSIGNED NOT NULL DEFAULT '0'

Let us compare these three variants in terms of performance, easiness to use, and support for feature changes. Setting initial scores for each option.

varchar_score = 0
enum_score = 0
integer_score = 0

Usability for developers/humans

Varchar and enum approaches look pretty much the same for a human eye. On the other hand, the integer is not human-readable. Then you see status=1, you have no way to say what status is just by looking into the database, and you have to keep the mapping between numbers and words in your code.

Updating the scores:

varchar_score += 1
enum_score +=1

Sorting and querying.

Internally, MySQL uses numbers and null to store values of enums.

Internal value

DisplayedValue

null

null

0

''

1

'archived'

2

'deleted'

3

'draft'

4

'published'

And now imagine your enum like that:

enum('draft', 'published', 'deleted', 'archived')

Ordering of the data will work based on internal representation - first will go values with draft next - deleted, draft, and published. This behavior is confusing the first time until you realize the reason for that. The correct order of options in the enum should match your expected sorting behavior.

For alphabetic sorting, it should be like this:

ENUM ('archived', 'deleted', 'draft', 'published')

Considering that, it looks like enums might bring some problems to beginners. But you simply have to keep in mind that enum is just an integer under the hood.

Space consumption

Tinyint takes 1 byte. The same story with enum - it also takes 1 byte of storage. Technically, both of them might be extended to 2 bytes, but I cannot imagine a real-life case where you will need more than 255 options.

For varchar, it's a bit more complicated. It will depend on the character set. For example, latin1 will take one byte per character, and utf8mb4 can require up to four bytes per character. Plus one byte to record the length of the string.

The set of statuses archived, deleted, draft, and published will take from 6 to 10 bytes in the latin1 character set.

The final amount of storage will depend on the distribution of the values, but anyway, varchar will take much more space.

Also, you should consider not only the storage size for the data but the storage size for the indexes. And indexes for varchar will take more space as well.

enum_score += 1
integer_score +=1

Database migrations

This case is pretty simple. For varchar and integers, you don't need to change the schema to add a new possible option. On the other hand, adding a new option to an enum will require changes to the schema. And depending on how you are going to do it - it might lead to database downtime, which is not acceptable.

varchar_score += 1
integer_score +=1

Query performance

To make some experiments let's spin up the MySQL database inside the Docker container and compare the results.

docker run --name blog_mysql -e MYSQL_ROOT_PASSWORD=secret -e MYSQL_DATABASE=blog -d mysql:8
docker exec -it blog_mysql mysql -u root -psecret blog

Creating three very similar tables. The only difference will be in the status field.

CREATE TABLE post_enum (
    id         INT          NOT NULL AUTO_INCREMENT,
    title      VARCHAR(255) NOT NULL,
    status     ENUM ('archived', 'deleted', 'draft', 'published') DEFAULT 'draft',
    content    TEXT         NOT NULL,
    created_at DATETIME     NOT NULL                              DEFAULT CURRENT_TIMESTAMP,
    updated_at DATETIME     NOT NULL                              DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (id),
    KEY (status)
)
    ENGINE = InnoDB
    COLLATE utf8_bin;
    
    
CREATE TABLE post_varchar (
    id         INT          NOT NULL AUTO_INCREMENT,
    title      VARCHAR(255) NOT NULL,
    status     VARCHAR(10)           DEFAULT 'draft',
    content    TEXT         NOT NULL,
    created_at DATETIME     NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at DATETIME     NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (id),
    KEY (status)
)
    ENGINE = InnoDB
    COLLATE utf8_bin;

CREATE TABLE post_int (
    id         INT          NOT NULL AUTO_INCREMENT,
    title      VARCHAR(255) NOT NULL,
    status     TINYINT(1)   NOT NULL DEFAULT 0,
    content    TEXT         NOT NULL,
    created_at DATETIME     NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at DATETIME     NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (id),
    KEY (status)
)
    ENGINE = InnoDB
    COLLATE utf8_bin;

SHOW TABLES;
+----------------+
| Tables_in_blog |
+----------------+
| post_enum      |
| post_int       |
| post_varchar   |
+----------------+

Let's have some fun, shall we? Not many people are doing MySQL procedures, but let's try some.

DROP PROCEDURE IF EXISTS add_post_varchar;
DROP PROCEDURE IF EXISTS add_post_enum;
DROP PROCEDURE IF EXISTS add_post_int;
DELIMITER $$

CREATE PROCEDURE add_post_varchar(number_of_posts INT)
BEGIN
    DECLARE loop_counter INT UNSIGNED DEFAULT 0;
    DECLARE post_content TEXT DEFAULT 'Lorem Ipsum is simply dummy text of the printing and typesetting industry.';
    DECLARE post_title VARCHAR(255) DEFAULT 'Lorem Ipsum';

    WHILE loop_counter < number_of_posts
        DO
            INSERT
            INTO
                post_varchar (title, status, content)
            SELECT
                post_title,
                ELT(0.5 + RAND() * 4, 'archived', 'deleted', 'draft', 'published'),
                post_content;

            SET loop_counter = loop_counter + 1;
        END WHILE;
END $$

CREATE PROCEDURE add_post_enum(number_of_posts INT)
BEGIN
    DECLARE loop_counter INT UNSIGNED DEFAULT 0;
    DECLARE post_content TEXT DEFAULT 'Lorem Ipsum is simply dummy text of the printing and typesetting industry.';
    DECLARE post_title VARCHAR(255) DEFAULT 'Lorem Ipsum';

    WHILE loop_counter < number_of_posts
        DO
            INSERT
            INTO
                post_enum (title, status, content)
            SELECT
                post_title,
                ELT(0.5 + RAND() * 4, 'archived', 'deleted', 'draft', 'published'),
                post_content;

            SET loop_counter = loop_counter + 1;
        END WHILE;
END $$
CREATE PROCEDURE add_post_int(number_of_posts INT)
BEGIN
    DECLARE loop_counter INT UNSIGNED DEFAULT 0;
    DECLARE post_content TEXT DEFAULT 'Lorem Ipsum is simply dummy text of the printing and typesetting industry.';
    DECLARE post_title VARCHAR(255) DEFAULT 'Lorem Ipsum';

    WHILE loop_counter < number_of_posts
        DO
            INSERT
            INTO
                post_int (title, status, content)
            SELECT
                post_title,
                ELT(0.5 + RAND() * 4, '1', '2', '3', '4'),
                post_content;

            SET loop_counter = loop_counter + 1;
        END WHILE;
END $$

DELIMITER ;

And create 200 000 records in each table.

call add_post_int(200000);
call add_post_enum(200000);
call add_post_varchar(200000);
SELECT COUNT(*), status FROM post_enum GROUP BY status
UNION
SELECT COUNT(*), status FROM post_int GROUP BY status
UNION
SELECT count(*), status FROM post_varchar GROUP BY status;
+----------+-----------+
| COUNT(*) | status    |
+----------+-----------+
|    50058 | archived  |
|    50229 | deleted   |
|    50273 | draft     |
|    49440 | published |
|    50123 | 1         |
|    49848 | 2         |
|    49642 | 3         |
|    50387 | 4         |
|    49885 | archived  |
|    49974 | deleted   |
|    50060 | draft     |
|    50081 | published |
+----------+-----------+

Now we have almost even distribution.

Let's run some queries. First, enable profiling:

SET profiling=1;

Next run queries which will use only the index to obtain the data and will not go to the heap.

SELECT SQL_NO_CACHE COUNT(*), status FROM post_enum GROUP BY status;
SELECT SQL_NO_CACHE COUNT(*), status FROM post_varchar GROUP BY status;
SELECT SQL_NO_CACHE COUNT(*), status FROM post_int GROUP BY status;

And finally, let's check our performance.

Query_ID

Duration

Query

1

0.05163150

SELECT SQL_NO_CACHE COUNT(*), status FROM post_enum GROUP BY status

2

0.05172150

SELECT SQL_NO_CACHE COUNT(*), status FROM post_varchar GROUP BY status

3

0.05893025

SELECT SQL_NO_CACHE COUNT(*), status FROM post_int GROUP BY status

Every scenario works at the same speed. Updating the scores:

varchar_score += 1
enum_score += 1
integer_score += 1

Conclusion

Our final scores are:

Option

Score

integer_score

3

varchar_score

3

enum_score

3

Turns out all our scores are the same. I didn't expect it to be so in the beginning. Honestly, I expect enums to win this battle. But enums have some specific behavior in sorting, which is not a big deal if you know your tools, but it might confuse a beginner.

Enums have advantages in being human-friendly and taking less space. Considering all of that, I'd recommend using enums for options like this.

And may the fast queries be with you ;)


Written by akankov | Convert coffee into code )
Published by HackerNoon on 2022/08/12