paint-brush
How to Set Up a Dedicated Database Server for Analyticsby@alejandroduarte
1,431 reads
1,431 reads

How to Set Up a Dedicated Database Server for Analytics

by Alejandro DuarteMay 18th, 2022
Read on Terminal Reader
Read this story w/o Javascript

Too Long; Didn't Read

If you happen to have an old laptop or machine lying around and gathering dust, you might want to repurpose it. Does a fast SQL database for analytical processing (for example to generate ad-hoc reports) sound good? If so, keep reading! In this article, I’ll show you how to dust off an old laptop and repurpose it as a dedicated database server for analytics on top of Rocky Linux and Docker.

Companies Mentioned

Mention Thumbnail
Mention Thumbnail
featured image - How to Set Up a Dedicated Database Server for Analytics
Alejandro Duarte HackerNoon profile picture


If you happen to have an old laptop or machine lying around and gathering dust, you might want to repurpose it. Does a fast SQL database for analytical processing (for example to generate ad-hoc reports) sound good? If so, keep reading!


In this article, I’ll show you how to dust off an old laptop and repurpose it as a dedicated database server for analytics on top of Rocky Linux and Docker.


I used this Lenovo Thinkpad T440s with an Intel® Core™ i7 processor, 8 GB DDR3L-SDRAM, and 256 GB SSD:



Why Rocky Linux and Docker?

Rocky Linux is the continuation of CentOS, an open-source enterprise operating system compatible with Red Hat Enterprise Linux (RHEL).


It offers good security and administration features, and since it’s based on RHEL, it has a vast and active community around it.


Docker is virtualization software that runs on top of Linux systems without virtualizing hardware. This optimizes the available resources while maintaining the advantages of isolated environments.


You can set up Docker on your regular everyday machine if you prefer, but if you have an old laptop with the minimal hardware required to run Rocky Linux, I suggest taking advantage of it.

Before You Start

Before you start, make sure to back up any important files that you might have in your old machine.


Use online tools like Google Drive, DropBox, or external hard drives. Assign 1 or 2 hours exclusively to this task. Explore the file system and avoid unpleasant surprises later.

Installing Rocky Linux

I won’t go through the details on how to install Rocky Linux. Consult the official documentation if you want to see detailed instructions.


However, make sure to configure a Wi-Fi connection with DHCP (or Ethernet if you prefer) in the installation GUI.


You don’t need to create additional users, root is enough. Also, don’t install the Server with the GUI option. Instead, I recommend selecting Custom Operating System and adding the following software:


  • Headless Management
  • Security Tools
  • System tools



Follow the installation instructions to complete the process.

Configuring a Wi-Fi Connection With Static IP


If you configured a Wi-Fi connection during the installation, you will have to perform some additional steps.


  1. Connect the machine to your router using an Ethernet cable.


  2. Run encli connection and take note of the Ethernet connection name. In my case, it was enp0s25.


  3. Enable the connection by running nmcli connection up enp0s25 (use the name of your connection).


  4. Install the missing Wi-Fi utility by running yum install NetworkManager-wifi

  5. Disconnect the Ethernet cable.


  6. Restart the machine by running reboot.


  7. Once the machine restarts, login as root and check that the Wi-Fi connection is working by running ping google.com.


  8. Install the network management UI using dnf install NetworkManager-tui.


  9. Run nmtui and edit the Wi-Fi connection to use a manual static IP. I set the server address as 192.168.1.200/24 and added the gateway and DNS servers that I got from another (macOS) machine in the same network using ifconfig. Use ip addr on Linux or ipconfig on Windows. Here’s an example:




Preventing the Laptop From Suspending When the Lid is Closed

To avoid the laptop automatically going into a suspended mode when the lid is closed, edit the /etc/systemd/logind.conf file and set the following properties (you can use vi or install GNU nano using yum install nano):


[Login]
HandleLidSwitch=lock
LidSwitchIgnoreInhibited=no


To preserve the previous changes on reboot, run systemctl restart systemd-logind.service.


Double-check that everything works by rebooting your machine.

Connecting to the Server Using SSH

At this point, you can move to your day-to-day machine and connect to the new server using SSH:


ssh [email protected]


You can also assign a hostname on your client machine. Edit the /etc/hosts file (Unix-like machines) or the c:\Windows\System32\drivers\etc\hosts file (on Windows), and add the following line (use any name you want but specify the static IP address that you configured before):


192.168.1.200 pickanyname.local


You’ll have to do this on every machine that you use to connect to the server if you want to use the hostname instead of the IP address. After this, you’ll be able to connect to the server as follows:


ssh [email protected]

Installing Docker

After connecting to the server via SSH, install docker by running the following commands:


yum install -y yum-utils
 
yum-config-manager \
    --add-repo \
    https://download.docker.com/linux/centos/docker-ce.repo
 
yum install docker-ce \
    docker-ce-cli \
    containerd.io \
    docker-compose-plugin


To make the Docker service automatically start when the machine is restarted, run these commands:


systemctl enable docker.service
 
systemctl start docker.service

Installing the Database

With Docker ready, it’s easy to install software while maintaining your server tidy. The software runs in isolated environments called containers. You can create containers from custom or ready-to-use images.


An image is a template with instructions on how to create a container. You can start, restart, stop, and even delete containers when you don’t want them in your system anymore. All this without leaving garbage in your host operating system (Rocky Linux, in this article).


A database is a good candidate to run in an isolated environment. Execute the following command to start a container that runs a MariaDB server with the ColumnStore engine included:


docker run --detach \
    --name mariadb-cs \
    --restart unless-stopped \
    --env MARIADB_ROOT_PASSWORD='password' \
    --publish '3306:3306/tcp' \
    --expose '3306' \
    mariadb/columnstore


This downloads the MariaDB/columnstore image from Docker Hub (a public registry of Docker images maintained by Docker Inc.).


After downloading the image, Docker creates a new container with the name MariaDB-cs and proceeds to start the container, effectively starting the MariaDB database server on port 3306.


The command also exposes the port to the outside world so that you can connect to the database, for example, from your development machine.

Preparing a Demo Database for Analytics

What you just installed, allows you to use the MariaDB ColumnStore storage engine when you create a table. For example:


CREATE TABLE some_table(
    ... column definitions here ...
) ENGINE=ColumnStore;


This tells the MariaDB database to use the ColumnStore engine to store the data in a columnar fashion. This engine is designed for big data scaling and can process petabytes of data freeing you from defining and maintaining indexes. This makes it ideal for ad-hoc queries on very large data sets.


You can still use the default InnoDB engine suitable for transactional workloads when needed. Just omit the ENGINE option. You can join tables that use ColumnStore with tables that use InnoDB in a single SQL statement.


You can also use replication between a transactional node (InnoDB) and an analytical one (ColumnStore) to enable hybrid transactional-analytical processing (HTAP) workloads.


The best way to experiment with ColumnStore is by trying it out using an existing data set.


Check this Github repository that loads 5 million rows into a table that you can use to run analytical queries and see by yourself the performance of ColumnStore.