Define: SQL [pron. “sequel”] – stands for Structured Query Language (SQL), used by databases to model and manage tabular/relational datasets; a set of standardized Data Definition Language (DDL) functions to create tables, views, and define relational schema models, and Data Manipulation Language (DML) to query, insert, and modify data in the tables.
*Read-only select queries are technically part of its Data Query Language (DQL) group. Still, operationally, many refer to it as DML because it can do more than read-only queries.
IBM researchers Donald Chamberlin and Raymond Boyce created SQL in the 1970s to bring to life Edgar Codd’s concept of relational data models. This new language for relational databases made accessing data possible without having to understand how or where the system physically stored it.
Members have spent years developing and proposing ISO standards to encapsulate the SQL specifications. For example, in 1987, the international standards body produced the ISO 9075:1987 specification. Another popular implementation was SQL 92, which refers to the ISO standard of 1992. Additions and modifications have continued as recently as 2016.
Other spin-offs include Chamberlin’s ongoing work with SQL++ for additional modern data applications and analysis.
All major database platforms have adopted SQL as a standard command interface for relational database systems. But it’s not just transactional databases that have an interest in SQL. Streaming platforms and applications have also used it to abstract different kinds of back-end services for developers.
Analytics and NoSQL platforms have also used SQL to make querying easier for end-users, reducing the need for custom coding. All programming languages also have some interface to SQL for developers to use.
A SQL database stores tables of information within a set of row and column definitions known as schemas. Tables group the row values that share common characteristics – e.g., one table of stores, another of sales transactions, etc.
With a SQL database, you can insert new data into these tables and then extract it later using a simple query. Tables can be joined together, grouped, summarized, allowing for on-the-fly computations using their values.
SQL databases focus on storing and persisting data to a disk for long-term access. Hence, a variety of other functions exist to manage tables, storage, and related services.
This focus on standardization has helped databases become a critical part of business today.
A SQL query tells the database how to retrieve or manage its data tables. Different queries exist for selecting rows, creating tables, and inserting or deleting rows.
For example, SELECT queries request specifically named columns from a table and receive back a set of rows:
SELECT COLUMNX FROM TABLE1;
You can also request rows where columns match specific values:
SELECT COLUMNX FROM TABLE1 WHERE COLUMNX = 100;
Join multiple tables virtually in a query when matching values from one table and mapped to another:
SELECT COLUMNA, COLUMNB FROM TABLE1, TABLE2 WHERE TABLE1.ID = TABLE2.STOREID;
Queries also create new tables by defining the data types of each column. This example creates three columns of integer, variable length string characters, and floating-point numbers:
CREATE TABLE SALES (STORENUM INT, NAME VARCHAR, DAILYSALES FLOAT);
Then add some new rows to the new table with an INSERT statement:
INSERT INTO SALES VALUES (1, 'HAPPY MART', 13570.00);
SQL commands can be much more advanced and complex but they are all based on the above basic examples.
SQL databases include enterprise relational databases such as Oracle, Microsoft SQL Server*, IBM DB2, MySQL, PostgreSQL, and many more. Many businesses use these databases as core parts of their solutions and they expect high reliability, high performance, and out-of-the-box security including user role management and encryption.
*Note that the Microsoft product name often confuses people searching for the general term of “SQL database,” so sometimes the terms are mistakenly used interchangeably. “MSSQL” is used to reference it more clearly.
Much has been written about these common databases but there many other systems that use SQL as their language of choice. A variety of these systems are discussed below.
Desktop and Embedded SQL Databases
We are so used to enterprise databases that it is easy to forget about how crucial some other database platforms are (or have been). Desktop databases are file-based and do not include a network layer that connects back to a central source. Users would zip up a file or folder and share it with others as a project or document file.
Early on, desktop developers and analysts turned to Microsoft Access and Dbase as the first SQL-compatible databases. Many data administrators still use Access for building internal forms-based data entry tools.
Similarly, many choose SQLite for embedded and offline scenarios where low overhead is required. It is designed to power standalone applications that may also be portable. To this day, SQLite is a popular choice for developers building mobile applications.
While not running from a centralized real-time service, synchronizing solutions do exist for this sort of technology. Sync solutions allow them to update enterprise data stores if needed, but another programming layer usually delivers it on top of the database itself.
Cloud SQL Databases
Public cloud platform vendors such as Google Cloud, Amazon Web Services, and Microsoft Azure all have hosted database-as-a-service (DBaaS) offerings that use SQL. The vendor focused on providing the backend system while the user focused on coding applications.
They may differ from on-prem versions of similar offerings but have built-in management consoles, monitoring, optimizations, and scalable distributed computing environments.
Cloud vendors may also provide scalable control through container orchestration (e.g., Kubernetes), while others automatically grow or shrink as needed by demand.
Hybrid Cloud Databases
Many enterprises want the flexibility of a SQL-based DBaaS but with the security of having an on-prem backup. Database vendors have taken advantage of the cloud platforms to provide flexible deployment options. Hybrid cloud SQL database deployments store some parts of a distributed database on different platforms or on-prem, spreading the load and reliability.
This topic area continues to develop as cloud database integration matures. Enterprises will always aim to maintain different levels of control depending on the need. Don’t expect this to change anytime soon.
Analytic SQL Databases
Some SQL-based databases are focused on doing more advanced, high volume, or high-performance analytics. These big data analytic databases include on-prem and cloud-based offerings. Examples include Snowflake, Amazon Redshift, and Teradata.
NoSQL Databases
What does NoSQL have to do with SQL? While initially geared toward easing web development with JSON document stores, some NoSQL platforms have integrated SQL as yet another way to index and access data.
NoSQL vendors such as Couchbase, MongoDB, Cassandra, and Redis each have their own approach to querying and interacting with their systems.
Some use third-party ODBC/JDBC drivers to provide a layer on top of their SQL querying system. Others, like Couchbase, build the capability into the system, allowing efficient indexing and flexible query analysis optimizations.
Using SQL on JSON data requires additional syntax beyond standard SQL. For example, additional functions can query objects within an array or look for sub-objects in a complex set of mappings.
For example, Couchbase developed N1QL to extend SQL to allow this kind of interaction. Couchbase also integrates Chamberlin’s SQL++ language in its analytics service.
Other enterprise databases have some functions for manipulating JSON, but they may not be storing and interacting with JSON directly in the database as NoSQL databases do.
It’s important to differentiate between database administrators (DBA) and developers – both have unique requirements for SQL. A DBA will always be proficient in SQL as many maintenance functions require it. However, a developer may only need to know just enough to get data in or out.
To that end, developers can use Object-Relational Mapping (ORM) or Object Data Modeller (ODM) libraries to abstract the programming from the underlying database and its SQL query language. Higher-level methods allow the developer to focus on writing familiar code, while the ORM/ODM libraries create the SQL and manage the tabular data. They also help manage and “migrate” the data model when code changes are required.
Examples include Laravel’s Eloquent ORM, .NET Entity Framework, SQLx, and packages dedicated primarily to handling table migrations, like golang-migrate. Ottoman is an Object Data Modeler (ODM) for Couchbase’s Node.js SDK providing JSON schema and validation for NoSQL (Intro to Ottoman post).
When an application grows into a production system, SQL query efficiency also becomes more critical. A SQL-centric power user or developer will push SQL to the max, identify bottlenecks in the code, and find ways to optimize the queries. They may also be forced to manually handle the SQL code generation instead of using an ORM or other automated system.
GraphQL is an alternative to ORM, addressing some of the inherent challenges of object-relational mapping. It provides a consistent data interaction language regardless of the type of data source. Additional packages such as Prisma provide a database connectivity layer.
In addition to Analytic Databases, many other platforms use SQL for data science applications, reporting, stream processing, and dashboarding.
Spark
Apache Spark’s distributed analytics environment, often used by data scientists, uses various languages, including Spark SQL for accessing distributed file datasets as Data Frames. In this case, you would use SQL to abstract a complex set of agents doing the background work.
The Spark environment is available through web-based platforms such as Jupyter, Zeppelin, and Databricks.
Business Intelligence Dashboards
Data and business analysts using graphical user interfaces to databases are familiar with connecting to backend databases. They do not have to use SQL directly. Instead, business analysts use wizards to choose and analyze relevant values.
The output is a chart, map, or table summary that the user can put into a report. The application itself will often read the tables into memory and transform them for the user’s purpose without exposing the underlying SQL.
These platforms become challenging when datasets get large. At that point, the system may pass a SQL query through to the database (also known as a pass-through query) to crunch any aggregations or computations before sending the data to the client. BI products in this category include Tableau, QlikView, SAP, and more.
Event Streaming SQL
Apache Kafka’s success has helped make it possible to move data around from many sources into a distributed “changelog” messaging system. However, it is not easy to build an application on top of Kafka that does real-time analytics as events pass by. Event streams have SQL-based options to extract value from real-time data.
ksqlDB addresses this by using SQL to create and manage the underlying streams. Users can then create standard queries against both stored data and real-time streams as they are received.
As you can see by the many different uses above, today’s database solutions use SQL pervasively. You can learn its basic patterns quickly and then apply them to many different analytic and operational scenarios. Here are some links to learn more about how Couchbase technology takes advantage of SQL’s flexibility and familiarity.
Previously published at https://blog.couchbase.com/navigating-the-sql-landscape-for-dbas-and-developers/