In March 1971, Intel introduced the world's first general microprocessor, the . It had ~2,300 transistors and cost $60. Intel 4004 Fast forward almost 50 years, and the newest iPhone has nearly transistors (but unfortunately costs a little more than $60). 12 billion Many of the programming languages we use today were not introduced until the 90s (Java was introduced in 1996). However, there is one programming language that is still as popular today as it was when it was introduced nearly 50 years ago: SQL. This article will discuss the events that led to the introduction of relational databases, why SQL grew in popularity, and what we can learn from its success. History of Early Database Management - IDS and CODASYL Image Source In 1962, Charles W. Bachman (no relation to Erlich Bachman) was working as part of a small team at General Electric. One year later, Bachman’s team introduced what would later be recognized as the first database management system - the Integrated Data Store (IDS). 10 years later, Bachman would receive the coveted , often called the Nobel Prize of computer science, for his contributions to computing with IDS. Turing Award What was IDS? In the early 1960s, computer science was just beginning as an academic field. For context, The American Standard Code for Information Interchange (ASCII) was not introduced until 1963. To understand IDS we have to first understand the two main forces that led to its development: The introduction of disk storage A migration to high-level programming languages Disk storage Above: Moving a RAMAC 305 In 1956, IBM introduced the first commercial hard disk drive, the . RAMAC 305 The introduction of disk drives allowed programmers to retrieve and update data by jumping directly to a location on the disk. This was a vast improvement from its predecessor, tape drives, which required moving sequentially through the tape to retrieve a specific piece of data. But developers now had to figure out where records were stored on the disk. Due to the limitations of in early operating systems, this was an advanced task reserved only for experienced programmers. file management systems Developers needed a solution to simplify working with disk drives. High-level programming At the same time, computer science was beginning to move from innovators to early adopters on the . Low-level programming languages like Assembly were popular among the early academics, but regular programmers were switching to higher-level programming languages like COBOL for their usability. adoption curve IDS solved for disk storage and high-level programming. IDS allowed developers to use high-level programming languages, like COBOL, to build applications that input and retrieve data from disk storage. Because of this function, IDS has received the distinction as the first database management system. So what was IDS? CODASYL - A new standard for database management In 1969, the Committee of Data Systems Languages (CODASYL) released a report proposing a standard for database management. Bachman was part of the committee, and the report drew heavily from IDS. The CODASYL Data Model introduced many of the core features in database management systems that we use today: SchemasData definition language (DDL) Data manipulation language (DML) Most importantly, IDS and CODASYL introduced a new way for modeling data that influenced the eventual development of SQL - the . network data model The Network Data Model - Better Than Todays Relational Model? Above: Network model example A data model is a standardized way to describe (model) the world (data). The previous hierarchical data model used tree structures to describe data, but these were limited to one-to-many relationships. The new network model allows records to have multiple parents, which created a . By allowing multiple parents, network models were able to model many-to-one and many-to-many relationships. graph structure In the network model, relationships between tables were stored in . Each set had an (i.e. teachers) and one or more (i.e. classes and students). sets owner members One of the key benefits of the network model is that related records in a set were connected directly by pointers. Sets were implemented using next, prior, and owner pointers, which allowed for easy traversal similar to a linked list. The low-level nature of network data models offered performance benefits, but they came at a cost. The network data model had increased storage costs because every record had to store extra pointers to its previous and parent record. The Arrival of The Relational Model Above: Example of a relational model In 1970, 8 years afters IDS, Edgar F. Codd introduced the relational model in his seminal paper (also earning him a spot with Bachman as a Turing Award recipient). “A Relational Model of Data for Large Shared Data Banks” Codd showed that all data in a database could be represented in terms of ( in SQL) grouped together into ( in SQL). To describe the database queries, he invented a form of first-order predicate logic called . tuples rows relations tables tuple relational calculus Tuple relational calculus introduced a for querying data. Declarative programming languages allow programmers to say they want to do without describing to do it. declarative language what how This new declarative language was much easier to use for developers. The relational model laid out all the data in the open. Developers could retrieve all the data from a table, or read a single row, in one command (thanks ). query optimizer Gone were the days of following a labyrinth of pointers to find your data. Relational vs Network Data Models Relational databases decreased the high storage costs that network databases had by . Normalization was a process of decomposing tables to eliminate redundancy, and thereby decrease the footprint of data on disk. normalizing data However, relational databases had an increased CPU cost. In order to process normalized data, relational databases had to load tables in memory and use compute power to “join” tables back together. Let’s walk through the process of finding all classes and students for a given teacher with a relational model. The database system would first perform an operation to retrieve all relevant classes. Then it would perform a second operation to retrieve student data. All of this data would be stored in memory, and it would run a third operation to merge the data before returning the result. Performance comparison between relational and network models In a performance case study using real data, found that network database models had 23x better insert performance and 123x faster query performance. Raima So why are relational databases the leading database solution? Usability The relational model was more flexible to changes, and its declarative syntax simplified the job for programmers. And Moore’s law was working its magic in the background. The cost of computing continued to decrease, and eventually, the increased computing cost of relational models was outweighed by the productivity gains. Fast forward 50 years, and now the most expensive resource in a data center is the CPU. The Rise and Reign of SQL And finally, we've come to the arrival of the SQL that we all love. 4 years after the publication of Codd’s paper, Donald Chamberlin and Raymond Boyce published “ ”. SEQUEL: A Structured English Query Language They described SEQUEL as “a set of simple operations on tabular structures, […] of equivalent power to the first-order predicate calculus”. IBM saw the potential and moved quickly to develop the first version of SEQUEL as part of their System R project in the early 1970s. The name would later change to SQL due to with the UK-based Hawker Siddeley aircraft company. trademark issues The next big step in SQL's adoption was almost a decade later. In 1986, the American National Standards Institute (ANSI) and the International Organization for Standardization (ISO) published the first official SQL standard: . The standard broke down : SQL-86 SQL into several parts commands to define and modify schemas and relations Data Definition Language (DDL): commands to query, insert and delete information from a database Data Manipulation Language (DML): commands specifying the timing of transactions Transaction Control: commands to set constraints in the information in a database Integrity: commands to define views Views: commands to specify user access Authorization: commands that specify how to embed SQL in other languagesCompetitors to SQL Embedded SQL: Between 1974 to today there have been numerous competitors that have tried stealing market share from SQL’s dominance as a query language. These new syntaxes often catered to a specific, new technology: Lisp -> .NET -> Ruby on Rails -> CLSQL LINQ ActiveRecord Fast forward 35 years and SQL is still ubiquitous with databases. How has SQL maintained its reign as a query language, and what can we learn from its story? The Secret to SQLs 50 Year Reign - And What We Can Learn Above: Stack Overflow Developer Survey, 2017 We started this story with Bachman’s introduction of the first database management system, IDS. We talked about how the shift to disk storage and high-level programming necessitated a new way of working with data. Then CODASYL came and standardized database management. IDS and CODASYL introduced the new network data model, and finally, Codd dropped the relational model. This happened over . eight years How did SQL manage to stick around for the next 50 years? I think these are the four main reasons: Built on first principles Bushnell’s law Listening and adapting Adoption of APIs Built on first principles A first principle is a foundational proposition that can not be deduced from any other proposition or assumption. For instance, combining hydrocarbons with oxygen to create a chemical reaction. And this is still the principle that powers the internal combustion engines in every car. In 1970, Codd created a new first principle for databases: tuple relational calculus. This new logic created led to the relational model, which then led to SQL. Tuple relational calculus is the chemical reaction, relational models are the internal combustion engines, and SQL is the car. In a later post, we’ll discuss new technologies that are trying to create the electric engine for databases. Bushnell’s Law Building on first principles alone can not guarantee success. Assembly is as close as programmers can get to typing 1s and 0s, but it was still replaced with COBOL (and later C). The missing ingredient was usability. We saw the same story play out with the switch from the network to a relational model. The network model had faster performance, but every company today uses relational databases because of its simplicity ( ). to get started "The best games are easy to learn but hard to master" - Nolan Bushnell, founder of Atari Nolan Bushnell knew the secret to getting people to use a new product. Assembly unfortunately was both difficult to learn and difficult to master. SQL found the perfect balance. With ~10 SQL commands, anyone can learn the 20% that will get you 80% of the way there.  But there is a long path of indexing, views, and optimization to becoming a master. Listening and Adapting Query languages are not timeless monoliths but adaptive groups of standards that change over time. The SQL standard has continued to adapt over time and incorporate feedback from its users. Since the original conception, we have seen 10 different SQL standards all with important updates. Here were 3 big ones: Added regular expression matching, recursive queries (e.g. transitive closure), triggers, support for procedural and control-of-flow statements, non-scalar types (arrays), and some object-oriented features (e.g. structured types). Support for embedding SQL in Java (SQL/OLB) and vice versa (SQL/JRT). SQL:1999: Introduced XML-related features (SQL/XML), window functions, standardized sequences, and columns with auto-generated values (including identity-columns). SQL:2003: Adds row pattern matching, polymorphic table functions, JSON. SQL:2016: SQL also demonstrates the power of creating rails that other products can build on. Instead of enforcing a syntax, SQL provides a standard for each database to create its own implementation (T-SQL, MySQL, PSQL, etc). Adoption of APIs The final secret behind SQL’s success has been the rise of application programming interfaces (API). APIs simplify programming by abstracting the underlying implementation and only exposing objects or actions the developer needs. APIs allow SQL to continue to adapt to new technologies with specialized syntax. In 2006 Hadoop introduced the distributed file system (HDFS), which was initially inaccessible to the SQL syntax. Then in 2013 Apache created Apache Impala, which allowed developers to use SQL to query HDFS databases. The Fascinating Story of SQL SQL is one of the most ubiquitous programming languages today, but we often forget how long of a history it has had. Its journey started at the dawn of modern computing and was brought to life by 2 Turing award recipients. I've shared my thoughts on why SQL has been able to maintain its dominance: first principles, Bushnell’s law, adapting, and APIs. Leave a comment and let me know what you think were the major factors that contributed to SQL’s success. There is one more technology to talk about that hasn’t changed in 50 years. . SQL editors Working with databases is becoming an increasingly collaborative process as more people learn SQL. Developers today may be seen working with individuals on a marketing team to analyze user data or debugging queries with a data scientist. Arctype has built a collaborative SQL editor that allows you to easily share databases, queries, and dashboards with anyone. Join our growing community and . try out Arctype today Previously published at: https://blog.arctype.com/sql-50-years/ .

Flow

Apache

Assembly

General Electric

SQL's 50 Year Reign: Here's Why SQL Is Still Relevant Today

About Author

Comments

TOPICS

THIS ARTICLE WAS FEATURED IN

Related Stories

10 Ways to Optimize Your Database

10 Ways to Reduce Data Loss and Potential Downtime Of Your Database

10 Principles of Proper Database Benchmarking

10 Minute Guide to Fixing Damaged SQL Databases - No Recovery Required!

10 Best Online Courses to Learn Oracle and PL/SQL for Beginners

121 Stories To Learn About Databases

10 Ways to Optimize Your Database

10 Ways to Reduce Data Loss and Potential Downtime Of Your Database

10 Principles of Proper Database Benchmarking

10 Minute Guide to Fixing Damaged SQL Databases - No Recovery Required!

10 Best Online Courses to Learn Oracle and PL/SQL for Beginners

121 Stories To Learn About Databases

Light-Mode

Classic

Newspaper

Minty

Dark-Mode

Neon Noir

Minty

HN StartUps