I follow some very bright people on Linkedin and Twitter (about the only reason I’m on Twitter), and I learn about interesting tech and interesting use cases by random chance.
The latest is
“Malloy is an experimental language for describing data relationships and transformations. It is both a semantic modeling language and a querying language that runs queries against a relational database. Malloy currently connects to BigQuery and Postgres, and natively supports DuckDB. We've built a Visual Studio Code extension to facilitate building Malloy data models, querying and transforming data, and creating simple visualizations and dashboards.”
Malloy was developed by
"feels like JSON made a baby with SQL, in the worse possible way"
To be clear, I know about Looker, but I haven’t done any real work with it. So that brings us to Malloy.
Lloyd describes Malloy on the
“an experimental language for describing data relationships and transformations. It is both a semantic modeling language and a querying language that runs queries against a relational database.” also stating: “SQL is complete but ugly…Everything is expressible, but nothing is reusable; simple ideas are complex to express; the language is verbose and lacks smart defaults. Malloy is immediately understandable by SQL users, and far easier to use and learn.”
Malloy natively supports
First, a couple of examples taken from the docs so you can see what a Malloy description looks like and the SQL it generates:
query: table('malloy-data.faa.airports') -> {
group_by: fac_type
aggregate: airport_count is count()
where: state = 'CA'
order_by: airport_count desc
}
SELECT
base.fac_type as fac_type,
COUNT( 1) as airport_count
FROM `malloy-data.faa.airports` as base
WHERE base.state='CA'
GROUP BY 1
ORDER BY 2 desc
This will translate to this SQL:
And from the dataset, it will produce this result:
Note that you can display this as HTML, JSON, or SQL. The basic structure of a Malloy Query takes the form of:
query: <source> {
join_one: <source> with …
join_many: <source> on …
} -> {
group_by:
<field/dimension>
<field/dimension>
aggregate:
<aggregation/measure>
<aggregation/measure>
nest:
<named_query OR query_def>
<named_query OR query_def>
where: <filter_expression>, <filter_expression>, …
having: <aggregate_filter_expression>, <aggregate_filter_expression>
order_by: <field/dimension>, <aggregation/measure>, …
limit: <limit>
}
This maps to the following SQL query structure:
SELECT
<group_by>, <group_by>, …
<aggregate>, <aggregate>, …
<nest>, <nest>, … -- very much a simplification; read more in Nesting Queries doc.
FROM <source>
LEFT JOIN <source> ON …
LEFT JOIN <source> ON …
WHERE (<filter_expression>) AND (<filter_expression>) AND …
GROUP BY <group_by>, <group_by>, …
HAVING <aggregate_filter_expression> AND <aggregate_filter_expression> AND …
ORDER BY <group_by> | <aggregate>
LIMIT <limit>
If you don’t know Mimoune Djouallah, you should definitely follow him on
So, I’m going to summarize what Mimoune has going on here on his
In the left-hand pane, we have the schema information, which is pretty obvious. Then to the right of that on top is our Malloy Query, in which you can see the nested aggregations easily described in just a few lines.
Below that is the Semantic Model, and to the right is the generated bar chart that was created from the query. You can try out various saved queries from the “Queries” on the top row. Here is what is really wild, however, look at the below screenshot:
That is the generated SQL from the Malloy Query, which was only 6 actual lines of code; it replaced 40 lines of SQL. You can see the embedded DuckDB connection that is accessing the various Parquet files that have the data for the query in the Semantic Model.
It’s totally worth clicking into the Fiddle and playing around a bit. There is another really fun Fiddle from Lloyd that connects to the IMDB dataset that is absolutely worth checking out
What I really like here is how you can create nested data sets that Mallow will run together in a single query; you can aggregate pretty much anything into a single query and simplify your SQL.
The main purpose of this article is to bring Malloy and some ideas to your mind; it’s not meant to be an exhaustive tutorial by any means. Yes, it’s another language, but it’s similar enough that it should be easy to pick up on.
What really excited me about this whole affair was doing full analytics and visualizations on a ton of data in a browser, with no servers running, with DuckDB and Malloy. I’ve seen a lot of use cases where analysts would love to have this speed and ease of use.
I hope I got you thinking a little bit and you’ll check it all out, but keep in mind that Malloy isn’t an actual product at this time and no one is offering support.