This series of articles has been a lot of fun for me as I have learned about and explored new technology. It’s also fun to see what has been catching on since I first discovered it. My last article on Apache Paimon was incredibly popular, much to my surprise, but it seems I wasn’t the only one interested in what the heck it was. Thanks to that article, I ran across Proton, an open-source, Apache 2.0-licensed project sponsored by Timeplus. It’s a SQL database that accommodates both historical and streaming data. Written in C++ and powered by ClickHouse, it focuses on simplicity and performance. Because it ships as a single executable, installation is simple.

A trend I’ve seen is that more and more real-time analytics applications are being built, but you don’t want to build them twice: once for streaming and once for the historical backfill. There would be definite advantages to having a single platform that could query in either batch or streaming mode, or even in a hybrid mode where you join historical data to a stream of incoming data. It appears that Proton was built to do just that.

Proton Overview

In a nutshell, you have a ClickHouse database to which Timeplus has added streaming support. That should give you a Flink-like query engine and Kafka-like streaming storage alongside the ClickHouse database. So, what does that look like?

[Figure: Proton architecture]

The dotted line is where Proton comes in. I suggest reading through the docs to get a good sense of what is possible.

To create a random stream of data and query it with Proton, we can do something like this:

-- Create a stream with random data.
CREATE RANDOM STREAM devices(device string default 'device'||to_string(rand()%4), temperature float default rand()%1000/10);

-- Run a long-running streaming query.
SELECT device, count(*), min(temperature), max(temperature) FROM devices GROUP BY device;

┌─device──┬─count()─┬─min(temperature)─┬─max(temperature)─┐
│ device0 │    2256 │                0 │             99.6 │
│ device1 │    2260 │              0.1 │             99.7 │
│ device3 │    2259 │              0.3 │             99.9 │
│ device2 │    2225 │              0.2 │             99.8 │
└─────────┴─────────┴──────────────────┴──────────────────┘

Proton Features

Proton has many nifty features; one that struck me immediately was the ability to create a materialized view to save specific events in Proton. Borrowing from the documentation, let’s say you have a Kafka stream reporting web events, and you want to save the broken-link reports so you can query them later, even if Kafka is down or the events have been removed. It would look something like this:

create materialized view mv_broken_links as
select raw:requestedUrl as url, raw:method as method, raw:ipAddress as ip,
       raw:response.statusCode as statusCode, domain(raw:headers.referrer) as referrer
from frontend_events
where raw:response.statusCode <> '200';

Then you can query the materialized view directly, either as a live stream or as historical data. The historical query below also turns the counts into a bar chart:

-- streaming query
select * from mv_broken_links;

-- historical query
select method, count() as cnt, bar(cnt, 0, 40, 5) as bar
from table(mv_broken_links)
group by method
order by cnt desc;

┌─method─┬─cnt─┬─bar─┐
│ GET    │  25 │ ███ │
│ DELETE │  20 │ ██▌ │
│ HEAD   │  17 │ ██  │
│ POST   │  17 │ ██  │
│ PUT    │  17 │ ██  │
│ PATCH  │  17 │ ██  │
└────────┴─────┴─────┘

Some of this functionality reminds me of Upsolver, a company I worked at a few years ago.

Drivers are available for Java, Go, and Python. Pairing Proton with something like Redpanda would make for a minimal-footprint stack for both streaming and historical data.

There are a lot of other features available, but this isn’t meant to be a tutorial; I just want to give a light explanation and draw attention to a few features. The docs are concise and, overall, well written, certainly better than those of many open-source projects.

Summary

While I don’t personally need this kind of setup at the moment, I’ve certainly worked at, and seen, companies where it would be very, very cool to have. As cool as this guy? Probably not, but then again, nothing is :).

Frivolity aside, the Proton team has done an excellent job documenting the project and making it as simple to install and use as possible. I love these single-binary projects that don’t need a vast Java ecosystem with tons of dependencies.

Make no mistake, though: Timeplus has a commercial version that offers more capabilities than the stock Proton release. However, the company seems very supportive of Proton and welcoming of the community.

Check out my other What the Heck is… articles at the links below:

What The Heck Is DuckDB?
What the Heck Is Malloy?
What the Heck is PRQL?
What the Heck is GlareDB?
What the Heck is SeaTunnel?
What the Heck is LanceDB?
What the heck is SDF?
What the Heck is Paimon?

What the Heck is Proton?