By Isaac Godfried | founder/data scientist
Hi everyone and welcome back to our series. A few weeks ago we published a story on how we developed a Kafka “backbone” to ingest USGS flow information. This week we follow up by showing how you can use that Kafka producer in conjunction with NodeJS and SocketIO to push real-time updates to your clients. In our specific case the updates will not be anywhere near real time due to the USGS API; we nonetheless decided to implement it in a real-time-“ready” fashion in case the USGS speeds up their API (unlikely), to make it easier to port to other applications, and because it’s simply good experience to have.
To get started we created a simple NodeJS app with Express: an index.js containing the core of our code and an index.html file. To use Kafka with Node you need to install a Kafka JavaScript library with NPM. We recommend kafka-node, as it worked fairly well for us. Install it with: npm install kafka-node. Warning: DO NOT simply run npm install kafka, as that installs a very old Kafka package that has not been updated in 3+ years and will throw errors when it tries to connect to your topic.
Once we had the library set up, we created a very simple function to send a SocketIO message. We then initialized a Kafka consumer with the kafka-node library and wrote a consumer.on('message') handler to call the SocketIO function. We also added some simple code to save each message for persistence. If this seems confusing, don’t worry: the full code is below, and you’ll see how simple it is.
The code is self-explanatory: on each message from Kafka we call SocketIO, which emits the message to the client. I also included some code to save the message. In reality you would probably save it to Redis, SQL, or something else, but this is just my impromptu archive method.
The next step for us was sending the messages to the client in the web browser. Luckily this is fairly trivial using SocketIO.
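On the client side this amounts to a couple of lines in index.html. A minimal sketch, assuming the server emits an event named flow-update (match whatever name your consumer uses) and that updateMap is your own handler:

```html
<!-- socket.io.js is served automatically by the socket.io server -->
<script src="/socket.io/socket.io.js"></script>
<script>
  var socket = io(); // connects back to the host that served this page
  socket.on('flow-update', function (msg) {
    // msg is the USGS JSON pushed from the Kafka consumer
    updateMap(msg); // your own map-update function
  });
</script>
```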
Now we wanted to update our map using the JSON message from the USGS. For this we used the very popular data visualization library D3js. On our D3js map, each river is a circle whose color reflects whether the river’s flow is high or low (you can see http://rivermaps.herokuapp.com for an example). We wanted to take the information in the message received from SocketIO and update the map in real time. To do this we created an update function to be called on the socket event. The update function changes the color of the circle (i.e. the river) by selecting the relevant circle based on the latitude/longitude in the JSON message. If this sounds confusing then don’t worry, because like last time the code is actually much simpler.
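The update function could be sketched like this. The field names on the message (lat, lng, flow, threshold), the data-lat/data-lng attributes on the circles, and the flowColor helper are all assumptions for illustration; adapt them to however your map binds its data.

```javascript
// Hypothetical helper: pick a circle color from the reported flow.
function flowColor(flow, threshold) {
  return flow >= threshold ? 'steelblue' : 'brown';
}

// Called on every SocketIO event. Selects the circle whose coordinates
// match the message and recolors it. Assumes d3 is loaded globally and
// each circle carries data-lat/data-lng attributes.
function updateRiver(msg) {
  d3.selectAll('circle')
    .filter(function () {
      return this.getAttribute('data-lat') === String(msg.lat) &&
             this.getAttribute('data-lng') === String(msg.lng);
    })
    .transition()
    .attr('fill', flowColor(msg.flow, msg.threshold));
}

// Wiring it to the socket event would then be one line, e.g.:
// socket.on('flow-update', updateRiver);
```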
Pretty simple, all things considered! In summary, to build a program like this you need a Kafka producer (in whatever language suits you best), a Kafka consumer in NodeJS that calls SocketIO, and an update method for your graph that SocketIO calls upon receiving a message.
Some final thoughts:
It seems that the inclusion of SocketIO is almost redundant. I wonder if you could send the message just using kafka-node?
Although this works, for the user to derive any benefit (in our example) they would have to stay on the page for more than an hour without refreshing. It would be much more interesting to see a case with actual real time data updating every second. If only the USGS would get their API running faster!
It would also be nice to tie in an example of real-time ML and stream processing with Flink at some point. Who knows, maybe an article on real-time Twitter sentiment analysis and geographic mapping is in the future.
Anyways, that is all for today! Hope this was helpful to anyone who came here looking for information on getting their own real-time system up and running.