This article shows how to apply Node.js Stream and a bit of Reactive programming to a real(tm) problem. The article is intended to be highly practical and oriented for an intermediate reader. I intentionally omit some basic explanations. If you miss something try to check the API documentation or its retelling(e.g.: this one)
So, lets start from the problem description. We need to implement a simple web scraper which grabs all the data from some REST API, process the data somehow and inserts into our Database. For simplicity, I omit the details about the actual database and REST API(in real life it was the API of some travel fare aggregator website and a Pg database)
Consider we have two functions(code of the IO simulator functions and the other article code is here):
For this article we will ignore the errors which can occur. Maybe I will describe it in next article.
Also there are some requirements about the data processing. Let’s say that we need to remove all the items which id contains number 3. And we need to extend with current timestamp all the items which id contains number 9 (e.g.: {id: 9} -> {id: 9, timestamp: 1490571732068}
). These requirements are silly but similar to real ones which appear in real web scrapers
Time to get hands dirty! Some naive implementation can look like this:
What are the problems with such code?
So, you may already guess that there is a better way to solve this problem with Node Streams. Let’s split out problem for three: input, output and processing.
So, our Readable Stream can look like this:
Looks a little bit heavy, but it’s a standard interface. As a chunk of data we will use a list of thousand items. The important thing here is objectMode: true
– we want to operate with objects instead of binary data.
Now the output part. We need to implement the Writable Stream. Something like this:
Some important bits here:
objectMode
— we want to operate with objectshighWaterMark
— the size of our buffer… in objects. You should be careful with it because there is no direct conection between quantity and real size in bytes. In our case one object is a list of some size_writev
— ‘explains’ how should it write more than one chunk from a buffer at one time.And now we can connect them:
https://gist.github.com/9e3174a05adb55a5fbb750b5e972bc0e
From my perspective the code doesn’t require any comments. We show the data flow and hide the details of implementation until you ask. Also, it’s definitely effective:
And for the dessert we will implement data processing. We can manually implement a Transform stream, but there is no fun with it. We will use a library called Highland.js which gives us an ability to use well known functional programming primitives(.filter, .map, etc.) with our streams. Actually, Highland.js is much bigger than just .map and .filter but I don’t want to mute the scope of the article. So, transformation can look like this:
Much the same as with simple JS lists. We need .flatten()
and .batchWithTimeOrCount(100, 1000)
because our streams operate on list of items instead of individual items.
So, that’s all. I hope this article gives you some motivation to learn Node Stream and Highland.js
Hacker Noon is how hackers start their afternoons. We’re a part of the @AMIfamily. We are now accepting submissions and happy to discuss advertising & sponsorship opportunities.
To learn more, read our about page, like/message us on Facebook, or simply, tweet/DM @HackerNoon.
If you enjoyed this story, we recommend reading our latest tech stories and trending tech stories. Until next time, don’t take the realities of the world for granted!