Image courtesy of the blog amon.cx

A couple of days ago, I posted about my fun side project, called TopHN: basically, a real-time display of the top Hacker News stories. That article proved to be really popular amongst the hacker community, and I got plenty of messages and tweets about it asking me to expand upon my project.

I am more than happy to do this, so I thought I would publish two or three more blog posts which delve a bit deeper into the code and how I built the project.

The Components

There are basically two main sections to my TopHN project, and they run on two separate virtual servers.

One is my RethinkDB server, which I set up on Digital Ocean's $10/month VPS running Ubuntu Linux. This server also runs a short (~100 lines) Node.js app whose only job is to read the data from the Hacker News Firebase feed and store a replicated copy in RethinkDB.

The other server is the 'presentation layer'. It also runs a small Node.js app, which serves up the front page of the site and listens to the RethinkDB server in real time, pushing any changed information to the web browsers that are connected.

Today, I want to talk about the first server, which hosts my RethinkDB database and the 'feeder' Node.js app that grabs data from Hacker News.

Installing RethinkDB

I won't go into the actual installation steps too much here, because there are already great instructions on the RethinkDB site. Basically, once you have a Digital Ocean (or any other) Linux VPS set up, you simply need to get root console access and follow the step-by-step instructions on the RethinkDB site to get up and running. It is really easy, which is one of the reasons I went with this database for this project.

Once you have installed and started the RethinkDB server, you can access the control panel by going to http://<your VPS IP Address>:8080 (8080 is RethinkDB's default web UI port).

Tip: I would secure this management portal behind a reverse proxy and set an admin password, so that random people on the internet who discover your RethinkDB server can't just log in and manipulate your data. Once again, excellent instructions for doing so are on their site. I highly recommend that you do this before carrying on, but if time is short, you can continue with the instructions in this post and come back to this later.

Setting Up The Data Tables

There are a couple of ways you can do this: via code, or manually in the management console. I am going to do it in the console because it is really a one-off exercise, and it saves a few lines of code which may confuse people later.

In the management console, click on the 'Tables' menu along the top, then click on '+ Add Database' and give the database a name. I will call it hn_data, for Hacker News data, but you can call it anything you want (as long as you remember it for later in the code).

Once you have created the empty database, click on '+ Add Table' and create three tables, called:

hn_feed: this will contain the actual feed of articles and comments.
hn_lists: this will contain the latest lists of 'top stories', 'best stories', 'Ask HN', 'jobs' etc. from Hacker News.
hn_users: this will contain the user profiles that are read from Hacker News.

No need to populate these tables with data now; we will do this with pure code in a later step.

Creating A RethinkDB User

Before we go much further, we need to create a dedicated RethinkDB user which will have read/write access to the database we just created. This is the user our Node.js app will log in as later to push the Hacker News data into our database.
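The console steps in this post (creating hn_data and its three tables, then the hnfeeder user and its grant) can also be scripted, which is the 'via code' route mentioned above. Below is an illustrative sketch, not part of the TopHN code. It assumes Node.js and the rethinkdb module are already installed (covered further down), that it is run once while connected as the admin user, and that the official driver's run() returns a promise when called without a callback. The function name setupHnData and its skip-if-exists checks are hypothetical.

```javascript
// Hypothetical one-off setup script (not part of feeder.js): scripts the
// console steps. Assumes `r` is the rethinkdb module and `conn` an open
// connection made as the admin user; run() without a callback returns a promise.
async function setupHnData(r, conn, { user = 'hnfeeder', password } = {}) {
  const DB = 'hn_data';
  const TABLES = ['hn_feed', 'hn_lists', 'hn_users'];

  // Create the database only if it is not there yet.
  const dbs = await r.dbList().run(conn);
  if (!dbs.includes(DB)) {
    await r.dbCreate(DB).run(conn);
  }

  // Create whichever of the three tables are missing.
  const existing = await r.db(DB).tableList().run(conn);
  for (const table of TABLES) {
    if (!existing.includes(table)) {
      await r.db(DB).tableCreate(table).run(conn);
    }
  }

  // Create the dedicated user unless the system `users` table already has it.
  const found = await r.db('rethinkdb').table('users').get(user).run(conn);
  if (!found) {
    if (!password) throw new Error(`a password is needed to create ${user}`);
    await r.db('rethinkdb').table('users').insert({ id: user, password }).run(conn);
  }

  // Read/write on hn_data only; config is left false, since the feeder
  // never creates tables or indexes itself.
  await r.db(DB).grant(user, { read: true, write: true, config: false }).run(conn);
}
```

It could be invoked with something like r.connect({ host: 'localhost', user: 'admin', password: '<admin password>' }).then((conn) => setupHnData(r, conn, { password: 'verysecretpassword' })). Checking before creating keeps the script safe to re-run, since dbCreate() and tableCreate() raise an error when the name already exists.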
It is a good idea NOT to use the default admin user for this, but instead to create a new user which ONLY has access to this database. That way, if this user's credentials are compromised, you can easily change its password to re-secure your feed.

To create a user, we can use a ReQL query from right within the RethinkDB console. Click on the 'Data Explorer' menu option along the top, and enter the following ReQL command:

r.db('rethinkdb').table('users').insert({id: 'hnfeeder', password: 'verysecretpassword'})

Don't forget to hit 'Run' to execute this command after you type it in. And don't forget to replace the id and password with ones of your own choosing (and remember them for later).

Next, we want to give this new hnfeeder user full read/write permissions on the hn_data database that we just created, with the following ReQL command:

r.db('hn_data').grant('hnfeeder', {read: true, write: true, config: true});

Don't forget to hit 'Run' again. Now your hnfeeder user has read, write and config rights in the database. Config rights basically mean the ability to create new tables, indexes etc., so you can actually set config to false for now, because we won't be doing anything like that via the app.

That is basically it for the database console at this point.

Installing Node.js

Digital Ocean have some great instructions on installing Node.js on Ubuntu in this article. Follow it step by step for best results, and then come back to this post.

The only thing we have to add here is the node modules that we need for this project. There are only two: (a) the RethinkDB module and (b) the Firebase module.

First, change to the folder where you will be creating the actual application. I simply created mine in /root, but for better security, you might want to create it in /var/app or similar. Let's stick with /root for now:

cd /root
npm install --save rethinkdb firebase

Creating The Application

Now we are ready to create the Node.js app itself.
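When feeder.js connects to RethinkDB (the Lines 11 to 14 step described below), it logs in as the dedicated hnfeeder user. Here is a minimal sketch of one way to build the options for the rethinkdb driver's r.connect() without hardcoding the password in the source file. The RDB_* environment variable names and the rethinkOptions() helper are hypothetical; 28015 is RethinkDB's default client driver port, and the user/password options assume RethinkDB 2.3 or later, where user accounts were introduced.

```javascript
// Hypothetical helper: builds the options object for r.connect() from
// environment variables, so the hnfeeder password never lives in feeder.js.
function rethinkOptions(env = process.env) {
  const password = env.RDB_PASSWORD;
  if (!password) {
    // Fail early rather than attempting a login with no password.
    throw new Error('RDB_PASSWORD is not set');
  }
  return {
    host: env.RDB_HOST || 'localhost',    // same VPS as RethinkDB
    port: Number(env.RDB_PORT || 28015),  // RethinkDB's default driver port
    db: env.RDB_DB || 'hn_data',          // default database for queries
    user: env.RDB_USER || 'hnfeeder',     // the dedicated, non-admin user
    password,
  };
}
```

It could then be used as r.connect(rethinkOptions()).then((conn) => { rdbconn = conn; }), with the app started as RDB_PASSWORD=verysecretpassword node feeder.js.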
In the /root folder (or wherever you will be creating the app), create a file called feeder.js and, using your favourite editor, type in (or copy and paste) the following code:

Let's go through this code and see what is happening.

The first two lines simply load the node modules we installed earlier.

Lines 4 to 7 initialise the Firebase connection to the Hacker News API. You can set appName to whatever you like, but databaseURL HAS to be exactly as is.

Line 9 is simply the placeholder for the RethinkDB connection, which will be stored in rdbconn. This is used all over the app for conversations with our RethinkDB server.

Lines 11 to 14 initiate the link to our RethinkDB server. Remember to replace host, user and password with whatever you set earlier. If you are running this app on the same server as RethinkDB (as we are), then host can simply have the value localhost.

That is all the preliminary connection stuff out of the way. Now we are getting to the nitty gritty of the app.

Lines 16 to 23 are where we set up the references to the various Firebase feeds of the HN API. We need to set up a reference for each individual feed we want. More information on the feed locations is on the Hacker News API Documentation site, but basically, there is a unique endpoint for each feed they publish. For instance, the 'New Stories' feed is published at /v0/newstories, and so on.

In our example, we are really only setting up the New Stories feed, but if you want more, simply add extra lines with the unique feed endpoints that you need.

Line 23 is a pretty important one. This is the special Updates feed from Firebase/HN, which contains a list of stories, comments and users that have changed since the last push update. This feed gets pushed out every 20 seconds or so.
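For reference, the feed endpoints and the shape of an Updates push can be sketched as below. The paths are the ones listed in the Hacker News API documentation, whose Firebase databaseURL is https://hacker-news.firebaseio.com; the names HN_FEEDS, itemPath, userPath, changedIds and watchUpdates are hypothetical and not taken from the original listing. The documentation names the two arrays in each Updates payload items and profiles. The listener assumes the namespaced Firebase SDK (version 8, or the firebase/compat packages in later versions), where ref.on('value', callback) passes a snapshot whose val() is the JSON value.

```javascript
// Feed endpoints published by the Hacker News Firebase API (v0).
const HN_FEEDS = {
  newstories: 'v0/newstories',
  topstories: 'v0/topstories',
  beststories: 'v0/beststories',
  askstories: 'v0/askstories',
  showstories: 'v0/showstories',
  jobstories: 'v0/jobstories',
  updates: 'v0/updates', // changed items and profiles, pushed periodically
};

// Individual records live under their own paths.
const itemPath = (id) => `v0/item/${id}`;
const userPath = (id) => `v0/user/${id}`;

// Hypothetical helper: normalise one Updates value. The HN docs name the
// arrays `items` (numeric IDs) and `profiles` (usernames); either may be
// missing, so fall back to empty arrays.
function changedIds(updates) {
  const value = updates || {};
  return {
    items: Array.isArray(value.items) ? value.items : [],
    users: Array.isArray(value.profiles) ? value.profiles : [],
  };
}

// Hypothetical listener: on every push of the Updates feed, hand each changed
// item ID and username to the supplied callbacks (e.g. pullItem / pullUser).
function watchUpdates(updatesRef, onItem, onUser) {
  updatesRef.on('value', (snap) => {
    const { items, users } = changedIds(snap.val());
    items.forEach((id) => onItem(id));
    users.forEach((id) => onUser(id));
  });
}
```

With that SDK, the listener could be attached with something like watchUpdates(firebase.database().ref(HN_FEEDS.updates), pullItem, pullUser). Wrapping the callbacks in arrow functions keeps forEach from passing its index and array arguments through to them.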
As you can imagine, with the level of activity on Hacker News, each push of the Updates feed can contain hundreds of article and comment IDs that have changed, and dozens of user profiles that have changed. This is the feed that we will mainly be listening to, in order to see what is new on Hacker News and update our local database accordingly.

Lines 26 to 43 are the pullUser() function. This is the function we call to read a single user from Firebase using their user ID, and then insert it into our local database. We call this function for every user ID that is pushed out to us in the Updates feed that I spoke about above.

Line 28 is what calls the Firebase once() function, to read a specific user ID just the once. If the user is successfully found, then line 31 does an insert() call to save it to the hn_users table in our local database. Notice the {conflict: "update"} qualifier in this command. This means that if the record doesn't exist in the database, it is created, but if the user ID already exists, the existing record is updated instead of an error being thrown. It is essentially an "insert, or update if exists" command, which makes things so easy to manage and is one of the reasons I have grown to love RethinkDB. (This works because every HN user and item record carries an id field, which is also the default primary key of a RethinkDB table, so the conflict check happens on the HN ID.)

The rest of this function is essentially housekeeping: a console.log message with the user ID that has been added or updated.

Lines 46 to 67 are the pullItem() function, which does exactly the same as the pullUser() function, but for Hacker News articles and comments. (Note: articles and comments are stored the same way in HN, as 'items', and we keep them together in our hn_feed table.)

You may be thinking at this point: how come we aren't saving or mapping individual fields when saving to our local database? Well, that is actually really simple to answer. Firebase returns data as JSON structures, and RethinkDB, being a NoSQL system, expects data to be sent to it as JSON documents. So there is really no extra manipulation to be done.
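The pull-and-upsert step can be sketched with the two network calls passed in as functions, which keeps the logic testable without a live Firebase or RethinkDB. This is an illustration, not the author's code: makePullers(), liveDeps() and the readOnce/upsert parameters are hypothetical, and liveDeps() assumes the namespaced Firebase SDK and the official rethinkdb driver. Note how the document from Firebase goes to insert() untouched, with {conflict: 'update'} providing the insert-or-update behaviour.

```javascript
// Hypothetical sketch of the "read once from Firebase, upsert into
// RethinkDB" step behind pullItem() / pullUser().
// readOnce(path) -> Promise of the JSON value at that path (null if absent).
// upsert(table, doc) -> Promise; inserts doc, updating it if its id exists.
function makePullers(readOnce, upsert, log = console.log) {
  async function pull(path, table, id) {
    const doc = await readOnce(path);
    if (doc == null) return false; // nothing stored under this ID
    await upsert(table, doc);      // JSON passed straight through, no mapping
    log(`saved ${id} to ${table}`);
    return true;
  }
  return {
    pullItem: (id) => pull(`v0/item/${id}`, 'hn_feed', id),
    pullUser: (id) => pull(`v0/user/${id}`, 'hn_users', id),
  };
}

// Possible wiring to the real services (assumed, not exercised here):
// firebaseDb = firebase.database(), r = require('rethinkdb'), conn = rdbconn.
function liveDeps(firebaseDb, r, conn) {
  return {
    readOnce: (path) => firebaseDb.ref(path).once('value').then((snap) => snap.val()),
    upsert: (table, doc) =>
      r.db('hn_data').table(table).insert(doc, { conflict: 'update' }).run(conn),
  };
}
```

For example, const { readOnce, upsert } = liveDeps(firebase.database(), r, rdbconn); followed by const { pullItem, pullUser } = makePullers(readOnce, upsert); would make pullUser('pg') fetch /v0/user/pg and upsert it into hn_users.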
We are simply handballing the JSON data that Firebase returns to us straight to RethinkDB. All fields and their values are sent across 'as is'.

Lines 70 to 82 are the busiest lines in the whole app. This is the function that waits for the Updates event to be pushed from Firebase. The on("value", ...) call registers a callback that Firebase runs each time the value at that reference changes.

This function checks the incoming Updates feed for two arrays. The items[] array contains a list of article and comment IDs that have changed, and the users[] array contains a list of user IDs that have changed (the HN API documentation names this second array profiles). We simply cycle through these arrays and call pullItem() and pullUser() to read the individual IDs and import them into our local database.

Lines 87 to 93 are similar: another on("value", ...) function, which listens for the array of new stories being pushed out from Firebase. All we do here is grab the array of new story IDs and save it to the hn_lists table for later use.

Important: You need to set up a separate listener function for each feed that you want to save data from. Basically, for every reference that you set up earlier in the app (lines 17 to 23), you need to create an on("value", ...) function against that reference to read and process the data.

That is IT! A short application of only 100 lines or so, but it does a lot. Let's run it. Save the file and, back at the command line, type in:

node feeder.js

You should see console messages displaying the users, articles and comments being transferred over. If you still have your RethinkDB console window open in your browser, you should see the activity graph spike every 15 to 20 seconds as a stream of data is read and saved after being pushed from Firebase.

Tip: You can now set up your Node.js app to run as a service, so that it will auto start and keep running even if your server is rebooted. The details on setting up PM2 to run a Node.js app as a service are in this detailed Digital Ocean guide. Once you have done this, you can essentially 'fire and forget' your feed refresh server.

Caution: Over time your RethinkDB database will fill up, so please ensure your virtual server has enough disk space. You may want to point your RethinkDB data directory to a separate block storage device. I found that my database grows by 100 to 150MB per day. That is 1 to 1.5GB of data every 10 days or so!

Check The Data

You might be a bit skeptical that everything is working as it should be, and I don't blame you. This all seemed too easy, didn't it? ;)

Well, there is an easy way to check what is happening, and that is via the Data Explorer tab in the RethinkDB console again. You can run simple ReQL queries to check the tables. For example, to read the first 40 or so records from the hn_feed table, you can run the following:

r.db('hn_data').table('hn_feed')

which should return the stored article and comment records as JSON documents.

You can even do fancier queries, like returning the 10 most recent items (articles and comments) in reverse date order:

r.db('hn_data').table('hn_feed').orderBy({index: r.desc('time')}).limit(10)

(Note: Because this query sorts using an index, you first have to create a secondary index on the time field in RethinkDB.) You can create a secondary index directly in ReQL via this command:

r.db('hn_data').table('hn_feed').indexCreate('time')

Please be patient: it can take a few minutes to fully index the table (you can check the indexing progress from the dashboard, or run r.db('hn_data').table('hn_feed').indexWait('time'), which returns once the index is ready), but after that, the sorted ReQL query above should work.

Conclusion

That is it! You now have a fully working RethinkDB server which is busy replicating the data from Hacker News in semi real time. The next article I post will discuss what we actually DO with this data, i.e. display it in a real-time web page using Vue.js as the front-end framework. Catch you then!