As the news broke about Instagram hitting one billion monthly active users, I couldn’t help but create an account of my own and try to win myself some following.
Given my keen interest in IT, however, I wished to take the opportunity to test some real-world applications of what I had been learning and do so (gain followers, that is) using a bot, instead of putting in all the tedious work.
The first thing I did was check for an Instagram API, but to no avail: it turned out to be outdated and of little use.
And although Facebook has recently been releasing a new Instagram API, it only supports business clients.
But hey, that’s no problem — I thought — I can create one of my own.
And that is precisely what we will learn today.
If you think about it, Instagram’s website is in and of itself the platform’s API. All we need to do is just figure out how to interact with it remotely instead of manually, like regular users.
And when there’s a will there’s a way.
Here is where Puppeteer comes into the picture. The library allows us to create a headless Google Chrome / Chromium instance and control it by using the DevTools protocol.
Setting up the project
You can go ahead and clone the repository.
Our structure looks like this:
| -- instagrambot/
| | -- .env
| | -- .eslintrc.js
| | -- .gitignore
| | -- README.md
| | -- package-lock.json
| | -- package.json
| | -- src/
| | | -- common/
| | | | -- browser/
| | | | | -- api/
| | | | | | -- authenticate.ts
| | | | | | -- comment-post.ts
| | | | | | -- find-posts.ts
| | | | | | -- follow-post.ts
| | | | | | -- get-following.ts
| | | | | | -- get-post-info.ts
| | | | | | -- get-user-info.ts
| | | | | | -- index.ts
| | | | | | -- like-post.ts
| | | | | | -- unfollow-user.ts
| | | | | -- index.ts
| | | | -- interfaces/
| | | | | -- index.ts
| | | | -- scheduler/
| | | | | -- index.ts
| | | | | -- jobs.ts
| | | | -- scraper/
| | | | | -- scraper.js
| | | | -- utils/
| | | | | -- index.ts
| | | | -- wit/
| | | | | -- index.ts
| | | -- config.ts
| | | -- index.ts
| | -- tsconfig.json
| | -- tslint.json
With that covered, let’s create our Browser interface which we will use to get rendered pages from Puppeteer.
Our getPage function creates a browser’s page for us, goes to the provided URL and injects our scraper (mentioned later). Also, it waits for our callback to return a promise, resolves it and closes the page.
To be perfectly clear, Puppeteer is a browser interface in its own right; we have just abstracted some code that would otherwise be repeated constantly.
Because the page is always closed for us, we don’t have to worry about memory leaks caused by pages that we might have left open by accident.
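The behaviour described above could be sketched roughly as follows. This is a minimal sketch, not the repository’s code: the `PageLike` and `BrowserLike` types, the explicit `browser` parameter and the `SCRAPER_PATH` constant are assumptions made for illustration, chosen so that Puppeteer’s own `Browser` and `Page` objects would fit them.

```typescript
// Minimal structural types; Puppeteer's Browser and Page objects satisfy them.
interface PageLike {
  goto(url: string): Promise<unknown>;
  addScriptTag(options: { path: string }): Promise<unknown>;
  close(): Promise<unknown>;
}

interface BrowserLike<P extends PageLike> {
  newPage(): Promise<P>;
}

// Hypothetical location of the scraper helper that gets injected into the page.
const SCRAPER_PATH = './src/common/scraper/scraper.js';

async function getPage<P extends PageLike, T>(
  browser: BrowserLike<P>,
  url: string,
  callback: (page: P) => Promise<T>,
): Promise<T> {
  const page = await browser.newPage();
  try {
    await page.goto(url);
    await page.addScriptTag({ path: SCRAPER_PATH });
    // Wait for the caller's work to finish before the page is closed.
    return await callback(page);
  } finally {
    // Runs on success and on failure, so no page is left open.
    await page.close();
  }
}
```

Putting `close()` in a `finally` block is what gives the guarantee: the page is closed even when the callback throws.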
Another helpful thing for anything related to web-scraping (which is somewhat what we do here) is creating your own scraper helper. Our scraper will come in handy as we proceed to more advanced scraping.
First, we define some helpers, mostly for setting data attributes. Other than that, we have an Element class, which is an abstraction over the normal HTMLElement. There is also a find function, which gives us a more developer-friendly way of querying elements.
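To give an idea of what such a find function might look like, here is a small sketch. It is an assumption about the shape of the helper, not the code from scraper.js: the `NodeLike` and `FindQuery` types and the `where` and `count` options are invented for illustration, and the data-attribute helpers and the Element class are left out.

```typescript
// Structural stand-in for DOM nodes, so the sketch also type-checks outside a browser.
interface NodeLike {
  textContent: string | null;
  querySelectorAll(selector: string): ArrayLike<NodeLike>;
}

interface FindQuery {
  selector: string;
  // Optional predicate, e.g. to match on the element's text.
  where?: (el: NodeLike) => boolean;
  // Optional cap on the number of results.
  count?: number;
}

function find(root: NodeLike, { selector, where, count }: FindQuery): NodeLike[] {
  const all = Array.from(root.querySelectorAll(selector));
  const matched = where ? all.filter(where) : all;
  return count === undefined ? matched : matched.slice(0, count);
}
```

In the browser, `document` could be passed as the root, for example `find(document, { selector: 'button', where: el => el.textContent === 'Follow' })`.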
Finally, we can create our first function that is actually related to Instagram. When we open the browser for the first time we need to authenticate our user.
First, we wait for the page to open, then we type in our credentials provided in config.ts. There is a 100 ms delay between each character. Then we take the Log in button and if it exists, we click on it.
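A sketch of these steps is shown below. The selectors are assumptions (Instagram’s markup changes often, so they would need checking against the live page), and `LoginPageLike` is a hypothetical type listing only the Puppeteer `Page` methods used here: `waitForSelector`, `type` and `$`.

```typescript
interface ClickableLike {
  click(): Promise<unknown>;
}

// Only the Page methods this sketch relies on; Puppeteer's Page satisfies it.
interface LoginPageLike {
  waitForSelector(selector: string): Promise<unknown>;
  type(selector: string, text: string, options?: { delay?: number }): Promise<unknown>;
  $(selector: string): Promise<ClickableLike | null>;
}

// Assumed selectors, not taken from the original code.
const LOGIN_SELECTORS = {
  username: 'input[name="username"]',
  password: 'input[name="password"]',
  submit: 'button[type="submit"]',
};

const TYPING_DELAY_MS = 100; // delay between characters, as described in the text

async function authenticate(
  page: LoginPageLike,
  username: string,
  password: string,
): Promise<boolean> {
  await page.waitForSelector(LOGIN_SELECTORS.username);
  await page.type(LOGIN_SELECTORS.username, username, { delay: TYPING_DELAY_MS });
  await page.type(LOGIN_SELECTORS.password, password, { delay: TYPING_DELAY_MS });

  const button = await page.$(LOGIN_SELECTORS.submit);
  if (!button) {
    return false; // no Log in button found, nothing to click
  }
  await button.click();
  return true;
}
```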
If you would like to see the magic happen, set headless to false in Puppeteer’s launch options. It will open a browser window in which you can follow every action our bot makes.
Now that we are logged in, Instagram will automatically set cookies in our browser, so we don’t have to log in again for as long as the browser instance stays open.
We can close the page (our interface will take care of it), and move on to creating our first function for finding posts with a #hashtag.
For example, Instagram’s URL for the most recent posts tagged #follow4follow is https://www.instagram.com/explore/tags/follow4follow.
The first 9 posts are always Top posts, whose authors will probably never return our follow or like, as they already have thousands of them. Ideally, we should skip them and get only the recent ones.
More posts load as we scroll down. Each scroll adds 12 posts, so we have to calculate how many times we need to scroll in order to get the expected number.
On the first load, there are 9 top and 12 recent posts, which gives us 21 in total. If we wanted to find 36 recent posts, with the 9 top posts left out, we would subtract the 12 recent posts already loaded and divide the rest by 12 to find out how many times we have to scroll.
36 (total) - 12 (first load) = 24 (the missing posts)
24 / 12 = 2 (the times we need to scroll)
Also, we add one more scroll to the result as a safety net, in case something takes too long to render.
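The arithmetic above fits in a couple of small functions. This is only a sketch of the calculation; the function and constant names are made up for illustration, and the numbers 9 and 12 are the ones quoted in the text.

```typescript
const TOP_POSTS = 9;       // Top posts shown first, which we want to skip
const POSTS_PER_LOAD = 12; // recent posts in the first load and in each scroll

// How many times to scroll to have at least `expectedPosts` recent posts loaded.
function scrollsNeeded(expectedPosts: number, postsPerLoad: number = POSTS_PER_LOAD): number {
  const missing = Math.max(0, expectedPosts - postsPerLoad);
  const scrolls = Math.ceil(missing / postsPerLoad);
  return scrolls + 1; // one extra scroll as a safety net
}

// Drops the Top posts from the list of scraped post URLs.
function skipTopPosts(urls: string[], topCount: number = TOP_POSTS): string[] {
  return urls.slice(topCount);
}
```

For 36 posts this gives 2 scrolls plus the safety-net scroll, so 3 in total. `Math.ceil` covers numbers that are not a multiple of 12.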
We can iterate over returned URLs and execute a given set of actions on each one.
The thing is, we don’t know anything about the post except its URL, but we can find all the necessary information by scraping it.
Getting information about a post
A post’s page holds a lot of useful information about the post, such as:
- Is the author followed?
- The follow button selector
- Is the post liked?
- The like button selector
- The author’s username
- The description and the comments
- The comment selector
- The number of likes
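One way to picture the result of scraping a post is as a plain object with one field per item in the list. The interface below is a hypothetical shape with illustrative field names, and the example values are placeholders, not real data.

```typescript
// Hypothetical shape of the scraped post data; field names are illustrative.
interface PostInfo {
  isFollowed: boolean;     // is the author followed?
  followSelector: string;  // selector of the follow button
  isLiked: boolean;        // is the post liked?
  likeSelector: string;    // selector of the like button
  author: string;          // the author's username
  description: string;
  comments: string[];
  commentSelector: string; // selector of the comment field
  likes: number;
}

// Placeholder example, only to show how the fields fit together.
const examplePost: PostInfo = {
  isFollowed: false,
  followSelector: '[data-follow-button]',
  isLiked: false,
  likeSelector: '[data-like-button]',
  author: 'some_user',
  description: 'Enjoying the sunset',
  comments: ['Nice!'],
  commentSelector: '[data-comment-field]',
  likes: 42,
};
```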
But before we go any further…
Adding NLP for our comments
There is one thing that we should take into consideration: what is the purpose of the post? Is it to show off someone’s new watch or to mourn a departed relative?
Ideally, we would like to know what the post is about. Here’s how we can do that:
Wit.ai is a service from Facebook which lets us create an app and teach it to understand sentences.
That’s where NLP comes in, which stands for Natural Language Processing. It is also included in the Messenger API, in case you would ever like to make a chatbot, for example.
While it may take some time, we can teach our app to understand the description of a post and give us insights.
It is very simple, really: all we have to do is tell it what to look for in a sentence. In our case, the sentence will be a post’s description, which we will send with the node-wit library.
First, you will need to create an account on wit.ai. You can use your GitHub account to log in.
Then you can either create your own app or use someone else’s app. If you would like to use my trained app, here.
Our app takes a message and returns whether it’s happy_description or sad_description and how sure it is about it.
There’s also an emoji library for making our comments more lively.
Now let’s put our token in the config.ts and make a little helper for transforming messages to intents and generating comments based on the intent provided.
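The comment-generation half of such a helper might look like the sketch below. The call to wit.ai itself is left out; the sketch assumes its response has already been reduced to an intent name and a confidence value. The comment texts, the 0.7 threshold and the injectable `pick` function are all placeholders chosen for illustration.

```typescript
type Intent = 'happy_description' | 'sad_description';

// Assumed, simplified form of what we keep from the wit.ai response.
interface IntentResult {
  intent: Intent;
  confidence: number; // between 0 and 1
}

// Placeholder comments for each intent.
const COMMENTS: Record<Intent, string[]> = {
  happy_description: ['Great shot!', 'Love this!'],
  sad_description: ['Stay strong.', 'Sending good thoughts.'],
};

// Hypothetical threshold below which we would rather not comment at all.
const MIN_CONFIDENCE = 0.7;

function generateComment(
  result: IntentResult | null,
  // Picks an index in [0, n); injectable so the choice can be made deterministic.
  pick: (n: number) => number = n => Math.floor(Math.random() * n),
): string | null {
  if (!result || result.confidence < MIN_CONFIDENCE) {
    return null;
  }
  const options = COMMENTS[result.intent];
  return options[pick(options.length)] ?? null;
}
```

Returning `null` when the app is unsure lets the caller skip the comment instead of posting something out of place.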
While the code is ready for us to put emoji into our comments, Puppeteer has had some issues lately with typing them. Once the issue is resolved, just uncomment the line and you are good to go.
Now that we can get the post information and its intent, we can use the selectors we have previously found on the website to get to the elements holding the data.
Getting information about users
There is only so much we can take from a post. Sometimes we would be interested in the user’s profile.
We can glean a lot of useful information such as:
- The number of posts
- The number of followers
- The number of following
- Is the account followed?
- Bio (the description), but since this is of no use to us, we are not going to scrape it.
For now, we need to implement like-post.ts.
We simply check if the post is already liked. If not, we take the like selector and click on it.
The same goes for follow-post.ts.
With comment-post.ts we type the comment in the textarea and press enter.
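Liking and commenting could be sketched like this. The `PostPageLike` and `PostState` types are hypothetical and list only what the sketch needs; `click`, `type` and `keyboard.press` are the Puppeteer `Page` methods it relies on. Following works the same way as liking, with the follow selector instead.

```typescript
interface KeyboardLike {
  press(key: string): Promise<unknown>;
}

// Only the Page members used here; Puppeteer's Page satisfies it.
interface PostPageLike {
  click(selector: string): Promise<unknown>;
  type(selector: string, text: string): Promise<unknown>;
  keyboard: KeyboardLike;
}

// The parts of the scraped post information that these actions need.
interface PostState {
  isLiked: boolean;
  likeSelector: string;
  commentSelector: string;
}

// Clicks the like button unless the post is already liked.
// Returns true when a click was made.
async function likePost(page: PostPageLike, post: PostState): Promise<boolean> {
  if (post.isLiked) {
    return false;
  }
  await page.click(post.likeSelector);
  return true;
}

// Types the comment into the comment field and submits it with Enter.
async function commentPost(page: PostPageLike, post: PostState, comment: string): Promise<void> {
  await page.type(post.commentSelector, comment);
  await page.keyboard.press('Enter');
}
```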
Our bot also has to be able to unfollow people, since otherwise it might hit the limit of 7,500 followed accounts.
First, we need to get the URLs of the people whom we are following.
We click on the following button in our profile.
A list should show up with the last 20 users that we have followed. We can then execute unfollow-user.ts for each URL.
Now that we have the URLs, we simply unfollow one user at a time. We click on the unfollow button and then click on the confirmation dialog.
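The two clicks could be sketched as below. The selectors are passed in rather than hard-coded, since the real ones are not given here; `UnfollowPageLike` is a hypothetical type covering the two Puppeteer `Page` methods used, `click` and `waitForSelector`.

```typescript
// Only the Page methods used here; Puppeteer's Page satisfies it.
interface UnfollowPageLike {
  click(selector: string): Promise<unknown>;
  waitForSelector(selector: string): Promise<unknown>;
}

interface UnfollowSelectors {
  unfollowButton: string; // button on the user's profile
  confirmButton: string;  // button inside the confirmation dialog
}

async function unfollowUser(page: UnfollowPageLike, selectors: UnfollowSelectors): Promise<void> {
  await page.click(selectors.unfollowButton);
  // The dialog is rendered after the first click, so wait for it to appear.
  await page.waitForSelector(selectors.confirmButton);
  await page.click(selectors.confirmButton);
}
```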
Now that everything is ready, we have to think about how our bot is going to be working.
Obviously, it can’t just mindlessly keep following people all the time, because, as you may have guessed, Instagram would ban it very quickly.
From what I have read and seen during my own tests, there are different limits, depending on the account’s size and age.
Let’s play it safe for now. Don’t worry, though, as an account grows, the limits get pushed further.
We can create a simple scheduler that will execute registered jobs once an hour, given that hour is in their expected time range.
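A scheduler along these lines might look like the sketch below. The class and method names are assumptions, not the repository’s API. The timer is left out on purpose: something like an hourly interval would call `tick` with the current hour.

```typescript
type Job = () => Promise<void> | void;

interface RegisteredJob {
  name: string;
  hours: [number, number]; // inclusive range of hours, 0 to 23
  run: Job;
}

class Scheduler {
  private jobs: RegisteredJob[] = [];

  register(name: string, hours: [number, number], run: Job): void {
    this.jobs.push({ name, hours, run });
  }

  // Runs every job whose range contains `hour`, one after another,
  // and returns the names of the jobs that were run.
  async tick(hour: number): Promise<string[]> {
    const due = this.jobs.filter(job => isWithin(hour, job.hours));
    for (const job of due) {
      await job.run();
    }
    return due.map(job => job.name);
  }
}

// True when `hour` lies in the inclusive range. A range such as [23, 2]
// is treated as wrapping around midnight.
function isWithin(hour: number, [from, to]: [number, number]): boolean {
  return from <= to ? hour >= from && hour <= to : hour >= from || hour <= to;
}
```

Running the due jobs sequentially keeps the bot from performing several actions at once, which would look less like a normal user.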
As you might have noticed, we were using a lot of variables coming from a config. Some of the values are provided through process.env, which means they are sensitive and we should put them in the .env file. The rest can be changed manually.
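Reading the sensitive values could be sketched as follows. The variable names (`INSTAGRAM_USERNAME` and so on) and the `Config` shape are assumptions for illustration; the actual config.ts may differ. The environment is passed in as a parameter so the function stays easy to try out.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical subset of the config that comes from the environment.
interface Config {
  username: string;
  password: string;
  witToken: string;
}

// Fails early, with a clear message, when a variable is missing or empty.
function requireEnv(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value === '') {
    throw new Error(`Missing environment variable: ${name}`);
  }
  return value;
}

function loadConfig(env: Env): Config {
  return {
    username: requireEnv(env, 'INSTAGRAM_USERNAME'),
    password: requireEnv(env, 'INSTAGRAM_PASSWORD'),
    witToken: requireEnv(env, 'WIT_TOKEN'),
  };
}
```

In the application this would be called as `loadConfig(process.env)` once the .env file has been loaded.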
Let’s register our jobs with hour ranges that a normal user would be active in and the only thing left will be to run the application.
Due to Instagram’s robots.txt policy, such an application is not allowed to be run. The post was made for educational purposes only.
npm run start
Thank you very much for reading, hope you liked it!
If you have any questions or comments feel free to put them in the comment section below or send me a message.
Follow me on Twitter @maciejcieslar.
Originally published at www.mcieslar.com on July 10, 2018.