
Prevent IoT Botnet Attacks Using AI with Code.


Manik Soni (@maniksoni653)

SDE @Amazon

What if I told you that your IoT devices are deceiving you?


A botnet is an ‘internet’ of compromised systems controlled by a ‘herder’ (the botnet’s owner). A system can be compromised by any kind of malware that runs on it and lets someone else take control of it. Your device may appear to be working fine, but in reality it may not be!

Botnet attacks on the USA from different parts of the world on Aug 31, 2018

After reading this article, you can also watch my video for a clearer understanding of how AI can help prevent botnet attacks.

IoT Botnets:

An IoT botnet is an interconnected system of compromised IoT devices, such as CCTV cameras, cell phones, air conditioners, etc.

What can botnets do?
  • Steal your personal data and send it to someone else.
  • Delete your data.
  • Monitor you through your own devices!
  • Lock you out of your device completely.
  • Launch attacks: botnets can generate huge floods of traffic to overwhelm a target. These floods can be generated in many ways, such as sending a server more requests than it can handle or having computers send the victim a huge amount of data. Some attacks are so big that they can max out a country’s international bandwidth capacity.
  • Influence political events.
Some infamous botnets
  • Bashlite: Also known as Gafgyt and discovered in 2014, this botnet controlled over 100,000 electronic devices.
  • Mirai: Gaining worldwide attention in 2016, this botnet attacked KrebsOnSecurity, OVH and Dyn, generating traffic volumes above 1 Tbps!

Dataset Info:

We’ll use Logistic Regression to solve this problem.

The dataset contains 75,000+ samples with a binary (0/1) output. 0 denotes that the traffic from the IoT device is not any type of attack; 1 denotes attack traffic, such as a TCP/IP flood or spam/junk data.
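To illustrate what logistic regression does with these 0/1 labels, here is a minimal NumPy sketch trained by batch gradient descent on the mean log-loss. It is not the author's notebook code: the function names, learning rate and epoch count are hypothetical, and the toy two-feature data merely stands in for the real 115-feature dataset.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps any real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic_regression(X, y, lr=0.1, epochs=1000):
    """Fit weights w and bias b by batch gradient descent on the mean log-loss.

    X: (n_samples, n_features) feature matrix; y: (n_samples,) labels in {0, 1}.
    lr and epochs are illustrative defaults, not tuned values.
    """
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)       # predicted P(attack) for every sample
        error = p - y                # derivative of the log-loss w.r.t. the score
        w -= lr * (X.T @ error) / n_samples
        b -= lr * error.mean()
    return w, b

def predict(X, w, b, threshold=0.5):
    """Label a sample 1 (attack) when its predicted probability >= threshold."""
    return (sigmoid(X @ w + b) >= threshold).astype(int)

if __name__ == "__main__":
    # Toy, well-separated data standing in for benign (0) and attack (1) traffic.
    rng = np.random.default_rng(0)
    benign = rng.normal(loc=-2.0, scale=0.5, size=(50, 2))
    attack = rng.normal(loc=2.0, scale=0.5, size=(50, 2))
    X = np.vstack([benign, attack])
    y = np.concatenate([np.zeros(50), np.ones(50)])
    w, b = fit_logistic_regression(X, y)
    print("training accuracy:", (predict(X, w, b) == y).mean())
```

Each learned weight indicates how strongly a feature pushes a sample toward the attack class; the 0.5 cut-off in `predict` is an assumption you would tune to balance missed attacks against false alarms.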

I downloaded the dataset from the UCI Machine Learning Repository; it is the dataset used in this research paper.

The dataset contains 115 features, so rather than explaining what each feature is, I’ll explain how these features are generated.

Attribute Information:
H: Stats summarizing the recent traffic from this packet’s host (IP) 
HH: Stats summarizing the recent traffic going from this packet’s host (IP) to the packet’s destination host. 
HpHp: Stats summarizing the recent traffic going from this packet’s host+port (IP) to the packet’s destination host+port. Example -> 
HH_jit: Stats summarizing the jitter of the traffic going from this packet’s host (IP) to the packet’s destination host.
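To make the four groupings concrete, the sketch below derives the key under which each aggregation would group a single packet. The `Packet` record, its fields and the key shapes are hypothetical, chosen only to show which packets end up sharing statistics.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Packet:
    """Hypothetical minimal packet record; a real extractor parses far more."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    size: int          # bytes
    timestamp: float   # seconds

def stream_keys(pkt: Packet) -> dict:
    """Return the key each aggregation uses to group this packet."""
    return {
        # H: all recent traffic sent by this source host
        "H": pkt.src_ip,
        # HH: traffic from this source host to this destination host
        "HH": (pkt.src_ip, pkt.dst_ip),
        # HpHp: traffic from this source host+port to this destination host+port
        "HpHp": ((pkt.src_ip, pkt.src_port), (pkt.dst_ip, pkt.dst_port)),
        # HH_jit: same host pair as HH, but summarizing inter-arrival times (jitter)
        "HH_jit": (pkt.src_ip, pkt.dst_ip),
    }
```

For example, two packets from the same host to the same server but from different source ports share their H and HH keys but land in different HpHp streams.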

How much recent history of the stream is captured in these statistics (the decay factor λ used in the damped window):
L5, L3, L1, L0.1, L0.01
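The number after each L controls how fast old traffic is forgotten. As a sketch, assuming the same damped-window decay function as the Kitsune feature extractor, where an observation that is Δt seconds old keeps a weight of 2^(−λ·Δt), the half-life of a window is 1/λ seconds. The helper names are hypothetical.

```python
def decay_weight(lam: float, dt: float) -> float:
    """Weight kept by an observation that is dt seconds old: 2 ** (-lam * dt)."""
    return 2.0 ** (-lam * dt)

def half_life(lam: float) -> float:
    """Seconds until an observation's weight halves, i.e. 2 ** (-lam * t) == 0.5."""
    return 1.0 / lam

# Compare a few of the window sizes named in the feature columns.
for lam in (5, 3, 1, 0.1):
    print(f"L{lam}: half-life {half_life(lam):.2f} s, "
          f"weight of a 1 s old packet {decay_weight(lam, 1.0):.4f}")
```

Under this assumption, an L5 statistic mostly reflects the last fraction of a second, while an L0.1 statistic still remembers traffic from tens of seconds ago, so the same stream is described at several time scales.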

  • The statistics extracted from the packet stream:
    weight: The weight of the stream (can be viewed as the number of items observed in recent history).
    mean: the mean of the stream.
    std: the standard deviation of the stream.
    radius: the square root of the sum of the two streams’ squared variances.
    magnitude: the square root of the sum of the two streams’ squared means.
    cov: an approximated covariance between two streams.
    pcc: an approximated Pearson correlation coefficient between two streams.
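The statistics above can be maintained incrementally, one packet at a time. The sketch below assumes a Kitsune-style damped statistic that stores a decayed weight, linear sum and squared sum, multiplying all three by 2^(−λ·Δt) before each update; `DampedStat`, `magnitude` and `radius` are hypothetical names, and cov/pcc are omitted because they need an extra decayed sum of residual products between the two streams.

```python
import math
from dataclasses import dataclass
from typing import Optional

@dataclass
class DampedStat:
    """Damped incremental statistics for one stream (assumed Kitsune-style scheme).

    Call update() at least once before reading mean/variance/std.
    """
    lam: float
    w: float = 0.0                  # decayed count of observations ("weight")
    ls: float = 0.0                 # decayed linear sum of values
    ss: float = 0.0                 # decayed sum of squared values
    last_t: Optional[float] = None  # timestamp of the previous update

    def update(self, value: float, t: float) -> None:
        # Fade everything seen so far according to the elapsed time.
        if self.last_t is not None:
            factor = 2.0 ** (-self.lam * (t - self.last_t))
            self.w *= factor
            self.ls *= factor
            self.ss *= factor
        self.last_t = t
        # Insert the new observation at full weight.
        self.w += 1.0
        self.ls += value
        self.ss += value * value

    @property
    def mean(self) -> float:
        return self.ls / self.w

    @property
    def variance(self) -> float:
        # max(...) guards against tiny negative values from floating-point error
        return max(self.ss / self.w - self.mean ** 2, 0.0)

    @property
    def std(self) -> float:
        return math.sqrt(self.variance)

def magnitude(a: DampedStat, b: DampedStat) -> float:
    """Square root of the sum of the two streams' squared means."""
    return math.sqrt(a.mean ** 2 + b.mean ** 2)

def radius(a: DampedStat, b: DampedStat) -> float:
    """Square root of the sum of the two streams' squared variances."""
    return math.sqrt(a.variance ** 2 + b.variance ** 2)
```

In use, you would keep one `DampedStat` per stream key and per decay factor, feed it a value such as packet size on every packet, and read off the current weight, mean and std as features.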

I used Deep Learning Studio’s Jupyter Notebooks to train my model on this dataset. It comes pre-configured with the common ML/DL frameworks. If you haven’t heard of it, please check it out.
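Whatever environment you train in, a typical workflow is to hold out part of the data for testing and to scale the features, since statistics such as weights, means and correlation coefficients live on very different scales. This is a generic NumPy sketch, not the author's notebook; the helper names and the 20% test fraction are hypothetical, and 1 (attack) is treated as the positive class, as in the dataset labels.

```python
import numpy as np

def train_test_split(X, y, test_fraction=0.2, seed=0):
    """Shuffle the rows, then hold out test_fraction of them for evaluation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(len(X) * test_fraction)
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

def standardize(X_train, X_test):
    """Scale features to zero mean / unit std using training statistics only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # leave constant features unscaled instead of dividing by zero
    return (X_train - mean) / std, (X_test - mean) / std

def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN), with 1 = attack as the positive class."""
    y_true = np.asarray(y_true).astype(int)
    y_pred = np.asarray(y_pred).astype(int)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return tp, fp, tn, fn
```

Looking at false negatives (missed attacks) separately from false positives (benign traffic flagged) is more informative than overall accuracy alone when choosing a decision threshold.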

Different Environments on Deep Learning Studio


Thanks for taking the time to read my article. If you liked it, please share it and clap 👏.

Please subscribe to my YouTube channel and follow me on Medium and LinkedIn.

Happy Deep Learning.


