
AI and Data: Balancing Progress, Privacy, and Security

by Patricia de Hemricourt, February 8th, 2023

Too Long; Didn't Read

The right to privacy encompasses two key elements: the right not to be observed and the right to control the flow of information when observed. With the increasing collection of data and the power of emerging technologies, individuals will be targeted and monitored by both the public and private sectors to an unprecedented degree, often without adequate anonymity or consent.

With ChatGPT dominating tech and mainstream news, concern about data security might have momentarily taken a back seat. Yet the expanding impact of AI/ML technologies on data privacy and security should sound the alarm for organizations and individuals alike.

Data Privacy in the AI Age


As we stand at the cusp of the AI age, taking the necessary steps to protect our privacy is paramount. The right to privacy encompasses two key elements:

  • The right not to be observed.
  • The right to control the flow of information when observed.

With the increasing collection of data and the power of emerging technologies, individuals will be targeted and monitored by both the public and private sectors to an unprecedented degree, often without adequate anonymity or consent.

Surveillance technologies, such as biometric identification, are becoming increasingly sophisticated.

Though some companies have self-regulated the sale of facial recognition to law enforcement, and its use in public spaces faces an upcoming ban in the EU, these systems remain a burning cause for concern, especially now that they are capable of analyzing emotions.

Still, the rise of potential data sources for AI continues unabated.

The mass shift to home working during the pandemic has led to workers being tracked through cameras, keystroke monitoring, productivity software, and audio recordings.

Yet the popularity of remote working might wane the day a camera fixed on a worker's face can calculate exactly how much time that worker spends focusing on work versus merely thinking or daydreaming, as this data could be factored into salary slips.

In parallel, despite stringent regulatory protections, open-source networked surveillance data is being accessed by a growing number of both public and private sector actors.

For example, live camera feeds of popular tourist destinations are already used to surveil Instagram influencers and debunk the so-called spontaneity of their published pictures.

As our lives become increasingly digitalized over the next decade, our "everyday experiences" will increasingly be recorded. The collected data is likely to be commodified through internet-enabled devices, intelligent infrastructure, and "smart city" networks designed to improve the quality of urban life.

In the metaverse, this pattern will be amplified exponentially, as platforms could collect even more sensitive data from naïve ‘metaversers’ unaware that their facial expressions, gait, vital signs, brainwave patterns, and vocal inflections might be dissected and analyzed for commercial, political, or other purposes.

Even if they consented to the data collection, it is unlikely they fully understand its extent and potential uses. Moreover, many of those uses are still unknown today and will only emerge as the technology evolves.

This growing data collection, typically presented as necessary for the beneficial use of a service or product, and its subsequent commercialization or open-source sharing, gives rise to what is known as the "mosaic effect": datasets that are innocuous in isolation can, once combined, reveal far more than each one alone. The mosaic effect creates two key privacy risks: re-identification and attribute disclosure.

To date, research indicates that 99.98% of US residents could be correctly re-identified in any data set using 15 demographic attributes.
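
To make the re-identification risk concrete, here is a minimal sketch in Python. The toy dataset and its column names are invented for illustration; the point is how quickly records become unique as quasi-identifiers are combined, the same dynamic that lets 15 attributes pin down nearly every US resident.

```python
# Toy illustration of re-identification risk: count how many records
# are uniquely identified as more quasi-identifiers are combined.
# All data below is invented for illustration.
import pandas as pd

records = pd.DataFrame({
    "zip_code":   ["10001", "10001", "10002", "10002", "10003", "10003"],
    "birth_year": [1985, 1990, 1985, 1985, 1972, 1990],
    "gender":     ["F", "M", "F", "M", "F", "M"],
})

def unique_fraction(df: pd.DataFrame, columns: list) -> float:
    """Fraction of records pinned down uniquely by this attribute combination."""
    combo_counts = df[columns].value_counts()  # occurrences of each combination
    return int((combo_counts == 1).sum()) / len(df)

for cols in (["zip_code"],
             ["zip_code", "birth_year"],
             ["zip_code", "birth_year", "gender"]):
    print(f"{cols}: {unique_fraction(records, cols):.0%} of records are unique")
```

With just three toy attributes, every record in this six-row dataset becomes unique; real datasets with 15 demographic attributes leave almost no one anonymous.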

The potential implications are left to your imagination at this stage.

Data Security in the AI Age

Defining these rights is crucial, but without securing the collected data against breaches, even the best-crafted and fully observed legislation would be little more than a Pyrrhic victory.

So, it is more important than ever to be aware of the potential cybersecurity threats that AI/ML technologies can pose. As these tools become democratized, it is getting easier for hackers to use them to arm their exploits with more intelligence and efficiency.

One of the key differences between adversarial AI/ML-powered attacks and traditional cyberattacks is the combination of speed, depth, automation, scale, and sophistication that these models offer.

In fact, AI/ML models can bring about three major changes to the way threats are orchestrated and executed:

  • An amplification in the number of actors participating in an attack, the occurrence rate of these attacks, and the number of attacked targets
  • The introduction of new threat vectors that would be impractical for humans to craft using traditional algorithms
  • The injection of intelligence into traditional attack vectors, bringing new attributes and behaviors to these threats, such as opportunism and polymorphism

AI/ML-powered cyberattacks fall into seven categories: probing, scanning, spoofing, flooding, misdirecting, executing, and bypassing.

  • AI/MLS Probing: The use of AI/MLS to access an organizational asset to determine its characteristics, such as using AI/MLS to intelligently mine public domain and social network data to launch personalized social engineering attacks. 
  • AI/MLS Scanning: The use of AI/MLS to access a set of organizational assets sequentially to detect which assets have a specific characteristic, such as using AI/MLS to accurately identify operating systems with a 94% success rate.
  • AI/MLS Spoofing: The use of AI/MLS as a masquerade tool to disguise the identity of an entity, such as using AI/MLS to create a stealthy backdoor neural network that behaves as expected on user-chosen inputs but misbehaves on attacker-chosen inputs.
  • AI/MLS for Flooding: The use of AI/MLS to overload an organizational asset's capacity by creating self-learning "hivenets" and "swarmbots" to amplify the scale of the attack, applying adversarial machine learning techniques in IoT systems to trigger jamming and spectrum poisoning attacks, and using AI/MLS to break CAPTCHA and Google reCAPTCHA with varying degrees of success. 
  • AI/MLS for Misdirection: The use of AI/MLS to deceive a target and provoke an action based on false information. This can include techniques such as cross-site scripting and email scams. Research has shown that AI/MLS can be used to generate malicious domain names, which can feed several types of cyberattacks, including spam campaigns, phishing emails, and DDoS attacks.

    One example is misusing Generative Adversarial Networks (GANs) as a malware tool, producing malicious domain names that can infiltrate current Domain Generation Algorithm (DGA) classifiers.

    GANs are a class of deep-learning neural network architectures that can learn to generate new data similar to a training set. The danger of GANs lies in their technical ability to produce genuine-looking data (e.g., domain names, URLs, email addresses, IP addresses) that hackers can misuse to infiltrate most Network Intrusion Detection Systems (NIDS); a minimal sketch of the kind of classifier such output is designed to evade appears after this list. An easier-to-achieve example is launching AI-powered phishing attacks that evade conventional filters and improve the open and click rates of Business Email Compromise (BEC) phishing scams.
  • AI/MLS for Execution: The use of AI/MLS to execute malicious processes on a system, such as viruses and Trojans. Examples include:
  1. IBM DeepLocker, a proof-of-concept built into video-conferencing software that uses deep convolutional neural networks to hide its attack payload in a benign carrier application and activate it only when a specific target is identified.
  2. AVPASS, an open-source AI-aided tool that can mutate Android malware to bypass anti-virus solutions. Additionally, researchers have proposed using AI/MLS to create malware that evades detection by finding blind spots in AI/MLS-based detection mechanisms.
  3. DeepHack, a proof-of-concept open-source AI/MLS-based hacking tool that uses ML algorithms to break into web applications or perform penetration tests autonomously, and GAN-based algorithms to generate adversarial malware samples that can bypass black-box ML-based detection models.
  • AI/MLS for Bypassing: The use of AI/MLS to create an alternative method to access an organizational asset or to elevate access privileges to a given asset. This can include using AI to optimize credential dumping tactics, such as accelerating the process of cracking admin passwords by reducing the number of probable passwords based on collected data about the end users or their organization (a back-of-the-envelope illustration follows this list).
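
To illustrate the bypassing item above, here is a rough, back-of-the-envelope sketch in Python. All tokens, mangling rules, and figures are assumptions for illustration; the point is how dramatically knowledge about a target shrinks the search space compared with blind brute force.

```python
# Contrast a blind brute-force keyspace with a short, data-informed
# candidate list of the kind an AI-assisted cracker could prioritize.
# All tokens and mangling rules below are invented for illustration.
from itertools import product

# Blind brute force: 8 characters drawn from ~70 printable symbols.
blind_keyspace = 70 ** 8

# Informed guessing: a few personal tokens (names, birth year, pet,
# team) combined with common mangling patterns (casing, suffixes).
personal_tokens = ["alice", "rex", "1987", "lakers"]
suffixes = ["", "!", "123", "2023"]
casings = [str.lower, str.capitalize, str.upper]

candidates = {case(a + b) + s
              for a, b in product(personal_tokens, [""] + personal_tokens)
              for case in casings
              for s in suffixes}

print(f"blind keyspace: {blind_keyspace:.2e} possibilities")
print(f"informed list:  {len(candidates)} candidates")
```

A few hundred informed guesses versus over 10^14 blind ones is the difference between seconds and centuries, which is exactly the kind of prioritization AI models excel at.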
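
And returning to the misdirection example, many baseline DGA classifiers rely on simple character statistics. The sketch below, with invented thresholds and example domains, shows such a heuristic, and why GAN-generated domains that mimic natural-language character statistics can slip past it.

```python
# A minimal sketch of the character-statistics heuristic a simple DGA
# classifier might apply. Thresholds and example domains are invented;
# real classifiers are trained models, not hand-set rules.
import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    """Bits of entropy per character; random DGA strings score high."""
    counts = Counter(s)
    return -sum((c / len(s)) * math.log2(c / len(s)) for c in counts.values())

def looks_generated(domain: str) -> bool:
    """Flag high-entropy, digit-heavy names as likely DGA output."""
    name = domain.split(".")[0]
    digit_ratio = sum(ch.isdigit() for ch in name) / max(len(name), 1)
    return shannon_entropy(name) > 3.5 or digit_ratio > 0.3

for d in ["google.com", "x7f9q2kzlw4m8rty.net", "shop-deals24.com"]:
    print(d, "->", "suspicious" if looks_generated(d) else "benign")
```

A GAN trained on legitimate domain names learns to emit strings whose entropy and digit density resemble the benign examples here, which is why statistics-based detectors need adversarially aware retraining.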

AI/MLS models have the potential to revolutionize the field of cybersecurity. However, they also introduce new risks, as they can be used to create advanced and evasive malware, as well as to bypass security measures.

With the increasing ubiquity of AI in IoT and the Internet of Behavior (IoB), the risk of data breaches becomes even greater.

Misuse of breached private data by weaponized AI/MLS models could have devastating consequences, including identity theft and financial fraud, disinformation campaigns, or even disruptive or destructive weaponization of IoT infrastructures.

This highlights the need for organizations to incorporate AI/MLS into their cybersecurity strategies and for researchers and professionals to develop new cybersecurity AI/MLS approaches that take adversaries into account.

The use of AI/MLS in cyber threat response is an underexplored research topic worth pursuing, as is proactively anticipating the potential misuse of AI/MLS models and sharing countermeasures with the information security community.

It's important to raise awareness among AI/MLS researchers, the academic and professional cybersecurity communities, policymakers, and legislators about the interplay between AI/MLS models and cybersecurity, and about the dangers that weaponized AI/MLS models can pose.

The implications of these risks should be taken into account as the field of cybersecurity continues to evolve.