In this post I’ll demonstrate how to achieve simple image steganography using Python. All digital file formats use internal structures and schemas, so unique implementations are required for different mediums, and often for different formats within those mediums.
Steganography is the art and science of concealing a message or file within a different, typically unrelated medium. Nowadays we think about concealing a message electronically within some other kind of digital file type. However, the practice of steganography is not a recent development. Soldiers and spies have been using physical steganographic techniques for centuries, if not longer.
Steganography is often linked to and discussed alongside cryptography; however, the two are not mutually exclusive. One might use steganography in conjunction with encryption in order to deliver a secret message to a recipient without drawing attention to the fact that a message was sent at all. In essence, using a combination of the two allows for a layered defense of secret communications.
In the digital age, steganography is increasingly being used by hackers and criminals to covertly communicate sensitive information. It is even being used as a means of bypassing network defenses to communicate with malware.
In steganography the payload is the data covertly communicated and the carrier is the signal, stream, or data file that hides the payload.
When it comes to digital images, specifically with regard to steganography, there are three primary classes of file type: those with lossy compression, those with lossless compression, and raw file types. We will be working with lossless-compression file types. Lossless and lossy compressed file types are the most common today because of their compact size.
Of these, raw files are typically the easiest to work with. However, they tend to be very large, are not commonly used by most consumers, and often require special software used by professional photographers or enthusiasts in order to view and edit. Some examples of raw image file formats are RAW and DNG.
Lossless compression means that the files are stored in a compressed format, but that the compression does not modify the actual data when the file is opened, transported, or decompressed. For lossless files, steganography is often accomplished by manipulating the least-significant bits (LSB) of the carrier file. Some examples of lossless-compression image file formats are PNG, TIFF, and BMP.
Lossy compression is the opposite of lossless compression. When lossy compression is used, there is no guarantee that the file will not be modified slightly when subjected to storage, transmission, or decompression. In nearly all cases this modification is imperceptible to the end user; otherwise lossy formats wouldn’t be very popular. However, LSB steganography modifies exactly these “unimportant” bits that can be lost during compression, so doing steganography on lossy-compressed files is more complex, and we can’t risk using a format that may not preserve our modifications. Some examples of lossy-compression image file formats are JPEG and BPG.
So we’ve established that we will be working with LSB (least-significant bit) steganography and that we will be limiting ourselves to lossless image formats. This type of data hiding can be achieved using a variety of programming languages and tools; however, I’ll be using Python 3.6 and relying on the PIL library for our image support.
**1. Set up a virtual environment (optional)**
This step is optional, though for clarity and keeping a clean development environment I would recommend you set up a virtual environment. I won’t cover how to do so in detail here, but you can use this documentation if you’re unfamiliar with the process.
**2. Installing dependencies**
As I mentioned previously, we’ll be working with the PIL library. PIL is distributed today as the Pillow package, which can be installed using pip by running
$ pip install pillow
If you’d rather, my code is included here, along with a requirements.txt file you can use to install all dependencies. The code also uses binascii, random, and os, all of which ship with Python’s standard library and need no separate installation.
Now that we have our environment set up, we can start looking at the code.
Simply stated, LSB steganography works by encoding a secret message into the least-significant bits of the color values of each pixel in an image. In order to do this we need to do several things.
I’ll break down each step below.
**1. Reading the carrier image.**
First we’ll create a new file which will contain our image class and all the methods needed to do image handling. I’ve called this steg_img.py
Create an IMG class that will contain all our logic for reading and creating images.
Open the image using PIL and get some information such as the image type and size.
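A minimal sketch of what such a class might look like, assuming Pillow is installed. The attribute names (`img_type`, `mode`, `width`, `height`) are illustrative assumptions and may not match the names used in the actual steg_img.py.

```python
from PIL import Image


class IMG:
    """Sketch of a carrier-image wrapper; attribute names are illustrative."""

    def __init__(self, image_path):
        self.image_path = image_path
        # Image.open reads the header first; pixel data is loaded on demand.
        self.image = Image.open(image_path)
        self.img_type = self.image.format          # e.g. "PNG" or "BMP"
        self.mode = self.image.mode                # e.g. "RGB", "RGBA" or "L"
        self.width, self.height = self.image.size  # size is (width, height)
```

The format tells us which file type to save the result as, while the mode and size are what the capacity check in the next step depends on.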
**2. Confirm the carrier is large enough to fit the payload.**
Because we may be working with different image file types with different modes, we need to check to make sure there is sufficient space to store the message within the carrier.
First we need to find the size of the payload in bits. We can use Python’s os library for this and will multiply the result by 8, since we want the size in bits rather than bytes.
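As a rough sketch, the size calculation can be as small as the function below. The function name `payload_size_bits` is a placeholder chosen for illustration.

```python
import os


def payload_size_bits(payload_path):
    """Return the size of the file at payload_path in bits."""
    # os.path.getsize reports the size in bytes; each byte holds 8 bits.
    return os.path.getsize(payload_path) * 8
```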
Next we will compare the payload size to the space available in the carrier. The space available will vary by image mode. Generally I treat RGB and RGBA images as having a capacity of 3 bits per pixel, while L mode images only have 1 bit per pixel. Just to be sure I can fit the buffers and any additional metadata, I check that the carrier’s capacity is at least 2x the size of the payload.
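The capacity check described above could be sketched as follows. The function names and the `safety_factor` parameter are assumptions made for illustration; only the bits-per-pixel values and the 2x margin come from the rule of thumb above.

```python
# Usable least-significant bits per pixel for each Pillow image mode.
BITS_PER_PIXEL = {"RGB": 3, "RGBA": 3, "L": 1}


def carrier_capacity_bits(mode, width, height):
    """Number of least-significant bits available for hiding data."""
    if mode not in BITS_PER_PIXEL:
        raise ValueError(f"unsupported image mode: {mode}")
    return width * height * BITS_PER_PIXEL[mode]


def payload_fits(payload_bits, mode, width, height, safety_factor=2):
    """True if the carrier holds at least safety_factor times the payload."""
    capacity = carrier_capacity_bits(mode, width, height)
    return capacity >= safety_factor * payload_bits
```

For example, a 100x100 RGB carrier has 30,000 usable bits, so with the 2x margin it accepts payloads of up to 15,000 bits (1,875 bytes).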
**3. Convert the payload and buffers into binary data.**
Next we’ll convert the payload to binary. Once we add a few buffers that will be used for reconstruction, we will fill the remaining difference between the payload size and the carrier capacity with pseudorandom binary data.
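One way this step might look is sketched below. The layout of the reconstruction buffers is not spelled out here, so the 32-bit length header in this sketch is a hypothetical stand-in for them, and both function names are placeholders.

```python
import random


def bytes_to_bits(data):
    """Convert bytes to a list of 0/1 ints, most significant bit first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


def build_bitstream(payload, capacity_bits):
    """Return header + payload bits, padded to capacity_bits with random bits.

    The 32-bit big-endian length header is a hypothetical stand-in for the
    reconstruction buffers; it lets an extractor know where the payload ends.
    """
    header = bytes_to_bits(len(payload).to_bytes(4, "big"))
    body = header + bytes_to_bits(payload)
    if len(body) > capacity_bits:
        raise ValueError("payload does not fit in the carrier")
    # Fill the remaining capacity with pseudorandom bits.
    padding = [random.getrandbits(1) for _ in range(capacity_bits - len(body))]
    return body + padding
```

The result always has exactly `capacity_bits` entries, so every usable least-significant bit in the carrier gets a value.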
**5. Create a new image.**
Now that we have our payload and buffers in binary we’re ready to create a new image that is a copy of the carrier image but with modified least significant bits on each color of each pixel. After we’ve created the new image it will simply be saved as new.<file_type> in the current directory.
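A hedged sketch of the embedding loop, assuming an RGB or RGBA carrier already opened with Pillow and a list of 0/1 integers as the bitstream. The function name `embed_bits` is a placeholder, L mode is left out for brevity, and the alpha channel is not modified.

```python
def embed_bits(carrier, bits):
    """Return a copy of a Pillow image with bits written into its color LSBs."""
    if carrier.mode not in ("RGB", "RGBA"):
        raise ValueError("this sketch only handles RGB and RGBA images")
    stego = carrier.copy()          # leave the original carrier untouched
    pixels = stego.load()           # pixel access object indexed by [x, y]
    width, height = stego.size
    if len(bits) > width * height * 3:
        raise ValueError("more bits than the carrier can hold")
    bit_iter = iter(bits)
    for y in range(height):
        for x in range(width):
            channels = list(pixels[x, y])
            for i in range(3):      # red, green, blue
                bit = next(bit_iter, None)
                if bit is None:     # bitstream exhausted
                    break
                # Clear the lowest bit, then set it to the payload bit.
                channels[i] = (channels[i] & ~1) | bit
            pixels[x, y] = tuple(channels)
            if bit is None:
                return stego
    return stego
```

The returned image can then be written out with Pillow’s `save` method, for instance `stego.save("new.png")` for a PNG carrier. Saving in the carrier’s own lossless format matters, since a lossy format would not preserve the modified bits.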
You’ll see above that there’s a function from common.py called set_bit. This is where the actual magic of replacing the least significant bit of each color of each pixel takes place.
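The contents of common.py are not reproduced here, so the following is only a sketch of how a function like set_bit could be written. The signature, including the `index` parameter, is an assumption.

```python
def set_bit(value, bit, index=0):
    """Return value with the bit at position index forced to bit (0 or 1)."""
    mask = 1 << index
    # Clear the target bit, then OR in the new bit shifted into position.
    return (value & ~mask) | ((bit & 1) << index)
```

With the default `index=0` this touches only the least significant bit, so an 8-bit color value changes by at most 1: `set_bit(182, 1)` gives 183, while `set_bit(254, 0)` leaves the value at 254.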
And there you have it! With a bit of Python and the help of the PIL library you can start hiding secret messages in your images.
You can view my full code here or install the simple ‘steg’ library I wrote via pip by running
$ pip install steg
In the future I may add some other fun capabilities to the library, like the ability to encrypt your payloads prior to hiding or add additional file formats for hiding. I’ve also added a command line tool to the repo above that will allow you to hide and extract payloads from the command line.