A few companies I've worked for have an IT policy on their secure computers designed to stop movement of sensitive data outside the enterprise. This policy encrypts all file data being written to removable media (USB drives, external hard drives, etc.) such that only a computer within the same enterprise can decrypt and read the data.
I recently noticed the name of the file is not encrypted...
If the filename can be written to the removable disk in the clear, then of course we might see cleartext information like secret_file_for_customer.zip. This is a leak of just a few characters of cleartext, but could we leak more than just a few characters? What about data more complex than text?
Enter metamorpher: a tool that converts arbitrary data into metadata!
Metamorpher is very straightforward: it takes an input file and turns it into a directory structure of empty files whose filenames contain a Base64-encoded representation of the input file. Base64 is just a way of encoding binary data as text, which matters here because only a certain set of characters is allowed in filenames.
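To make the idea concrete, here is a minimal Python sketch of the encoding step. It is not metamorpher's actual source: the function name encode_to_filenames, the flat output directory, and the choice to read the whole file into memory are all assumptions made for illustration. The URL-safe Base64 alphabet is also an assumption, chosen because "/" cannot appear in a filename. The 249-character chunk size and the 5-digit hexadecimal index follow the filename layout described in this post.

```python
import base64
import hashlib
from pathlib import Path

INDEX_DIGITS = 5    # hexadecimal ordering prefix, as in the sample filenames
CHUNK_CHARS = 249   # 255-byte name limit minus 5 index digits and one "."


def encode_to_filenames(src: Path, out_dir: Path) -> int:
    """Write src as empty files whose names carry its Base64 text.

    Returns the number of data files created. Illustrative only: the
    whole input is held in memory and one flat directory is used.
    """
    data = src.read_bytes()
    # Assumption: URL-safe alphabet ("-" and "_" instead of "+" and "/").
    text = base64.urlsafe_b64encode(data).decode("ascii")

    chunks = [text[i:i + CHUNK_CHARS] for i in range(0, len(text), CHUNK_CHARS)]
    if len(chunks) > 16 ** INDEX_DIGITS:
        raise ValueError("input too large for a 5-digit hexadecimal index")

    out_dir.mkdir(parents=True, exist_ok=True)
    for index, chunk in enumerate(chunks):
        # The file stays empty; only its name carries data.
        (out_dir / f"{index:0{INDEX_DIGITS}x}.{chunk}").touch()

    # One more empty file records the SHA-1 of the original bytes.
    (out_dir / f"{hashlib.sha1(data).hexdigest()}.sha1").touch()
    return len(chunks)
```

Calling encode_to_filenames(Path("random_data"), Path("random_data_sketch")) would fill the output directory with zero-length files. A real tool would likely stream the input and spread the files across subdirectories, which this sketch skips.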
First, let's generate some data to work with:
dd if=/dev/urandom of=./random_data bs=1M count=100
Encode the data with the -e flag:
metamorpher -e ./random_data
This creates a new subdirectory (./random_data_metamorphed) that contains the encoded data. There are hundreds of thousands of empty files in there, each carrying a chunk of the file data in its name. Here's a snippet of just the last three:
$ find ./random_data_metamorphed -mindepth 2 | tail -n 3
./random_data_metamorphed/8913d.HWdVeDtcf5Q8RDSy9iAK7BaOSIMi2dG5W9IZfhkLUX_qlwSjIWe1tJ_uFfpI-R7LRsGqurrviiPruE-sxP-s3gFQGJHbhuV5z1L2iTZOXsdOiVk_Q_lzkO0jx7boZ5Dm1NZmJtUAT8F7YVyherNkru7d2CRaxULF7Uo9UnbMvZfR16MGV4SzKX-SScIIRUARL6q8tubr2N-BIwRp6nfr57xajTkPlIv4to-x9CXCF765Zs3OyGY_eMijH
./random_data_metamorphed/89145.cY1wcToYWMW8oYj_C9ttE22iGbt6z3FDcgVLvfOMFgFG2rBtsN0rpJb5ZEO3sb_72jWbeWJ8CuFBomoiBrSu2tB4_ITa9rWZhcIB62sERQQwARqdFE2PbwMh0Uuuf1-y2wkazgJdYE0o5jO4NkxS7jj1C-b5--OY538ibwZizbG3IS-c0-1nFyhNRMsU1mgcuWp8BpavmYZoDJ_KmZDOJjUV0ZAPxrtuuGmfRdg5ZIzx5mmB-mztIoqCM
./random_data_metamorphed/655fe9206ffff5d7fbc2bbe426f0b2330b4eb567.sha1
The first two filenames above contain two things separated by a . character: a hexadecimal index that records the chunk's position in the sequence, and a chunk of the Base64-encoded data. The third filename above holds a SHA1 checksum of the input file (<sha1sum>.sha1). This helps guarantee data integrity when decoding.
For this example, on an ext4 filesystem, the encoded directory takes up about 209 MB on disk. Notably, every file in the directory is zero-length. All of the disk space utilized is from filesystem overhead. This has an added advantage of obscuring how much data you're actually moving to the disk. Of course, Base64 reduces the data density: it emits four characters for every three bytes, so the encoded text is about a third larger than the input. The rest of the rough doubling in size presumably comes from per-file filesystem overhead.
Ext4 allows a maximum filename length of 255 bytes (255 characters for the ASCII names used here), and the hexadecimal ordering characters at the beginning of each filename (plus a delimiter) subtract 6 characters from that number. This leaves 249 characters per filename for the Base64 encoded data.
100 MiB of data (104,857,600 bytes, as written by the dd command above) encoded as a padded Base64 string is 139,810,136 characters long. 139,810,136 characters / 249 characters per filename = 561,487 files (rounded up to the nearest integer).
Sure enough, we see 561,487 files (plus the one checksum file):
$ find ./random_data_metamorphed -type f | wc -l
561488
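The file count above can be reproduced with a few lines of Python. This is only a check of the arithmetic, assuming padded Base64 (four characters per three input bytes, rounded up) and the 6 characters of index overhead per filename described earlier. The variable names are mine.

```python
input_bytes = 100 * 1024 * 1024   # dd bs=1M count=100 writes 104,857,600 bytes
name_limit = 255                  # ext4 filename limit in bytes
index_overhead = 6                # 5 hexadecimal digits plus the "." delimiter

# Padded Base64 emits 4 characters for every 3 bytes, rounding up.
base64_chars = (input_bytes + 2) // 3 * 4

chars_per_name = name_limit - index_overhead

# Ceiling division: the final filename may be only partly filled.
data_files = (base64_chars + chars_per_name - 1) // chars_per_name

print(base64_chars)    # 139810136
print(chars_per_name)  # 249
print(data_files)      # 561487
```

Adding the single checksum file gives the 561,488 reported by find.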
On my laptop, the encoding process for this example took about 18 seconds.
We can copy this directory to a removable disk and let the encryption policy do its thing! It will "encrypt" the thousands of empty files, but if it doesn't touch the filenames, the data is stored in the clear.
Once we copy the contents to an unmanaged PC, the -d flag can decode the directory structure back to a file:
metamorpher -d ./random_data_metamorphed -o ./copy_of_random_data
The decoding of this example took about 3 seconds on my laptop. Metamorpher will automatically check the output data against the checksum.
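For readers curious what decoding involves, here is a rough Python sketch of the reverse step. As before, this is my own illustration rather than metamorpher's code: decode_from_filenames is a hypothetical name, and the sketch assumes names shaped like <hex index>.<Base64 chunk>, the URL-safe Base64 alphabet, and a single <sha1sum>.sha1 file somewhere under the directory.

```python
import base64
import hashlib
from pathlib import Path


def decode_from_filenames(enc_dir: Path, dest: Path) -> None:
    """Rebuild a file from names shaped like <hex index>.<Base64 chunk>."""
    chunks = {}
    expected_sha1 = None

    # rglob also walks subdirectories, in case the files are nested.
    for path in enc_dir.rglob("*"):
        if not path.is_file():
            continue
        # The checksum file is recognised by its 40-character hex stem.
        if path.suffix == ".sha1" and len(path.stem) == 40:
            expected_sha1 = path.stem
            continue
        index, _, chunk = path.name.partition(".")
        chunks[int(index, 16)] = chunk

    # Directory listings are unordered, so sort by the numeric index.
    text = "".join(chunks[i] for i in sorted(chunks))
    # Restore "=" padding in case the encoder stripped it (an assumption).
    text += "=" * (-len(text) % 4)
    data = base64.urlsafe_b64decode(text)

    if expected_sha1 is not None:
        if hashlib.sha1(data).hexdigest() != expected_sha1:
            raise ValueError("SHA-1 mismatch: decoded data is corrupt")
    dest.write_bytes(data)
```

The checksum comparison is what catches a missing or renamed chunk: any gap in the sequence changes the decoded bytes, and the SHA-1 no longer matches.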
That's it! We've copied 100 MB of data with perfect integrity despite an encryption policy attempting to stop us.
As with all data security techniques, there's a balance between user convenience and security. It's obvious that sensitive character strings (like secret_file_for_customer.zip) could be leaked through a filename, but leaking a few dozen characters per filename isn't that big a deal. However, this method shows that an insider could quickly encode a large archive of sensitive data into filenames on disk and send the contents to a competing company or rival nation.
I've reached out to the two software vendors I know of who implement this type of encryption policy. The first vendor noted their tool supports a mode where files written to the disk are dropped in a container directory and archived prior to encryption, which would protect against this method. They also mentioned they may review how the security implications of allowing cleartext filenames are presented in their documentation.
The other vendor simply stated that they do not claim to prevent intentional data leakage from authorized users as it's an unachievable goal. While that may be true, I'd imagine their enterprise customers might appreciate an effort to make data leaks more difficult for insiders.
In doing research for this post, I came across a couple of projects that do similar things and may be of interest to readers who've made it this far: