In programming, we often need to go between strings and bytes. Humans read strings. Computers read bytes. As a result, .NET developers need to understand how to convert a string to a byte array in C#. We do this through encoding and decoding, which take us back and forth between the two.
In this article, I’ll provide code examples to convert a string to a byte array in C#, and back again! You’ll also learn about some of the nuances of character encodings to look out for!
Encoding and Decoding in C#
If we want to convert a string to a byte array — or go the other way — we need to understand the concept of encoding and decoding. In software engineering, encoding refers to the process of transforming a string into a sequence of bytes, while decoding involves the reverse process of transforming bytes back into a string. Simple, right?
Remember that strings are a sequence of characters, and the concept of a character makes a lot of sense to us as readers, but computers understand bytes. Characters themselves can be represented by numbers in computers so when we want to work with strings at a lower level, such as when sending data over a network or storing it in a file, we need to convert the string into a byte array.
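Encoding comes into play when converting strings to byte arrays. It determines how the characters in the string are represented as bytes. The encoding scheme defines the mapping between characters and their byte representations. Common encoding schemes include ASCII, UTF-8, UTF-16, and UTF-32. Unicode itself is the character set these UTF encodings represent rather than an encoding, although in .NET the Encoding.Unicode property is the name for UTF-16 (little-endian).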
Choosing the correct encoding is important because different encoding schemes support different sets of characters. For example, ASCII only supports the basic English alphabet (with a few more characters), while UTF-8 and UTF-16 are capable of representing characters from multiple languages and scripts. More on this later though since I know you’re eager for some code!
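Before going further, here’s a minimal sketch of the round trip itself, assuming UTF-8 on both sides. The variable names are mine and purely illustrative:

```csharp
using System;
using System.Text;

string original = "Hello, World!";

// Encode: string -> bytes
byte[] bytes = Encoding.UTF8.GetBytes(original);

// Decode: bytes -> string, using the same encoding we encoded with
string roundTripped = Encoding.UTF8.GetString(bytes);

Console.WriteLine(bytes.Length);             // 13
Console.WriteLine(roundTripped == original); // True
```

The important part is that the same encoding is used in both directions. Decoding with a different encoding than the one used to encode is a common source of garbled text.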
Choosing an Encoding for String to Byte Array Conversion
When converting strings to byte arrays in C#, one of the most important things to consider is character encoding. Character encoding determines the mapping between characters and byte values, and if you’re considering transforming data one way then you may want to put some thought into how to transform it back!
What’s that supposed to mean? Well, say we take a string and transform it to bytes with an ASCII encoding. If a particular character has no mapping to a byte representation, we lose that data in the result (in .NET, Encoding.ASCII substitutes a question mark for each such character by default). Now what happens if you want to go the other way and get your byte array back to a string?
Data is missing!
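To make that concrete, here’s a small sketch of a lossy round trip. It relies on the default behavior of Encoding.ASCII, which swaps any character it can’t map for a question mark. The sample word is just an illustration:

```csharp
using System;
using System.Text;

// "café", written with an escape so the é is definitely the single code point U+00E9
string original = "caf\u00E9";

byte[] asciiBytes = Encoding.ASCII.GetBytes(original);
string decoded = Encoding.ASCII.GetString(asciiBytes);

Console.WriteLine(BitConverter.ToString(asciiBytes)); // 63-61-66-3F
Console.WriteLine(decoded);                           // caf?
Console.WriteLine(decoded == original);               // False
```

Once the bytes are written, there’s no way to tell whether that last byte was originally a question mark or something else entirely.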
In C#, there are various encodings available, including ASCII, UTF-8, and UTF-16, each with its specific characteristics and usage scenarios. Let’s explore these encodings and see how they can be used for string to byte array conversion.
ASCII Encoding
ASCII encoding represents characters using 7 bits, allowing for a total of 128 different characters. It’s primarily suitable for handling basic English characters. Each character takes exactly one byte, which is the same size UTF-8 needs for those characters and half of what UTF-16 needs. Here’s an example of converting a string to a byte array using ASCII encoding:
using System.Text;
string text = "Hello, World!";
// Each of the 13 characters maps to exactly one byte.
byte[] asciiBytes = Encoding.ASCII.GetBytes(text);
In many modern applications, ASCII may not be what you’re after. This is especially true if you have users across the globe in locales that are not English. That’s not to say that ASCII *cannot* be used, but you’ll want to be careful about what data you ASCII encode so as to not lose information during the encoding transform.
With the growing need for internationalization and multilingual support, those 128 characters alone are insufficient to represent all of the text your users might enter.
Good thing we have some other options coming right up!
UTF-8 Encoding
UTF-8 encoding is a variable-length encoding scheme that can represent any Unicode character. It’s widely used for encoding text in various languages and is backward compatible with ASCII. It uses a single byte for each ASCII character and two to four bytes for each non-ASCII character.
Here’s an example of converting a string to a byte array using UTF-8 encoding:
string text = "Привет, мир!";
byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);
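As a quick sketch of what “variable-length” means in practice, the snippet below compares the number of characters in that string with the number of bytes UTF-8 produces for it, and then decodes the bytes again. The counts in the comments assume the string literal is exactly the one shown:

```csharp
using System;
using System.Text;

string text = "Привет, мир!";
byte[] utf8Bytes = Encoding.UTF8.GetBytes(text);

Console.WriteLine(text.Length);      // 12 characters
Console.WriteLine(utf8Bytes.Length); // 21 bytes: 9 Cyrillic letters x 2, plus 3 ASCII characters x 1

// Decoding with the same encoding gives the original string back.
string decoded = Encoding.UTF8.GetString(utf8Bytes);
Console.WriteLine(decoded == text);  // True
```

This is why you can’t assume the byte array has the same length as the string when sizing buffers.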
UTF-16 Encoding
UTF-16 encoding represents characters using either 2 or 4 bytes, making it capable of representing any Unicode character. It’s the encoding .NET uses internally for the string type, and it’s a common choice when interoperating with other systems that expect UTF-16. The encoding can be little-endian or big-endian, with the former being more prevalent: in .NET, Encoding.Unicode is little-endian and Encoding.BigEndianUnicode is big-endian. Here’s an example of converting a string to a byte array using UTF-16 encoding:
string text = "こんにちは、世界!";
// Encoding.Unicode is .NET's name for UTF-16 (little-endian).
byte[] utf16Bytes = Encoding.Unicode.GetBytes(text);
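The byte order is easier to see with a single character. Here’s a small sketch that encodes the letter A with both UTF-16 variants that .NET exposes as static properties. The variable names are mine:

```csharp
using System;
using System.Text;

string text = "A"; // U+0041

byte[] littleEndian = Encoding.Unicode.GetBytes(text);
byte[] bigEndian = Encoding.BigEndianUnicode.GetBytes(text);

Console.WriteLine(BitConverter.ToString(littleEndian)); // 41-00
Console.WriteLine(BitConverter.ToString(bigEndian));    // 00-41
```

Same character, same two bytes, opposite order. If the system on the other end expects the other byte order, the text will decode into the wrong characters.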
UTF-8 Encoding vs UTF-16 Encoding – What’s The Difference?
Both UTF-8 and UTF-16 are variable-width, and how many bytes they spend per code point gives them different characteristics when we consider different alphabets.
Starting with UTF-8, it’s variable-width and backward-compatible with ASCII. In this encoding:
- ASCII characters, which are in the range U+0000 to U+007F, take only 1 byte.
- In the next range, code points U+0080 to U+07FF take twice as much space at 2 bytes each.
- Code points U+0800 to U+FFFF take one more byte, bringing us up to 3 bytes.
- And finally, code points U+10000 to U+10FFFF take 4 bytes.
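Those ranges can be checked with a short sketch. I’ve picked one sample code point from each range (the samples are my own choice) and asked Encoding.UTF8 how many bytes each one needs:

```csharp
using System;
using System.Text;

string[] samples =
{
    "A",          // U+0041, in U+0000 to U+007F
    "\u00E9",     // U+00E9 (é), in U+0080 to U+07FF
    "\u4E16",     // U+4E16 (世), in U+0800 to U+FFFF
    "\U0001F600", // U+1F600 (an emoji), in U+10000 to U+10FFFF
};

foreach (string s in samples)
{
    int codePoint = char.ConvertToUtf32(s, 0);
    int byteCount = Encoding.UTF8.GetByteCount(s);
    Console.WriteLine($"U+{codePoint:X4}: {byteCount} byte(s)");
}
// U+0041: 1 byte(s)
// U+00E9: 2 byte(s)
// U+4E16: 3 byte(s)
// U+1F600: 4 byte(s)
```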
UTF-8 can be very effective for English text because those characters will take up minimal space. However, it’s less ideal for East Asian text, where most characters fall in the 3-byte range.
In UTF-16, code points from U+0000 to U+FFFF take 2 bytes and code points from U+10000 to U+10FFFF take double that at 4 bytes, so it’s not-so-great for English. It *does* happen to be better suited for Asian characters though, since most of them take 2 bytes instead of the 3 they need in UTF-8.
There’s even UTF-32 encoding! This is a fixed-width encoding where every code point takes four bytes, unlike UTF-8 and UTF-16, which are variable-width. This can use much more storage than the other encodings, but its simplicity can make it easier to operate on because every code point sits at a predictable offset.
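To compare the three side by side, here’s a rough sketch that measures the sample strings from earlier in this article with each encoding. The byte counts in the comments assume the Japanese string ends with a plain ASCII exclamation mark:

```csharp
using System;
using System.Text;

string english = "Hello, World!";
string japanese = "こんにちは、世界!";

foreach (string sample in new[] { english, japanese })
{
    Console.WriteLine(
        $"UTF-8: {Encoding.UTF8.GetByteCount(sample)}, " +
        $"UTF-16: {Encoding.Unicode.GetByteCount(sample)}, " +
        $"UTF-32: {Encoding.UTF32.GetByteCount(sample)}");
}
// UTF-8: 13, UTF-16: 26, UTF-32: 52
// UTF-8: 25, UTF-16: 18, UTF-32: 36
```

UTF-8 wins for the English string and UTF-16 wins for the Japanese one, while UTF-32 is the largest in both cases.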
Give some consideration to the alphabets you’ll need to support primarily!
Best Practices for String to Byte Array Conversion
When converting strings to byte arrays in C#, it’s important to follow best practices to ensure efficiency and reliability. In this section, I’ll discuss some key best practices that you should keep in mind when performing string-to-byte array conversions.
Error Handling and Validation
When working with encodings, it’s important to handle potential errors and validate your data to prevent unexpected behavior in your code. Ideally, you structure the flow of your application such that you know what kind of data you’re dealing with. If you can write code to avoid errors in the first place, this is preferred!
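We don’t want to have to rely on it, but sometimes the input is outside of our control, and one way of handling errors is by using try-catch blocks. By encapsulating the conversion code within a try block, you can catch any exceptions that may occur during the conversion process and handle them gracefully. Keep in mind that the built-in encodings such as Encoding.UTF8 and Encoding.ASCII substitute a replacement character for invalid data by default instead of throwing, so invalid characters or bytes only raise an exception if you use an encoding configured to throw. If you don’t have control over the source of the input data, this is something you might need to do for safety.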
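Here’s a sketch of what that can look like when decoding bytes you don’t trust. It contrasts the default Encoding.UTF8 behavior with a UTF8Encoding instance constructed to throw on invalid bytes. The sample bytes and variable names are made up for illustration:

```csharp
using System;
using System.Text;

// 0xFF can never appear in well-formed UTF-8.
byte[] invalidBytes = { 0x48, 0x69, 0xFF };

// Default behavior: no exception, the bad byte becomes U+FFFD (the replacement character).
string lenient = Encoding.UTF8.GetString(invalidBytes);
Console.WriteLine(lenient.Contains('\uFFFD')); // True

// Strict behavior: an encoding instance configured to throw on invalid input.
var strictUtf8 = new UTF8Encoding(
    encoderShouldEmitUTF8Identifier: false,
    throwOnInvalidBytes: true);

bool decodeFailed = false;
try
{
    strictUtf8.GetString(invalidBytes);
}
catch (DecoderFallbackException ex)
{
    decodeFailed = true;
    Console.WriteLine($"Invalid UTF-8 input: {ex.Message}");
}

Console.WriteLine(decodeFailed); // True
```

Whether you want the lenient or the strict behavior depends on the application. Silently replacing bad bytes is convenient for display, but for stored data it can hide corruption.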
Encoding Selection
C# provides several encoding options for converting strings to byte arrays, such as UTF-8, UTF-16, ASCII, and more. It’s important to select the appropriate encoding based on the specific requirements of your application. Consider factors such as character sets, compatibility with other systems, and performance implications when choosing the encoding.
// Example of encoding selection
string inputString = "Hello, World!";
byte[] encodedBytes = Encoding.UTF8.GetBytes(inputString);
Just like we saw in the previous examples, we pick the encoding through a static property on the Encoding class, and each of those properties holds an encoding instance. If you need to pass one around as a variable or parameter, you can absolutely store it in a dedicated Encoding reference:
Encoding selectedEncoding = Encoding.UTF8;
SomeMethod("Hello World!", selectedEncoding);
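SomeMethod is only a placeholder name in the snippet above. As a hypothetical sketch of what such a method might look like, here’s one possible body that simply encodes the text with whichever encoding the caller hands in:

```csharp
using System;
using System.Text;

Encoding selectedEncoding = Encoding.UTF8;
byte[] utf8Payload = SomeMethod("Hello World!", selectedEncoding);
byte[] utf16Payload = SomeMethod("Hello World!", Encoding.Unicode);

Console.WriteLine(utf8Payload.Length);  // 12
Console.WriteLine(utf16Payload.Length); // 24

// Hypothetical helper (not from any library): the caller decides which encoding is used.
static byte[] SomeMethod(string text, Encoding encoding)
{
    return encoding.GetBytes(text);
}
```

Taking the encoding as a parameter keeps the choice in one place, which makes it harder to accidentally encode with one encoding and decode with another.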
Selecting the wrong encoding can have big consequences for your application! This is especially true if you save data with an encoding that can’t represent every character in it, because you can’t reverse that loss afterward… so put some care into this!
Now You Know How to Convert a String to Byte Array in C#!
You’re a pro now with encoding and decoding! Well, maybe not a full-on expert… but you have the basics put in front of you and some guidelines to work with. That’s a pretty good start.
Remember to select the right encoding for the situation you’re dealing with. Keep in mind that the wrong encoding can silently replace characters it can’t represent, and that data is then lost forever!
