Go is adding fuzzing to its standard testing tools. The feature is planned for the 1.18 release and is already available for beta testing. Let's see what it is for and why it makes sense to have it in the standard library.
According to Wikipedia, "Fuzzing or fuzz testing is an automated software testing technique that involves providing invalid, unexpected, or random data as inputs to a computer program." It is useful for discovering crashes and a wide range of bugs, which are often a cause of vulnerabilities in software. Fuzzing is very effective for testing software that consumes structured inputs, like a special format or protocol.
Here are some examples of what can (and should) be fuzzed: cryptography, compression and serialization formats, network protocols, media codecs, text-processing libraries, and anything that consumes untrusted inputs or is open to the internet.
Fuzzing is not a new technique. It's been around for decades and is well known, at least in security testing. In recent years fuzzing has become more and more popular. The reason for that is a combination of the following factors:
CI and automated testing have become standard: fuzzing takes time, so automating it is a good idea;
More affordable CPU time makes it cheap to use;
The advent of coverage-guided fuzzers. It's not just stuffing code with random data anymore. Fuzzers are smarter and require less time to find bugs.
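To make the idea of coverage guidance concrete, here is a toy sketch. It is not how go-fuzz or any real fuzzer is implemented. It assumes a hypothetical target function that reports how many nested checks the input passed, which stands in for real coverage instrumentation; the names target and fuzz and the magic input "FUZZ" are made up for illustration.

```go
package main

import (
	"fmt"
	"math/rand"
)

// target is a hypothetical function under test. It returns how many nested
// checks the input passed (a stand-in for coverage) and reports a "crash"
// only for inputs that start with the four bytes "FUZZ".
func target(data []byte) (depth int, crashed bool) {
	want := []byte("FUZZ")
	for i := range want {
		if i >= len(data) || data[i] != want[i] {
			return depth, false
		}
		depth++
	}
	return depth, true
}

// fuzz mutates inputs taken from a corpus and keeps every mutant that gets
// deeper into target than anything seen before. It returns the crashing
// input (nil if none was found) and the number of executions used.
func fuzz(seed int64, maxIter int) ([]byte, int) {
	rng := rand.New(rand.NewSource(seed))
	corpus := [][]byte{{0, 0, 0, 0}} // seed corpus
	best := 0
	for i := 1; i <= maxIter; i++ {
		// Pick a known input and change one random byte.
		parent := corpus[rng.Intn(len(corpus))]
		child := append([]byte(nil), parent...)
		child[rng.Intn(len(child))] = byte(rng.Intn(256))

		depth, crashed := target(child)
		if crashed {
			return child, i
		}
		if depth > best { // new coverage: remember this input
			best = depth
			corpus = append(corpus, child)
		}
	}
	return nil, maxIter
}

func main() {
	input, execs := fuzz(1, 1000000)
	fmt.Printf("crasher %q after %d executions\n", input, execs)
}
```

Purely random generation would need on the order of 256^4 attempts to hit the same four-byte input. Keeping the inputs that reach new code lets the search advance one byte at a time.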
You should understand that fuzz testing does not replace unit testing. Rather, it's a complementary testing technique. Think about it like this: unit testing checks that the code works as expected, and fuzzing checks that your assumptions about the input and how to handle it are correct. One of the main reasons it's successful is that fuzzing is impartial; it doesn't suffer from confirmation bias.
There are already some fuzzing tools available for Go; the most prominent is go-fuzz. It's coverage-guided, which means the fuzzer instruments the code and analyzes coverage in an attempt to reach new lines of code. It's also straightforward to use, so let's check it out:
1. Write a fuzz function. Here I put it in the same package as the function I'm testing
func Fuzz(data []byte) int {
	FuzzedFunc(data)
	return 0
}
2. Install components of go-fuzz and build the test program
go get -u github.com/dvyukov/go-fuzz/go-fuzz@latest \
github.com/dvyukov/go-fuzz/go-fuzz-build@latest
go-fuzz-build
3. Run the test program and wait
go-fuzz
2022/01/11 18:53:15 workers: 8, corpus: 1 (3s ago), crashers: 1, restarts: 1/0, execs: 0 (0/sec), cover: 0, uptime: 3s
2022/01/11 18:53:18 workers: 8, corpus: 1 (6s ago), crashers: 1, restarts: 1/0, execs: 0 (0/sec), cover: 3, uptime: 6s
2022/01/11 18:53:21 workers: 8, corpus: 1 (9s ago), crashers: 1, restarts: 1/7, execs: 14998 (1666/sec), cover: 3, uptime: 9s
And now you are fuzzing. Let it run for a while and check whether the value of crashers increases in the output. If it does, go-fuzz has found some crashes, and you can find the stack trace and input for each of them in the crashers directory. The input is available in binary and quoted-string formats; the latter is easier to use in tests.
Of course, there are some nuances: you can't fuzz package main; you can provide an initial corpus to give the fuzzer a starting point; the value returned from the Fuzz function can raise or lower the priority of the input; and if your FuzzedFunc takes multiple arguments, you may need to split the data variable into the required number of arguments.
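Two of these nuances, the return values and the splitting of data, can be sketched as follows. This is only an illustration: the two-argument FuzzedFunc and the choice to use the first byte as the boolean are assumptions of mine. As far as I know from the go-fuzz README, returning 1 raises the priority of the input, -1 keeps it out of the corpus, and 0 is neutral. The sketch uses package main only so that it runs on its own; for go-fuzz the Fuzz function would have to live in a non-main package.

```go
package main

import (
	"errors"
	"fmt"
)

// FuzzedFunc is a hypothetical two-argument function under test.
func FuzzedFunc(data []byte, strict bool) error {
	if strict && len(data) == 0 {
		return errors.New("empty input in strict mode")
	}
	return nil
}

// Fuzz is the go-fuzz entry point. It derives the boolean argument from the
// first byte and passes the remaining bytes as the payload.
func Fuzz(data []byte) int {
	if len(data) < 1 {
		return -1 // cannot be split into two arguments: keep out of the corpus
	}
	strict := data[0]&1 == 1
	if err := FuzzedFunc(data[1:], strict); err != nil {
		return 0 // rejected input: neutral priority
	}
	return 1 // accepted input: ask go-fuzz to prioritize it
}

func main() {
	fmt.Println(Fuzz(nil), Fuzz([]byte{1}), Fuzz([]byte{0, 'a'}))
}
```

The built-in fuzzer described in the rest of the article removes the need for this kind of manual splitting.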
Let's see what changes with fuzzing available as a part of the standard library.
Go adds fuzzing support as part of its testing package, and it's well documented. There is also additional documentation with more detail on the topic available here.
Here is how the new fuzzing support works:
1. Write the required code. It's now part of the testing package, and the code must be located in a test file (*_test.go)
import "testing"

func FuzzFuzzedFunc(f *testing.F) {
	f.Add([]byte("aaa"))
	f.Fuzz(func(t *testing.T, data []byte) {
		FuzzedFunc(data)
	})
}
2. Run the test. I will run it using the tip version of the language
go install golang.org/dl/gotip@latest
gotip download
gotip test -fuzz=FuzzFuzzedFunc ./...
fuzz: elapsed: 0s, gathering baseline coverage: 0/1 completed
fuzz: elapsed: 0s, gathering baseline coverage: 1/1 completed, now fuzzing with 8 workers
As you can see, there is no build step involved. Just run go test with the -fuzz flag set to the name of the fuzz test.
A fuzz test is basically a unit test, but it takes a *testing.F as its argument. The f.Add function provides seed corpus values for the test. Providing complex input is also easier: unlike with go-fuzz, there is no need to split a byte array.
f.Add([]byte("aaa"), true)
f.Fuzz(func(t *testing.T, data []byte, flag bool) {
	FuzzedFunc(data, flag)
})
A fuzz test acts as a unit test when go test is run without the -fuzz flag. The tests below are equivalent.
import (
	"fmt"
	"testing"
)

func FuzzFuzzedFunc(f *testing.F) {
	f.Add([]byte("aaa"))
	f.Fuzz(func(t *testing.T, data []byte) {
		FuzzedFunc(data)
	})
}

func TestFuzzedFunc(t *testing.T) {
	seeds := [][]byte{
		[]byte("aaa"),
	}
	for i, data := range seeds {
		t.Run(fmt.Sprintf("seed#%d", i), func(t *testing.T) {
			FuzzedFunc(data)
		})
	}
}
Unlike go-fuzz, the built-in fuzzer stops as soon as a crash is found and prints the stack trace to the console. The input that failed is written to a file in the testdata/fuzz/FuzzFuzzedFunc directory within the package directory. Values from this file can be copied directly into Go code. If kept, the file will also be used by go test as part of the seed corpus, which means the test will fail until you fix the code.
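As a rough sketch of that workflow, the example below copies a crasher into the seed corpus with f.Add. Everything specific here is an assumption for illustration: the FuzzedFunc body, the empty-input crasher, and the file contents shown in the comment (which follow the corpus file encoding as I understand it). In a real project the fuzz test belongs in a *_test.go file; main is included only so the sketch can run on its own.

```go
package main

import (
	"fmt"
	"testing"
)

// FuzzedFunc is a hypothetical function under test: it returns the first
// byte of data. The length check is the fix made after the fuzzer reported
// a crash on empty input.
func FuzzedFunc(data []byte) (byte, bool) {
	if len(data) == 0 {
		return 0, false
	}
	return data[0], true
}

// FuzzFuzzedFunc keeps the crasher as a permanent seed.
func FuzzFuzzedFunc(f *testing.F) {
	f.Add([]byte("aaa"))
	// Hypothetical crasher, copied from a file under
	// testdata/fuzz/FuzzFuzzedFunc that would look like this:
	//
	//	go test fuzz v1
	//	[]byte("")
	f.Add([]byte(""))
	f.Fuzz(func(t *testing.T, data []byte) {
		FuzzedFunc(data)
	})
}

func main() {
	_, ok := FuzzedFunc([]byte(""))
	fmt.Println("empty input handled without panic, ok =", ok)
}
```

Once the value is in f.Add, the file under testdata can be deleted or kept; either way the input is exercised on every plain go test run.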
Here are my observations:
No need to download any tools or build anything;
A very convenient way to provide the seed corpus as code;
A fuzz test provided with a seed corpus can be reused as a regular unit test;
Because it's part of go test now, it's easy to create a CI job for it: just clone the existing test job and add the -fuzz flag.
The only downside, for now, is the lack of continuous fuzzing. But there is already a feature request regarding it.
Fuzzing is an important testing technique that helps find bugs and vulnerabilities in software that works with arbitrary input or is exposed to the internet. Adding it as part of the standard library significantly reduces entry barriers and makes it easy to start testing your code.