CPU vs. GPU for Video Transcoding: Challenging the Cost-Speed Myth

by Mirza Bilal, October 13th, 2023

Too Long; Didn't Read

In a detailed evaluation of CPU vs. GPU computational power for video transcoding using AWS instances and FFmpeg, the widely held belief that CPUs are more cost-effective while GPUs are faster but pricier was put to the test. Through meticulous testing across a range of AWS instances, including both GPU and CPU-based types, results indicated that modern GPU instances, especially the AWS Graviton2-based G5g.xlarge, are not only faster but also more cost-efficient in video transcoding tasks compared to several CPU-centric instances. The experiment highlights the need to re-evaluate prevailing tech narratives, emphasizing that in certain applications, the more "advanced" option may also be the most economical.

When evaluating computational power, especially CPUs versus GPUs, there's a prevailing narrative: CPUs may take longer to process a workload but are cost-effective, whereas GPUs are faster but cost more to run.

How true is this widely accepted notion?

To challenge this belief, we conducted a real-world assessment using AWS instances and FFmpeg video transcoding benchmarks. We sought to determine the most cost- and time-efficient option for transcoding video and audio, and so reduce our AWS bills.

To install FFmpeg from source on a CPU instance, you can follow this guide; for installing FFmpeg with hardware acceleration, you can check this out.
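Before benchmarking, it may help to confirm that the FFmpeg build on each instance actually exposes the encoders the tests rely on. The following is a minimal sketch, assuming `ffmpeg` is on the PATH; the helper names `has_encoder` and `report_encoders` are hypothetical and not part of the original setup.

```shell
# has_encoder NAME: succeed if the local ffmpeg build lists encoder NAME.
# Hypothetical helper; it relies only on the output of `ffmpeg -encoders`.
has_encoder() {
    ffmpeg -hide_banner -encoders 2>/dev/null | grep -qw -- "$1"
}

# report_encoders: print the availability of the two encoders used in this article.
report_encoders() {
    local enc
    for enc in libx264 h264_nvenc; do
        if has_encoder "$enc"; then
            echo "$enc: available"
        else
            echo "$enc: missing"
        fi
    done
}
```

Calling `report_encoders` on a GPU instance should report both encoders as available if the hardware-accelerated build succeeded; on a CPU-only build, `h264_nvenc` would typically be reported as missing.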

The Instances Selection

In our tests, we compared various AWS instance types, covering both GPU and CPU instances across Intel and AWS's own Graviton silicon. On the GPU side, we picked instances featuring the Nvidia Tesla T4 and T4G. For CPUs, we looked at three instances: two of the same generation and size, the Intel-based c7i.2xlarge and the Graviton3-powered c7g.2xlarge, plus the Graviton2-based c6g.4xlarge, chosen to assess the impact of more vCPUs on transcoding.

GPU Instances:
- g4dn.xlarge
- g5g.xlarge

CPU Instances:
- c7g.2xlarge
- c7i.2xlarge
- c6g.4xlarge

We made thoughtful selections for each of these instances. We aimed to choose those with similar costs to ensure a fair cost-to-performance comparison. Additionally, to explore the performance implications of doubling the CPU count, we extended our benchmarks to the c6g.4xlarge instance.

The Process

Before analyzing the results, it's important to discuss and understand the types of tests that were conducted.

Downscale to 480p:

Downscaling shrinks the video below its original resolution. It's useful for platforms or devices that cannot support high-resolution videos, or when smaller file sizes are needed. For this test, we downscaled the input video to 480p (720 pixels × 480 pixels, the size passed to the scale filter in the benchmark script).

Resample at 720p:

Resampling does not change the video's resolution but may alter the underlying pixel values. It can be beneficial for modifying encoding settings or applying specific filters. In this case, we resampled the video at its original resolution of 720p (1280 pixels × 720 pixels).

Upscale to 1080p:

Upscaling is the opposite of downscaling and converts the video to a higher resolution. Upscaling ahead of time generally produces better results than playing or rendering a smaller video and stretching it at playback time. In this test, we upscaled the video to 1080p (1920 pixels × 1080 pixels).

No Scaling:

All the above tests were conducted using FFmpeg's scale filter. For this test, we did not provide any scaling filter; we simply re-encoded the video.

The Benchmarking

To ensure objectivity, we used the same video file for every benchmark. The input video details are as follows:

Container Format: mp4
Duration: 01:34:40.38
Bitrate: 1579 kb/s
Video Codec: h264 (High)
Video Resolution: 1280x720 @ 23.98 fps
Audio: aac (LC), Sample Rate: 48000 Hz, 5.1
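
Details like these can be read from a file with `ffprobe`, which ships with FFmpeg. The sketch below is one possible way to do it, assuming `ffprobe` is on the PATH; the function name `probe_video` is hypothetical, and the selection of fields is an illustrative choice rather than the command used to produce the list above.

```shell
# probe_video FILE: print container- and stream-level details of FILE.
# Hypothetical helper; assumes ffprobe (part of FFmpeg) is installed.
probe_video() {
    ffprobe -v error \
        -show_entries format=format_name,duration,bit_rate:stream=codec_type,codec_name,profile,width,height,r_frame_rate,sample_rate,channel_layout \
        -of default=noprint_wrappers=1 \
        "$1"
}
```

Running `probe_video input.mp4` prints one `key=value` line per field, first for each stream and then for the container.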

We used FFmpeg, a leading open-source tool for multimedia processing, to build our benchmark script. The script contains tests for both CPU- and GPU-powered machines. It first checks whether a GPU is available and, depending on the result, runs the appropriate set of video processing commands.

To execute our transcoding tasks, we used the following benchmark script:

#!/bin/bash

# Pick the GPU (NVENC) or CPU (libx264) command set depending on the hardware
if lspci | grep -qi "NVIDIA Corporation"; then
    echo "System has a GPU"
    commands=(
        'ffmpeg -y -hide_banner -hwaccel cuda -hwaccel_output_format cuda -i input.mp4  -vf "scale_cuda=720:480"   -c:a copy -c:v h264_nvenc output.mp4 -benchmark'
        'ffmpeg -y -hide_banner -hwaccel cuda -hwaccel_output_format cuda -i input.mp4  -vf "scale_cuda=1280:720"  -c:a copy -c:v h264_nvenc output.mp4 -benchmark'
        'ffmpeg -y -hide_banner -hwaccel cuda -hwaccel_output_format cuda -i input.mp4  -vf "scale_cuda=1920:1080" -c:a copy -c:v h264_nvenc output.mp4 -benchmark'
        'ffmpeg -y -hide_banner -hwaccel cuda -hwaccel_output_format cuda -i input.mp4                             -c:a copy -c:v h264_nvenc output.mp4 -benchmark'
    )
else
    commands=(
        'ffmpeg -y -hide_banner -i input.mp4 -vf "scale=720:480"   -c:a copy -c:v libx264 output.mp4 -benchmark'
        'ffmpeg -y -hide_banner -i input.mp4 -vf "scale=1280:720"  -c:a copy -c:v libx264 output.mp4 -benchmark'
        'ffmpeg -y -hide_banner -i input.mp4 -vf "scale=1920:1080" -c:a copy -c:v libx264 output.mp4 -benchmark'
        'ffmpeg -y -hide_banner -i input.mp4                       -c:a copy -c:v libx264 output.mp4 -benchmark'
    )
fi

for cmd in "${commands[@]}"; do
    echo "--------------------------------------------------------------------------------"
    echo "Executing: $cmd"
    echo "--------------------------------------------------------------------------------"
    # Use the time command to measure how long it takes to run the command
    { time eval "$cmd"; } 2>&1
    rm output.mp4
done | tee output_results.txt
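
The script writes everything, including the output of bash's `time`, to output_results.txt. One way to pull the wall-clock times back out of that file is sketched below. It assumes the default bash `time` format (lines such as `real 12m34.567s`) with a dot as the decimal separator; the function names `to_seconds` and `real_times` are hypothetical and not part of the original script.

```shell
# to_seconds VALUE: convert a bash `time` value such as "12m34.567s" to seconds.
to_seconds() {
    awk -v t="$1" 'BEGIN {
        split(t, parts, "m")        # parts[1] = minutes, parts[2] = e.g. "34.567s"
        sub(/s$/, "", parts[2])     # drop the trailing "s"
        printf "%.3f\n", parts[1] * 60 + parts[2]
    }'
}

# real_times FILE: print the wall-clock seconds of every run recorded in FILE,
# one per line, in the order the commands were executed.
real_times() {
    local value
    awk '$1 == "real" { print $2 }' "$1" | while read -r value; do
        to_seconds "$value"
    done
}
```

With the benchmark script above, `real_times output_results.txt` would print four values, one per transcoding command.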

The Findings

The data from our AWS benchmarks, after executing 20 tests (four on each of five AWS instances), painted a compelling picture. The results showed clear differences in cost-efficiency and performance among these instances. The data extracted and processed from the benchmark results can be listed as:

The GPU instances, notably the Graviton2-based g5g.xlarge, were not only faster but also more cost-effective across the transcoding operations than CPU-centric instances like c7g.2xlarge. Adding the c6g.4xlarge, with its doubled vCPU count, showed how more CPU capacity influences performance and cost. Interestingly, even with the added vCPUs and a higher price than the GPU-powered instances, it performed significantly worse, so the GPU instances still offer the better balance between speed and cost. It is also worth noting that FFmpeg was built to run on multiple cores and was using all CPU cores, as can be seen in the htop screenshot taken during a transcoding task.

Let's create a visualization of benchmark results to compare the time taken and cost when running on different EC2 instances.

The Winner?

From the previous charts, it's evident that the AWS Graviton2-based G5g.xlarge emerges as the most efficient choice: it excels in speed and also appears to be the most cost-effective. To further illustrate its cost advantage, let's compare it with the other AWS instances to see just how economical it is.

The bar chart shows how G5g.xlarge stacks up against the other AWS EC2 instances in terms of cost. The c6g.4xlarge is the most expensive instance across the transcoding tasks: when downscaling to 480p, it is a whopping 370.9% more expensive than G5g.xlarge. For resampling at 720p, the disparity grows, with the c6g.4xlarge being 445% pricier than the G5g.xlarge. Similarly, when upscaling to 1080p, the c6g.4xlarge costs 438.9% more than our winner. Finally, for the 'No Scaling' operation, c6g.4xlarge proves to be 446.1% more expensive.

In stark contrast, the g4dn.xlarge, the other GPU-based instance, shows only a small cost difference compared with G5g.xlarge. Its costs are just around 24.8% to 27% higher across the various operations, so the gap between the two GPU instances is far smaller than the gap between G5g.xlarge and the CPU instances.

These findings underline the impressive cost efficiency of the AWS Graviton2 G5g.xlarge featuring Nvidia Tesla T4G, when placed against other popular AWS instances.
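
Percentages like the ones above can be reproduced from two small calculations. The sketch below assumes that the cost of a run is the instance's on-demand hourly price multiplied by the run time, which is an assumption about the method rather than something stated explicitly; the function names `job_cost` and `pct_more_expensive` are hypothetical, and no real prices are included.

```shell
# job_cost HOURLY_PRICE_USD SECONDS: cost of one transcoding run,
# assuming simple per-second billing at the given hourly price.
job_cost() {
    awk -v price="$1" -v secs="$2" 'BEGIN { printf "%.4f\n", price * secs / 3600 }'
}

# pct_more_expensive COST BASELINE: how much pricier COST is than BASELINE, in percent.
pct_more_expensive() {
    awk -v c="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (c - b) / b * 100 }'
}
```

For instance, `pct_more_expensive 3 2` prints 50.0, meaning the first cost is 50% higher than the second. The hourly prices to feed into `job_cost` would have to come from the AWS pricing page for the region used.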

In Conclusion

The ever-evolving realm of technology often holds narratives based on past truths, which may not hold relevance today. Our experiment underscores a crucial fact: in video transcoding, modern GPU instances aren't just faster; they also offer a more economical choice. When choosing between a CPU or GPU for cloud-based operations, it's essential to consider both performance and cost. And as demonstrated, sometimes the supposedly "faster and pricier" option can also be the most cost-effective.