I was doing some tonight on LeetCode and I stumbled into an interesting performance case study. recreational programming The Problem This is LeetCode Problem: 881 Boats to Save People. Boats to Save People The person has weight , and each boat can carry a maximum weight of . i-th people[i] limit Each boat carries at most 2 people at the same time, provided the sum of the weight of those people is at most . limit Return the minimum number of boats to carry every given person. (It is guaranteed each person can be carried by a boat.) Example 1: Input: people = [ , ], limit = Output: Explanation: boat ( , ) 1 2 3 1 1 1 2 My code solved this problem in a blazing of memory. Not too shabby! 452ms and 21 MB -> people.sort() start, = ,len(people)- boats = start <= start < people[ ] > limit - people[start]: boats += -= boats += start += -= boats : class Solution def numRescueBoats ( , List[int], int) self people: limit: int: end 0 1 0 while end: while end and end 1 end 1 1 1 end 1 return After completing the problem, I realized that this problem is perfectly suited for Count Sort. Count Sort Count sort is one of my favorite algorithms because for the right data it is a sort that works in linear time, that’s O(n)! is a special sorting algorithm that works against data that is guaranteed to fit into a fixed range . The stable version of Count Sort is O(n+w) in both time and space complexity, where is the number of things to be sorted. The non-stable version of Count Sort can be done with O(w) additional space. Count Sort w n only Count sort works by putting elements in an array of fixed length according to their values on one pass. Then count sort scans through the and emits elements in sorted order. C C Here is how I implemented it for this problem. In this example, is the array I needed to sort. people count = [ ]*(limit+ ) people: count[i] += = range(len(count)): while count[i]: people[p] = count[i] -= += 0 1 for i in 1 p 0 for i in i 1 p 1 I was thrilled to use my favorite sorting algorithm! Count Sort should be faster than (python’s built in sorting algorithm). Which has an average performance of O(n * log n) and O(n) space complexity. Timsort Stable Count Sort My stable count sort solution solved the problem in of memory: 552ms and 21.1 MB count = [ ]*(limit+ ) i people: count[i] += total = i range(limit+ ): count[i],total = total,count[i]+total n = [ ]*len(people) i people: n[count[i]] = i count[i] += people = n start,end = ,len(people) boats = start <= end: people[end] <= limit - people[start]: start += boats += end -= boats : class Solution -> int: def numRescueBoats (self, people: List[int], limit: int) #Count Sort 0 1 for in 1 0 for in 1 0 for in 1 0 -1 0 while if 1 1 1 return There was a slight increase in memory usage, . This isn’t unexpected, Timsort uses O(N) space, whereas stable count sort uses O(N+W) space. But I was not expecting to see the runtime increase of .1 MB +100ms! Let’s see how the non-stable count sort does. Non-Stable Count Sort Non-Stable Count Sort Solution of memory 500ms and 20.1 MB : count = [ ]*(limit+ ) i people: count[i] += p = i range(len(count)): count[i]: people[p] = i count[i] -= p += start,end = ,len(people) boats = start <= end: people[end] <= limit - people[start]: start += boats += end -= boats : class Solution -> int: def numRescueBoats (self, people: List[int], limit: int) #Count Sort 0 1 for in 1 0 for in while 1 1 0 -1 0 while if 1 1 1 return I was happy to see the decrease in memory use. I predicted an O(w) memory usage, because I modify the array in place, so this decrease in memory usage was expected. The non-stable version also performed faster than the stable version, which was expected as the non-stable version has less overhead operations. people .9 MB 50 ms But why did Count Sort not beat out Timsort? Why would Timsort beat Count Sort? Timsort is a hybrid stable sorting algorithm, derived from merge sort and insertion sort, designed to perform well on many kinds of real-world data. Wikipdia Timsort Honestly, I don’t know the answer here. But I have some ideas: Best Case Timsort has a Best case performance of O(n). This could beat out Count Sort (also O(n) if Timsort’s performance is faster. actual Timsort is very adaptive and works best with according to Wikipedia. It could just be smarter than me and out perform count sort here. real world data Timsort is Written in C The code for Timsort in CPython’s built-in implementation sort() is actually written in C! This would absolutely have a big performance impact over my implementation of count sort in Python. Leetcode is a Poor Test Harness Leetcode isn’t really meant to measure these kind of performance differences. There are modules that I could use to measure the performance of this code with 1000s of iterations to account for statistical variations. pytest The tests provided by leetcode are also limited. The inputs are only so large. <= people.length <= <= people[i] <= limit <= 1 50000 1 30000 It could be the case that CPython’s Timsort does out perform Count Sort for these small numbers, but when you get into the millions or billions scale, Count Sort starts to be faster. Conclusion I believe Timsort outperformed Count Sort for all the reasons I listed above and probably more reasons that I didn’t even think of! Software performance is a super hard problem. There are layers and layers of abstractions to dig into, as well as low level physical variance occurring at the atomic level when measuring performance. This was a fun exercise to go through, and I hope you learned something, I definitely did. Check out my course on APIs If you enjoyed this deep dive, you might love ! my course on APIs It is written in a beautiful Notion document and it is packed with awesome learning moments! If you can't afford this course but still want it, just and I'll give it to you for . contact me free Also published here.