Google Summer of Code Recipient
You’re organizing a hackathon and decide to give free cloud storage as prizes to the winners. For the prize fund, you’ve got 1024 GB of cloud space. You will hand out these gigabytes on the condition that a higher place in the hackathon gets a larger amount of space. Since you want to make as many participants happy as possible, you want to find the maximum number of places for which you can award prizes. For example, if you had just 8 GB available, you’d have 3 positions in total: the winner gets 5 GB, the runner-up gets 2 GB and the person who came third gets 1 GB. (Another split is possible, 4, 3 and 1 GB, but the number of positions is still 3 for 8 GB.)
So how do you solve this? Note that (as the example above shows) there are multiple distributions possible for a given number of positions (let’s call this number p). Indeed, this boils down to writing a number as a sum of distinct positive integers in such a way that there are as many summands as possible. For 8 gigs, we could’ve chosen 8 = 7 + 1 or 8 = 5 + 3, but these wouldn’t have been optimal, since 8 can be written as a sum of more than just two numbers, as in 8 = 5 + 2 + 1. A mathematical concept that makes this problem more convenient to reason about is that of partitions. To quote Wikipedia, a partition of a positive integer n, also called an integer partition, is a way of writing n as a sum of positive integers. Since every prize must be different, we are looking at partitions into distinct parts: we simply want the partition of 1024 into distinct parts that has as many parts as possible. Let’s call this a max partition for the sake of the conversation.
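To double-check claims like “3 is the best we can do for 8”, a slow but obviously correct reference is handy. Here is a minimal brute-force sketch (the name `max_distinct_parts` is my own, hypothetical, not from any library) that searches every partition of n into distinct parts and returns the largest number of parts. It is only meant for small n.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def max_distinct_parts(n, smallest=1):
    """Largest number of distinct parts, each >= smallest, summing to n.

    Returns -1 if no such partition exists. This is an exhaustive search
    (memoized), so it is only practical for small n.
    """
    if n == 0:
        return 0  # the empty partition
    best = -1
    # Try every possible smallest part; the rest must use strictly larger parts.
    for part in range(smallest, n + 1):
        rest = max_distinct_parts(n - part, part + 1)
        if rest >= 0:
            best = max(best, rest + 1)
    return best


print(max_distinct_parts(8))  # 3, e.g. 1 + 2 + 5
```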
In computer science, this problem belongs to a class of problems that can be solved with greedy algorithms: procedures that make the locally optimal choice at each stage of the solution in the hope of finding a global optimum.
The “greedy” approach to solving our example is as follows. Doesn’t it seem natural to start with 1 as the first summand? All that remains then is to express 7 as a max partition and add 1 to it. But now expressing 7 as a max partition has a constraint: we cannot use 1. So we use 2 and move on to represent 8 - (1 + 2) = 5 as a max partition. Again, for that, we can’t use 1 or 2. Neither can we use 3 or 4, because the leftover would then be 2 or 1 respectively, which we’ve already used. Thus, we represent 5 as just itself and we’re done: we now have our max partition 8 = 1 + 2 + 5. It can be shown easily that this final situation arises when the number we would otherwise pop out next (here 3) is at least half of the remaining number (here 5). I leave that part for you to figure out.
So, to put this strategy more formally: suppose we initially have two numbers, n = 8 and l = 1. If n ≤ 2l, we simply represent n as itself; otherwise we pop out l and then solve the subproblem of representing n - l as a max partition in which each number is at least l + 1. The value of l for this subproblem is 1 greater than that of the original problem. So for our example (n, l) = (8, 1), we first pop out 1 and then solve the subproblem (n - l, l + 1), i.e., (7, 2). For this subproblem, we pop out 2 and then solve the subproblem (7 - 2, 2 + 1), that is (5, 3). Now, since 5 ≤ 2 × 3, we just pop out 5 and we’re done. Summing up the popped-out numbers gives 8 = 1 + 2 + 5.
Since we’ve articulated the strategy more formally now, it’s easy to come up with a working program to solve our problem. Here’s a straightforward implementation in Python 3:
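def max_partition(n, l=1):
    """Greedy max partition of n into distinct parts, each at least l."""
    partition = []
    while n > 2*l:
        partition.append(l)  # pop out l ...
        n = n - l            # ... and solve the subproblem (n - l, l + 1)
        l += 1
    partition.append(n)      # n <= 2l: represent what's left as itself
    return partition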
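The same (n, l) formulation can also be written recursively, which mirrors the subproblem description almost word for word. This is a sketch assuming n ≥ 1, and `max_partition_recursive` is a hypothetical name. One caveat: the recursion depth equals the number of summands, roughly √(2n), so for n around half a million and up it would hit Python’s default recursion limit of 1000, and the iterative version is the safer choice there.

```python
def max_partition_recursive(n, l=1):
    """Greedy max partition of n into distinct parts, each at least l.

    Either n is small enough (n <= 2l) to stand alone, or we pop out l
    and recurse on the subproblem (n - l, l + 1).
    """
    if n <= 2 * l:
        return [n]
    return [l] + max_partition_recursive(n - l, l + 1)


print(max_partition_recursive(8))  # [1, 2, 5]
```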
But wait, I’ve got another approach. Perhaps a better one. I noticed that numbers which can be written as the sum of the first k natural numbers (mathematicians call them triangular numbers) are special. Aren’t they already in max-partition form when they show off their identity?! For example, isn’t 6 already in max-partition form when written as 6 = 1 + 2 + 3? Isn’t 10 already in max-partition form when written as 10 = 1 + 2 + 3 + 4? Let’s call such numbers “senior numbers” for the sake of this conversation. This insight forms the basis for my algorithm.
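Whether n is a senior number can be tested exactly with integer arithmetic: n = k(k + 1)/2 for some positive integer k exactly when 8n + 1 is a perfect square, because 8n + 1 = (2k + 1)². The sketch below uses `math.isqrt` (available since Python 3.8); `is_senior` is a hypothetical helper named after the post’s terminology.

```python
import math


def is_senior(n):
    """True if n = 1 + 2 + ... + k for some k >= 1 (a triangular number)."""
    if n < 1:
        return False
    root = math.isqrt(8 * n + 1)  # floor of the exact square root
    return root * root == 8 * n + 1


print([n for n in range(1, 30) if is_senior(n)])  # [1, 3, 6, 10, 15, 21, 28]
```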
Here’s how we proceed: if the number n whose max partition we have to find is already a senior number, we simply write it in its n = 1 + 2 + 3 + … + k form. If it’s not a senior number, we still find a k that is just large enough to make the sum s = 1 + 2 + 3 + … + k greater than n (by just large enough I mean that 1 + 2 + 3 + … + k is greater than n, but 1 + 2 + 3 + … + (k - 1) is not). Since 1 + 2 + … + (k - 1) = s - k is not greater than n, and can’t equal n because n isn’t senior, s - k is less than n, so s - n is less than k. That makes s - n one of the numbers 1, 2, 3, …, k - 1. What if we “pluck out” s - n from the sum of the first k natural numbers? That would give us s - (s - n), which is nothing but n!
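Before worrying about a closed-form formula for k, the pluck-out idea can be coded directly by adding 1, 2, 3, … until the running sum s reaches or passes n. This is a minimal sketch that assumes n ≥ 1; the name `pluck_partition` is hypothetical.

```python
def pluck_partition(n):
    """Max partition of n >= 1 via the 'pluck out s - n' idea.

    Finds the smallest k with s = 1 + 2 + ... + k >= n by simple counting,
    then removes the single summand s - n (if any) from 1, 2, ..., k.
    """
    k, s = 0, 0
    while s < n:  # grow k until the running sum reaches or passes n
        k += 1
        s += k
    summands = list(range(1, k + 1))
    if s > n:  # n is not senior, so 1 <= s - n <= k - 1
        summands.remove(s - n)
    return summands


print(pluck_partition(8))  # [1, 3, 4]
print(pluck_partition(9))  # [2, 3, 4]
```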
By the way, the last character of the above paragraph isn’t a factorial sign 🙂. Let’s visualize the idea we’ve learned so far. Usually, Ferrers diagrams are used to visualize partitions, but for our purposes I found my custom visualization more convenient: in the tree below, the top node is the number whose max partition we wish to evaluate. The leaves are the numbers in the max-partition representation, which, of course, sum up to the number on the top node. For a senior number, everything’s good:
For a number that’s not senior, we cut the appropriate branch so that the number on the leaf which was connected to the top node via that branch isn’t added. Consider the case of 9:
And below is the tree for 8. Notice that we’re simply finding the number k. For 8 and 9, it’s 4. Since the sum of the first 4 natural numbers is 10, we first draw the tree for 10 and then replace 10 in the top node with the number we want, here 8. We then cut off the branch connecting the top node to the number 10 - 8 = 2. For 9, that number was 10 - 9 = 1.
The max partition then is simply the sum of the remaining leaves. I hope you understand my algorithm by now.
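As a rough text stand-in for the tree diagrams, here is a small sketch (the function `render_tree` is hypothetical, and the layout is only an approximation of the drawings). It puts n on the top node, hangs the leaves 1, 2, …, k below it, and marks the leaf s - n, if there is one, as the cut branch. For 8 it marks leaf 2, for 9 it marks leaf 1, and for a senior number like 10 nothing is cut.

```python
def render_tree(n):
    """Text sketch of the tree: top node n, leaves 1..k, cut leaf marked.

    k is the smallest value with s = 1 + 2 + ... + k >= n; the leaf equal
    to s - n (only present when s > n) is labelled as the cut branch.
    """
    k, s = 0, 0
    while s < n:
        k += 1
        s += k
    lines = [str(n)]
    for leaf in range(1, k + 1):
        branch = "└── " if leaf == k else "├── "
        mark = "  (branch cut)" if leaf == s - n else ""
        lines.append(branch + str(leaf) + mark)
    return "\n".join(lines)


print(render_tree(8))
```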
One subtlety that remains is how to find the “just large enough” k. But it’s a pretty straightforward calculation. The sum of the first k natural numbers is k(k + 1)/2, so we set n = k(k + 1)/2. Solving this quadratic equation for a positive k, we get k = (√(1 + 8n) - 1) / 2. Since k is going to be fractional if n isn’t a senior number, we take its ceiling. That makes k just large enough.
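The same calculation written out step by step, with the numbers for 8 and 1024 plugged in as a check:

```latex
\begin{aligned}
\frac{k(k+1)}{2} = n
  &\;\Longrightarrow\; k^2 + k - 2n = 0
  \;\Longrightarrow\; k = \frac{\sqrt{1 + 8n} - 1}{2} \quad (k > 0),\\
k^\ast &= \left\lceil \frac{\sqrt{1 + 8n} - 1}{2} \right\rceil,\\
n = 8 &: \frac{\sqrt{65} - 1}{2} \approx 3.53 \;\Rightarrow\; k^\ast = 4,\\
n = 1024 &: \frac{\sqrt{8193} - 1}{2} \approx 44.76 \;\Rightarrow\; k^\ast = 45.
\end{aligned}
```

For 8 the first 4 natural numbers sum to 10, so 10 - 8 = 2 is plucked out; for 1024 the first 45 sum to 1035, so 11 is plucked out.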
I think by now I’ve articulated this algorithm clearly. We can thus proceed to programming the solution. Here’s another straightforward implementation in Python 3:
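import math

def optimal_summands(n):
    k = ((1 + 8*n)**0.5 - 1) / 2       # positive root of k(k+1)/2 = n
    k = math.ceil(k)                   # the "just large enough" k
    summands = list(range(1, k+1))     # 1, 2, ..., k
    the_sum = k * (k+1) // 2
    if the_sum - n > 0:                # If n is not senior,
        summands.remove(the_sum - n)   # pluck out s - n.
    return summands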
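The `**0.5` step converts 1 + 8n to a float. That is fine for the sizes used in this post, but once 1 + 8n exceeds 2**53 the conversion can round, so the ceiling could in principle land on the wrong k, and for astronomically large n the conversion raises OverflowError. Here is a sketch of an integer-only variant using `math.isqrt` (Python 3.8+); the name `optimal_summands_exact` is hypothetical, and the plucking idea is unchanged.

```python
import math


def optimal_summands_exact(n):
    """Same pluck-out idea, but k is computed with integer math only."""
    k = (math.isqrt(8 * n + 1) - 1) // 2  # largest k with k(k+1)/2 <= n
    if k * (k + 1) // 2 < n:              # n is not senior: need one more
        k += 1
    the_sum = k * (k + 1) // 2
    summands = list(range(1, k + 1))
    if the_sum > n:
        summands.remove(the_sum - n)      # pluck out s - n
    return summands


print(optimal_summands_exact(8))  # [1, 3, 4]
```

The floor step works because k(k + 1)/2 ≤ n is equivalent to (2k + 1)² ≤ 8n + 1, which `math.isqrt` answers exactly.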
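If we analyze both algorithms, we find that both of them run in O(√n) time: the number of summands in a max partition grows like √(2n), and each algorithm does a constant amount of work per summand (max_partition() pops one number per loop iteration, while optimal_summands() builds, and at most once scans, a list of k numbers). In other words, both are linear in the size of their output. However, the invisible constant factor hidden in the O-notation is perhaps much smaller for optimal_summands() than for max_partition().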
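To see the √n growth concretely: the number of summands p (that is, the maximum number of places) is the largest p with p(p + 1)/2 ≤ n, since p distinct positive parts add up to at least 1 + 2 + … + p. It can be computed directly with integer arithmetic. The sketch below (`max_places` is a hypothetical helper) prints p next to √(2n).

```python
import math


def max_places(n):
    """Number of summands in a max partition of n (n >= 1).

    It is the largest p with p(p+1)/2 <= n, which rearranges to
    (2p + 1)^2 <= 8n + 1.
    """
    return (math.isqrt(8 * n + 1) - 1) // 2


for n in (100, 10_000, 1_000_000):
    print(n, max_places(n), round(math.sqrt(2 * n), 1))
# 100 13 14.1
# 10000 140 141.4
# 1000000 1413 1414.2
```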
I did some simple checks in Python to see which method is quicker, and the latter one turned out to run more than three times as fast as the former. I used Python’s timeit module to time both algorithms, and here’s an instance of one of my checks in the Python interpreter:
>>> from timeit import timeit
>>> timeit(setup='from different_summands import optimal_summands', stmt='optimal_summands(10000)', number=100000)
>>> timeit(setup='from greedy_different_summands import max_partition', stmt='max_partition(10000)', number=100000)
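The session above relies on two local modules, `different_summands` and `greedy_different_summands`. A self-contained way to run a similar comparison is sketched below: it includes reconstructed copies of both functions, first checks that they agree on the number of places, and then times them with `timeit.timeit`, which accepts a callable as its statement. Timings will vary with machine and Python version.

```python
import math
from functools import partial
from timeit import timeit


def max_partition(n, l=1):
    partition = []
    while n > 2 * l:
        partition.append(l)
        n -= l
        l += 1
    partition.append(n)
    return partition


def optimal_summands(n):
    k = math.ceil(((1 + 8 * n) ** 0.5 - 1) / 2)
    summands = list(range(1, k + 1))
    the_sum = k * (k + 1) // 2
    if the_sum > n:
        summands.remove(the_sum - n)
    return summands


# Timing only makes sense if both give the same number of places.
assert all(len(max_partition(n)) == len(optimal_summands(n)) for n in range(1, 2000))

for fn in (max_partition, optimal_summands):
    seconds = timeit(partial(fn, 10000), number=10000)
    print(f"{fn.__name__}: {seconds:.3f} s for 10000 calls")
```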
I’ve often observed that knowing some mathematical facts allows one to develop a better algorithm, or at least to develop an algorithm faster and with more intuition. Mathematical insights can often dramatically improve the runtime of one’s programs. Math and computer science (especially the study of algorithms) are great friends!
If you know about other factors that make the latter program run faster, please let me know in the comments. In the end, perhaps greed isn’t always good, but math is. 😀
By the way, you can split those 1024 gigabytes as:
1024 = 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10 + 12 + 13 + 14 + 15 + 16 + 17 + 18 + 19 + 20 + 21 + 22 + 23 + 24 + 25 + 26 + 27 + 28 + 29 + 30 + 31 + 32 + 33 + 34 + 35 + 36 + 37 + 38 + 39 + 40 + 41 + 42 + 43 + 44 + 45
(Notice the missing number? Hint: it’s the sum of the first 45 natural numbers minus 1024.)
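A quick arithmetic check of the hint: the first 45 natural numbers sum to 45 × 46 / 2 = 1035, so the missing number is 1035 - 1024 = 11, and the remaining 44 summands add up to exactly 1024.

```python
k = 45
missing = k * (k + 1) // 2 - 1024          # 1035 - 1024
summands = [i for i in range(1, k + 1) if i != missing]

print(missing)                              # 11
print(sum(summands), len(summands))         # 1024 44
```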
Of course, then you’ve got to have at least 44 participants in your hackathon! 🙂
P.S.: Should I write an arXiv paper for this?