18,000,000,000 km away, Voyager 2 left the warming afterglow of solar-winds from the sun that is the heliosphere, and ventured onwards towards interstellar space.
Since launching on the 20th August 1977, the spacecraft has enjoyed an expansive panorama of our solar system — observing Jupiter in 1979, Saturn from 1981 and Uranus towards the end of 1985.
Given the vastness of space, it begs the question — if you have a limited selection spots available, how can you determine the best sample of a population?
In this case, I assume the “best” to be a selection’s relative uniqueness against the entire population. For instance, how unique is Mars amongst all other planets in our solar system. But to reach Mars requires a compromise, a rejection of alternatives, say the set of Mercury and Venus.
Representation of scaled-distances and size of Solar System
In short, we want to find the set of N items that are the most insightful about our population while minimising the cost in reaching them.
Using Python3, we firstly want to calculate the distinction amongst each item. We can imagine a list describing each planet:
#each key contains a list of values representative of planet's propertiesPlanets = {}
'''planet list => index_0 = radius-in-km, index_1 = weight-in-kg (strip exponent for example), index_2 = human-population, index_3 = O2-concentration-in-atmosphere-percent, index_4 = gravity-acceleration-in-m/s^2.'''
planets["Mercury"] = [2440, 3.285, 0, 0.13, 3.59]planets["Venus"] = [6052, 4.867, 0, 0, 8.87]planets["Earth"] = [6371, 5.972, 7530000000, 20.95, 9.81]planets["Mars"] = [3390, 6.39, 0, 0.146, 3.77]planets["Jupiter"] = [69911, 1.898, 0, 0, 25.95]planets["Saturn"] = [58232, 5.683, 0, 0, 11.08]planets["Uranus"] = [25362, 8.681, 0, 0, 10.67]planets["Neptune"] = [24622, 1.024, 0, 0, 14.07]
What we want to achieve is a vector/planet so that we can use cosine-similarity to determine the angular-displacement of each planet from one another. Cosine-similarity is heavily utilised for information-retrieval to determine the respective value of a document against a set of documents. Visually, we want a representation of our planets so we can just measure the angles between them like so:
Determining cosine similarity between two vectors is achieved with:
planetList = [*planets]
#determine cosineSimilarity for each planetresults = {}
for planet in planetList:cosineResults = {}for altPlanet in planetList:if altPlanet is not planet:
#determine cosine similarity
cosineResult = dot(planets\[planet\], planets\[altPlanet\])/(norm(planets\[planet\])\*norm(planets\[altPlanet\]))
#print(planet + " VS " + altPlanet + "===" + str(cosineResult)) cosineResults[altPlanet] = cosineResultresults[planet] = cosineResults
We’ll receive a rational number representing the angular-displacement between these vectors such as:
0.35869123
So we can now interpret this as saying Earth is approximately 36% similar to Mars.
Iterating through our planets and archiving our cosineResult
between Planet N
and Planet N+1
, we’re able to develop a matrix of respective distinction scores amongst our items.
Next, we need to calculate the cost of the relationship between our planets. That is, to understand the cost of travelling from Planet N
and Planet N+1
.
Naturally, this is a multivariable problem involving a range of facets such as distance, time, potential hazards, the impact of solar radiation etc. To concentrate focus on the idea of measuring reward/cost we’ll just consider distance between planets. That is, the cost of “Earth” to “Mars” is the distance between them, which is on average, 225 million km. To keep our scores at a manageable scale, we’ll drop the “million” so our final cost is just 225
.
Computing our score for the trip from Earth to Mars is simply reward/cost:
score = 0.35869123/225 0.00159418324
We reach now an interesting inflexion point where our model for optimising can vary with respect to the amount of domain-knowledge and historic-knowledge available. In some cases there may be data from historic trips (Voyager 1) that we can exploit to understand latent benefits/perils of adventuring to a particular planet.
If such a scenario was present, we could explore our options with a Monte-Carlo Search Tree or an even more rudimentary model like multiple linear regression.
In this case, we don’t have any data like that so we can only consider the relative scores available to us. As such, I think the best way to assess the optimal sample is to then sort our scores with the following structure:
#create list of paired planets and scoresscoreRanks = [[planetA, planetB], score-of-planetA-to-planetB],...]
Sorting in ascending order leaves us with:
import operator
scoreRanks = sorted(scoreRanks, key=itemgetter(1))
Lastly, say we want to retrieve 3-planets that represents the best combination for maximising variance and minimising cost. We can achieve this by recursively iterating through our sorted scoreRanks
.
def planetCombination(self, ranks, length, result):
#iteratively determine optimal ranking and then return the top-number of hits determined by LENGTH
if len(set(result)) == length:return result
else: #pull highest valueif len(result) == 0:result.append(scoreRanks[0][0][0])result.append(scoreRanks[0][0][1])
#find highest value from last itemlastItem = result[-1]
#retrieve highest value from last item that isn't in results for entry in scoreRanks:if entry[0][0] == lastItem and entry[0][1] not in result:result.append(entry[0][1])return self.planetCombination(scoreRanks, 3, result)
Our final outputted list will show us a list of a combination of three planets (our requirement of three is a fixed-input in this example, assigned as the variable length
), that satisfies our base-case.
Github link is here.
The above is an unoptimised solution. If you can think of any superior alternatives or suggestions please let me know!
Thanks!