AI, blockchain, bioengineering
Exploration versus exploitation is a topic that merits its own study in the field of AI and reinforcement learning. But it is far more than that. We consider this tradeoff every day in our lives, influencing every decision we make.
A bit of background on what exploration versus exploitation is. Consider the graphic below.
You are at the "X" in the lower left corner. The squares with borders are those that you can see. Your goal is to capture as many green rewards as possible. Red squares with a negative integer indicate a loss in reward.
You have a choice. You can advance to the "5" square and end the game with a reward of 5. Or you can explore the environment and possibly finish with a reward of 49 (50 - 1 is the maximum). However, you risk losing reward, returning to the "5" square and ending the game with a reward of less than 5.
The exploitation of knowledge we currently have is a task we constantly balance with the exploration of new knowledge. Do you start working after high school or go to college? Do you apply to a specialized program or explore various majors? Do you marry the first person you meet or date several? Do you hire the fourth person you interview or only after you've interviewed at least twenty others? Do you order the same steak and fries or try something new?
These might seem either trivial or questions with existing, well-defined approaches.
However, the danger lies when we make the exploration/exploitation tradeoff without being fully aware of it.
A familiar example is that of starting a tech company. The reason why so many fail is often cited as, "No one used the product." However, the underlying reason was that the founders exploited their first vision of their business model and executed on that before sufficiently exploring the search space.
They could have explored the environment by talking to customers, running A/B tests, validating the market, testing pricing - other Lean Startup techniques. While this would have consumed valuable time at the onset, it would have produced a much better decision algorithm that could have been exploited later, saving time in sum.
I recommend a three-step process for reducing exploration costs and improving exploitation costs.
Weak, high-threat assumptions should be explored first and turned into strong assumptions. If there are none of these, it might be time to increase the ratio of exploitation to exploration you pursue.
This systematically de-risks your course of action and makes you well aware of the exploration/exploitation scale you have been implicitly balancing.
Create your free account to unlock your custom reading experience.