Too Long; Didn't Read
In "Using Optuna to Search for Tiny RL Policies," I used the Optuna framework to search for trivial policies in an environment. I chose CMA-ES as my optimization method in order to find a solution faster. I used Optuna to directly optimize each parameter in the weight array. This approach scales really poorly, since Optuna is designed to optimize hyperparameters and it suggests one number at a time, so every additional weight in the policy means another suggestion call per trial. Check out my code at this link if you’re interested.