Sometimes we run into extravagant ideas that end up bearing no fruit. I spent the better part of the last day trying to show that the dropout regularization technique in neural networks derives from the Braess paradox of congestion games in game theory. Sadly for my academic machine learning career, this doesn’t seem to hold. Below I briefly explain what dropout is, what the Braess paradox is, why the two might be related, and what I found experimenting with it.

Dropout is a standard regularization technique used when training neural networks to prevent overfitting. Instead of training the whole fully connected layers at once, you randomly drop units, each with probability p (which removes every path through them), train the sub-networks created in the process, and then combine them into the final network on which you test and make actual predictions. The original research paper, and many later uses of the technique, show that it does in fact improve test accuracy and reduce overfitting.

The Braess paradox is a paradox in the theory of congestion games. Like other phenomena such as the tragedy of the commons and the prisoner’s dilemma, it shows how things can go wrong in multi-player games where each player aims to maximize their own gain. The strange thing about the Braess paradox, though, is that by adding more edges to a transportation network, presumably ‘improving the infrastructure’, congestion can grow by up to a factor of 4/3 (for linear latency functions).

In the example in the illustration, which I gratefully borrowed from another article, say the edges marked with x cost ½ an hour of latency for a single player but 1 hour if both players use them. You can check that in the case on the right, the equilibrium is for the two players to each take a different colored path, one using the red and one using the blue. In the case on the left, however, where the diagonal connecting edge has not been removed, that edge actually makes both players use the same route, so each ends up with 2h latency.
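To make the arithmetic concrete, here is a small brute-force sketch of the two-player game. It assumes the illustration follows the textbook Braess example: the unmarked edges cost a constant 1 hour and the diagonal costs nothing. The node names s, v, w, t and the path names are my own labels. The code enumerates every pure strategy profile and keeps those where no player can strictly lower their own latency by switching paths.

```python
from itertools import product

# Assumed edge latencies as a function of how many of the 2 players use the edge:
# the "x" edges cost n/2 hours (1/2 alone, 1 if shared), the unmarked edges a
# constant 1 hour, and the diagonal shortcut nothing.
LATENCY = {
    "s-v": lambda n: n / 2,  # x edge
    "v-t": lambda n: 1.0,
    "s-w": lambda n: 1.0,
    "w-t": lambda n: n / 2,  # x edge
    "v-w": lambda n: 0.0,    # the diagonal
}

PATHS = {
    "top": ("s-v", "v-t"),
    "bottom": ("s-w", "w-t"),
    "zigzag": ("s-v", "v-w", "w-t"),  # only available when the diagonal exists
}


def player_costs(profile):
    """Travel time of each player, given the path every player chose."""
    load = {}
    for path in profile:
        for edge in PATHS[path]:
            load[edge] = load.get(edge, 0) + 1
    return [sum(LATENCY[e](load[e]) for e in PATHS[p]) for p in profile]


def pure_equilibria(available):
    """Profiles where no single player can strictly lower their own cost."""
    result = []
    for profile in product(available, repeat=2):
        costs = player_costs(profile)
        stable = True
        for i in range(2):
            for alt in available:
                deviated = list(profile)
                deviated[i] = alt
                if player_costs(deviated)[i] < costs[i]:
                    stable = False
        if stable:
            result.append((profile, costs))
    return result


without_diag = pure_equilibria(["top", "bottom"])
with_diag = pure_equilibria(["top", "bottom", "zigzag"])

worst_without = max(max(c) for _, c in without_diag)
worst_with = max(max(c) for _, c in with_diag)
print(worst_without, worst_with, worst_with / worst_without)
# prints: 1.5 2.0 1.3333333333333333
```

Because ties are possible in such a tiny discrete game, several profiles count as (weak) equilibria once the diagonal exists. In the worst of them both players take the diagonal route and each spends 2 hours, against 1.5 hours without the diagonal: the 4/3 ratio.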
With the diagonal edge in place, this is the equilibrium: no player will switch to a different route.

The Braess paradox has since been investigated thoroughly, as it could have real-life implications. What’s the use of adding roads to the public highway system if doing so could actually worsen congestion? Instead of being a linear, straightforward question of resource allocation, building new infrastructure becomes worrisome. The Braess paradox can also appear in any network: communication networks, digital currency exchange rates, and there are even a number of articles about it appearing in physics.

It was especially the physical examples that drew my attention to the question: could the Braess paradox appear in neural nets? In physics, just as in neural nets, it’s hard to claim there are selfish agents whose actions cause a non-optimal equilibrium to appear, yet the paradox shows up there anyway. Combine this with the following facts about neural nets:

- The standard way of operating is to arrange the neurons in layers rather than in a full mesh network.
- We have the dropout technique mentioned before, where randomly removing parts of the network improves the result.
- We have CNNs and other structures where locality and reduced connectivity help.

I had the Eureka moment: Aha! The two must be related. As I said before, not all Eureka moments are born equal, and this one proved to be superficial rather than profound. Who knows, though, maybe someone reading this post will find the missing link.

To check this hypothesis, I wrote some code. Basically, I built a small neural net in the shape of the Braess network and, by playing with different regularization, learning rates and especially initial weights, found examples where the network optimizes better or faster when the diagonal edge is removed. Although I did find many such examples, the expected behavior in a case where the source of trouble is the Braess paradox is that the diagonal link draws all the weights to itself.
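A minimal sketch of this kind of experiment follows. The setup is illustrative and hypothetical, not the original experiment code: the wiring (one input node s, hidden tanh nodes v and w, a linear output node t, and a maskable diagonal edge v→w), the sin(x) toy target, plain full-batch gradient descent with hand-written backpropagation, and the learning rate, step count and number of trials are all assumptions. For each random initialization it trains the net with and without the diagonal from the same starting weights, then compares final losses and how large the diagonal weight ended up relative to the other edge weights.

```python
import numpy as np

# Hypothetical toy setup: input node s, hidden tanh nodes v and w, linear output
# node t, wired like the Braess network (s->v, s->w, v->t, w->t) plus the
# diagonal v->w, which can be masked out.
# Parameter order: s->v, s->w, v->w, v->t, w->t, bias_v, bias_w, bias_t
DIAG = 2
OTHER_EDGES = [0, 1, 3, 4]


def loss_and_grad(theta, x, target, use_diagonal):
    """Mean squared error of the Braess-shaped net and its gradient."""
    sv, sw, vw, vt, wt, bv, bw, bt = theta
    m = 1.0 if use_diagonal else 0.0  # m = 0 removes the diagonal edge
    v = np.tanh(sv * x + bv)
    w = np.tanh(sw * x + m * vw * v + bw)
    y = vt * v + wt * w + bt
    loss = np.mean((y - target) ** 2)

    # Manual backpropagation through the five edges and three biases.
    dy = 2.0 * (y - target) / x.size
    dz_w = dy * wt * (1.0 - w ** 2)
    dz_v = (dy * vt + dz_w * m * vw) * (1.0 - v ** 2)
    grad = np.array([
        np.sum(dz_v * x),      # s->v
        np.sum(dz_w * x),      # s->w
        m * np.sum(dz_w * v),  # v->w (stays zero when the edge is removed)
        np.sum(dy * v),        # v->t
        np.sum(dy * w),        # w->t
        np.sum(dz_v),          # bias_v
        np.sum(dz_w),          # bias_w
        np.sum(dy),            # bias_t
    ])
    return loss, grad


def train(theta0, x, target, use_diagonal, lr=0.05, steps=2000):
    """Plain full-batch gradient descent from a given initialization."""
    theta = theta0.copy()
    for _ in range(steps):
        _, grad = loss_and_grad(theta, x, target, use_diagonal)
        theta -= lr * grad
    final_loss, _ = loss_and_grad(theta, x, target, use_diagonal)
    return final_loss, theta


rng = np.random.default_rng(0)
x = np.linspace(-2.0, 2.0, 64)
target = np.sin(x)  # arbitrary toy regression target

results = []
for trial in range(10):  # a real scan would use many more initializations
    theta0 = rng.normal(size=8)
    dense_loss, dense_theta = train(theta0, x, target, use_diagonal=True)
    sparse_loss, _ = train(theta0, x, target, use_diagonal=False)
    # Size of the diagonal weight relative to the other edge weights.
    ratio = abs(dense_theta[DIAG]) / np.mean(np.abs(dense_theta[OTHER_EDGES]))
    results.append((sparse_loss < dense_loss, ratio))
    print(f"trial {trial}: sparse better={sparse_loss < dense_loss}, "
          f"|diagonal| / mean |other| = {ratio:.3f}")
```

Scanning many more initializations, or changing the target, learning rate or regularization, is the kind of search described above; the sketch itself makes no claim about which way the comparison comes out.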
In this analogy, loss plays the role of latency: with the right, or rather wrong, set of weights, the optimization process would neglect every path except the one through the diagonal edge.

Sadly, examining 100K different sets of initial weights, I found no example of the weights behaving this way by the end of the neural net optimization. When the reduced-edge net performed better than the dense net, the weights on the diagonal path were actually very low, one or two orders of magnitude lower than the other weights in the network.

Thinking about it, the neural net is not really susceptible to the Braess paradox, for two main reasons:

1. The functions on the edges don’t behave as linearly and nicely. This does not immediately imply that the Braess paradox can’t appear there, though, and there are examples of the Braess paradox with more complex functions.
2. The optimization of the neural net, as done in backpropagation, happens all at once. All the weights are updated according to one loss function that propagates back, rather than each path being optimized on its own. Though I’m not a physics expert, I believe this differs even from the physical settings of the Braess paradox, where the different springs and strings or electrons behave somewhat like independent agents optimizing their own paths. If the neural net were optimized by a greedy algorithm of some sort, the Braess paradox might well appear explicitly.

Anyway, it was fun experimenting and thinking harder about how neural nets should be constructed.

For more about the Braess paradox, see:
https://homepage.ruhr-uni-bochum.de/Dietrich.Braess/#eng
http://theory.stanford.edu/~tim/papers/rbp.pdf

Is the Braess Paradox Related to Dropout in Neural Nets?
