With a few years of experience training and using various open-source models, I have learned the hard way how to set hyperparameters and use them efficiently. I have lost track of the sources where I collected this information, but it mostly seems to work for me. So today I would like to share the rules I remember. None of them may work for you out of the box, but they have worked for me most of the time. Most of these observations are not my own intuitions; I learned them from the various resources I have gone through.

Embedding Dimension

If you are creating your own vectors for words (or anything else), the embedding dimension is tricky to settle on, and it is hard to know in advance what will work. This formula has worked for me in most scenarios:

the dimension of embedding to use = 4th root of the total number of words (the dictionary size of your content)

Output Feature Map Resolution

For images, if you are planning to build your own network, my observation across various networks is that the resolution of your final feature map should be:

the spatial resolution of the final feature map = 1/32 of the original image resolution (image classification)

The same observation for semantic segmentation gives a slightly different variation of the formula:

the spatial resolution of the encoder output feature map = 1/16 of the original image resolution (semantic segmentation)

The Learning Rate

The biggest talking point is the learning rate. I don't have a solid formula for it, but the rough idea is to start high and reduce it as training goes on. If you are starting from your own randomly initialised weights, start high and adjust from there. If the weights are initialised from a pre-trained network, start slightly lower and keep reducing according to your validation score or one of the various decay schedules.
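The "start high, then reduce" idea can be sketched as a simple step-decay schedule. This is only one of many possible schedules, and every number here (the starting rates, the halving factor, the ten-epoch interval) is an illustrative assumption, not a value from the article.

```python
def decayed_learning_rate(initial_lr, epoch, decay_rate=0.5, decay_every=10):
    """Step decay: multiply the rate by `decay_rate` once every `decay_every` epochs.

    `decay_rate` and `decay_every` are hypothetical defaults chosen for illustration.
    """
    if initial_lr <= 0:
        raise ValueError("initial_lr must be positive")
    if not 0 < decay_rate <= 1:
        raise ValueError("decay_rate must be in (0, 1] so the rate never grows")
    return initial_lr * (decay_rate ** (epoch // decay_every))


# Assumed starting points: higher when training from random weights,
# lower when fine-tuning from a pre-trained network.
SCRATCH_LR = 1e-1
FINETUNE_LR = 1e-3

# Learning rate at epochs 0, 10, 20 and 30 for each case.
scratch_schedule = [decayed_learning_rate(SCRATCH_LR, e) for e in range(0, 40, 10)]
finetune_schedule = [decayed_learning_rate(FINETUNE_LR, e) for e in range(0, 40, 10)]
```

In practice you might trigger the reduction from the validation score instead of a fixed epoch count; the fixed interval just keeps the sketch short.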
The formula is: always reduce the learning rate as your training progresses.

Weight Initialisation

Always initialise with pre-trained network weights (ImageNet or anything else). The problems with randomly initialised weights are worse than most people can handle from a resources point of view.

LSTM Forget Gate

LSTMs: initialise the forget gate biases to higher values. If you don't, the gate just acts as a sigmoid of your input when the initialised weights end up being small. (Most of the major libraries do this by default now.)

Dropout

A hard yes (it should always be used).

Batch Normalisation

Yes! (The paper says it reduces internal covariate shift; I say it makes training better and faster.)

Data Preprocessing

Zero mean and unit variance (most of the pre-trained networks you see use this, and it works).

Mini-Batch Size

32–128 works best (a batch size higher than this might not yield better results most of the time). A lower batch size is good enough too, but it might take longer to converge.

An Ensemble of Models

Yes! It is mostly better than a single model. Make this decision based on the complexity of the problem.

These are the few things I can think of as of now. I will keep updating this as and when I learn more. And I say again: these are just a few techniques that worked for me and might or might not work for you. But this has been the recipe for most of the successful pre-trained networks so far.
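To make two of the numeric rules above concrete, here is a minimal sketch of the fourth-root embedding rule and of zero-mean, unit-variance preprocessing. The function names are hypothetical, and rounding the fourth root to the nearest integer is my assumption, since the rule itself does not say how to round.

```python
import statistics


def embedding_dimension(vocab_size):
    """Fourth root of the vocabulary size, rounded to the nearest integer.

    Rounding (and the floor of 1) is an assumption made for this sketch.
    """
    if vocab_size < 1:
        raise ValueError("vocab_size must be at least 1")
    return max(1, round(vocab_size ** 0.25))


def standardize(values):
    """Shift and scale a list of numbers to zero mean and unit variance."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        # A constant feature has no spread; map it to zeros instead of dividing by zero.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


dim = embedding_dimension(10_000)           # a 10,000-word dictionary
scaled = standardize([1.0, 2.0, 3.0, 4.0])  # toy feature column
```

For real image data the same standardisation is usually applied per channel, using statistics computed on the training set only.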

Rules of thumb for Deep Learning
