Date: 2021-02-03
I explain Artificial Intelligence terms and news to non-experts.
In this video, I explain what convolutions and convolutional neural networks are, and introduce, in detail, one of the best and most used state-of-the-art CNN architectures in 2020: DenseNet.
The Convolutional Neural Networks
A … convolution?
Training a CNN
The activation function: ReLU
The pooling layers: Max-Pooling
The state-of-the-art CNNs: A quick history
The most promising CNN architecture: DenseNet
Conclusion
Facial recognition, targeted ads, image recognition, video analysis, animal detection: these are all powerful AI applications you must have heard of at least once.
But do you know what they all have in common? They all use the same type of neural network architecture: the convolutional neural network.
CNNs are the most used type of neural network and the best for any computer vision application.
Once you understand these, you are ready to dive into the field and become an expert.
Convolutional neural networks are a family of deep neural networks that mainly use convolutions to achieve the expected task.
As the name says, convolution is the process where the original image, which is our input in a computer vision application, is convolved using filters that detect important small features of the image, such as edges.
The network autonomously learns the filter values that detect the features needed to match the output we want, such as the name of the object in a specific image sent as input.
These filters are basically small squares of size 3x3 or 5x5, so they can detect the direction of an edge: left, right, up, or down.
Just like you can see in this image, the process of convolution computes a dot product between the filter and the pixels it faces.
Then it moves to the right and does it again, convolving the whole image.
Once it's done, this gives us the output of the first convolution layer, which is called a feature map.
Then we do the same thing with another filter, giving us many feature maps at the end, which are all sent into the next layer as input to produce, again, many other feature maps, until it reaches the end of the network with extremely detailed general information about what the image contains.
There are many filters, and the numbers inside these filters are called the weights, which are the parameters trained during our training phase.
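To make the sliding dot product concrete, here is a minimal sketch in plain Python. The 4x4 image and the vertical-edge filter are made-up values chosen for illustration, and the sketch assumes no padding and a step of one pixel. Like most deep learning libraries, it does not flip the filter, so strictly speaking it computes a cross-correlation.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for top in range(out_h):
        row = []
        for left in range(out_w):
            # Dot product between the filter and the patch of pixels it currently covers.
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[top + i][left + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map


# Hypothetical 4x4 grayscale image: dark left half (0), bright right half (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-picked 3x3 filter that responds to dark-to-bright vertical edges.
# In a real CNN these nine numbers are weights learned during training.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = convolve2d(image, vertical_edge)
print(feature_map)  # [[3, 3], [3, 3]]
```

Every position of the filter overlaps the edge in this tiny image, so every value of the 2x2 feature map is high. On a flat region with no edge, the same filter would output zero.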
Of course, the network is not only composed of convolutions.
In order to learn, we also need to add an activation function and a pooling layer between each convolution layer.
Basically, these activation functions make the network non-linear while staying differentiable, which lets us use the backpropagation technique.
Backpropagation calculates the error between our guess and the real answer we were supposed to have, then propagates this error back through the network, changing the weights of the filters based on this error.
Once the propagated error reaches the first layer, another example is fed to the network and the whole learning process is repeated, thus iteratively improving our algorithm.
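The guess, error, and weight-update loop can be sketched with a single weight instead of a whole network. Everything here is a toy assumption: one training example, a squared error, and a fixed learning rate of 0.1. A real CNN applies the same idea to millions of weights at once.

```python
# Hypothetical toy problem: one weight, one training example.
x, target = 2.0, 6.0      # the ideal weight is 3.0, since 3.0 * 2.0 == 6.0
weight = 0.0              # starting value of the weight
learning_rate = 0.1       # assumed step size

for step in range(20):
    guess = weight * x                     # forward pass: the network's guess
    error = (guess - target) ** 2          # squared error between guess and real answer
    gradient = 2 * (guess - target) * x    # d(error)/d(weight), from the chain rule
    weight -= learning_rate * gradient     # change the weight based on the error

print(round(weight, 4))
```

Each pass through the loop moves the weight closer to 3.0, which is the "iteratively improving" part of training.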
This activation function is responsible for determining the output of each convolution computation and for reducing the complexity of our network.
The most popular activation function is called the ReLU function, which stands for rectified linear unit.
It sets any negative result to zero, since negative values are known to be harmful to the network, and keeps positive values the same.
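As a quick illustration, ReLU fits in one line of Python. The input values below are made up.

```python
def relu(values):
    """Replace every negative value with zero and keep the rest unchanged."""
    return [max(0.0, v) for v in values]


activations = relu([-2.0, -0.5, 0.0, 1.5, 3.0])
print(activations)  # [0.0, 0.0, 0.0, 1.5, 3.0]
```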
Having all these zeros makes the network much more efficient to train in computation time, since a multiplication by zero always equals zero.
Then, again to simplify our network and reduce the number of parameters, we have the pooling layers.
Typically, we use a two-by-two-pixel window and take the maximum value of this window to make the first pixel of our new feature map.
This is known as max pooling.
Then we repeat this process over the whole feature map, which reduces the x-y dimensions of the feature map, thus reducing the number of parameters in the network the deeper we get into it.
This is all done while keeping the most important information.
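Here is a minimal max-pooling sketch in plain Python. It assumes the window moves two pixels at a time so that windows do not overlap, which is a common choice but not something stated above, and the feature map values are made up.

```python
def max_pool_2x2(feature_map):
    """Take the maximum of each non-overlapping 2x2 window (stride 2)."""
    pooled = []
    for top in range(0, len(feature_map) - 1, 2):
        row = []
        for left in range(0, len(feature_map[0]) - 1, 2):
            window = [
                feature_map[top][left], feature_map[top][left + 1],
                feature_map[top + 1][left], feature_map[top + 1][left + 1],
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


# Hypothetical 4x4 feature map.
feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]

pooled = max_pool_2x2(feature_map)
print(pooled)  # [[4, 2], [2, 8]]
```

The 4x4 map becomes a 2x2 map: four times fewer values, with the strongest response of each region kept.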
These three layers (convolution, activation, and pooling) can be repeated multiple times in a network, which we call our conv layers, making the network deeper and deeper.
Finally, there are the fully connected layers, which learn a non-linear function from the last pooling layer's outputs.
The multi-dimensional volume that results from the pooling layers is flattened into a one-dimensional vector with the same total number of values.
Then we use this vector in a small fully connected neural network with one or more layers for image classification or other purposes, resulting in one output per image, such as the class of the object.
Of course, this is the most basic form of convolutional neural networks.
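The flatten-then-classify step can be sketched as follows. The pooled volume and the weights are invented for illustration (real weights are learned), there is only a single fully connected layer, and the softmax at the end is an assumption: it is one common way to turn the final scores into class probabilities.

```python
import math


def flatten(volume):
    """Turn a [channels][rows][cols] volume into a flat list of values."""
    return [v for channel in volume for row in channel for v in row]


def fully_connected(inputs, weights, biases):
    """One dense layer: each output is a weighted sum of all inputs plus a bias."""
    return [sum(w * x for w, x in zip(ws, inputs)) + b
            for ws, b in zip(weights, biases)]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical pooled output: 2 channels of 2x2 values -> 8 values once flattened.
pooled = [[[1.0, 0.0], [0.5, 2.0]], [[0.0, 1.0], [1.5, 0.0]]]
vector = flatten(pooled)

# Made-up weights for a 2-class problem.
weights = [[0.1] * 8, [-0.1] * 8]
biases = [0.0, 0.0]

probabilities = softmax(fully_connected(vector, weights, biases))
predicted_class = probabilities.index(max(probabilities))
print(len(vector), predicted_class)  # 8 0
```

Flattening keeps all 8 values and only changes their shape. The classifier then produces one output for the image, here class 0.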
There have been many different convolutional architectures since LeNet-5 by Yann LeCun in 1998 and, more recently, thanks to the progress of GPUs, the first deep learning network applied in the most popular object recognition competition: the AlexNet network in 2012.
This competition is the ImageNet Large Scale Visual Recognition Challenge, ILSVRC, where the best object detection algorithms competed every year on the biggest computer vision dataset ever created, ImageNet.
The field exploded right after this year, with new architectures beating the previous ones and always performing better, until today.
Nowadays, most state-of-the-art architectures perform similarly and have some specific use cases where they are better.
You can see here a quick comparison of the most used architectures in 2020.
This is why I will only cover my favorite network in this video, which is the one that yields the best results in my research: DenseNet.
It is also the most interesting and promising CNN architecture, in my opinion.
Please let me know in the comments if you would like me to cover any other type of network architecture.
The DenseNet family first appeared in 2016 in the paper called "Densely Connected Convolutional Networks", by researchers from Cornell University, Tsinghua University, and Facebook AI Research.
It is a family because it has many versions with different depths, ranging from 121 layers with 0.8 million parameters up to a version with 264 layers with 15.3 million parameters, which is smaller than the 101-layer-deep ResNet architecture, as you can see here.
The DenseNet architecture uses the same concepts of convolutions, pooling, and the ReLU activation function to work.
The important detail and innovation in this network architecture are the dense blocks.
Here is an example of a five-layer dense block.
In these dense blocks, each layer takes all the preceding feature maps as input, thus helping the training process by alleviating the vanishing gradient problem.
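The "each layer sees everything before it" wiring can be sketched without any real convolutions. In this toy version, `toy_layer` is a hypothetical stand-in for a real convolution layer, feature maps are just labelled strings, and the number of new maps per layer (`growth_rate`, set to 4 here) is an assumed value.

```python
def dense_block(input_maps, num_layers, growth_rate, layer_fn):
    """Run a dense block: every layer receives all feature maps produced before it."""
    features = list(input_maps)
    for layer_index in range(num_layers):
        # The layer's input is the concatenation of ALL preceding feature maps.
        new_maps = layer_fn(features, growth_rate, layer_index)
        # Its output is appended, so later layers can reuse it directly.
        features.extend(new_maps)
    return features


def toy_layer(features, growth_rate, layer_index):
    """Hypothetical stand-in for a conv layer: returns `growth_rate` labelled maps."""
    return [f"layer{layer_index}_map{k} (from {len(features)} inputs)"
            for k in range(growth_rate)]


output = dense_block(["in0", "in1", "in2", "in3"],
                     num_layers=5, growth_rate=4, layer_fn=toy_layer)
print(len(output))   # 24 feature maps: 4 inputs + 5 layers * 4 new maps
print(output[-1])    # layer4_map3 (from 20 inputs)
```

The last layer reads 20 feature maps but only adds 4 new ones, which is why each layer can stay small.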
This vanishing gradient problem appears in really deep networks, which are so deep that when we backpropagate the error through the network, this error is reduced at every step and eventually becomes zero.
These connections basically allow the error to be propagated further without being reduced too much.
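A rough numeric illustration of why depth hurts: assume, purely for the sake of the example, that the error signal is multiplied by 0.8 at every layer it passes through on the way back.

```python
# Hypothetical factor by which the error shrinks at each layer during backpropagation.
shrink_per_layer = 0.8

for depth in (5, 20, 50, 100):
    remaining = shrink_per_layer ** depth
    print(f"{depth:3d} layers -> {remaining:.2e} of the original error signal")
```

After 5 layers about a third of the signal is left, while after 100 layers almost nothing reaches the first layers. A direct connection from a late layer to an early one skips most of those multiplications.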
These connections also encourage feature reuse and reduce the number of parameters for the same reason: the network reuses information from previous feature maps instead of generating more parameters.
It therefore accesses the network's collective knowledge and reduces the chance of overfitting, thanks to this reduction in total parameters.
And, as I said, this works extremely well, reducing the number of parameters by around five times compared to a state-of-the-art ResNet architecture with the same number of layers.
The original DenseNet family is composed of four dense blocks, with transition layers between them that do convolution and pooling as well, and a final classification layer if we are working on an image classification task such as the ILSVRC competition.
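To see how four dense blocks and three transition layers fit together, the sketch below only tracks how many feature maps exist at each stage. The numbers are assumptions not given in the video: they follow the DenseNet-121 configuration as I understand it from the paper (blocks of 6, 12, 24, and 16 layers, 32 new feature maps per layer, 64 feature maps after the first convolution, and transitions that halve the count).

```python
def densenet_channels(block_sizes, growth_rate=32, initial_channels=64, compression=0.5):
    """Trace the number of feature maps after each dense block and transition layer."""
    channels = initial_channels
    trace = []
    for index, num_layers in enumerate(block_sizes):
        # Each layer in a dense block appends `growth_rate` new feature maps.
        channels += num_layers * growth_rate
        trace.append(("dense block", channels))
        # A transition layer follows every block except the last one.
        if index < len(block_sizes) - 1:
            channels = int(channels * compression)
            trace.append(("transition", channels))
    return trace


# Assumed DenseNet-121 block sizes.
for name, channels in densenet_channels([6, 12, 24, 16]):
    print(f"{name:12s} -> {channels} feature maps")
```

With these settings the last dense block ends with 1024 feature maps, which are then handed to the final classification layer.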
The size of the dense blocks is the only thing changing for each version of the DenseNet family to make the network deeper.
Of course, this was just an introduction to convolutional neural networks and, more precisely, the DenseNet architecture.
I strongly invite you to read further about these architectures if you want to make a well-thought-out choice for your application.
The paper and GitHub links for DenseNet are in the description of the video.
Please let me know if you would like me to cover any other architecture.
Please leave a like if you made it this far in the video, and since over 90% of you guys watching are not subscribed yet, consider subscribing to the channel so you don't miss any further news, clearly explained.
Thank you for watching!
