Getting started with basics:
A neural network is a series of algorithms that endeavors to recognize underlying relationships in a set of data through a process that mimics the way the human brain operates. When you open your eyes, what you see is data; it is processed by the neurons (data-processing cells) in your brain, which recognize what is around you. Neural networks work in a similar way: they take a large set of data, draw patterns out of it, and output what it is.
What are ANNs?
Neural networks are sometimes called artificial neural networks (ANNs) because they are not natural like the neurons in your brain; they artificially mimic the nature and functioning of biological neural networks. ANNs are composed of a large number of highly interconnected processing elements (neurons) working in unison to solve specific problems.
ANNs, like children, learn by example. There are several other categories of networks, such as Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), and Long Short-Term Memory networks (LSTMs). In this blog, we will focus on Convolutional Neural Networks (CNNs).
What are Convolutional Neural Networks?
A Convolutional Neural Network (ConvNet/CNN) is a Deep Learning algorithm which can take in an input image, assign importance (learnable weights and biases) to various aspects/objects in the image, and differentiate one from the other. The pre-processing required in a ConvNet is much lower compared to other classification algorithms. While in primitive methods filters are hand-engineered, with enough training, ConvNets can learn these filters/characteristics themselves.
Computers cannot see things as we do; to a computer, an image is nothing but a matrix of numbers.
The ConvNet architecture consists of three types of layers: Convolutional Layer, Pooling Layer, and Fully-Connected Layer.
CNNs are mostly applied to image data. Every image is a matrix of pixel values. With colored images, particularly RGB (Red, Green, Blue) images, the separate color channels (3 in the case of RGB) introduce an additional 'depth' dimension, making the input 3-dimensional. Hence, for a given RGB image of size, say, 255 x 255 (width x height) pixels, we'll have 3 matrices associated with the image, one for each color channel. Thus the image in its entirety constitutes a 3-dimensional structure called the input volume (255x255x3).
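As a concrete illustration (a minimal sketch using NumPy with random pixel values, keeping the 255 x 255 size from above), an RGB image is simply a 3-D array:

```python
import numpy as np

# A hypothetical 255 x 255 RGB image as a NumPy array:
# height x width x channels, one matrix per color channel.
image = np.random.randint(0, 256, size=(255, 255, 3), dtype=np.uint8)

print(image.shape)          # (255, 255, 3) -- the input volume
print(image[..., 0].shape)  # (255, 255)   -- the red channel alone
```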
A feature is a distinct and useful observation or pattern obtained from the input data that aids in performing the desired image analysis. The CNN learns these features from the input images; patterns that emerge repeatedly in the data gain prominence during training.
- A filter (or kernel) is an integral component of the layered architecture.
- It refers to an operator applied across the entire image that transforms the information encoded in the pixels.
- The kernels are then convolved with the input volume to obtain so-called ‘activation maps’.
- Activation maps indicate ‘activated’ regions, i.e. regions where features specific to the kernel have been detected in the input.
- The depth of a filter is the same as the depth of the input feature map; its height and width are typically much smaller (e.g. 3 x 3).
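To see how a kernel "activates" on a feature, here is a small hand-made example (the kernel values below are a classic vertical-edge detector chosen for illustration; in a real CNN they would be learned):

```python
import numpy as np

# A 3x3 vertical-edge kernel (hand-engineered for illustration).
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])

# A 3x3 patch of the input containing a sharp vertical edge (bright | dark).
patch = np.array([[10, 10, 0],
                  [10, 10, 0],
                  [10, 10, 0]])

# The element-wise product, summed up, is one value of the activation map.
activation = np.sum(kernel * patch)
print(activation)  # 30 -- a strong response: the patch matches the kernel
```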
This layer is the core of ConvNet. It does most of the computational heavy lifting.
The CONV layer computes the dot product between the kernel and a sub-array of the input image of the same size as the kernel. It then sums all the values resulting from the element-wise products, and this sum becomes a single pixel value of the output image. This process is repeated until the whole input image is covered, and for all the kernels.
- The depth of an output volume represents the number of activation maps stacked in it, and it depends on the number of filters used. A single filter applied to a 2-D image produces a depth of 1.
- Filter size (f) represents the height and width of the filters used. The depth of a filter is not specified separately, as it is always the same as the depth of the input. A common filter size is 3.
- Stride (s) represents the step size to take when traversing horizontally and vertically along the height and width of an input image. A common stride is 1.
- Padding (p) helps retain the height and width of an input image. Since processing shrinks the input, we cannot build deep networks with many such layers for small images; padding makes deep networks feasible. In padding, we add extra rows and columns around the image, and the added pixels have the value 0 by default (zero padding). Another advantage of padding is that it retains more information from corner pixels, as they are now processed multiple times instead of once.
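Zero padding is a one-liner in NumPy (a sketch on a tiny 3x3 "image" for illustration):

```python
import numpy as np

x = np.arange(1, 10).reshape(3, 3)   # a tiny 3x3 "image"

# Zero padding with p = 1: one row/column of zeros on every side,
# growing the 3x3 input to 5x5.
padded = np.pad(x, pad_width=1, mode="constant", constant_values=0)
print(padded.shape)  # (5, 5)
print(padded)
```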
All the above hyperparameters are fixed for a given layer; different convolutional layers may use different values. The filter matrix is also a weight matrix whose values need to be trained using backpropagation.
- The input 3-D volume is passed to this layer. Its dimension is H*W*C, where H, W, and C represent height, width, and the number of channels respectively.
- There can be K filters used, where K becomes the depth of the output volume. All K filters have the same dimension, f*f*C, where f is the filter size and C is the number of channels in the input image.
- If padding is configured, it is added to the input volume. With 'same' padding and a 3x3 filter, one row and one column of zeros are added on each side. Padding is applied only along the height and width of the input, and to every channel.
- After padding, the computation begins. We slide our filter starting from the top-left corner. The corresponding values of the filter and the input volume are multiplied, and then all the products are summed. The filter then slides horizontally, taking a stride-sized step each time; if the stride is 2, we slide 2 columns horizontally. The same process is repeated vertically until the whole image is covered.
- All the values from the filter computation are then passed through the ReLU activation, max(0, x), which replaces every negative value with zero, as negative values have no significance for a pixel.
- The two steps above generate just one layer of the output volume; that is, the 3-D input volume is transformed into a 2-D activation map.
- These steps are then repeated for all K filters, and the output of each filter is stacked on top of the others; hence the output volume has depth K.
- To calculate the dimension of the output volume, we need all the hyperparameters of the convolutional layer. All the filters used at this layer need to be trained, and they are initialized with small random numbers.
The height and width of an output volume are given by
output height = floor( ( H + 2*P - F ) / S + 1 )
output width = floor( ( W + 2*P - F ) / S + 1 )
depth = K (number of filters used)
This is how the convolutional layer works.
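The whole procedure can be sketched as a naive (loop-based, unoptimized) NumPy function; the input sizes and random weights below are hypothetical, chosen only to check the shapes against the formula above:

```python
import numpy as np

def conv_layer(volume, filters, stride=1, padding=0):
    """Naive convolutional layer: slide each f x f x C filter over the
    H x W x C input volume, take the dot product at every position,
    and apply ReLU. Returns an output volume of depth K."""
    H, W, C = volume.shape
    K, f, _, _ = filters.shape           # K filters of shape f x f x C
    if padding:
        volume = np.pad(volume, ((padding, padding), (padding, padding), (0, 0)))
    out_h = (H + 2 * padding - f) // stride + 1   # the formula above
    out_w = (W + 2 * padding - f) // stride + 1
    out = np.zeros((out_h, out_w, K))
    for k in range(K):
        for i in range(out_h):
            for j in range(out_w):
                patch = volume[i*stride:i*stride+f, j*stride:j*stride+f, :]
                out[i, j, k] = np.sum(patch * filters[k])
    return np.maximum(out, 0)            # ReLU: negatives become zero

# Small random numbers for the (trainable) filters, as described above.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 3))               # toy 8x8 RGB input volume
w = rng.standard_normal((4, 3, 3, 3)) * 0.01     # K=4 filters, f=3, C=3
y = conv_layer(x, w, stride=1, padding=1)
print(y.shape)  # (8, 8, 4): 'same' padding keeps H and W; depth = K
```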
The pooling layer is used for reducing the spatial dimensions of an input volume; it does not reduce the depth. Reducing the spatial size lowers the computational power required to process the image.
The pooling layer does not lose the important properties of an image: it extracts the most dominant information and hence keeps the training of the model effective.
Two types of pooling are used: Max Pooling and Average Pooling.
In max pooling, the maximum value present in the selected kernel window is retained and all the other values are discarded, while in average pooling the average of all the values in the window is stored.
Pooling also acts as a noise suppressant. Max pooling generally performs better than average pooling and hence is used more frequently.
- Filter size (f) represents the size of the kernel to be used.
- Stride (s) represents the number of steps to take while sliding the kernel window.
- Padding (p) represents how much padding to apply to the input. Padding is usually not used at this layer.
The filters used do not need to be trained; hence backpropagation has no effect on this layer, and once the hyperparameters are fixed they never change.
The dimension of the output volume can be calculated in the same way as for the convolutional layer. Here, the depth of the output volume is equal to the depth of the input volume.
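Max pooling can be sketched in a few lines of NumPy (a toy 4x4x2 volume with a 2x2 window and stride 2, chosen for illustration):

```python
import numpy as np

def max_pool(volume, f=2, stride=2):
    """Max pooling: keep the largest value in each f x f window,
    channel by channel. Depth is unchanged; height/width shrink."""
    H, W, C = volume.shape
    out_h = (H - f) // stride + 1
    out_w = (W - f) // stride + 1
    out = np.zeros((out_h, out_w, C))
    for i in range(out_h):
        for j in range(out_w):
            window = volume[i*stride:i*stride+f, j*stride:j*stride+f, :]
            out[i, j, :] = window.max(axis=(0, 1))   # max per channel
    return out

x = np.arange(32, dtype=float).reshape(4, 4, 2)
y = max_pool(x, f=2, stride=2)
print(x.shape, "->", y.shape)  # (4, 4, 2) -> (2, 2, 2): depth preserved
```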
At the end of the convolution and pooling layers, networks generally use fully-connected layers in which each pixel is considered as a separate neuron, just like in a regular neural network. The last fully-connected layer contains as many neurons as there are classes to be predicted. For instance, in the CIFAR-10 case, the last fully-connected layer will have 10 neurons.
From here on, the actual classification takes place.
Now that we have converted the input image into a form suitable for our multi-level fully-connected architecture, we flatten it into a column vector. The flattened output is fed to a feed-forward neural network, and backpropagation is applied on every iteration of training. Over a series of epochs, the model learns to distinguish between dominating and certain low-level features in images and to classify them.
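A minimal sketch of this classification head (the 4x4x8 feature volume and the random, untrained weights are hypothetical; a softmax turns the CIFAR-10-style 10 scores into probabilities):

```python
import numpy as np

rng = np.random.default_rng(0)

# Suppose the conv/pool layers left us with a 4 x 4 x 8 volume.
features = rng.standard_normal((4, 4, 8))

x = features.reshape(-1)          # flatten to a 128-element column vector

# One fully-connected layer mapping 128 features to 10 classes
# (as in the CIFAR-10 example); in practice W and b would be learned.
W = rng.standard_normal((10, 128)) * 0.01
b = np.zeros(10)
scores = W @ x + b

# Softmax turns the 10 scores into class probabilities.
probs = np.exp(scores - scores.max())
probs /= probs.sum()
print(probs.shape)                # (10,)
```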
- Provide the input image to the convolution layer.
- Convolve it with the kernels/filters.
- Apply a pooling layer to reduce the dimensions.
- Add these layers multiple times.
- Flatten the output and feed it into a fully connected layer.
- Train the model with backpropagation, using a logistic (softmax) output for classification.
And you have made your convolutional neural network.
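The steps above can be sketched end-to-end as a single forward pass (a minimal NumPy sketch with random, untrained toy weights and a hypothetical 8x8 RGB input; in practice you would use a deep learning framework and train the weights with backpropagation):

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d(x, w, pad=1):
    """Convolution + ReLU over an H x W x C volume with K f x f x C filters."""
    K, f, _, _ = w.shape
    x = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    H, W, _ = x.shape
    out = np.zeros((H - f + 1, W - f + 1, K))
    for k in range(K):
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j, k] = np.sum(x[i:i+f, j:j+f, :] * w[k])
    return np.maximum(out, 0)

def max_pool2(x):
    """2x2 max pooling with stride 2."""
    H, W, C = x.shape
    return x[:H//2*2, :W//2*2, :].reshape(H//2, 2, W//2, 2, C).max(axis=(1, 3))

image = rng.standard_normal((8, 8, 3))            # toy input
w1 = rng.standard_normal((4, 3, 3, 3)) * 0.1      # conv: 4 filters, f=3
w2 = rng.standard_normal((10, 4 * 4 * 4)) * 0.1   # fully connected, 10 classes

h = max_pool2(conv2d(image, w1))   # conv -> ReLU -> pool: 8x8x4 -> 4x4x4
z = w2 @ h.reshape(-1)             # flatten and classify into 10 scores
probs = np.exp(z - z.max()); probs /= probs.sum()
print(probs.argmax(), probs.shape)  # predicted class index and (10,)
```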
CNNs have been key in building algorithms which power, and shall power, AI as a whole in the foreseeable future. Some of their applications are listed below:
1. Decoding Facial Recognition
Facial recognition is broken down by a convolutional neural network into the following major components:
- Identifying every face in the picture
- Focusing on each face despite external factors, such as light, angle, pose, etc.
- Identifying unique features
- Comparing all the collected data with already existing data in the database to match a face with a name.
Convolutional neural networks can also be used for document analysis. This is useful not just for handwriting analysis, but also has a major stake in recognizers. For a machine to be able to scan an individual's writing and then compare it to the wide database it has, it must execute almost a million commands a minute. It is said that with the use of CNNs and newer models and algorithms, the error rate has been brought down to as low as 0.4% at the character level, though its complete testing is yet to be widely seen.
CNNs can also play a major role in the fight against climate change, especially in understanding why we see such drastic changes and how we could experiment with curbing their effects. It is said that the data in natural history collections can also provide greater social and scientific insights, but this would require skilled human resources, such as researchers who can physically visit these repositories, and more manpower to carry out deeper experiments in this field.
Introducing a grey area into CNNs is poised to provide a much more realistic picture of the real world. Currently, CNNs largely function like a machine, seeing a true or false value for every question. However, as humans, we understand that the real world plays out in a thousand shades of grey. Allowing the machine to understand and process fuzzier logic will help it understand the grey area we humans live in. This will give CNNs a more holistic view of what humans see.
CNNs have already brought in a world of difference to advertising with the introduction of programmatic buying and data-driven personalized advertising.
7. Other Interesting Fields
CNNs are poised to be the future with their introduction into driverless cars, robots that can mimic human behavior, aides to human genome mapping projects, predicting earthquakes and natural disasters, and maybe even self-diagnosis of medical problems. So, you wouldn't even have to drive down to a clinic or schedule an appointment with a doctor to ensure your sneezing attack or high fever is just the simple flu and not symptoms of some rare disease. One problem researchers are working on with CNNs is brain cancer detection; earlier detection of brain cancer could prove to be a big step in saving more lives affected by this illness.
I hope you now understand the basic architecture of a CNN and its various applications. There are many variations of this architecture, but as I mentioned before, the basic concept remains the same. In case you have any doubts/feedback, please comment.