
Deep Learning

- Anshul Kashyap; August 6, 2018

Overview:

In today’s entry, we will look at the basic concepts of deep learning, a subfield of machine learning that specializes in using artificial neural networks that progressively improve over time. Much of the industry is moving toward deep learning because of the potential it holds for solving real-world AI problems. There are many types of neural networks, each suited to different problems; today we will focus on one of them and walk through what it is made of and how it learns.


Deep Learning. What’s the big deal??:

Deep learning is the subfield of machine learning that focuses on using artificial neural networks to solve the problem at hand. The reason deep learning has become such an active field is the quality of the results it yields relative to the time it takes to train and run a model. In this context, an intelligent agent is a system of algorithms that progressively improves over time: through training, the algorithms become more accurate and have a higher probability of yielding better results.


An artificial (feed-forward) neural network is a programmable paradigm that consists of:

- An input layer
- One or more hidden layers
- An output layer


Let’s Get Started!!:

Let’s start with the basics. There are multiple types of neural networks; for now, I will be focusing on one called the feed-forward neural network. As stated above, this network can be split into three parts: the input layer, the hidden layers, and the output layer.

Each neuron in the input layer is connected to every neuron in the first hidden layer, each hidden layer is connected the same way to the next one, and the last hidden layer is connected to the output layer. In other words, adjacent layers are fully connected, as the sketch below shows.
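Here is a minimal sketch of that structure in Python with NumPy. The layer sizes (784 inputs, a hidden layer of 16 neurons, 10 outputs) are just an assumption for illustration; the point is that every pair of adjacent layers gets a full weight matrix, one weight per connection.

```python
import numpy as np

# Hypothetical layer sizes: 784 inputs (e.g. a 28x28 image),
# one hidden layer of 16 neurons, and 10 output neurons.
layer_sizes = [784, 16, 10]

# "Fully connected" means every neuron in one layer links to every
# neuron in the next, so adjacent layers share a full weight matrix
# and each non-input layer gets a bias vector.
weights = [np.random.randn(n_out, n_in)
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.random.randn(n) for n in layer_sizes[1:]]

print([w.shape for w in weights])  # [(16, 784), (10, 16)]
print([b.shape for b in biases])   # [(16,), (10,)]
```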


Neurons:

The input layer is made up of numerous artificial neurons, each holding a numeric value. In image recognition, for example, each input neuron holds a grayscale pixel intensity scaled to a value between 0 and 1. There are two common types of neurons: the sigmoid neuron and the classic perceptron. A perceptron uses a threshold value that controls whether or not the neuron activates: if its weighted input is above the threshold, the neuron fires; otherwise it doesn’t. A sigmoid neuron instead outputs a continuous value between 0 and 1, which helps complex neural networks arrive at more accurate answers as the network progressively gets better, since small adjustments produce small, smooth changes in the output. We will be using the concept of sigmoid neurons today.
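A quick sketch of the difference between the two neuron types, with made-up inputs; z stands for the neuron’s weighted input:

```python
import numpy as np

def perceptron(z, threshold=0.0):
    # A perceptron fires (outputs 1) only when its weighted input
    # crosses the threshold; otherwise it outputs 0.
    return 1 if z > threshold else 0

def sigmoid(z):
    # A sigmoid neuron outputs a smooth value between 0 and 1, so
    # small changes in z produce small changes in the output.
    return 1.0 / (1.0 + np.exp(-z))

for z in (-2.0, -0.1, 0.1, 2.0):
    print(f"z={z:+.1f}  perceptron={perceptron(z)}  sigmoid={sigmoid(z):.3f}")
```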


Synapses:

Each synapse carries a weight that can be changed, which is what allows the network to become “better”. Each neuron also carries a bias, another numeric factor that affects the outcome of a network run. In the beginning, the weights and biases are set randomly, drawn from a normal distribution.
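A sketch of one layer put together, under the assumptions above: weights and biases drawn from a normal distribution, and sigmoid neurons computing a weighted sum plus bias. The sizes here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 4 inputs feeding a layer of 3 sigmoid neurons.
n_in, n_out = 4, 3

# Weights and biases start as random draws from a normal distribution.
W = rng.normal(size=(n_out, n_in))
b = rng.normal(size=n_out)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = rng.random(n_in)    # a made-up input vector with values in [0, 1)
a = sigmoid(W @ x + b)  # each neuron: weighted sum of inputs, plus bias
print(a)                # three activations, each between 0 and 1
```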


The Main Idea:

If I started talking about deep learning and neural networks in depth, I would keep writing for a long time. I just want to give you the basic ideas and concepts to hopefully get you interested in this subject.

In the beginning, the weights and biases are set to random numeric values drawn from a normal distribution. As we progress through the training set, we filter and refine the network, giving it a higher probability of correctly classifying an input it hasn’t seen before.


Training:

We train the neural network using two pieces that work together: the cost function, which measures how far off the network currently is, and the gradient descent algorithm, which uses that measurement to adjust the weights and biases.


Gradient Descent:

Gradient descent is an algorithm used to reduce the cost function: it repeatedly adjusts the network’s weights and biases in the direction that lowers the cost, and it stops once the changes it would make are too minor to be worth applying to the actual neural network. As gradient descent progresses, the cost decreases, which means we are getting closer and closer to the local minimum we are trying to find; large adjustments become unnecessary because the neural network is progressively approaching its optimal state.
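Here is a minimal sketch of the idea on a one-variable cost, C(w) = (w - 3)^2, whose minimum sits at w = 3. This toy function is my own example; a real network does the same thing across thousands of weights and biases at once.

```python
# Gradient descent on C(w) = (w - 3)^2, minimized at w = 3.
def grad(w):
    return 2 * (w - 3)          # derivative of (w - 3)^2

w = 10.0                        # random-ish starting point
learning_rate = 0.1

for step in range(1000):
    update = learning_rate * grad(w)
    if abs(update) < 1e-6:      # stop once changes become negligible
        break
    w -= update                 # step downhill, against the gradient

print(w)  # very close to 3.0
```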


Cost Function:

The cost function, also known as the loss function, is used to measure the current effectiveness of the neural network. It captures the difference between what the network currently outputs and what an optimal network would output: a common choice is to take the squared difference between the network’s outputs and the desired outputs and average it over the training examples. The gradient of the cost is a vector quantity, and stepping against it leads us in the direction of fastest descent from our current position, so gradient descent reduces the output of the loss function over time. For a cost that depends on just two parameters, this can be pictured as a ball rolling downhill on a 3D topographical surface.
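A sketch of that common squared-error cost, with made-up numbers for a single training example:

```python
import numpy as np

def mse(predicted, target):
    # Mean squared error: average of the squared differences between
    # the network's outputs and the desired outputs.
    return np.mean((predicted - target) ** 2)

target = np.array([0.0, 1.0, 0.0])     # what we wanted the network to say
predicted = np.array([0.2, 0.7, 0.1])  # what the network actually said
print(mse(predicted, target))          # smaller means a better network
```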


Conclusion:

Over time, the weights and biases approach their optimal state, refining the neural network enough to, in this example, recognize handwritten digits with high probability.


Extra Resources: