In my previous article, we introduced the multi-layer perceptron (MLP), which is just a set of stacked, interconnected perceptrons. I highly recommend checking out my previous post if you are unfamiliar with the perceptron and the MLP, as we will discuss them quite a bit in this article:
An example MLP with two hidden layers is shown below:
However, the problem with the MLP is that it can only fit a linear classifier. This is because each individual perceptron uses a step function as its activation, which simply thresholds a weighted sum of its inputs and can therefore only produce a linear decision boundary:
So although our stack of perceptrons may look like a modern-day neural network, it is still a linear classifier and not that much different from ordinary linear regression!
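To see why stacking alone doesn't buy us anything, here is a minimal NumPy sketch (my own illustration, with made-up weight names W1, b1, W2 and b2, and identity activations used for simplicity) showing that two stacked layers with purely linear activations collapse into a single linear layer:

```python
import numpy as np

# Hypothetical sizes: 3 inputs, 4 hidden units, 2 outputs.
rng = np.random.default_rng(0)
x = rng.normal(size=3)

W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

# Two stacked layers with identity (linear) activations.
h = W1 @ x + b1
y_stacked = W2 @ h + b2

# The same mapping collapses into one linear layer.
W = W2 @ W1
b = W2 @ b1 + b2
y_single = W @ x + b

print(np.allclose(y_stacked, y_single))  # True: stacking added no expressive power
```

However deep we go, as long as every layer is linear the whole network can be rewritten as one linear map, so it can never learn anything a single linear model couldn't.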
Another problem is that the step function is not differentiable over its entire domain, which rules out gradient-based training.
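As a quick sanity check on that point, here is a small sketch (assuming the usual Heaviside step, returning 1 for non-negative inputs and 0 otherwise) showing that the step function's derivative is zero everywhere except at the jump, so it gives us no useful gradient to learn from:

```python
import numpy as np

def step(z):
    # Heaviside step activation: 1 if z >= 0, else 0.
    return np.where(z >= 0, 1.0, 0.0)

z = np.linspace(-2, 2, 9)

# Numerical (central-difference) derivative: zero everywhere except
# across the jump at z = 0, where it blows up instead of giving a
# usable gradient.
dz = 1e-6
numerical_grad = (step(z + dz) - step(z - dz)) / (2 * dz)
print(numerical_grad)  # all zeros except a huge spike at z = 0
```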
So, what do we do about it?
Non-Linear Activation Functions!
What is Linearity?
Let’s quickly state what linearity means to build some context. Mathematically, a function is considered linear if it satisfies the following condition:
There is also another condition:
But we will work with the first equation for this demonstration.
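For reference, linearity is usually defined through two properties, additivity and homogeneity, which can be written as:

```latex
f(x + y) = f(x) + f(y)        % additivity
f(\alpha x) = \alpha \, f(x)  % homogeneity, for any scalar \alpha
```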
Take this very simple case: