In my previous article, we introduced the multi-layer perceptron (MLP), which is just a set of stacked, interconnected perceptrons. I highly recommend checking out my previous post if you are unfamiliar with the perceptron and the MLP, as we will discuss them quite a bit in this article:
An example MLP with two hidden layers is shown below:
However, the problem with the MLP is that it can only fit a linear classifier. This is because each individual perceptron uses a step function as its activation, which simply thresholds a weighted sum of its inputs and can therefore only produce a linear decision boundary:
So although our stack of perceptrons may look like a modern-day neural network, it is still a linear classifier and not that much different from ordinary linear regression!
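To see why stacking alone doesn't buy us anything, here is a minimal NumPy sketch (my own illustration, with made-up weight names W1, b1, W2 and b2, and identity activations used for simplicity) showing that two stacked layers with purely linear activations collapse into a single linear layer:

```python
import numpy as np

# Hypothetical sizes: 3 inputs, 4 hidden units, 2 outputs.
rng = np.random.default_rng(0)
x = rng.normal(size=3)

W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

# Two stacked layers with identity (linear) activations.
h = W1 @ x + b1
y_stacked = W2 @ h + b2

# The same mapping collapses into one linear layer.
W = W2 @ W1
b = W2 @ b1 + b2
y_single = W @ x + b

print(np.allclose(y_stacked, y_single))  # True: stacking added no expressive power
```

However deep we go, as long as every layer is linear the whole network can be rewritten as one linear map, so it can never learn anything a single linear model couldn't.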
Another problem is that the step function is not differentiable over its entire domain, which rules out gradient-based training.
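As a quick sanity check on that point, here is a small sketch (assuming the usual Heaviside step, returning 1 for non-negative inputs and 0 otherwise) showing that the step function's derivative is zero everywhere except at the jump, so it gives us no useful gradient to learn from:

```python
import numpy as np

def step(z):
    # Heaviside step activation: 1 if z >= 0, else 0.
    return np.where(z >= 0, 1.0, 0.0)

z = np.linspace(-2, 2, 9)

# Numerical (central-difference) derivative: zero everywhere except
# across the jump at z = 0, where it blows up instead of giving a
# usable gradient.
dz = 1e-6
numerical_grad = (step(z + dz) - step(z - dz)) / (2 * dz)
print(numerical_grad)  # all zeros except a huge spike at z = 0
```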
So, what do we do about it?
Non-Linear Activation Functions!
What is Linearity?
Let’s quickly state what linearity means to build some context. Mathematically, a function is considered linear if it satisfies the following condition:
There is also another condition:
But we will work with the first equation for this demonstration.
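For reference, linearity is usually defined through two properties, additivity and homogeneity, which can be written as:

```latex
f(x + y) = f(x) + f(y)        % additivity
f(\alpha x) = \alpha \, f(x)  % homogeneity, for any scalar \alpha
```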
Take this very simple case: