This algorithm is known as “Gradient Descent” or the “Method of Steepest Descent”: an optimization method that searches for the minimum of a function by taking each step in the direction of the negative gradient. It does not guarantee that the global minimum of the function will be found, only a local minimum.
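In its generic form, each iteration simply moves against the gradient of the function being minimized; writing f for that function, x_k for the current point, and η for the step size (the learning rate), one step is

$$x_{k+1} = x_k - \eta \, \nabla f(x_k).$$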
A discussion of how to find the global minimum could fill another article; here, we have mathematically demonstrated how the gradient can be used to minimize the function.
Now, applying this method to the cost function E, which depends on the n weights w, we have:
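$$\nabla E(W) = \left( \frac{\partial E}{\partial w_1},\; \frac{\partial E}{\partial w_2},\; \dots,\; \frac{\partial E}{\partial w_n} \right),$$

that is, the gradient of E collects the partial derivative of the cost with respect to each of the n weights in the vector W.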
To update all elements of W based on gradient descent, we have:
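$$W \;\leftarrow\; W - \eta \, \nabla E(W),$$

with η again denoting the learning rate, i.e., the size of the step taken in the direction of the negative gradient.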
And for each individual element w_j of the vector W, we have:
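$$w_j \;\leftarrow\; w_j - \eta \, \frac{\partial E}{\partial w_j}, \qquad j = 1, \dots, n.$$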
Therefore, we have our theoretical learning algorithm. Of course, it is not applied to the hypothetical idea of the cook, but rather to many of the machine learning algorithms we know today.
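As a rough illustration of how this update rule looks in practice, here is a minimal sketch in Python; it assumes we already have a function that returns the gradient of the cost with respect to the weights, and the names cost_gradient, learning_rate, and n_steps are illustrative rather than taken from any particular library.

```python
import numpy as np

def gradient_descent(w0, cost_gradient, learning_rate=0.01, n_steps=1000):
    """Minimal gradient descent: repeatedly step against the gradient of the cost E."""
    w = np.asarray(w0, dtype=float)
    for _ in range(n_steps):
        grad = cost_gradient(w)        # gradient of E at the current weights
        w = w - learning_rate * grad   # step in the direction of the negative gradient
    return w

# Toy example: E(W) = ||W - target||^2 has gradient 2 * (W - target),
# so the weights should converge to the target vector.
target = np.array([3.0, -1.0])
w_min = gradient_descent(np.zeros(2), lambda w: 2.0 * (w - target))
print(w_min)  # approximately [ 3. -1.]
```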
With this, we conclude the demonstration and the mathematical proof of the theoretical learning algorithm. The same structure underlies numerous learning methods, such as Stochastic Gradient Descent (SGD), AdaGrad, and Adam.
This method does not guarantee finding the n weight values w for which the cost function is zero, or very close to it. However, it does assure us that a local minimum of the cost function will be found.
To address the issue of local minima, there are more robust methods commonly used in deep learning, such as SGD, whose noisy mini-batch gradient estimates can help the weights escape shallow local minima, and Adam, which adapts the step size for each weight.
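As a sketch of the idea behind SGD: instead of computing the gradient of the full cost E, each update uses the cost measured only on a small random mini-batch B of training examples (written here as E_B), so the step becomes

$$W \;\leftarrow\; W - \eta \, \nabla E_{B}(W).$$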
Nevertheless, understanding the structure and the mathematical proof of the theoretical learning algorithm based on gradient descent makes it much easier to understand these more complex algorithms.