Managing the Technical Debt of Machine Learning Systems | by John Leung | Sep, 2023

Explore practices for sustainably mitigating the cost of speedy delivery, with implementation code

As the machine learning (ML) community has advanced over the years, the resources available for developing ML projects have become plentiful. For example, we can rely on the general-purpose Python package scikit-learn, which is built on NumPy, SciPy, and matplotlib, for data preprocessing and basic predictive tasks. Or we can leverage Hugging Face's open-source collection of pre-trained models to analyze diverse types of data. These tools let today's data scientists tackle standard ML tasks quickly and with little effort while achieving reasonably good model performance.

However, the abundance of ML tools often leads business stakeholders, and even practitioners, to underestimate the effort required to build enterprise-level ML systems. Particularly under tight project deadlines, teams may rush systems into production without sufficient technical consideration. Consequently, the resulting ML system often fails to address the business needs in a technically sustainable and maintainable manner.

As the system evolves and is redeployed over time, technical debt accumulates: the longer the implied cost remains unaddressed, the more expensive it becomes to rectify.

Technical debt in an ML system has multiple sources. Some of them are described below.

#1 Inflexible code design to cater to unforeseen requirements

To validate whether ML can address the enterprise challenge at hand, many ML projects begin with a proof of concept (PoC). We typically start in a Jupyter Notebook or Google Colab environment to explore the data, then develop several ad-hoc functions, which can give stakeholders the illusion that the project is nearly complete. Systems built directly from such a PoC may end up consisting mostly of glue code: supporting code that connects otherwise incompatible components but does not itself perform any data analysis. This code tends to be spaghetti-like, hard to maintain, and prone to errors.
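To make the contrast concrete, here is a minimal Python sketch, not the author's implementation. It assumes a toy dataset of dictionaries with hypothetical `age` and `income` columns; `notebook_preprocess`, `Pipeline`, `drop_missing`, and `scale_feature` are illustrative names invented for this example. The first function mimics PoC glue code, with cleaning and scaling fused together and constants hard-coded. The second version splits the same logic into small steps that share one signature, so an unforeseen requirement becomes a new step rather than an edit to existing code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical record type: column name -> value (None means missing).
Record = Dict[str, Optional[float]]
Step = Callable[[List[Record]], List[Record]]


def notebook_preprocess(rows: List[Record]) -> List[Record]:
    """PoC-style glue code: cleaning and scaling fused, names and constants hard-coded."""
    out = []
    for r in rows:
        if r["age"] is None or r["income"] is None:
            continue
        out.append({"age": r["age"], "income": r["income"] * 0.5})
    return out


def drop_missing(rows: List[Record]) -> List[Record]:
    """Single-purpose step: remove rows that contain any missing value."""
    return [r for r in rows if all(v is not None for v in r.values())]


def scale_feature(name: str, factor: float) -> Step:
    """Return a step that multiplies one column by a constant, without mutating input rows."""
    def step(rows: List[Record]) -> List[Record]:
        return [{**r, name: r[name] * factor} for r in rows]
    return step


@dataclass
class Pipeline:
    """Runs steps in order; each step can be swapped, reordered, or tested on its own."""
    steps: List[Step] = field(default_factory=list)

    def add(self, step: Step) -> "Pipeline":
        self.steps.append(step)
        return self  # allow chaining: Pipeline().add(a).add(b)

    def run(self, rows: List[Record]) -> List[Record]:
        for step in self.steps:
            rows = step(rows)
        return rows


if __name__ == "__main__":
    raw = [{"age": 30.0, "income": 1000.0}, {"age": None, "income": 2000.0}]
    pipeline = Pipeline().add(drop_missing).add(scale_feature("income", 0.5))
    print(pipeline.run(raw))  # → [{'age': 30.0, 'income': 500.0}]
    print(notebook_preprocess(raw) == pipeline.run(raw))  # → True
```

Both versions produce the same output today, but only the second absorbs change gracefully: if a later requirement asks, say, to cap extreme incomes, a new step can be appended with `.add(...)` while `drop_missing` and `scale_feature` stay unchanged and individually testable. In a real project, scikit-learn's `Pipeline` class offers a comparable pattern for chaining preprocessing and model steps.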
