Explore practices for sustainably mitigating the cost of speedy delivery, with implementation code
As the machine learning (ML) community has advanced over the years, resources for developing ML projects have become plentiful. For example, we can rely on the general-purpose Python package scikit-learn, which is built on NumPy, SciPy, and matplotlib, for data preprocessing and basic predictive tasks. Or we can leverage the open-source collection of pre-trained models from Hugging Face to analyze diverse types of datasets. These tools let data scientists tackle standard ML tasks quickly and with little effort while achieving moderately good model performance.
However, the abundance of ML tools often leads business stakeholders, and even practitioners, to underestimate the effort required to build enterprise-level ML systems. Particularly when faced with tight project deadlines, teams may rush systems into production without sufficient technical consideration. Consequently, the ML system often fails to address the business needs in a technically sustainable and maintainable manner.
As the system evolves and is deployed over time, technical debt accumulates: the longer the implied cost remains unaddressed, the more costly it becomes to rectify.
There are multiple sources of technical debt in an ML system. Some are described below.
#1 Code design too inflexible to accommodate unforeseen requirements
To validate whether ML can address the enterprise challenge at hand, many ML projects commence with a proof of concept (PoC). We start with a Jupyter Notebook or Google Colab environment to explore the data, then develop several ad-hoc functions, and in doing so create the illusion for stakeholders that the project is nearing completion. Systems built directly from a PoC may end up consisting mostly of glue code: supporting code that connects otherwise incompatible components but performs no data analysis itself. Such code can be spaghetti-like, hard to maintain, and prone to errors.
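To make the glue-code problem concrete, here is a minimal sketch of one common way to contain it: wrap the awkward component behind a small shared interface so the format conversion lives in a single place. Every name here (`Predictor`, `LegacyScorer`, `LegacyScorerAdapter`, `run_batch`) is hypothetical and invented for illustration, and the scoring formula is a made-up stand-in, not a real model.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol, Sequence


class Predictor(Protocol):
    """Shared interface that the rest of the pipeline depends on (hypothetical)."""

    def predict(self, rows: Sequence[dict]) -> list[float]: ...


class LegacyScorer:
    """Stand-in for a component with an awkward API: CSV string in, string out."""

    def score_csv_line(self, line: str) -> str:
        age, income = line.split(",")
        # Made-up formula, used only so the example produces a number.
        return str(round(0.01 * float(age) + 0.00001 * float(income), 4))


@dataclass
class LegacyScorerAdapter:
    """Holds all format conversion in one place instead of scattering it."""

    scorer: LegacyScorer

    def predict(self, rows: Sequence[dict]) -> list[float]:
        # Convert dict rows to the CSV lines the legacy component expects.
        lines = [f"{row['age']},{row['income']}" for row in rows]
        # Convert the string results back to floats for the pipeline.
        return [float(self.scorer.score_csv_line(line)) for line in lines]


def run_batch(model: Predictor, rows: Sequence[dict]) -> list[float]:
    """Pipeline code knows only the Predictor interface, not the legacy API."""
    return model.predict(rows)


if __name__ == "__main__":
    model = LegacyScorerAdapter(LegacyScorer())
    print(run_batch(model, [{"age": 30, "income": 50000}]))
```

With this layout, swapping the scorer for another model means writing one new adapter, while `run_batch` and everything that calls it stay unchanged. In a notebook-grown system, the same conversion logic tends to be repeated at every call site instead.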