Image by Author
Traditionally, computers followed an explicit set of instructions. For instance, if you wanted a computer to add two numbers, you had to spell out every step. However, as data grew more complex, writing instructions by hand for every situation became inadequate.
This is where Machine Learning emerged as a game changer. We wanted computers to learn from examples, just as we learn from our experiences. Imagine teaching a child to ride a bicycle by demonstrating it a few times and then letting the child fall, figure it out, and learn independently. That’s the idea behind Machine Learning. This innovation has not only transformed industries but has also become indispensable in today’s world.
Now that we have a basic understanding of the term “Machine Learning”, let us familiarize ourselves with some fundamental terms:
Data
Data is the lifeblood of Machine Learning. It refers to the information that a computer uses to learn. This information can be numbers, pictures, or anything else that a computer can process. Data is typically divided into two categories:
- Training Data: This data refers to the examples that we use to teach the computer.
- Testing Data: After learning, we test the performance of the computer using some new, unseen data referred to as the test data.
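As a minimal sketch of how the two categories can be produced from one dataset, the snippet below shuffles a list of examples and holds back a fraction for testing. The function name `split_train_test`, the 20% test fraction, and the ten stand-in data points are assumptions made for illustration.

```python
import random

def split_train_test(examples, test_fraction=0.2, seed=42):
    """Shuffle the examples and split them into training and testing sets."""
    shuffled = list(examples)              # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

examples = list(range(10))                 # stand-in for ten data points
train_data, test_data = split_train_test(examples)
print(len(train_data), len(test_data))     # prints: 8 2
```

Shuffling before splitting helps keep both sets representative when the original data is ordered in some way.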
Labels and Features
Imagine that you are teaching a kid how to differentiate between animals. The names of the animals (dog, cat, etc.) would be the labels, while the characteristics that help you recognize them (number of legs, fur, etc.) are the features.
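In code, the animal example might be laid out as below. This is only an illustrative sketch: the field names `legs` and `has_fur` and the extra "bird" row are assumptions added to make the separation visible.

```python
# Each animal is described by its features; the label is the answer we want to learn.
animals = [
    {"legs": 4, "has_fur": True,  "label": "dog"},
    {"legs": 4, "has_fur": True,  "label": "cat"},
    {"legs": 2, "has_fur": False, "label": "bird"},
]

# Separate the inputs (features) from the expected outputs (labels).
features = [[animal["legs"], animal["has_fur"]] for animal in animals]
labels = [animal["label"] for animal in animals]

print(features)
print(labels)
```

Notice that the dog and the cat share identical features here, which hints at why choosing informative features matters.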
Models
A model is the outcome of the Machine Learning process. It is a mathematical representation of the patterns and relationships within the data. It’s like drawing a map after exploring a new place.
There are four main types of Machine Learning:
Supervised Machine Learning
It is also referred to as guided learning. We provide our Machine Learning algorithm with a labeled dataset in which the correct output is already known. Based on these examples, it learns the hidden patterns in the data and can then predict or classify new data. The common categories within supervised learning are:
- Classification: Sorting things into distinct categories, for example, classifying pictures as cats or dogs, or emails as spam or not spam.
- Regression: Predicting numerical values, for example, the price of a house, your GPA, or the number of sales, based on certain features.
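To make the difference concrete, here is a small sketch of one technique from each category: a 1-nearest-neighbour classifier and a least-squares line fit. Both are chosen only for brevity, and the email and house data are made up for illustration.

```python
def classify_nearest(train, query):
    """Classification: return the label of the closest training example."""
    nearest = min(train, key=lambda pair: abs(pair[0] - query))
    return nearest[1]

def fit_line(xs, ys):
    """Regression: least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x

# Hypothetical emails: (number of suspicious words, label)
emails = [(0, "not spam"), (1, "not spam"), (7, "spam"), (9, "spam")]
predicted_label = classify_nearest(emails, 8)

# Hypothetical houses: size in square metres and price in thousands
sizes = [50, 80, 100, 120]
prices = [150, 240, 300, 360]
slope, intercept = fit_line(sizes, prices)
predicted_price = slope * 90 + intercept

print(predicted_label, round(predicted_price, 2))
```

The classifier returns a category, while the regression returns a number. That is the essential distinction between the two.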
Unsupervised Machine Learning
Here the computer is given unlabelled data without any hints, and it explores the hidden patterns on its own. Imagine you are handed a box of puzzle pieces with no picture on the box, and your task is to group similar pieces to form a complete picture. Clustering is the most common type of unsupervised learning, in which similar data points are grouped together. For example, we can use clustering to group similar social media posts so that users can follow the sub-topics that interest them.
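One well-known clustering technique is k-means. The sketch below is a simplified one-dimensional version: it repeatedly assigns each point to its nearest center and then moves each center to the mean of its group. The post lengths and starting centers are hypothetical values chosen for illustration.

```python
def k_means_1d(points, centers, iterations=10):
    """Group 1-D points around the given starting centers (simplified k-means)."""
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for point in points:
            # Assign the point to the index of its nearest center.
            nearest = min(range(len(centers)), key=lambda i: abs(point - centers[i]))
            clusters[nearest].append(point)
        # Move each center to the mean of its cluster (keep it if the cluster is empty).
        centers = [
            sum(cluster) / len(cluster) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    return centers, clusters

post_lengths = [12, 15, 14, 210, 190, 205]   # hypothetical post lengths in words
centers, clusters = k_means_1d(post_lengths, centers=[0, 100])
print(clusters)
```

No labels were supplied. The short and long posts end up in separate groups purely because of how close their values are.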
Semi-Supervised Machine Learning
Semi-supervised learning uses a mix of labeled and unlabelled data, where the labeled data acts as a guide for identifying patterns in the rest. For example, imagine you give a chef a list of the main ingredients to use but not the complete recipe. Although they don’t have the recipe, they have some hints that help them get started.
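One simple way to picture this is pseudo-labelling: start from a few labeled points and gradually label the unlabelled ones that sit closest to something already labeled. The sketch below is a toy version of that idea and not a standard library routine. The function names and the "small"/"large" data are assumptions made for illustration.

```python
def nearest_labeled(labeled, value):
    """Return the (value, label) pair closest to the given value."""
    return min(labeled, key=lambda pair: abs(pair[0] - value))

def pseudo_label(labeled, unlabeled):
    """Grow the labeled set by labeling the most confident unlabeled point first."""
    labeled = list(labeled)
    remaining = list(unlabeled)
    while remaining:
        # The point closest to an already labeled one is the safest guess.
        value = min(remaining, key=lambda u: abs(u - nearest_labeled(labeled, u)[0]))
        label = nearest_labeled(labeled, value)[1]
        labeled.append((value, label))
        remaining.remove(value)
    return labeled

labeled = [(1.0, "small"), (10.0, "large")]   # the few hints we start with
unlabeled = [2.0, 9.0, 4.0, 7.0]              # data without labels
result = dict(pseudo_label(labeled, unlabeled))
print(result)
```

The two labeled points play the role of the chef's ingredient list: they are not the full answer, but they steer how the remaining data gets organized.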
Reinforcement Learning
Reinforcement learning is also called learning by doing. The learner, often called an agent, interacts with an environment and receives a reward or a penalty for its actions. Over time, it learns to maximize the reward and perform well. Imagine that you are training a puppy: you give positive feedback by rewarding him when he behaves well and negative feedback by withholding rewards when he doesn’t. Over time, the puppy learns which actions lead to rewards and which don’t.
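The puppy example can be sketched with an epsilon-greedy strategy, one of the simplest reinforcement learning approaches: mostly repeat the action that has paid off best so far, and occasionally try a random one. The action names, reward values, and `epsilon` setting below are assumptions made for illustration.

```python
import random

def train_puppy(rewards, steps=200, epsilon=0.1, seed=0):
    """Epsilon-greedy learning: mostly exploit the best-known action, sometimes explore."""
    rng = random.Random(seed)
    values = {action: 0.0 for action in rewards}   # estimated reward per action
    counts = {action: 0 for action in rewards}     # how often each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(list(rewards))     # explore: try a random action
        else:
            action = max(values, key=values.get)   # exploit: best action so far
        reward = rewards[action]                   # feedback from the environment
        counts[action] += 1
        # Update the running average reward for the chosen action.
        values[action] += (reward - values[action]) / counts[action]
    return values

# A treat (reward 1.0) for sitting, nothing (reward 0.0) for chewing a shoe.
values = train_puppy({"sit": 1.0, "chew_shoe": 0.0})
best_action = max(values, key=values.get)
print(best_action)
```

Real environments usually give noisy rewards and involve many states. This sketch keeps a single state and fixed rewards so that the learning loop stays visible.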
Machine Learning, much like the art of cooking, can transform raw, disparate elements into profound insights, just as a skilled chef combines various ingredients to craft a delicious dish. These are the 6 basic steps used to perform a Machine Learning task:
1. Data Collection
Data is an important resource, and its quality matters a lot. More diverse and relevant data generally yields better results. You can think of it as a chef gathering various ingredients from different markets.
2. Data Preprocessing
Most raw data is not in the desired form. Like washing, chopping, and preparing ingredients before cooking, data preprocessing involves cleaning and organizing data for the learning process. Some common issues that you might face are missing data, outliers, and incorrect formats.
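As a rough sketch of what handling these three issues can look like, the snippet below converts text to numbers, fills a missing value with the median, and clips an implausible value to a sensible range. The `ages` column and the 0 to 120 limits are hypothetical, and median filling and clipping are just one reasonable choice among several.

```python
from statistics import median

def clean_column(values, lower, upper):
    """Fill missing entries with the median, then clip values to [lower, upper]."""
    present = [v for v in values if v is not None]
    fill = median(present)                         # the median is robust to outliers
    filled = [fill if v is None else v for v in values]
    return [min(max(v, lower), upper) for v in filled]

# Raw column: numbers stored as text, one missing entry, one implausible value.
ages = ["34", None, "29", "410", "41"]

# Fix the format first: convert the text to numbers, keeping the gap as None.
numeric = [None if age is None else float(age) for age in ages]

cleaned = clean_column(numeric, lower=0, upper=120)
print(cleaned)
```

Whether to fill, clip, or drop problematic rows depends on the dataset, so treat these choices as a starting point and not as a rule.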
3. Choosing an Algorithm
Similar to selecting the recipe for a specific dish, you choose an algorithm based on the problem that you are trying to solve. This choice may also be influenced by the type of data that you have.
4. Training the Model
Think of it as the cooking process, where we wait until the flavors come together. Similarly, we let the model learn from the training data. An important concept here is the learning rate, which determines how big a step your model takes during each iteration of training. If you add too much salt or spice at once, the dish could become overpowering. Conversely, if you add too little, the flavors might not develop fully. A well-chosen learning rate strikes the balance, enhancing the flavor gradually.
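The effect of the learning rate can be seen in a tiny gradient descent sketch. The function being minimized, f(w) = (w - 3)^2, and the three rates tried are assumptions picked only to show the three behaviours: too slow, about right, and too large.

```python
def gradient_descent(learning_rate, steps=20, start=0.0):
    """Minimize f(w) = (w - 3)^2 by repeatedly stepping against the gradient."""
    w = start
    for _ in range(steps):
        gradient = 2 * (w - 3)            # derivative of (w - 3)^2
        w -= learning_rate * gradient     # the learning rate scales each step
    return w

# The best possible answer is w = 3.
for rate in (0.01, 0.1, 1.1):
    print(rate, gradient_descent(rate))
```

With a rate of 0.01 the result is still far from 3 after 20 steps, with 0.1 it gets close, and with 1.1 each step overshoots so badly that the value moves further away every iteration.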
5. Testing & Evaluation
Once the learning process wraps up, we put the model to the test using separate test data that it has not seen before, much like tasting a dish and examining its appearance before sharing it with others. Common evaluation metrics for classification include accuracy, precision, recall, and F1 score; which ones matter most depends on the problem at hand.
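For a two-class problem, these four metrics can be computed directly from the counts of correct and incorrect predictions. The sketch below assumes a made-up spam example with eight emails; the function name and data are illustrative.

```python
def classification_metrics(actual, predicted, positive):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)  # true positives
    fp = sum(1 for a, p in pairs if a != positive and p == positive)  # false positives
    fn = sum(1 for a, p in pairs if a == positive and p != positive)  # false negatives
    accuracy = sum(1 for a, p in pairs if a == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

actual    = ["spam", "spam", "spam", "not spam", "not spam", "not spam", "not spam", "not spam"]
predicted = ["spam", "spam", "not spam", "spam", "not spam", "not spam", "not spam", "not spam"]
metrics = classification_metrics(actual, predicted, positive="spam")
print(metrics)
```

Here accuracy is 0.75, while precision, recall, and F1 are each about 0.67, which shows how accuracy alone can look better than the model's performance on the class you care about.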
6. Tuning and Iteration
Just as a chef adjusts the seasoning or ingredients to perfect a dish, you fine-tune your model by introducing more variables, choosing a different learning algorithm, or adjusting parameters such as the learning rate.
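A basic form of tuning is to train the same model with several candidate settings and keep the one that performs best on held-out data. The sketch below does this for the learning rate of a one-parameter model; the data, the candidate rates, and the use of a separate held-out set are assumptions made for illustration.

```python
def train(xs, ys, learning_rate, steps=50):
    """Fit y = w * x by gradient descent on the mean squared error."""
    w = 0.0
    for _ in range(steps):
        gradient = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= learning_rate * gradient
    return w

def mean_squared_error(w, xs, ys):
    """Average squared difference between predictions w * x and targets y."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

train_x, train_y = [1, 2, 3, 4], [2, 4, 6, 8]   # hypothetical data following y = 2x
held_out_x, held_out_y = [5, 6], [10, 12]       # data kept aside for comparison

results = {}
for rate in (0.001, 0.01, 0.05, 0.2):
    w = train(train_x, train_y, rate)
    results[rate] = mean_squared_error(w, held_out_x, held_out_y)

best_rate = min(results, key=results.get)       # lowest error wins
print(best_rate)
```

The same loop works for any adjustable setting: list the candidates, train once per candidate, and compare the scores on data the model did not train on.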
As we wrap up our exploration of the basics of Machine Learning, remember that it’s all about empowering computers to learn and make decisions with minimal human intervention. Stay curious and keep an eye out for our next articles, where we’ll dive deeper into the various types of machine learning algorithms. Here are some beginner-friendly resources for you to explore further:
Kanwal Mehreen is an aspiring software developer with a keen interest in data science and applications of AI in medicine. Kanwal was selected as the Google Generation Scholar 2022 for the APAC region. Kanwal loves to share technical knowledge by writing articles on trending topics, and is passionate about improving the representation of women in the tech industry.