WDIS AI-ML Series: Module 2 Lesson 1: Objective function - AI is nothing but an optimization problem

Written by

Vinay Roy

Published on

27th Mar 2024

Okay from Module 1, we understand a lot of concepts already. Here is a summary.

So now we can start thinking about the end-to-end process map of implementing a machine learning model. To do that, we first have to understand the most important aspect of a machine learning model - **the objective function**.

Imagine you want to teach algebra to your 10-year-old. So that is your objective - To teach your 10-year-old Algebra. You started teaching. A few months have gone by, now how do you know that the kid is on the right track?

A conventional way is to **measure the progress through a scoring method**. Each problem she solves correctly earns her some points, and each mistake takes away some points. The objective function here then is the **total score she gets on the test**. It helps you know how well she is learning algebra. So, if she gets a high score, it means she solved most of the problems correctly and understood the algebra concepts well. But if she gets a low score, it tells you that she might need some more practice and help with algebra. In short, the objective function, in this case, is a measure of how much she has learned and how well she is doing in algebra!

Like humans, machines need a measure of how well they are doing at the task they are being trained for.

**Defn:** An objective function in the context of machine learning is **a mathematical function that quantifies how well the algorithm is performing at a particular task**. In simple terms, the objective function **acts as a thermometer** to measure how well the algorithm is doing. We will look at many objective functions in the next few modules.

We have all come across some objective functions in our lives, even though we may not have known them by that name or may not remember them. So let us look at two of them.

- Accuracy Score: Suppose you want to develop a machine learning algorithm that takes images of animals as input and categorizes them as Cat or Dog. How would you evaluate whether the algorithm is performing well? You can count how many Cat images the algorithm categorizes as ‘Cat’ and how many Dog images it categorizes as ‘Dog’, and divide that count by the total number of images. This fraction is called the ‘Accuracy Score’. The higher the accuracy score, the better the algorithm is said to be performing. Later we will discuss why Accuracy is not a sufficient measure for such cases, but for now this is a good enough start.
- Mean Square Error (MSE) score: Let us go back to our introductory statistics class. We want to fit a line to some points as shown below. So which of the lines L1, L2, or L3 describes the data set best?
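Before turning to the line-fitting question, the accuracy score from the first bullet can be sketched in a few lines of Python. This is a minimal illustration, not a production metric: the function name `accuracy_score` is our own choice, and the ten labels are made up for the example rather than taken from any real data set.

```python
def accuracy_score(true_labels, predicted_labels):
    """Fraction of examples whose predicted label matches the true label."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("Both label lists must have the same length.")
    if not true_labels:
        raise ValueError("At least one labelled example is required.")
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Hypothetical labels for ten images (invented for illustration).
true_labels = ["cat", "cat", "dog", "dog", "cat", "dog", "cat", "dog", "dog", "cat"]
predicted_labels = ["cat", "dog", "dog", "dog", "cat", "dog", "cat", "cat", "dog", "cat"]

accuracy = accuracy_score(true_labels, predicted_labels)
print(f"Accuracy: {accuracy:.0%}")
```

Here the algorithm gets 8 of the 10 images right, so the accuracy score is 80%.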

You are right. It is the line L2. But how did we know that? Just a visual inspection shows that Line L2 is the best-fitting line. But what does the best fitting line mean? How do we measure it? Let us look at it:

**Defn:** The **best fitting line**, also known as the **regression line** or the **line of best fit**, is a straight line that represents the relationship between two variables in a dataset, say X and Y in the figure above, in such a way that it minimizes the overall distance between the observed data points and the line itself.

Okay, but what does minimizing the overall distance between the observed data points and the line mean? Let us look at the same data set as in the figure above through a different lens.

In the fig above, there is an observed data point P1. The best-fitting line should be as close to this point P1 as possible. So if Line L2 is the best fitting line then the distance between P1 and L2 i.e. d1 as shown above should be as small as possible. So our goal is to minimize d1. But if we observe more closely, Line L1 is closer to point P1 than Line L2 is. So does that make Line L1 a better fitting line?

Yes, it does, but only for point P1. But remember, our goal was not to fit the line to P1 but to fit the line to all the data points or green dots in the figure above. And that is where the trade-off comes in. Said simply, we can move Line L2 close to P1 and thus minimize d1 but not without increasing the distance d2 from point P2.

So how do we position L2 so that d1 and d2, along with the distances to every other point, are collectively as small as possible?

This is where we need our objective function. Our goal thus could be to minimize: |d1| + |d2| + |d3| + … + |dn|

Here |d1| is the mathematical notation for absolute distance, i.e. how far the point is from the line regardless of which side of the line it falls on. It's just the distance between a point and the line, plain and simple. Distance is also called error in statistics books because it shows how much error there is when the observed data point is explained by the best fitting line.

Statisticians found that while the above objective function, Absolute Error, is a reasonable choice, squaring the errors is often more convenient to work with mathematically, which leads to a very widely used objective function for this problem: **MSE**.
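The sum |d1| + |d2| + … + |dn| can be computed directly for each candidate line. The sketch below rests on a few assumptions: the five points and the slopes and intercepts given to L1, L2, and L3 are invented for illustration (they are not read off the figure), the helper name `total_absolute_error` is our own, and "distance" is taken to be the vertical gap between a point and the line, which is the usual convention in regression.

```python
def total_absolute_error(points, slope, intercept):
    """Sum of |d_i|, where d_i is the vertical gap between point i and the line."""
    return sum(abs(y - (slope * x + intercept)) for x, y in points)


# Hypothetical observed data points (x, y).
points = [(1, 2.1), (2, 3.9), (3, 6.5), (4, 7.8), (5, 10.1)]

# Hypothetical candidate lines as (slope, intercept) pairs.
candidate_lines = {
    "L1": (1.0, 3.4),
    "L2": (2.0, 0.0),
    "L3": (3.0, -2.0),
}

totals = {
    name: total_absolute_error(points, slope, intercept)
    for name, (slope, intercept) in candidate_lines.items()
}
best_line = min(totals, key=totals.get)

for name, total in totals.items():
    print(f"{name}: total absolute error = {total:.1f}")
print(f"Best-fitting candidate: {best_line}")
```

With these made-up numbers, L1 passes closer to the third point than L2 does (a gap of 0.1 versus 0.5), yet its total over all points is about 6.0 against roughly 1.0 for L2. That is the trade-off described above: the objective function scores the line against every point at once, not against any single one.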

Let's compare Absolute Error (AE), Least Squares Error (LSE), and Mean Squared Error (MSE), which are commonly used in regression analysis to evaluate the performance of predictive models:

**Absolute Error (AE)**:

- Absolute error measures the absolute difference between the predicted values and the actual values.
- It is calculated as the sum of the absolute differences between each predicted value and its corresponding actual value.
- AE is more robust to outliers than the squared-error measures because it does not square the errors: every unit of error counts the same, whether it is part of a small miss or a large one.

**Least Squares Error (LSE)**:

- LSE, also known as the sum of squared errors or the sum of squared residuals, measures the squared difference between the predicted values and the actual values.
- It is calculated as the sum of the squared differences between each predicted value and its corresponding actual value.
- LSE penalizes large errors more heavily than small errors, which makes it more sensitive to outliers than AE. It is the quantity minimized in ordinary least squares (OLS) regression.

**Mean Squared Error (MSE)**:

- MSE is similar to LSE but divides the sum of squared errors by the number of observations, resulting in the average squared error.
- It is calculated as the mean of the squared differences between each predicted value and its corresponding actual value.
- Because it is an average rather than a total, MSE makes it easier to compare models evaluated on different numbers of observations.

**Salient Differences between AE, LSE, and MSE**:

- AE is the simplest measure and directly represents the total magnitude of the errors. It treats every unit of error equally, so a single large error does not dominate the score.
- LSE squares the errors, so large errors are penalized far more heavily than small ones. This makes it more sensitive to outliers than AE.
- MSE is similar to LSE but normalized by the number of observations, making it easier to interpret and compare across different datasets and models.
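The differences listed above show up quickly on a toy example. The following is a minimal sketch with invented numbers: the function names are our own, and the two sets of predictions differ only in the last value, which turns an error of 1 into an error of 10 to mimic an outlier.

```python
def absolute_error(actual, predicted):
    """AE: sum of absolute differences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted))


def least_squares_error(actual, predicted):
    """LSE: sum of squared differences."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))


def mean_squared_error(actual, predicted):
    """MSE: LSE divided by the number of observations."""
    return least_squares_error(actual, predicted) / len(actual)


# Hypothetical actual values and two sets of predictions.
actual = [10.0, 12.0, 14.0, 16.0, 18.0]
predicted_clean = [11.0, 11.0, 15.0, 15.0, 19.0]    # every error is 1
predicted_outlier = [11.0, 11.0, 15.0, 15.0, 28.0]  # last error is 10

for label, predicted in [("clean", predicted_clean), ("outlier", predicted_outlier)]:
    ae = absolute_error(actual, predicted)
    lse = least_squares_error(actual, predicted)
    mse = mean_squared_error(actual, predicted)
    print(f"{label:>8}: AE = {ae:.1f}, LSE = {lse:.1f}, MSE = {mse:.1f}")
```

With these numbers, the single large miss raises AE from 5 to 14 but raises LSE from 5 to 104 (and MSE from 1 to 20.8). The squared measures react much more strongly to one outlier, which is worth keeping in mind when choosing between them.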

As we navigate the rest of the course, we will realize that creating an objective function is critical to building the right machine-learning model. However, a harsh truth is that many product and data science teams struggle to choose the right objective function, for a few reasons:

**Trade-offs and right priorities**: For one problem statement, there could be many objective functions with a trade-off between them. For example, in the above example of classifying Dog and Cat images, we used the Accuracy score and it served our purpose. But later in the modules, we will see other objective functions for classification models, such as Precision, Recall, and Specificity. Don't worry if you do not know these metrics; we will talk about them in the later modules. For now, it is enough to realize that data scientists do face tough decisions in choosing the right objective function among a myriad of options. In the other example, we also saw that a best-fitting line can have multiple objective functions that we can use: Absolute Error (AE), Least Squares Error (LSE), and Mean Squared Error (MSE). Which one to choose depends on many factors, including familiarity with the data set and the problem statement.

**Connecting the Business Objective and the Machine Learning Objective function**: Data scientists love optimizing machine learning metrics. But the truth is companies don't care about machine learning objectives. They care about business objectives. The only reason they should care about a machine learning objective function is that moving accuracy from, say, 90% to 92% helps achieve the business objective. This is where the biggest disconnect between the data science and business teams is often seen. They are two teams, each with its own objective function, and without a connecting layer that explains how one relates to the other.

**💡 Ideally, the machine learning objective function should have a direct or indirect effect on the business metric, and the business and data science teams should collaborate as equal partners to make sure that happens**

**Unclear Goals**: Sometimes, the goals of the project or analysis may not be well-defined or may be ambiguous. Without clear objectives, it can be difficult to determine which objective function aligns best with the desired outcomes.

**Complexity of the Problem**: In many cases, the problem being addressed by the data scientist is complex and multifaceted. Understanding the nuances of the problem and how different objective functions may impact the results can be challenging.

The objective function is the heart of a machine learning model. Choose the wrong objective function and you will end up wasting a massive amount of engineering effort and causing disappointment about what machine learning can do for the business. However, choosing the right objective function is not just a technical challenge but also a cultural one: it connects business imperatives, data quality, autonomy of the data science team, and close collaboration between teams.

In the next set of lessons, we will talk about the whole end-to-end process map of taking a problem statement from building a business objective function, to choosing the right machine learning objective function, to picking the right machine learning model, to deploying the model into production. It will get exciting, so hang on and keep reading. The best is yet to come.
