ML-Gradient Descent in Machine Learning and AI

Prem Vishnoi(cloudvala)
5 min read · Aug 29, 2024

Gradient Descent is one of the most fundamental optimization techniques used in machine learning. It is an iterative algorithm used to minimize a loss function, which measures how well a model’s predictions align with the actual outcomes.

Let me explain with a simple example.

Imagine we're standing at the top of a hill and want to get to the bottom as quickly as possible. But it's very foggy, so we can't see where we're going. What do we do?

  1. Look Around: We look around to see which direction slopes downhill the most. This is like finding the steepest slope.
  2. Take a Step: We take a small step in that direction. Now we're a little closer to the bottom.
  3. Repeat: We look around again, find the steepest direction, and take another step.
  4. Keep Going: We keep doing this, step by step, always heading downhill, until we reach the bottom of the hill.

This is just like Gradient Descent:

  • Looking Around: Finding the steepest slope is like calculating the gradient. It tells us which way to go to head downhill fastest.
  • Taking a Step: Moving a little bit in that direction is like updating our current guess to a better one.
  • Keep Going: Repeating this process over and over brings us closer to the best solution.

In Math and Computers:

  • When we use Gradient Descent in math or on computers, we’re trying to find the lowest point (the minimum) of a function, which is like the bottom of the hill. We do this to make our predictions as accurate as possible!

Why Do We Need Gradient Descent?

Optimization of Parameters:

  • In machine learning, models are trained by adjusting parameters (weights and biases) to reduce the difference between the predicted and actual outcomes.
  • Gradient Descent helps in finding the optimal parameters that minimize this difference.
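
To make this concrete, here is a small sketch (with a made-up dataset, purely for illustration) showing how the loss depends on a single weight: as the weight moves toward the value that best fits the data, the mean squared error shrinks. Gradient Descent automates this search.

import numpy as np

# A tiny, made-up dataset: inputs x and targets y that roughly follow y = 2x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.1])

def mse_loss(w):
    # Mean squared error of the simple model y_hat = w * x
    predictions = w * x
    return np.mean((predictions - y) ** 2)

# The loss shrinks as w approaches the best-fitting value (around 2)
for w in [0.5, 1.0, 1.5, 2.0]:
    print(f"w = {w:.1f} -> loss = {mse_loss(w):.4f}")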

Scalability:

  • Gradient Descent can handle large datasets and high-dimensional data, making it suitable for training complex models like deep neural networks.

Convergence to Minimum:

  • The goal of Gradient Descent is to reach the global minimum (or a local minimum) of the loss function, where the error between the predicted and actual values is minimized.
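
As a quick illustration (a sketch with a function chosen just for this purpose, not part of the article's example), a non-convex function such as f(x) = x^4 - 3x^2 + x has both a local and a global minimum, and plain Gradient Descent can settle into either one depending on where it starts.

def f(x):
    return x ** 4 - 3 * x ** 2 + x   # has a local and a global minimum

def grad_f(x):
    return 4 * x ** 3 - 6 * x + 1    # derivative of f

def descend(x, learning_rate=0.01, steps=500):
    for _ in range(steps):
        x -= learning_rate * grad_f(x)
    return x

# Two starting points end up in different valleys
print(descend(2.0))    # settles near x ≈ 1.13 (a local minimum)
print(descend(-2.0))   # settles near x ≈ -1.30 (the global minimum)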

How Does Gradient Descent Work?

Gradient Descent works by iteratively adjusting the model parameters in the direction that reduces the loss function most rapidly, based on the gradient (the derivative) of that function.

Steps of Gradient Descent:

Initialize Parameters:

  • Start with random values for the model parameters (weights and biases).

Compute the Gradient:

  • Calculate the gradient of the loss function with respect to each parameter. The gradient represents the slope of the loss function, indicating the direction and rate of the fastest increase.
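
For example, for a simple linear model y_hat = w * x + b with a mean-squared-error loss, the gradients can be written down analytically. The sketch below (made-up data, purely illustrative) computes them and checks one against a finite-difference approximation.

import numpy as np

# Small made-up dataset for the linear model y_hat = w * x + b
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.1, 6.9, 9.2])

def loss(w, b):
    return np.mean((w * x + b - y) ** 2)   # mean squared error

def gradients(w, b):
    error = w * x + b - y
    dw = np.mean(2 * error * x)            # dJ/dw
    db = np.mean(2 * error)                # dJ/db
    return dw, db

# Sanity check against a finite-difference approximation
w, b, eps = 1.0, 0.0, 1e-6
dw, db = gradients(w, b)
dw_numeric = (loss(w + eps, b) - loss(w - eps, b)) / (2 * eps)
print(dw, dw_numeric)   # the two values should be nearly identical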

Update Parameters:

  • Adjust the parameters in the direction opposite to the gradient, scaled by a factor known as the learning rate. This step is repeated iteratively.
  • Update Rule: θ_new = θ_old - α · ∇J(θ), where θ are the parameters, α is the learning rate, and ∇J(θ) is the gradient of the loss function with respect to θ.

Repeat Until Convergence:

  • Continue updating the parameters until the change in the loss function falls below a certain threshold, indicating convergence to a minimum of the loss.
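
Putting the four steps together, here is a minimal end-to-end sketch (plain NumPy, made-up data that roughly follows y = 2x + 1) that fits a line with Gradient Descent and stops once the loss stops improving by more than a small tolerance.

import numpy as np

# Made-up data roughly following y = 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 8.8, 11.1])

w, b = 0.0, 0.0                  # Step 1: initialize parameters
learning_rate = 0.01
tolerance = 1e-8
previous_loss = float("inf")

for step in range(10000):
    error = w * x + b - y
    loss = np.mean(error ** 2)
    dw = np.mean(2 * error * x)  # Step 2: compute the gradients
    db = np.mean(2 * error)
    w -= learning_rate * dw      # Step 3: move against the gradient
    b -= learning_rate * db
    if abs(previous_loss - loss) < tolerance:   # Step 4: stop when the loss stops improving
        break
    previous_loss = loss

print(f"w = {w:.3f}, b = {b:.3f}, loss = {loss:.5f}")   # ends up close to w = 2, b = 1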

Types of Gradient Descent:

Batch Gradient Descent:

  • How it works: Computes the gradient of the loss function over the entire dataset (the full, offline batch of data) before each parameter update; a sketch contrasting all three variants follows below.
  • Pros: Stable updates, smooth convergence.
  • Cons: Can be slow and computationally expensive for large datasets.

Stochastic Gradient Descent (SGD):

  • How it works: Computes the gradient using a single data point at a time.
  • Pros: Faster updates, can escape local minima due to noisy updates.
  • Cons: Updates are noisy, leading to potential oscillations in the loss function.

Mini-batch Gradient Descent:

  • How it works: Computes the gradient using a small subset (mini-batch) of the dataset.
  • Pros: Balances the stability of batch gradient descent and the speed of SGD.
  • Cons: Choosing the mini-batch size can be challenging.
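
To make the difference between the three variants concrete, here is a rough sketch (made-up data, a fixed learning rate, and no refinements such as learning-rate schedules): the only thing that changes between them is how much data each update sees.

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 5, size=100)
y = 2 * x + 1 + rng.normal(scale=0.5, size=100)   # made-up data around y = 2x + 1

def grad(w, b, xb, yb):
    # Gradient of the MSE loss for y_hat = w * x + b on a batch (xb, yb)
    error = w * xb + b - yb
    return np.mean(2 * error * xb), np.mean(2 * error)

def train(batch_size, learning_rate=0.01, epochs=300):
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        indices = rng.permutation(n)              # shuffle the data each epoch
        for start in range(0, n, batch_size):
            batch = indices[start:start + batch_size]
            dw, db = grad(w, b, x[batch], y[batch])
            w -= learning_rate * dw
            b -= learning_rate * db
    return w, b

print("Batch      :", train(batch_size=100))   # whole dataset per update
print("SGD        :", train(batch_size=1))     # one example per update
print("Mini-batch :", train(batch_size=32))    # small subset per update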

Example of Gradient Descent in Python:

Let’s see an example of implementing Batch Gradient Descent to find the minimum of a simple quadratic function.

Python Code Example:

import numpy as np
import matplotlib.pyplot as plt

# Objective function: f(x) = x^2
def objective_function(x):
    return x ** 2

# Gradient of the objective function: f'(x) = 2x
def gradient(x):
    return 2 * x

# Gradient Descent Algorithm
def gradient_descent(starting_point, learning_rate, iterations):
    x = starting_point
    x_history = [x]
    for _ in range(iterations):
        x -= learning_rate * gradient(x)  # Update the parameter x
        x_history.append(x)
    return x, x_history

# Parameters
starting_point = 10 # Starting value for x
learning_rate = 0.1 # Learning rate
iterations = 50 # Number of iterations

# Run Gradient Descent
final_x, x_history = gradient_descent(starting_point, learning_rate, iterations)

# Print the final value of x
print(f"Optimized x: {final_x}")

# Plotting the function and the optimization path
x_values = np.linspace(-10, 10, 400)
y_values = objective_function(x_values)

plt.plot(x_values, y_values, label="f(x) = x^2")
plt.plot(x_history, [objective_function(x) for x in x_history], 'ro-', label="Gradient Descent Path")
plt.title("Gradient Descent Optimization")
plt.xlabel("x")
plt.ylabel("f(x)")
plt.legend()
plt.show()

Explanation of the Code:

Objective Function:

  • f(x) = x^2 is the function we want to minimize. It represents a parabola with its minimum at x = 0.

Gradient Function:

  • f'(x) = 2x is the derivative (gradient) of the objective function. The gradient gives the slope of the function at any point x.

Gradient Descent Algorithm:

  • Initialization: Start from an initial point (here, starting_point = 10).
  • Iteration: For each iteration, adjust x by moving in the opposite direction of the gradient (x -= learning_rate * gradient(x)).
  • Convergence: The iterations bring x closer and closer to the minimum. This example simply runs a fixed number of iterations; in practice, you would stop once the change in the loss falls below a threshold.

Visualization:

  • The plot shows the function and the path taken by Gradient Descent to reach the minimum, demonstrating how the algorithm converges to the optimal value.

When to Use Gradient Descent?

  • Training Machine Learning Models: Gradient Descent is widely used to train linear regression, logistic regression, neural networks, and more (a small logistic regression sketch follows this list).
  • Minimizing Loss Functions: It is applied to minimize any differentiable loss function.
  • Large Datasets and High-Dimensional Data: Suitable for large-scale machine learning problems where the dataset is too big to fit into memory.
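
As a small example of the first point above, the sketch below (made-up data, plain NumPy rather than a library such as scikit-learn) trains a logistic regression classifier by running Gradient Descent on the average cross-entropy loss.

import numpy as np

rng = np.random.default_rng(1)

# Made-up 1-D binary classification data: class 1 tends to have larger x
x = np.concatenate([rng.normal(-1.0, 1.0, 100), rng.normal(1.5, 1.0, 100)])
labels = np.concatenate([np.zeros(100), np.ones(100)])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w, b = 0.0, 0.0
learning_rate = 0.1

for _ in range(2000):
    p = sigmoid(w * x + b)            # predicted probability of class 1
    dw = np.mean((p - labels) * x)    # gradient of the cross-entropy loss w.r.t. w
    db = np.mean(p - labels)          # gradient w.r.t. b
    w -= learning_rate * dw
    b -= learning_rate * db

accuracy = np.mean((sigmoid(w * x + b) > 0.5) == labels)
print(f"w = {w:.2f}, b = {b:.2f}, training accuracy = {accuracy:.2f}")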

Gradient Descent is a powerful tool for optimization in machine learning, and understanding its fundamentals is crucial for building effective models.
