Gradient descent is an iterative first-order optimisation method for finding a function’s local minimum (ideally the global one). I described its basic implementation and behaviour in my other article here. This one focuses on three main variants, which differ in the amount of data the algorithm uses to calculate the gradient and take each step.

These 3 variants are:

- Batch Gradient Descent (BGD)
- Stochastic Gradient Descent (SGD)
- Mini-Batch Gradient Descent (mBGD)

In this article, we will see their performance in a simple linear regression task.
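As a rough sketch of how the three variants differ, the only thing that changes below is how many samples feed each gradient step. The function name `gradient_descent`, the learning rate, the epoch count and the synthetic noise-free data are all assumptions made for illustration, not the setup used in the article.

```python
import numpy as np

def gradient_descent(x, y, batch_size, lr=0.3, epochs=500, seed=0):
    """Fit y ~ a*x + b by minimising MSE; batch_size selects the variant."""
    rng = np.random.default_rng(seed)
    n = len(x)
    a, b = 0.0, 0.0
    for _ in range(epochs):
        order = rng.permutation(n)  # reshuffle the samples every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            err = a * x[idx] + b - y[idx]
            # gradient of the mean squared error over the current batch
            grad_a = 2.0 * np.mean(err * x[idx])
            grad_b = 2.0 * np.mean(err)
            a -= lr * grad_a
            b -= lr * grad_b
    return a, b

# Synthetic, noise-free data from a hypothetical line y = 2x + 1
x = np.linspace(0.0, 1.0, 50)
y = 2.0 * x + 1.0

results = {
    "BGD": gradient_descent(x, y, batch_size=len(x)),  # all samples per step
    "SGD": gradient_descent(x, y, batch_size=1),       # one sample per step
    "mBGD": gradient_descent(x, y, batch_size=10),     # small subset per step
}
for name, (a, b) in results.items():
    print(f"{name}: a={a:.4f}, b={b:.4f}")
```

Because the data here contain no noise, all three variants should end up at essentially the same line. On noisy data, SGD and mBGD with a constant learning rate would typically keep fluctuating around the minimum rather than settle exactly on it.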

A quick recap — a univariate linear function is defined as:
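In a common notation, with $a$ for the slope and $b$ for the intercept (these symbol names are assumed here), such a function can be written as:

$$
f(x) = a \cdot x + b
$$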

**Gradient descent** (GD) is an iterative first-order optimisation algorithm used to find a local minimum/maximum of a given function. This method is commonly used in *machine learning* (ML) and *deep learning* (DL) to minimise a cost/loss function (e.g. in linear regression). Due to its importance and ease of implementation, this algorithm is usually taught at the beginning of almost all machine learning courses.

However, its use is not limited to ML/DL; it is also widely used in areas such as:

- control engineering (robotics, chemical, etc.)
- computer games
- mechanical engineering

That’s why today we will take a deep dive into the…

Have you ever wondered what our life in the future may look like? What will technology bring us? How will it influence our lifestyle? Let us look at one possible scenario.

**Smart Homes**

It’s Saturday morning. Automatic window curtains open when your alarm clock goes off. You open your eyes and your voice assistant, Amazon Alexa, welcomes you and briefly summarizes today’s schedule. Meanwhile, a smart coffee machine starts preparing your favorite café Americano and a toaster heats two slices of bread for your breakfast. When you step into the digital shower, the intelligent system remembers your preferred…

On the job market, every job seeker tries to maximize their chances of a successful outcome (getting a job offer). A popular topic of discussion among job seekers and job advisors is the recommended number of active applications and job interviews per month. Too few means not giving yourself enough opportunities for success, while too many appointments means not being able to prepare for them properly.

So how do you strike a happy medium? This is where a particular statistical approach comes into play: thanks to a binomial distribution and some assumptions about ourselves we can…
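As one illustration of the kind of question a binomial model can answer, the sketch below computes the probability of receiving at least one offer. The helper `prob_at_least`, the figure of 10 applications and the 10% success chance per application are hypothetical, and the model assumes every application is independent with the same success probability.

```python
from math import comb

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical numbers: 10 applications, each with a 10% chance of an offer
p_offer = prob_at_least(1, n=10, p=0.1)
print(f"P(at least one offer) = {p_offer:.3f}")
```

Changing `n` shows how the chance of at least one success grows with the number of applications, under the same assumptions.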

**1. Introduction**

Linear regression is one of the most important and popular predictive techniques in data analysis. It is also one of the oldest: at the beginning of the 19th century, C.F. Gauss famously used it in astronomy to calculate orbits (more).

Its objective is to fit the best line (or plane/hyperplane) to a set of given points (observations) by calculating the regression function parameters that minimize a specific cost function (error), e.g. the mean squared error (MSE).
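A minimal sketch of this idea, assuming a single feature and a handful of made-up points: NumPy's least-squares solver returns the intercept and slope that minimize the squared error, and the helper `mse` (a name chosen here for illustration) evaluates the cost for any parameter pair.

```python
import numpy as np

def mse(params, X, y):
    """Mean squared error of the linear model X @ params against y."""
    return float(np.mean((X @ params - y) ** 2))

# Made-up observations, roughly following a straight line
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 7.1, 8.8])

# Design matrix: a column of ones for the intercept plus the feature column
X = np.column_stack([np.ones_like(x), x])

# Least-squares solution, i.e. the parameters with minimal MSE
params, *_ = np.linalg.lstsq(X, y, rcond=None)
intercept, slope = params
print(f"intercept={intercept:.3f}, slope={slope:.3f}, MSE={mse(params, X, y):.4f}")
```

Nudging either parameter away from the fitted values and calling `mse` again should give a larger error, which is what "best line" means in this setting.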

As a reminder, the linear regression equation in expanded form is given below.
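In one widely used notation (the symbols are assumed here: $\beta_0$ is the intercept, $\beta_1, \dots, \beta_n$ are the coefficients of the features $x_1, \dots, x_n$, and $\varepsilon$ is the error term), the expanded form reads:

$$
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_n x_n + \varepsilon
$$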

An aerospace design engineer and a data science enthusiast. www.linkedin.com/in/robertkwiatkowski01