Linear Algebra - Part 1
Sharath Lingam
March 29, 2026

Intro
Linear algebra concepts are hard to grasp purely in theory; visualising them as transformations of space makes them much easier to understand. This is my breakdown of the key concepts I studied over the first few days.
The above playlist is what I am using to learn these concepts, and I then ask ChatGPT to help me understand how they are used in machine learning.
Learnings
Let’s get into my understanding from day 1 to day 3 of learning linear algebra for machine learning.
Vectors
A vector is an ordered list of numbers that represents a magnitude and a direction in a space or a plane.
Vector representation in machine learning:
- Vectors are used as embeddings in machine learning, i.e. a set of data turned into points in a space
- Example: [height, weight, name] => [0.1, -1.8, …..]
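As a minimal sketch of this idea, here is a data point represented as a NumPy vector. The feature names and values are hypothetical, just to show that a vector is an ordered list of numbers with a magnitude:

```python
import numpy as np

# Hypothetical data point as a vector: [height_feature, weight_feature]
# (values already scaled to small numbers, as is common for embeddings)
person = np.array([0.1, -1.8])

# The magnitude (length) of the vector
magnitude = np.linalg.norm(person)
print(round(magnitude, 4))
```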
Matrix
A matrix represents how the basis vectors are transformed. "Transformed" here means the entire space is stretched, rotated, or skewed.
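A small sketch of this view: the columns of a matrix are simply where the basis vectors land, and multiplying the matrix by a vector applies the transformation to it. The matrix below is a made-up example, a uniform 2x stretch:

```python
import numpy as np

# Columns of A are where the basis vectors land:
# i-cap (1, 0) -> (2, 0) and j-cap (0, 1) -> (0, 2), a uniform 2x stretch
A = np.array([[2, 0],
              [0, 2]])

v = np.array([1, 1])
transformed = A @ v   # matrix-vector product applies the transformation
print(transformed)    # the point (1, 1) is stretched to (2, 2)
```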
Determinants
The determinant measures how much a transformation scales area (or volume in higher dimensions); it tells us how much an area has changed after a transformation is applied.
If the determinant = 0, the transformation collapses space, losing information.
For example:
Suppose i-cap and j-cap start at (1, 0) and (0, 1) respectively. The unit square they span has area 1, so the determinant is 1. Now apply a transformation that scales them to 2·i-cap and 2·j-cap. The square they span becomes 2 × 2, so the determinant is 4.
In other words, when the transformation scaled the space/plane, the new basis vectors produced a determinant equal to the factor by which areas changed.
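The example above can be checked directly with NumPy. The second matrix is my own added example of a determinant-zero transformation, whose columns lie on the same line:

```python
import numpy as np

# Scaling i-cap and j-cap by 2, as in the example above
A = np.array([[2.0, 0.0],
              [0.0, 2.0]])
print(np.linalg.det(A))   # areas grow by a factor of 4

# Columns (1, 2) and (2, 4) lie on one line, so space collapses
B = np.array([[1.0, 2.0],
              [2.0, 4.0]])
print(np.linalg.det(B))   # 0 (up to float rounding): information is lost
```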
Dot product
The dot product measures how strongly two vectors align, combining both their direction and magnitude.
It is computed by multiplying corresponding elements of the two vectors and summing the results.
Geometrically:
- 0 → vectors are perpendicular
- Large positive → vectors are aligned
- Negative → vectors point in opposite directions
In machine learning, the dot product is used to measure similarity between embeddings.
Examples:
1. a = [1, 0], b = [0, 1]
   Dot product = 0
   Geometrically, the vectors are perpendicular to each other, and hence the similarity is 0.
2. a = [1, 2], b = [2, 4]
   Dot product = (1 × 2) + (2 × 4) = 10
   Why is it large? Both vectors point in the same direction, maximising alignment.
My final understanding of the dot product:
Given two vectors of the same dimension, calculating their dot product means pairing up the coordinates, multiplying each pair, and adding the results.
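The two examples above can be reproduced in a few lines of NumPy:

```python
import numpy as np

a = np.array([1, 0])
b = np.array([0, 1])
perp = np.dot(a, b)   # 0 -> perpendicular, no alignment
print(perp)

c = np.array([1, 2])
d = np.array([2, 4])
# Pair up coordinates, multiply, and sum: 1*2 + 2*4 = 10
aligned = np.dot(c, d)
print(aligned)        # large -> strongly aligned
```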
Cosine similarity
In the earlier example, the dot product between two aligned vectors was large because both their direction and magnitude contributed to the result.
This creates a problem: vectors with larger magnitudes produce higher dot products, even if their directional similarity is the same.
To remove the effect of magnitude and focus only on directional alignment, we use cosine similarity.
Cosine similarity normalises both vectors and measures the cosine of the angle between them, giving a value between -1 and 1.
Dot product → “alignment + size”
Cosine similarity → “alignment only”
a = [1, 2]
b = [2, 4] → dot = 10, cosine = 1
c = [10, 20] → dot = 50, cosine = 1
Even though magnitudes changed, cosine similarity stays the same because direction didn’t change.
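Here is a minimal sketch of cosine similarity (my own helper function, not a library API) that reproduces the numbers above:

```python
import numpy as np

def cosine_similarity(u, v):
    # Divide the dot product by both magnitudes to strip out size,
    # leaving only directional alignment in [-1, 1]
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

a = np.array([1, 2])
b = np.array([2, 4])    # same direction as a, twice the size
c = np.array([10, 20])  # same direction as a, ten times the size

print(np.dot(a, b), round(cosine_similarity(a, b), 6))  # dot grows...
print(np.dot(a, c), round(cosine_similarity(a, c), 6))  # ...cosine stays 1
```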
That’s all for this blog. I will share further learnings in my upcoming blogs. Thanks for reading.
PS: If you find any misunderstandings in any of these concepts, it would mean a lot if you let me know here: sharathlingam.s@gmail.com