cosine similarity vs. Euclidean distance

In NLP, we often come across the concept of cosine similarity. Especially when we need to measure the distance between the vectors. I was always wondering why don’t we use Euclidean distance instead. I understand cosine similarity is a 2D measurement, whereas, with Euclidean, you can add up all the dimensions.

d(p, q) = \sqrt{(p_1- q_1)^2 + (p_2 - q_2)^2+\cdots+(p_i - q_i)^2+\cdots+(p_n - q_n)^2}.

So here I find a ‘Grok’ explanation on Quora.

You are a very polite person and you liked my in the comment section you have written “good” 4 times and “helpful” 8 times(just numbers!! :))…something like….” a very good answer which is too much helpful. It will be helpful for good understanding. People who are not that good in maths..Can find the answer helpful…”…and so on….

A friend of you..Who doesn’t talk much..Might write just- “good and helpful..I found it helpful for my studies”

What is the count? “Good”-1, and “helpful”-2

If I try to find the cosine similarities between these comments(or..Documents, as told in a miner’s term :))..It will be exactly 1! (Refer Google to see the formula, it’s ultra easy)

There you go, with cosine similarity, you measure the similarity of the direction instead of magnitude. 

Author: Lucia

Twitter: ML_made_simple, YouTube channel: ML_made_simple

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: