Measuring similarity with Euclidean distance in Ruby

Written by Keith McDonnell. Last updated on Monday, November 02, 2009.

Reviewer  Film          Rating
Ringo     Pulp Fiction  5
Ringo     Blade Runner  2
Ringo     Casablanca    3
Paul      Pulp Fiction  4
Paul      Blade Runner  5
Paul      Casablanca    3
John      Metropolis    3
John      Casablanca    3
George    Pulp Fiction  2
George    Casablanca    5

Comparing films

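To compare two films, treat each film's ratings as a point in multidimensional space, with one dimension per reviewer who has rated both films. For example, Ringo, Paul and George have all rated Pulp Fiction and Casablanca, giving Pulp Fiction P(5,4,2) and Casablanca Q(3,3,5). The Euclidean distance between them is: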

\[
\begin{aligned}
d(P, Q) &= \sqrt{(P_1 - Q_1)^2 + (P_2 - Q_2)^2 + (P_3 - Q_3)^2} \\
&= \sqrt{(5 - 3)^2 + (4 - 3)^2 + (2 - 5)^2} \\
&= \sqrt{2^2 + 1^2 + (-3)^2} \\
&= \sqrt{4 + 1 + 9} = \sqrt{14} \approx 3.742
\end{aligned}
\]
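The same calculation can be sketched in a few lines of Ruby. This is a minimal illustration rather than the code from the repository mentioned at the end of the article; the method name `euclidean_distance` and the variable names are assumptions made for this example, and `Array#sum` requires Ruby 2.4 or later.

```ruby
# Euclidean distance between two points given as equal-length arrays of numbers.
# The method name is hypothetical, chosen for this sketch.
def euclidean_distance(point_a, point_b)
  unless point_a.size == point_b.size
    raise ArgumentError, "points must have the same number of dimensions"
  end

  # Pair up the coordinates, square each difference, and add them together.
  sum_of_squares = point_a.zip(point_b).sum { |a, b| (a - b)**2 }
  Math.sqrt(sum_of_squares)
end

pulp_fiction = [5, 4, 2] # ratings from Ringo, Paul and George
casablanca   = [3, 3, 5] # ratings from the same three reviewers, in the same order

puts euclidean_distance(pulp_fiction, casablanca).round(3) # prints 3.742
```

The order of the ratings matters: position 1 in both arrays must belong to the same reviewer, and so on.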

Comparing reviewers

To compare two reviewers, use only the films they have both rated. For Ringo and George that's Pulp Fiction and Casablanca, giving Ringo P(5,3) and George Q(2,5):

\[
\begin{aligned}
d(P, Q) &= \sqrt{(P_1 - Q_1)^2 + (P_2 - Q_2)^2} \\
&= \sqrt{(5 - 2)^2 + (3 - 5)^2} \\
&= \sqrt{3^2 + (-2)^2} \\
&= \sqrt{9 + 4} = \sqrt{13} \approx 3.606
\end{aligned}
\]
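Restricting the comparison to shared films is easy to express with a hash of hashes and an array intersection. The sketch below is one possible way to do it, assuming the ratings table from the top of the article is stored in a constant called `RATINGS`; that name and the method `reviewer_distance` are made up for this example.

```ruby
# The ratings table from the top of the article: reviewer => { film => rating }.
RATINGS = {
  "Ringo"  => { "Pulp Fiction" => 5, "Blade Runner" => 2, "Casablanca" => 3 },
  "Paul"   => { "Pulp Fiction" => 4, "Blade Runner" => 5, "Casablanca" => 3 },
  "John"   => { "Metropolis" => 3, "Casablanca" => 3 },
  "George" => { "Pulp Fiction" => 2, "Casablanca" => 5 }
}.freeze

# Euclidean distance between two reviewers, using only films both have rated.
# Returns nil when they have no films in common (a choice made for this sketch).
def reviewer_distance(ratings, first, second)
  shared_films = ratings.fetch(first).keys & ratings.fetch(second).keys
  return nil if shared_films.empty?

  sum_of_squares = shared_films.sum do |film|
    (ratings[first][film] - ratings[second][film])**2
  end
  Math.sqrt(sum_of_squares)
end

puts reviewer_distance(RATINGS, "Ringo", "George").round(3) # prints 3.606
```

Blade Runner is ignored here because George has not rated it, which matches the two-dimensional points used in the worked example.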

Therefore, the closer together two films or reviewers are, the more similar they are; the further apart, the less similar. For example, reviewers at a distance of 0.1 would have similar tastes, whereas reviewers at a distance of 10.50 would be dissimilar.
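To see "closer means more similar" in practice, you can sort the other reviewers by their distance from one reviewer. The following is a rough sketch under the same assumptions as the article's data; the names `RATINGS`, `shared_distance` and `rank_reviewers` are hypothetical and chosen only for this illustration.

```ruby
# The ratings table from the top of the article: reviewer => { film => rating }.
RATINGS = {
  "Ringo"  => { "Pulp Fiction" => 5, "Blade Runner" => 2, "Casablanca" => 3 },
  "Paul"   => { "Pulp Fiction" => 4, "Blade Runner" => 5, "Casablanca" => 3 },
  "John"   => { "Metropolis" => 3, "Casablanca" => 3 },
  "George" => { "Pulp Fiction" => 2, "Casablanca" => 5 }
}.freeze

# Euclidean distance over the films both reviewers rated; nil if they share none.
def shared_distance(ratings, first, second)
  shared_films = ratings.fetch(first).keys & ratings.fetch(second).keys
  return nil if shared_films.empty?

  Math.sqrt(shared_films.sum { |film| (ratings[first][film] - ratings[second][film])**2 })
end

# Other reviewers as [name, distance] pairs, smallest distance (most similar) first.
def rank_reviewers(ratings, name)
  ratings.keys
         .reject { |other| other == name }
         .map { |other| [other, shared_distance(ratings, name, other)] }
         .reject { |_, distance| distance.nil? }
         .sort_by { |_, distance| distance }
end

rank_reviewers(RATINGS, "Ringo").each do |other, distance|
  puts format("%-6s %.3f", other, distance)
end
# John   0.000
# Paul   3.162
# George 3.606
```

John comes out at a distance of 0 only because he and Ringo share a single film, Casablanca, and gave it the same rating, so a small distance based on very few shared films should be read with some caution.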

Further reading

You can also use Euclidean distance to:

I’m working through the examples in Programming Collective Intelligence by Toby Segaran. You can find them in my Collective Intelligence GitHub repository.

If you'd like to discuss this article, you can email me at keith@dancingtext.com and/or publish an article online and link back to this page.