Distance Metrics

Description

Different distance metrics suit different types of analysis. Flink ML provides built-in implementations of many standard distance metrics. You can also create custom distance metrics by implementing the DistanceMetric trait.

Built-in Implementations

Flink ML currently supports the following metrics:

Metric: Description
Euclidean Distance: $$d(\x, \y) = \sqrt{\sum_{i=1}^n \left(x_i - y_i \right)^2}$$
Squared Euclidean Distance: $$d(\x, \y) = \sum_{i=1}^n \left(x_i - y_i \right)^2$$
Cosine Similarity: $$d(\x, \y) = 1 - \frac{\x^T \y}{\Vert \x \Vert \Vert \y \Vert}$$
Chebyshev Distance: $$d(\x, \y) = \max_{i}\left(\left\vert x_i - y_i \right\vert \right)$$
Manhattan Distance: $$d(\x, \y) = \sum_{i=1}^n \left\vert x_i - y_i \right\vert$$
Minkowski Distance: $$d(\x, \y) = \left( \sum_{i=1}^{n} \left\vert x_i - y_i \right\vert^p \right)^{1/p}$$
Tanimoto Distance: $$d(\x, \y) = 1 - \frac{\x^T\y}{\Vert \x \Vert^2 + \Vert \y \Vert^2 - \x^T\y}$$ with $\x$ and $\y$ being bit vectors
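
To make the formulas above concrete, here is a minimal standalone sketch that evaluates each of them on plain Array[Double] values. It is an illustration under stated assumptions, not Flink ML's own implementation: the object name DistanceSketch, the Vec alias, and all function names are hypothetical, and the code does not depend on Flink ML.

```scala
// Standalone sketch of the listed formulas on plain arrays.
// All names here are illustrative; this is not Flink ML code.
object DistanceSketch {
  type Vec = Array[Double]

  // Pair up components, insisting that both vectors have the same length.
  private def pairs(x: Vec, y: Vec): Array[(Double, Double)] = {
    require(x.length == y.length, "vectors must have the same length")
    x.zip(y)
  }

  private def dot(x: Vec, y: Vec): Double =
    pairs(x, y).map { case (a, b) => a * b }.sum

  def squaredEuclidean(x: Vec, y: Vec): Double =
    pairs(x, y).map { case (a, b) => (a - b) * (a - b) }.sum

  def euclidean(x: Vec, y: Vec): Double =
    math.sqrt(squaredEuclidean(x, y))

  def manhattan(x: Vec, y: Vec): Double =
    pairs(x, y).map { case (a, b) => math.abs(a - b) }.sum

  // Largest absolute component difference (0.0 for empty vectors).
  def chebyshev(x: Vec, y: Vec): Double =
    pairs(x, y).foldLeft(0.0) { case (m, (a, b)) => math.max(m, math.abs(a - b)) }

  // p = 1 gives the Manhattan distance, p = 2 the Euclidean distance.
  def minkowski(p: Double)(x: Vec, y: Vec): Double =
    math.pow(pairs(x, y).map { case (a, b) => math.pow(math.abs(a - b), p) }.sum, 1.0 / p)

  // One minus the cosine of the angle; yields NaN if either vector is all zeros.
  def cosine(x: Vec, y: Vec): Double =
    1.0 - dot(x, y) / (math.sqrt(dot(x, x)) * math.sqrt(dot(y, y)))

  // Intended for bit vectors encoded as 0.0 / 1.0 entries.
  def tanimoto(x: Vec, y: Vec): Double = {
    val xy = dot(x, y)
    1.0 - xy / (dot(x, x) + dot(y, y) - xy)
  }
}
```

For example, with x = (1, 2, 3) and y = (4, 6, 3) the component differences are 3, 4 and 0, so the Euclidean distance is 5, the Manhattan distance is 7, and the Chebyshev distance is 4.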

Custom Implementation

To define your own distance metric, implement the DistanceMetric trait and override its distance method:

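    import org.apache.flink.ml.math.Vector
    import org.apache.flink.ml.metrics.distances.DistanceMetric

    class MyDistance extends DistanceMetric {
      // Replace ??? with your implementation of the distance metric
      override def distance(a: Vector, b: Vector): Double = ???
    }

    // Companion object, so the metric can be created without `new`
    object MyDistance {
      def apply() = new MyDistance()
    }

    val myMetric = MyDistance()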
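
As a worked illustration of the pattern above, the following sketch fills in a concrete metric, the Canberra distance. To keep it runnable without Flink ML on the classpath, the Vector type and the DistanceMetric trait are local stand-ins defined inside the hypothetical CustomMetricSketch object. They only mirror the shape used on this page and are assumptions, not the library's definitions; in a real project you would import Flink ML's own types instead.

```scala
// Local stand-ins so the sketch compiles without Flink ML.
// With Flink ML available, remove them and import the library's
// Vector and DistanceMetric instead.
object CustomMetricSketch {
  type Vector = IndexedSeq[Double]

  trait DistanceMetric {
    def distance(a: Vector, b: Vector): Double
  }

  // Hypothetical custom metric: Canberra distance,
  // the sum over i of |a_i - b_i| / (|a_i| + |b_i|).
  // Terms where both components are zero are skipped.
  class CanberraDistance extends DistanceMetric {
    override def distance(a: Vector, b: Vector): Double = {
      require(a.size == b.size, "vectors must have the same size")
      (0 until a.size).foldLeft(0.0) { (acc, i) =>
        val denom = math.abs(a(i)) + math.abs(b(i))
        if (denom == 0.0) acc else acc + math.abs(a(i) - b(i)) / denom
      }
    }
  }

  // Companion object, mirroring the MyDistance() factory pattern.
  object CanberraDistance {
    def apply(): CanberraDistance = new CanberraDistance()
  }
}
```

A metric built this way is created like any other, for example with `CustomMetricSketch.CanberraDistance()`, and its distance method is then called on two vectors of the same size.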
