Deeplearning Algorithms tutorial

谷歌的人工智能位于全球前列,在图像识别、语音识别、无人驾驶等技术上都已经落地。而百度实质意义上扛起了国内的人工智能的大旗,覆盖无人驾驶、智能助手、图像识别等许多层面。苹果业已开始全面拥抱机器学习,新产品进军家庭智能音箱并打造工作站级别Mac。另外,腾讯的深度学习平台Mariana已支持了微信语音识别的语音输入法、语音开放平台、长按语音消息转文本等产品,在微信图像识别中开始应用。全球前十大科技公司全部发力人工智能理论研究和应用的实现,虽然入门艰难,但是一旦入门,高手也就在你的不远处! AI的开发离不开算法那我们就接下来开始学习算法吧!

拉普拉斯特征映射(Laplacian Eigenmaps)

拉普拉斯特征映射(Laplacian Eigenmaps,LE),就是将核函数方法应用在局部保持投影的非监督降维方法,也是无监督降维的一种方法,LE方法的直观思想是希望相互间有关系的点(在图中相连的点)在降维后的空间中尽可能的靠近。LE可以反映出数据内在的流形结构。

拉普拉斯特征映射是一种基于图的降维算法,它希望相互间有关系的点(在图中相连的点)在降维后的空间中尽可能的靠近,从而在降维后仍能保持原有的数据结构。 借鉴了图论里面的Laplacian矩阵,把每个样本点看成图中的一个节点,通过样本距离或者邻接矩阵得到W(可以表征样本间的相似关系,用来重构数据流形的局部结构特征),再通过D = diag(sum(W,2)),得到对角的度矩阵(每个元素都是行和或者列和),然后通过L = D - W 得到Laplacian矩阵。

Laplacian Eigenmaps算法的主要思想是,如果两个数据实例i和j很相似,那么i和j在降维后目标子空间中应该尽量接近。设数据实例的数目为n,目标子空间即最终的降维目标的维度为m。定义n×m大小的矩阵Y,其中每一个行向量yTi是数据实例i在目标m维子空间中的向量表示(即降维后的数据实例i)。

Laplacian Eigenmap算法主要步骤:

  1. 建立邻接图:这里采用ϵ−最近邻图和K−最近邻图都能够解决问题。两种方法各有利弊,前者几何解释清楚,并且邻接矩阵天然是对称的,然而参数选择有困难,并且图的连通性未必有好的保证;后者参数选择容易,且图的连接性较好,但是几何解释不清晰。

  2. 赋权:我们常用热核来赋给权重,即对于连通的两个点i,i′,令拉普拉斯特征映射(Laplacian Eigenmaps) - 图1;而对其它点对权值赋0. 最终我们可以用邻接矩阵W来表示这个图.

  3. 计算Laplace算子矩阵:完成建图后,我们定义顶点vi的度是:

    拉普拉斯特征映射(Laplacian Eigenmaps) - 图2

并构造一个度矩阵:

拉普拉斯特征映射(Laplacian Eigenmaps) - 图3

由此即可得到未归一化的图Laplace矩阵:

拉普拉斯特征映射(Laplacian Eigenmaps) - 图4

容易说明, L是对称半正定矩阵,特征值是非负实数,且0是L的特征值,而特征向量是常1向量。值得说明的是我们经常采用的是经过归一化的L矩阵,规范化的方法是:

拉普拉斯特征映射(Laplacian Eigenmaps) - 图5

  1. 计算特征向量:即解以下问题:

拉普拉斯特征映射(Laplacian Eigenmaps) - 图6

得到特征值拉普拉斯特征映射(Laplacian Eigenmaps) - 图7

  1. 特征映射:(先验地)假定低维流形维数d的值,并将数据映射到特征空间上,即

拉普拉斯特征映射(Laplacian Eigenmaps) - 图8

这就是拉普拉斯特征映射(Laplacian Eigenmaps)算法进行数据降维处理的全过程。

应用示例:

  1. from numpy import exp, median
  2. from scipy.sparse.csgraph import laplacian
  3. from sklearn.manifold.locally_linear import (
  4. null_space, LocallyLinearEmbedding)
  5. from sklearn.metrics.pairwise import pairwise_distances, rbf_kernel
  6. from sklearn.neighbors import kneighbors_graph, NearestNeighbors
  7. def ler(X, Y, n_components=2, affinity='nearest_neighbors',
  8. n_neighbors=None, gamma=None, mu=1.0, y_gamma=None,
  9. eigen_solver='auto', tol=1e-6, max_iter=100,
  10. random_state=None):
  11. """
  12. Laplacian Eigenmaps for Regression (LER)
  13. Parameters
  14. ----------
  15. X : ndarray, 2-dimensional
  16. The data matrix, shape (num_points, num_dims)
  17. Y : ndarray, 1 or 2-dimensional
  18. The response matrix, shape (num_points, num_responses).
  19. n_components : int
  20. Number of dimensions for embedding. Default is 2.
  21. affinity : string or callable, default : "nearest_neighbors"
  22. How to construct the affinity matrix.
  23. - 'nearest_neighbors' : construct affinity matrix by knn graph
  24. - 'rbf' : construct affinity matrix by rbf kernel
  25. n_neighbors : int, optional, default=None
  26. Number of neighbors for kNN graph construction on X.
  27. gamma : float, optional, default=None
  28. Scaling factor for RBF kernel on X.
  29. mu : float, optional, default=1.0
  30. Influence of the Y-similarity penalty.
  31. y_gamma : float, optional
  32. Scaling factor for RBF kernel on Y.
  33. Defaults to the inverse of the median distance between rows of Y.
  34. Returns
  35. -------
  36. embedding : ndarray, 2-dimensional
  37. The embedding of X, shape (num_points, n_components)
  38. """
  39. if eigen_solver not in ('auto', 'arpack', 'dense'):
  40. raise ValueError("unrecognized eigen_solver '%s'" % eigen_solver)
  41. nbrs = NearestNeighbors(n_neighbors=n_neighbors + 1)
  42. nbrs.fit(X)
  43. X = nbrs._fit_X
  44. Nx, d_in = X.shape
  45. Ny = Y.shape[0]
  46. if n_components > d_in:
  47. raise ValueError("output dimension must be less than or equal "
  48. "to input dimension")
  49. if Nx != Ny:
  50. raise ValueError("X and Y must have same number of points")
  51. if affinity == 'nearest_neighbors':
  52. if n_neighbors >= Nx:
  53. raise ValueError("n_neighbors must be less than number of points")
  54. if n_neighbors == None or n_neighbors <= 0:
  55. raise ValueError("n_neighbors must be positive")
  56. elif affinity == 'rbf':
  57. if gamma != None and gamma <= 0:
  58. raise ValueError("n_neighbors must be positive")
  59. else:
  60. raise ValueError("affinity must be 'nearest_neighbors' or 'rbf' must be positive")
  61. if Y.ndim == 1:
  62. Y = Y[:, None]
  63. if y_gamma is None:
  64. dists = pairwise_distances(Y)
  65. y_gamma = 1.0 / median(dists)
  66. if affinity == 'nearest_neighbors':
  67. affinity = kneighbors_graph(X, n_neighbors, include_self=True)
  68. else:
  69. if gamma == None:
  70. dists = pairwise_distances(X)
  71. gamma = 1.0 / median(dists)
  72. affinity = kneighbors_graph(X, n_neighbors, mode='distance', include_self=True)
  73. affinity.data = exp(-gamma * affinity.data ** 2)
  74. K = rbf_kernel(Y, gamma=y_gamma)
  75. lap = laplacian(affinity, normed=True)
  76. lapK = laplacian(K, normed=True)
  77. embedding, _ = null_space(lap + mu * lapK, n_components,
  78. k_skip=1, eigen_solver=eigen_solver,
  79. tol=tol, max_iter=max_iter,
  80. random_state=random_state)
  81. return embedding
  82. class LER(LocallyLinearEmbedding):
  83. """Scikit-learn compatible class for LER."""
  84. def __init__(self, n_components=2, affinity='nearest_neighbors',
  85. n_neighbors=2, gamma=None, mu=1.0, y_gamma=None,
  86. eigen_solver='auto', tol=1E-6, max_iter=100,
  87. random_state=None, neighbors_algorithm='auto'):
  88. self.n_components = n_components
  89. self.affinity = affinity
  90. self.n_neighbors = n_neighbors
  91. self.gamma = gamma
  92. self.mu = mu
  93. self.y_gamma = y_gamma
  94. self.eigen_solver = eigen_solver
  95. self.tol = tol
  96. self.max_iter = max_iter
  97. self.random_state = random_state
  98. self.neighbors_algorithm = neighbors_algorithm
  99. def fit_transform(self, X, Y):
  100. self.fit(X, Y)
  101. return self.embedding_
  102. def fit(self, X, Y):
  103. # NN necessary for out-of-sample extensions
  104. self.nbrs_ = NearestNeighbors(self.n_neighbors,
  105. algorithm=self.neighbors_algorithm)
  106. self.nbrs_.fit(X)
  107. self.embedding_ = ler(
  108. X, Y, n_components=self.n_components,
  109. affinity=self.affinity, n_neighbors=self.n_neighbors,
  110. gamma=self.gamma, mu=self.mu, y_gamma=self.y_gamma,
  111. eigen_solver=self.eigen_solver, tol=self.tol,
  112. max_iter=self.max_iter, random_state=self.random_state)
  113. return self