Deeplearning Algorithms tutorial

Google's AI sits at the global forefront, with image recognition, speech recognition, autonomous driving, and other technologies already deployed in production. Baidu has, in practical terms, taken up the banner of AI in China, covering autonomous driving, intelligent assistants, image recognition, and many other areas. Apple has begun to embrace machine learning across the board, moving into smart home speakers and building workstation-class Macs. Meanwhile, Tencent's deep learning platform Mariana already powers WeChat's speech-recognition voice input, its open speech platform, and long-press voice-message-to-text conversion, and is starting to be used for image recognition in WeChat. The world's top ten technology companies are all pushing hard on both AI research and real applications. Getting started is difficult, but once you are in, the experts are not far away! AI development cannot do without algorithms, so let's start learning the algorithms.

Linear Discriminant Analysis (LDA)

Linear Discriminant Analysis (LDA) is a supervised learning algorithm. It is also known as Fisher's Linear Discriminant (FLD), a classic algorithm in pattern recognition, introduced into the pattern recognition and artificial intelligence community by Belhumeur in 1996. The basic idea of this discriminant analysis is to project high-dimensional samples onto an optimal discriminant vector space, so as to extract class information while compressing the dimensionality of the feature space. After projection, the samples are guaranteed to have the largest between-class distance and the smallest within-class distance in the new subspace; in other words, the classes are maximally separable there. LDA is therefore an effective feature-extraction method: the projection maximizes the between-class scatter of the projected samples while simultaneously minimizing their within-class scatter.
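
A standard way to write this criterion (the notation below is introduced here for clarity and does not appear in the original text): with per-class means $\boldsymbol{\mu}_c$, overall mean $\boldsymbol{\mu}$, class sizes $N_c$, within-class scatter $S_w$, and between-class scatter $S_b$,

$$
S_w = \sum_{c=1}^{C} \sum_{\mathbf{x}_i \in c} (\mathbf{x}_i - \boldsymbol{\mu}_c)(\mathbf{x}_i - \boldsymbol{\mu}_c)^{\top},
\qquad
S_b = \sum_{c=1}^{C} N_c\, (\boldsymbol{\mu}_c - \boldsymbol{\mu})(\boldsymbol{\mu}_c - \boldsymbol{\mu})^{\top},
$$

LDA looks for projection directions $\mathbf{w}$ that maximize the Fisher criterion, which reduces to a generalized eigenvalue problem:

$$
J(\mathbf{w}) = \frac{\mathbf{w}^{\top} S_b\, \mathbf{w}}{\mathbf{w}^{\top} S_w\, \mathbf{w}},
\qquad
S_b\, \mathbf{w} = \lambda\, S_w\, \mathbf{w}.
$$

For C classes there are at most C - 1 useful discriminant directions, which is why the example code below sets n_components = len(classes) - 1 and solves exactly this generalized eigenvalue problem with eigh(Sb, Sw).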

Linear Discriminant Analysis is a generalization of Fisher's linear discriminant. Using methods from statistics, pattern recognition, and machine learning, it seeks a linear combination of features that characterizes or separates two or more classes of objects or events. The resulting combination can be used as a linear classifier or, more commonly, as a dimensionality-reduction step before subsequent classification.
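
To make this concrete, here is a minimal sketch of both uses (it assumes scikit-learn, which the tutorial's own example code below does not use): LDA as a linear classifier and as a dimensionality-reduction step on the iris data.

# Minimal illustrative sketch; assumes scikit-learn is installed.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)          # 150 samples, 4 features, 3 classes

lda = LinearDiscriminantAnalysis(n_components=2)
X_proj = lda.fit(X, y).transform(X)        # dimensionality reduction: 4 -> 2 discriminant components
y_pred = lda.predict(X)                    # LDA used directly as a linear classifier

print("projected shape:", X_proj.shape)    # (150, 2)
print("training accuracy: %.3f" % np.mean(y_pred == y))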

Linear Discriminant Analysis is closely related to analysis of variance (ANOVA) and regression analysis, which likewise try to express a dependent variable as a linear combination of features or measurements. However, ANOVA uses categorical independent variables and a continuous dependent variable, whereas discriminant analysis uses continuous independent variables and a categorical dependent variable (the class label). [3] Logistic regression and probit regression are more similar to LDA than ANOVA is, because they also explain a categorical dependent variable using continuous independent variables. LDA's key assumption is that the independent variables are normally distributed; when this assumption cannot be met, those other methods are often preferred in practice.
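
As a small illustration of that relationship (again a sketch assuming scikit-learn, not part of the tutorial's example code): both LDA and logistic regression take continuous inputs and predict a categorical label, so they can be fit on the same data and compared directly.

# Hypothetical comparison sketch: LDA vs. logistic regression on the same data.
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

for name, clf in [("LDA", LinearDiscriminantAnalysis()),
                  ("Logistic regression", LogisticRegression(max_iter=1000))]:
    acc = clf.fit(X, y).score(X, y)   # both explain a categorical y from continuous X
    print("%s training accuracy: %.3f" % (name, acc))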

Linear Discriminant Analysis is also closely related to principal component analysis (PCA) and factor analysis, both of which look for linear combinations of variables that best explain the data. [4] LDA explicitly models the differences between data classes; PCA, by contrast, takes no account of class differences at all, and factor analysis builds feature combinations based on differences rather than similarities. Discriminant analysis further differs from factor analysis in that it is not an interdependence technique: a distinction must be made between independent variables and dependent variables (also called criterion variables).
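
The difference shows up directly in how the two methods are called (sketch under the same scikit-learn assumption as above): PCA picks directions of maximum total variance with no knowledge of the labels, while LDA picks directions that best separate the labeled classes.

# Hypothetical sketch contrasting PCA (unsupervised) with LDA (supervised).
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

X_pca = PCA(n_components=2).fit_transform(X)                            # labels ignored
X_lda = LinearDiscriminantAnalysis(n_components=2).fit_transform(X, y)  # labels used

print("PCA projection shape:", X_pca.shape)
print("LDA projection shape:", X_lda.shape)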

LDA works effectively when the measurements made on the independent variables for each observation are continuous quantities. When dealing with categorical independent variables, the corresponding technique is discriminant correspondence analysis.

In general, LDA ensures that after projection the samples have the smallest within-class distance and the largest between-class distance in the new space, i.e. the best possible separability of the classes in that space.

Application example:

from __future__ import print_function

import os
import time
import pickle
import itertools

import numpy as np
import matplotlib.pyplot as plt

import theano
import lasagne


# init color printer
class BColors:
    """
    Colored command line output formatting
    """
    HEADER = '\033[95m'
    OKBLUE = '\033[94m'
    OKGREEN = '\033[92m'
    WARNING = '\033[93m'
    FAIL = '\033[91m'
    ENDC = '\033[0m'
    BOLD = '\033[1m'
    UNDERLINE = '\033[4m'

    def __init__(self):
        """ Constructor """
        pass

    def print_colored(self, string, color):
        """ Change color of string """
        return color + string + BColors.ENDC


col = BColors()
def threaded_generator(generator, num_cached=10):
    """
    Threaded generator: pre-fetches items from `generator` in a background thread.
    """
    # the Queue module was renamed to queue in Python 3
    try:
        import Queue
    except ImportError:
        import queue as Queue

    queue = Queue.Queue(maxsize=num_cached)
    end_marker = object()

    # define producer
    def producer():
        for item in generator:
            # item = np.array(item)  # if needed, create a copy here
            queue.put(item)
        queue.put(end_marker)

    # start producer
    import threading
    thread = threading.Thread(target=producer)
    thread.daemon = True
    thread.start()

    # run as consumer
    item = queue.get()
    while item is not end_marker:
        yield item
        queue.task_done()
        item = queue.get()


def generator_from_iterator(iterator):
    """
    Compile generator from iterator
    """
    for x in iterator:
        yield x


def threaded_generator_from_iterator(iterator, num_cached=10):
    """
    Compile threaded generator from iterator
    """
    generator = generator_from_iterator(iterator)
    return threaded_generator(generator, num_cached)


def accuracy_score(t, p):
    """
    Compute accuracy
    """
    return float(np.sum(p == t)) / len(p)
class LDA(object):
    """ LDA Class """

    def __init__(self, r=1e-3, n_components=None, verbose=False, show=False):
        """ Constructor """
        self.r = r
        self.n_components = n_components
        self.scalings_ = None
        self.coef_ = None
        self.intercept_ = None
        self.means = None
        self.verbose = verbose
        self.show = show

    def fit(self, X, y, X_te=None):
        """ Compute lda on hidden layer """

        # split into semi- and supervised- data (unlabeled samples have y < 0)
        X_all = X.copy()
        X = X[y >= 0]
        y = y[y >= 0]

        # get class labels
        classes = np.unique(y)

        # set number of components
        if self.n_components is None:
            self.n_components = len(classes) - 1

        # compute means
        means = []
        for group in classes:
            Xg = X[y == group, :]
            means.append(Xg.mean(0))
        self.means = np.asarray(means)

        # compute covs
        covs = []
        for group in classes:
            Xg = X[y == group, :]
            Xg = Xg - np.mean(Xg, axis=0)
            covs.append(np.cov(Xg.T))

        # within scatter
        Sw = np.average(covs, axis=0)

        # total scatter
        X_all = X_all - np.mean(X_all, axis=0)
        if X_te is not None:
            St = np.cov(np.concatenate((X_all, X_te)).T)
        else:
            St = np.cov(X_all.T)

        # between scatter
        Sb = St - Sw

        # cope for numerical instability
        Sw += np.identity(Sw.shape[0]) * self.r

        # compute eigen decomposition (generalized eigenvalue problem Sb w = lambda Sw w)
        from scipy.linalg import eigh
        evals, evecs = eigh(Sb, Sw)

        # sort eigen vectors according to eigen values
        evecs = evecs[:, np.argsort(evals)[::-1]]

        # normalize eigen vectors
        evecs /= np.apply_along_axis(np.linalg.norm, 0, evecs)

        # compute lda data
        self.scalings_ = evecs
        self.coef_ = np.dot(self.means, evecs).dot(evecs.T)
        self.intercept_ = (-0.5 * np.diag(np.dot(self.means, self.coef_.T)))

        if self.verbose:
            top_k_evals = evals[-self.n_components:]
            print("LDA-Eigenvalues (Train):",
                  np.array_str(top_k_evals, precision=2, suppress_small=True))
            print("Ratio min(eigval)/max(eigval): %.3f, Mean(eigvals): %.3f"
                  % (top_k_evals.min() / top_k_evals.max(), top_k_evals.mean()))

        if self.show:
            plt.figure("Eigenvalues")
            ax = plt.subplot(111)
            top_k_evals /= np.sum(top_k_evals)
            plt.plot(range(self.n_components), top_k_evals, 'bo-')
            plt.grid('on')
            plt.xlabel('Eigenvalue', fontsize=20)
            plt.ylabel('Explained Discriminative Variance', fontsize=20)
            plt.ylim([0.0, 1.05 * np.max(top_k_evals)])
            ax.tick_params(axis='x', labelsize=18)
            ax.tick_params(axis='y', labelsize=18)

        return evals

    def transform(self, X):
        """ transform data """
        X_new = np.dot(X, self.scalings_)
        return X_new[:, :self.n_components]

    def predict_proba(self, X):
        """ estimate probability """
        # sigmoid of the linear discriminant scores, normalized over classes
        prob = -(np.dot(X, self.coef_.T) + self.intercept_)
        np.exp(prob, prob)
        prob += 1
        np.reciprocal(prob, prob)
        prob /= prob.sum(axis=1).reshape((prob.shape[0], -1))
        return prob

    def predict_log_proba(self, X):
        """ estimate log probability """
        return np.log(self.predict_proba(X))
def create_iter_functions(l_out, l_in, y_tensor_type, objective, learning_rate, l_2, compute_updates):
    """ Create functions for training, validation and testing to iterate one epoch. """

    # init target tensor
    targets = y_tensor_type('y')

    # compute train costs
    tr_output = lasagne.layers.get_output(l_out, deterministic=False)
    tr_cost = objective(tr_output, targets)

    # compute validation costs
    va_output = lasagne.layers.get_output(l_out, deterministic=True)
    va_cost = objective(va_output, targets)

    # collect all parameters of net and compute updates
    all_params = lasagne.layers.get_all_params(l_out, trainable=True)

    # add weight decay
    if l_2 is not None:
        tr_cost += l_2 * lasagne.regularization.apply_penalty(all_params, lasagne.regularization.l2)

    # compute updates from gradients
    all_grads = lasagne.updates.get_or_compute_grads(tr_cost, all_params)
    updates = compute_updates(all_grads, all_params, learning_rate)

    # compile iter functions
    tr_outputs = [tr_cost]
    iter_train = theano.function([l_in.input_var, targets], tr_outputs, updates=updates)

    va_outputs = [va_cost, va_output]
    iter_valid = theano.function([l_in.input_var, targets], va_outputs)

    # compile output function
    compute_output = theano.function([l_in.input_var], va_output)

    return dict(train=iter_train, valid=iter_valid, test=iter_valid, compute_output=compute_output)
def train(iter_funcs, dataset, train_batch_iter, valid_batch_iter, r):
    """
    Train the model with `dataset` with mini-batch training.
    Each mini-batch has `batch_size` recordings.
    """
    import sys
    import time

    for epoch in itertools.count(1):

        # iterate train batches
        batch_train_losses = []
        iterator = train_batch_iter(dataset['X_train'], dataset['y_train'])
        generator = threaded_generator_from_iterator(iterator)

        start, after = time.time(), time.time()
        for i_batch, (X_b, y_b) in enumerate(generator):
            batch_res = iter_funcs['train'](X_b, y_b)
            batch_train_losses.append(batch_res[0])
            after = time.time()
            train_time = (after - start)

            # report loss during training
            perc = 100 * (float(i_batch) / train_batch_iter.n_batches)
            dec = int(perc // 4)
            progbar = "|" + dec * "#" + (25 - dec) * "-" + "|"
            vals = (perc, progbar, train_time, np.mean(batch_train_losses))
            loss_str = " (%d%%) %s time: %.2fs, loss: %.5f" % vals
            print(col.print_colored(loss_str, col.WARNING), end="\r")
            sys.stdout.flush()

        print("\x1b[K", end="\r")
        avg_train_loss = np.mean(batch_train_losses)

        # lda evaluation (accuracy based)
        # iterate validation batches
        batch_valid_losses = []
        iterator = valid_batch_iter(dataset['X_valid'], dataset['y_valid'])
        generator = threaded_generator_from_iterator(iterator)
        net_output_va, y_va = None, np.zeros(0, dtype=np.int32)
        for X_b, y_b in generator:
            batch_res = iter_funcs['valid'](X_b, y_b)
            batch_valid_losses.append(batch_res[0])
            y_va = np.concatenate((y_va, y_b))
            net_output = iter_funcs['compute_output'](X_b)
            if net_output_va is None:
                net_output_va = net_output
            else:
                net_output_va = np.vstack((net_output_va, net_output))
        avg_valid_loss = np.mean(batch_valid_losses)

        # compute train set net output
        iterator = train_batch_iter(dataset['X_train'], dataset['y_train'])
        generator = threaded_generator_from_iterator(iterator)
        net_output_tr, y_tr = None, np.zeros(0, dtype=np.int32)
        for i_batch, (X_b, y_b) in enumerate(generator):
            y_tr = np.concatenate((y_tr, y_b))
            net_output = iter_funcs['compute_output'](X_b)
            if net_output_tr is None:
                net_output_tr = net_output
            else:
                net_output_tr = np.vstack((net_output_tr, net_output))

        # fit lda on net output
        print("")
        dlda = LDA(r=r, n_components=None, verbose=True)
        evals = dlda.fit(net_output_tr, y_tr)

        # predict on train set
        proba = dlda.predict_proba(net_output_tr[y_tr >= 0])
        y_tr_pr = np.argmax(proba, axis=1)
        tr_acc = 100 * accuracy_score(y_tr[y_tr >= 0], y_tr_pr)

        # predict on validation set
        proba = dlda.predict_proba(net_output_va)
        y_va_pr = np.argmax(proba, axis=1)
        va_acc = 100 * accuracy_score(y_va, y_va_pr)

        # estimate overfitting
        overfit = va_acc / tr_acc

        # collect results
        yield {
            'number': epoch,
            'train_loss': avg_train_loss,
            'train_acc': tr_acc,
            'valid_loss': avg_valid_loss,
            'valid_acc': va_acc,
            'overfitting': overfit,
            'eigenvalues': evals,
        }
def fit(l_out, l_in, data, objective, y_tensor_type,
        train_batch_iter, valid_batch_iter,
        r=1e-3, num_epochs=100, patience=20,
        learn_rate=0.01, update_learning_rate=None,
        l_2=None, compute_updates=None,
        exp_name='ff', out_path=None, dump_file=None):
    """ Train model """

    # log model evolution
    log_file = os.path.join(out_path, 'results.pkl')

    print("\n")
    print(col.print_colored("Running Test Case: " + exp_name, BColors.UNDERLINE))

    # adaptive learning rate
    learning_rate = theano.shared(np.float32(learn_rate))
    if update_learning_rate is None:
        def update_learning_rate(lr, e):
            return lr
    learning_rate.set_value(update_learning_rate(learn_rate, 0))

    # initialize evaluation output
    pred_tr_err, pred_val_err, overfitting = [], [], []
    tr_accs, va_accs = [], []
    eigenvalues = []

    print("Building model and compiling functions...")
    iter_funcs = create_iter_functions(l_out, l_in, y_tensor_type, objective, learning_rate=learning_rate,
                                       l_2=l_2, compute_updates=compute_updates)

    print("Starting training...")
    now = time.time()
    try:
        # initialize early stopping
        last_improvement = 0
        best_model = lasagne.layers.get_all_param_values(l_out)

        # iterate training epochs
        prev_acc_tr, prev_acc_va = 0.0, 0.0
        for epoch in train(iter_funcs, data, train_batch_iter, valid_batch_iter, r):

            print("Epoch {} of {} took {:.3f}s".format(
                epoch['number'], num_epochs, time.time() - now))
            now = time.time()

            # update learning rate
            learn_rate = update_learning_rate(learn_rate, epoch['number'])
            learning_rate.set_value(learn_rate)

            # --- collect train output ---
            tr_loss, va_loss = epoch['train_loss'], epoch['valid_loss']
            train_acc, valid_acc = epoch['train_acc'], epoch['valid_acc']
            overfit = epoch['overfitting']

            # prepare early stopping
            if valid_acc >= prev_acc_va:
                last_improvement = 0
                best_model = lasagne.layers.get_all_param_values(l_out)

                # dump net parameters during training
                if dump_file is not None:
                    with open(dump_file, 'wb') as fp:
                        params = lasagne.layers.get_all_param_values(l_out)
                        pickle.dump(params, fp)

            # increase improvement counter
            last_improvement += 1

            # plot train output
            if train_acc is None:
                txt_tr = 'costs_tr %.5f' % tr_loss
            else:
                txt_tr = 'costs_tr %.5f (%.3f), ' % (tr_loss, train_acc)
                if train_acc >= prev_acc_tr:
                    txt_tr = col.print_colored(txt_tr, BColors.OKGREEN)
                    prev_acc_tr = train_acc

            if valid_acc is None:
                txt_val = ''
            else:
                txt_val = 'costs_val %.5f (%.3f), tr/val %.3f' % (va_loss, valid_acc, overfit)
                if valid_acc >= prev_acc_va:
                    txt_val = col.print_colored(txt_val, BColors.OKGREEN)
                    prev_acc_va = valid_acc

            print(' lr: %.5f' % learn_rate)
            print(' ' + txt_tr + txt_val)

            # collect model evolution data
            tr_accs.append(train_acc)
            va_accs.append(valid_acc)
            pred_tr_err.append(tr_loss)
            pred_val_err.append(va_loss)
            overfitting.append(overfit)
            eigenvalues.append(epoch['eigenvalues'])

            # --- early stopping: preserve best model ---
            if last_improvement > patience:
                print(col.print_colored("Early Stopping!", BColors.WARNING))
                status = "Epoch: %d, Best Validation Accuracy: %.3f" % (epoch['number'], prev_acc_va)
                print(col.print_colored(status, BColors.WARNING))
                break

            # maximum number of epochs reached
            if epoch['number'] >= num_epochs:
                break

            # shuffle train data
            if not hasattr(data['X_train'], 'reset_batch_generator'):
                rand_idx = np.random.permutation(data['X_train'].shape[0])
                data['X_train'] = data['X_train'][rand_idx]
                data['y_train'] = data['y_train'][rand_idx]

            # save results
            exp_res = dict()
            exp_res['pred_tr_err'] = pred_tr_err
            exp_res['tr_accs'] = tr_accs
            exp_res['pred_val_err'] = pred_val_err
            exp_res['va_accs'] = va_accs
            exp_res['overfitting'] = overfitting
            exp_res['eigenvalues'] = eigenvalues

            with open(log_file, 'wb') as fp:
                pickle.dump(exp_res, fp)

    except KeyboardInterrupt:
        pass

    # set net to best weights
    lasagne.layers.set_all_param_values(l_out, best_model)

    # evaluate on test set
    test_losses, test_acc = [], []
    iterator = valid_batch_iter(data['X_test'], data['y_test'])
    for X_b, y_b in iterator:
        loss_te = iter_funcs['test'](X_b, y_b)
        test_losses.append(loss_te[0])
        if len(loss_te) > 1:
            test_acc.append(loss_te[1])

    # compute evaluation measures
    avg_loss_te = np.mean(test_losses)
    avg_acc_te = np.mean(test_acc)

    print("--------------------------------------------")
    print('Loss on Test-Set: %.5f' % avg_loss_te)
    print("--------------------------------------------\n")

    if out_path is not None:
        # add test results and save results
        exp_res['avg_loss_te'] = avg_loss_te
        exp_res['avg_acc_te'] = avg_acc_te
        with open(log_file, 'wb') as fp:
            pickle.dump(exp_res, fp)

    return l_out, prev_acc_va
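
The LDA class above can also be used on its own, without the theano/lasagne training loop. The following toy sketch is illustrative only: the synthetic data and variable names are made up here, and only numpy plus the definitions above are assumed. It fits the class on two Gaussian clusters, projects the data onto the single discriminant direction, and classifies with predict_proba:

# Toy usage sketch for the stand-alone LDA class above (hypothetical example).
rng = np.random.RandomState(23)
X0 = rng.randn(200, 10).astype(np.float32) + 1.5     # class 0: shifted mean
X1 = rng.randn(200, 10).astype(np.float32) - 1.5     # class 1: shifted mean
X_toy = np.vstack((X0, X1))
y_toy = np.concatenate((np.zeros(200), np.ones(200))).astype(np.int32)

toy_lda = LDA(r=1e-3, verbose=True)
toy_lda.fit(X_toy, y_toy)                             # returns the generalized eigenvalues
X_proj = toy_lda.transform(X_toy)                     # (400, 1): two classes -> one discriminant direction
y_pred = np.argmax(toy_lda.predict_proba(X_toy), axis=1)
print("toy accuracy: %.3f" % accuracy_score(y_toy, y_pred))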