Deep Learning Algorithms Tutorial

Google's AI sits at the global forefront, with image recognition, speech recognition, and autonomous driving already deployed. Baidu has effectively taken up the banner of AI in China, covering autonomous driving, intelligent assistants, image recognition, and more. Apple has begun to embrace machine learning across the board, pushing new products into the smart speaker market and building workstation-class Macs. Tencent's deep learning platform Mariana already powers WeChat's speech-recognition voice input, its open speech platform, and long-press voice-to-text, and is starting to be applied to image recognition in WeChat. The world's top ten technology companies are all investing in both AI research and its applications. Getting started is hard, but once you are in, the experts are not far away! AI development cannot do without algorithms, so let's start learning them.

Regression methods are supervised learning algorithms that predict and model numeric, continuous random variables. Their defining trait is a labeled dataset with a numeric target variable, and the goal of regression is to predict that numeric target value.

Commonly used regression methods include:

  • Linear regression: fits the dataset with a hyperplane (see the sketch after this list)
  • Nearest neighbors: predicts a new sample's value by searching for the most similar training samples
  • Decision trees and regression trees: learn hierarchically by splitting the dataset into branches
  • Ensemble methods: combine several weak learners into one strong learner, e.g. random forests (RF) and gradient boosted trees (GBM)
  • Deep learning: learns complex models with multi-layer neural networks
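
As a minimal illustration of the first item (fitting a hyperplane to the data), the sketch below uses scikit-learn's LinearRegression on synthetic data; the data, shapes, and coefficients are made up for illustration and are not part of the original text.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (made up): 100 samples, 3 numeric features, numeric target.
rng = np.random.RandomState(0)
X = rng.randn(100, 3)
y = X @ np.array([1.5, -2.0, 0.7]) + 0.3 + 0.1 * rng.randn(100)

# Fit the hyperplane y ≈ w·x + b and predict a new sample.
reg = LinearRegression().fit(X, y)
print(reg.coef_, reg.intercept_)   # recovered weights and bias
print(reg.predict(X[:1]))          # prediction for the first sample
```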

Logistic Regression

Logistic regression is the counterpart of linear regression, but it is aimed at classification: the model's output is mapped to a 0/1 value. Logistic regression models the probability of a class directly and does not require any prior assumption about the data distribution.

The ideal mapping would be the unit step function (the Heaviside function), but the unit step function is discontinuous and cannot be used in practice. In classification, the log-odds function (the sigmoid function) is therefore used instead:

$$y = \frac{1}{1 + e^{-z}}$$

The model then becomes

$$y = \frac{1}{1 + e^{-(w^{T} x + b)}}$$

If $y$ is interpreted as the probability that sample $x$ is a positive example, the log odds (logit), which reflects the relative likelihood of $x$ being positive, is

$$\ln\frac{y}{1-y} = w^{T} x + b$$
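
To make the mapping concrete, here is a minimal NumPy sketch of the sigmoid model and its logit; the weights `w`, bias `b`, and input `x` are assumed example values, not taken from the text.

```python
import numpy as np

def sigmoid(z):
    # y = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

# Assumed example parameters and one input sample.
w = np.array([0.8, -1.2])
b = 0.5
x = np.array([1.0, 2.0])

z = w @ x + b                  # linear part: w^T x + b
y = sigmoid(z)                 # probability that x is a positive example
logit = np.log(y / (1 - y))    # log odds; equals z up to rounding
print(y, logit, z)
```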

Logistic Regression Algorithm

The parameters $w$ and $b$ can be estimated by maximum likelihood.

Write $p(y=1 \mid x) = \pi(x) = \frac{1}{1 + e^{-(w \cdot x + b)}}$ and $p(y=0 \mid x) = 1 - \pi(x)$. The likelihood function is

$$\prod_{i=1}^{N}\left[\pi(x_i)\right]^{y_i}\left[1-\pi(x_i)\right]^{1-y_i}$$

so the log-likelihood is

$$L(w,b)=\sum_{i=1}^{N}\left[y_i\ln\pi(x_i)+(1-y_i)\ln\bigl(1-\pi(x_i)\bigr)\right]=\sum_{i=1}^{N}\left[y_i\,(w\cdot x_i+b)-\ln\bigl(1+e^{\,w\cdot x_i+b}\bigr)\right]$$

The parameters can then be found by maximizing $L(w,b)$ (equivalently, minimizing $-L$) with gradient descent, Newton's method, or a quasi-Newton method such as BFGS.
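
As a rough sketch of the maximum-likelihood fit, the snippet below minimizes the negative log-likelihood with plain batch gradient descent in NumPy; the toy data, learning rate, and iteration count are assumptions for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data (made up): X is (N, d), y holds 0/1 labels.
rng = np.random.RandomState(0)
X = rng.randn(200, 2)
y = (X[:, 0] - 0.5 * X[:, 1] > 0).astype(float)

w, b, lr = np.zeros(2), 0.0, 0.1

for _ in range(1000):
    p = sigmoid(X @ w + b)              # predicted P(y = 1 | x)
    grad_w = X.T @ (p - y) / len(y)     # gradient of the mean negative log-likelihood
    grad_b = np.mean(p - y)
    w -= lr * grad_w
    b -= lr * grad_b

print(w, b)
```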

Regularization

When a model is too complex, it learns too many features and may end up fitting every single training sample, which is overfitting. Overfitting can be attacked from two directions: reduce the model's complexity, or increase the number of training examples. Regularization is one way of reducing model complexity.

In general, a regularization term $\lambda J(w)$ is added to the objective function (the empirical risk), i.e.

$$\min_{w}\ \frac{1}{N}\sum_{i=1}^{N}L\bigl(y_i, f(x_i; w)\bigr) + \lambda J(w)$$

The regularization term usually uses the L1 norm or the L2 norm of the weights, whose forms are, respectively,

$$J(w) = \|w\|_1 = \sum_{j}|w_j| \qquad\text{or}\qquad J(w) = \tfrac{1}{2}\|w\|_2^{2} = \tfrac{1}{2}\sum_{j} w_j^{2}$$
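
For instance, the penalized objective can be computed as below; `lam` (the regularization strength), the weight values, and the base loss are assumed for illustration.

```python
import numpy as np

def penalized_loss(empirical_risk, w, lam, norm="l2"):
    """Empirical risk plus a regularization term lam * J(w)."""
    if norm == "l1":
        return empirical_risk + lam * np.sum(np.abs(w))      # L1: lam * ||w||_1
    return empirical_risk + 0.5 * lam * np.sum(w ** 2)       # L2: (lam / 2) * ||w||_2^2

# Assumed example values.
w = np.array([0.8, -1.2, 0.0])
print(penalized_loss(0.35, w, lam=0.01, norm="l1"))
print(penalized_loss(0.35, w, lam=0.01, norm="l2"))
```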

Class Imbalance

Class imbalance refers to classification tasks in which the numbers of training examples in different classes differ greatly. The basic idea for dealing with it is rescaling, i.e. letting

$$\frac{y'}{1-y'} = \frac{y}{1-y} \times \frac{m^-}{m^+}$$

where $m^-$ is the number of negative examples and $m^+$ is the number of positive examples.

In practice, however, this is hard to carry out directly. The methods commonly used instead are:

  • Undersampling: remove examples so that the class counts become balanced. Note that randomly discarding examples may throw away important information.
  • Oversampling: add examples so that the class counts become balanced. Note that simply re-sampling the original examples will cause severe overfitting.
  • Learn directly on the original training set, but apply rescaling when the final model makes predictions (also called threshold moving); see the sketch after this list.
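
Below is a minimal sketch of the third option (rescaling at prediction time, i.e. threshold moving); the class counts and the predicted probability are made-up example values.

```python
# Keep the model trained on the original, imbalanced data,
# but correct its output odds by the training-set class ratio.
m_pos = 50      # number of positive training examples (m^+), assumed
m_neg = 950     # number of negative training examples (m^-), assumed

y = 0.20                                  # model's predicted P(positive) for one sample
odds = y / (1 - y)
rescaled_odds = odds * (m_neg / m_pos)    # y'/(1-y') = y/(1-y) * m^-/m^+
predict_positive = rescaled_odds > 1      # same as comparing y against m^+/(m^+ + m^-)
print(rescaled_odds, predict_positive)
```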

Example Code

```python
import os

import tensorflow as tf

# initialize variables/model parameters
W = tf.Variable(tf.zeros([5, 1]), name="weights")
b = tf.Variable(0., name="bias")


def read_csv(batch_size, file_name, record_defaults):
    filename_queue = tf.train.string_input_producer(
        [os.path.dirname(__file__) + "/" + file_name])

    reader = tf.TextLineReader(skip_header_lines=1)
    key, value = reader.read(filename_queue)

    # decode_csv will convert a Tensor from type string (the text line) into
    # a tuple of tensor columns with the specified defaults, which also
    # sets the data type for each column
    decoded = tf.decode_csv(value, record_defaults=record_defaults)

    # batch actually reads the file and loads "batch_size" rows in a single
    # tensor
    return tf.train.shuffle_batch(decoded,
                                  batch_size=batch_size,
                                  capacity=batch_size * 50,
                                  min_after_dequeue=batch_size)


def inference(X):
    # compute inference model over data X and return the result
    return tf.sigmoid(tf.matmul(X, W) + b)


def loss(X, Y):
    # compute loss over training data X and expected outputs Y
    return tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
        tf.matmul(X, W) + b, Y))


def inputs():
    # data is downloaded from https://www.kaggle.com/c/titanic/data.
    passenger_id, survived, pclass, name, sex, age, sibsp, parch, ticket, fare,\
        cabin, embarked = read_csv(100,
                                   "train.csv",
                                   [[0.0], [0.0], [0], [""],
                                    [""], [0.0], [0.0], [0.0],
                                    [""], [0.0], [""], [""]])

    # convert categorical data
    is_first_class = tf.to_float(tf.equal(pclass, [1]))
    is_second_class = tf.to_float(tf.equal(pclass, [2]))
    is_third_class = tf.to_float(tf.equal(pclass, [3]))
    gender = tf.to_float(tf.equal(sex, ["female"]))

    # Finally we pack all the features in a single matrix;
    # we then transpose to have a matrix with one example per row and one
    # feature per column.
    features = tf.transpose(
        tf.pack([is_first_class,
                 is_second_class,
                 is_third_class,
                 gender,
                 age]))
    survived = tf.reshape(survived, [100, 1])

    return features, survived


def train(total_loss):
    # train / adjust model parameters according to computed total loss
    learning_rate = 0.01
    return tf.train.GradientDescentOptimizer(learning_rate).minimize(
        total_loss)


def evaluate(sess, X, Y):
    # evaluate the resulting trained model
    predicted = tf.cast(inference(X) > 0.5, tf.float32)
    print sess.run(tf.reduce_mean(tf.cast(tf.equal(predicted, Y), tf.float32)))


# Create a saver.
# saver = tf.train.Saver()

# Launch the graph in a session, setup boilerplate
with tf.Session() as sess:
    tf.initialize_all_variables().run()

    X, Y = inputs()

    total_loss = loss(X, Y)
    train_op = train(total_loss)

    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess=sess, coord=coord)

    # actual training loop
    training_steps = 1000
    for step in range(training_steps):
        sess.run([train_op])
        # for debugging and learning purposes, see how the loss gets decremented
        # through training steps
        if step % 100 == 0:
            print "loss at step ", step, ":", sess.run([total_loss])

        # save training checkpoints in case we lose them
        # if step % 1000 == 0:
        #     saver.save(sess, 'my-model', global_step=step)

    evaluate(sess, X, Y)

    coord.request_stop()
    coord.join(threads)
    # saver.save(sess, 'my-model', global_step=training_steps)
```
```
loss at step  0 : [1.0275139]
loss at step  100 : [1.389969]
loss at step  200 : [1.4667224]
loss at step  300 : [0.67178178]
loss at step  400 : [0.568793]
loss at step  500 : [0.48835525]
loss at step  600 : [1.0899736]
loss at step  700 : [0.84278578]
loss at step  800 : [1.0500686]
loss at step  900 : [0.89417559]
0.72
```

MNIST Regression Example

```python
import tensorflow as tf
import numpy as np
import input_data

# Import MNIST data
mnist = input_data.read_data_sets("../MNIST_data/", one_hot=True)
# Output:
#   Extracting ../MNIST_data/train-images-idx3-ubyte.gz
#   Extracting ../MNIST_data/train-labels-idx1-ubyte.gz
#   Extracting ../MNIST_data/t10k-images-idx3-ubyte.gz
#   Extracting ../MNIST_data/t10k-labels-idx1-ubyte.gz

# Parameters
learning_rate = 0.01
training_epochs = 25
batch_size = 100
display_step = 1

# tf Graph Input
x = tf.placeholder("float", [None, 784])  # mnist data image of shape 28*28=784
y = tf.placeholder("float", [None, 10])   # 0-9 digits recognition => 10 classes

# Create model
def init_weights(shape):
    return tf.Variable(tf.random_normal(shape, stddev=0.01))


def model(X, w):
    return tf.matmul(X, w)


# like in linear regression, we need a shared variable weight matrix
# for logistic regression
w = init_weights([784, 10])

# Construct model
# compute mean cross entropy (softmax is applied internally)
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(model(x, w), y))
train_op = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)  # construct optimizer
predict_op = tf.argmax(model(x, w), 1)  # at predict time, evaluate the argmax of the logistic regression

# Launch the graph
with tf.Session() as sess:
    tf.initialize_all_variables().run()

    # Training cycle
    for epoch in range(training_epochs):
        avg_cost = 0.
        total_batch = int(mnist.train.num_examples/batch_size)
        # Loop over all batches
        for i in range(total_batch):
            batch_xs, batch_ys = mnist.train.next_batch(batch_size)
            # Fit training using batch data
            sess.run(train_op, feed_dict={x: batch_xs, y: batch_ys})
            # Compute average loss
            avg_cost += sess.run(cost, feed_dict={x: batch_xs, y: batch_ys})/total_batch
        # Display logs per epoch step
        if epoch % display_step == 0:
            print "Epoch:", '%04d' % (epoch+1), "cost=", "{:.9f}".format(avg_cost)

    print "Optimization Finished!"

    # Test model
    correct_prediction = tf.equal(predict_op, tf.argmax(y, 1))
    # Calculate accuracy
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
    print "Accuracy:", accuracy.eval({x: mnist.test.images, y: mnist.test.labels})
```
```
Epoch: 0001 cost= 1.181141054
Epoch: 0002 cost= 0.664358092
Epoch: 0003 cost= 0.553026987
Epoch: 0004 cost= 0.499294951
Epoch: 0005 cost= 0.466518660
Epoch: 0006 cost= 0.443856266
Epoch: 0007 cost= 0.427351894
Epoch: 0008 cost= 0.414347254
Epoch: 0009 cost= 0.403219846
Epoch: 0010 cost= 0.394844531
Epoch: 0011 cost= 0.387121435
Epoch: 0012 cost= 0.380693078
Epoch: 0013 cost= 0.375634897
Epoch: 0014 cost= 0.369904718
Epoch: 0015 cost= 0.365776612
Epoch: 0016 cost= 0.361626607
Epoch: 0017 cost= 0.358361928
Epoch: 0018 cost= 0.354674878
Epoch: 0019 cost= 0.351685582
Epoch: 0020 cost= 0.349124772
Epoch: 0021 cost= 0.346287186
Epoch: 0022 cost= 0.344134942
Epoch: 0023 cost= 0.341778976
Epoch: 0024 cost= 0.340130984
Epoch: 0025 cost= 0.337454195
Optimization Finished!
Accuracy: 0.9122
```

sklearn Example

```python
from sklearn import datasets
from sklearn import metrics
import tensorflow as tf
import tensorflow.contrib.layers.python.layers as layers
import tensorflow.contrib.learn.python.learn as learn

iris = datasets.load_iris()


def my_model(features, labels):
    """DNN with three hidden layers."""
    # Convert the labels to a one-hot tensor of shape (length of features, 3) and
    # with an on-value of 1 for each one-hot vector of length 3.
    labels = tf.one_hot(labels, 3, 1, 0)

    # Create three fully connected layers respectively of size 10, 20, and 10.
    features = layers.stack(features, layers.fully_connected, [10, 20, 10])

    # Create two tensors respectively for prediction and loss.
    prediction, loss = (
        tf.contrib.learn.models.logistic_regression(features, labels)
    )

    # Create a tensor for training op.
    train_op = tf.contrib.layers.optimize_loss(
        loss, tf.contrib.framework.get_global_step(), optimizer='Adagrad',
        learning_rate=0.1)

    return {'class': tf.argmax(prediction, 1), 'prob': prediction}, loss, train_op


classifier = learn.Estimator(model_fn=my_model)
classifier.fit(iris.data, iris.target, steps=1000)

y_predicted = [
    p['class'] for p in classifier.predict(iris.data, as_iterable=True)]
score = metrics.accuracy_score(iris.target, y_predicted)
print('Accuracy: {0:f}'.format(score))
```