2.4.2 Profiling Python code

No optimization without measuring!

  • Measure: profiling, timing
  • You will probably be surprised: the fastest code is not always what you expect

2.4.2.1 Timeit

In IPython, use timeit (http://docs.python.org/library/timeit.html) to time elementary operations:

In [2]:

    import numpy as np
    a = np.arange(1000)
    %timeit a ** 2

    The slowest run took 60.37 times longer than the fastest. This could mean that an intermediate result is being cached
    100000 loops, best of 3: 1.99 µs per loop

In [3]:

    %timeit a ** 2.1

    10000 loops, best of 3: 45.1 µs per loop

In [4]:

    %timeit a * a

    The slowest run took 12.79 times longer than the fastest. This could mean that an intermediate result is being cached
    100000 loops, best of 3: 1.86 µs per loop

Use this information to guide your choice between implementation strategies.

Note: for long-running calls, use %time instead of %timeit; it is less precise but faster.
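
Outside IPython, the same kind of measurement can be made with the standard-library timeit module. A minimal sketch (the statement and loop count below are illustrative, not part of the demo):

    import timeit

    # Time a ** 2 outside of IPython with the standard-library timeit module.
    # The setup statement runs once; the timed statement runs `number` times.
    total = timeit.timeit("a ** 2",
                          setup="import numpy as np; a = np.arange(1000)",
                          number=100000)
    print("%.2f µs per loop" % (total / 100000 * 1e6))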

2.4.2.2 Profiler

The profiler is useful when you have a large program to profile, for example the following:

In [ ]:

    # For this example to run, you also need the 'ica.py' file
    import numpy as np
    from scipy import linalg
    from ica import fastica

    def test():
        data = np.random.random((5000, 100))
        u, s, v = linalg.svd(data)
        pca = np.dot(u[:, :10].T, data)
        results = fastica(pca.T, whiten=False)

    if __name__ == '__main__':
        test()

Note: this is a combination of two unsupervised learning techniques, principal component analysis (PCA) and independent component analysis (ICA). PCA is a technique for dimensionality reduction, i.e. an algorithm that explains the observed variance in your data with fewer dimensions. ICA is a source-separation technique, used for example to unmix multiple signals that have been recorded through multiple sensors. Doing a PCA first and then an ICA can be useful if there are more sensors than signals. For more information see the FastICA example from scikits-learn.
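
As a rough illustration of that PCA-then-ICA pipeline, here is a sketch using scikit-learn's estimators rather than the ica.py module used by the demo (an assumption made purely for illustration):

    import numpy as np
    from sklearn.decomposition import PCA, FastICA

    data = np.random.random((5000, 100))    # 5000 observations from 100 "sensors"

    # Dimensionality reduction: keep the 10 dominant directions of variance.
    pca = PCA(n_components=10)
    reduced = pca.fit_transform(data)

    # Source separation on the reduced data.
    ica = FastICA(n_components=10)
    sources = ica.fit_transform(reduced)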

To run it, you also need to download the ica module. In IPython we can time the script:

In [8]:

    %run -t demo.py

    IPython CPU timings (estimated):
      User   : 6.62 s.
      System : 0.17 s.
    Wall time: 3.72 s.

    /Users/cloga/Documents/scipy-lecture-notes_cn/ica.py:65: RuntimeWarning: invalid value encountered in sqrt
      W = (u * np.diag(1.0/np.sqrt(s)) * u.T) * W # W = (W * W.T) ^{-1/2} * W
    /Users/cloga/Documents/scipy-lecture-notes_cn/ica.py:90: RuntimeWarning: invalid value encountered in absolute
      lim = max(abs(abs(np.diag(np.dot(W1, W.T))) - 1))

and profile it:

    %run -p demo.py

         301 function calls in 3.746 seconds

       Ordered by: internal time

       ncalls  tottime  percall  cumtime  percall filename:lineno(function)
            1    3.714    3.714    3.715    3.715 decomp_svd.py:15(svd)
            1    0.019    0.019    3.745    3.745 demo.py:3(<module>)
            1    0.007    0.007    0.007    0.007 {method 'random_sample' of 'mtrand.RandomState' objects}
           14    0.003    0.000    0.003    0.000 {numpy.core._dotblas.dot}
            1    0.001    0.001    0.001    0.001 function_base.py:550(asarray_chkfinite)
            2    0.000    0.000    0.000    0.000 linalg.py:1116(eigh)
            1    0.000    0.000    3.745    3.745 {execfile}
            2    0.000    0.000    0.001    0.000 ica.py:58(_sym_decorrelation)
            2    0.000    0.000    0.000    0.000 {method 'reduce' of 'numpy.ufunc' objects}
            1    0.000    0.000    0.000    0.000 ica.py:195(gprime)
            1    0.000    0.000    0.001    0.001 ica.py:69(_ica_par)
            1    0.000    0.000    3.726    3.726 demo.py:9(test)
            1    0.000    0.000    0.001    0.001 ica.py:97(fastica)
            1    0.000    0.000    0.000    0.000 ica.py:192(g)
           23    0.000    0.000    0.000    0.000 defmatrix.py:290(__array_finalize__)
            4    0.000    0.000    0.000    0.000 twodim_base.py:242(diag)
            1    0.000    0.000    3.746    3.746 interactiveshell.py:2616(safe_execfile)
           10    0.000    0.000    0.000    0.000 {numpy.core.multiarray.array}
            1    0.000    0.000    3.745    3.745 py3compat.py:279(execfile)
            1    0.000    0.000    0.000    0.000 {method 'normal' of 'mtrand.RandomState' objects}
           50    0.000    0.000    0.000    0.000 {isinstance}
           10    0.000    0.000    0.000    0.000 defmatrix.py:66(asmatrix)
           10    0.000    0.000    0.000    0.000 defmatrix.py:244(__new__)
            9    0.000    0.000    0.000    0.000 numeric.py:394(asarray)
            1    0.000    0.000    0.000    0.000 _methods.py:53(_mean)
            1    0.000    0.000    0.000    0.000 {posix.getcwdu}
            4    0.000    0.000    0.000    0.000 {method 'astype' of 'numpy.ndarray' objects}
            6    0.000    0.000    0.000    0.000 defmatrix.py:338(__mul__)
            2    0.000    0.000    0.000    0.000 linalg.py:139(_commonType)
            4    0.000    0.000    0.000    0.000 {method 'view' of 'numpy.ndarray' objects}
            1    0.000    0.000    0.000    0.000 posixpath.py:329(normpath)
            5    0.000    0.000    0.000    0.000 {abs}
            1    0.000    0.000    0.000    0.000 {open}
            1    0.000    0.000    0.000    0.000 blas.py:172(find_best_blas_type)
            1    0.000    0.000    0.000    0.000 blas.py:216(_get_funcs)
            1    0.000    0.000    0.000    0.000 syspathcontext.py:64(__exit__)
            3    0.000    0.000    0.000    0.000 {max}
            6    0.000    0.000    0.000    0.000 {method 'transpose' of 'numpy.ndarray' objects}
            1    0.000    0.000    0.000    0.000 posixpath.py:120(dirname)
            2    0.000    0.000    0.000    0.000 linalg.py:101(get_linalg_error_extobj)
            2    0.000    0.000    0.000    0.000 linalg.py:106(_makearray)
            3    0.000    0.000    0.000    0.000 {numpy.core.multiarray.zeros}
            6    0.000    0.000    0.000    0.000 defmatrix.py:928(getT)
            1    0.000    0.000    0.000    0.000 syspathcontext.py:57(__enter__)
            2    0.000    0.000    0.000    0.000 linalg.py:209(_assertNdSquareness)
            7    0.000    0.000    0.000    0.000 {issubclass}
            4    0.000    0.000    0.000    0.000 {getattr}
            1    0.000    0.000    0.000    0.000 posixpath.py:358(abspath)
            5    0.000    0.000    0.000    0.000 {method 'startswith' of 'unicode' objects}
            2    0.000    0.000    0.000    0.000 linalg.py:198(_assertRankAtLeast2)
            2    0.000    0.000    0.000    0.000 {method 'encode' of 'unicode' objects}
           10    0.000    0.000    0.000    0.000 {method 'get' of 'dict' objects}
            1    0.000    0.000    0.000    0.000 _methods.py:43(_count_reduce_items)
            1    0.000    0.000    0.000    0.000 {method 'all' of 'numpy.ndarray' objects}
            4    0.000    0.000    0.000    0.000 linalg.py:124(_realType)
            1    0.000    0.000    0.000    0.000 syspathcontext.py:54(__init__)
            1    0.000    0.000    0.000    0.000 posixpath.py:61(join)
            1    0.000    0.000    3.746    3.746 <string>:1(<module>)
            1    0.000    0.000    0.000    0.000 _methods.py:40(_all)
            4    0.000    0.000    0.000    0.000 linalg.py:111(isComplexType)
            2    0.000    0.000    0.000    0.000 {method '__array_prepare__' of 'numpy.ndarray' objects}
            4    0.000    0.000    0.000    0.000 {min}
            1    0.000    0.000    0.000    0.000 py3compat.py:19(encode)
            1    0.000    0.000    0.000    0.000 defmatrix.py:872(getA)
            2    0.000    0.000    0.000    0.000 numerictypes.py:949(_can_coerce_all)
            6    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
            1    0.000    0.000    0.000    0.000 numerictypes.py:970(find_common_type)
            1    0.000    0.000    0.000    0.000 {method 'mean' of 'numpy.ndarray' objects}
           11    0.000    0.000    0.000    0.000 {len}
            1    0.000    0.000    0.000    0.000 numeric.py:464(asanyarray)
            1    0.000    0.000    0.000    0.000 {method '__array__' of 'numpy.ndarray' objects}
            1    0.000    0.000    0.000    0.000 {method 'rfind' of 'unicode' objects}
            2    0.000    0.000    0.000    0.000 {method 'upper' of 'str' objects}
            1    0.000    0.000    0.000    0.000 posixpath.py:251(expanduser)
            3    0.000    0.000    0.000    0.000 {method 'setdefault' of 'dict' objects}
            1    0.000    0.000    0.000    0.000 {method 'diagonal' of 'numpy.ndarray' objects}
            1    0.000    0.000    0.000    0.000 lapack.py:239(get_lapack_funcs)
            1    0.000    0.000    0.000    0.000 {method 'rstrip' of 'unicode' objects}
            1    0.000    0.000    0.000    0.000 py3compat.py:29(cast_bytes)
            1    0.000    0.000    0.000    0.000 posixpath.py:52(isabs)
            1    0.000    0.000    0.000    0.000 {method 'split' of 'unicode' objects}
            1    0.000    0.000    0.000    0.000 {method 'endswith' of 'unicode' objects}
            1    0.000    0.000    0.000    0.000 {sys.getdefaultencoding}
            1    0.000    0.000    0.000    0.000 {method 'insert' of 'list' objects}
            1    0.000    0.000    0.000    0.000 {method 'remove' of 'list' objects}
            1    0.000    0.000    0.000    0.000 {method 'join' of 'unicode' objects}
            1    0.000    0.000    0.000    0.000 {method 'index' of 'list' objects}
            1    0.000    0.000    0.000    0.000 misc.py:126(_datacopied)
            1    0.000    0.000    0.000    0.000 {sys.getfilesystemencoding}
            1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

Clearly the svd (in decomp_svd.py) is what takes most of the time; in other words, it is the bottleneck. We have to find a way to make this step go faster, or to avoid it altogether (algorithmic optimization). Spending time on the rest of the code is useless.
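
One candidate algorithmic optimization, sketched below on random data rather than taken from the lecture itself: since only the first 10 columns of u are used, the economy-size ("thin") SVD, which scipy.linalg.svd computes with full_matrices=False, should provide the same leading singular vectors at a much lower cost:

    import numpy as np
    from scipy import linalg

    data = np.random.random((5000, 100))

    # Full SVD: u has shape (5000, 5000), although only u[:, :10] is needed.
    u, s, v = linalg.svd(data)

    # Thin SVD: u has shape (5000, 100); usually much faster for tall matrices.
    u_thin, s_thin, v_thin = linalg.svd(data, full_matrices=False)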

2.4.2.3 Line-profiler

The profiler is great: it tells us which function takes most of the time, but not where that function is called.

For this we use line_profiler: in the source file, we decorate the functions that we want to inspect with @profile (no need to import it):

In [ ]:

    @profile
    def test():
        data = np.random.random((5000, 100))
        u, s, v = linalg.svd(data)
        pca = np.dot(u[:, :10], data)
        results = fastica(pca.T, whiten=False)

Then we run the script with kernprof.py, using the switches -l, --line-by-line and -v, --view to enable the line-by-line profiler and to view the results as well as save them:

    kernprof.py -l -v demo.py

    Wrote profile results to demo.py.lprof
    Timer unit: 1e-06 s

    File: demo.py
    Function: test at line 5
    Total time: 14.2793 s

    Line #      Hits         Time  Per Hit   % Time  Line Contents
    ==============================================================
         5                                           @profile
         6                                           def test():
         7         1        19015  19015.0      0.1      data = np.random.random((5000, 100))
         8         1     14242163 14242163.0     99.7      u, s, v = linalg.svd(data)
         9         1        10282  10282.0      0.1      pca = np.dot(u[:10, :], data)
        10         1         7799   7799.0      0.1      results = fastica(pca.T, whiten=False)

The SVD is taking almost all the time; this is the line we need to optimize.
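
Alternatively, line_profiler ships an IPython extension, so (assuming it is installed, and using the undecorated demo.py from the previous section, since @profile is only defined when running under kernprof) the same line-by-line timings can be obtained interactively:

    %load_ext line_profiler
    %run demo.py            # defines test() in the interactive namespace (and runs it once)
    %lprun -f test test()   # line-by-line timings for test(), no @profile decorator needed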

2.4.2.4 Running cProfile

In the IPython examples above, IPython simply calls the built-in Python profilers cProfile and profile. Running them yourself can be useful if you want to process the profiler output with a visualization tool.

    python -m cProfile -o demo.prof demo.py

The -o switch writes the profiler output to the file demo.prof.
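
The saved file can also be inspected directly from Python with the standard-library pstats module; a minimal sketch:

    import pstats

    # Load the results written by `python -m cProfile -o demo.prof demo.py`,
    # sort by cumulative time and print the 10 most expensive entries.
    stats = pstats.Stats("demo.prof")
    stats.sort_stats("cumulative").print_stats(10)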

2.4.2.5 Using gprof2dot

If you want a more visual representation of the profiler output, you can use the gprof2dot tool:

In [ ]:

    gprof2dot -f pstats demo.prof | dot -Tpng -o demo-prof.png

This generates the following image:

[Figure: demo-prof.png, the call graph produced by gprof2dot]

This approach gives a picture similar to the output of the previous method.