Minimizing functions¶

The minimize function¶

In [1]:

    %pylab inline
    set_printoptions(precision=3, suppress=True)

    Populating the interactive namespace from numpy and matplotlib

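The horizontal range of a projectile launched at an angle is given by:

$d = 2 \frac{v_0^2}{g} \sin(\theta) \cos (\theta)$

  • $d$: horizontal distance traveled
  • $v_0$: magnitude of the initial velocity
  • $g$: gravitational acceleration
  • $\theta$: launch angle

We want to find the angle $\theta$ that maximizes $d$.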
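As a quick analytic cross-check (a worked derivation added for reference, assuming the launch angle is restricted to between 0 and 90 degrees), the double-angle identity gives the maximizing angle in closed form:

```latex
\begin{aligned}
d(\theta) &= 2\,\frac{v_0^2}{g}\,\sin\theta\,\cos\theta = \frac{v_0^2}{g}\,\sin(2\theta), \\
\frac{\mathrm{d}d}{\mathrm{d}\theta} &= 2\,\frac{v_0^2}{g}\,\cos(2\theta) = 0
\;\Longrightarrow\; \theta = \frac{\pi}{4} = 45^{\circ}, \\
d_{\max} &= \frac{v_0^2}{g} \approx 0.102\ \text{m}
\quad \text{for } v_0 = 1\ \text{m/s},\ g = 9.8\ \text{m/s}^2 .
\end{aligned}
```

A numerical optimizer applied to the same formula should land on roughly this angle and distance.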

Define the distance function:

In [2]:

    def dist(theta, v0):
        """calculate the distance travelled by a projectile launched
        at theta degrees with v0 (m/s) initial velocity.
        """
        g = 9.8  # gravitational acceleration (m/s^2)
        theta_rad = pi * theta / 180  # degrees to radians
        return 2 * v0 ** 2 / g * sin(theta_rad) * cos(theta_rad)

    theta = linspace(0,90,90)
    p = plot(theta, dist(theta, 1.))
    xl = xlabel(r'launch angle $\theta (^{\circ})$')
    yl = ylabel('horizontal distance traveled')

04.05 Minimizing functions - Figure 1

SciPy only provides minimizers, so maximizing the distance is done by minimizing its negative:

In [3]:

    def neg_dist(theta, v0):
        return -1 * dist(theta, v0)

Import scipy.optimize.minimize and run it:

In [4]:

    from scipy.optimize import minimize
    result = minimize(neg_dist, 40, args=(1,))
    print("optimal angle = {:.1f} degrees".format(result.x[0]))

    optimal angle = 45.0 degrees

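Here minimize is called with three arguments: the function to optimize, the initial guess, and a tuple of extra arguments for that function. minimize always treats the function's first parameter as the optimization variable, so the values in args are passed on starting from the function's second parameter.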
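The way args is forwarded can be seen on a smaller problem. The sketch below uses a hypothetical objective, shifted_parabola, which is not part of the original notebook; only its first parameter is optimized, while a and b come from args.

```python
from scipy.optimize import minimize


def shifted_parabola(x, a, b):
    """Hypothetical objective: a parabola with minimum value b at x = a."""
    # minimize always passes the optimization variable as a 1-D array
    return (x[0] - a) ** 2 + b


# x is optimized; a=3.0 and b=1.0 are forwarded unchanged through args
res = minimize(shifted_parabola, x0=0.0, args=(3.0, 1.0))
print(res.x, res.fun)
```

The result should sit close to x = 3 with an objective value close to 1, since those are the values fixed through args.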

Inspect the returned result:

In [5]:

    print(result)

    status: 0
    success: True
    njev: 18
    nfev: 54
    hess_inv: array([[ 8110.515]])
    fun: -0.10204079220645729
    x: array([ 45.02])
    message: 'Optimization terminated successfully.'
    jac: array([ 0.])
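In practice the individual fields are usually read programmatically rather than printed. The following is a minimal sketch, assuming the same projectile formula as dist but rewritten with explicit NumPy calls so that it runs on its own; the default g=9.8 parameter is an addition for illustration.

```python
import numpy as np
from scipy.optimize import minimize


def neg_dist(theta, v0, g=9.8):
    """Negative projectile range; theta is a length-1 array in degrees."""
    theta_rad = np.pi * theta[0] / 180
    return -2 * v0 ** 2 / g * np.sin(theta_rad) * np.cos(theta_rad)


result = minimize(neg_dist, 40, args=(1.0,))

if result.success:
    best_angle = result.x[0]   # optimal launch angle in degrees
    best_range = -result.fun   # undo the sign flip to recover the distance
    print("angle = {:.1f}, range = {:.4f}".format(best_angle, best_range))
else:
    print("optimization failed:", result.message)

# OptimizeResult is a dict subclass, so the fields can also be listed
print(sorted(result.keys()))
```

Checking success before using x and fun avoids silently working with a run that stopped early.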

The Rosenbrock function¶

The Rosenbrock function is a non-convex function commonly used to test optimization algorithms:

$f(x)=\sum\limits_{i=1}^{N-1}{100\left(x_{i+1} - x_i^2\right)^2 + \left(1-x_i\right)^2}$

Import it:

In [6]:

    from scipy.optimize import rosen
    from mpl_toolkits.mplot3d import Axes3D

Use the Rosenbrock function with N = 2:

In [7]:

    x, y = meshgrid(np.linspace(-2,2,25), np.linspace(-0.5,3.5,25))
    z = rosen([x,y])

Its surface and the minimum at (1, 1):

In [8]:

    fig = figure(figsize=(12,5.5))
    ax = fig.add_subplot(111, projection="3d")
    ax.azim = 70; ax.elev = 48
    ax.set_xlabel("X"); ax.set_ylabel("Y")
    ax.set_zlim((0,1000))
    p = ax.plot_surface(x,y,z,rstride=1, cstride=1, cmap=cm.jet)
    rosen_min = ax.plot([1],[1],[0],"ro")

04.05 Minimizing functions - Figure 2

Pass in an initial guess:

In [9]:

    x0 = [1.3, 1.6, -0.5, -1.8, 0.8]
    result = minimize(rosen, x0)
    print(result.x)

    [ 1. 1. 1. 1. 1.]

Start from a random initial guess:

In [10]:

    x0 = np.random.randn(10)
    result = minimize(rosen, x0)
    print(x0)
    print(result.x)

    [ 0.815 -2.086 0.297 1.079 -0.528 0.461 -0.13 -0.715 0.734 0.621]
    [-0.993 0.997 0.998 0.999 0.999 0.999 0.998 0.997 0.994 0.988]

The global minimum of the Rosenbrock function is at $(x_1,x_2, \dots, x_N) = (1,1,\dots,1)$. For N > 3 there is also a local minimum near $(x_1,x_2, \dots, x_N) = (-1,1,\dots,1)$, so an unlucky random starting point can make the optimizer return the local minimum instead, as in the run above.
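The two points can be compared directly. The sketch below evaluates rosen and its gradient rosen_der at both of them for N = 10 (the names global_min and near_local are chosen here for illustration), then starts a local optimizer from the second point. Which basin that run ends in may depend on N and on the SciPy version, so no particular outcome is assumed.

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

n = 10
global_min = np.ones(n)
near_local = np.ones(n)
near_local[0] = -1.0          # the point (-1, 1, ..., 1) from the text

print(rosen(global_min))      # 0.0: the global minimum value
print(rosen(near_local))      # 4.0: small, but clearly not zero
print(rosen_der(near_local))  # only the first component is non-zero

# Start a local optimizer right at that point; comparing result.fun
# with 0 tells which basin the run ended up in.
result = minimize(rosen, near_local, jac=rosen_der)
print(result.x.round(3), result.fun)
```

The gradient at (-1, 1, ..., 1) is not exactly zero, which is why the local minimum is described as lying near that point rather than at it.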

Optimization methods¶

The BFGS algorithm¶

By default, minimize picks one of 'BFGS', 'L-BFGS-B', or 'SLSQP', depending on whether the problem has bounds or constraints.
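A small sketch of this default selection, assuming a hypothetical box of bounds that deliberately excludes the unconstrained minimum (1, 1) of the N = 2 Rosenbrock function: with no bounds the default unconstrained solver runs, and with bounds the default is expected to match an explicit method="L-BFGS-B" call.

```python
from scipy.optimize import minimize, rosen, rosen_der

x0 = [0.0, 0.0]
box = [(0.0, 0.5), (0.0, 2.0)]   # hypothetical bounds, chosen for illustration

# No bounds or constraints: the default unconstrained solver is used.
free = minimize(rosen, x0, jac=rosen_der)

# Bounds only: minimize switches to a bound-aware solver by default.
bounded = minimize(rosen, x0, jac=rosen_der, bounds=box)

# Naming the method explicitly avoids relying on the default choice.
explicit = minimize(rosen, x0, jac=rosen_der, method="L-BFGS-B", bounds=box)

print(free.x)       # close to (1, 1)
print(bounded.x)    # pushed against the bound x[0] <= 0.5
print(explicit.x)
```

With x[0] capped at 0.5, the best feasible point lies on the valley x[1] = x[0]**2, so the bounded runs should end near (0.5, 0.25).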

See the help for more information:

In [11]:

    info(minimize)

    minimize(fun, x0, args=(), method=None, jac=None, hess=None, hessp=None,
             bounds=None, constraints=(), tol=None, callback=None, options=None)

    Minimization of scalar function of one or more variables.

    Parameters
    ----------
    fun : callable
        Objective function.
    x0 : ndarray
        Initial guess.
    args : tuple, optional
        Extra arguments passed to the objective function and its
        derivatives (Jacobian, Hessian).
    method : str or callable, optional
        Type of solver. Should be one of

            - 'Nelder-Mead'
            - 'Powell'
            - 'CG'
            - 'BFGS'
            - 'Newton-CG'
            - 'Anneal (deprecated as of scipy version 0.14.0)'
            - 'L-BFGS-B'
            - 'TNC'
            - 'COBYLA'
            - 'SLSQP'
            - 'dogleg'
            - 'trust-ncg'
            - custom - a callable object (added in version 0.14.0)

        If not given, chosen to be one of ``BFGS``, ``L-BFGS-B``, ``SLSQP``,
        depending if the problem has constraints or bounds.
    jac : bool or callable, optional
        Jacobian (gradient) of objective function. Only for CG, BFGS,
        Newton-CG, L-BFGS-B, TNC, SLSQP, dogleg, trust-ncg.
        If `jac` is a Boolean and is True, `fun` is assumed to return the
        gradient along with the objective function. If False, the
        gradient will be estimated numerically.
        `jac` can also be a callable returning the gradient of the
        objective. In this case, it must accept the same arguments as `fun`.
    hess, hessp : callable, optional
        Hessian (matrix of second-order derivatives) of objective function or
        Hessian of objective function times an arbitrary vector p. Only for
        Newton-CG, dogleg, trust-ncg.
        Only one of `hessp` or `hess` needs to be given. If `hess` is
        provided, then `hessp` will be ignored. If neither `hess` nor
        `hessp` is provided, then the Hessian product will be approximated
        using finite differences on `jac`. `hessp` must compute the Hessian
        times an arbitrary vector.
    bounds : sequence, optional
        Bounds for variables (only for L-BFGS-B, TNC and SLSQP).
        ``(min, max)`` pairs for each element in ``x``, defining
        the bounds on that parameter. Use None for one of ``min`` or
        ``max`` when there is no bound in that direction.
    constraints : dict or sequence of dict, optional
        Constraints definition (only for COBYLA and SLSQP).
        Each constraint is defined in a dictionary with fields:
            type : str
                Constraint type: 'eq' for equality, 'ineq' for inequality.
            fun : callable
                The function defining the constraint.
            jac : callable, optional
                The Jacobian of `fun` (only for SLSQP).
            args : sequence, optional
                Extra arguments to be passed to the function and Jacobian.
        Equality constraint means that the constraint function result is to
        be zero whereas inequality means that it is to be non-negative.
        Note that COBYLA only supports inequality constraints.
    tol : float, optional
        Tolerance for termination. For detailed control, use solver-specific
        options.
    options : dict, optional
        A dictionary of solver options. All methods accept the following
        generic options:
            maxiter : int
                Maximum number of iterations to perform.
            disp : bool
                Set to True to print convergence messages.
        For method-specific options, see :func:`show_options()`.
    callback : callable, optional
        Called after each iteration, as ``callback(xk)``, where ``xk`` is the
        current parameter vector.

    Returns
    -------
    res : OptimizeResult
        The optimization result represented as a ``OptimizeResult`` object.
        Important attributes are: ``x`` the solution array, ``success`` a
        Boolean flag indicating if the optimizer exited successfully and
        ``message`` which describes the cause of the termination. See
        `OptimizeResult` for a description of other attributes.


    See also
    --------
    minimize_scalar : Interface to minimization algorithms for scalar
        univariate functions
    show_options : Additional options accepted by the solvers

    Notes
    -----
    This section describes the available solvers that can be selected by the
    'method' parameter. The default method is *BFGS*.

    **Unconstrained minimization**

    Method *Nelder-Mead* uses the Simplex algorithm [1]_, [2]_. This
    algorithm has been successful in many applications but other algorithms
    using the first and/or second derivatives information might be preferred
    for their better performances and robustness in general.

    Method *Powell* is a modification of Powell's method [3]_, [4]_ which
    is a conjugate direction method. It performs sequential one-dimensional
    minimizations along each vector of the directions set (`direc` field in
    `options` and `info`), which is updated at each iteration of the main
    minimization loop. The function need not be differentiable, and no
    derivatives are taken.

    Method *CG* uses a nonlinear conjugate gradient algorithm by Polak and
    Ribiere, a variant of the Fletcher-Reeves method described in [5]_ pp.
    120-122. Only the first derivatives are used.

    Method *BFGS* uses the quasi-Newton method of Broyden, Fletcher,
    Goldfarb, and Shanno (BFGS) [5]_ pp. 136. It uses the first derivatives
    only. BFGS has proven good performance even for non-smooth
    optimizations. This method also returns an approximation of the Hessian
    inverse, stored as `hess_inv` in the OptimizeResult object.

    Method *Newton-CG* uses a Newton-CG algorithm [5]_ pp. 168 (also known
    as the truncated Newton method). It uses a CG method to the compute the
    search direction. See also *TNC* method for a box-constrained
    minimization with a similar algorithm.

    Method *Anneal* uses simulated annealing, which is a probabilistic
    metaheuristic algorithm for global optimization. It uses no derivative
    information from the function being optimized.

    Method *dogleg* uses the dog-leg trust-region algorithm [5]_
    for unconstrained minimization. This algorithm requires the gradient
    and Hessian; furthermore the Hessian is required to be positive definite.

    Method *trust-ncg* uses the Newton conjugate gradient trust-region
    algorithm [5]_ for unconstrained minimization. This algorithm requires
    the gradient and either the Hessian or a function that computes the
    product of the Hessian with a given vector.

    **Constrained minimization**

    Method *L-BFGS-B* uses the L-BFGS-B algorithm [6]_, [7]_ for bound
    constrained minimization.

    Method *TNC* uses a truncated Newton algorithm [5]_, [8]_ to minimize a
    function with variables subject to bounds. This algorithm uses
    gradient information; it is also called Newton Conjugate-Gradient. It
    differs from the *Newton-CG* method described above as it wraps a C
    implementation and allows each variable to be given upper and lower
    bounds.

    Method *COBYLA* uses the Constrained Optimization BY Linear
    Approximation (COBYLA) method [9]_, [10]_, [11]_. The algorithm is
    based on linear approximations to the objective function and each
    constraint. The method wraps a FORTRAN implementation of the algorithm.

    Method *SLSQP* uses Sequential Least SQuares Programming to minimize a
    function of several variables with any combination of bounds, equality
    and inequality constraints. The method wraps the SLSQP Optimization
    subroutine originally implemented by Dieter Kraft [12]_. Note that the
    wrapper handles infinite values in bounds by converting them into large
    floating values.

    **Custom minimizers**

    It may be useful to pass a custom minimization method, for example
    when using a frontend to this method such as `scipy.optimize.basinhopping`
    or a different library. You can simply pass a callable as the ``method``
    parameter.

    The callable is called as ``method(fun, x0, args, **kwargs, **options)``
    where ``kwargs`` corresponds to any other parameters passed to `minimize`
    (such as `callback`, `hess`, etc.), except the `options` dict, which has
    its contents also passed as `method` parameters pair by pair. Also, if
    `jac` has been passed as a bool type, `jac` and `fun` are mangled so that
    `fun` returns just the function values and `jac` is converted to a function
    returning the Jacobian. The method shall return an ``OptimizeResult``
    object.

    The provided `method` callable must be able to accept (and possibly ignore)
    arbitrary parameters; the set of parameters accepted by `minimize` may
    expand in future versions and then these parameters will be passed to
    the method. You can find an example in the scipy.optimize tutorial.

    .. versionadded:: 0.11.0

    References
    ----------
    .. [1] Nelder, J A, and R Mead. 1965. A Simplex Method for Function
        Minimization. The Computer Journal 7: 308-13.
    .. [2] Wright M H. 1996. Direct search methods: Once scorned, now
        respectable, in Numerical Analysis 1995: Proceedings of the 1995
        Dundee Biennial Conference in Numerical Analysis (Eds. D F
        Griffiths and G A Watson). Addison Wesley Longman, Harlow, UK.
        191-208.
    .. [3] Powell, M J D. 1964. An efficient method for finding the minimum of
        a function of several variables without calculating derivatives. The
        Computer Journal 7: 155-162.
    .. [4] Press W, S A Teukolsky, W T Vetterling and B P Flannery.
        Numerical Recipes (any edition), Cambridge University Press.
    .. [5] Nocedal, J, and S J Wright. 2006. Numerical Optimization.
        Springer New York.
    .. [6] Byrd, R H and P Lu and J. Nocedal. 1995. A Limited Memory
        Algorithm for Bound Constrained Optimization. SIAM Journal on
        Scientific and Statistical Computing 16 (5): 1190-1208.
    .. [7] Zhu, C and R H Byrd and J Nocedal. 1997. L-BFGS-B: Algorithm
        778: L-BFGS-B, FORTRAN routines for large scale bound constrained
        optimization. ACM Transactions on Mathematical Software 23 (4):
        550-560.
    .. [8] Nash, S G. Newton-Type Minimization Via the Lanczos Method.
        1984. SIAM Journal of Numerical Analysis 21: 770-778.
    .. [9] Powell, M J D. A direct search optimization method that models
        the objective and constraint functions by linear interpolation.
        1994. Advances in Optimization and Numerical Analysis, eds. S. Gomez
        and J-P Hennart, Kluwer Academic (Dordrecht), 51-67.
    .. [10] Powell M J D. Direct search algorithms for optimization
        calculations. 1998. Acta Numerica 7: 287-336.
    .. [11] Powell M J D. A view of algorithms for optimization without
        derivatives. 2007.Cambridge University Technical Report DAMTP
        2007/NA03
    .. [12] Kraft, D. A software package for sequential quadratic
        programming. 1988. Tech. Rep. DFVLR-FB 88-28, DLR German Aerospace
        Center -- Institute for Flight Mechanics, Koln, Germany.

    Examples
    --------
    Let us consider the problem of minimizing the Rosenbrock function. This
    function (and its respective derivatives) is implemented in `rosen`
    (resp. `rosen_der`, `rosen_hess`) in the `scipy.optimize`.

    >>> from scipy.optimize import minimize, rosen, rosen_der

    A simple application of the *Nelder-Mead* method is:

    >>> x0 = [1.3, 0.7, 0.8, 1.9, 1.2]
    >>> res = minimize(rosen, x0, method='Nelder-Mead')
    >>> res.x
    [ 1. 1. 1. 1. 1.]

    Now using the *BFGS* algorithm, using the first derivative and a few
    options:

    >>> res = minimize(rosen, x0, method='BFGS', jac=rosen_der,
    ...                options={'gtol': 1e-6, 'disp': True})
    Optimization terminated successfully.
    Current function value: 0.000000
    Iterations: 52
    Function evaluations: 64
    Gradient evaluations: 64
    >>> res.x
    [ 1. 1. 1. 1. 1.]
    >>> print res.message
    Optimization terminated successfully.
    >>> res.hess
    [[ 0.00749589 0.01255155 0.02396251 0.04750988 0.09495377]
    [ 0.01255155 0.02510441 0.04794055 0.09502834 0.18996269]
    [ 0.02396251 0.04794055 0.09631614 0.19092151 0.38165151]
    [ 0.04750988 0.09502834 0.19092151 0.38341252 0.7664427 ]
    [ 0.09495377 0.18996269 0.38165151 0.7664427 1.53713523]]


    Next, consider a minimization problem with several constraints (namely
    Example 16.4 from [5]_). The objective function is:

    >>> fun = lambda x: (x[0] - 1)**2 + (x[1] - 2.5)**2

    There are three constraints defined as:

    >>> cons = ({'type': 'ineq', 'fun': lambda x: x[0] - 2 * x[1] + 2},
    ...         {'type': 'ineq', 'fun': lambda x: -x[0] - 2 * x[1] + 6},
    ...         {'type': 'ineq', 'fun': lambda x: -x[0] + 2 * x[1] + 2})

    And variables must be positive, hence the following bounds:

    >>> bnds = ((0, None), (0, None))

    The optimization problem is solved using the SLSQP method as:

    >>> res = minimize(fun, (2, 0), method='SLSQP', bounds=bnds,
    ...                constraints=cons)

    It should converge to the theoretical solution (1.4 ,1.7).
With no bounds or constraints, the default method is BFGS.

Use the callback parameter to inspect the iteration history:

In [12]:

    x0 = [-1.5, 4.5]
    xi = [x0]  # iterates, starting with the initial guess
    result = minimize(rosen, x0, callback=xi.append)
    xi = np.asarray(xi)
    print(xi.shape)
    print(result.x)
    print("in {} function evaluations.".format(result.nfev))

    (37L, 2L)
    [ 1. 1.]
    in 200 function evaluations.
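A callback does not have to store the iterates themselves. As a rough sketch, the hypothetical record function below keeps the objective value at each iterate instead; the analytic gradient rosen_der is passed only to keep the run short.

```python
from scipy.optimize import minimize, rosen, rosen_der

x0 = [-1.5, 4.5]
f_history = [rosen(x0)]   # objective value at the starting point


def record(xk):
    """Hypothetical callback: store the objective value at each iterate."""
    f_history.append(rosen(xk))


result = minimize(rosen, x0, jac=rosen_der, callback=record)

print("start value:", f_history[0])
print("final value:", f_history[-1])
print("recorded iterates:", len(f_history) - 1)
```

Plotting f_history on a logarithmic axis is one simple way to see how quickly a method converges.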

Plot the trajectory:

In [13]:

    x, y = meshgrid(np.linspace(-2.3,1.75,25), np.linspace(-0.5,4.5,25))
    z = rosen([x,y])
    fig = figure(figsize=(12,5.5))
    ax = fig.add_subplot(111, projection="3d"); ax.azim = 70; ax.elev = 75
    ax.set_xlabel("X"); ax.set_ylabel("Y"); ax.set_zlim((0,1000))
    p = ax.plot_surface(x,y,z,rstride=1, cstride=1, cmap=cm.jet)
    intermed = ax.plot(xi[:,0], xi[:,1], rosen(xi.T), "g-o")
    rosen_min = ax.plot([1],[1],[0],"ro")

04.05 Minimizing functions - Figure 3

BFGS needs the Jacobian of the objective function (for a scalar function, this is its gradient).

Given $\left[y_1,y_2,y_3\right] = f(x_0, x_1, x_2)$, the Jacobian is

$J=\left[ \begin{matrix} \frac{\partial y_1}{\partial x_0} & \frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} \\ \frac{\partial y_2}{\partial x_0} & \frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} \\ \frac{\partial y_3}{\partial x_0} & \frac{\partial y_3}{\partial x_1} & \frac{\partial y_3}{\partial x_2} \end{matrix} \right]$

In our example:

$J= \left[ \begin{matrix}\frac{\partial\,\mathrm{rosen}}{\partial x_0} & \frac{\partial\,\mathrm{rosen}}{\partial x_1} \end{matrix} \right]$

Import rosen_der, the Jacobian of rosen:

In [14]:

    from scipy.optimize import rosen_der
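For N = 2 the two partial derivatives can be written out by hand and compared with rosen_der. The sketch below does that with a hypothetical helper rosen_grad_2d, and additionally uses scipy.optimize.check_grad to compare rosen_der against a finite-difference estimate.

```python
import numpy as np
from scipy.optimize import check_grad, rosen, rosen_der


def rosen_grad_2d(x):
    """Hand-written gradient of the N = 2 Rosenbrock function."""
    x0, x1 = x
    return np.array([
        -400 * x0 * (x1 - x0 ** 2) - 2 * (1 - x0),  # d rosen / d x0
        200 * (x1 - x0 ** 2),                        # d rosen / d x1
    ])


point = np.array([-1.5, 4.5])
print(rosen_grad_2d(point))   # analytic formula
print(rosen_der(point))       # SciPy's implementation

# check_grad returns the norm of the difference between the supplied
# gradient and a finite-difference estimate; it should be small.
print(check_grad(rosen, rosen_der, point))
```

Running such a check before passing a hand-written gradient to minimize is a cheap way to catch sign or index mistakes.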

Now pass the Jacobian function in through the jac parameter:

In [15]:

    xi = [x0]
    result = minimize(rosen, x0, jac=rosen_der, callback=xi.append)
    xi = np.asarray(xi)
    print(xi.shape)
    print("in {} function evaluations and {} jacobian evaluations.".format(result.nfev, result.njev))

    (38L, 2L)
    in 49 function evaluations and 49 jacobian evaluations.

Supplying the Jacobian roughly halves the cost (49 function plus 49 Jacobian evaluations, compared with 200 function evaluations before), and the iteration path closely matches the previous one:

In [16]:

    x, y = meshgrid(np.linspace(-2.3,1.75,25), np.linspace(-0.5,4.5,25))
    z = rosen([x,y])
    fig = figure(figsize=(12,5.5))
    ax = fig.add_subplot(111, projection="3d"); ax.azim = 70; ax.elev = 75
    ax.set_xlabel("X"); ax.set_ylabel("Y"); ax.set_zlim((0,1000))
    p = ax.plot_surface(x,y,z,rstride=1, cstride=1, cmap=cm.jet)
    intermed = ax.plot(xi[:,0], xi[:,1], rosen(xi.T), "g-o")
    rosen_min = ax.plot([1],[1],[0],"ro")

04.05 Minimizing functions - Figure 4

The Nelder-Mead simplex algorithm¶

Switch minimize to the Nelder-Mead simplex algorithm with the method parameter:

In [17]:

    xi = [x0]
    result = minimize(rosen, x0, method="nelder-mead", callback = xi.append)
    xi = np.asarray(xi)
    print(xi.shape)
    print("Solved the Nelder-Mead Simplex method with {} function evaluations.".format(result.nfev))

    (120L, 2L)
    Solved the Nelder-Mead Simplex method with 226 function evaluations.

In [18]:

    x, y = meshgrid(np.linspace(-1.9,1.75,25), np.linspace(-0.5,4.5,25))
    z = rosen([x,y])
    fig = figure(figsize=(12,5.5))
    ax = fig.add_subplot(111, projection="3d"); ax.azim = 70; ax.elev = 75
    ax.set_xlabel("X"); ax.set_ylabel("Y"); ax.set_zlim((0,1000))
    p = ax.plot_surface(x,y,z,rstride=1, cstride=1, cmap=cm.jet)
    intermed = ax.plot(xi[:,0], xi[:,1], rosen(xi.T), "g-o")
    rosen_min = ax.plot([1],[1],[0],"ro")

04.05 Minimizing functions - Figure 5

Powell's algorithm¶

Use Powell's method:

In [19]:

    xi = [x0]
    result = minimize(rosen, x0, method="powell", callback=xi.append)
    xi = np.asarray(xi)
    print(xi.shape)
    print("Solved Powell's method with {} function evaluations.".format(result.nfev))

    (31L, 2L)
    Solved Powell's method with 855 function evaluations.

In [20]:

    x, y = meshgrid(np.linspace(-2.3,1.75,25), np.linspace(-0.5,4.5,25))
    z = rosen([x,y])
    fig = figure(figsize=(12,5.5))
    ax = fig.add_subplot(111, projection="3d"); ax.azim = 70; ax.elev = 75
    ax.set_xlabel("X"); ax.set_ylabel("Y"); ax.set_zlim((0,1000))
    p = ax.plot_surface(x,y,z,rstride=1, cstride=1, cmap=cm.jet)
    intermed = ax.plot(xi[:,0], xi[:,1], rosen(xi.T), "g-o")
    rosen_min = ax.plot([1],[1],[0],"ro")

04.05 Minimizing functions - Figure 6
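To put the three methods side by side, the runs can be collected in one loop. This is only a sketch: the evaluation counts depend on the SciPy version and will generally differ from the numbers recorded in this notebook.

```python
from scipy.optimize import minimize, rosen

x0 = [-1.5, 4.5]
results = {}

# BFGS estimates gradients by finite differences here;
# Nelder-Mead and Powell use function values only.
for method in ("BFGS", "Nelder-Mead", "Powell"):
    results[method] = minimize(rosen, x0, method=method)

for method, res in results.items():
    print("{:12s} nfev = {:4d}  x = {}".format(method, res.nfev, res.x))
```

The derivative-free methods are convenient when no gradient is available, usually at the price of more function evaluations on smooth problems such as this one.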

Source: https://nbviewer.jupyter.org/github/lijin-THU/notes-python/blob/master/04-scipy/04.05-minimization-in-python.ipynb