**Format****Pages****Chapters**

# Implementation of Derivative Free Optimization Methods

**Implementation of Derivative Free Optimization Methods**

**ABSTRACT**

Let f be a continuous function on Rn, and supposed f is a smooth nonlinear function, such functions arise in many applications, and very often minimizers are points at which f is not differentiable. Of particular interest is the case where the gradient and the Hessian cannot be computed for any x. In this thesis, two methods are presented for implementation of derivative free optimization. The finite difference representation of the gradient and Hessian in Quasi Newton method and the derivative free Trust Region method. We showed that if f has a unique solution, then the set of all the step length (h) generated by the algorithm converges globally. Three test problems are presented and with the use of MATLAB (R2007b) software the effectiveness of the methods is shown. Numerical results are presented demonstrating the robustness of the algorithm and the result compared favourably with some existing algorithms.

**TABLE OF CONTENTS**

Title page——————————————————————————————–ii

Certification—————————————————————————————-iii

Dedication——————————————————————————————iv

Acknowledgement———————————————————————————v

Table of contents————————————————————————————vi

Abstract——————————————————————————————–viii

Chapter One : Introduction

Background to the study——————————————————————1

Definitions of terms ———————————————————————-2

Justification of study———————————————————————-5

Objective of the study——————————————————————-12

Research method————————————————————————-13

Chapter Two: Literature Review

2.1 Introduction——————————————————————————-14

2.2 History of Derivative-free Optimization method————————————18

2.3 Features of Derivative-free Optimization———————————————20

2.4 Need for Derivative-free Optimization ———————————————–22

2.5 Quasi-Newton methods —————————————————————–23

2.6 The Trust-Region methods ————————————————————-24

Chapter Three: Derivative based optimization

3.1 Quasi-Newton method——————————————————————-25

3.2 Quasi-Newton Algorithm —————————————————————25

3.3 Trust-Region method ——————————————————————–28

3.4 Trust-Region Algorithm —————————————————————–31

Chapter Four: Methodology and results——————————————————–36

Chapter Five: Summary and Conclusion ——————————————————-42

5.1 Summary ———————————————————————————–42

5.2 Findings ————————————————————————————42

5.3 Contribution to Knowledge ————————————————————–43

5.4 Conclusion ———————————————————————————-43

References——————————————————————————————–44

Appendix (program source code- MATLAB code R2007b) ———————————–46

**CHAPTER ONE**

**INTRODUCTION**

**Background to the Study**

This research is centered on optimizing a function of several variables, whose derivative is unavailable.

Each and every one of us takes decisions in the course of our day-to-day activities, in order to accomplish certain tasks. Usually, there are several, perhaps many possible ways of accomplishing these tasks. Although some choices will generally be better than others, consciously or unconsciously, we must therefore decide upon the best-or optimal-way to realize our objectives.

For example, all of us at one time or another, find it necessary to drive through city traffic. We could attempt to find the shortest possible route from point A to point B without concern for the time required to traverse this route, or alternately, we could seek out the quickest though not necessarily the shortest route between A and B. As a compromise, we might attempt to find the shortest path from A to B subject to the auxiliary condition that the transit time, does not exceed some prescribed value.

In a classical sense, Optimization can be defined as the art of obtaining best policies to satisfy certain objectives, at the same time satisfying fixed requirements. However, recent advances in Applied Mathematics, Operations Research, and Digital-Computer Technology enable many complex industrial problems in engineering and economics to be optimized successfully by the application of logical and systematic techniques. The development of new and increasingly powerful optimization techniques is proliferating rapidly.

Optimization has been playing an important role in many branches of science and technology such as engineering, finance, probability and statistics. There are many optimization algorithms that have been developed to locate the optima of continuous objective functions. It is obvious that if a point x* corresponds to the maximum of a function f(x), the same point corresponding to the minimum value of the function – f(x). Thus, optimization can be taken to be maximization.

There is no single method available for solving all optimization problems efficiently. Hence, a number of methods have been developed for solving different types of problems.

The existence of optimization can be traced back to Newton, Lagrange and Cauchy. The development of differential methods for optimization was possible because of the contribution of Newton and Leibnitz. The foundations of the calculus of variations were laid by Bernoulli, Euler, Lagrange and Weierstrass. Constrained optimization was first studied by Lagrange and the notion of descent was introduced by Cauchy.

Despite these early contributions, very little progress was made till the 20th century, when computer power made the implementation of optimization procedures possible and this turn stimulated further research methods.

**DEFINITIONS OF TERMS:**

**Optimization:** Optimization can be defined as the process that selects the best possible decision for a given set of circumstances without having to enumerate all of the possibilities. From experience, designers learn to recognize good proportions and critical restrictions so that their preliminary work will not require significant modification and improvement. It can also be defined as the art of obtaining best policies to satisfy certain objectives, at the same time satisfying fixed requirements.

Derivative-free optimization, the goal is to optimize a function defined on a subset of ?n for which derivative information is neither symbolically available nor numerically computable.

Unconstrained minimization is the problem of finding a vector x that is a local minimum to a scalar function f(x):

Min f(x) (1.1) x

The term unconstrained means that no restriction is placed on the range of x.

Quasi Newton method means methods, which begin the search along a gradient line and use gradient information to build a quadratic fit to the economic model (profit function). Consequently, to understand these methods it is helpful to discuss the gradient search algorithm and Newton’s method as background for the extension to the quasi-Newton algorithms.

Trust region is a term used in mathematical optimization to denote the subset of the region of the objective function that is approximated using a model function (often a quadratic). Trust region methods are also known as restricted step methods.

A quadratic model is mk (xk + pk) = f(xk) + gk, p + p, Hk

Where, at iteration k, xk is the current iterate, gk∈ Rn, and Hk is a symmetric matrix of dimension n.

Line Search can be defined, as one of two basic iterative approaches to find a local minimum x ∗ of an objective function. It first finds a descent direction along which the objective function f will be reduced and then computes a step size that determines how far x should move along that direction.

Descent direction is a vector p p ∈ R n that moves us closer towards a local minimum x* x ∗ of our objective function f : Rf: R.

f : R n → R Constrained Optimization (in some contexts called constraint optimization) is the process of optimizing an objective function with respect to some variables in the presence of constraints on those variables. The objective function is either a cost function or energy function which is to be minimized, or a reward function or utility function, which is to be maximized.

**Lipschitz Constant:**The term is used for a bound on the modulus of a continuous In particular, a function f:[a,b] →R is said to satisfy the Lipschitz condition if there is a constant M such that

|f(x)−f(x′)|≤M|x−x′|∀x,x′∈[a,b] . (1.2)

The smallest constant M satisfying (1.2) is called Lipschitz constant.

(10) Symmetric positive definite matrix: A matrix A is symmetric positive definite if xTAx > 0 for any nonzero vector x.

(11) Steepest descent method: It is an algorithms for finding the local minimum of a function which presuppose that gradient of the can be computed.

**1.3 Justification of study**

In the past decades, it is well known that extensive useful information is contained in the derivatives of any function one wishes to optimize. However, for a variety of reasons there have always been many instances where (at least some) derivatives are unavailable or numerically unreliable.

Derivative free optimization (DFO) methods are designed for solving nonlinear optimization problems where the derivatives of the involved function are not available. Increasing complexity in mathematical modeling, higher sophistication of scientific computing, and an abundance of legacy codes are some of the reasons why derivative-free optimization is currently an area of great demand. There is a high demand from practitioners for such algorithms because this kind of problems occur relatively frequently in the industry. Optimization problems, in which the derivatives cannot be computed, arise in modern physical, chemical and econometric measurements and in engineering applications, where computer simulation is employed for the evaluation of the objective functions.

In derivative-free optimization (also known as black-box optimization), the goal is to optimize a function defined on a subset of Rn for which derivative information is neither symbolically available nor numerically computable, and bounds on Lipchitz constants are not known. Since the beginning of the field in 1961 with the algorithm by Hooke and Jeeves, numerous algorithms and software implementations have been proposed. There are numerous derivative free optimization methods in practice. Some of the methods are Pattern search, Nelder and Mead (Torczon, 1991), Generating Set Search ( Kolda et al; 2003), Genetic method (Eshelman and Schaffer, 1993), stochastic clustering algorithms ( Csendes et al., 2008), particle swarm methods ( Eberhart and Kennedy, 1995), ant colony optimization ( Dorigo and Stutzle, 2008) and tabu search (Glover, 1995) to mention a few.

In this dissertation we are concern with the methods of Quasi Newton and Trust-Region Derivative-free Optimization. Just as in derivative-based optimization, the trust-region method is an iterative process in which a local minimization of the cost functional f is conducted. The region around the current iterate where the next iterate is trusted to be is called the trust-region. When an adequate subsequent iterate is found, the Trust-Region is expanded. Conversely, if the approximation is poor, the method is contracted. Trust region methods overcome the problems that line search methods encounter with non-speed approximate Hessians. In particular, a Newton trust region strategy allows the use of complete Hessian information even in regions where the Hessian has negative curvature. The specific trust region methods we will present effect a smooth transition from the steepest descent direction to the Newton direction in a way that gives the global convergence properties of steepest descent and the fast local convergence of Newton’s method.

The idea is very simple. We let be the radius of the ball about xk in which the quadratic model

Mk(x) = f(xk) + f(xk)T(x – xk) + (x – xk)THk(x – xk)

can be trusted to accurately represent the function. is called the trust region radius and the ball

( = { x | ⃦ x – xk ⃦ }

is called the trust region.

We compute the new point x+ by (approximately) minimizing mk over (The trust region problem for doing that is usually posed in terms of the difference pt between xk and the minimize of mk in the trust region

Min mk(xk+p). (1.3)

⃦x ⃦

We will refer to either the trial step pk or the trial solution xk = xt + pk as the solution to be trust region problem.

Having solved the trust region problem, one must decide whether to accept the step and/or to change the trust region radius. The trust region methods that we will discuss in detail approximate the solution of the trust region problem with the minimizer of the quadratic model along a piecewise linear path contained in the trust region.

Quasi-Newton methods update an approximation of 2f(x*) as the iteration progresses. In general the transition from current approximations xk and Hk of x* and 2f(x*) to new approximations x+ and H+ is given (using line search paradigm) by the following steps:

Compute a search direction h = – Hk-1f(xk).

Find xk = xt + using a line search to insure sufficient decrease.

Use xk, x+, and Hk to update Hk and obtain H+.

The way in which in H+ is computed determined the method.

If N = 1, all secant methods reduce to the classical secant method for the single nonlinear equation fI(x) = 0, i.e,

X+ = xk – , (1.4)

Where x– is the iterate previous to xk.

The standard quasi-Newton update for nonlinear is a rank-one update,

H+ = Hk + , (1.5)

This method does not preserve the structural properties needed for line search methods in optimization, namely, symmetry and positive definiteness, and could, in fact, encourage convergence to a local maximum. For that reason quasi-Newton methods in optimization are more complex than those used for nonlinear equations. The methods of analysis and implementation are more complex as well (Kelly, 1999).

Quasi-Newton methods are methods used to either find zeroes or local maxima and minima of functions, as an alternative to Newton’s method. They can be used if the Jacobian or Hessian is unavailable or is too expensive to compute at every iteration. The “full” Newton’s method requires the Jacobian in order to search for zeros, or the Hessian for finding extrema. More recently, Quasi-Newton methods have been applied to finding the solution of multiple coupled systems of equations (e.g. fluid-structure interaction problems or interaction problems in physics). They allow the solution to be found by solving each constituent system separately (which is simpler than the global system) in a cyclic, iterative fashion until the solution of the global system is found ( Xiaogang , Dong , Xingdong and Lu , 2006).

Differences between Newton’s and Quasi-Newton Methods

While similar to the full Newton’s Method, the Quasi-Newton Method has some distinct differences. These differences are summarized briefly in the table below,

Table 1.1 The differences between Newton and Quasi Newton methods

Newton’s Method Quasi-Newton Method

Computationally expensive Computationally cheap

Slow computation Fast(er) computation

Need to iteratively calculate second derivative No need for second derivative

Need to iteratively solve linear system of equations No need to solve linear system of equations

Less convergence steps More convergence steps

More precise convergence path Less precise convergence path

FOR NEWTON METHOD, WE HAVE XN=XN-1 – – (1.6)

While the Quasi-Newton method is h(Zk) = −B-1k∇ f(xk) (1.7)

Where Bk is Hessian which is symmetric and positive definite while ∇ f(xk) stands as the gradient.

**ADVANTAGES:**

While the last two rows in the table appear to be disadvantages of the Quasi-Newton Method, the faster computation time can end up balancing them out. For large and complicated problems, this balance is the benefit of the Quasi-Newton Methods over the full Newton’s Method owing to the overall faster solution time.

**DISADVANTAGES:**

The lack of precision in the Hessian calculation leads to slower convergence in terms of steps. Because of this, Quasi-Newton Methods can be slower (and thus worse) than using the full Newton’s Method. This occurs for simple problems where the extra computation time to actually compute the Hessian inverse is low. In this case, the full Newton’s Method is better anyway. An additional potential disadvantage to the Quasi-Newton Methods is the need to store the inverse Hessian approximation. This could require a large amount of memory and could thus be detrimental in the cases of large complicated systems.

Unconstrained problems arise as reformulations of constrained optimization problems, in which the constraints are replaced by penalization terms in the objective function that have the effect of discouraging constraint violations. The unconstrained optimization such as (Find Minimum and Find Maximum) and solve nonlinear equations (Find Root) and nonlinear fitting problems (Find Fit). All these functions work, in general, by doing a search, starting at some initial values and taking steps that decrease (or for Find Maximum, increase) an objective or merit function.

In this thesis, we compared our results with work of other authors. The works of Davidon (1972) and developed by Fletcher and Powell (1963), has been so successful that it has attracted a great deal of interest. Various theoretical studies, as well as new, related algorithms have been in the literature. In their publication, it was shown how DFP (Davidon – Fletcher – Powell) formula could be derived by solving a certain variational problem.

Powell (2009) also worked on local convergence of positive definite secant. The most widely used and well- known of them, is the Nelder-Mead algorithms (1965), popular for its simplicity and effectiveness, but at the same time notorious for its failure to converge even in simple cases. The following researchers Katya (2009), Necodal (2005), Wright and Necodal (2009), centered on the model-based algorithm for derivative-free unconstrained optimization for the theoretical stand point, and as a result, it was concluded that the problem of iterations and implementation of method be needed. The trust-region first appeared in papers by Levenberg (1944) and Marquardt (1963) for solving nonlinear least squares problems.

Powell, Fletcher, Hebden, Osborne (2002); also consider the class of derivative-free trust-region methods, which has been pioneered by Winfield (1969) and exhaustively studied for unconstrained and box-constrained problems by Powell (2008), Toint , Conn, Scheinberg, Vicente , Fasano, Morales, Nocedal, Gratton, and Troltzsch (2005), and for linearly constrained problems by Powell (2008). Global convergence results for the unconstrained case are presented. The models are based on polynomial interpolation. Essentially, the divergence between them consists in the interpolation set updating and the construction of the models. With regard to derivative based trust-region methods, for both constrained and unconstrained problems, well-established algorithms with global convergence results can be found in the literature. When the derivatives of the objective function are not available, there are also trust-region methods with good practical performance. However, until now, theoretical results have not been established for the constrained case.

Our research direction is focused on Derivative-Free Optimization which is justified by the work of Greenstadt (1978) who worked extensively on a case of unconstrained optimization using a Normalized Vector, Line Search and Gram-Schmidt Orthogonalization in optimizing the unconstrained optimization. He also used “Gershgorin’s Theorem” to gets the eigenvalues of matrix positive but was not implemented.

Here, our interest is to use finite difference approach in Trust Region Method and Quasi-Newton method for the Derivative-Free Optimization.

**1.4 OBJECTIVE OF THE STUDY**

The overall aim of the study is to implement derivative free algorithms in unconstrained problems.

1.4.1 The specific objectives of this study are:

(a). to implement the finite difference approach for derivative in Quasi-Newton algorithm.

(b). to implement the derivative free trust region method using finite difference methods.

(c). to determine the accuracy of the methods.

**1.5 RESEARCH METHODS**

In carrying out the research satisfactorily, we intend to use the finite difference approximation for the Gradient and Hessian. For Finite difference approach (Central difference) If we imagined f to be a function of just two variables, we use the following approximation.

= (1.8)

= (1.9)

= (1.10)

= (1.11)

= (1.12)

h is the step length in the above process.