Course literature:
Optimization (English) by J. Lundgren, M. Rönnqvist, and P. Värbrand; Studentlitteratur 2010
Computer exercises and software
- The purpose of this computer exercise is to make you familiar with the use of software for computing solutions to linear programs.
- The following software is included:
- JuMP - An algebraic modelling language for linear, quadratic, and nonlinear constrained optimization problems embedded in Julia.
- Gurobi - A linear, integer linear, and quadratic programming solver with interfaces to MATLAB, JuMP, and Python.
- Atom - A free and open-source text and source code editor.
Lecture plan
Lecture | Content | Chapters in the course book |
---|---|---|
1 | Introduction, course map, modelling optimization applications | 1, 2.1-2.5, 3 |
2 | a) Julia/JuMP and optimization solvers; computer exercise on linear optimization. b) Supply chain (Assignment 1) | |
3 | Convexity and linear optimization | 2.4, 4.1 |
4 | Linear optimization: basic feasible solutions | 4.2–4.4, 4.8 |
5 | Linear optimization: the simplex method | 4.5–4.10, (7.1) |
6 | Duality in linear optimization models | 6, (7.3, 7.5) |
7 | Linear optimization: post-optimal and sensitivity analysis | 5.1–5.5, (5.6) |
8 | a) Integer linear optimization: modelling, applications, complexity b) Optimization of maintenance scheduling (Assignment 2) | 13, 2.6 |
9 | Integer linear optimization: relaxations, bounds, branch&bound algorithm | 14.1–14.3, 15.1–15.3 |
10 | Integer linear optimization: good/ideal formulation, cutting planes, Lagrangean duality and relaxation | 14.4–14.5, 17.1–17.2, (17.3–17.4), 13.10–13.11, 15.4, (15.5) |
11 | Integer linear optimization: (non-)convexity, local/global optimality, heuristics and approximation algorithms | 16, 8.3 |
12 | Network flows: linear optimization formulations | 8.1–8.2, 8.4, (8.5), 18.1–18.5, (18.6–18.7), 13.5 |
13 | Minimum cost network flows: linear optimization formulations and algorithms | 8.6–8.7 |
14 | Overview of nonlinear optimization | 2.5.1, 9–12 |
15 | Example exam | |
Problem solving sessions
Session | Topics | Exercises recommended for students to solve |
---|---|---|
1 | Mathematical modelling | 3.1 b,c,d; 3.5, 3.10, 3.12, 3.15 |
2 | Linear optimization and the simplex method | 2.4, 2.6, 4.2, 4.6, 4.10, 4.11, 4.15 |
3 | Sensitivity analysis and duality theory | 5.1, 5.5, 6.6, 6.8, 6.10, 6.15 |
4 | Integer linear optimization and the branch-and-bound algorithm | 13.5, 13.6, 13.9, 15.6 (13.8, 13.15, 15.3, 15.12, 15.14) |
5 | Cutting plane methods, minimal cover, Lagrangean duality | 14.4, 14.8, 17.9 (14.1, 14.3, 14.6, 14.9, 17.8) |
6 | Network optimization: minimum spanning tree and shortest path algorithms | 8.10, 8.12, 8.17a (8.18, 8.38ab) |
7 | Nonlinear optimization: convexity and the KKT conditions | 9.8, 9.10, 11.4, (9.4, 11.6) |
Lecture 1 - Introduction, course map, modelling optimization applications (Chapter 1, 2.1–2.5, 3)
Chapter 1 - Introduction to optimization
- A general optimization problem (P): min f(x) s.t. g_i(x) ≤ b_i, i = 1, ..., m
- P is an LP problem if:
  - all functions f and g_i are linear
  - all variables are continuous, i.e., x ∈ R^n
- P is a nonlinear problem if:
  - at least one of the functions f, g_1, ..., g_m is a nonlinear function
  - all variables are continuous, i.e., x ∈ R^n
- P is a linear integer programming problem when a subset of the variables (at least one) is defined as integer variables.
Chapter 2 - Introductory examples and basic concepts
- Level curves of the objective function
- An optimal solution is a feasible solution where the objective function reaches its maximum (or minimum) value.
- Active constraints - constraints that hold with equality at the point considered
- A constraint that doesn't affect the feasible region is called redundant.
- Linear programming solution properties
- Unique optimal solution - The unique solution is always found in a corner point of the feasible region.
- Alternative optimal solutions - When the optimal objective function value is attained in at least two corner points (and then also along the segment between them). In practice, we stop once we have found one optimal solution.
- Unbounded solution - In a practical problem, this situation usually indicates an error in the model formulation or in the input data.
- No feasible solution exists - This situation appears when the constraints are defined so that no point satisfies all of them. Note that infeasibility depends only on the constraints, not on the objective function.
- Nonlinear optimization solution properties
- A nonlinear problem may have several local optima, one or more of which are global optima.
- If the nonlinear problem is a convex optimization problem, there is only one local optimum. If in addition the objective function is strictly convex (minimization problem) or strictly concave (maximization problem), the optimal solution is unique.
- Integer programming solution properties
- If we remove the integer requirements on the variables, we get an LP problem. This problem is called an LP relaxation of the integer programming problem.
- Some integer programming problems can be solved by first solving the LP relaxation, and then rounding the solution to obtain a feasible integer solution.
- Most solution methods are developed to find a local optimum.
- In a convex problem, each local optimum is also a global optimum.
- All LP problems are convex, while nonlinear problems are often non-convex.
- Note that a set cannot be concave; a set is either convex or non-convex.
2.5 Search methods - a general description
- Search methods start from a feasible point (solution) to the problem.
- The search method generates a sequence of feasible points x^(0), x^(1), x^(2), ..., whose objective function values become successively better.
- In some search methods, e.g., the simplex method, the objective function value may be the same between two consecutive iterations.
2.5.1 Search method for continuous problems (primal method):
Chosen search direction should be:
- A feasible search direction
- An improving search direction
- If, at the given point, we cannot find a feasible and improving search direction, we can conclude that we have found a local optimum.
- General description
- Step 0 - starting point
- Step 1 - search direction (we use information about the gradient and the Hessian matrix of the objective function f(x) at the point x^(k); a descent direction d^(k) has the property ∇f(x^(k))^T d^(k) < 0)
- Step 2 - convergence criterion (predefined optimality conditions for the problem)
- Step 3 - step length (line search along the chosen direction)
- Step 4 - convergence (the general search method converges towards a local optimum if we choose an improving search direction and perform the line search properly; if the problem is convex, we also find the global optimum)
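The steps above can be sketched as a tiny steepest-descent routine. This is an illustrative Python sketch (the course itself uses Julia/JuMP); the objective function, the negative-gradient direction, and the backtracking line search are my own choices of the generic ingredients:

```python
def search_method(f, grad, x0, tol=1e-8, max_iter=1000):
    """Generic primal search method (Section 2.5.1), minimization.
    Step 1: direction = negative gradient (a descent direction).
    Step 2: stop when the gradient is (numerically) zero.
    Step 3: step length by simple backtracking line search."""
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        if sum(gi * gi for gi in g) ** 0.5 < tol:   # Step 2: convergence
            break
        d = [-gi for gi in g]                       # Step 1: grad^T d < 0
        t, fx = 1.0, f(x)
        while f([xi + t * di for xi, di in zip(x, d)]) >= fx:
            t /= 2                                  # Step 3: shrink until improving
        x = [xi + t * di for xi, di in zip(x, d)]   # move and repeat
    return x

# Convex example: f(x, y) = (x - 1)^2 + 2 (y + 2)^2, minimum at (1, -2)
f = lambda x: (x[0] - 1) ** 2 + 2 * (x[1] + 2) ** 2
grad = lambda x: [2 * (x[0] - 1), 4 * (x[1] + 2)]
x_opt = search_method(f, grad, [5.0, 5.0])
```

Because this example is convex, the local optimum found is also the global optimum, as the note in Step 4 states.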
2.5.2 Search method for integer programming problems (primal method)
- It is more difficult to use the concepts of search direction and step length.
- General description
- Step 0 - Start from a feasible point x^(0). Set k = 0.
- Step 1 - Find the neighbouring integer solution x̄ with the best objective function value.
- Step 2 - Check the convergence criterion: the point x^(k) is a local optimum if x̄ does not have a better objective function value.
- Step 3 - Let x^(k+1) = x̄ be the new solution.
- Step 4 - Update k := k + 1 and go to Step 1.
Note: Since integer programming problems are non-convex, we cannot guarantee that a local optimum obtained in this way is also a global optimum.
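The neighbourhood search of Section 2.5.2 can be sketched in Python (an illustration, not the book's algorithm; here a neighbour differs by at most 1 in each coordinate, and the objective and feasible region in the example are made up):

```python
from itertools import product

def local_search(f, feasible, x0):
    """Neighbourhood search for integer programs (Section 2.5.2), minimization.
    Step 1: best neighbouring integer solution; Step 2: stop when no feasible
    neighbour improves the objective (a local optimum w.r.t. this neighbourhood)."""
    x = tuple(x0)
    while True:
        neighbours = [tuple(xi + di for xi, di in zip(x, step))
                      for step in product((-1, 0, 1), repeat=len(x))
                      if any(step)]
        candidates = [y for y in neighbours if feasible(y)]
        if not candidates:
            return x
        best = min(candidates, key=f)   # Step 1: best neighbour
        if f(best) >= f(x):             # Step 2: convergence criterion
            return x                    # local (not necessarily global) optimum
        x = best                        # Steps 3-4: move and repeat

# Made-up example: minimize (x - 3)^2 + (y - 5)^2 over integers 0 <= x, y <= 10
obj = lambda p: (p[0] - 3) ** 2 + (p[1] - 5) ** 2
box = lambda p: all(0 <= v <= 10 for v in p)
result = local_search(obj, box, (0, 0))
```

With a different objective the same loop can stop far from the global optimum, which is exactly the caveat in the note above.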
2.5.3 Primal and dual methods
- An alternative principle is to move between infeasible points in a systematic way and only in the last iteration find a feasible solution. We then know that this solution is a local (possibly global) optimum. This class of methods is called dual methods.
- Dual methods are also common for various classes of network problems in which the methods can be made very efficient.
- General description - dual search method:
- Step 0 - Relax some constraints and let X̄ define the set of solutions that satisfy the remaining constraints. Set k = 0.
- Step 1 - Solve the relaxed optimization problem min f(x) s.t. x ∈ X̄ -> the point x^(k).
- Step 2 - If x^(k) ∈ X, then all constraints in the original problem are satisfied -> Stop! The point x^(k) is an optimal solution to the problem.
- Step 3 - Identify a constraint that is not satisfied by x^(k) and let X̂ denote all solutions feasible with respect to this constraint. Add the constraint to the problem, i.e., update X̄ := X̄ ∩ X̂.
- Step 4 - Update k := k + 1 and go to Step 1.
Chapter 3 - Modelling
3.1 Index and summation
- Amount of raw material j that is blended into product i during time period t
- Introduce a general notation for the number of products, the number of raw materials, the number of time periods and the number of resources, respectively.
3.2 Production planning
- General formulation of this problem:
- Variables:
  x_it = number of units of model i produced during month t
  l_it = number of units of model i in inventory at the end of month t
- Parameters:
  c_it = cost to produce one unit of product i in time period t
  h_it = inventory cost for one unit of product i in period t
  a_ij = number of hours product i requires in department j
  b_jt = capacity in department j in period t
- Objective: min (total production cost + total inventory cost)
- Constraints: inventory balance between consecutive periods, department capacity, and non-negativity
3.3 Transport and distribution
This optimization problem can be expressed in a network consisting of nodes and arcs.
3.3.1 Transportation problem
- General formulation of this problem
- Variables:
  x_ij = flow from facility i to customer j, x_ij ≥ 0
- Parameters:
  c_ij = unit transportation cost from facility i to customer j
  s_i = supply at facility i
  d_j = demand at customer j
- Objective: min Σ_i Σ_j c_ij x_ij
- Constraints:
  Σ_j x_ij ≤ s_i for all i (supply)
  Σ_i x_ij = d_j for all j (demand)
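A classical way to construct an initial basic feasible solution to the transportation problem is the northwest-corner rule (not mentioned in the notes above, but standard in this setting). A minimal Python sketch, assuming total supply equals total demand; the supply and demand numbers are a made-up instance:

```python
def northwest_corner(supply, demand):
    """Initial basic feasible solution of the transportation problem
    (Section 3.3.1) by the northwest-corner rule: repeatedly ship as much
    as possible on the current arc, then move right or down."""
    s, d = list(supply), list(demand)
    flow = [[0] * len(d) for _ in s]
    i = j = 0
    while i < len(s) and j < len(d):
        q = min(s[i], d[j])   # maximum shippable quantity on arc (i, j)
        flow[i][j] = q
        s[i] -= q
        d[j] -= q
        if s[i] == 0:
            i += 1            # facility i exhausted: move down
        else:
            j += 1            # customer j satisfied: move right
    return flow

# Hypothetical instance: 3 facilities, 3 customers, balanced totals (75 = 75)
x = northwest_corner([30, 20, 25], [10, 25, 40])
```

The resulting flow matrix satisfies every supply and demand constraint and corresponds to a basic feasible solution that, e.g., the simplex method can start from.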
3.3.2 Transshipment problem
- General formulation of this problem
- Variables:
  x_ij = flow from facility i to distribution center j, x_ij ≥ 0
  y_jk = flow from distribution center j to customer k, y_jk ≥ 0
- Parameters:
  c_ij = unit transportation cost between facility i and distribution center j
  e_jk = unit transportation cost between distribution center j and customer k
- Objective: min total transportation cost over both types of arcs
- Constraints:
  Σ_j x_ij ≤ s_i for all i (supply)
  Σ_i x_ij = Σ_k y_jk for all j (distribution balance)
3.3.3 Facility location
- Variables:
  x_ij = flow from facility i to customer j, x_ij ≥ 0
  y_i = 1 if facility i is used, 0 otherwise
- Parameters:
  f_i = fixed cost for facility i
- Objective: min Σ_i Σ_j c_ij x_ij + Σ_i f_i y_i
- Constraints:
  Σ_j x_ij ≤ s_i y_i for all i (supply, only if the facility is open)
  Σ_i x_ij = d_j for all j (demand)
The model is an integer programming model.
3.4 Blending problem
3.4.1 Simple metal blending
3.4.2 Blending for fodder
3.4.3 Pooling problem
The model is a nonlinear model.
3.5 Allocation models
3.5.1 Investment strategy
The model is an LP model
3.5.2 Investment over several time periods
- Pay attention to the illustration of the possible alternatives
- The model is an LP model
3.5.3 Loading of tankers
The model is a mixed integer programming model because it includes both integer and continuous variables.
Lecture 2 JuMP and Gurobi
- JuMP is an open-source modelling language for mathematical optimization, and is a package in Julia
- Gurobi
- Optimization software package for solving linear and quadratic optimization problems with continuous and integer variables
- Originally based on the simplex method, implemented in C
- The primal and dual simplex methods (Lecture 3-5)
- The barrier method
- Techniques for avoiding degeneracy (Lecture 4)
- Generating cutting planes (Lecture 9)
- The branch & bound algorithm (Lecture 8)
- Heuristic methods (Lecture 10)
- Solvers that work with JuMP
- Gurobi - solves linear and quadratic optimization problems with continuous and integer variables
- CPLEX
- MOSEK
- Clp - free, solves linear optimization problems with continuous variables
- Cbc - free, solves linear optimization problems with continuous and integer variables
- BARON, Ipopt, Knitro, NLopt, Xpress, etc.
- The diet problem - mathematical model
- Sets: foods and nutrients
- Variables: amount purchased of each food
- Parameters: cost per unit of each food, nutrient content per unit of each food, and nutritional requirements
- The sensitivity analysis as described is possible only for linear programs with continuous variables (not for integer/binary variables); this is due to theoretical properties (Chapter 5)
Lecture 3 Convexity and linear optimization (Chapter 2.4, 4.1)
2.4 Basic convexity analysis
- A general optimization problem written as: min f(x) s.t. x ∈ X
- Definition 2.1
A feasible point x* is a global optimum if there is no other feasible point with a better objective function value, i.e., f(x) ≥ f(x*) for all x ∈ X.
- Definition 2.2
A feasible point x* is a local optimum if in a small neighbourhood N(x*) of x* there is no other feasible point with a better objective function value.
- Definition 2.3
The problem min f(x) s.t. x ∈ X is a convex problem if f is a convex function and X is a convex set. A maximization problem is also convex if the function f is concave and X is a convex set.
- In a convex problem, each local optimum is also a global optimum.
- All LP problems are convex, while nonlinear problems are often non-convex
- Integer programming problems are always non-convex.
- Definition 2.4
The function f is a convex function on the feasible region X if for all choices of points x^(1), x^(2) ∈ X and all λ ∈ [0, 1] we have that f(λx^(1) + (1 − λ)x^(2)) ≤ λf(x^(1)) + (1 − λ)f(x^(2)).
- Definition 2.5
A set X is a convex set if for any pair of points x^(1), x^(2) ∈ X the whole line segment λx^(1) + (1 − λ)x^(2), λ ∈ [0, 1], also belongs to X.
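Definition 2.4 can be spot-checked numerically. The following Python sketch (my own illustration; a random sampler can disprove convexity but never prove it) tests the defining inequality on sampled points:

```python
import random

def convex_on_samples(f, lo, hi, trials=2000, seed=1):
    """Spot-check Definition 2.4 on [lo, hi]: test
    f(l*x1 + (1-l)*x2) <= l*f(x1) + (1-l)*f(x2) for random x1, x2, l.
    Returning False disproves convexity; True only suggests it."""
    rng = random.Random(seed)
    for _ in range(trials):
        x1, x2 = rng.uniform(lo, hi), rng.uniform(lo, hi)
        lam = rng.random()
        mid = lam * x1 + (1 - lam) * x2
        if f(mid) > lam * f(x1) + (1 - lam) * f(x2) + 1e-9:
            return False   # found a violating pair: not convex
    return True

convex_ok = convex_on_samples(lambda x: x * x, -5, 5)   # x^2 is convex
nonconvex = convex_on_samples(lambda x: x ** 3, -5, 5)  # x^3 is not convex on [-5, 5]
```

Note the asymmetry: one violating triple proves non-convexity, while passing every sample is only evidence, not a proof, of convexity.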
Chapter 4 - Linear programming and the simplex method
4.1 Mathematical characteristics of LP problems
- LP problem in general form
- Theorem 4.1: The feasible region of an LP problem is a convex set.
- Definition 4.1: A convex combination of the points x^(1), ..., x^(k) is a point x = Σ_j λ_j x^(j) such that Σ_j λ_j = 1 and λ_j ≥ 0.
- Definition 4.2: The point x̄ is an extreme point in X if it is not possible to express x̄ as a strict (0 < λ < 1) convex combination of two points in X.
- Theorem 4.2 (Fundamental theorem of linear programming): Assume that the feasible region X of an LP problem is bounded and non-empty, so that there exists a bounded optimal solution to the LP problem. Then the minimum value of the objective function c^T x is obtained in (at least) one extreme point x* in X.
Lecture 4 Linear optimization: basic feasible solutions (Chapter 4.2-4.4, 4.8)
4.2 Standard form
- Standard form has to satisfy the following criteria:
- All constraints as equality constraints
- Convert all variables to non-negative variables
- In addition, constraints are usually formulated so that the right-hand-side (rhs) coefficients are non-negative
- A slack variable is added to the left-hand side of a ≤-constraint; for a ≥-constraint, a (non-negative) surplus variable is subtracted instead
- A slack variable can be interpreted as unused rhs-resource, while a surplus variable measures by how much the rhs is exceeded
- A free variable x can be replaced by two new non-negative variables according to the substitution x = x⁺ − x⁻, where x⁺ ≥ 0 and x⁻ ≥ 0
- A general LP problem written in standard form can be formulated as:
- A system of equations in standard form normally has significantly more variables than equations, i.e., it is under-determined.
- Under the assumption that the rows of A are linearly independent, the difference n − m between the number of variables (n) and the number of equations (m) expresses the degrees of freedom of the system of equations.
4.3 Basic solutions
- In order to have a mathematical description of the vertices, the basic solutions to the corresponding standard form of the LP problem are introduced.
- Definition 4.3:
A basic solution to the system of equations Ax = b is obtained if n − m variables are set to 0 and the remaining m variables get their unique values when we solve the remaining (m×m) system of equations.
The variables that are set to 0 are called non-basic variables (or zero variables) and the remaining m variables are called basic variables.
If the basic solution also satisfies all non-negativity conditions we have a basic feasible solution; otherwise we have an infeasible basic solution.
Every basic feasible solution corresponds to a unique vertex of the feasible region.
A basic solution in which one or more basic variables have the value zero is called a degenerate basic solution. If a vertex is represented by more than one basic solution, then all these basic solutions are degenerate.
- Summary of basic solutions, given the standard form of a linear optimization problem:
  - Basic solution <-> intersection of constraints
  - Basic feasible solution <-> a basic solution such that all variable values are ≥ 0
  - Basic infeasible solution <-> a basic solution such that at least one variable value is < 0
  - Linearly dependent constraints -> 'missing' basic solution(s)
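Definition 4.3 and the summary above can be illustrated by enumerating all basic solutions of a small standard-form system. A Python sketch for m = 2 (the instance, the 2×2 Cramer solver, and the tolerance are my own choices):

```python
from itertools import combinations

def solve2(M, rhs):
    """Solve a 2x2 system by Cramer's rule; None if the columns are dependent."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    if abs(det) < 1e-12:
        return None
    return [(rhs[0] * M[1][1] - rhs[1] * M[0][1]) / det,
            (M[0][0] * rhs[1] - M[1][0] * rhs[0]) / det]

def basic_solutions(A, b):
    """Enumerate all basic solutions of Ax = b (Definition 4.3, m = 2):
    fix n - m variables at 0, solve the remaining 2x2 system, and flag
    each solution as feasible (all components >= 0) or infeasible."""
    n = len(A[0])
    out = []
    for basis in combinations(range(n), 2):
        vals = solve2([[A[r][c] for c in basis] for r in range(2)], b)
        if vals is None:
            continue          # linearly dependent columns: 'missing' basic solution
        x = [0.0] * n
        for c, v in zip(basis, vals):
            x[c] = v
        out.append((basis, x, all(v >= -1e-9 for v in x)))
    return out

# Standard form of: x1 + x2 <= 4, x1 + 3x2 <= 6, x >= 0 (slacks s1, s2 added)
sols = basic_solutions([[1, 1, 1, 0], [1, 3, 0, 1]], [4, 6])
```

Here n = 4 and m = 2, so there are at most C(4, 2) = 6 basic solutions; four of them are feasible (the four vertices of the polygon) and two are infeasible.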
4.4 Change of basis
- The optimal solution to an LP problem is always found in a vertex (except when the solution is unbounded)
- Every vertex is represented by one or more basic feasible solutions
- The strategy of the simplex method is to search among basic feasible solutions (BFS) until the optimal BFS is found.
- Definition 4.4: Two vertices are adjacent if the set of active constraints in the two points differ in one element only
- Definition 4.5: Two basic solutions are adjacent if they have m − 1 basic variables in common.
- Change of basis: When we move between adjacent vertices (basic solutions) exactly one basic variable is replaced (which then becomes a non-basic variable) with one other basic variable (which before was non-basic).
- Entering basic variable: The new basic variable
- Leaving basic variable: The one that is replaced
- In a vertex represented by a unique (non-degenerate) basic solution, there are n − m possible feasible search directions (one per non-basic variable)
- If the basic solution is degenerate, it might be so that the change of basis does not lead to a change of vertex. We can interpret this change of basis as a move in the search direction with a zero step length.
- In simplex method, we strive to make a move (a change of basis) so that the objective function value is improved, i.e., we always try to find a feasible search direction that is an improving search direction. However, in the case of degeneracy, it might be that the objective function value is unchanged after a change of basis.
- The reduced cost at the point x when we move in direction d is calculated as c̄ = c^T d
- Need to know more about how c̄ is derived; see exercise 4.5 in problem solving session 2. Two steps:
  - Express the basic variables as functions of the non-basic variables
  - Obtain all possible search directions by reading off the coefficients of the entering variable in the expressions from step 1
- Optimality and feasibility condition for minimization in the simplex method
- Optimality condition (for minimization)
The basis is optimal if all reduced costs (marginal values) satisfy c̄_j ≥ 0 for the non-basic variables
- Feasibility condition
For all basic variables it holds that x_B = B⁻¹b ≥ 0
4.8 Algebraic description of the simplex method
- These algebraic expressions are for example useful when doing sensitivity analysis
- Rewrite the linear optimization problem equivalently as: min c_B^T x_B + c_N^T x_N s.t. B x_B + N x_N = b, x_B, x_N ≥ 0
From equation (1b): x_B = B⁻¹b − B⁻¹N x_N; then insert this into (1a) and (1c):
  - Each non-basic variable takes the value 0, i.e., x_N = 0
  - The basic variables take the values x_B = B⁻¹b
  - The value of the objective function is z = c_B^T B⁻¹b
  - The basic solution is feasible if B⁻¹b ≥ 0
Lecture 5 Linear optimization: the simplex method (Chapter 4.5–4.10, (7.1))
4.5 The simplex method - an example
- Problem expressed in standard form:
- For more detailed algorithm of simplex method, please check course problem solving notes
- Pivot operation
4.6 The simplex method - general algorithm description
- The simplex method can be summarized in the following way:
Step 0: Start from a basic feasible solution. Let k = 0.
Step 1: Determine the possible search directions and the reduced costs c̄_j for all non-basic variables.
Step 2: Check the convergence criterion: the point x^(k) is an optimal solution if c̄_j ≥ 0 for all non-basic variables (minimization).
Step 3: Determine the entering basic variable according to the criterion min_j c̄_j, which gives the entering basic variable x_j and the search direction d.
Step 4: Determine the step length by the minimum ratio test; the basic variable that first reaches zero becomes the leaving basic variable. The step length is calculated in the same way for both minimization and maximization problems.
Note: If all components of the search direction d are non-negative, the problem is unbounded.
Step 5: The new point is given by x^(k+1) = x^(k) + t d, and in the new basic solution the leaving variable is replaced with the entering variable. Update k := k + 1 and go to Step 1.
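Steps 0-5 can be turned into a compact tableau implementation. This is a teaching sketch in Python (the course uses Julia/JuMP and real solvers like Gurobi), restricted to min c^T x s.t. Ax ≤ b with b ≥ 0 so that the slack variables give the starting basis; it has no anti-cycling rule:

```python
def simplex(c, A, b):
    """Tableau simplex for min c^T x s.t. Ax <= b, x >= 0, b >= 0 (Section 4.6).
    Step 0: slack basis; Steps 1-2: reduced costs / optimality check;
    Step 4: minimum ratio test; Step 5: pivot (change of basis)."""
    m, n = len(A), len(c)
    T = [[float(v) for v in A[i]] + [1.0 if j == i else 0.0 for j in range(m)]
         + [float(b[i])] for i in range(m)]          # rows [A | I | b]
    cost = [float(v) for v in c] + [0.0] * m + [0.0] # reduced costs and -z
    basis = list(range(n, n + m))
    while True:
        j = min(range(n + m), key=lambda k: cost[k])  # most negative reduced cost
        if cost[j] >= -1e-9:
            break                                     # Step 2: optimal
        ratios = [(T[i][-1] / T[i][j], i) for i in range(m) if T[i][j] > 1e-9]
        if not ratios:
            raise ValueError("unbounded problem")     # no positive component
        _, r = min(ratios)                            # Step 4: leaving row
        piv = T[r][j]                                 # Step 5: pivot operation
        T[r] = [v / piv for v in T[r]]
        for i in range(m):
            if i != r:
                f = T[i][j]
                T[i] = [a - f * p for a, p in zip(T[i], T[r])]
        f = cost[j]
        cost = [a - f * p for a, p in zip(cost, T[r])]
        basis[r] = j
    x = [0.0] * n
    for i, bi in enumerate(basis):
        if bi < n:
            x[bi] = T[i][-1]
    return -cost[-1], x   # -cost[-1] holds the current objective value

# Example: min -x1 - 2x2 s.t. x1 + x2 <= 4, x1 + 3x2 <= 6 (optimum -5 at (3, 1))
val, x = simplex([-1, -2], [[1, 1], [1, 3]], [4, 6])
```

Choosing the most negative reduced cost in Step 3 is Dantzig's classical rule; production solvers use more elaborate pricing and anti-degeneracy techniques.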
4.7 Using simplex tableaux
4.8 Algebraic description of the simplex method
- See Lecture 4
4.9 Finding a basic feasible solution
- Two-phase method
- Phase 1 - If no basic feasible solution to the given set of constraints is known, introduce artificial variables a_i ≥ 0. The goal of Phase 1 is to find a basic solution where no artificial variables are in the basis, i.e., they should all have the value 0. We try to reach that goal by minimizing an artificial objective function defined as the sum of all artificial variables, w = Σ_i a_i. The interpretation of this objective function is that we try to minimize the total infeasibility of the solution. Two cases can occur:
  - If the optimal solution has the value w* = 0, we have reached the goal of finding a basic solution in which all artificial variables have the value 0. The resulting basic solution is optimal in the Phase 1 problem and feasible in the original problem. Start Phase 2.
  - If the optimal solution has a value w* > 0, the conclusion is that the original problem has no feasible solution. What to do then? Modelling errors? Or infeasibility in the data, e.g., demand > supply?
- Phase 2 - Solve the original problem, starting from the basic feasible solution obtained in Phase 1.
4.10 Convergence and degeneration
- Definition 4.6: A basic solution is degenerate if (at least) one basic variable has the value 0.
Lecture 6 - Duality in linear optimization models (Chapter 6 (7.3, 7.5))
6.1 Deriving and interpreting the dual problem
- There are two different ways of deriving the dual problem (here for a maximization problem):
  - Formulate an LP problem that determines how a weighted combination of the constraints should be formed in order to get the best possible (smallest) upper bound on the objective value z.
Define the following variables:
v1 = weight of constraint 1
v2 = weight of constraint 2
Instead of guessing, compute the optimal weights by minimizing the value of the estimate.
  - Formulate an LP problem that determines the prices which will be set on the market; this is called the market problem.
Define the following variables:
y1 = price per minute for sawing
y2 = price per minute for gluing
The market wishes to minimize the payment.
- Primal problem <=> Dual problem
- One can always interpret the dual variables as shadow prices or marginal prices as we have seen in connection with the discussion about sensitivity analysis.
6.2 Formulation of the dual problem
- A variable is in normal form if it is defined as non-negative (x_j ≥ 0 and y_i ≥ 0, respectively)
- A constraint is in normal form if it is a ≤-constraint in a maximization problem and a ≥-constraint in a minimization problem, respectively.
- If all variables and constraints in the primal problem are in normal form, then all constraints and variables in the dual problem will also be in normal form.
Rules for constructing the dual problem
6.3 Duality theory
- Primal (P) in normal form: max c^T x s.t. Ax ≤ b, x ≥ 0
- Dual (D) in normal form: min b^T y s.t. A^T y ≥ c, y ≥ 0
- Theorem 6.1 (weak duality)
Assume x̄ is a feasible solution to (P) and ȳ is a feasible solution to (D). We then know that c^T x̄ ≤ b^T ȳ.
- Theorem 6.2 - If x̄ and ȳ are feasible solutions in (P) and (D) respectively and c^T x̄ = b^T ȳ, we know that x̄ and ȳ are optimal solutions to the respective problems.
- Theorem 6.3 (strong duality)
If either (P) or (D) has a bounded optimal solution, the same holds for the other problem, and the two objective function values are equal in optimum, i.e., c^T x* = b^T y*.
- An alternative formulation of problem (P) uses equality constraints Ax = b; the corresponding dual problem (D) then has free (sign-unrestricted) dual variables.
- Theorem 6.4 (dual theorem)
Assume that x* is an optimal basic solution to (P) with basis B. Then y*^T = c_B^T B⁻¹ is an optimal solution to (D), and c^T x* = b^T y*.
- If a constraint is not active in optimum, the corresponding shadow price will have the value zero.
- The general relation between a primal constraint i and the corresponding dual variable y_i is given by the complementary slackness constraint y_i (b_i − Σ_j a_ij x_j) = 0: either the dual variable has value zero, or the slack has value zero (the constraint is active). It can of course also be the case that both are zero.
- For the relation between a primal variable x_j and the corresponding dual constraint, we can formulate a complementary slackness constraint according to x_j (Σ_i a_ij y_i − c_j) = 0, saying that either the primal variable has value zero or the dual constraint is active.
- Theorem 6.5 (complementary slackness theorem)
Assume x̄ and ȳ are feasible solutions in (P) and (D) respectively. Then x̄ and ȳ are optimal solutions to the respective problems if and only if the complementary slackness constraints are satisfied.
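Theorems 6.1, 6.3, and 6.5 are easy to verify numerically on a small primal/dual pair. A Python illustration (the instance and its optimal solutions are my own example, not from the book):

```python
# Primal (P): max c^T x  s.t.  Ax <= b, x >= 0
# Dual   (D): min b^T y  s.t.  A^T y >= c, y >= 0
c = [1, 2]
A = [[1, 1], [1, 3]]
b = [4, 6]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

x_feas = [2, 1]    # feasible in (P): 2 + 1 <= 4 and 2 + 3 <= 6
y_feas = [1, 1]    # feasible in (D): 1 + 1 >= 1 and 1 + 3 >= 2
weak_gap = dot(b, y_feas) - dot(c, x_feas)    # Theorem 6.1: must be >= 0

x_opt = [3, 1]        # optimal vertex of (P), objective value 5
y_opt = [0.5, 0.5]    # optimal solution of (D), objective value 5
strong_gap = dot(b, y_opt) - dot(c, x_opt)    # Theorem 6.3: must be 0
# Theorem 6.5: y_i * (b_i - (Ax)_i) = 0 for every primal constraint i
cs = [y_opt[i] * (b[i] - dot(A[i], x_opt)) for i in range(len(b))]
```

Any feasible dual solution thus certifies an upper bound on the primal optimum, and the zero gap together with complementary slackness certifies optimality of the pair.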
Relations between primal and dual optimal solutions
Lecture 7 - Linear optimization: post-optimal and sensitivity analysis (Chapter 5.1–5.5, (5.6))
5.1 Sensitivity analysis and an example
- Sensitivity analysis consists of two aspects:
- How does the optimum change when the right hand sides change?
- When the objective coefficients change?
- To what extent one can draw correct conclusions from the optimization if the input data to the model is uncertain.
- To investigate how sensitive the optimal solution is for changes in input data and other assumptions, i.e. to be able to analyze how robust the solution is.
- By "change" we mean both an increase and a decrease of the coefficient.
5.2 Relaxation and restriction
Definition 5.1: A relaxation is a modification of a problem instance that leads to a larger feasible set (increases the number of feasible solutions), or alternatively leaves the feasible set unchanged.
If a problem is relaxed, the objective function value can never be worse. In a max problem, the objective function value will increase while in a min problem, the value will decrease.
Definition 5.2: A restriction is a modification of a problem instance that leads to a smaller feasible set (decreases the number of feasible solutions) or alternatively that the feasible set remains unchanged.
- Examples of relaxation and restriction:
- The change in a right-hand-side coefficient b_i

Constraint type | Increase of b_i | Decrease of b_i |
---|---|---|
≤ | Relaxation | Restriction |
≥ | Restriction | Relaxation |

- The change in a constraint coefficient a_ij in the left-hand-side of a constraint

Constraint type | Increase of a_ij | Decrease of a_ij |
---|---|---|
≤ | Restriction | Relaxation |
≥ | Relaxation | Restriction |

- The change in an objective function coefficient c_j

Problem type | Increase of c_j | Decrease of c_j |
---|---|---|
Max problem | optimal value unchanged or larger | optimal value unchanged or lower |
Min problem | optimal value unchanged or larger | optimal value unchanged or lower |
- Add or remove a constraint or a variable
Adding a constraint: Restriction
Removing a constraint: Relaxation
Adding a variable: Relaxation
Removing a variable: Restriction
5.3 Shadow prices and the connection to duality
- Definition 5.3: The shadow price for a constraint is given by the change in objective function value when making a marginal increase of the right-hand-side.
Note: Other names for the shadow price are dual price, dual value, or marginal price.
- Sign restrictions of the dual variables

Primal problem | ≤-constraint | ≥-constraint | =-constraint |
---|---|---|---|
Min problem | y ≤ 0 | y ≥ 0 | free |
Max problem | y ≥ 0 | y ≤ 0 | free |
- Only constraints active in the optimum can affect the optimal objective function value if we change the right-hand-side. The values of dual variables corresponding to non-active constraints are thus zero in the optimum.
- The shadow price y_i is actually a derivative, y_i = ∂z*/∂b_i
- The shadow price is only constant in a certain neighbourhood of the original value of the right-hand-side coefficient, and it is very important to calculate the interval in which the shadow price is valid (unchanged).
- Figure 1: The effect of changing the right-hand-side, showing the relation between z* and b_i
- Figure 2: The effect of a change in the objective function, showing the relation between z* and c_j
5.4 Interpretation of output from a software package
- Some examples of analysis
5.5 Algebraic analysis of changes
- Changing a right-hand-side coefficient b_i: only the feasibility of the problem is affected, not the reduced costs. The basic variables get new values x_B = B⁻¹b, and the current basis remains optimal as long as B⁻¹b ≥ 0.
- Changing an objective function coefficient c_j: the reduced costs are affected, but not the feasible region. Thus, the current basic solution remains feasible.
- New variables and new constraints can be analyzed with the same algebraic expressions.
- More about sensitivity analysis in linear optimization:
  - The current optimal solution may become infeasible due to
    - a change in the right-hand-side, or
    - the addition of constraints (which gives a new basis matrix)
  - The current optimal solution may become non-optimal due to
    - changes in the objective coefficients, or
    - the addition of variables (new opportunities may be favourable)
  - Several coefficients may change simultaneously -> may affect both feasibility and optimality
  - Constraint coefficients (i.e., elements of A) may change, which may change the basis matrix B -> may affect both feasibility and optimality
Lecture 8 - Integer linear optimization: modelling, applications, complexity (Chapter 13, 2.6)
13.1 What is an integer programming problem
- An integer programming problem is a problem where one or several variables are restricted to integer values (discrete variables)
- LP relaxation of the integer programming problem
- Many integer programming problems have an underlying network structure and can also be formulated as network problems.
- Integer linear programming (ILP) uses integer variables (x_j ∈ Z₊, often binary x_j ∈ {0, 1}). An ILP (or MILP) possesses linear constraints and integer requirements on (some of) the variables.
13.2 Knapsack problems
- Objective function: maximize the total benefit of the chosen objects
- Budget constraint: the total weight (or cost) of the chosen objects must not exceed the budget
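The 0/1 knapsack problem above can be solved exactly by dynamic programming when the budget is integer. A Python sketch (my own illustration; DP is one of several solution approaches, alongside the branch & bound method of Lecture 9):

```python
def knapsack(values, weights, capacity):
    """0/1 knapsack (Section 13.2) by dynamic programming:
    dp[i][w] = best total value using the first i objects and budget w.
    Returns (best value, indices of chosen objects)."""
    n = len(values)
    dp = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for w in range(capacity + 1):
            dp[i][w] = dp[i - 1][w]                       # skip object i-1
            if weights[i - 1] <= w:                        # or take it if it fits
                dp[i][w] = max(dp[i][w],
                               dp[i - 1][w - weights[i - 1]] + values[i - 1])
    chosen, w = [], capacity                               # backtrack the choices
    for i in range(n, 0, -1):
        if dp[i][w] != dp[i - 1][w]:
            chosen.append(i - 1)
            w -= weights[i - 1]
    return dp[n][capacity], sorted(chosen)

# Made-up instance: three objects, budget 50
best, items = knapsack([60, 100, 120], [10, 20, 30], 50)
```

The running time is O(n · capacity), i.e., pseudo-polynomial: it depends on the numeric value of the budget, which is consistent with the knapsack problem being NP-hard (Section 2.6).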
13.3 Facility location
13.4 Network design
13.5 Assignment problem
- Assign each task to one resource, and each resource to one task
13.6 Generalized assignment problem
13.7 Matching problem
13.8 Set partitioning, set covering and set packing problems
- Mathematical formulation
13.9 Sequencing problem
13.10 Traveling salesman problem
13.11 Route planning
2.6 Basic complexity theory
- Algorithm complexity: Estimating how the computational time increases as the problem size increases.
- Problem complexity: Classifying how easy or difficult various problem classes are to solve.
2.6.1 Algorithm complexity
- A problem means an optimization problem here.
- The number of computations in an algorithm is proportional to the computation time.
- For an LP problem, the size can be expressed by the number of variables and the number of equations in the problem together with a parameter that states how many bits are required to represent all numerical input data.
- The ordo function O is used to express an upper bound on the number of steps. For two functions f(n) and g(n), we say that f(n) = O(g(n)) if there exists a constant c such that for all large n we have f(n) ≤ c·g(n). If the function f(n) is a polynomial function of the problem size n, we say that the algorithm has polynomial complexity; this type of problem is regarded as "easy" to solve. If the function f(n) is an exponential function of the problem size n (e.g., 2^n), we say that the algorithm has exponential complexity.
2.6.2 Problem complexity
- When we analyze and classify optimization problems with respect to their problem complexity, we study a transformation of the problem called the decision problem.
- Optimization problems whose decision problem can be solved with a polynomial algorithm are called "easy" problems, and this class of problems is denoted P.
- A larger class of problems, which includes the class P, is denoted NP, from "nondeterministic polynomial".
- The most difficult decision problems in the class NP are called NP-complete (the corresponding optimization problems are called NP-hard).
Lecture 9 - Integer linear optimization: relaxation, bounds, branch & bound algorithm (Chapter 14.1-14.3, 15.1-15.3)
14.1 Overview of methods
- For IP problems there is no counterpart to the property that the simplex method exploits in LP (an optimal solution in an extreme point).
- An IP problem is non-convex, and it is not possible to compute a derivative, hence no gradient. It is also difficult to conclude whether a solution that is locally optimal is also globally optimal.
- The most important strategies for solving IP problems are:
- Enumeration methods (implicit enumeration, Branch and Bound methods)
- Relaxation and decomposition methods (LP relaxation and Lagrangian relaxation)
- Cutting plane methods (Repeatedly solving the LP relaxation of the IP problem)
- Heuristics (a class of methods that produce good feasible solutions without guaranteeing optimality)
- The difficulty of solving an IP problem depends on two main factors: the number of integer variables and the problem structure.
- For an LP problem, the number of constraints is much more important for the solution time.
14.2 Optimality and relaxations
- For LP problems and nonlinear problems it is possible to formulate optimality conditions (the KKT conditions). These conditions, together with convexity analysis, can be used to identify and prove that a solution is a local or global optimal solution. However, for IP problems there are no corresponding optimality conditions, so other techniques are needed.
- In a minimization problem:
  - The optimistic bound (also called the dual bound) provides a so-called lower bound (LBD), $\underline{z} \le z^*$, obtained from a relaxation of the ILP. Relaxation examples are:
    - Remove the integer constraints and solve the LP relaxation.
    - Make the feasible region larger by removing one or several of the constraints (constraint relaxation or combinatorial relaxation).
    - Make a Lagrangian relaxation, which removes a set of constraints and at the same time changes the objective function by introducing corresponding terms based on Lagrangian multipliers.
  - The pessimistic bound (also called the primal bound) provides an upper bound (UBD), $\bar{z} \ge z^*$, obtained from any feasible solution to the ILP.
- The following properties are valid:
  - A relaxation always provides a better (or equally good) objective function value.
  - If a relaxation of the IP problem has no feasible solution, then the IP problem has no feasible solution.
  - If the optimal solution $\bar{x}$ of the relaxation is a feasible solution to the IP problem, then $\bar{x}$ is also optimal in the IP problem. (This may not be true with a Lagrangian relaxation.)
  - Each feasible solution to the IP problem that is found, e.g. by modifying (rounding) $\bar{x}$, provides a pessimistic bound of $z^*$.
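The bounds above can be illustrated on a small 0-1 knapsack (a maximization problem, so the roles flip: the LP relaxation gives an optimistic upper bound and any feasible integer solution gives a pessimistic lower bound). The data below are made up; the LP relaxation of a 0-1 knapsack can be solved greedily by value/weight ratio:

```python
from itertools import combinations

values  = [60, 100, 120]        # illustrative data
weights = [10, 20, 30]
capacity = 50

# Optimistic bound: LP relaxation, solved greedily by value/weight ratio.
def lp_relaxation_bound(values, weights, capacity):
    items = sorted(zip(values, weights), key=lambda vw: vw[0] / vw[1], reverse=True)
    bound, room = 0.0, capacity
    for v, w in items:
        if w <= room:
            bound += v
            room -= w
        elif room > 0:
            bound += v * room / w   # fractional item fills the remaining capacity
            break
        else:
            break
    return bound

# Pessimistic bound: the best feasible 0-1 solution, here found by enumeration.
def best_integer_value(values, weights, capacity):
    n, best = len(values), 0
    for r in range(n + 1):
        for subset in combinations(range(n), r):
            if sum(weights[i] for i in subset) <= capacity:
                best = max(best, sum(values[i] for i in subset))
    return best

ubd = lp_relaxation_bound(values, weights, capacity)   # optimistic bound
lbd = best_integer_value(values, weights, capacity)    # pessimistic bound
assert lbd <= ubd
print(lbd, ubd)   # 220 240.0
```

The gap between the two bounds (here 220 vs 240) is exactly what a strong formulation or a branch and bound search tries to close.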
14.3 To choose the right model
- Many solution methods for integer programming are based on solving a sequence of LP problems whose optimal objective function values gradually reach the optimal objective function value for the IP problem. If we can use a strong formulation, we typically can make such methods more effective.
- Definition 14.1: A point $x$ is a convex combination of a set of points $x^1, \dots, x^k$ if $x = \sum_{i=1}^{k} \lambda_i x^i$, where $\sum_{i=1}^{k} \lambda_i = 1$ and $\lambda_i \ge 0$, $i = 1, \dots, k$.
- Definition 14.2: The convex hull of a set of points $x^1, \dots, x^k$ consists of all possible convex combinations of the points.
  Note: the convex hull is the smallest convex set containing the feasible integer points.
- The smaller the difference is between the optimal objective function value of the IP problem and that of the corresponding LP relaxation, the better (or stronger) model we have.
- The ideal linear program has integer extreme points.
- In general, it is not computationally practical to find an ideal formulation (the convex hull) of an IP problem. However, the difference between the feasible region of the LP relaxation (in particular near the optimal solution) and the convex hull can be reduced by two methods:
  - initially choosing a strong formulation with respect to the LP relaxation of the problem
  - adding problem-specific valid inequalities
15.1 Introduction
- In Branch and Bound (B&B), an optimistic bound and a pessimistic bound of the optimal objective function value are generated.
- Search tree (or B&B tree) consists of nodes representing subproblems and edges representing the new constraints added to define new subproblems.
15.2 General strategy for branch and bound
- Relax
  The optimal solution of a relaxed version of the problem provides an optimistic bound of the optimal objective function value. Two alternatives to relax the original problem:
  - LP relaxation
  - Lagrangian relaxation
- Branch
  If the solution to the relaxation is not feasible in the ILP, exclude it through a partitioning of the feasible set into new subproblems.
- Prune (discard a node whose optimistic bound cannot improve on the best known feasible solution)
- Search
- Depth first
- Breadth first
- Best first
- A termination criterion that expresses the relative deviation between the best pessimistic bound (UBD) and the best optimistic bound (LBD) can be defined as
  $\frac{UBD - LBD}{|UBD|} \le \varepsilon$
  where a typical parameter value of $\varepsilon$ is 0.01.
15.3 The Land-Doig-Dakin algorithm
- The most common relaxation used in branch and bound is the LP relaxation; this LP relaxation problem is solved first.
- The Land-Doig-Dakin algorithm is a branch and bound algorithm based on LP relaxation.
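A minimal sketch of an LP-relaxation-based branch and bound in the spirit of Land-Doig-Dakin, applied to a 0-1 knapsack with hypothetical data (for this maximization problem the LP relaxation gives an upper bound, and nodes are pruned against the incumbent):

```python
values  = [60, 100, 120]                       # illustrative data
weights = [10, 20, 30]
capacity = 50

def lp_bound(fixed):
    """LP relaxation value with some variables fixed; returns (bound, frac_item)."""
    used = sum(weights[i] for i, x in fixed.items() if x == 1)
    if used > capacity:
        return None, None                      # infeasible subproblem
    bound = float(sum(values[i] for i, x in fixed.items() if x == 1))
    room = capacity - used
    free = sorted((i for i in range(len(values)) if i not in fixed),
                  key=lambda i: values[i] / weights[i], reverse=True)
    for i in free:
        if weights[i] <= room:
            bound += values[i]
            room -= weights[i]
        elif room > 0:
            bound += values[i] * room / weights[i]
            return bound, i                    # fractional variable: branch on it
        else:
            break
    return bound, None                         # relaxation solution is integral

best = 0

def solve(fixed):
    global best
    bound, frac = lp_bound(fixed)
    if bound is None or bound <= best:         # prune: infeasible or dominated
        return
    if frac is None:                           # integral solution: new incumbent
        best = int(bound)
        return
    for value in (1, 0):                       # branch on the fractional variable
        solve({**fixed, frac: value})

solve({})
print(best)   # 220
```

Branching on a 0-1 variable by fixing it to 1 and to 0 corresponds to the floor/ceil branching of Land-Doig-Dakin in the binary case.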
Lecture 10 - Integer linear optimization theory and algorithms: good/ideal formulation, cutting planes, Lagrangean duality and relaxation (Chapter 14.4-14.5, 17.1-17.2, (17.3-17.4), 13.10-13.11, 15.4, (15.5))
14.4 Valid inequalities
- Definition 14.3: A valid inequality is a linear inequality $\sum_j a_j x_j \le b$ that is satisfied by all feasible (integer) points. If we add valid inequalities to the LP relaxation of an IP problem, we will get a better approximation of the convex hull.
- Definition 14.4: If $\sum_j a_j x_j \le b$ is a valid inequality for the convex hull $P$, then the set $F = \{ x \in P : \sum_j a_j x_j = b \}$ is the face of $P$ created by the valid inequality.
- Definition 14.5: A face $F$ of a convex hull $P$ with dimension $\dim(P) = n$ is called a facet if $\dim(F) = n - 1$.
14.5 Cutting plane methods
- A general cutting plane algorithm (p. 378):
  1. Solve the linear programming (LP) relaxation.
  2. If the LP solution is integer: stop; an optimal solution to the ILP is found.
  3. Add one or several valid inequalities that cut off the fractional solution but none of the integer solutions.
  4. Resolve the new problem and go to step 2.
- The cuts (or constraints) added in a cutting plane method must be valid inequalities. Otherwise, one or several feasible integer points may be cut away.
- A valid inequality must cut away parts of the feasible region to the LP relaxation (hence the name cutting plane)
- Often only one valid inequality is added in each iteration (after solving each LP relaxation) in cutting plane methods.
- Methods for generating valid inequalities
- Chvatal-Gomory cuts (combine constraints, make beneficial roundings of LHS and RHS)
- Gomory's method: generate cut from an optimal simplex basis (Ch. 14.5.1)
- Pure cutting plane algorithms are usually less efficient than branch-and-bound.
- In commercial solvers (e.g. CPLEX), cuts are used to help (presolve) the branch-and-bound algorithm.
- For problems with specific structures (e.g. TSP and set covering) problem specific classes of cuts are used.
14.5.1 Gomory's method
- Gomory's method was one of the first cutting plane methods developed.
- One requirement needed for generating cuts is that all coefficients in the problem are integers.
- Consider the simplex row where $x_B$ is a basic variable, $N$ is the set of non-basic variables, and the right-hand side $\bar{b}$ is non-integer:
  $x_B + \sum_{j \in N} \bar{a}_j x_j = \bar{b}$
  which can be rewritten, separating integer and fractional parts, as:
  $x_B + \sum_{j \in N} \lfloor \bar{a}_j \rfloor x_j + \sum_{j \in N} (\bar{a}_j - \lfloor \bar{a}_j \rfloor) x_j = \lfloor \bar{b} \rfloor + (\bar{b} - \lfloor \bar{b} \rfloor)$
- The following constraint is a Gomory cut; it is called the integer cut since it is expressed from the integer parts of the equation above:
  $x_B + \sum_{j \in N} \lfloor \bar{a}_j \rfloor x_j \le \lfloor \bar{b} \rfloor$
- The following constraint is the fractional cut:
  $\sum_{j \in N} (\bar{a}_j - \lfloor \bar{a}_j \rfloor) x_j \ge \bar{b} - \lfloor \bar{b} \rfloor$
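A small helper illustrating how the fractional cut is read off a simplex row (the row data here are made up):

```python
import math

def gomory_fractional_cut(row_coeffs, rhs):
    """Given a simplex row  x_B + sum_j a_j x_j = b  with fractional b,
    return the coefficients and right-hand side of the fractional cut
        sum_j frac(a_j) x_j >= frac(b)."""
    frac = lambda a: a - math.floor(a)        # fractional part, also for a < 0
    return {j: frac(a) for j, a in row_coeffs.items()}, frac(rhs)

# Illustrative row: x_1 + 0.25*x_3 - 1.5*x_4 = 2.75
coeffs, rhs = gomory_fractional_cut({3: 0.25, 4: -1.5}, 2.75)
print(coeffs, rhs)   # {3: 0.25, 4: 0.5} 0.75
```

Note that the fractional part of a negative coefficient such as $-1.5$ is $0.5$, since $\lfloor -1.5 \rfloor = -2$.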
14.5.2 Problem specific cuts: cover inequalities
- Definition 14.6: For the knapsack set $X = \{ x \in \{0,1\}^n : \sum_{j \in N} a_j x_j \le b \}$, the set $C \subseteq N$ is a cover if $\sum_{j \in C} a_j > b$. The set $C$ is also a minimal cover if for each selection of $k \in C$ the set $C \setminus \{k\}$ is not a cover, i.e. $\sum_{j \in C \setminus \{k\}} a_j \le b$.
- Theorem 14.1: If $C$ is a minimal cover, then the constraint $\sum_{j \in C} x_j \le |C| - 1$ is a valid inequality for $X$.
- Theorem 14.2: If $C$ is a cover of $X$, then the augmented cover constraint $\sum_{j \in E(C)} x_j \le |C| - 1$ is a valid inequality, where $E(C) = C \cup \{ j \in N : a_j \ge a_i \text{ for all } i \in C \}$.
- Theorem 14.3: If $\sum_{j \in C} (1 - \bar{x}_j) \ge 1$ for every cover $C$, then $\bar{x}$ satisfies all cover inequalities. If $\sum_{j \in C} (1 - \bar{x}_j) < 1$ for some cover $C$, then the inequality $\sum_{j \in C} x_j \le |C| - 1$ cuts away the fractional solution $\bar{x}$ with the proportion $1 - \sum_{j \in C} (1 - \bar{x}_j)$. This gives a measure of the strength of an inequality.
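A quick check of Definition 14.6 on a made-up knapsack constraint:

```python
def is_cover(C, a, b):
    """C is a cover of {x in {0,1}^n : sum_j a_j x_j <= b} if its weights exceed b."""
    return sum(a[j] for j in C) > b

def is_minimal_cover(C, a, b):
    # Minimal: removing any single element must destroy the cover property.
    return is_cover(C, a, b) and all(not is_cover(C - {k}, a, b) for k in C)

# Knapsack constraint 3*x1 + 5*x2 + 4*x3 + 2*x4 <= 8 (illustrative data)
a = {1: 3, 2: 5, 3: 4, 4: 2}
C1 = {2, 3}                         # weight 9 > 8: a cover, and minimal
C2 = {1, 2, 3}                      # weight 12 > 8: a cover, but not minimal
print(is_minimal_cover(C1, a, 8), is_minimal_cover(C2, a, 8))   # True False
```

For the minimal cover $C_1 = \{2, 3\}$ the cover inequality $x_2 + x_3 \le 1$ is then valid for the knapsack set.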
17.1 Lagrangian duality
- General formulation of an optimization problem (P):
  $\min f(x)$ s.t. $g_i(x) \le b_i$, $i = 1, \dots, m$, $x \in X$
  The functions $f(x)$ and $g_i(x)$, $i = 1, \dots, m$, are general nonlinear functions and may also be non-convex functions.
- The Lagrangian function for the problem is
  $L(x, v) = f(x) + \sum_{i=1}^{m} v_i (g_i(x) - b_i)$
  where $v_i \ge 0$ are Lagrangian multipliers (or dual variables).
- (Lagrange) dual function: $h(v) = \min_{x \in X} L(x, v)$
- (Lagrange) dual problem: $\max_{v \ge 0} h(v)$
- Theorem 17.1 (Weak duality)
  Suppose that $\bar{x}$ is an optimal solution to (P). For each $v \ge 0$, we have $h(v) \le f(\bar{x})$; the difference $f(\bar{x}) - h(v)$ is the duality gap.
- Theorem 17.2: The dual function $h(v)$ is concave.
17.2 Lagrangian relaxation
- If the original problem is convex, we can guarantee finding a pair $\bar{x}$ and $\bar{v}$ where $h(\bar{v}) = f(\bar{x})$ (strong duality; no duality gap).
- With a non-convex problem, we often have a duality gap $f(\bar{x}) - h(\bar{v}) > 0$.
- In a minimization problem, Lagrangian relaxation provides an optimistic estimate (lower bound) of $z^*$. The Lagrangian relaxation bound is never worse than the linear programming relaxation bound.
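Weak duality and the Lagrangian bound can be verified by brute force on a tiny ILP (hypothetical data; the crude grid search over multipliers stands in for a proper subgradient method):

```python
from itertools import product

# Illustrative ILP (written as a minimization):
#   min f(x) = -5*x1 - 4*x2   s.t.  2*x1 + 3*x2 <= 4,  x in {0,1}^2
X = list(product((0, 1), repeat=2))
f = lambda x: -5 * x[0] - 4 * x[1]
g = lambda x: 2 * x[0] + 3 * x[1]

# Optimal value of the ILP by enumeration.
z_star = min(f(x) for x in X if g(x) <= 4)

# Lagrangian dual function: relax the constraint into the objective.
def h(v):
    return min(f(x) + v * (g(x) - 4) for x in X)

# Maximize the dual function over a grid of non-negative multipliers.
dual_bound = max(h(v / 100) for v in range(0, 500))

assert dual_bound <= z_star + 1e-9     # weak duality: h(v) <= z* for all v >= 0
print(round(dual_bound, 3), z_star)
```

Here the dual bound stays strictly below $z^* = -5$: a duality gap, as expected for a non-convex (integer) problem.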
17.4 Applications
- There are two ways to generate an optimistic bound of the optimal objective function value:
  - Relax the integer requirements and solve the LP relaxation.
  - Relax one or several of the linear constraints by using Lagrangian relaxation.
- The general experience of solving integer programming problems is that the optimistic bound obtained from Lagrangian relaxation is stronger than the bound from the LP relaxation:
  $z_{LP} \le \max_{v \ge 0} h(v) \le z^*$
  where $z_{LP}$ is the optimal objective function value of the LP relaxation.
- If the set $X$ has the integrality property (i.e., the LP relaxation of $X$ has integral extreme points), then $z_{LP} = \max_{v \ge 0} h(v)$.
13.10 Traveling salesman problem (TSP)
- Many applications can be formulated as a traveling salesman problem
- The problem is easy to model but difficult to solve
- Two types of problem:
- Symmetric version of the problem: only one arc is needed.
- Asymmetric version of the problem: the arc cost between two nodes depends on the direction
13.11 Route planning
- Vehicle Routing Problem (VRP)
15.4 Branch and bound methods for structured problems
15.4.1 Land-Doig-Dakin for the knapsack problem
15.4.2 Branch and bound for the traveling salesman problem
15.4.3 Branch and bound for the job scheduling problem
Lecture 11 - Integer linear optimization: (non-) convexity, local/global optimality, heuristics and approximation algorithms (Chapter 16, 8.3)
- Non-convex optimization problems do not fulfil strong duality, so (efficient) algorithms cannot generally be constructed to fulfil optimality conditions.
16.1 Introduction
- Each heuristic is adapted to fit the specific problem structure.
- Heuristics can also be used as part of advanced methods to find feasible solutions. Examples are branch and bound, Lagrangian relaxation.
16.2 Combinatorial formulations
16.3 Constructive heuristics
- Greedy algorithms
16.3.1 Heuristics for the traveling salesman problem
- Nearest neighbor algorithm
- Nearest addition algorithm
- Nearest merger algorithm
16.3.2 Heuristics for matching problems
- Greedy matching algorithm
16.3.3 Heuristics for the set covering problem
- Greedy covering algorithm
16.3.4 Heuristics for the vehicle routing problem
- Clarke & Wright algorithm
- Sweep algorithm
16.4 Local search
- Local search is based on defining a neighbourhood $N(x)$, which consists of all solutions that in some sense are "close" to $x$.
- Each solution $y \in N(x)$ is called a neighbour of $x$.
16.5 Metaheuristics
16.5.1 Tabu search
16.5.2 Simulated annealing
16.5.3 Metaheuristics and infeasible solutions
16.6 Approximation algorithms
16.6.1 Spanning tree heuristic for the traveling salesman problem
16.6.2 Christofides' heuristic for the traveling salesman problem
8.3 Minimum spanning tree
- Select a subset of the arcs in the graph so that they constitute a tree spanning all nodes and where the sum of the arc costs in the tree is minimal.
- The optimality conditions for the problem can be formulated as: a tree $T$ is a minimum spanning tree if
  - for every arc $(i,j) \notin T$ we have that $c_{ij} \ge c_{kl}$ for all arcs $(k,l)$ included in the unique path between $i$ and $j$ in $T$, and
  - for every arc $(i,j) \in T$ we have that $c_{ij} \le c_{kl}$ for all arcs $(k,l)$ included in the cut obtained if arc $(i,j)$ is removed from $T$.
- The cardinality of the selected subset of arcs should be $n - 1$, where $n$ is the number of nodes in the graph.
- Two algorithms for determining a minimum spanning tree:
- Kruskal's algorithm is based on the first group of conditions
- Prim's algorithm is based on the second group of conditions
- Kruskal's and Prim's algorithm both belong to a type of algorithm known as greedy algorithms.
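A minimal sketch of Kruskal's algorithm (one of the greedy algorithms mentioned above), using a union-find structure to detect cycles; the edge data are made up:

```python
def kruskal(n, edges):
    """n nodes (0..n-1), edges as (cost, i, j); returns (total_cost, tree_edges)."""
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i
    total, tree = 0, []
    for cost, i, j in sorted(edges):        # greedy: cheapest edges first
        ri, rj = find(i), find(j)
        if ri != rj:                        # edge joins two components: no cycle
            parent[ri] = rj
            total += cost
            tree.append((i, j))
    return total, tree

edges = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 1, 3), (5, 2, 3)]
total, tree = kruskal(4, edges)
print(total, tree)   # the tree has n - 1 = 3 edges, as the conditions require
```

Each accepted edge is the cheapest one crossing the cut between the two components it joins, which is exactly the second group of optimality conditions above.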
Lecture 12 - Network flows: linear optimization formulations (Chapter 8.1-8.2, 8.4, (8.5), 18.1-18.5, (18.6-18.7), 13.5)
8.1 Network optimization problem preview
- A network consists of nodes and arcs.
- Network flow problems can be divided into two classes:
- Minimum cost network flow problems (LP problems, considered easy to solve; the focus of this lecture)
- Combinatorial problems
8.2 Basic concepts
- A graph $G = (N, A)$ is defined by a set of nodes $N$, a set of arcs $A$ and a relation (mapping) between the nodes and arcs.
- A directed arc only allows flow in the specified direction. An undirected arc (also called a link) allows flow in both directions and can be interpreted as two (opposite) directed arcs.
- Undirected graph: only the node numbers are needed to specify an arc.
- Directed graph: with $n$ nodes and $m$ arcs, we can define a so-called node-arc incidence matrix of dimension $n \times m$. Each row in the matrix corresponds to one node, and each column corresponds to one arc and includes exactly two non-zero elements: one $-1$ and one $+1$.
- Each node has an in-degree (the number of arcs into the node) and an out-degree (the number of arcs out of the node).
- A graph is bipartite if we can partition the node set into two subsets $N_1$ and $N_2$ such that for each arc $(i,j)$ either $i \in N_1$ and $j \in N_2$, or $i \in N_2$ and $j \in N_1$.
- A cut is a partition of the node set into two subsets, $S$ and $\bar{S} = N \setminus S$.
- A path (sometimes called a route) in a directed graph is a connected sequence of arcs from one node to another, where the starting node of each arc is the termination node of the preceding arc in the sequence.
- If we disregard the direction of the arcs, we talk about a chain.
- A flow problem is defined by specifying for each node $i$ the total net flow to/from the node. The net flow (also called node strength) $b_i$ is defined as the flow generated at the node minus the flow consumed at it. If $b_i$ is positive, we say that node $i$ is a source (or supply node), and if $b_i$ is negative we say that node $i$ is a sink (or demand node). A node where $b_i = 0$ is called a transhipment node.
8.4 Shortest path problems
8.4.1 Problem description and Bellman's equations
- When solving a shortest path problem we assume that our network satisfies the following conditions:
- All arcs are directed
- Node $n$ can be reached from node $1$
- There are no cycles with negative costs
- Bellman's equations: $y_1 = 0$ and $y_j = \min_{i : (i,j) \in A} \{ y_i + c_{ij} \}$, $j = 2, \dots, n$, where $y_j$ denotes the length of the shortest path from node $1$ to node $j$.
8.4.2 An algorithm for the shortest-path problem
- Theorem 8.1: Dijkstra's algorithm will give an optimal solution to the problem of finding a shortest path in a network with non-negative arc costs only.
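A minimal Dijkstra sketch (valid only under the non-negative-cost assumption of Theorem 8.1); the graph data are hypothetical:

```python
import heapq

def dijkstra(graph, source):
    """graph: {node: [(neighbour, cost), ...]}; returns shortest distances."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, i = heapq.heappop(heap)
        if d > dist.get(i, float("inf")):
            continue                      # stale heap entry: already improved
        for j, c in graph.get(i, []):
            if d + c < dist.get(j, float("inf")):
                dist[j] = d + c           # relax arc (i, j)
                heapq.heappush(heap, (d + c, j))
    return dist

graph = {1: [(2, 4), (3, 1)], 3: [(2, 2), (4, 5)], 2: [(4, 1)], 4: []}
print(dijkstra(graph, 1))   # {1: 0, 2: 3, 3: 1, 4: 4}
```

The relaxation step `dist[j] = d + c` is exactly Bellman's equation applied one arc at a time, with nodes processed in order of increasing distance.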
8.4.3 Shortest path - all to all
- Algorithm - Floyd-Warshall
8.4.4 Other types of shortest path problems
- Longest path
- Multiplicative objective function
- Path with maximal capacity
18.1 Introduction
- Dynamic programming is a solution strategy which can be used to solve many structured optimization problems.
- An optimal decision is often called a control and the variables are called control variables. Given the current state in one stage, the decision will give a new state in the next stage.
18.2 An initial example
- The shortest path can be computed sequentially (recursively) with respect to defined stages as follows.
(backward recursion)
18.3 Network interpretation and Bellman's equations
- The cost on which we base the decision has two components:
- The first component is the arc cost to go from a node in the current stage to a new node in the next stage
- The second component is the remaining (best) cost to continue to the end node.
- The last stage computed provides the overall best objective function value.
- Principle of optimality: Given the current state, the optimal decision for each of the remaining stages must not depend on previously reached states or previously chosen decisions.
18.4 General formulation
- The total cost is given by the recursion function denoted $f_s(x_s)$. The recursion relation corresponds to Bellman's equations in a standard shortest path problem: the first term is the arc cost and the second is the node price.
- Backward recursion and forward recursion are equivalent; both provide the optimal solution. The difference is the side information:
  - Backward recursion offers the optimal way to continue from each possible state.
  - Forward recursion offers the optimal way to get to each state.
18.5 Formulation of the shortest path problem
- Formulation (backward recursion) for Example 18.1
  - Stages: the legs of the trip, $s = 1, \dots, S$
  - Control variables: $u_s$ = city which we travel to in stage $s$
  - States: $x_s$ = city we start from in stage $s$
  - Transition function: $x_{s+1} = u_s$
  - Recursion function: $f_s(x_s)$ = shortest path from city $x_s$ to LA
  - Recursion relation: $f_s(x_s) = \min_{u_s} \{ c(x_s, u_s) + f_{s+1}(u_s) \}$, where $c(x_s, u_s)$ gives the distance between city $x_s$ and city $u_s$
  - Search order: backward, from the last stage to the first
  - Initial conditions: $x_1$ = NY, with the recursion started from the value 0 at LA
  - Restrictions: $u_s$ must be a city with a direct connection from $x_s$
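The backward recursion can be sketched on a small staged network (the city names follow the NY-to-LA setting, but the intermediate cities and distances are made up):

```python
# Backward recursion f_s(x) = min_u { c(x, u) + f_{s+1}(u) } on a staged network.
stages = [
    {("NY", "A"): 3, ("NY", "B"): 5},          # stage 1: NY -> intermediate city
    {("A", "C"): 4, ("A", "D"): 6,
     ("B", "C"): 3, ("B", "D"): 2},            # stage 2
    {("C", "LA"): 5, ("D", "LA"): 7},          # stage 3: last leg into LA
]

f = {"LA": 0}                                  # initial condition at the end node
for arcs in reversed(stages):                  # backward over the stages
    new_f = {}
    for (x, u), cost in arcs.items():
        if u in f:
            new_f[x] = min(new_f.get(x, float("inf")), cost + f[u])
    f = new_f

print(f["NY"])   # 12
```

After the last (first-stage) computation, the recursion function at the start state gives the overall best objective function value, as stated in 18.3.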
13.5 Assignment problems
- Assignment problem is a special case of the transportation problem. Hence it can be classified as a network flow problem.
- Define the variables as: $x_{ij} = 1$ if person $i$ is assigned to job $j$, and $x_{ij} = 0$ otherwise. The model can be formulated as:
  $\min \sum_{i=1}^{n} \sum_{j=1}^{n} c_{ij} x_{ij}$
  s.t. $\sum_{j=1}^{n} x_{ij} = 1$, $i = 1, \dots, n$
  $\sum_{i=1}^{n} x_{ij} = 1$, $j = 1, \dots, n$
  $x_{ij} \in \{0, 1\}$
Lecture 13 - Linear optimization formulations and algorithms for minimum cost network flows (Chapter 8.6-8.7)
8.6 Minimum cost network flow problems
- The problem is to determine a flow with minimum cost in a network, where
  - the supplies and demands (strengths) of the nodes have to be satisfied, and
  - the capacity constraints of the arcs have to be fulfilled.
- Integer properties of the MCNFP (see 8.6.3)
8.6.1 Examples of models
- A network flow problem
- Maximum flow problem
- Time dynamic networks
8.6.2 LP formulation
- Given a network with $n$ nodes and $m$ arcs, where $c_{ij}$, $l_{ij}$ and $u_{ij}$ denote the cost, lower bound and upper bound on the flow on arc $(i,j)$, and where $b_i$ denotes the net flow to/from node $i$, we can formulate a minimum cost network flow problem as:
  $\min \sum_{(i,j) \in A} c_{ij} x_{ij}$
  s.t. $\sum_{j:(i,j) \in A} x_{ij} - \sum_{j:(j,i) \in A} x_{ji} = b_i$ for all $i \in N$
  $l_{ij} \le x_{ij} \le u_{ij}$ for all $(i,j) \in A$
8.6.3 Integer properties
- The integer property holds for the minimum cost network flow problem, stated as in 8.6.2.
- Theorem 8.2: If a minimum cost network flow problem has integer bounds $l_{ij}$ and $u_{ij}$ (and integer node strengths $b_i$), all extreme points of the feasible region are integer.
- Definition 8.1: A matrix is totally unimodular if each square sub-matrix has a determinant with value $-1$, $0$ or $+1$.
- Theorem 8.3: The node-arc incidence matrix (a concept from graph theory) of a directed graph is totally unimodular.
8.6.4 Extensions of the minimum cost network flow problem
- Network with several commodities
- Network with gains and losses
8.7 The simplex method for network flow problems
- MCFP problems can be solved with a special version of the standard simplex method for LP problems.
- In a minimum cost network flow problem we may have both lower and upper bounds of the variables (flows) and the simplex method for network flow problem takes these into account.
8.7.1 Optimality conditions
- Let us study the following LP formulation of the minimum cost network flow problem
- Summary of the optimality conditions
A solutionis an optimal solution to a minimum cost network flow problem is the following relations are satisfied (two alternative sets of conditions are provided)
where is the reduced cost for arc
8.7.2 Solution principle
8.7.3 Algorithm - Simplex method for the network flow problem
8.7.4 Determining a basic feasible solution
When solving a minimum cost network flow problem with the simplex method we require a basic feasible solution, i.e. a solution that satisfies the node balance constraints and the capacity constraints.
Lecture 14 - Overview of nonlinear optimization (Chapter 2.5.1, 9-12)
2.5.1 Search method for continuous problems
- For continuous problems (linear programming problems and nonlinear problems), the sequence of points is determined by the general formula
  $x^{(k+1)} = x^{(k)} + t^{(k)} d^{(k)}$
  where $x^{(k)}$ and $x^{(k+1)}$ are the current and next points, $d^{(k)}$ is a search direction and $t^{(k)}$ is a step length.
- The search direction $d^{(k)}$ should be:
  - a feasible search direction
  - an improving search direction
9 Introduction to nonlinear optimization
- Nonlinear optimization problems can be separated into problems with and without constraints:
  - unconstrained optimization
  - constrained optimization
- If the problem has a quadratic objective function and linear constraints, we call it a quadratic programming problem.
- Nonlinear optimization problems can also be separated into convex and non-convex problems.
9.1 Examples of nonlinear models
- Least square problem
- Two dimensional cutting pattern
- Construction of a bar-truss structure
9.2 Approximations of functions
- First and second order approximations based on a Taylor expansion.
- Other equivalent names for these approximations are linear and quadratic approximations.
- Definition 9.1: The gradient $\nabla f(x)$ is a vector consisting of all partial derivatives of the function $f(x)$ and can be expressed as
  $\nabla f(x) = \left( \frac{\partial f(x)}{\partial x_1}, \dots, \frac{\partial f(x)}{\partial x_n} \right)^T$
- Definition 9.2: The Hessian $H(x)$ (also written $\nabla^2 f(x)$) is a square matrix consisting of all second-order partial derivatives of the function $f(x)$; its elements are
  $H(x)_{jk} = \frac{\partial^2 f(x)}{\partial x_j \partial x_k}$
- Definition 9.3: A first-order Taylor approximation of the function $f(x)$ around the point $\bar{x}$ is given by
  $f(x) \approx f(\bar{x}) + \nabla f(\bar{x})^T (x - \bar{x})$
- Definition 9.4: A second-order Taylor approximation of the function $f(x)$ around the point $\bar{x}$ is given by
  $f(x) \approx f(\bar{x}) + \nabla f(\bar{x})^T (x - \bar{x}) + \frac{1}{2} (x - \bar{x})^T H(\bar{x}) (x - \bar{x})$
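Definitions 9.3-9.4 can be checked numerically on a small quadratic function (a hypothetical example with analytic gradient and Hessian; for a quadratic function the second-order approximation is exact):

```python
# f(x, y) = x^2 + 2y^2 + xy, expanded around the point (1, 1).
def f(x, y):
    return x**2 + 2*y**2 + x*y

def grad(x, y):
    return (2*x + y, 4*y + x)       # (df/dx, df/dy)

H = [[2, 1], [1, 4]]                # constant Hessian of this quadratic

x0, y0 = 1.0, 1.0
dx, dy = 0.1, -0.2                  # displacement from the expansion point

g = grad(x0, y0)
first  = f(x0, y0) + g[0]*dx + g[1]*dy
second = first + 0.5 * (H[0][0]*dx*dx + 2*H[0][1]*dx*dy + H[1][1]*dy*dy)

exact = f(x0 + dx, y0 + dy)
print(exact, first, second)         # second-order term recovers f exactly here
```

The first-order value differs from the exact one by the curvature term, which the second-order approximation restores.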
9.3 Convex analysis
- The convexity properties decide whether we can state with certainty that the solution we have found is the global optimum.
- The problem $\min f(x)$ s.t. $x \in X$ is a convex problem if $f(x)$ is a convex function and if the feasible region $X$ defined by the constraints is a convex set.
- Theorem 9.1: If the problem is convex, then each local minimum is also a global minimum.
Convexity of a set
- Theorem 9.2: Let $X_1, \dots, X_m$ be convex sets. Then the intersection $X = X_1 \cap \dots \cap X_m$ is a convex set.
- Theorem 9.3: The set $X = \{ x : g(x) \le b \}$ is a convex set if the function $g(x)$ is convex.
Convexity of a function
- Theorem 9.4: If $f_1(x), \dots, f_m(x)$ are convex functions and we have $\lambda_i \ge 0$, $i = 1, \dots, m$, then the function $f(x) = \sum_{i=1}^{m} \lambda_i f_i(x)$ is convex.
- Definition 9.6: The square matrix $H$ is positive definite (positive semi-definite) if $x^T H x > 0$ ($x^T H x \ge 0$) for all vectors $x \ne 0$.
- Theorem 9.5: Suppose the function $f(x)$ is twice differentiable on a convex set $X$. Then we have:
  - $f(x)$ is a convex function on $X$ if the Hessian matrix $H(x)$ is positive semi-definite for all $x \in X$
  - $f(x)$ is a strictly convex function on $X$ if the Hessian matrix $H(x)$ is positive definite for all $x \in X$
  - $f(x)$ is a concave function on $X$ if the Hessian matrix $H(x)$ is negative semi-definite for all $x \in X$
  - $f(x)$ is a strictly concave function on $X$ if the Hessian matrix $H(x)$ is negative definite for all $x \in X$
  Note: if none of these hold, we have that $H(x)$ is indefinite and the function is neither convex nor concave.
- To determine the eigenvalues of the matrix $H$, find the roots of the so-called characteristic equation of $H$, i.e. the equation $\det(H - \lambda I) = 0$.
- Let $\lambda_1, \dots, \lambda_n$ denote the eigenvalues of the Hessian matrix $H$.
  Theorem 9.6:
  - If $\lambda_i \ge 0$ for all $i$, then $H$ is positive semi-definite
  - If $\lambda_i > 0$ for all $i$, then $H$ is positive definite
  - If $\lambda_i \le 0$ for all $i$, then $H$ is negative semi-definite
  - If $\lambda_i < 0$ for all $i$, then $H$ is negative definite
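For a symmetric 2x2 matrix the characteristic equation is a quadratic with closed-form roots, so Theorem 9.6 can be applied directly (the matrices below are illustrative):

```python
import math

# For H = [[a, b], [b, c]], det(H - lam*I) = 0 gives
#   lam = ((a + c) +/- sqrt((a - c)^2 + 4 b^2)) / 2.
def eigenvalues_2x2(a, b, c):
    root = math.sqrt((a - c)**2 + 4*b*b)
    return (a + c - root) / 2, (a + c + root) / 2

def classify(a, b, c):
    lo, hi = eigenvalues_2x2(a, b, c)
    if lo > 0:  return "positive definite"
    if hi < 0:  return "negative definite"
    if lo >= 0: return "positive semi-definite"
    if hi <= 0: return "negative semi-definite"
    return "indefinite"

print(classify(2, 1, 4))    # Hessian of x^2 + 2y^2 + xy: positive definite
print(classify(1, 2, 1))    # eigenvalues -1 and 3: indefinite
```

By Theorem 9.5, the first matrix shows its function is strictly convex, while the second belongs to a function that is neither convex nor concave.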
- Theorem 9.8: Let $f(x)$ and $g(y)$ be convex functions, and let also $g(y)$ be a non-decreasing function. Then the compound function $h(x) = g(f(x))$ is convex.
10 Methods for unconstrained nonlinear optimization
- In this chapter, we consider problems written as $\min_x f(x)$, i.e. nonlinear problems without constraints. It is assumed that the function $f(x)$ is differentiable and that there exists an optimal solution to the problem.
- An optimal solution to an unconstrained problem is always a stationary point which may be a local minimum, local maximum or a saddle point.
- Theorem 10.1: A necessary condition for the unconstrained problem $\min f(x)$ to have an optimal solution in the point $x^*$ is that it is a stationary point, i.e. $\nabla f(x^*) = 0$.
- Theorem 10.2: A sufficient condition for the unconstrained problem $\min f(x)$ to have an optimal solution in the point $x^*$ is that it is a stationary point and the function $f(x)$ is convex.
- If the function is strictly convex (concave) there is only one stationary point. This point is hence both a local and global minimum (maximum).
- Theorem 10.3: A sufficient condition for the unconstrained problem $\min f(x)$ to have a local optimal solution in the point $x^*$ is that it is a stationary point and that the Hessian matrix $H(x^*)$ in the point is positive definite.
- With the first-order Taylor series approximation of the function $f(x)$ in the point $x^{(k)}$, we get the following function:
  $f(x^{(k)} + t d) \approx f(x^{(k)}) + t \nabla f(x^{(k)})^T d$
- Definition 10.1: A direction $d$ such that $\nabla f(x^{(k)})^T d < 0$ is called a descent direction and is a direction in which $f(x)$ decreases.
- Definition 10.2: A direction $d$ such that $\nabla f(x^{(k)})^T d > 0$ is called an ascent direction and is a direction in which $f(x)$ increases.
- Global convergence - "global" here means that the methods find a local minimum regardless of the (global) starting point.
- In all these search methods, the step length is determined through a one-dimensional search (line search).
10.1 Steepest descent method
- The search direction is computed using only the gradient $\nabla f(x^{(k)})$ in the point $x^{(k)}$. The gradient $\nabla f(x^{(k)})$ defines the direction in which the function $f(x)$ increases most rapidly (is steepest) locally. In the same way, $d^{(k)} = -\nabla f(x^{(k)})$ defines the direction in which it decreases most rapidly locally.
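A minimal steepest descent sketch on a hypothetical quadratic, with a fixed step length standing in for a proper line search:

```python
# Steepest descent on f(x, y) = x^2 + 2y^2, starting from an arbitrary point.
def grad(x, y):
    return (2*x, 4*y)

x, y, t = 4.0, -2.0, 0.1           # fixed step length (a line search would adapt t)
for _ in range(200):
    gx, gy = grad(x, y)
    x, y = x - t*gx, y - t*gy      # move in the direction -grad f

print(x, y)                        # converges towards the minimum (0, 0)
```

Each coordinate shrinks geometrically here; with badly scaled (ill-conditioned) Hessians, steepest descent instead zig-zags and converges slowly, which motivates Newton-type methods.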
10.2 Newton methods
- If we use a second-order Taylor series approximation to determine a search direction, we get Newton's method and Newton's modified method. A second-order Taylor series approximation of the function $f(x)$ around the point $x^{(k)}$ can be written as:
  $f(x) \approx f(x^{(k)}) + \nabla f(x^{(k)})^T (x - x^{(k)}) + \frac{1}{2} (x - x^{(k)})^T H(x^{(k)}) (x - x^{(k)})$
  The gradient of the approximate function is $\nabla f(x^{(k)}) + H(x^{(k)})(x - x^{(k)})$. We know from Theorem 10.1 that the optimal solution is found in a stationary point. We can therefore set the gradient of the approximation to zero and express the search direction as
  $d^{(k)} = -H(x^{(k)})^{-1} \nabla f(x^{(k)})$
  This search direction is called the Newton direction. It is valid both for minimization and maximization problems. The interpretation of using the Newton direction is that we use second-order information to rotate the gradient direction so that it points towards the stationary point of the approximate function.
- If we always use the step length $t^{(k)} = 1$, i.e. no line search, we get Newton's method. If we use a normal line search, we get Newton's modified method.
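For a quadratic function the Newton direction with step length 1 lands exactly on the stationary point; a sketch on a hypothetical two-variable quadratic, inverting the 2x2 Hessian in closed form:

```python
# f(x, y) = x^2 + 2y^2 + xy: gradient (2x + y, x + 4y), Hessian [[2, 1], [1, 4]].
def newton_direction(x, y):
    gx, gy = 2*x + y, x + 4*y
    a, b, c = 2.0, 1.0, 4.0                  # H = [[a, b], [b, c]]
    det = a*c - b*b
    # d = -H^{-1} grad, using the closed-form inverse of a 2x2 matrix
    dx = -( c*gx - b*gy) / det
    dy = -(-b*gx + a*gy) / det
    return dx, dy

x, y = 3.0, -1.0
dx, dy = newton_direction(x, y)
x, y = x + dx, y + dy                        # Newton's method: step length t = 1
print(x, y)   # 0.0 0.0
```

For non-quadratic functions the Taylor model is only local, so several Newton iterations (or the modified method with a line search) are needed.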
10.3 Extensions for Newton methods
- A more general search method can be expressed with the search direction $d^{(k)} = -B^{(k)} \nabla f(x^{(k)})$, where $B^{(k)}$ is a positive definite matrix; choosing $B^{(k)} = I$ gives steepest descent and $B^{(k)} = H(x^{(k)})^{-1}$ gives the Newton direction.
10.4 One-dimensional optimization
- A one-dimensional optimization problem is a problem with one variable.
- Line search or one-dimensional search.
- Definition 10.3: A function $f(t)$ to be minimized on the interval $a \le t \le b$, with its minimum at $t^*$, is unimodal on the interval if all $t_1$, $t_2$ such that $a \le t_1 < t_2 \le b$ have the properties: if $t_2 \le t^*$ then $f(t_1) > f(t_2)$, and if $t_1 \ge t^*$ then $f(t_1) < f(t_2)$.
  The interpretation of a unimodal function in minimization is that the function (for increasing values of $t$) is non-increasing followed by non-decreasing.
  Note: a function $f(t)$ can be unimodal despite being non-convex. A convex function is always unimodal.
- Golden section method
- Bi-section method
- Newton-Raphson's method
- Armijo's method
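A minimal golden section sketch for a unimodal (here deliberately non-convex) test function:

```python
import math

def golden_section(f, a, b, tol=1e-6):
    """Minimize a unimodal function f on [a, b] by golden section search."""
    r = (math.sqrt(5) - 1) / 2              # golden ratio factor, about 0.618
    x1, x2 = b - r*(b - a), a + r*(b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 > f2:                          # minimum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + r*(b - a)
            f2 = f(x2)
        else:                                # minimum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - r*(b - a)
            f1 = f(x1)
    return (a + b) / 2

# sqrt(|t - 2|) is unimodal but non-convex on [0, 4]; minimum at t = 2.
t_star = golden_section(lambda t: math.sqrt(abs(t - 2.0)), 0.0, 4.0)
print(round(t_star, 4))   # 2.0
```

Only one new function evaluation is needed per iteration, because the golden ratio lets one interior point be reused when the interval shrinks.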
11 Optimality conditions for nonlinear problems
11.1 Geometric interpretation of the KKT conditions
- Definition 11.1: A cone $C$ which is spanned by the vectors $h_1, \dots, h_s$ consists of all vectors $y$ that can be written as a non-negative linear combination of the vectors $h_i$, $i = 1, \dots, s$, i.e. $C = \{ y : y = \sum_{i=1}^{s} \mu_i h_i, \ \mu_i \ge 0 \}$.
- For $x^*$ to be a local maximum in a maximization problem, the gradient of the objective function must lie in the cone spanned by the gradients of the active constraints. In other words, we should be able to write $\nabla f(x^*)$ as a non-negative linear combination of the gradients of the active constraints.
- If we have a nonlinear problem with $m$ inequality constraints we can generalize the above conditions and formulate them as follows:
  $\nabla f(x^*) = \sum_{i=1}^{m} v_i \nabla g_i(x^*)$, $v_i \ge 0$, $i = 1, \dots, m$ (1)
  $g_i(x^*) \le b_i$, $i = 1, \dots, m$ (2)
  $v_i (g_i(x^*) - b_i) = 0$, $i = 1, \dots, m$ (3)
  These conditions are called the Karush-Kuhn-Tucker (KKT) conditions, where (1) denotes dual feasibility, (2) denotes primal feasibility and (3) complementarity.
11.2 KKT conditions derived using the Lagrangian function
- A general nonlinear optimization problem can be formulated as:
  $\min f(x)$ s.t. $g_i(x) \le b_i$, $i = 1, \dots, m$
  The Lagrangian function for problem (ILP) can be stated as:
  $L(x, v) = f(x) + \sum_{i=1}^{m} v_i (g_i(x) - b_i)$
  Note:
  - Lagrangian multipliers are also called dual variables.
  - In our formulation, all multipliers are non-negative as we have a minimization problem and all constraints are $\le$ constraints.
  - The terms that are added to $f(x)$ and that define the Lagrangian function can be interpreted as penalty terms for not satisfying the constraints, and the multiplier $v_i$ can be interpreted as a penalty parameter for constraint $i$.
- Definition 11.2: The point $(x^*, v^*)$ is a saddle point of $L(x, v)$ if
  $L(x^*, v) \le L(x^*, v^*) \le L(x, v^*)$
  for all $x$ and all $v \ge 0$.
- The definition of a saddle point states that $L$ has a minimum with respect to $x$ and a maximum with respect to $v$ in the point $(x^*, v^*)$. The point $(x^*, v^*)$ is a solution to the problem $\max_{v \ge 0} \min_{x} L(x, v)$.
- Theorem 11.1 (Saddle point theorem): If the Lagrangian function $L(x, v)$ has a saddle point in $(x^*, v^*)$, then $x^*$ is an optimal solution to the problem (ILP).
- The saddle point theorem provides sufficient conditions for optimality but not necessary conditions. It may happen that a point is optimal without being a saddle point to the Lagrangian function.
- A necessary condition for a point to be optimal is that it is a stationary point of the Lagrangian function. It can be proved using the Lagrangian multiplier method that each stationary point satisfies the Karush-Kuhn-Tucker conditions.
11.3 Necessary and sufficient conditions for optimality
- Theorem 11.2 (Necessary conditions for optimality)
  Suppose that the point $x^*$ is a local minimum to the problem (ILP) and satisfies some regularity conditions. Then there is a point $(x^*, v^*)$ that satisfies the KKT conditions.
  The theorem states that the KKT conditions must be satisfied if a point $x^*$ is to be a local minimum.
  There are several types of "some regularity conditions", and the most common (and strongest) is that the gradients of the active and non-redundant constraints are linearly independent (also called a Constraint Qualification condition).
  Note: more constraint qualifications will be covered in nonlinear optimization. Some implicit regularity conditions are met in LP or IP (e.g., in Lecture 14 slide 39, it is assumed that there exists a point $x$ such that all constraints hold strictly).
- Theorem 11.3 (Sufficient conditions for optimality)
  Suppose that there is a point $(x^*, v^*)$ that satisfies the KKT conditions. If the problem is convex, then $x^*$ is both a local and global optimum.
- (A comparison of necessary (like components in series) and sufficient (like components in parallel) conditions): https://www.zhihu.com/question/30469121
11.4 Formulation of the KKT conditions
- Sign restrictions of the dual variables (also called Lagrangian multipliers), with the Lagrangian written as above:

Primal problem | $\le$-constraint | $\ge$-constraint | $=$-constraint
---|---|---|---
Min problem | $v_i \ge 0$ | $v_i \le 0$ | $v_i$ free
Max problem | $v_i \le 0$ | $v_i \ge 0$ | $v_i$ free
- The interpretation of the Lagrangian multiplier values in optimum is the same as for dual values in LP theory.
12 Methods for constrained nonlinear optimization
- Two methods that can be used to solve nonlinear problems with linear constraints.
- Frank-Wolfe method linearizes the objective function and solves a sequence of LP problems to find the optimal solution.
- Quadratic programming based on KKT conditions
- A nonlinear problem with nonlinear constraints can be reformulated and then solved by using a sequence of unconstrained nonlinear problems. Two methods:
- Reformulating the problem using penalty functions
- Using barrier functions
12.1 The Frank-Wolfe method
- This method is primarily used for convex nonlinear problems with linear constraints.
- The idea is to linearize the function (make a first order Taylor series expansion) around the current point. By solving the resulting LP problem, we can determine a search direction and then perform a line search between the current point and the solution to the LP problem.
- Consider $f(x)$, the convex objective function of a minimization problem, and let $\hat{f}(x)$ denote the linear approximation of the function in the given point $x^{(k)}$:
  $\hat{f}(x) = f(x^{(k)}) + \nabla f(x^{(k)})^T (x - x^{(k)})$
  If we solve the LP problem with $\hat{f}(x)$ as the objective function, we will obtain a solution in the point $\bar{x}^{(k)}$. The objective function values in the points $x^{(k)}$ and $\bar{x}^{(k)}$, which are $f(x^{(k)})$ and $\hat{f}(\bar{x}^{(k)})$ respectively, can be used to get bounds on the optimal objective function value $z^*$:
  - $f(x^{(k)})$ gives a pessimistic bound of the optimal objective function value $z^*$. This is an upper bound, UBD.
  - $\hat{f}(\bar{x}^{(k)})$ gives an optimistic bound of the optimal objective function value $z^*$. This is a lower bound, LBD.
- The search direction is $d^{(k)} = \bar{x}^{(k)} - x^{(k)}$. We need to ensure that we stay between the points $x^{(k)}$ and $\bar{x}^{(k)}$ by limiting the step length $t$ to $0 \le t \le 1$.
- If the solution found satisfies UBD = LBD, then we have found the optimal solution.
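A Frank-Wolfe sketch on a small hypothetical problem; since the LP subproblem minimizes a linear function, its optimum can be found by checking the extreme points of the feasible region:

```python
# min f(x) = (x1 - 2)^2 + (x2 - 2)^2  s.t.  x1 + x2 <= 2, x >= 0.
# The feasible region is a triangle with the extreme points listed below.
vertices = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
p = (2.0, 2.0)                                    # unconstrained minimizer of f

def f(x):
    return (x[0] - p[0])**2 + (x[1] - p[1])**2

def grad(x):
    return (2*(x[0] - p[0]), 2*(x[1] - p[1]))

x = (0.0, 0.0)                                    # feasible starting point
for _ in range(100):
    g = grad(x)
    # LP subproblem: minimize g . y over the polytope -> check its vertices
    y = min(vertices, key=lambda v: g[0]*v[0] + g[1]*v[1])
    d = (y[0] - x[0], y[1] - x[1])                # search direction
    lbd = f(x) + g[0]*d[0] + g[1]*d[1]            # optimistic bound f_hat(y)
    if f(x) - lbd < 1e-9:                         # UBD - LBD small: stop
        break
    # exact line search for this quadratic f, restricted to 0 <= t <= 1
    t = -((x[0]-p[0])*d[0] + (x[1]-p[1])*d[1]) / (d[0]**2 + d[1]**2)
    t = max(0.0, min(1.0, t))
    x = (x[0] + t*d[0], x[1] + t*d[1])

print(x, f(x))   # converges to (1.0, 1.0) with f = 2.0
```

On this instance the method reaches the optimum $(1, 1)$ after a couple of iterations, at which point UBD and LBD coincide and the loop terminates.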
12.2 Quadratic Programming
- Quadratic programming problems have a quadratic objective function and linear constraints
- A quadratic programming problem with equality constraints can be written in matrix form as
  $\min \frac{1}{2} x^T Q x + c^T x$ s.t. $Ax = b$
  It is assumed that the matrix $Q$ is symmetric and positive definite, which means that the problem is convex. It is also assumed that the matrix $A$ has full rank, which means there is always a feasible solution to the system of linear equations $Ax = b$.
  The KKT conditions for the problem above state that the gradient of the Lagrangian vanishes, $Qx + c + A^T v = 0$, together with primal feasibility $Ax = b$. The system can also be expressed as:
  $\begin{pmatrix} Q & A^T \\ A & 0 \end{pmatrix} \begin{pmatrix} x \\ v \end{pmatrix} = \begin{pmatrix} -c \\ b \end{pmatrix}$
  By assuming that the matrix $Q$ is positive definite and that the matrix $A$ has full rank, there is always a unique solution to these equations.
- Algorithm: Active set method (the active set at $x$ is made up of those constraints that are active, i.e. satisfied with equality, at the current point).
12.3 Penalty function methods
- In this section, the constrained nonlinear problem is reformulated as an unconstrained optimization problem by adding a penalty term that measures and penalizes the distance from any infeasible point to the feasible region. The new function is a penalty function, and by using different values on a penalty parameter, we solve a sequence of unconstrained problems.
- Referring to the Table 12.1 on Page 32, the larger the value of the penalty parameter, the closer we get to the optimal solution and the optimal objective function value. However, for really large values of the parameter, we get numerical problems.
- A general nonlinear problem with equality constraints can be formulated as
  $\min f(x)$ s.t. $g_i(x) = b_i$, $i = 1, \dots, m$
  The penalty function can be written as
  $P(x, \mu) = f(x) + \mu \sum_{i=1}^{m} (g_i(x) - b_i)^2$
  where the penalty term is $\sum_{i=1}^{m} (g_i(x) - b_i)^2$ and the penalty parameter is $\mu > 0$.
  The unconstrained optimization problem becomes $\min_x P(x, \mu)$.
- A quadratic penalty term such as $(g_i(x) - b_i)^2$ is recommended, since it is differentiable and zero exactly at feasible points.
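On a one-variable hypothetical example the penalty minimizer has a closed form, which makes the convergence as the penalty parameter grows easy to see:

```python
# Penalty method sketch for  min f(x) = x^2  s.t.  x = 1.
# With the quadratic penalty P(x, mu) = x^2 + mu*(x - 1)^2, setting the
# derivative 2x + 2*mu*(x - 1) to zero gives the minimizer x(mu) = mu / (1 + mu),
# which approaches the constrained optimum x* = 1 (from outside the feasible
# point) as mu grows.
for mu in (1.0, 10.0, 100.0, 1000.0):
    x_mu = mu / (1.0 + mu)
    print(mu, x_mu)
```

Each printed iterate is infeasible ($x_\mu < 1$) but gets arbitrarily close to the constraint as $\mu$ increases, illustrating the exterior character of penalty methods and the numerical trouble of very large $\mu$.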
12.4 Barrier function methods
- Instead of penalizing infeasible solutions, barrier function methods penalize the deviation from the border of the feasible region.
- Penalty functions (also called exterior penalty functions) generate a sequence of infeasible points that converge to a feasible and optimal point.
- Barrier functions (also called interior penalty functions) generate a sequence of feasible points that converge to the optimal solution.
- A general nonlinear problem with inequality constraints can be formulated as
  $\min f(x)$ s.t. $g_i(x) \le b_i$, $i = 1, \dots, m$
  The barrier function can be written as
  $B(x, \mu) = f(x) + \mu \sum_{i=1}^{m} \phi_i(x)$
  where the barrier term $\phi_i(x)$ is $-\ln(b_i - g_i(x))$ or $1 / (b_i - g_i(x))$, and the barrier parameter is $\mu > 0$.
  The unconstrained optimization problem becomes $\min_x B(x, \mu)$.
  Note: barrier functions can only be used when all constraints are inequality constraints, because we need the feasible solutions to be interior points.
- For initial large values of $\mu$, the optimal solution will be an interior point relatively far from the border of the feasible region. For smaller values of $\mu$, the penalty will be smaller and the solution will get closer to the border. Referring to the Table on Page 318, we can see that we get closer to the optimal point and to the optimal objective function value as $\mu$ decreases.
- Barrier functions always provide feasible solutions which can be used to generate pessimistic bounds of the optimal objective function value. In the same way, penalty functions provide optimistic bounds.
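The corresponding barrier sketch on a one-variable hypothetical example, where the minimizer of the barrier function stays strictly feasible and approaches the optimum as the parameter decreases:

```python
import math

# Barrier method sketch for  min f(x) = x  s.t.  x >= 1.
# With the logarithmic barrier B(x, mu) = x - mu*ln(x - 1), setting the
# derivative 1 - mu/(x - 1) to zero gives the minimizer x(mu) = 1 + mu:
# a strictly feasible (interior) point that approaches the constrained
# optimum x* = 1 as mu decreases.
for mu in (1.0, 0.1, 0.01, 0.001):
    x_mu = 1.0 + mu
    print(mu, x_mu, x_mu - mu * math.log(x_mu - 1.0))
```

Every iterate $x_\mu = 1 + \mu$ is feasible, so $f(x_\mu) = 1 + \mu \ge z^* = 1$ is a pessimistic bound, exactly as the last bullet states.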