Chalmers Course: Linear and integer optimization with applications

Course literature:

Optimization (English) by J. Lundgren, M. Rönnqvist, and P. Värbrand; Studentlitteratur 2010

Computer exercises and software

  • The purpose of this computer exercise is to make you familiar with the use of software for computing solutions to linear programs.
  • The following software is included:
    1. JuMP - An algebraic modelling language for linear, quadratic, and nonlinear constrained optimization problems embedded in Julia.
    2. Gurobi - A linear, integer linear, and quadratic programming solver with interfaces to MATLAB, JuMP, and Python.
    3. Atom - A free and open-source text and source code editor.

Lectures plan

Lecture Content Chapters in the course book
1 Introduction, course map, modelling optimization applications 1, 2.1-2.5, 3
2 a) Julia/JuMP and optimization solvers; computer exercise on linear optimization.
b) Supply chain (Assignment 1)
3 Convexity and linear optimization 2.4, 4.1
4 Linear optimization: basic feasible solutions 4.2–4.4, 4.8
5 Linear optimization: the simplex method 4.5–4.10, (7.1)
6 Duality in linear optimization models 6, (7.3, 7.5)
7 Linear optimization: post-optimal and sensitivity analysis 5.1–5.5, (5.6)
8 a) Integer linear optimization: modelling, applications, complexity 13, 2.6
b) Optimization of maintenance scheduling (Assignment 2)
9 Integer linear optimization: relaxations, bounds, branch&bound algorithm 14.1–14.3, 15.1–15.3
10 Integer linear optimization: good/ideal formulation, cutting planes, Lagrangean duality and relaxation 14.4–14.5, 17.1–17.2, (17.3–17.4), 13.10–13.11, 15.4, (15.5)
11 Integer linear optimization: (non-)convexity, local/global optimality, heuristics and approximation algorithms 16, 8.3
12 Network flows: linear optimization formulations 8.1–8.2, 8.4, (8.5), 18.1–18.5, (18.6–18.7), 13.5
13 Minimum cost network flows: linear optimization formulations and algorithms 8.6–8.7
14 Overview of nonlinear optimization 2.5.1, 9–12
15 Example exam

Problem solving sessions

Session Topics Exercises recommended for students to solve
1 Mathematical modelling 3.1 b,c,d; 3.5, 3.10, 3.12, 3.15
2 Linear optimization and the simplex method 2.4, 2.6, 4.2, 4.6, 4.10, 4.11, 4.15
3 Sensitivity analysis and duality theory 5.1, 5.5, 6.6, 6.8, 6.10, 6.15
4 Integer linear optimization and the branch-and-bound algorithm 13.5, 13.6, 13.9, 15.6 (13.8, 13.15, 15.3, 15.12, 15.14)
5 Cutting plane methods, minimal cover, Lagrangean duality 14.4, 14.8, 17.9 (14.1, 14.3, 14.6, 14.9, 17.8)
6 Network optimization: minimum spanning tree and shortest path algorithms 8.10, 8.12, 8.17a (8.18, 8.38ab)
7 Nonlinear optimization: convexity and the KKT conditions 9.8, 9.10, 11.4, (9.4, 11.6)

Lecture 1 - Introduction, course map, modelling optimization applications (Chapter 1, 2.1-2.5, 3)

Chapter 1 - Introduction to optimization

  • A general optimization problem (P):
    min \ f(x)\\ s.t. \ g_i(x) \leq b_i, \ i=1,...,m
  • P is an LP problem if:
    1. all functions f, g_1, ..., g_m are linear
    2. all variables are continuous, i.e., x \in R^n
  • P is a nonlinear problem if:
    1. at least one of the functions f, g_1, ..., g_m is a nonlinear function
    2. all variables are continuous, i.e., x \in R^n
  • P is a linear integer programming problem when a subset of the variables (at least one) is defined as integer variables.

Chapter 2 - Introductory examples and basic concepts

  • Level curves of the objective function
  • An optimal solution is a feasible solution where the objective function reaches its maximum (or minimum) value.
  • Active constraints
  • A constraint that doesn't affect the feasible region is called redundant.
  • Linear programming solution properties
    1. Unique optimal solution - The unique solution is always found in a corner point of the feasible region.
    2. Alternative optimal solutions - When the optimal objective function value is attained in at least two corner points. In practice, we stop once we have found one optimal solution.
    3. Unbounded solution - This situation occurs in a practical problem when there is an error in the model formulation or in the input data.
    4. No solution exists - This situation appears when the constraints are defined so that no point satisfies all constraints. Note that infeasibility depends only on the constraints, not on the objective function.
  • Nonlinear optimization solution properties
    1. A nonlinear problem may have several local optima, and one or several of these are the global optima.
    2. If the nonlinear problem is a convex optimization problem, there is only one local optimum. If in addition the objective function is strictly convex (minimization problem) or strictly concave (maximization problem), the optimal solution is unique.
  • Integer programming solution properties
    1. If we remove the integer requirements on the variables, we get an LP problem. This problem is called an LP relaxation of the integer programming problem.
    2. Some integer programming problems can be solved by first solving the LP relaxation, and then rounding the solution to obtain a feasible integer solution.
  • Most solution methods are developed to find a local optimum.
  • In a convex problem, each local optimum is also a global optimum.
  • All LP problems are convex, while nonlinear problems are often non-convex.
  • Note that a set cannot be concave; a set is either convex or non-convex.

2.5 Search methods - a general description

  • Search methods start from a feasible point (solution) to the problem.
  • The search method will generate a sequence of feasible points x^{(0)}, x^{(1)}, ..., x^{(p)}, whose objective function values become successively better.
  • In some search methods, e.g., the simplex method, the objective function value may be the same between two consecutive iterations.

2.5.1 Search method for continuous problems (primal method):

x^{(k+1)} = x^{(k)} + t^{(k)} d^{(k)}
Chosen search direction d^{(k)} should be:

  1. A feasible search direction
  2. An improving search direction
  • If in the given point, we cannot find a feasible and improving search direction, we can conclude that we have found a local optimum.
  • General description
    1. Step 0 - starting point
    2. Step 1 - search direction (we use some information about the gradient and the Hessian matrix of the objective function f(x) at the point x^{(k)}. A descent direction d^{(k)} has the property \nabla f(x^{(k)})^T d^{(k)} < 0)
    3. Step 2 - convergence criterion (Predefined optimality conditions for the problem)
    4. Step 3 - step length
    5. Step 4 - convergence (The general search method converges towards a local optimum if we choose an improving search direction and perform the line search properly. If the problem is convex, we also find the global optimum.)

2.5.2 Search method for integer programming problems (primal method)

  • It is more difficult to use the concepts of search direction and step length.
  • General description
    1. Step 0 - Start from a feasible point x^{(0)}. Set k=0
    2. Step 1 - Find the neighbouring integer solution x^{(k+1)} with the best objective function value
    3. Step 2 - Check the convergence criterion: the point x^{(k)} is a local optimum if the point x^{(k+1)} does not have a better objective function value
    4. Step 3 - Let x^{(k+1)} be the new solution
    5. Step 4 - Update k:=k+1 and go to Step 1
      Note: Since integer programming problems are non-convex, we cannot guarantee that the solution obtained is a global optimum.

2.5.3 Primal and dual methods

  • An alternative principle is to move between infeasible points in a systematic way and only in the last iteration arrive at a feasible solution. We then know that this solution is a local (possibly global) optimum. This class of methods is called dual methods.
  • Dual methods are also common for various classes of network problems in which the methods can be made very efficient.
  • General description - dual search method:
    1. Step 0 - Relax some constraints and let \Omega^1 \supseteq X define the set of feasible solutions that are defined by the remaining constraints. Set k=1.
    2. Step 1 - Solve the relaxed optimization problem min f(x), s.t. x \in \Omega^k, which gives the point x^{(k)}
    3. Step 2 - If the point x^{(k)} \in X, then all constraints in the original problem are satisfied -> Stop! The point x^{(k)} is an optimal solution to the problem.
    4. Step 3 - Identify a constraint that is not satisfied by the point x^{(k)} and let S denote all feasible solutions with respect to this constraint. Add the constraint to the problem, i.e., update \Omega^{k+1} := \Omega^k \cap S
    5. Step 4 - Update k=k+1 and go to Step 1.

Chapter 3 - Modelling

3.1 Index and summation

  • Amount of raw material i that is blended into product j during time period t
    x_{ijt}
  • Introduce the general notation m, n, T, p for the number of products, the number of raw materials, the number of time periods and the number of resources, respectively.

3.2 Production planning

  • General formulation of this problem:
  1. Variables:
    x_{jt}=number of units of model j produced during month t, j=S,L,P;t=1,2
    I_{jt}=number of units of model j in inventory at the end of month t, j=S,L,P;t=1,2
  2. Parameters:
    c_{jt}=cost to produce one unit of product j in time period t, j=1,...,n, t=1,...,T
    h_{jt}=inventory cost for one unit of product j in period t, j=1,...,n, t=1,...,T
    a_{jk}=number of hours product j requires in department k, j=1,...,n, k=1,...,K
    b_{kt}=capacity in department k in period t, k=1,...,K, t=1,...,T
  3. Objective:
    min z=\sum_{j=1}^n \sum_{t=1}^T (c_{jt} x_{jt}+h_{jt} I_{jt})
  4. Constraints:
    \sum_{j=1}^n a_{jk} x_{jt} \leq b_{kt}, k=1,...,K; t=1,...,T
    I_{j,t-1}+x_{jt}-I_{jt}=d_{jt}, j=1,...,n; t=1,...,T
    x_{jt},I_{jt} \geq 0, j=1,...,n; t=1,...,T

3.3 Transport and distribution

This optimization problem can be expressed in a network consisting of nodes and arcs.

3.3.1 Transportation problem

  • General formulation of this problem
  1. Variables:
    x_{ij}=flow from facility i to customer j, i=1,...,m; j=1,...,n
  2. Parameters:
    c_{ij}=unit transportation cost from facility i to customer j, i=1,...,m; j=1,...,n
    s_i=supply at facility i, i=1,...,m
    d_j=demand at customer j, j=1,...,n
  3. Objective:
    min z=\sum_{i=1}^m \sum_{j=1}^n c_{ij}x_{ij}
  4. Constraints:
    \sum_{j=1}^n x_{ij} \leq s_i, i=1,...,m (supply)
    \sum_{i=1}^m x_{ij} = d_j, j=1,...,n (demand)
    x_{ij} \geq 0, i=1,...,m; j=1,...,n
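
As a concrete illustration, here is a minimal JuMP sketch of the transportation model above. All data is invented, and the free HiGHS solver stands in for Gurobi (which the course uses and which works the same way):

```julia
using JuMP, HiGHS

c = [4 6 8; 5 4 7]    # c[i,j]: unit cost facility i -> customer j (invented)
s = [60, 80]          # supply at facility i
d = [40, 50, 50]      # demand at customer j
m, n = length(s), length(d)

model = Model(HiGHS.Optimizer)
@variable(model, x[1:m, 1:n] >= 0)                          # flow i -> j
@objective(model, Min, sum(c[i, j] * x[i, j] for i in 1:m, j in 1:n))
@constraint(model, [i = 1:m], sum(x[i, :]) <= s[i])         # supply
@constraint(model, [j = 1:n], sum(x[:, j]) == d[j])         # demand
optimize!(model)
println(value.(x), "  cost = ", objective_value(model))
```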

3.3.2 Transshipment problem

  • General formulation of this problem
  1. Variables:
    x_{ij}=flow from facility i to customer j, i=1,...,m; j=1,...,n
    y_{ik}=flow from facility i to distribution center k, i=1,...,m; k=1,...,p
    w_{kj}=flow from distribution center k to customer j, k=1,...,p; j=1,...,n
  2. Parameters:
    e_{ik}=unit transportation cost between facility i and distribution center k
    g_{kj}=unit transportation cost between distribution center k and customer j.
  3. Objective:
    min z=\sum_{i=1}^m \sum_{j=1}^n c_{ij}x_{ij}+\sum_{i=1}^m \sum_{k=1}^{p} e_{ik} y_{ik} + \sum_{k=1}^p \sum_{j=1}^n g_{kj} w_{kj}
  4. Constraints:
    \sum_{j=1}^n x_{ij}+\sum_{k=1}^p y_{ik} \leq s_i, i=1,...,m (supply)
    \sum_{i=1}^{m} y_{ik} - \sum_{j=1}^{n} w_{kj}=0, k=1,...,p (distribution)
    \sum_{i=1}^{m} x_{ij}+\sum_{k=1}^{p} w_{kj}=d_j, j=1,...,n (demand)
    x_{ij}, y_{ik}, w_{kj} \geq 0, \forall i, j, k

3.3.3 Facility location

  1. Variables:
    x_{ij}=flow from facility i to customer j, i=1,...,m; j=1,...,n
    y_i=1 if facility i is used, 0 otherwise
  2. Parameters:
    f_i=fixed cost for facility i
  3. Objective:
    min z=\sum_{i=1}^m \sum_{j=1}^n c_{ij}x_{ij}+\sum_{i=1}^m f_i y_i
  4. Constraints:
    \sum_{j=1}^n x_{ij} \leq s_i y_i, i=1,..., m (supply)
    \sum_{i=1}^m x_{ij}=d_j, j=1,...,n (demand)
    x_{ij} \geq 0, i=1,...,m; j=1,...,n
    y_i \in \{0,1\}, i=1,...,m
    The model is an integer programming model.
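
A minimal JuMP sketch of the facility location model, with invented data; the point is to show the binary y-variables and how the supply constraints couple x to y:

```julia
using JuMP, HiGHS

c = [4 6 8; 5 4 7]      # c[i,j]: unit transport cost (invented)
s = [100, 100]           # capacity of facility i if opened
d = [40, 50, 50]         # customer demands
f = [120.0, 100.0]       # fixed cost of opening facility i
m, n = length(s), length(d)

model = Model(HiGHS.Optimizer)
@variable(model, x[1:m, 1:n] >= 0)           # flow facility i -> customer j
@variable(model, y[1:m], Bin)                 # 1 if facility i is opened
@objective(model, Min, sum(c .* x) + f' * y)
@constraint(model, [i = 1:m], sum(x[i, :]) <= s[i] * y[i])   # supply only if open
@constraint(model, [j = 1:n], sum(x[:, j]) == d[j])          # demand
optimize!(model)
println(value.(y), "  total cost = ", objective_value(model))
```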

3.4 Blending problem

3.4.1 Simple metal blending

3.4.2 Blending for fodder

3.4.3 Pooling problem

The model is a nonlinear model

3.5 Allocation models

3.5.1 Investment strategy

The model is an LP model

3.5.2 Investment over several time periods

  • Pay attention to the illustration of the possible alternatives
  • The model is a LP model

3.5.3 Loading of tankers

The model is a mixed integer programming model because it includes both integer and continuous variables.


Lecture 2 JuMP and Gurobi

  • JuMP is an open-source modelling language for mathematical optimization, and is a package in Julia
  • Gurobi
    1. Optimization software package for solving linear and quadratic optimization problems with continuous and integer variables
    2. Originally based on the simplex method, implemented in C
    3. The primal and dual simplex methods (Lecture 3-5)
    4. The barrier method
    5. Techniques for avoiding degeneracy (Lecture 4)
    6. Generating cutting planes (Lecture 9)
    7. The branch & bound algorithm (Lecture 8)
    8. Heuristic methods (Lecture 10)
  • Solvers that work with JuMP
    1. Gurobi - solves linear and quadratic optimization problems with continuous and integer variables
    2. CPLEX
    3. MOSEK
    4. Clp - free, solves linear optimization problems with continuous variables
    5. Cbc - free, solves linear optimization problems with continuous and integer variables
    6. Baron, Ipopt, Knitro, NLopt, Xpress, etc
  • The diet problem - mathematical model
    1. Sets: J={1,...,5}; I = {1,...,4}
    2. Variables: x_j, j \in J
    3. Parameters: c_j, j \in J
  • The sensitivity analysis as described is possible only for linear programs in continuous variables (not for integer/binary; this is due to theoretical properties (Chapter 5))
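
As a starting point for the computer exercise, a minimal JuMP sketch of the diet problem could look as follows. All numbers are invented; HiGHS is used here as a free solver (with a Gurobi licence, `using Gurobi` and `Gurobi.Optimizer` work the same way), and `shadow_price` illustrates the sensitivity analysis mentioned above:

```julia
using JuMP, HiGHS

c = [2.0, 3.5, 1.5, 4.0, 2.5]    # cost of food j, j in J = 1:5 (invented)
A = [2 1 0 3 1;                   # A[i,j]: content of nutrient i in food j (invented)
     1 2 1 0 2;
     0 1 3 1 1;
     1 0 2 2 1]
b = [8.0, 9.0, 7.0, 6.0]          # required amount of nutrient i, i in I = 1:4

model = Model(HiGHS.Optimizer)
@variable(model, x[1:5] >= 0)                                 # amount of food j to buy
@objective(model, Min, c' * x)
@constraint(model, nutrient[i = 1:4], A[i, :]' * x >= b[i])   # nutrient requirements
optimize!(model)

println("x* = ", value.(x), ", cost = ", objective_value(model))
println("shadow prices = ", shadow_price.(nutrient))          # LP sensitivity (cf. Chapter 5)
```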

Lecture 3 Convexity and linear optimization (Chapter 2.4, 4.1)

2.4 Basic convexity analysis

  • A general optimization problem written as:
    min f(x)
    s.t. x \in X
  • Definition 2.1
    A feasible point x^{(k)} is a global optimum if there is no other feasible point with a better objective function value, i.e.
    f(x^{(k)}) \leq f(x), \forall x \in X \ \text{(minimization problem)}\\ f(x^{(k)}) \geq f(x), \forall x \in X \ \text{(maximization problem)}
  • Definition 2.2
    A feasible point x^{(k)} is a local optimum if in a small neighbourhood N(x^{(k)}) to x^{(k)} there is no other feasible point with a better objective function value, i.e.
    f(x^{(k)}) \leq f(x), \forall x \in X \text{ and } x \in N(x^{(k)}) \ \text{(minimization problem)}\\ f(x^{(k)}) \geq f(x), \forall x \in X \text{ and } x \in N(x^{(k)}) \ \text{(maximization problem)}
  • Definition 2.3
    The problem
    \text{min} \ f(x) \\ \text{s.t.} \ x\in X
    is a convex problem if f(x) is a convex function and X is a convex set. A maximization problem is also convex if the function f(x) is concave and X is a convex set.
  • In a convex problem, each local optimum is also a global optimum.
  • All LP problems are convex, while nonlinear problems are often non-convex
  • Integer programming problems are always non-convex.
  • Definition 2.4
    The function f(x) is a convex function on the feasible region X if for all choices of points x^{(1)}, x^{(2)} \in X and 0 \leq \lambda \leq 1 we have that f(\lambda x^{(1)}+(1-\lambda)x^{(2)} \leq \lambda f(x^{(1)}) + (1-\lambda)f(x^{(2)}))
  • Definition 2.5
    A set X \subseteq R^n is a convex set if for any pair of points x^{(1)}, x^{(2)} \in X and 0 \leq \lambda \leq 1 we have
    x=\lambda x^{(1)} + (1-\lambda)x^{(2)} \in X

Chapter 4 - Linear programming and the simplex method

4.1 Mathematical characteristics of LP problems

  • LP problem in general form
    \text{min} \ z=\sum_{j=1}^{n} c_j x_j \\ \text{s.t.} \sum_{j=1}^n a_{ij} x_j \leq b_i, i=1,...,m\\ x_j \geq 0, j=1,...,n
  • Theorem 4.1: The feasible region of an LP problem is a convex set.
  • Definition 4.1: A convex combination of the points x^{(1)}, x^{(2)}, ... x^{(p)} is a point x such that
    x=\sum_{j=1}^{p} \lambda_j x^{(j)}, \ \sum_{j=1}^{p} \lambda_j =1 \ and \ \lambda_j \geq 0, j=1,...,p
  • Definition 4.2: The point x^{(k)} is an extreme point of X if it is not possible to express x^{(k)} as a strict (0 < \lambda < 1) convex combination of two points in X.
  • Theorem 4.2 (Fundamental theorem of linear programming): Assume that the feasible region X of an LP problem is bounded and non-empty, i.e. there exists a bounded optimal solution to the LP problem. Then the minimum value of the objective function c^T x is obtained in (at least) one extreme point x^{(k)} in X.

Lecture 4 Linear optimization: basic feasible solutions (Chapter 4.2-4.4, 4.8)

4.2 Standard form

  • Standard form has to satisfy the following criteria:
    1. All constraints as equality constraints
    2. Convert all variables to non-negative variables
    3. In addition, constraints are usually formulated so that the right-hand-side (rhs) coefficients are non-negative
  • A slack variable is added to the left-hand side of a \leq-constraint; for a \geq-constraint, a (non-negative) surplus variable is subtracted.
  • The slack variable can be interpreted as unused rhs-resource, and the surplus variable as the amount by which the rhs is exceeded.
  • A free variable x_6 can be replaced by two new non-negative variables according to the substitution x_6=x_6^+ - x_6^-, where x_6^+ \geq 0 and x_6^- \geq 0
  • A general LP problem written in standard form can be formulated as:
    min z=c^T x\\ s.t. Ax=b\\ x \geq 0
  • A system of equations in standard form normally has significantly more variables than equations. (under-determined)
  • With the assumption that the rows in A are linearly independent, the difference between the number of variables (n) and the number of equations (m), n-m, expresses the degrees of freedom that the system of equations has.
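  • A small worked example of the conversion (the numbers are invented): consider
    max \ z = x_1 + x_2\\ s.t. \ x_1 + 2x_2 \leq 10\\ \quad x_1 - x_2 \geq 2\\ \quad x_1 \geq 0, \ x_2 \ \text{free}
    Add a slack variable x_3 to the first constraint, subtract a surplus variable x_4 from the second, and substitute x_2 = x_2^+ - x_2^-:
    max \ z = x_1 + x_2^+ - x_2^-\\ s.t. \ x_1 + 2x_2^+ - 2x_2^- + x_3 = 10\\ \quad x_1 - x_2^+ + x_2^- - x_4 = 2\\ \quad x_1, x_2^+, x_2^-, x_3, x_4 \geq 0
    Here n=5 and m=2, so the system has n-m=3 degrees of freedom.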

4.3 Basic solutions

  • In order to have a mathematical description of the vertices, the basic solutions to the corresponding standard form of the LP problem are introduced.
  • Definition 4.3:
    A basic solution to the system of equations Ax=b is obtained if n-m variables are set to 0 and the remaining variables get their unique values when we solve the remaining (m \times m) system of equations.
    The variables that are set to 0 are called non-basic variables (or zero variables) and the remaining m variables are called basic variables.
    If the basic solution also satisfies all non-negativity conditions, we have a basic feasible solution. Otherwise, we have an infeasible basic solution.
    Every basic feasible solution corresponds to a unique vertex of the feasible region.
    A basic solution in which one or more basic variables have the value zero is called a degenerate basic solution. If a vertex is represented by more than one basic solution, then all these basic solutions are degenerate.
  • Summary of basic solutions - given the standard form for linear optimization problems
    1. Basic solution <-> intersection of constraints
    2. Basic feasible solution <-> a basic solution such that all variable values are \geq 0
    3. Basic infeasible solution <-> basic solution such that at least one variable is < 0
    4. Linearly dependent constraints -> 'missing' basic solution(s)
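
The following sketch makes Definition 4.3 concrete: it enumerates every choice of m basic variables for a small invented system in standard form (2x_1 + x_2 \leq 100, x_1 + x_2 \leq 80 with slacks) and classifies the resulting basic solutions. `Combinatorics` is a registered Julia package providing `combinations`:

```julia
using LinearAlgebra
using Combinatorics   # provides `combinations`

A = [2.0 1 1 0;
     1   1 0 1]
b = [100.0, 80.0]
m, n = size(A)

for basic in combinations(1:n, m)        # every choice of m basic variables
    B = A[:, basic]
    if rank(B) < m                        # linearly dependent columns: 'missing' basic solution
        println("basis $basic is singular")
        continue
    end
    x = zeros(n)
    x[basic] = B \ b                      # non-basic variables stay at 0
    kind = all(x .>= 0) ? "feasible" : "infeasible"
    println("basis $basic: x = $x  ($kind basic solution)")
end
```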

4.4 Change of basis

  • The optimal solution to an LP problem is always found in a vertex (except when the solution is unbounded)
  • Every vertex is represented by one or more basic feasible solutions
  • The strategy of the simplex method is to search among basic feasible solutions (BFS) until the optimal BFS is found.
  • Definition 4.4: Two vertices are adjacent if the set of active constraints in the two points differ in one element only
  • Definition 4.5 Two basic solutions are adjacent if they have m-1 basic variables in common.
  • Change of basis: When we move between adjacent vertices (basic solutions) exactly one basic variable is replaced (which then becomes a non-basic variable) with one other basic variable (which before was non-basic).
  • Entering basic variable: The new basic variable
  • Leaving basic variable: The one that is replaced
  • In a vertex represented by a unique (non-degenerate) basic solution, there are n-m possible feasible search directions
  • If the basic solution is degenerate, it might be so that the change of basis does not lead to a change of vertex. We can interpret this change of basis as a move in the search direction with a zero step length.
  • In simplex method, we strive to make a move (a change of basis) so that the objective function value is improved, i.e., we always try to find a feasible search direction that is an improving search direction. However, in the case of degeneracy, it might be that the objective function value is unchanged after a change of basis.
  • The reduced cost at the point x^{(k)}, when we move in the direction d, is calculated as:
    \overline{c} = \nabla f(x^{(k)})^T d = c^T d
  • To see how the direction d is derived (cf. exercise 4.5 in problem solving session 2), two steps are needed:
    1. Express the basic variables as functions of the non-basic variables
    2. Obtain all possible search directions by reading off the coefficients of the entering variable in the expressions from step 1.
  • Optimality and feasibility condition for minimization in the simplex method
    1. Optimality condition (for minimization)
      The basis B is optimal if c_N^T - c_B^T B^{-1} N \geq 0^{(n-m)} (marginal values = reduced costs \geq 0)
    2. Feasibility condition
      For all i \in B it must hold that x_i = (B^{-1}b)_i - (B^{-1}A_{j^*})_i x_{j^*} \geq 0, which bounds the step length in the entering variable x_{j^*}

4.8 Algebraic description of the simplex method

  • These algebraic expressions are for example useful when doing sensitivity analysis
  • Rewrite the linear optimization problem equivalently as:
    minimize \ z = c_B^T x_B + c_N^T x_N \ (1a)\\ subject \ to \ Bx_B + N x_N = b \ (1b)\\ \qquad x_B \geq 0^m, x_N \geq 0^{n-m} \ (1c)
    From equation (1b): x_B = B^{-1}b - B^{-1} N x_N, then insert this into (1a) and (1c):
    minimize \ z = c_B^T B^{-1}b + (c_N^T - c_B^T B^{-1} N)x_N\\ subject \ to \ B^{-1}b - B^{-1} N x_N \geq 0^m\\ \qquad x_N \geq 0^{n-m}
    1. Each non-basic variable takes the value 0, i.e., x_N = 0
    2. The basic variables take the values x_B = B^{-1}b - B^{-1}N x_N = B^{-1} b
    3. The value of the objective function is z=c_B^T B^{-1} b
    4. The basic solution is feasible if B^{-1}b \geq 0^m
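
These expressions can be evaluated directly. The sketch below computes x_B, z and the reduced costs for the slack basis of the example in Section 4.5, rewritten as a minimization; it is a numerical illustration of the formulas, not a full simplex implementation:

```julia
using LinearAlgebra

# Data from the example in Section 4.5 (max 30x1 + 20x2 -> min -30x1 - 20x2),
# with slack variables x3, x4, x5.
A = [2.0 1 1 0 0;
     1   1 0 1 0;
     1   0 0 0 1]
b = [100.0, 80.0, 40.0]
c = [-30.0, -20.0, 0.0, 0.0, 0.0]

basis, nonbasis = [3, 4, 5], [1, 2]      # start from the slack basis (the origin)
B, N   = A[:, basis], A[:, nonbasis]
cB, cN = c[basis], c[nonbasis]

xB   = B \ b                  # x_B = B^{-1} b   (basic feasible: all components >= 0)
z    = cB' * xB               # z = c_B^T B^{-1} b
cbar = cN' - cB' * (B \ N)    # reduced costs c_N^T - c_B^T B^{-1} N
println("x_B = $xB, z = $z, reduced costs = $cbar")   # negative entries: basis not optimal
```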

Lecture 5 Linear optimization: the simplex method (Chapter 4.5–4.10, (7.1))

4.5 The simplex method - an example

  • Problem expressed in standard form:
    max \ z=30x_1 + 20x_2\\ s.t. \ 2x_1 + x_2 + x_3 = 100\\ \quad x_1 + x_2 + x_4 = 80\\ \quad x_1 + x_5 = 40\\ x_1, x_2, x_3, x_4, x_5 \geq 0
  • For more detailed algorithm of simplex method, please check course problem solving notes
  • Pivot operation

4.6 The simplex method - general algorithm description

  • The simplex method can be summarized in the following way:
    Step 0: Start from a basic feasible solution x^{(0)}. Let k=0.
    Step 1: Determine possible search directions and the reduced costs \overline{c}_j for all non-basic variables.
    Step 2: Check the convergence criterion: The point x^{(k)} is an optimal solution if:
    \overline{c}_j \geq 0, \forall j \ \text{(minimization problem)}\\ \overline{c}_j \leq 0, \forall j \ \text{(maximization problem)}
    for all non-basic variables
    Step 3: Determine the entering basic variable according to the criteria
    \overline{c}_p = \min_{j} \{\overline{c}_j | \overline{c}_j < 0 \} \ \text{(minimization problem)}\\ \overline{c}_p = \max_{j} \{\overline{c}_j | \overline{c}_j > 0 \} \ \text{(maximization problem)}
    which gives the entering basic variable x_p and the search direction d^{(k)}
    Step 4: Determine a step length by
    t^{(k)} = \frac{x_r^{(k)}}{-d_r^{(k)}}= \min_j \left ( \frac{x_j^{(k)}}{-d_j^{(k)}} | d_j^{(k)} < 0 \right )
    x_r becomes the leaving basic variable. The step length is calculated in the same way for both minimization and maximization problems.
    Note: If all components in d^{(k)} are non-negative, the problem is unbounded.
    Step 5: The new point is given by x^{(k+1)}=x^{(k)}+t^{(k)}d^{(k)} and in the new basic solution x_r is replaced with x_p. Update k:=k+1 and go to Step 1.

4.7 Using simplex tableaux

4.8 Algebraic description of the simplex method

  • See Lecture 4

4.9 Finding a basic feasible solution

  • Two-phase method
    1. Phase 1 - If no basic feasible solution to the given set of constraints is known, introduce artificial variables a_i. The goal of Phase 1 is to find a basic solution where no artificial variables are in the basis, i.e. they should all have value 0. We try to reach that goal by minimizing an artificial objective function defined as the sum of all artificial variables, e.g. min \ w=a_1+a_3. The interpretation of this objective function is that we try to minimize the total infeasibility of the solution.
      Two cases can occur:
      • If the optimal solution has the value w^* = 0, we have reached the goal of finding a basic solution in which all artificial variables have the value 0. The resulting basic solution is optimal in the Phase 1 problem and feasible in the original problem. Start Phase 2.
      • If the optimal solution has a value w^* > 0, the conclusion is that the original problem has no feasible solution. What to do then? Check for modelling errors, or is the problem infeasible from the data (e.g. demand > supply)?
    2. Phase 2

4.10 Convergence and degeneration

  • Definition 4.6: A basic solution is degenerate if (at least) one basic variable has the value 0.

Lecture 6 - Duality in linear optimization models (Chapter 6 (7.3, 7.5))

6.1 Deriving and interpreting the dual problem

  • There are two different ways of deriving the dual problem (maximization problem)
    1. Formulating an LP problem that determines how the combination of the constraints should be carried out in order to get the best possible (smallest) upper bound on z^*
      Defining the following variables:
      v_1 = weight of constraint 1
      v_2 = weight of constraint 2
      Rather than guessing, compute the optimal weights \Rightarrow minimize the value of the estimate: \min w=3600v_1 + 5400 v_2
    2. Formulating an LP problem that determines the prices which will be set on the market is called the market problem.
      Defining the following variables:
      v_1 = price per minute for sawing
      v_2 = price per minute for gluing
      The market wishes to minimize the payment \min w=3600v_1 + 5400 v_2
  • Primal problem <=> Dual problem
  • One can always interpret the dual variables as shadow prices or marginal prices as we have seen in connection with the discussion about sensitivity analysis.

6.2 Formulation of the dual problem

  • a variable is in normal form if it is defined as non-negative (x_j \geq 0 and v_i \geq 0 respectively)
  • A constraint is in normal form if it is a \leq-constraint in a maximization problem and a \geq-constraint in a minimization problem, respectively.
  • If all variables and constraints in the primal problem are in normal form, then all constraints and variables in the dual problem will also be in normal form.


    Rules for constructing the dual problem

6.3 Duality theory

  • Primal (P) in normal form:
    max \ z = c^T x\\ s.t. \ Ax \leq b\\ \quad x \geq 0
  • Dual (D) in normal form:
    min \ w = b^T v\\ s.t. \ A^T v \geq c\\ \quad v \geq 0
  • Theorem 6.1 (weak duality)
    Assume \hat{x} is a feasible solution to (P) and \hat{v} is a feasible solution to (D). We then know that:
    c^T \hat{x} \leq b^T \hat{v}
  • Theorem 6.2 - If \hat{x} and \hat{v} are feasible solutions in (P) and (D) respectively and c^T \hat{x} = b^T \hat{v}, we know that \hat{x} and \hat{v} are optimal solution to respective problem.
  • Theorem 6.3 (Strong duality)
    If either (P) or (D) has a bounded optimal solution, the same holds for the other problem, and the two objective function values are equal to optimum, i.e.
    c^T \hat{x} = b^T \hat{v}
  • Alternative formulation of problem (P) which is:
    max \ z = c_B^T x_B + c_N^T x _N\\ s.t. \ Bx_B + Nx_N = b\\ \quad x_B, x_N \geq 0
    The corresponding dual problem (D) can be formulated:
    min \ w = b^T v\\ s.t. \ B^T v \geq c_B\\ \quad N^T v \geq c_N\\ \quad v \ free
  • Theorem 6.4 (Dual theorem)
    Assume that x_B = B^{-1} b is an optimal basic solution to (P). Then \overline{v}^T = c_B^T B^{-1} is an optimal solution to (D) and z^* = w^*
  • If a constraint is not active in optimum, the corresponding shadow price will have the value zero.
  • The general relation between a primal constraint i and the corresponding dual variable v_i is given by the complementary slackness constraint:
    v_i (b_i - \sum_{j=1}^n a_{ij} x_j) = 0
    either the dual variable has value zero, or the slack variable has value zero (the constraint is active). It can of course also be the case that both variables in a pair have the value zero.
  • For the relation between a primal variable x_j and the corresponding dual constraint j, we can formulate a complementary slackness constraint according to:
    x_j (\sum_{i=1}^m a_{ij} v_i -c_j) = 0
    saying that either the primal variable has value zero or the corresponding dual constraint is active (holds with equality).
  • Theorem 6.5 (Complementary slackness theorem)
    Assume \hat{x} and \hat{v} are feasible solutions in (P) and (D) respectively. Then \hat{x} and \hat{v} are optimal solutions to respective problem if and only if the complementary slackness constraints
    \hat{v}^T(b-A \hat{x}) = 0 \quad and \quad \hat{x}^T (A^T \hat{v} - c) = 0
    are satisfied.
    Relations between primal and dual optimal solutions
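
A quick numerical check of strong duality and complementary slackness in JuMP, on an invented two-variable problem (the right-hand sides 3600 and 5400 echo the sawing/gluing example in Section 6.1); JuMP's `shadow_price` returns the dual variables with the shadow-price sign convention used in these notes:

```julia
using JuMP, HiGHS

model = Model(HiGHS.Optimizer)
@variable(model, x[1:2] >= 0)
@objective(model, Max, 30x[1] + 20x[2])
@constraint(model, con1, 2x[1] +  x[2] <= 3600)
@constraint(model, con2,  x[1] + 3x[2] <= 5400)
optimize!(model)

v = shadow_price.([con1, con2])                 # dual variables / shadow prices
z = objective_value(model)
w = 3600v[1] + 5400v[2]
println("z* = $z, w* = $w")                      # equal by strong duality (Theorem 6.3)
println(v[1] * (3600 - 2value(x[1]) - value(x[2])))   # complementary slackness: = 0
```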

Lecture 7 - Linear optimization: post-optimal and sensitivity analysis (Chapter 5.1–5.5, (5.6))

5.1 Sensitivity analysis and an example

  • Sensitivity analysis consists of two aspects:
    1. How does the optimum change when the right-hand sides change?
    2. How does the optimum change when the objective coefficients change?
  • To what extent one can draw correct conclusions from the optimization if the input data to the model is uncertain.
  • To investigate how sensitive the optimal solution is for changes in input data and other assumptions, i.e. to be able to analyze how robust the solution is.
  • By "change" we mean both an increase and a decrease of the coefficient.

5.2 Relaxation and restriction

  • Definition 5.1: A relaxation is a modification of a problem instance that leads to a larger feasible set (increases the number of feasible solutions), or alternatively leaves the feasible set unchanged.

  • If a problem is relaxed, the optimal objective function value can never become worse. In a max problem the optimal value increases or stays the same, while in a min problem it decreases or stays the same.

  • Definition 5.2: A restriction is a modification of a problem instance that leads to a smaller feasible set (decreases the number of feasible solutions) or alternatively that the feasible set remains unchanged.

  • Examples of relaxation and restriction

    1. Change of a right-hand-side coefficient b_i

       Constraint type   Increase of b_i   Decrease of b_i
       \leq              Relaxation        Restriction
       \geq              Restriction       Relaxation

    2. Change of a constraint coefficient a_{ij} in the left-hand-side of a constraint

       Constraint type   Increase of a_{ij}   Decrease of a_{ij}
       \leq              Restriction          Relaxation
       \geq              Relaxation           Restriction

    3. Change of an objective function coefficient c_j

       Problem type   Increase of c_j           Decrease of c_j
       Max problem    z^* unchanged or larger   z^* unchanged or lower
       Min problem    z^* unchanged or larger   z^* unchanged or lower

    4. Add or remove a constraint or a variable
       Adding a constraint: Restriction
       Removing a constraint: Relaxation
       Adding a variable: Relaxation
       Removing a variable: Restriction

5.3 Shadow prices and the connection to duality

  • Definition 5.3: The shadow price for a constraint is given by the change in objective function value when making a marginal increase of the right-hand-side.
    Note: Other names of shadow prices are dual price, dual value or marginal price
  • Sign restrictions on the dual variable v_i

    Primal problem   \leq-constraint   \geq-constraint   =-constraint
    Min problem      v_i \leq 0        v_i \geq 0        v_i free
    Max problem      v_i \geq 0        v_i \leq 0        v_i free
  • Only constraints that are active at the optimum can affect the optimal objective function value when we change the right-hand-side. The values of the dual variables corresponding to non-active constraints are thus zero at the optimum.
  • Shadow price v_i is actually a derivative v_i = \frac{\partial z}{\partial b_i}
  • The shadow price is only constant in a certain neighbourhood of the original value of the right-hand-side coefficient and it is very important to calculate the interval in which the shadow price is valid (unchanged).
  • Figure 1: The effect of changing the right-hand-side, showing the relation between z^* and b_i
  • Figure 2: The effect of a change in the objective function, showing the relation between z^* and c_j

5.4 Interpretation of output from a software package

  • Some examples of analysis

5.5 Algebraic analysis of changes

  • Changing a right-hand-side coefficient b_i affects only the feasibility of the current basis, not the reduced costs.

    Changes in the right-hand-side coefficients

  • Changing an objective function coefficient (c_B or c_N) affects the reduced costs, but not the feasible region. Thus, the current basic solution will still be feasible.

    Changes in the objective coefficients

  • New variables

  • New constraints

  • More about sensitivity analysis in linear optimization

    1. The current optimal solution may become infeasible due to
      1. Change in the right-hand-side, or
      2. Addition of constraints \Rightarrow new basis matrix: (m+1) \times (m+1)
    2. The current optimal solution may become non-optimal due to
      1. Changes in the objective coefficients, or
      2. Addition of variables (new opportunities may be favourable)
    3. Several coefficients may change simultaneously \Rightarrow may affect both feasibility and optimality
    4. Constraint coefficients (i.e., elements of A) may change, which may change the basis matrix, and then also B^{-1} \Rightarrow may affect both feasibility and optimality.

Lecture 8 - Integer linear optimization: modelling, applications, complexity (Chapter 13, 2.6)

13.1 What is an integer programming problem

  • An integer programming problem is a problem where one or several variables are restricted to integer values (discrete variables)
  • LP relaxation of the integer programming problem
  • Many integer programming problems have an underlying network structure and can also be formulated as network problems.
  • Integer linear programming (ILP) uses integer variables (x_{ij} \in \mathbb{Z}). An ILP (or MILP) possesses linear constraints and integer requirements on the variables

13.2 Knapsack problems

  • Objective function c_j benefit of choosing object j, j=1,...,n
  • Budget constraint
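
A minimal JuMP sketch of the binary knapsack model with invented data:

```julia
using JuMP, HiGHS

c = [10, 7, 5, 9]       # benefit of object j (invented)
a = [4, 3, 2, 5]        # weight of object j
b = 9                   # budget / knapsack capacity

model = Model(HiGHS.Optimizer)
@variable(model, x[1:4], Bin)          # x_j = 1 if object j is chosen
@objective(model, Max, c' * x)
@constraint(model, a' * x <= b)        # budget constraint
optimize!(model)
println("choose objects ", findall(>(0.5), value.(x)))
```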

13.3 Facility location

13.4 Network design

13.5 Assignment problem

  • Assign each task to one resource, and each resource to one task

13.6 Generalized assignment problem

13.7 Matching problem

13.8 Set partitioning, set covering and set packing problems

  • Mathematical formulation (set covering shown here; set partitioning has Ax = 1, and set packing has Ax \leq 1 with a max objective)
    \begin{aligned} &\min &c^T x\\ &\text{s.t.} &Ax \geq 1\\ & &x \in \{0,1\} \end{aligned}

13.9 Sequencing problem

13.10 Traveling salesman problem

13.11 Route planning

2.6 Basic complexity theory

  • Algorithm complexity: Estimating how the computational time increases as the problem size increases.
  • Problem complexity: Classifying how easy or difficult various problem classes are to solve.

2.6.1 Algorithm complexity

  • A problem means an optimization problem here.
  • The number of computations in an algorithm is proportional to the computation time.
  • For an LP problem, the size can be expressed by the number of variables and the number of equations in the problem together with a parameter that states how many bits are required to represent all numerical input data.
  • The ordo function O is used to express an upper bound on the number of steps. For two functions f(t) and g(t), we say that f(t)=O(g(t)) if there exists a constant c such that for sufficiently large t we have f(t) \leq cg(t). If the function g(t) is a polynomial function of the problem size t, we say that the algorithm has polynomial complexity. Such problems are regarded as "easy" to solve. If the function g(t) is an exponential function of the problem size t (e.g., 2^t), we say that the algorithm has exponential complexity.

2.6.2 Problem complexity

  • When we analyze and classify optimization problems with respect to their problem complexity, we study a transformation of the problem called the decision problem.
  • Optimization problems whose decision problem can be solved with a polynomial algorithm are called "easy" problems, and this class of problems is denoted P.
  • A larger class of problems, which includes the class P is denoted NP from "nondeterministic polynomial".
  • The most difficult decision problems in the class NP are called NP-complete (the corresponding optimization problems are called NP-hard).

Lecture 9 - Integer linear optimization: relaxation, bounds, branch & bound algorithm (Chapter 14.1-14.3, 15.1-15.3)

14.1 Overview of methods

  • For IP problems there is no property similar to the one exploited by the simplex method (an optimum in an extreme point, reachable via improving moves between adjacent vertices)
  • An IP problem is non-convex, and it is not possible to compute a derivative, hence no gradient. It is also difficult to conclude whether a locally optimal solution is also globally optimal
  • The most important strategies for solving IP problems are:
    1. Enumeration methods (implicit enumeration, Branch and Bound methods)
    2. Relaxation and decomposition methods (LP relaxation and Lagrangian relaxation)
    3. Cutting plane methods (Repeatedly solving the LP relaxation of the IP problem)
    4. Heuristics (a class of methods that find good feasible solutions without guaranteeing optimality)
  • The difficulty to solve IP problem depends on two main factors: the number of integer variables and the problem structure.
  • For LP problems, the number of constraints is much more important for the solution time.

14.2 Optimality and relaxations

  • For LP and nonlinear problems it is possible to formulate optimality conditions (the KKT conditions). These conditions, together with convexity analysis, can be used to identify and prove that a solution is a local or global optimum. For IP problems, however, there are no corresponding optimality conditions, so other techniques are needed.
  • In a minimization problem,
    1. The optimistic bound (also called dual bound) provides a so-called lower bound (LBD) (\underline{z} \leq z^*), obtained from a relaxation of the ILP. Relaxation examples are:
      1. Remove the integer constraints and solve the LP relaxation.
      2. Make the feasible region larger by removing one or several of the constraints. (constraint relaxation or combinatorial relaxation)
      3. Make a Lagrangian relaxation, which removes a set of constraints and at the same time changes the objective function by introducing corresponding terms based on Lagrangian multipliers.
    2. The pessimistic bound (also called primal bound) provides an upper bound (UBD) (\overline{z} \geq z^*), obtained from any feasible solution to the ILP.
  • The following properties are valid:
    1. A relaxation always provides a better (or equally good) objective function value.
    2. If a relaxation of the IP problem has no feasible solution, then the IP problem has no feasible solution.
    3. If the optimal solution x_R^* of the relaxation is feasible in IP, then x_{IP}^*=x_{R}^* and z_{IP}^*=z_{R}^*. (This may not hold for Lagrangian relaxation.)
    4. Each feasible solution \overline{x} to IP, found e.g. by modifying x_R^*, provides a pessimistic bound on z_{IP}^*.

14.3 To choose the right model

  • Many solution methods for integer programming are based on solving a sequence of LP problems whose optimal objective function values gradually reach the optimal objective function value for the IP problem. If we can use a strong formulation, we typically can make such methods more effective.
  • Definition 14.1: A point y is a convex combination of a set of points x^{(k)}, \ k \in K if y=\sum_{k \in K} \lambda_k x^{(k)}, where \sum_{k \in K} \lambda_k = 1 and \lambda_k \geq 0, \ \forall k \in K.
  • Definition 14.2: The convex hull of a set of points x^{(k)}, \ k \in K, consists of all possible convex combinations of the points.
    Note: The convex hull is the smallest convex set that contains all feasible integer points.
  • The smaller the difference between the optimal objective function value of the IP problem and the optimal value of its LP relaxation, the better (or stronger) the model.
  • The ideal linear program has integer extreme points.
  • In general, it is not computationally practical to find an ideal formulation (the convex hull) of an IP problem. However, the difference between the feasible region (in particular near the optimal solution) and the convex hull can be reduced in two ways:
    1. Initially choose a strong formulation with respect to the LP relaxation of the problem
    2. Adding problem specific valid inequalities

15.1 Introduction

  • In Branch and Bound (B&B), an optimistic bound and a pessimistic bound of the optimal objective function value are generated.
  • Search tree (or B&B tree) consists of nodes representing subproblems and edges representing the new constraints added to define new subproblems.

15.2 General strategy for branch and bound

  • Relax
    The optimal solution of a relaxed version of the problem provides an optimistic bound of the optimal objective function value. Two alternatives to relax the original problem:
    1. LP relaxation
    2. Lagrangian relaxation
  • Branch
    Exclude the solution to a relaxation if it is not feasible in ILP \Leftrightarrow a partitioning of the feasible set.
  • Prune (cut off a node of the tree when its subproblem cannot contain a better solution than the incumbent)
  • Search
    1. Depth first
    2. Breadth first
    3. Best first
  • A termination criterion that expresses the relative deviation between the best pessimistic bound (UBD) and the best optimistic bound (LBD) can be defined as:
    \frac{UBD-LBD}{LBD} < \epsilon
    where a typical parameter value of \epsilon is 0.01.

15.3 The Land-Doig-Dakin algorithm

  • The most common relaxation used in branch and bound is the LP relaxation; solve this LP relaxation first.
  • Land-Doig-Dakin algorithm is a branch and bound algorithm based on LP relaxation
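
The following compact sketch shows the Land-Doig-Dakin idea with depth-first search: solve the LP relaxation of a node, prune by infeasibility or by bound, and otherwise branch on a fractional variable. The data, the finite bound U and the tolerances are invented simplifications:

```julia
using JuMP, HiGHS

# max c'x s.t. Ax <= b, 0 <= x <= U, x integer
function land_doig_dakin(c, A, b; U = 1.0e6)       # U: large finite bound, for simplicity
    n = length(c)
    best_z, best_x = -Inf, nothing
    stack = [(zeros(n), fill(U, n))]                # nodes = (lower bounds, upper bounds)
    while !isempty(stack)
        lb, ub = pop!(stack)                        # depth-first search
        m = Model(HiGHS.Optimizer); set_silent(m)
        @variable(m, lb[j] <= x[j = 1:n] <= ub[j])
        @objective(m, Max, c' * x)
        @constraint(m, A * x .<= b)
        optimize!(m)
        termination_status(m) == MOI.OPTIMAL || continue     # prune: node infeasible
        z, xv = objective_value(m), value.(x)
        z <= best_z + 1e-9 && continue              # prune: optimistic bound no better than incumbent
        j = findfirst(v -> abs(v - round(v)) > 1e-6, xv)
        if j === nothing
            best_z, best_x = z, xv                  # integer solution: new pessimistic bound
        else
            v = xv[j]
            push!(stack, (lb, setindex!(copy(ub), floor(v), j)))   # branch x_j <= floor(v)
            push!(stack, (setindex!(copy(lb), ceil(v), j), ub))    # branch x_j >= ceil(v)
        end
    end
    return best_z, best_x
end

println(land_doig_dakin([5.0, 4.0], [6.0 4.0; 1.0 2.0], [24.0, 6.0]))
```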

Lecture 10 - Integer linear optimization theory and algorithms: good/ideal formulation, cutting planes, Lagrangean duality and relaxation (Chapter 14.4-14.5, 17.1-17.2, (17.3-17.4), 13.10-13.11, 15.4, (15.5))

14.4 Valid inequalities

  • Definition 14.3: A valid inequality is a linear inequality
    \sum_{j=1}^{n} a_j x_j \leq b
    that is satisfied for all x \in X_{IP}. If we add valid inequalities to the LP relaxation of an IP problem, we will get a better approximation of the convex hull.
  • Definition 14.4: If \sum_{j=1}^{n} a_j x_j \leq b is a valid inequality to X_{IP}, then the set
    F = \{ x:x \in X_C, \sum_{j=1}^n a_j x_j = b\}
    is the face of X_C created by the valid inequality.
  • Definition 14.5: A face F of a convex hull X_C with dimension dim(X_C)=n, is called a facet if dim(F)=n-1.

14.5 Cutting plane methods

  • A general cutting plane algorithm (p378)
    1. Solve the linear programming (LP) relaxation
    2. If the LP solution is integer: stop; an optimal solution to the ILP is found
    3. Add one or several valid inequalities that cut off the fractional solution but none of the integer solutions
    4. Resolve the new problem and go to step 2
  • The cuts (or constraints) added in a cutting plane method must be valid inequalities. Otherwise, one or several feasible integer points may be cut away.
  • A valid inequality must cut away parts of the feasible region to the LP relaxation (hence the name cutting plane)
  • Often only one valid inequality is added in each iteration (after solving each LP relaxation) in cutting plane methods.
  • Methods for generating valid inequalities
    1. Chvatal-Gomory cuts (combine constraints, make beneficial roundings of LHS and RHS)
    2. Gomory's method: generate cut from an optimal simplex basis (Ch. 14.5.1)
  • Pure cutting plane algorithms are usually less efficient than branch-&-bound
  • In commercial solvers (e.g. CPLEX), cuts are used to help (presolve) the branch-&-bound algorithm
  • For problems with specific structures (e.g. TSP and set covering) problem specific classes of cuts are used.

14.5.1 Gomory's method

  • Gomory's method was one of the first cutting plane methods developed (Gomory's method \in cutting plane methods).
  • One requirement needed for generating cuts is that all coefficients in the problem are integers.
  • Consider the equation where x_i is a basic variable, N is the set of non-basic variables, and the right-hand side \overline{b}_i is non-integer:
    x_i + \sum_{j \in N} \overline{a}_{ij} x_j = \overline{b}_i
    Splitting the coefficients into integer and fractional parts, \overline{a}_{ij} = \lfloor \overline{a}_{ij} \rfloor + f_{ij} and \overline{b}_i = \lfloor \overline{b}_i \rfloor + f_i, this can be rewritten as:
    x_i - \lfloor \overline{b}_i \rfloor + \sum_{j \in N} \lfloor \overline{a}_{ij} \rfloor x_j = f_i - \sum_{j \in N} f_{ij} x_j
  • The following constraint is a Gomory cut; it is called the integer cut since it is expressed in the integer parts of the equation above:
    x_i + \sum_{j \in N} \lfloor \overline{a}_{ij} \rfloor x_j \leq \lfloor \overline{b}_i \rfloor
  • The constraint is the fractional cut:
    f_i - \sum_{j \in N} f_{ij} x_j \leq 0
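  • A small worked example (numbers invented): suppose the optimal tableau contains the row
    x_1 + 0.5x_3 + 1.25x_4 = 3.75
    with x_3, x_4 non-basic. The fractional parts are f_{13}=0.5, f_{14}=0.25 and f_1=0.75, so the fractional cut is
    0.75 - 0.5x_3 - 0.25x_4 \leq 0
    and the integer cut is x_1 + x_4 \leq 3. Both cut off the current LP point (x_3 = x_4 = 0, x_1 = 3.75) but no integer solutions.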

14.5.2 Problem specific cuts: cover inequalities

  • Definition 14.6: The set S is a cover if \sum_{j \in S} a_j > b. The set S is a minimal cover if, for each k \in S, the set S \setminus \{k\} is not a cover, i.e. \sum_{j \in S} a_j - a_k \leq b
  • Theorem 14.1: If S is a minimal cover, then the constraint \sum_{j \in S} x_j \leq |S| - 1 is a valid inequality for X.
  • Theorem 14.2: If S is a cover of X, then the augmented cover constraint
    \sum_{j \in E(S)} x_j \leq |S| - 1
    is a valid inequality if E(S)= S \cup \{ j: a_j \geq a_i, \forall i \in S\}
  • Theorem 14.3: If \alpha \geq 1, then x_{LP} satisfies all cover inequalities.
    If \alpha < 1, then the inequality
    \sum_{j: z_j^*=1} x_j \leq \sum_j z_j^* -1
    cuts away the fractional solution x_{LP} with the proportion 1-\alpha. This gives a measure of the strength of an inequality.

17.1 Lagrangian duality

  • General formulation of an optimization problem
    min \ f(x)\\ s.t. \ g_i(x) \geq b_i, \ i=1,...,m
    The functions f(x) and g_i(x), i=1,...,m, are general nonlinear functions and may also be non-convex.
    The Lagrangian function for problem is:
    L(x,v) = f(x) + \sum_{i=1}^m v_i (b_i-g_i(x))
    where v_i \geq 0, i = 1,..., m are Lagrangian multipliers (or dual variables)
  • (Lagrange) dual function:
    h(v) = \min_x L(x,v)
  • (Lagrange) dual problem:
    \max_{v \geq 0} h(v)
  • Theorem 17.1 (Weak duality)
    Suppose that x^* is an optimal solution to (P). For each v \geq 0 we have h(v) \leq f(x^*); the difference f(x^*) - h(v) is called the duality gap.
  • Theorem 17.2: The dual function h(v) is concave.

17.2 Lagrangian relaxation

  • If the original problem is convex, we can (under mild additional conditions) find solutions x^* and v^* where h(v^*) = f(x^*)
  • For a non-convex problem, we often have a duality gap: h(v^*) < f(x^*)
  • In a minimization problem, Lagrangian relaxation provides an optimistic estimate of z^*. The Lagrangian relaxation bound is never worse than the linear programming relaxation bound: z^{LP} \leq h^* \leq z^*
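
A small sketch of how the Lagrangian dual can be attacked with subgradient steps, for an invented binary knapsack max c^T x s.t. a^T x \leq b, where the single budget constraint is relaxed. The inner maximization is solved by inspection, and the step lengths 1/k are a common but arbitrary choice:

```julia
c = [10.0, 7.0, 5.0, 9.0]   # benefits (invented)
a = [4.0, 3.0, 2.0, 5.0]    # weights
b = 9.0                     # budget

# Evaluate the dual function h(v) = max_{x in {0,1}^n} (c - v*a)'x + v*b,
# together with a subgradient b - a'x of h at v.
function dual_eval(v)
    x = [c[j] - v * a[j] > 0 ? 1.0 : 0.0 for j in eachindex(c)]
    h = (c .- v .* a)' * x + v * b
    return h, b - a' * x
end

# Minimize the (convex, piecewise linear) dual function over v >= 0
# by projected subgradient steps with diminishing step lengths.
function subgradient_dual(iters = 100)
    v = 0.0
    for k in 1:iters
        _, g = dual_eval(v)
        v = max(0.0, v - (1.0 / k) * g)    # step, then project onto v >= 0
    end
    return v, dual_eval(v)[1]
end

v_star, h_star = subgradient_dual()
println("v* ≈ $v_star, optimistic bound h(v*) ≈ $h_star")  # h(v*) >= z* for this max problem
```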

17.4 Applications

  • There are two ways to generate an optimistic bound of the optimal objective function value.
    1. Relax the integer requirements and solve the LP relaxation.
    2. Relax one of the linear constraint by using Lagrangian relaxation.
  • The general experience from solving integer programming problems is that the optimistic bound obtained from Lagrangian relaxation is at least as strong as the bound from the LP relaxation:
    z_{LP}^* \leq h(v^*) \leq z^*
    Where z_{LP}^* is the optimal objective function value to the LP relaxation.
  • If the set X has the integrality property (i.e., X^{LP} has integer extreme points), then h^* = z_{LP}^*

13.10 Traveling salesman problem (TSP)

  • Many applications can be formulated as a traveling salesman problem
  • The problem is easy to model but difficult to solve
  • Two types of problem:
    1. Symmetric version of the problem: only one arc is needed.
    2. Asymmetric version of the problem: the arc cost between two nodes depends on the direction

13.11 Route planning

  • Vehicle Routing Problem (VRP)

15.4 Branch and bound methods for structured problems

15.4.1 Land-Doig-Dakin for the knapsack problem

15.4.2 Branch and bound for the traveling salesman problem

15.4.3 Branch and bound for the job scheduling problem


Lecture 11 - Integer linear optimization: (non-) convexity, local/global optimality, heuristics and approximation algorithms (Chapter 16, 8.3)

Local vs. global optima
  • Non-convex optimization problems do not fulfil strong duality \Rightarrow (efficient) algorithms cannot in general be constructed to fulfil optimality conditions

16.1 Introduction

  • Each heuristic is adapted to fit the specific problem structure.
  • Heuristics can also be used as components of more advanced methods to find feasible solutions, for example within branch and bound or Lagrangian relaxation.

16.2 Combinatorial formulations

16.3 Constructive heuristics

  • Greedy algorithms

16.3.1 Heuristics for the traveling salesman problem

  • Nearest neighbor algorithm
  • Nearest addition algorithm
  • Nearest merger algorithm
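
A sketch of the nearest neighbour algorithm on an invented symmetric distance matrix:

```julia
D = [0.0  2  9 10;
     2  0.0  6  4;
     9  6  0.0  3;
    10  4  3  0.0]    # symmetric distances (invented)

function nearest_neighbour(D, start = 1)
    n = size(D, 1)
    tour, unvisited = [start], Set(setdiff(1:n, start))
    while !isempty(unvisited)
        cur = tour[end]
        nxt = argmin(j -> D[cur, j], unvisited)   # closest unvisited city
        push!(tour, nxt)
        delete!(unvisited, nxt)
    end
    len = sum(D[tour[i], tour[mod1(i + 1, n)]] for i in 1:n)  # close the tour
    return tour, len
end

println(nearest_neighbour(D))
```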

16.3.2 Heuristics for matching problems

  • Greedy matching algorithm

16.3.3 Heuristics for the set covering problem

  • Greedy covering algorithm

16.3.4 Heuristics for the vehicle routing problem

  • Clarke & Wright algorithm
  • Sweep algorithm

16.4 Local search

  • Define a neighbourhood N(x) which consists of all solutions that in some sense are "close" to x.
  • Each solution y \in N(x) is called a neighbour of x.

16.5 Metaheuristics

16.5.1 Tabu search

16.5.2 Simulated annealing

16.5.3 Metaheuristics and infeasible solutions

16.6 Approximation algorithms

16.6.1 Spanning tree heuristic for the traveling salesman problem

16.6.2 Christofides' heuristic for the traveling salesman problem

8.3 Minimum spanning tree

  • Select a subset of the arcs in the graph so that they constitute a tree spanning all nodes and where the sum of the arc costs in the tree is minimal.
  • The optimality conditions for the problem can be formulated as:
    A tree T is a minimum spanning tree if
    1. for every arc (i,j) \in \overline{T} we have that c_{ij} \geq c_{kl} for all arcs (k,l) included in the unique path in T between i and j.
    2. for every arc (i,j) \in T we have that c_{ij} \leq c_{kl} for all arcs (k,l) included in the cut [S, \overline{S}] obtained if arc (i,j) is removed.
  • The cardinality of the selected subset of arcs should be n-1, where n is the number of nodes in the graph.
  • Two algorithms for determining a minimum spanning tree:
    1. Kruskal's algorithm is based on the first group of conditions
    2. Prim's algorithm is based on the second group of conditions
  • Kruskal's and Prim's algorithms both belong to the class of algorithms known as greedy algorithms.
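
A sketch of Kruskal's algorithm with a simple union-find structure; the edge list (cost, i, j) is invented:

```julia
function kruskal(n, edges)
    parent = collect(1:n)
    function find(i)                         # union-find root with path halving
        while parent[i] != i
            parent[i] = parent[parent[i]]
            i = parent[i]
        end
        return i
    end
    tree, total = Tuple{Int,Int}[], 0
    for (w, i, j) in sort(edges)             # arcs in increasing cost order (condition 1)
        ri, rj = find(i), find(j)
        if ri != rj                           # accept the arc only if it creates no cycle
            parent[ri] = rj
            push!(tree, (i, j)); total += w
        end
    end
    return tree, total                        # |tree| = n - 1 if the graph is connected
end

println(kruskal(4, [(1,1,2), (3,1,3), (4,2,3), (2,3,4), (5,2,4)]))
```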

Lecture 12 - Network flows: linear optimization formulations (Chapter 8.1-8.2, 8.4, (8.5), 18.1-18.5, (18.6-18.7), 13.5)

8.1 Network optimization problem preview

  • A network consists of nodes and arcs.
  • Network flow problems can be divided into two classes:
    1. Minimum cost network flow problem (LP problems and considered to be easy to solve. Our focus in this Lecture)
    2. Combinatorial problem

8.2 Basic concepts

  • A graph G=(N,B) is defined by a set of nodes N=\{ n_1, n_2, ...\}, a set of arcs B=\{b_1, b_2, ...\} and a relation (mapping) between the nodes and arcs.
  • A directed arc only allows flow in the specified direction. An undirected arc (also called a link) allows flow in both directions and can be interpreted as two (opposite) directed arcs.
  • Undirected graph: Only need the node numbers
  • Directed graph: with m nodes and n arcs we can define a so-called node-arc incidence matrix A. Each row in the matrix corresponds to one node, and each column corresponds to one arc and includes exactly two non-zero elements: one -1 and one +1.
  • Node has in-degree and out-degree
  • A graph is bipartite if we can partition the node set into two subsets N_1 and N_2 such that for each arc b_j=(n_i, n_k) \in B either n_i \in N_1 and n_k \in N_2 or n_i \in N_2 and n_k \in N_1.
  • A cut is a partition of the node set into two subsets, S and \overline{S} = N \backslash S
  • A path (sometimes called a route) in a directed graph is a connected sequence of arcs from one node to another, where the starting node of each arc is the termination node of the preceding arc in the sequence.
  • If we disregard the direction of the arcs, we talk about a chain.
  • A flow problem is defined by specifying for each node n_i the total net flow to/from the node. The net flow (also called node strength) is defined as
    d_i = inflow - outflow = \sum_{j:T(b_j)=n_i} x_j - \sum_{j:S(b_j)=n_i} x_j
    If d_i is negative, we say that node n_i is a source (or supply node), and if d_i is positive we say that node n_i is a sink (or demand node). A node where d_i=0 is called a transshipment node.

8.4 Shortest path problems

8.4.1 Problem description and Bellman's equations

  • When solving a shortest path problem we assume that our network satisfies the following conditions:
    1. All arcs are directed
    2. Node n_t can be reached from node n_s
    3. There are no cycles with negative costs
  • Bellman's equations:
    y_s = 0\\ y_j = \min_{i:(i,j) \in B} \{y_i + c_{ij}\}, j = 2,..., n

8.4.2 An algorithm for the shortest-path problem

  • Theorem 8.1: Dijkstra's algorithm will give an optimal solution to the problem of finding a shortest path in a network with non-negative arc costs only.
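
A minimal sketch of Dijkstra's algorithm on an adjacency-list graph with non-negative arc costs (the assumption in Theorem 8.1); the graph is invented:

```julia
function dijkstra(adj, s)
    n = length(adj)
    y = fill(Inf, n); y[s] = 0.0          # node prices / tentative distances
    done = falses(n)
    for _ in 1:n
        i = argmin([done[k] ? Inf : y[k] for k in 1:n])   # cheapest unscanned node
        done[i] && break                   # every reachable node already scanned
        done[i] = true
        for (j, c) in adj[i]               # relax outgoing arcs (Bellman's equations)
            y[j] = min(y[j], y[i] + c)
        end
    end
    return y
end

adj = [ [(2, 4.0), (3, 1.0)],              # arcs from node 1: 1->2 cost 4, 1->3 cost 1
        [(4, 1.0)],
        [(2, 2.0), (4, 5.0)],
        Tuple{Int,Float64}[] ]
println(dijkstra(adj, 1))                  # shortest distances from node 1
```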

8.4.3 Shortest path - all to all

  • Algorithm - Floyd-Warshall

8.4.4 Other types of shortest path problems

  • Longest path
  • Multiplicative objective function
  • Path with maximal capacity

18.1 Introduction

  • Dynamic programming is a solution strategy which can be used to solve many structured optimization problems.
  • An optimal decision is often called a control and the variables are called control variables. Given the current state in one stage, the decision will give a new state in the next stage.

18.2 An initial example

  • The shortest path can be computed sequentially (recursively) with respect to defined stages as follows.
    (backward recursion)

18.3 Network interpretation and Bellman's equations

  • The cost on which we base the decision has two components:
    1. The first component is the arc cost to go from a node in the current stage to a new node in the next stage
    2. The second component is the remaining (best) cost to continue to the end node.
  • The last stage computed provides the overall best objective function value.
  • Principle of optimality: Given the current state, the optimal decision for each of the remaining stages must not depend on previously reached states or previously chosen decisions.

18.4 General formulation

  • The total cost is given by the recursion function denoted f_t(s_t)
    \begin{aligned} f_t(s_t) &=\min \sum_{i=t}^T c_i(s_i, x_i)\\ &=\min_{x_t}[c_t(s_t,x_t)+\sum_{i=t+1}^T c_i(s_i,x_i)]\\ &=\min_{x_t}[c_t(s_t,x_t)+f_{t+1}(s_{t+1}) ] \end{aligned}
    The recursion relation corresponds to Bellman's equation in a standard shortest path problem. The first term is the arc cost and the second is the node price.
  • Backward recursion and forward recursion are equivalent, both provide the optimal solution. The difference is the side information:
    1. Backward recursion offers the optimal way to continue from each possible state.
    2. Forward recursion offers the optimal way to get to each state.
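
A tiny numerical illustration of backward recursion on an invented staged network, where c[t][s, x] is the arc cost from state s in stage t to state x in stage t+1:

```julia
c = [
    [2.0 4.0],                     # stage 1: 1 state  -> 2 states
    [3.0 1.0; 2.0 5.0],            # stage 2: 2 states -> 2 states
    reshape([6.0, 2.0], 2, 1),     # stage 3: 2 states -> 1 state (the end node)
]

T = length(c)
f = Vector{Vector{Float64}}(undef, T + 1)
f[T + 1] = [0.0]                                   # initial condition at the end node
for t in T:-1:1                                    # Bellman: f_t(s) = min_x [c_t(s,x) + f_{t+1}(x)]
    nstates, nnext = size(c[t])
    f[t] = [minimum(c[t][s, x] + f[t + 1][x] for x in 1:nnext) for s in 1:nstates]
end
println("shortest path length f_1(1) = ", f[1][1])
```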

18.5 Formulation of the shortest path problem

  • Formulation (backward recursion) for Example 18.1
    Stages: t=1,2,3,4
    Control variables: x_t=city which we travel to in stage t
    States: s_t=city we start from in stage t
    Transition function: s_{t+1}=x_t
    Recursion function: f_t(s_t)=shortest path from city s_t to LA
    Recursion relation: f_t(s_t)=\min_{x_t}\{f_{t+1}(s_{t+1})+c_t(s_t,x_t)\}, where c_t(s_t,x_t) gives the distance between city s_t and city x_t. We seek f_1(s_1)
    Initial conditions: s_1=NY, f_5(s_5)=0
    Restrictions: x_1,s_2 \in \{Co, Na, Lo\}
    x_2, s_3 \in \{KC, Om, Da\}
    x_3, s_4 \in \{De, SA\}

13.5 Assignment problems

  • The assignment problem is a special case of the transportation problem. Hence it can be classified as a network flow problem.
  • Define the variables as:
    x_{ij} = 1, \text{if object $i \in I$ is assigned to object $j \in J$}\\ x_{ij} = 0, \text{otherwise}
    The model can be formulated as:
    \begin{aligned} &\min &\quad z = &\sum_{i \in I} \sum_{j \in J} c_{ij} x_{ij}\\ &\text{s.t.} &\quad &\sum_{j \in J}x_{ij} = 1, i \in I\\ & &\quad &\sum_{i \in I}x_{ij} = 1, j \in J\\ & &\quad &x_{ij} \in \{ 0,1 \}, i \in I, j \in J. \end{aligned}
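  • A minimal JuMP sketch of this model with illustrative cost data; GLPK stands in here for whichever solver is installed (the course uses Gurobi):

```julia
using JuMP, GLPK

c = [4 2 8; 4 3 7; 3 1 6]        # illustrative costs c[i, j]
n = size(c, 1)

model = Model(GLPK.Optimizer)
@variable(model, x[1:n, 1:n], Bin)
@objective(model, Min, sum(c[i, j] * x[i, j] for i in 1:n, j in 1:n))
@constraint(model, [i in 1:n], sum(x[i, :]) == 1)   # each object i assigned once
@constraint(model, [j in 1:n], sum(x[:, j]) == 1)   # each object j used once
optimize!(model)
println(objective_value(model), "  ", value.(x))
```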

Lecture 13 - Linear optimization formulations and algorithms for minimum cost network flows (Chapter 8.6-8.7)

8.6 Minimum cost network flow problems

  • To determine a flow with minimum cost in a network, where
    1. the supply and demand (strengths) of the nodes have to be satisfied
    2. the capacity constraints of the arcs have to be fulfilled.
  • Integer properties of the minimum cost network flow problem

8.6.1 Examples of models

  • A network flow problem
  • Maximum flow problem
  • Time dynamic networks

8.6.2 LP formulation

  • Given a network with m nodes and n arcs, where c_j, l_j and u_j denote cost, lower bound and upper bound on the flows on arc j and where d_i denotes the net flow to/from node i, we can formulate a minimum cost network flow problem as:
    \begin{aligned} &\min &\quad z = & \sum_{j=1}^n c_j x_j \\ &\text{s.t.} &\quad &\sum_{j=1}^n a_{ij} x_j = d_i, i = 1,...,m\\ & &\quad &l_j \leq x_j \leq u_j, j=1,..., n \end{aligned}
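    A minimal JuMP sketch of this LP on an illustrative three-node network. The incidence-matrix sign convention below (a_{ij} = +1 when arc j enters node i, -1 when it leaves, so d_i < 0 at supply nodes) follows the convention in 8.1, and GLPK again stands in for any LP solver:

```julia
using JuMP, GLPK

# Arcs: 1→2, 1→3, 2→3.  a_ij = +1 if arc j enters node i, -1 if it leaves.
A = [-1 -1  0;
      1  0 -1;
      0  1  1]
d = [-2.0, 0.0, 2.0]     # node strengths: node 1 supplies 2 units, node 3 demands 2
c = [1.0, 3.0, 1.0]      # arc costs
l = zeros(3); u = fill(2.0, 3)

model = Model(GLPK.Optimizer)
@variable(model, l[j] <= x[j = 1:3] <= u[j])
@objective(model, Min, c' * x)
@constraint(model, A * x .== d)     # node balance constraints
optimize!(model)
println(value.(x))                  # → [2.0, 0.0, 2.0]: route 1→2→3 is cheapest
```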

8.6.3 Integer properties

  • The integer property holds for the minimum cost network flow problem. We study a minimum cost network flow problem stated as:
    \begin{aligned} &\min &\quad z=&c^T x\\ &\text{s.t.} &\quad &Ax=b\\ & &\quad &l \leq x \leq u \end{aligned}
  • Theorem 8.2: If a minimum cost network flow problem has integer bounds b,l and u, all extreme points of the feasible region are integer.
  • Definition 8.1: A matrix is totally unimodular if every square sub-matrix has a determinant with value -1, 0, +1.
  • Theorem 8.3: The node-arc incidence matrix of a directed network is totally unimodular.

8.6.4 Extensions of the minimum cost network flow problem

  • Network with several commodities
  • Network with gains and losses

8.7 The simplex method for network flow problems

  • Minimum cost network flow problems can be solved with a special version of the standard simplex method for LP problems.
  • In a minimum cost network flow problem we may have both lower and upper bounds on the variables (flows), and the simplex method for network flow problems takes these into account.

8.7.1 Optimality conditions

  • Let us study the following LP formulation of the minimum cost network flow problem
    \begin{aligned} &\min &\quad z=&\sum_{(i,j) \in B} c_{ij} x_{ij}\\ &\text{s.t.} &\quad &\sum_{k:(k,i) \in B} x_{ki} - \sum_{j:(i,j) \in B} x_{ij} = b_i, i \in N\\ & &\quad &l_{ij} \leq x_{ij} \leq u_{ij}, (i,j) \in B \end{aligned}
  • Summary of the optimality conditions
    A solution x_{ij} is an optimal solution to a minimum cost network flow problem if the following relations are satisfied (two alternative sets of conditions are provided):

\overline{c}_{ij}=0 \to l_{ij} \leq x_{ij} \leq u_{ij}\\ \overline{c}_{ij}<0 \to x_{ij}=u_{ij}\\ \overline{c}_{ij}>0 \to x_{ij}=l_{ij}

l_{ij} < x_{ij} <u_{ij} \to \overline{c}_{ij}=0\\ x_{ij}=l_{ij} \to \overline{c}_{ij} \geq 0\\ x_{ij}=u_{ij} \to \overline{c}_{ij} \leq 0
where \overline{c}_{ij} is the reduced cost of arc (i,j)

8.7.2 Solution principle

8.7.3 Algorithm - Simplex method for the network flow problem

8.7.4 Determining a basic feasible solution

When solving a minimum cost network flow problem with the simplex method we require a basic feasible solution, i.e. a solution that satisfies the node balance constraints and the capacity constraints.


Lecture 14 - Overview of nonlinear optimization (Chapter 2.5.1, 9-12)

2.5.1 Search method for continuous problems

  • For continuous problems (linear and nonlinear programming problems), the sequence of points is determined by the general formula:
    x^{(k+1)} = x^{(k)} + t^{(k)} d^{(k)}
    where x^{(k)} and x^{(k+1)} are the current and next points, d^{(k)} is a search direction and t^{(k)}
    is a step length.
  • The search direction d^{(k)} should be:
    1. A feasible search direction
    2. An improving search direction

9 Introduction to nonlinear optimization

  • Nonlinear optimization problems are separated into problems with and without constraints:
    1. unconstrained optimization
    2. constrained optimization
  • If the problem has a quadratic objective function and linear constraints, we call it a quadratic programming problem.
  • Nonlinear optimization problems are also separated into convex and nonconvex problems.

9.1 Examples of nonlinear models

  1. Least squares problem
  2. Two dimensional cutting pattern
  3. Construction of a bar-truss structure

9.2 Approximations of functions

  • First and second order approximations based on a Taylor expansion.
  • Other equivalent names for these approximations are linear and quadratic approximations.
  • Definition 9.1: The gradient \nabla f is a vector consisting of all partial derivatives of the function f(x) and can be expressed as
    \nabla f(x_1,...,x_n) = \begin{pmatrix} \frac{\partial f} {\partial x_1}\\ \frac{\partial f} {\partial x_2}\\ \vdots\\ \frac{\partial f} {\partial x_n} \end{pmatrix}
  • Definition 9.2: The Hessian H (\nabla^2 f) is a square matrix consisting of all second order partial derivatives of the function f(x) and can be expressed as
    H(x_1,...,x_n)= \begin{pmatrix} \frac{\partial^2 f} {\partial x_1^2} & \frac{\partial^2 f} {\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f} {\partial x_1 \partial x_n}\\ \frac{\partial^2 f} {\partial x_2 \partial x_1} & \frac{\partial^2 f} {\partial x_2^2} & \cdots & \frac{\partial^2 f} {\partial x_2 \partial x_n}\\ \vdots & \vdots & & \vdots\\ \frac{\partial^2 f} {\partial x_n \partial x_1} & \frac{\partial^2 f} {\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f} {\partial x^2_n} \end{pmatrix}
  • Definition 9.3: A first order Taylor approximation of the function f(x) around the point x^{(k)} is given by
    f_1(x) = f(x^{(k)}) + \nabla f(x^{(k)})^T (x-x^{(k)})
  • Definition 9.4: A second order Taylor approximation of the function f(x) around the point x^{(k)} is given by
    f_2(x) = f(x^{(k)}) + \nabla f(x^{(k)})^T (x-x^{(k)}) + \frac{1}{2} (x-x^{(k)})^T H(x^{(k)})(x-x^{(k)})
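  • A minimal Julia sketch comparing f with its first and second order Taylor approximations around a point; the function and its hand-coded derivatives are illustrative:

```julia
using LinearAlgebra

f(x)  = exp(x[1]) + x[1] * x[2]^2
∇f(x) = [exp(x[1]) + x[2]^2, 2*x[1]*x[2]]
H(x)  = [exp(x[1]) 2*x[2]; 2*x[2] 2*x[1]]

xk = [0.0, 1.0]                                     # expansion point x⁽ᵏ⁾
f1(x) = f(xk) + dot(∇f(xk), x - xk)                 # Definition 9.3
f2(x) = f1(x) + 0.5 * dot(x - xk, H(xk) * (x - xk)) # Definition 9.4

x = [0.1, 1.1]
println((f(x), f1(x), f2(x)))   # f2 is the closer approximation near xk
```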

9.3 Convex analysis

  • The convexity properties determine whether or not we can state with certainty that the solution we have found is a global optimum.
  • The problem
    \begin{aligned} &\min &\quad &f(x)\\ &\text{s.t.} &\quad &g_i(x) \leq b_i, i=1,...,m \end{aligned}
    is a convex problem if f(x) is a convex function and if the feasible region X = \{x | g_i(x) \leq b_i, i=1,...,m\} defined by the constraints g_i(x) \leq b_i, i=1,...,m is a convex set.
  • Theorem 9.1: If the problem
    \begin{aligned} &\min &\quad & f(x)\\ &\text{s.t.} &\quad & x \in X \end{aligned}
    is convex, then each local minimum is also a global minimum.

Convexity of a set

  • Theorem 9.2: Let X_1, X_2, ..., X_p be convex sets. Then the intersection
    X = X_1 \cap X_2 \cap ... \cap X_p
    is a convex set
  • Theorem 9.3: The set
    X = \{x | g(x) \leq b\}
    is a convex set if the function g(x) is convex.

Convexity of a function

  • Theorem 9.4: If f_1(x), f_2(x), ..., f_p(x) are convex functions and we have \lambda_i \geq 0, i=1,...,p, then the function f(x)=\sum_{i=1}^{p} \lambda_i f_i(x) is convex.
  • Definition 9.6: The square matrix H is positive definite (positive semi-definite) if d^T H d > 0 for all vectors d \neq 0 \ (d^T H d \geq 0, \forall d).
  • Theorem 9.5: Suppose the function f(x) is twice differentiable on a convex set X. Then we have:
    1. f(x) is a convex function on X if the Hessian matrix H is positive semi-definite for all x \in X
    2. f(x) is a strictly convex function on X if the Hessian matrix H is positive definite for all x \in X
    3. f(x) is a concave function on X if the Hessian matrix H is negative semi-definite for all x \in X
    4. f(x) is a strictly concave function on X if the Hessian matrix H is negative definite for all x \in X
      Note: if none of these hold, H is indefinite and the function is neither convex nor concave.
  • All eigenvalues of the matrix H can be determined by finding the roots of the so-called characteristic equation of H, i.e. the equation det(H - \lambda I) = 0.
  • Let \lambda_1, \lambda_2, ..., \lambda_n denote the eigenvalues of the Hessian matrix H.
    Theorem 9.6:
    If \lambda_1 \geq 0, ..., \lambda_n \geq 0 then H is positive semi-definite
    If \lambda_1 > 0, ..., \lambda_n > 0 then H is positive definite
    If \lambda_1 \leq 0, ..., \lambda_n \leq 0 then H is negative semi-definite
    If \lambda_1 < 0, ..., \lambda_n < 0 then H is negative definite
  • Theorem 9.8:
    Let h(y) and g(x) be convex functions, and let h(y) also be non-decreasing. Then the composite function f(x)=h(g(x)) is convex.
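  • A minimal Julia sketch of the eigenvalue test in Theorem 9.6, on an illustrative matrix:

```julia
using LinearAlgebra

H = [2.0 1.0; 1.0 3.0]       # illustrative Hessian
λ = eigvals(H)               # roots of the characteristic equation det(H - λI) = 0
println(λ)                   # → ≈ [1.38, 3.62]
println(all(λ .> 0) ? "positive definite" : "not positive definite")
```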

10 Methods for unconstrained nonlinear optimization

  • In this chapter, we consider problems written as
    \min f(x) = f(x_1,..., x_n)
    i.e. nonlinear problems without constraints. It is assumed that the function f(x) is differentiable and that there exists an optimal solution to the problem.
  • An optimal solution to an unconstrained problem is always a stationary point which may be a local minimum, local maximum or a saddle point.
  • Theorem 10.1: A necessary condition for the unconstrained problem min f(x) to have an optimal solution in the point x^* is that it is a stationary point, i.e. \nabla f(x^*)=0
  • Theorem 10.2: A sufficient condition for the unconstrained problem min f(x) to have an optimal solution in the point x^* is that it is a stationary point, and the function f(x) is convex.
  • If the function is strictly convex (concave) there is only one stationary point. This point is hence both a local and global minimum (maximum).
  • Theorem 10.3: A sufficient condition for the unconstrained problem min f(x) to have a local optimal solution in the point x^*, is that it is a stationary point, and that the Hessian matrix in the point x^* is positive definite.
  • With the first order Taylor series approximation of the function f(x) in the point x^{(k)}, we get the following function:
    f_1(x) = f(x^{(k)}) + \nabla f(x^{(k)})^T (x - x^{(k)}) = f(x^{(k)}) + \nabla f(x^{(k)})^T d^{(k)}
    Definition 10.1: A direction d^{(k)} such that \nabla f(x^{(k)})^T d^{(k)} < 0 is called a descent direction and is a direction in which f(x) decreases.
    Definition 10.2: A direction d^{(k)} such that \nabla f(x^{(k)})^T d^{(k)} > 0 is called an ascent direction and is a direction in which f(x) increases.
  • Global convergence - "global" here means that the methods find a local minimum regardless of the (global) starting point.
  • In all these search methods, the step length is determined through a one-dimensional search (line search).

10.1 Steepest descent method

  • The search direction is computed using only the gradient \nabla f(x^{(k)}) in the point x^{(k)}. The gradient \nabla f(x^{(k)}) defines the direction in which the function f(x) increases most (is steepest) locally. In the same way, -\nabla f(x^{(k)}) defines the direction in which it decreases most locally.
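  • A minimal Julia sketch of steepest descent with a simple backtracking (Armijo) line search; the quadratic test function and its gradient are illustrative:

```julia
using LinearAlgebra

f(x)  = (x[1] - 1)^2 + 4 * (x[2] + 2)^2
∇f(x) = [2 * (x[1] - 1), 8 * (x[2] + 2)]

function steepest_descent(x; tol = 1e-8, maxit = 10_000)
    for _ in 1:maxit
        g = ∇f(x)
        norm(g) < tol && break
        d = -g                       # steepest descent direction d⁽ᵏ⁾ = -∇f(x⁽ᵏ⁾)
        t = 1.0                      # backtracking line search for the step length
        while f(x + t * d) > f(x) + 1e-4 * t * dot(g, d)
            t /= 2
        end
        x += t * d
    end
    return x
end

println(steepest_descent([0.0, 0.0]))   # → ≈ [1.0, -2.0]
```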

10.2 Newton methods

  • If we use a second order Taylor series approximation to determine a search direction, we get Newton's method and Newton's modified method. A second order Taylor series approximation of the function f(x) around the point x^{(k)} can be written as:
    f_2(x) = f(x^{(k)}) + \nabla f(x^{(k)})^T(x-x^{(k)}) + \frac{1}{2} (x-x^{(k)})^T H(x^{(k)}) (x-x^{(k)})
    The gradient for the approximate function is
    \nabla f_2(x) = \nabla f(x^{(k)}) + H(x^{(k)})(x-x^{(k)})
    We know from Theorem 10.1 that the optimal solution is found in a stationary point. We can therefore set \nabla f_2(x) = 0 and express the search direction as
    d^{(k)} = x - x^{(k)} = -H^{-1} (x^{(k)}) \nabla f(x^{(k)})
    This search direction is called the Newton direction. The search direction is valid both for minimization and maximization problems. The interpretation of using the Newton direction is that we use second order information to rotate the gradient direction so that it points towards the stationary point of the approximate function.
  • If we always use the step length t^{(k)}=1, i.e. no line search, we get Newton's method. If we use a normal line search, we get Newton's modified method.
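  • A minimal Julia sketch of Newton's method (fixed step length t = 1) on an illustrative nonquadratic function with hand-coded gradient and Hessian:

```julia
using LinearAlgebra

# f(x) = x₁⁴ - 4x₁x₂ + 2x₂², with a local minimum at (1, 1)
∇f(x) = [4*x[1]^3 - 4*x[2], -4*x[1] + 4*x[2]]
H(x)  = [12*x[1]^2 -4.0; -4.0 4.0]

function newton(x; tol = 1e-10, maxit = 20)
    for _ in 1:maxit
        d = -(H(x) \ ∇f(x))      # Newton direction: solve H(x) d = -∇f(x)
        x += d                   # step length t⁽ᵏ⁾ = 1, i.e. no line search
        norm(∇f(x)) < tol && break
    end
    return x
end

println(newton([1.5, 1.0]))   # → ≈ [1.0, 1.0]
```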

10.3 Extensions for Newton methods

  • A more general search method can be expressed as:
    \begin{aligned} x^{(k+1)} &= x^{(k)} + t^{(k)} d^{(k)}\\ &= x^{(k)} - t^{(k)} M(x^{(k)}) \nabla f(x^{(k)})\\ &= x^{(k)} - t^{(k)} (H(x^{(k)}) + \lambda I)^{-1} \nabla f(x^{(k)}) \end{aligned}
    This modification, with M(x^{(k)}) = (H(x^{(k)}) + \lambda I)^{-1}, is called *(Newton) Marquardt's modification*.

10.4 One-dimensional optimization

  • A one-dimensional optimization problem is a problem with one variable.
  • Line search or one-dimensional search.
  • Definition 10.3: A function f(t) to be minimized on the interval [\alpha, \beta], with its minimum in x^*, is unimodal on the interval if all t_1, t_2 \in [\alpha, \beta] such that t_1 < t_2, f(t_1) \neq f(x^*) and f(t_2) \neq f(x^*) have the properties
    t_2 < x^* \Rightarrow f(t_1) > f(t_2)\\ t_1 > x^* \Rightarrow f(t_1) < f(t_2)
    The interpretation of a unimodal function in minimization is that the function (for increasing values of t) is non-increasing followed by non-decreasing.
    Note: a function f(t) can be unimodal despite being non-convex. A convex function is always unimodal.
  • Golden section method
  • Bi-section method
  • Newton-Raphson's method
  • Armijo's method
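  • A minimal Julia sketch of the golden section method on an illustrative unimodal function:

```julia
# Golden section search for a unimodal f on [α, β].
function golden_section(f, α, β; tol = 1e-6)
    φ = (sqrt(5) - 1) / 2                 # golden ratio factor ≈ 0.618
    x1, x2 = β - φ * (β - α), α + φ * (β - α)
    f1, f2 = f(x1), f(x2)
    while β - α > tol
        if f1 > f2                        # the minimum lies in [x1, β]
            α, x1, f1 = x1, x2, f2
            x2 = α + φ * (β - α)
            f2 = f(x2)
        else                              # the minimum lies in [α, x2]
            β, x2, f2 = x2, x1, f1
            x1 = β - φ * (β - α)
            f1 = f(x1)
        end
    end
    return (α + β) / 2
end

println(golden_section(t -> (t - 2)^2 + 1, 0.0, 5.0))   # → ≈ 2.0
```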

11 Optimality conditions for nonlinear problems

11.1 Geometric interpretation of the KKT conditions

  • Definition 11.1: A cone C which is spanned by the vectors h_1, ..., h_s consists of all vectors y that can be written as a non-negative linear combination of the vectors h_i, i = 1, ..., s, i.e.
    C = \{y|y=\sum_{i=1}^s \alpha_i h_i, \alpha_i \geq 0, i=1,...,s\}
  • For \overline{x} to be a local maximum in a maximization problem, the gradient of the objective function must lie in the cone spanned by the gradients of the active constraints. In other words (in an example with two active constraints), we should be able to write \nabla f(\overline{x}) as a non-negative linear combination of \nabla g_1(\overline{x}) and \nabla g_2(\overline{x}).
  • If we have a nonlinear problem with m inequalities we can generalize the above conditions and formulate them as follows:
    \begin{aligned} \nabla f(x) &= \sum_{i=1}^{m} v_i \nabla g_i(x) && (1a)\\ v_i &\geq 0, \ i = 1,...,m && (1b)\\ g_i(x) &\leq b_i, \ i = 1,...,m && (2)\\ v_i(b_i-g_i(x)) &= 0, \ i = 1,...,m && (3) \end{aligned}
    These conditions are called the Karush-Kuhn-Tucker (KKT) conditions, where (1) denotes dual feasibility, (2) primal feasibility and (3) complementarity.

11.2 KKT conditions derived using the Lagrangian function

  • A general nonlinear optimization problem formulated as:
    \begin{aligned} &\text{(NLP)} &\quad &\min &\quad &f(x) \\ & &\quad &\text{s.t.} &\quad &g_i(x) \geq b_i, i = 1,..., m \end{aligned}
    The Lagrangian function for problem (NLP) can be stated as:
    L(x,v) = f(x) + \sum_{i=1}^m v_i(b_i-g_i(x))
    Note:
    1. Lagrangian multipliers are also called dual variables.
    2. In our formulation, all multipliers are non-negative as we have a minimization problem and all constraints are \geq-constraints.
    3. The terms that are added to f(x) and that define the Lagrangian function can be interpreted as penalty terms for not satisfying the constraints, and the multiplier v_i can be interpreted as a penalty parameter for constraint i.
  • Definition 11.2: The point (x^*, v^*) is a saddle point to L(x,v) if
    L(x^*, v) \leq L(x^*, v^*) \leq L(x, v^*)
    for all x and v \geq 0
  • The definition of a saddle point states that L(x,v) has a minimum with respect to x and a maximum with respect to v in the point (x^*, v^*). The point (x^*, v^*) is a solution to the problem
    \max_{v \geq 0} \min_x L(x,v)
  • Theorem 11.1 (Saddle point theorem): If the Lagrangian function L(x,v) has a saddle point in (x^*, v^*), then x^* is an optimal solution to the problem (NLP).
  • The saddle point theorem provides sufficient conditions for optimality but not necessary conditions. It may happen that a point is optimal without being a saddle point to the Lagrangian function.
  • A necessary condition for a point to be optimal is that it is a stationary point of the Lagrangian function. Using the Lagrangian multiplier method, it can be shown that each such stationary point satisfies the Karush-Kuhn-Tucker conditions.

11.3 Necessary and sufficient conditions for optimality

  • Theorem 11.2 (Necessary conditions for optimality)
    Suppose that the point x^* is a local minimum to the problem (NLP) and satisfies some regularity conditions. Then there is a vector v \in R^m such that
    \begin{aligned} \nabla f(x^*) &= \sum_{i=1}^m v_i \nabla g_i(x^*)\\ v_i &\geq 0, i=1,...,m\\ g_i(x^*) &\geq b_i, i=1,...,m\\ v_i(b_i-g_i(x^*)) &=0, i=1,...,m \end{aligned}
    The theorem states that the KKT conditions must be satisfied if a point x^* is to be a local minimum.
    There are several types of regularity conditions (also called constraint qualifications); the most common (and strongest) is that the gradients of the active and non-redundant constraints are linearly independent.
    Note: more constraint qualifications are covered in courses on nonlinear optimization. Some implicit regularity conditions are already met in LP or IP (e.g., slide 39 of the Lecture 14 slides assumes that there exists a point \overline{x} \in S such that g_i(\overline{x}) < 0, i \in L).
  • Theorem 11.3 (Sufficient conditions for optimality)
    Suppose that there is a point x^* that satisfies the KKT conditions. If the problem is convex, then x^* is both a local and global optimum.
  • (A comparison of necessary conditions ("in series") and sufficient conditions ("in parallel")): https://www.zhihu.com/question/30469121
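  • A small worked example (not from the book): consider \min f(x) = x_1^2 + x_2^2 s.t. g(x) = x_1 + x_2 \geq 1. The KKT conditions read 2x_1 = v, 2x_2 = v, v \geq 0, x_1 + x_2 \geq 1 and v(1 - x_1 - x_2) = 0. Taking the constraint active gives x^* = (1/2, 1/2) with v = 1 \geq 0; since the problem is convex, Theorem 11.3 guarantees that x^* is a global minimum.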

11.4 Formulation of the KKT conditions

  • Sign restrictions on the dual variables (also called Lagrangian multipliers) v_i:

    Primal problem    \leq-constraint    \geq-constraint    =-constraint
    Min problem       v_i \leq 0         v_i \geq 0         v_i free
    Max problem       v_i \geq 0         v_i \leq 0         v_i free
  • The interpretation of the Lagrangian multiplier values in optimum is the same as for dual values in LP theory.

12 Methods for constrained nonlinear optimization

  • Two methods that can be used to solve nonlinear problems with linear constraints.
    1. Frank-Wolfe method linearizes the objective function and solves a sequence of LP problems to find the optimal solution.
    2. Quadratic programming based on KKT conditions
  • A nonlinear problem with nonlinear constraints can be reformulated and then solved by using a sequence of unconstrained nonlinear problems. Two methods:
    1. Reformulating the problem using penalty functions
    2. Using barrier functions

12.1 The Frank-Wolfe method

  • This method is primarily used for convex nonlinear problems with linear constraints.
  • The idea is to linearize the objective function (make a first order Taylor series expansion) around the current point. By solving the resulting LP problem, we can determine a search direction and then perform a line search between the current point and the solution of the LP problem.
  • Consider f(x) as the convex objective function of a minimization problem and let z(x) denote a linear approximation of f(x) in the given point. If we solve the LP problem with z(x) as the objective function, we obtain a solution in the point y^{(k)}. The objective function values in the points x^{(k)} and y^{(k)}, which are f(x^{(k)}) and z(y^{(k)}) respectively, can be used to get bounds on the optimal objective function value f(x^*):
    z(y^{(k)}) \leq f(x^*) \leq f(x^{(k)})
    1. f(x^{(k)}) gives a pessimistic bound on the optimal objective function value f(x^*). This is an upper bound, UBD.
    2. z(y^{(k)}) gives an optimistic bound on the optimal objective function value f(x^*). This is a lower bound, LBD.
  • The search direction is d^{(k)} = y^{(k)} - x^{(k)}. We need to ensure that we stay between the points x^{(k)} and y^{(k)} by limiting the step length t by 0 \leq t \leq 1.
  • If the solution found satisfies f(x^{(k)}) = z(y^{(k)}), then we have found the optimal solution.
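  • A minimal Julia sketch of the method on an illustrative problem (minimize the convex quadratic ||x - p||^2 over \{x : Ax \leq b, x \geq 0\}); JuMP with GLPK solves the LP subproblems, and the line search is exact since the objective is quadratic:

```julia
using JuMP, GLPK, LinearAlgebra

f(x, p)  = dot(x .- p, x .- p)           # convex objective ||x - p||²
∇f(x, p) = 2 .* (x .- p)

function frank_wolfe(A, b, p; maxit = 50, tol = 1e-6)
    x = zeros(length(p))                 # assumed feasible starting point
    for k in 1:maxit
        g = ∇f(x, p)
        # LP subproblem: minimize the linearization gᵀy over the feasible set
        m = Model(GLPK.Optimizer)
        @variable(m, y[1:length(p)] >= 0)
        @constraint(m, A * y .<= b)
        @objective(m, Min, dot(g, y))
        optimize!(m)
        yk = value.(y)
        dot(g, x - yk) <= tol && break   # gap-style stopping test (UBD - LBD small)
        d = yk - x                       # search direction d⁽ᵏ⁾ = y⁽ᵏ⁾ - x⁽ᵏ⁾
        t = clamp(-dot(g, d) / (2 * dot(d, d)), 0.0, 1.0)   # exact line search, 0 ≤ t ≤ 1
        x += t * d
    end
    return x
end

println(frank_wolfe([1.0 1.0], [1.0], [1.0, 1.0]))   # → ≈ [0.5, 0.5]
```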

12.2 Quadratic Programming

  • Quadratic programming problems have a quadratic objective function and linear constraints
  • A quadratic programming problem with equality constraints written in matrix form as
    \begin{aligned} &\min &\quad f(x) = & \frac{1}{2} x^T Q x + c^T x\\ &\text{s.t.} &\quad &Ax = b \end{aligned}
    It is assumed that the matrix Q is symmetric and positive definite, which means that the problem is convex. It is also assumed that the matrix A has full rank, which means that there is always a feasible solution to the system of linear equations Ax=b.
    The KKT conditions for the problem above can be written in matrix form as
    \begin{aligned} c + Qx &= A^T v\\ Ax &= b \end{aligned}
    The system above can also be expressed as:
    \begin{bmatrix} -Q & A^T\\ A & 0 \end{bmatrix} \begin{bmatrix} x\\ v \end{bmatrix} = \begin{bmatrix} c \\ b \end{bmatrix}
    By assuming that the matrix Q is positive definite and that the matrix A has full rank, there is always a unique solution to the equations.
  • Algorithm - Active set method (The active set at x is made up of those constraints g_i(x) that are active at the current point.)
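  • A minimal Julia sketch solving an equality-constrained QP (illustrative data) directly via the KKT system above:

```julia
using LinearAlgebra

Q = [2.0 0.0; 0.0 2.0]      # symmetric and positive definite
c = [-2.0, -5.0]
A = [1.0 1.0]               # full rank
b = [1.0]

n, m = length(c), length(b)
K   = [-Q A'; A zeros(m, m)]        # the KKT matrix from above
rhs = vcat(c, b)
sol = K \ rhs
x, v = sol[1:n], sol[n+1:end]
println("x = $x, v = $v")           # → x ≈ [-0.25, 1.25], v ≈ [-2.5]
```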

12.3 Penalty function methods

  • In this section, the constrained nonlinear problem is reformulated as an unconstrained optimization problem by adding a penalty term that measures and penalizes the distance from any infeasible point to the feasible region. The new function is a penalty function, and by using different values on a penalty parameter, we solve a sequence of unconstrained problems.
  • Referring to Table 12.1 on Page 32, the larger the value of the penalty parameter, the closer we get to the optimal solution and the optimal objective function value. However, for very large values of the parameter we get numerical problems.
  • A general nonlinear problem with equality constraints can be formulated as
    \begin{aligned} &\min &\quad &f(x)\\ &\text{s.t.} &\quad &h_i(x) = 0, i=1,...,m \end{aligned}
    The penalty function can be written as
    F_S(x, \mu) = f(x) + \mu \alpha(x)
    where the penalty term is \alpha(x), the penalty parameter is \mu.
    The unconstrained optimization problem becomes
    \min_x F_S(x, \mu)
  • For the problem with equality constraints above, the standard penalty term is \alpha(x) = \sum_{i=1}^m h_i(x)^2. For inequality constraints g_i(x) \leq 0, the recommended penalty term is \alpha(x) = \sum_{i=1}^m (\max(0, g_i(x)))^2.
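  • A minimal Julia sketch of the penalty method on an illustrative problem (min ||x - p||^2 s.t. x_1 + x_2 = 1); each penalized problem is an unconstrained quadratic, so the stationarity condition \nabla F_S = 0 is solved as a linear system:

```julia
using LinearAlgebra

p = [2.0, 2.0]                      # unconstrained minimizer (infeasible)
for μ in 10.0 .^ (0:2:8)            # increasing penalty parameter
    # F_S(x, μ) = ||x - p||² + μ (x₁ + x₂ - 1)²
    # ∇F_S = 2(x - p) + 2μ (x₁ + x₂ - 1)·[1, 1] = 0 is linear in x:
    x = (I + μ * ones(2, 2)) \ (p .+ μ)
    println("μ = $μ   x = $x   h(x) = $(sum(x) - 1)")
end
# The iterates are infeasible (h(x) > 0) but approach (0.5, 0.5) as μ grows.
```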

12.4 Barrier function methods

  • Instead of penalizing infeasible solutions, barrier function methods penalize the deviation from the border of the feasible region.
  • Penalty functions (also called exterior penalty functions) generate a sequence of infeasible points that converge to a feasible and optimal point.
  • Barrier functions (also called interior penalty functions) generate a sequence of feasible points that converge to the optimal solution.
  • A general nonlinear problem with inequality constraints can be formulated as
    \begin{aligned} &\min &\quad &f(x) \\ &\text{s.t.} &\quad &g_i(x) \leq 0, i=1,...,m \end{aligned}
    The penalty function can be written as
    F_B(x, \mu) = f(x) + \mu B(x)
    where the penalty term is B(x)=\sum_{i=1}^{m} \frac{-1}{g_i(x)} or B(x)= - \sum_{i=1}^{m} ln(-g_i(x)), the penalty parameter is \mu.
    The unconstrained optimization problem becomes
    \min_x F_B(x, \mu)
    Note: Barrier functions can only be used when all constraints are inequality constraints. This is because we need the feasible solutions to be interior points.
  • For initial large values of \mu, the optimal solution will be an interior point relatively far from the border of the feasible region. For smaller values of \mu, the penalty will be smaller and the solution will get closer to the border. Referring to the table on Page 318, we can see that we get closer to the optimal point and to the optimal function value as \mu decreases.
  • Barrier functions always provide feasible solutions which can be used to generate pessimistic bounds of the optimal objective function value. In the same way, penalty functions provide optimistic bounds.
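  • A minimal one-variable Julia sketch of the logarithmic barrier method on an illustrative problem (min (x-2)^2 s.t. x \leq 1, i.e. g(x) = x - 1 \leq 0); each barrier problem is solved in closed form from its stationarity condition:

```julia
# F_B(x, μ) = (x - 2)² - μ ln(1 - x); setting F_B'(x, μ) = 0 gives
# 2(x - 2)(1 - x) + μ = 0, i.e. 2x² - 6x + (4 - μ) = 0.
for μ in 10.0 .^ (0:-1:-4)          # decreasing barrier parameter
    x = (6 - sqrt(4 + 8μ)) / 4      # the root with x < 1 (strictly feasible)
    println("μ = $μ   x = $x   f(x) = $((x - 2)^2)")
end
# Every iterate is an interior point; x → x* = 1 and f(x) → 1 as μ → 0.
```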

Lecture 15 - Example exams

