Course literature:
Optimization (English) by J. Lundgren, M. Rönnqvist, and P. Värbrand; Studentlitteratur 2010
Computer exercises and software
- The purpose of this computer exercise is to make you familiar with the use of software for computing solutions to linear programs.
- The following software is included:
- JuMP - An algebraic modelling language for linear, quadratic, and nonlinear constrained optimization problems embedded in Julia.
- Gurobi - A linear, integer linear, and quadratic programming solver with interfaces to MATLAB, JuMP, and Python.
- Atom - A free and open-source text and source code editor.
Lecture plan
Lecture | Content | Chapters in the course book |
---|---|---|
1 | Introduction, course map, modelling optimization applications | 1, 2.1-2.5, 3 |
2 | a) Julia/JuMP and optimization solvers; computer exercise on linear optimization. b) Supply chain (Assignment 1) | |
3 | Convexity and linear optimization | 2.4, 4.1 |
4 | Linear optimization: basic feasible solutions | 4.2–4.4, 4.8 |
5 | Linear optimization: the simplex method | 4.5–4.10, (7.1) |
6 | Duality in linear optimization models | 6, (7.3, 7.5) |
7 | Linear optimization: post-optimal and sensitivity analysis | 5.1–5.5, (5.6) |
8 | a) Integer linear optimization: modelling, applications, complexity b) Optimization of maintenance scheduling (Assignment 2) | 13, 2.6 |
9 | Integer linear optimization: relaxations, bounds, branch&bound algorithm | 14.1–14.3, 15.1–15.3 |
10 | Integer linear optimization: good/ideal formulation, cutting planes, Lagrangean duality and relaxation | 14.4–14.5, 17.1–17.2, (17.3–17.4), 13.10–13.11, 15.4, (15.5) |
11 | Integer linear optimization: (non-)convexity, local/global optimality, heuristics and approximation algorithms | 16, 8.3 |
12 | Network flows: linear optimization formulations | 8.1–8.2, 8.4, (8.5), 18.1–18.5, (18.6–18.7), 13.5 |
13 | Minimum cost network flows: linear optimization formulations and algorithms | 8.6–8.7 |
14 | Overview of nonlinear optimization | 2.5.1, 9–12 |
15 | Example exam | |
Problem solving sessions
Session | Topics | Exercises recommended for students to solve |
---|---|---|
1 | Mathematical modelling | 3.1 b,c,d; 3.5, 3.10, 3.12, 3.15 |
2 | Linear optimization and the simplex method | 2.4, 2.6, 4.2, 4.6, 4.10, 4.11, 4.15 |
3 | Sensitivity analysis and duality theory | 5.1, 5.5, 6.6, 6.8, 6.10, 6.15 |
4 | Integer linear optimization and the branch-and-bound algorithm | 13.5, 13.6, 13.9, 15.6 (13.8, 13.15, 15.3, 15.12, 15.14) |
5 | Cutting plane methods, minimal cover, Lagrangean duality | 14.4, 14.8, 17.9 (14.1, 14.3, 14.6, 14.9, 17.8) |
6 | Network optimization: minimum spanning tree and shortest path algorithms | 8.10, 8.12, 8.17a (8.18, 8.38ab) |
7 | Nonlinear optimization: convexity and the KKT conditions | 9.8, 9.10, 11.4, (9.4, 11.6) |
Lecture 1 - Introduction, course map, modelling optimization applications (Chapter 1, 2.1–2.5, 3)
Chapter 1 - Introduction to optimization
- A general optimization problem (P): min f(x) s.t. g_i(x) ≤ b_i, i = 1, ..., m
- P is an LP problem if:
  - all functions f and g_i are linear
  - all variables are continuous, i.e., x ∈ R^n
- P is a nonlinear problem if:
  - at least one of the functions f, g_1, ..., g_m is a nonlinear function
  - all variables are continuous, i.e., x ∈ R^n
- P is a linear integer programming problem when a subset of the variables (at least one) is defined as integer variables.
Chapter 2 - Introductory examples and basic concepts
- Level curves of the objective function
- An optimal solution is a feasible solution where the objective function reaches its maximum (or minimum) value.
- Active constraints - constraints that hold with equality at the point considered
- A constraint that doesn't affect the feasible region is called redundant.
- Linear programming solution properties
- Unique optimal solution - The unique solution is always found in a corner point of the feasible region.
- Alternative optimal solutions - When the optimal objective function value is attained in at least two corner points (and then also along the segment between them). In practice, we stop once we have found one optimal solution.
- Unbounded solution - In a practical problem, this situation usually indicates an error in the model formulation or in the input data.
- No feasible solution exists - This situation appears when the constraints are defined so that no point satisfies all of them. Note that infeasibility depends only on the constraints, not on the objective function.
- Nonlinear optimization solution properties
- A nonlinear problem may have several local optima, one or more of which are global optima.
- If the nonlinear problem is a convex optimization problem, there is only one local optimum. If in addition the objective function is strictly convex (minimization problem) or strictly concave (maximization problem), the optimal solution is unique.
- Integer programming solution properties
- If we remove the integer requirements on the variables, we get an LP problem. This problem is called an LP relaxation of the integer programming problem.
- Some integer programming problems can be solved by first solving the LP relaxation, and then rounding the solution to obtain a feasible integer solution.
- Most solution methods are developed to find a local optimum.
- In a convex problem, each local optimum is also a global optimum.
- All LP problems are convex, while nonlinear problems are often non-convex.
- Note that a set cannot be concave; a set is either convex or non-convex.
2.5 Search methods - a general description
- Search methods start from a feasible point (solution) to the problem.
- The search method generates a sequence of feasible points x^(0), x^(1), x^(2), ..., whose objective function values become successively better.
- In some search methods, e.g., the simplex method, the objective function value may be the same between two consecutive iterations.
2.5.1 Search method for continuous problems (primal method):
Chosen search direction should be:
- A feasible search direction
- An improving search direction
- If, at the given point, we cannot find a feasible and improving search direction, we can conclude that we have found a local optimum.
- General description
- Step 0 - starting point
- Step 1 - search direction (we use information about the gradient and the Hessian matrix of the objective function f(x) at the point x^(k); a descent direction d^(k) has the property ∇f(x^(k))^T d^(k) < 0)
- Step 2 - convergence criterion (predefined optimality conditions for the problem)
- Step 3 - step length (line search along the chosen direction)
- Step 4 - convergence (the general search method converges towards a local optimum if we choose an improving search direction and perform the line search properly; if the problem is convex, we also find the global optimum)
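The steps above can be sketched as a tiny steepest-descent routine. This is an illustrative Python sketch (the course itself uses Julia/JuMP); the objective function, the negative-gradient direction, and the backtracking line search are my own choices of the generic ingredients:

```python
def search_method(f, grad, x0, tol=1e-8, max_iter=1000):
    """Generic primal search method (Section 2.5.1), minimization.
    Step 1: direction = negative gradient (a descent direction).
    Step 2: stop when the gradient is (numerically) zero.
    Step 3: step length by simple backtracking line search."""
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        if sum(gi * gi for gi in g) ** 0.5 < tol:   # Step 2: convergence
            break
        d = [-gi for gi in g]                       # Step 1: grad^T d < 0
        t, fx = 1.0, f(x)
        while f([xi + t * di for xi, di in zip(x, d)]) >= fx:
            t /= 2                                  # Step 3: shrink until improving
        x = [xi + t * di for xi, di in zip(x, d)]   # move and repeat
    return x

# Convex example: f(x, y) = (x - 1)^2 + 2 (y + 2)^2, minimum at (1, -2)
f = lambda x: (x[0] - 1) ** 2 + 2 * (x[1] + 2) ** 2
grad = lambda x: [2 * (x[0] - 1), 4 * (x[1] + 2)]
x_opt = search_method(f, grad, [5.0, 5.0])
```

Because this example is convex, the local optimum found is also the global optimum, as the note in Step 4 states.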
2.5.2 Search method for integer programming problems (primal method)
- It is more difficult to use the concepts of search direction and step length.
- General description
- Step 0 - Start from a feasible point x^(0). Set k = 0.
- Step 1 - Find the neighbouring integer solution x̄ with the best objective function value.
- Step 2 - Check the convergence criterion: the point x^(k) is a local optimum if x̄ does not have a better objective function value.
- Step 3 - Let x^(k+1) = x̄ be the new solution.
- Step 4 - Update k := k + 1 and go to Step 1.
Note: Since integer programming problems are non-convex, we cannot guarantee that a local optimum obtained in this way is also a global optimum.
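The neighbourhood search of Section 2.5.2 can be sketched in Python (an illustration, not the book's algorithm; here a neighbour differs by at most 1 in each coordinate, and the objective and feasible region in the example are made up):

```python
from itertools import product

def local_search(f, feasible, x0):
    """Neighbourhood search for integer programs (Section 2.5.2), minimization.
    Step 1: best neighbouring integer solution; Step 2: stop when no feasible
    neighbour improves the objective (a local optimum w.r.t. this neighbourhood)."""
    x = tuple(x0)
    while True:
        neighbours = [tuple(xi + di for xi, di in zip(x, step))
                      for step in product((-1, 0, 1), repeat=len(x))
                      if any(step)]
        candidates = [y for y in neighbours if feasible(y)]
        if not candidates:
            return x
        best = min(candidates, key=f)   # Step 1: best neighbour
        if f(best) >= f(x):             # Step 2: convergence criterion
            return x                    # local (not necessarily global) optimum
        x = best                        # Steps 3-4: move and repeat

# Made-up example: minimize (x - 3)^2 + (y - 5)^2 over integers 0 <= x, y <= 10
obj = lambda p: (p[0] - 3) ** 2 + (p[1] - 5) ** 2
box = lambda p: all(0 <= v <= 10 for v in p)
result = local_search(obj, box, (0, 0))
```

With a different objective the same loop can stop far from the global optimum, which is exactly the caveat in the note above.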
2.5.3 Primal and dual methods
- An alternative principle is to move between infeasible points in a systematic way and only in the last iteration find a feasible solution. We then know that this solution is a local (possibly global) optimum. This class of methods is called dual methods.
- Dual methods are also common for various classes of network problems in which the methods can be made very efficient.
- General description - dual search method:
- Step 0 - Relax some constraints and let X̄ define the set of solutions that satisfy the remaining constraints. Set k = 0.
- Step 1 - Solve the relaxed optimization problem min f(x) s.t. x ∈ X̄ -> the point x^(k).
- Step 2 - If x^(k) ∈ X, then all constraints in the original problem are satisfied -> Stop! The point x^(k) is an optimal solution to the problem.
- Step 3 - Identify a constraint that is not satisfied by x^(k) and let X̂ denote all solutions feasible with respect to this constraint. Add the constraint to the problem, i.e., update X̄ := X̄ ∩ X̂.
- Step 4 - Update k := k + 1 and go to Step 1.
Chapter 3 - Modelling
3.1 Index and summation
- Amount of raw material j that is blended into product i during time period t
- Introduce a general notation for the number of products, the number of raw materials, the number of time periods and the number of resources, respectively.
3.2 Production planning
- General formulation of this problem:
- Variables:
  x_it = number of units of model i produced during month t
  l_it = number of units of model i in inventory at the end of month t
- Parameters:
  c_it = cost to produce one unit of product i in time period t
  h_it = inventory cost for one unit of product i in period t
  a_ij = number of hours product i requires in department j
  b_jt = capacity in department j in period t
- Objective: min (total production cost + total inventory cost)
- Constraints: inventory balance between consecutive periods, department capacity, and non-negativity
3.3 Transport and distribution
This optimization problem can be expressed in a network consisting of nodes and arcs.
3.3.1 Transportation problem
- General formulation of this problem
- Variables:
  x_ij = flow from facility i to customer j, x_ij ≥ 0
- Parameters:
  c_ij = unit transportation cost from facility i to customer j
  s_i = supply at facility i
  d_j = demand at customer j
- Objective: min Σ_i Σ_j c_ij x_ij
- Constraints:
  Σ_j x_ij ≤ s_i for all i (supply)
  Σ_i x_ij = d_j for all j (demand)
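A classical way to construct an initial basic feasible solution to the transportation problem is the northwest-corner rule (not mentioned in the notes above, but standard in this setting). A minimal Python sketch, assuming total supply equals total demand; the supply and demand numbers are a made-up instance:

```python
def northwest_corner(supply, demand):
    """Initial basic feasible solution of the transportation problem
    (Section 3.3.1) by the northwest-corner rule: repeatedly ship as much
    as possible on the current arc, then move right or down."""
    s, d = list(supply), list(demand)
    flow = [[0] * len(d) for _ in s]
    i = j = 0
    while i < len(s) and j < len(d):
        q = min(s[i], d[j])   # maximum shippable quantity on arc (i, j)
        flow[i][j] = q
        s[i] -= q
        d[j] -= q
        if s[i] == 0:
            i += 1            # facility i exhausted: move down
        else:
            j += 1            # customer j satisfied: move right
    return flow

# Hypothetical instance: 3 facilities, 3 customers, balanced totals (75 = 75)
x = northwest_corner([30, 20, 25], [10, 25, 40])
```

The resulting flow matrix satisfies every supply and demand constraint and corresponds to a basic feasible solution that, e.g., the simplex method can start from.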
3.3.2 Transshipment problem
- General formulation of this problem
- Variables:
  x_ij = flow from facility i to distribution center j, x_ij ≥ 0
  y_jk = flow from distribution center j to customer k, y_jk ≥ 0
- Parameters:
  c_ij = unit transportation cost between facility i and distribution center j
  e_jk = unit transportation cost between distribution center j and customer k
- Objective: min total transportation cost over both types of arcs
- Constraints:
  Σ_j x_ij ≤ s_i for all i (supply)
  Σ_i x_ij = Σ_k y_jk for all j (distribution balance)
3.3.3 Facility location
- Variables:
  x_ij = flow from facility i to customer j, x_ij ≥ 0
  y_i = 1 if facility i is used, 0 otherwise
- Parameters:
  f_i = fixed cost for facility i
- Objective: min Σ_i Σ_j c_ij x_ij + Σ_i f_i y_i
- Constraints:
  Σ_j x_ij ≤ s_i y_i for all i (supply, only if the facility is open)
  Σ_i x_ij = d_j for all j (demand)
The model is an integer programming model.
3.4 Blending problem
3.4.1 Simple metal blending
3.4.2 Blending for fodder
3.4.3 Pooling problem
The model is a nonlinear model.
3.5 Allocation models
3.5.1 Investment strategy
The model is an LP model
3.5.2 Investment over several time periods
- Pay attention to the illustration of the possible alternatives
- The model is an LP model
3.5.3 Loading of tankers
The model is a mixed integer programming model because it includes both integer and continuous variables.
Lecture 2 JuMP and Gurobi
- JuMP is an open-source modelling language for mathematical optimization, and is a package in Julia
- Gurobi
- Optimization software package for solving linear and quadratic optimization problems with continuous and integer variables
- Originally based on the simplex method, implemented in C
- The primal and dual simplex methods (Lecture 3-5)
- The barrier method
- Techniques for avoiding degeneracy (Lecture 4)
- Generating cutting planes (Lecture 9)
- The branch & bound algorithm (Lecture 8)
- Heuristic methods (Lecture 10)
- Solvers that work with JuMP
- Gurobi - solves linear and quadratic optimization problems with continuous and integer variables
- CPLEX
- MOSEK
- Clp - free, solves linear optimization problems with continuous variables
- Cbc - free, solves linear optimization problems with continuous and integer variables
- BARON, Ipopt, Knitro, NLopt, Xpress, etc.
- The diet problem - mathematical model
- Sets: foods and nutrients
- Variables: amount purchased of each food
- Parameters: cost per unit of each food, nutrient content per unit of each food, and nutritional requirements
- The sensitivity analysis as described is possible only for linear programs with continuous variables (not for integer/binary variables); this is due to theoretical properties (Chapter 5)
Lecture 3 Convexity and linear optimization (Chapter 2.4, 4.1)
2.4 Basic convexity analysis
- A general optimization problem written as: min f(x) s.t. x ∈ X
- Definition 2.1
A feasible point x* is a global optimum if there is no other feasible point with a better objective function value, i.e., f(x) ≥ f(x*) for all x ∈ X.
- Definition 2.2
A feasible point x* is a local optimum if in a small neighbourhood N(x*) of x* there is no other feasible point with a better objective function value.
- Definition 2.3
The problem min f(x) s.t. x ∈ X is a convex problem if f is a convex function and X is a convex set. A maximization problem is also convex if the function f is concave and X is a convex set.
- In a convex problem, each local optimum is also a global optimum.
- All LP problems are convex, while nonlinear problems are often non-convex
- Integer programming problems are always non-convex.
- Definition 2.4
The function f is a convex function on the feasible region X if for all choices of points x^(1), x^(2) ∈ X and all λ ∈ [0, 1] we have that f(λx^(1) + (1 − λ)x^(2)) ≤ λf(x^(1)) + (1 − λ)f(x^(2)).
- Definition 2.5
A set X is a convex set if for any pair of points x^(1), x^(2) ∈ X the whole line segment λx^(1) + (1 − λ)x^(2), λ ∈ [0, 1], also belongs to X.
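Definition 2.4 can be spot-checked numerically. The following Python sketch (my own illustration; a random sampler can disprove convexity but never prove it) tests the defining inequality on sampled points:

```python
import random

def convex_on_samples(f, lo, hi, trials=2000, seed=1):
    """Spot-check Definition 2.4 on [lo, hi]: test
    f(l*x1 + (1-l)*x2) <= l*f(x1) + (1-l)*f(x2) for random x1, x2, l.
    Returning False disproves convexity; True only suggests it."""
    rng = random.Random(seed)
    for _ in range(trials):
        x1, x2 = rng.uniform(lo, hi), rng.uniform(lo, hi)
        lam = rng.random()
        mid = lam * x1 + (1 - lam) * x2
        if f(mid) > lam * f(x1) + (1 - lam) * f(x2) + 1e-9:
            return False   # found a violating pair: not convex
    return True

convex_ok = convex_on_samples(lambda x: x * x, -5, 5)   # x^2 is convex
nonconvex = convex_on_samples(lambda x: x ** 3, -5, 5)  # x^3 is not convex on [-5, 5]
```

Note the asymmetry: one violating triple proves non-convexity, while passing every sample is only evidence, not a proof, of convexity.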
Chapter 4 - Linear programming and the simplex method
4.1 Mathematical characteristics of LP problems
- LP problem in general form
- Theorem 4.1: The feasible region of an LP problem is a convex set.
- Definition 4.1: A convex combination of the points x^(1), ..., x^(k) is a point x = Σ_j λ_j x^(j) such that Σ_j λ_j = 1 and λ_j ≥ 0.
- Definition 4.2: The point x̄ is an extreme point in X if it is not possible to express x̄ as a strict (0 < λ < 1) convex combination of two points in X.
- Theorem 4.2 (Fundamental theorem of linear programming): Assume that the feasible region X of an LP problem is bounded and non-empty, so that there exists a bounded optimal solution to the LP problem. Then the minimum value of the objective function c^T x is obtained in (at least) one extreme point x* in X.
Lecture 4 Linear optimization: basic feasible solutions (Chapter 4.2-4.4, 4.8)
4.2 Standard form
- Standard form has to satisfy the following criteria:
- All constraints as equality constraints
- Convert all variables to non-negative variables
- In addition, constraints are usually formulated so that the right-hand-side (rhs) coefficients are non-negative
- A slack variable is added to the left-hand side of a ≤-constraint; for a ≥-constraint, a (non-negative) surplus variable is subtracted instead
- A slack variable can be interpreted as unused rhs-resource, while a surplus variable measures by how much the rhs is exceeded
- A free variable x can be replaced by two new non-negative variables according to the substitution x = x⁺ − x⁻, where x⁺ ≥ 0 and x⁻ ≥ 0
- A general LP problem written in standard form can be formulated as:
- A system of equations in standard form normally has significantly more variables than equations, i.e., it is under-determined.
- Under the assumption that the rows of A are linearly independent, the difference n − m between the number of variables (n) and the number of equations (m) expresses the degrees of freedom of the system of equations.
4.3 Basic solutions
- In order to have a mathematical description of the vertices, the basic solutions to the corresponding standard form of the LP problem are introduced.
- Definition 4.3:
A basic solution to the system of equations Ax = b is obtained if n − m variables are set to 0 and the remaining m variables get their unique values when we solve the remaining (m×m) system of equations.
The variables that are set to 0 are called non-basic variables (or zero variables) and the remaining m variables are called basic variables.
If the basic solution also satisfies all non-negativity conditions we have a basic feasible solution; otherwise we have an infeasible basic solution.
Every basic feasible solution corresponds to a unique vertex of the feasible region.
A basic solution in which one or more basic variables have the value zero is called a degenerate basic solution. If a vertex is represented by more than one basic solution, then all these basic solutions are degenerate.
- Summary of basic solutions, given the standard form of a linear optimization problem:
  - Basic solution <-> intersection of constraints
  - Basic feasible solution <-> a basic solution such that all variable values are ≥ 0
  - Basic infeasible solution <-> a basic solution such that at least one variable value is < 0
  - Linearly dependent constraints -> 'missing' basic solution(s)
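Definition 4.3 and the summary above can be illustrated by enumerating all basic solutions of a small standard-form system. A Python sketch for m = 2 (the instance, the 2×2 Cramer solver, and the tolerance are my own choices):

```python
from itertools import combinations

def solve2(M, rhs):
    """Solve a 2x2 system by Cramer's rule; None if the columns are dependent."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    if abs(det) < 1e-12:
        return None
    return [(rhs[0] * M[1][1] - rhs[1] * M[0][1]) / det,
            (M[0][0] * rhs[1] - M[1][0] * rhs[0]) / det]

def basic_solutions(A, b):
    """Enumerate all basic solutions of Ax = b (Definition 4.3, m = 2):
    fix n - m variables at 0, solve the remaining 2x2 system, and flag
    each solution as feasible (all components >= 0) or infeasible."""
    n = len(A[0])
    out = []
    for basis in combinations(range(n), 2):
        vals = solve2([[A[r][c] for c in basis] for r in range(2)], b)
        if vals is None:
            continue          # linearly dependent columns: 'missing' basic solution
        x = [0.0] * n
        for c, v in zip(basis, vals):
            x[c] = v
        out.append((basis, x, all(v >= -1e-9 for v in x)))
    return out

# Standard form of: x1 + x2 <= 4, x1 + 3x2 <= 6, x >= 0 (slacks s1, s2 added)
sols = basic_solutions([[1, 1, 1, 0], [1, 3, 0, 1]], [4, 6])
```

Here n = 4 and m = 2, so there are at most C(4, 2) = 6 basic solutions; four of them are feasible (the four vertices of the polygon) and two are infeasible.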
4.4 Change of basis
- The optimal solution to an LP problem is always found in a vertex (except when the solution is unbounded)
- Every vertex is represented by one or more basic feasible solutions
- The strategy of the simplex method is to search among basic feasible solutions (BFS) until the optimal BFS is found.
- Definition 4.4: Two vertices are adjacent if the set of active constraints in the two points differ in one element only
- Definition 4.5: Two basic solutions are adjacent if they have m − 1 basic variables in common.
- Change of basis: When we move between adjacent vertices (basic solutions) exactly one basic variable is replaced (which then becomes a non-basic variable) with one other basic variable (which before was non-basic).
- Entering basic variable: The new basic variable
- Leaving basic variable: The one that is replaced
- In a vertex represented by a unique (non-degenerate) basic solution, there are n − m possible feasible search directions (one per non-basic variable)
- If the basic solution is degenerate, it might be so that the change of basis does not lead to a change of vertex. We can interpret this change of basis as a move in the search direction with a zero step length.
- In simplex method, we strive to make a move (a change of basis) so that the objective function value is improved, i.e., we always try to find a feasible search direction that is an improving search direction. However, in the case of degeneracy, it might be that the objective function value is unchanged after a change of basis.
- The reduced cost at the point x when we move in direction d is calculated as c̄ = c^T d
- Need to know more about how c̄ is derived; see exercise 4.5 in problem solving session 2. Two steps:
  - Express the basic variables as functions of the non-basic variables
  - Obtain all possible search directions by reading off the coefficients of the entering variable in the expressions from step 1
- Optimality and feasibility condition for minimization in the simplex method
- Optimality condition (for minimization)
The basis is optimal if all reduced costs (marginal values) satisfy c̄_j ≥ 0 for the non-basic variables
- Feasibility condition
For all basic variables it holds that x_B = B⁻¹b ≥ 0
4.8 Algebraic description of the simplex method
- These algebraic expressions are for example useful when doing sensitivity analysis
- Rewrite the linear optimization problem equivalently as: min c_B^T x_B + c_N^T x_N s.t. B x_B + N x_N = b, x_B, x_N ≥ 0
From equation (1b): x_B = B⁻¹b − B⁻¹N x_N; then insert this into (1a) and (1c):
  - Each non-basic variable takes the value 0, i.e., x_N = 0
  - The basic variables take the values x_B = B⁻¹b
  - The value of the objective function is z = c_B^T B⁻¹b
  - The basic solution is feasible if B⁻¹b ≥ 0
Lecture 5 Linear optimization: the simplex method (Chapter 4.5–4.10, (7.1))
4.5 The simplex method - an example
- Problem expressed in standard form:
- For more detailed algorithm of simplex method, please check course problem solving notes
- Pivot operation
4.6 The simplex method - general algorithm description
- The simplex method can be summarized in the following way:
Step 0: Start from a basic feasible solution. Let k = 0.
Step 1: Determine the possible search directions and the reduced costs c̄_j for all non-basic variables.
Step 2: Check the convergence criterion: the point x^(k) is an optimal solution if c̄_j ≥ 0 for all non-basic variables (minimization).
Step 3: Determine the entering basic variable according to the criterion min_j c̄_j, which gives the entering basic variable x_j and the search direction d.
Step 4: Determine the step length by the minimum ratio test; the basic variable that first reaches zero becomes the leaving basic variable. The step length is calculated in the same way for both minimization and maximization problems.
Note: If all components of the search direction d are non-negative, the problem is unbounded.
Step 5: The new point is given by x^(k+1) = x^(k) + t d, and in the new basic solution the leaving variable is replaced with the entering variable. Update k := k + 1 and go to Step 1.
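Steps 0-5 can be turned into a compact tableau implementation. This is a teaching sketch in Python (the course uses Julia/JuMP and real solvers like Gurobi), restricted to min c^T x s.t. Ax ≤ b with b ≥ 0 so that the slack variables give the starting basis; it has no anti-cycling rule:

```python
def simplex(c, A, b):
    """Tableau simplex for min c^T x s.t. Ax <= b, x >= 0, b >= 0 (Section 4.6).
    Step 0: slack basis; Steps 1-2: reduced costs / optimality check;
    Step 4: minimum ratio test; Step 5: pivot (change of basis)."""
    m, n = len(A), len(c)
    T = [[float(v) for v in A[i]] + [1.0 if j == i else 0.0 for j in range(m)]
         + [float(b[i])] for i in range(m)]          # rows [A | I | b]
    cost = [float(v) for v in c] + [0.0] * m + [0.0] # reduced costs and -z
    basis = list(range(n, n + m))
    while True:
        j = min(range(n + m), key=lambda k: cost[k])  # most negative reduced cost
        if cost[j] >= -1e-9:
            break                                     # Step 2: optimal
        ratios = [(T[i][-1] / T[i][j], i) for i in range(m) if T[i][j] > 1e-9]
        if not ratios:
            raise ValueError("unbounded problem")     # no positive component
        _, r = min(ratios)                            # Step 4: leaving row
        piv = T[r][j]                                 # Step 5: pivot operation
        T[r] = [v / piv for v in T[r]]
        for i in range(m):
            if i != r:
                f = T[i][j]
                T[i] = [a - f * p for a, p in zip(T[i], T[r])]
        f = cost[j]
        cost = [a - f * p for a, p in zip(cost, T[r])]
        basis[r] = j
    x = [0.0] * n
    for i, bi in enumerate(basis):
        if bi < n:
            x[bi] = T[i][-1]
    return -cost[-1], x   # -cost[-1] holds the current objective value

# Example: min -x1 - 2x2 s.t. x1 + x2 <= 4, x1 + 3x2 <= 6 (optimum -5 at (3, 1))
val, x = simplex([-1, -2], [[1, 1], [1, 3]], [4, 6])
```

Choosing the most negative reduced cost in Step 3 is Dantzig's classical rule; production solvers use more elaborate pricing and anti-degeneracy techniques.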
4.7 Using simplex tableaux
4.8 Algebraic description of the simplex method
- See Lecture 4
4.9 Finding a basic feasible solution
- Two-phase method
- Phase 1 - If no basic feasible solution to the given set of constraints is known, introduce artificial variables a_i ≥ 0. The goal of Phase 1 is to find a basic solution where no artificial variables are in the basis, i.e., they should all have the value 0. We try to reach that goal by minimizing an artificial objective function defined as the sum of all artificial variables, w = Σ_i a_i. The interpretation of this objective function is that we try to minimize the total infeasibility of the solution. Two cases can occur:
  - If the optimal solution has the value w* = 0, we have reached the goal of finding a basic solution in which all artificial variables have the value 0. The resulting basic solution is optimal in the Phase 1 problem and feasible in the original problem. Start Phase 2.
  - If the optimal solution has a value w* > 0, the conclusion is that the original problem has no feasible solution. What to do then? Modelling errors? Or infeasibility in the data, e.g., demand > supply?
- Phase 2 - Solve the original problem, starting from the basic feasible solution obtained in Phase 1.
4.10 Convergence and degeneration
- Definition 4.6: A basic solution is degenerate if (at least) one basic variable has the value 0.
Lecture 6 - Duality in linear optimization models (Chapter 6 (7.3, 7.5))
6.1 Deriving and interpreting the dual problem
- There are two different ways of deriving the dual problem (here for a maximization problem):
  - Formulate an LP problem that determines how a weighted combination of the constraints should be formed in order to get the best possible (smallest) upper bound on the objective value z.
Define the following variables:
v1 = weight of constraint 1
v2 = weight of constraint 2
Instead of guessing, compute the optimal weights by minimizing the value of the estimate.
  - Formulate an LP problem that determines the prices which will be set on the market; this is called the market problem.
Define the following variables:
y1 = price per minute for sawing
y2 = price per minute for gluing
The market wishes to minimize the payment.
- Primal problem <=> Dual problem
- One can always interpret the dual variables as shadow prices or marginal prices as we have seen in connection with the discussion about sensitivity analysis.
6.2 Formulation of the dual problem
- A variable is in normal form if it is defined as non-negative (x_j ≥ 0 and y_i ≥ 0, respectively)
- A constraint is in normal form if it is a ≤-constraint in a maximization problem and a ≥-constraint in a minimization problem, respectively.
- If all variables and constraints in the primal problem are in normal form, then all constraints and variables in the dual problem will also be in normal form.
Rules for constructing the dual problem
6.3 Duality theory
- Primal (P) in normal form: max c^T x s.t. Ax ≤ b, x ≥ 0
- Dual (D) in normal form: min b^T y s.t. A^T y ≥ c, y ≥ 0
- Theorem 6.1 (weak duality)
Assume x̄ is a feasible solution to (P) and ȳ is a feasible solution to (D). We then know that c^T x̄ ≤ b^T ȳ.
- Theorem 6.2 - If x̄ and ȳ are feasible solutions in (P) and (D) respectively and c^T x̄ = b^T ȳ, we know that x̄ and ȳ are optimal solutions to the respective problems.
- Theorem 6.3 (strong duality)
If either (P) or (D) has a bounded optimal solution, the same holds for the other problem, and the two objective function values are equal in optimum, i.e., c^T x* = b^T y*.
- An alternative formulation of problem (P) uses equality constraints Ax = b; the corresponding dual problem (D) then has free (sign-unrestricted) dual variables.
- Theorem 6.4 (dual theorem)
Assume that x* is an optimal basic solution to (P) with basis B. Then y*^T = c_B^T B⁻¹ is an optimal solution to (D), and c^T x* = b^T y*.
- If a constraint is not active in optimum, the corresponding shadow price will have the value zero.
- The general relation between a primal constraint i and the corresponding dual variable y_i is given by the complementary slackness constraint y_i (b_i − Σ_j a_ij x_j) = 0: either the dual variable has value zero, or the slack has value zero (the constraint is active). It can of course also be the case that both are zero.
- For the relation between a primal variable x_j and the corresponding dual constraint, we can formulate a complementary slackness constraint according to x_j (Σ_i a_ij y_i − c_j) = 0, saying that either the primal variable has value zero or the dual constraint is active.
- Theorem 6.5 (complementary slackness theorem)
Assume x̄ and ȳ are feasible solutions in (P) and (D) respectively. Then x̄ and ȳ are optimal solutions to the respective problems if and only if the complementary slackness constraints are satisfied.
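Theorems 6.1, 6.3, and 6.5 are easy to verify numerically on a small primal/dual pair. A Python illustration (the instance and its optimal solutions are my own example, not from the book):

```python
# Primal (P): max c^T x  s.t.  Ax <= b, x >= 0
# Dual   (D): min b^T y  s.t.  A^T y >= c, y >= 0
c = [1, 2]
A = [[1, 1], [1, 3]]
b = [4, 6]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

x_feas = [2, 1]    # feasible in (P): 2 + 1 <= 4 and 2 + 3 <= 6
y_feas = [1, 1]    # feasible in (D): 1 + 1 >= 1 and 1 + 3 >= 2
weak_gap = dot(b, y_feas) - dot(c, x_feas)    # Theorem 6.1: must be >= 0

x_opt = [3, 1]        # optimal vertex of (P), objective value 5
y_opt = [0.5, 0.5]    # optimal solution of (D), objective value 5
strong_gap = dot(b, y_opt) - dot(c, x_opt)    # Theorem 6.3: must be 0
# Theorem 6.5: y_i * (b_i - (Ax)_i) = 0 for every primal constraint i
cs = [y_opt[i] * (b[i] - dot(A[i], x_opt)) for i in range(len(b))]
```

Any feasible dual solution thus certifies an upper bound on the primal optimum, and the zero gap together with complementary slackness certifies optimality of the pair.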
Relations between primal and dual optimal solutions
Lecture 7 - Linear optimization: post-optimal and sensitivity analysis (Chapter 5.1–5.5, (5.6))
5.1 Sensitivity analysis and an example
- Sensitivity analysis consists of two aspects:
- How does the optimum change when the right hand sides change?
- When the objective coefficients change?
- To what extent one can draw correct conclusions from the optimization if the input data to the model is uncertain.
- To investigate how sensitive the optimal solution is for changes in input data and other assumptions, i.e. to be able to analyze how robust the solution is.
- By "change" we mean both an increase and a decrease of the coefficient.
5.2 Relaxation and restriction
Definition 5.1: A relaxation is a modification of a problem instance that leads to a larger feasible set (increases the number of feasible solutions), or alternatively leaves the feasible set unchanged.
If a problem is relaxed, the objective function value can never be worse. In a max problem, the objective function value will increase while in a min problem, the value will decrease.
Definition 5.2: A restriction is a modification of a problem instance that leads to a smaller feasible set (decreases the number of feasible solutions) or alternatively that the feasible set remains unchanged.
- Examples of relaxation and restriction:
- The change in a right-hand-side coefficient b_i

Constraint type | Increase of b_i | Decrease of b_i |
---|---|---|
≤ | Relaxation | Restriction |
≥ | Restriction | Relaxation |

- The change in a constraint coefficient a_ij in the left-hand-side of a constraint

Constraint type | Increase of a_ij | Decrease of a_ij |
---|---|---|
≤ | Restriction | Relaxation |
≥ | Relaxation | Restriction |

- The change in an objective function coefficient c_j

Problem type | Increase of c_j | Decrease of c_j |
---|---|---|
Max problem | optimal value unchanged or larger | optimal value unchanged or lower |
Min problem | optimal value unchanged or larger | optimal value unchanged or lower |
- Add or remove a constraint or a variable
Adding a constraint: Restriction
Removing a constraint: Relaxation
Adding a variable: Relaxation
Removing a variable: Restriction
5.3 Shadow prices and the connection to duality
- Definition 5.3: The shadow price for a constraint is given by the change in objective function value when making a marginal increase of the right-hand-side.
Note: Other names for the shadow price are dual price, dual value, or marginal price.
- Sign restrictions of the dual variables

Primal problem | ≤-constraint | ≥-constraint | =-constraint |
---|---|---|---|
Min problem | y ≤ 0 | y ≥ 0 | free |
Max problem | y ≥ 0 | y ≤ 0 | free |
- Only constraints active in the optimum can affect the optimal objective function value if we change the right-hand-side. The values of dual variables corresponding to non-active constraints are thus zero in the optimum.
- The shadow price y_i is actually a derivative, y_i = ∂z*/∂b_i
- The shadow price is only constant in a certain neighbourhood of the original value of the right-hand-side coefficient, and it is very important to calculate the interval in which the shadow price is valid (unchanged).
- Figure 1: The effect of changing the right-hand-side, showing the relation between z* and b_i
- Figure 2: The effect of a change in the objective function, showing the relation between z* and c_j
5.4 Interpretation of output from a software package
- Some examples of analysis
5.5 Algebraic analysis of changes
- Changing a right-hand-side coefficient b_i: only the feasibility of the problem is affected, not the reduced costs. The basic variables get new values x_B = B⁻¹b, and the current basis remains optimal as long as B⁻¹b ≥ 0.
- Changing an objective function coefficient c_j: the reduced costs are affected, but not the feasible region. Thus, the current basic solution remains feasible.
- New variables and new constraints can be analyzed with the same algebraic expressions.
- More about sensitivity analysis in linear optimization:
  - The current optimal solution may become infeasible due to
    - a change in the right-hand-side, or
    - the addition of constraints (which gives a new basis matrix)
  - The current optimal solution may become non-optimal due to
    - changes in the objective coefficients, or
    - the addition of variables (new opportunities may be favourable)
  - Several coefficients may change simultaneously -> may affect both feasibility and optimality
  - Constraint coefficients (i.e., elements of A) may change, which may change the basis matrix B -> may affect both feasibility and optimality
Lecture 8 - Integer linear optimization: modelling, applications, complexity (Chapter 13, 2.6)
13.1 What is an integer programming problem
- An integer programming problem is a problem where one or several variables are restricted to integer values (discrete variables)
- LP relaxation of the integer programming problem
- Many integer programming problems have an underlying network structure and can also be formulated as network problems.
- Integer linear programming (ILP) uses integer variables (x_j ∈ Z₊, often binary x_j ∈ {0, 1}). An ILP (or MILP) possesses linear constraints and integer requirements on (some of) the variables.
13.2 Knapsack problems
- Objective function: maximize the total benefit of the chosen objects
- Budget constraint: the total weight (or cost) of the chosen objects must not exceed the budget
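The 0/1 knapsack problem above can be solved exactly by dynamic programming when the budget is integer. A Python sketch (my own illustration; DP is one of several solution approaches, alongside the branch & bound method of Lecture 9):

```python
def knapsack(values, weights, capacity):
    """0/1 knapsack (Section 13.2) by dynamic programming:
    dp[i][w] = best total value using the first i objects and budget w.
    Returns (best value, indices of chosen objects)."""
    n = len(values)
    dp = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for w in range(capacity + 1):
            dp[i][w] = dp[i - 1][w]                       # skip object i-1
            if weights[i - 1] <= w:                        # or take it if it fits
                dp[i][w] = max(dp[i][w],
                               dp[i - 1][w - weights[i - 1]] + values[i - 1])
    chosen, w = [], capacity                               # backtrack the choices
    for i in range(n, 0, -1):
        if dp[i][w] != dp[i - 1][w]:
            chosen.append(i - 1)
            w -= weights[i - 1]
    return dp[n][capacity], sorted(chosen)

# Made-up instance: three objects, budget 50
best, items = knapsack([60, 100, 120], [10, 20, 30], 50)
```

The running time is O(n · capacity), i.e., pseudo-polynomial: it depends on the numeric value of the budget, which is consistent with the knapsack problem being NP-hard (Section 2.6).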
13.3 Facility location
13.4 Network design
13.5 Assignment problem
- Assign each task to one resource, and each resource to one task
13.6 Generalized assignment problem
13.7 Matching problem
13.8 Set partitioning, set covering and set packing problems
- Mathematical formulation
13.9 Sequencing problem
13.10 Traveling salesman problem
13.11 Route planning
2.6 Basic complexity theory
- Algorithm complexity: Estimating how the computational time increases as the problem size increases.
- Problem complexity: Classifying how easy or difficult various problem classes are to solve.
2.6.1 Algorithm complexity
- A problem means an optimization problem here.
- The number of computations in an algorithm is proportional to the computation time.
- For an LP problem, the size can be expressed by the number of variables and the number of equations in the problem together with a parameter that states how many bits are required to represent all numerical input data.
- The ordo function O is used to express an upper bound on the number of steps. For two functions f(n) and g(n), we say that f(n) = O(g(n)) if there exists a constant c such that for all large n we have f(n) ≤ c·g(n). If the function f(n) is a polynomial function of the problem size n, we say that the algorithm has polynomial complexity; this type of problem is regarded as "easy" to solve. If the function f(n) is an exponential function of the problem size n (e.g., 2^n), we say that the algorithm has exponential complexity.
2.6.2 Problem complexity
- When we analyze and classify optimization problems with respect to their problem complexity, we study a transformation of the problem called the decision problem.
- Optimization problems whose decision problem can be solved with a polynomial algorithm are called "easy" problems, and this class of problems is denoted P.
- A larger class of problems, which includes the class P, is denoted NP, from "nondeterministic polynomial".
- The most difficult decision problems in the class NP are called NP-complete (the corresponding optimization problems are called NP-hard).
Lecture 9 - Integer linear optimization: relaxation, bounds, branch & bound algorithm (Chapter 14.1-14.3, 15.1-15.3)
14.1 Overview of methods
- For IP problems there is no counterpart to the property that the simplex method exploits in LP (an optimal solution in an extreme point).
- An IP problem is non-convex, and it is not possible to compute a derivative, hence no gradient. It is also difficult to conclude whether a solution that is locally optimal is also globally optimal.
- The most important strategies for solving IP problems are:
- Enumeration methods (implicit enumeration, Branch and Bound methods)
- Relaxation and decomposition methods (LP relaxation and Lagrangian relaxation)
- Cutting plane methods (Repeatedly solving the LP relaxation of the IP problem)
- Heuristics (a class of methods that produce good feasible solutions without guaranteeing optimality)
- The difficulty of solving an IP problem depends on two main factors: the number of integer variables and the problem structure.
- For an LP problem, the number of constraints is much more important for the solution time.
14.2 Optimality and relaxations
- For LP problems and nonlinear problems it is possible to formulate optimality conditions (the KKT conditions). These conditions, together with convexity analysis, can be used to identify and prove that a solution is a local or global optimal solution. However, for IP problems there are no corresponding optimality conditions, so other techniques are needed.
- In a minimization problem:
  - The optimistic bound (also called the dual bound) provides a so-called lower bound (LBD), $\underline{z} \le z^*$, obtained from a relaxation of the ILP. Relaxation examples are:
    - Remove the integer constraints and solve the LP relaxation.
    - Make the feasible region larger by removing one or several of the constraints (constraint relaxation or combinatorial relaxation).
    - Make a Lagrangian relaxation, which removes a set of constraints and at the same time changes the objective function by introducing corresponding terms based on Lagrangian multipliers.
  - The pessimistic bound (also called the primal bound) provides an upper bound (UBD), $\bar{z} \ge z^*$, obtained from any feasible solution to the ILP.
- The following properties are valid:
  - A relaxation always provides a better (or equally good) objective function value.
  - If a relaxation of the IP problem has no feasible solution, then the IP problem has no feasible solution.
  - If the optimal solution $\bar{x}$ of the relaxation is a feasible solution to the IP problem, then $\bar{x}$ is also optimal in the IP problem. (This may not be true with a Lagrangian relaxation.)
  - Each feasible solution to the IP problem that is found, e.g. by modifying (rounding) $\bar{x}$, provides a pessimistic bound of $z^*$.
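The bounds above can be illustrated on a small 0-1 knapsack (a maximization problem, so the roles flip: the LP relaxation gives an optimistic upper bound and any feasible integer solution gives a pessimistic lower bound). The data below are made up; the LP relaxation of a 0-1 knapsack can be solved greedily by value/weight ratio:

```python
from itertools import combinations

values  = [60, 100, 120]        # illustrative data
weights = [10, 20, 30]
capacity = 50

# Optimistic bound: LP relaxation, solved greedily by value/weight ratio.
def lp_relaxation_bound(values, weights, capacity):
    items = sorted(zip(values, weights), key=lambda vw: vw[0] / vw[1], reverse=True)
    bound, room = 0.0, capacity
    for v, w in items:
        if w <= room:
            bound += v
            room -= w
        elif room > 0:
            bound += v * room / w   # fractional item fills the remaining capacity
            break
        else:
            break
    return bound

# Pessimistic bound: the best feasible 0-1 solution, here found by enumeration.
def best_integer_value(values, weights, capacity):
    n, best = len(values), 0
    for r in range(n + 1):
        for subset in combinations(range(n), r):
            if sum(weights[i] for i in subset) <= capacity:
                best = max(best, sum(values[i] for i in subset))
    return best

ubd = lp_relaxation_bound(values, weights, capacity)   # optimistic bound
lbd = best_integer_value(values, weights, capacity)    # pessimistic bound
assert lbd <= ubd
print(lbd, ubd)   # 220 240.0
```

The gap between the two bounds (here 220 vs 240) is exactly what a strong formulation or a branch and bound search tries to close.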
14.3 To choose the right model
- Many solution methods for integer programming are based on solving a sequence of LP problems whose optimal objective function values gradually reach the optimal objective function value for the IP problem. If we can use a strong formulation, we typically can make such methods more effective.
- Definition 14.1: A point $x$ is a convex combination of a set of points $x^1, \dots, x^k$ if $x = \sum_{i=1}^{k} \lambda_i x^i$, where $\sum_{i=1}^{k} \lambda_i = 1$ and $\lambda_i \ge 0$, $i = 1, \dots, k$.
- Definition 14.2: The convex hull of a set of points $x^1, \dots, x^k$ consists of all possible convex combinations of the points.
  Note: the convex hull is the smallest convex set containing the feasible integer points.
- The smaller the difference is between the optimal objective function value of the IP problem and that of the corresponding LP relaxation, the better (or stronger) model we have.
- The ideal linear program has integer extreme points.
- In general, it is not computationally practical to find an ideal formulation (the convex hull) of an IP problem. However, the difference between the feasible region of the LP relaxation (in particular near the optimal solution) and the convex hull can be reduced by two methods:
  - initially choosing a strong formulation with respect to the LP relaxation of the problem
  - adding problem-specific valid inequalities
15.1 Introduction
- In Branch and Bound (B&B), an optimistic bound and a pessimistic bound of the optimal objective function value are generated.
- Search tree (or B&B tree) consists of nodes representing subproblems and edges representing the new constraints added to define new subproblems.
15.2 General strategy for branch and bound
- Relax
  The optimal solution of a relaxed version of the problem provides an optimistic bound of the optimal objective function value. Two alternatives to relax the original problem:
  - LP relaxation
  - Lagrangian relaxation
- Branch
  If the solution to the relaxation is not feasible in the ILP, exclude it through a partitioning of the feasible set into new subproblems.
- Prune (discard a node whose optimistic bound cannot improve on the best known feasible solution)
- Search
- Depth first
- Breadth first
- Best first
- A termination criterion that expresses the relative deviation between the best pessimistic bound (UBD) and the best optimistic bound (LBD) can be defined as
  $\frac{UBD - LBD}{|UBD|} \le \varepsilon$
  where a typical parameter value of $\varepsilon$ is 0.01.
15.3 The Land-Doig-Dakin algorithm
- The most common relaxation used in branch and bound is the LP relaxation; this LP relaxation problem is solved first.
- The Land-Doig-Dakin algorithm is a branch and bound algorithm based on LP relaxation.
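A minimal sketch of an LP-relaxation-based branch and bound in the spirit of Land-Doig-Dakin, applied to a 0-1 knapsack with hypothetical data (for this maximization problem the LP relaxation gives an upper bound, and nodes are pruned against the incumbent):

```python
values  = [60, 100, 120]                       # illustrative data
weights = [10, 20, 30]
capacity = 50

def lp_bound(fixed):
    """LP relaxation value with some variables fixed; returns (bound, frac_item)."""
    used = sum(weights[i] for i, x in fixed.items() if x == 1)
    if used > capacity:
        return None, None                      # infeasible subproblem
    bound = float(sum(values[i] for i, x in fixed.items() if x == 1))
    room = capacity - used
    free = sorted((i for i in range(len(values)) if i not in fixed),
                  key=lambda i: values[i] / weights[i], reverse=True)
    for i in free:
        if weights[i] <= room:
            bound += values[i]
            room -= weights[i]
        elif room > 0:
            bound += values[i] * room / weights[i]
            return bound, i                    # fractional variable: branch on it
        else:
            break
    return bound, None                         # relaxation solution is integral

best = 0

def solve(fixed):
    global best
    bound, frac = lp_bound(fixed)
    if bound is None or bound <= best:         # prune: infeasible or dominated
        return
    if frac is None:                           # integral solution: new incumbent
        best = int(bound)
        return
    for value in (1, 0):                       # branch on the fractional variable
        solve({**fixed, frac: value})

solve({})
print(best)   # 220
```

Branching on a 0-1 variable by fixing it to 1 and to 0 corresponds to the floor/ceil branching of Land-Doig-Dakin in the binary case.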
Lecture 10 - Integer linear optimization theory and algorithms: good/ideal formulation, cutting planes, Lagrangean duality and relaxation (Chapter 14.4-14.5, 17.1-17.2, (17.3-17.4), 13.10-13.11, 15.4, (15.5))
14.4 Valid inequalities
- Definition 14.3: A valid inequality is a linear inequality $\sum_j a_j x_j \le b$ that is satisfied by all feasible (integer) points. If we add valid inequalities to the LP relaxation of an IP problem, we will get a better approximation of the convex hull.
- Definition 14.4: If $\sum_j a_j x_j \le b$ is a valid inequality for the convex hull $P$, then the set $F = \{ x \in P : \sum_j a_j x_j = b \}$ is the face of $P$ created by the valid inequality.
- Definition 14.5: A face $F$ of a convex hull $P$ with dimension $\dim(P) = n$ is called a facet if $\dim(F) = n - 1$.
14.5 Cutting plane methods
- A general cutting plane algorithm (p. 378):
  1. Solve the linear programming (LP) relaxation.
  2. If the LP solution is integer: stop; an optimal solution to the ILP is found.
  3. Add one or several valid inequalities that cut off the fractional solution but none of the integer solutions.
  4. Resolve the new problem and go to step 2.
- The cuts (or constraints) added in a cutting plane method must be valid inequalities. Otherwise, one or several feasible integer points may be cut away.
- A valid inequality must cut away parts of the feasible region to the LP relaxation (hence the name cutting plane)
- Often only one valid inequality is added in each iteration (after solving each LP relaxation) in cutting plane methods.
- Methods for generating valid inequalities
- Chvatal-Gomory cuts (combine constraints, make beneficial roundings of LHS and RHS)
- Gomory's method: generate cut from an optimal simplex basis (Ch. 14.5.1)
- Pure cutting plane algorithms are usually less efficient than branch-and-bound.
- In commercial solvers (e.g. CPLEX), cuts are used to help (presolve) the branch-and-bound algorithm.
- For problems with specific structures (e.g. TSP and set covering) problem specific classes of cuts are used.
14.5.1 Gomory's method
- Gomory's method was one of the first cutting plane methods developed.
- One requirement needed for generating cuts is that all coefficients in the problem are integers.
- Consider the simplex row where $x_B$ is a basic variable, $N$ is the set of non-basic variables, and the right-hand side $\bar{b}$ is non-integer:
  $x_B + \sum_{j \in N} \bar{a}_j x_j = \bar{b}$
  which can be rewritten, separating integer and fractional parts, as:
  $x_B + \sum_{j \in N} \lfloor \bar{a}_j \rfloor x_j + \sum_{j \in N} (\bar{a}_j - \lfloor \bar{a}_j \rfloor) x_j = \lfloor \bar{b} \rfloor + (\bar{b} - \lfloor \bar{b} \rfloor)$
- The following constraint is a Gomory cut; it is called the integer cut since it is expressed from the integer parts of the equation above:
  $x_B + \sum_{j \in N} \lfloor \bar{a}_j \rfloor x_j \le \lfloor \bar{b} \rfloor$
- The following constraint is the fractional cut:
  $\sum_{j \in N} (\bar{a}_j - \lfloor \bar{a}_j \rfloor) x_j \ge \bar{b} - \lfloor \bar{b} \rfloor$
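A small helper illustrating how the fractional cut is read off a simplex row (the row data here are made up):

```python
import math

def gomory_fractional_cut(row_coeffs, rhs):
    """Given a simplex row  x_B + sum_j a_j x_j = b  with fractional b,
    return the coefficients and right-hand side of the fractional cut
        sum_j frac(a_j) x_j >= frac(b)."""
    frac = lambda a: a - math.floor(a)        # fractional part, also for a < 0
    return {j: frac(a) for j, a in row_coeffs.items()}, frac(rhs)

# Illustrative row: x_1 + 0.25*x_3 - 1.5*x_4 = 2.75
coeffs, rhs = gomory_fractional_cut({3: 0.25, 4: -1.5}, 2.75)
print(coeffs, rhs)   # {3: 0.25, 4: 0.5} 0.75
```

Note that the fractional part of a negative coefficient such as $-1.5$ is $0.5$, since $\lfloor -1.5 \rfloor = -2$.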
14.5.2 Problem specific cuts: cover inequalities
- Definition 14.6: For the knapsack set $X = \{ x \in \{0,1\}^n : \sum_{j \in N} a_j x_j \le b \}$, the set $C \subseteq N$ is a cover if $\sum_{j \in C} a_j > b$. The set $C$ is also a minimal cover if for each selection of $k \in C$ the set $C \setminus \{k\}$ is not a cover, i.e. $\sum_{j \in C \setminus \{k\}} a_j \le b$.
- Theorem 14.1: If $C$ is a minimal cover, then the constraint $\sum_{j \in C} x_j \le |C| - 1$ is a valid inequality for $X$.
- Theorem 14.2: If $C$ is a cover of $X$, then the augmented cover constraint $\sum_{j \in E(C)} x_j \le |C| - 1$ is a valid inequality, where $E(C) = C \cup \{ j \in N : a_j \ge a_i \text{ for all } i \in C \}$.
- Theorem 14.3: If $\sum_{j \in C} (1 - \bar{x}_j) \ge 1$ for every cover $C$, then $\bar{x}$ satisfies all cover inequalities. If $\sum_{j \in C} (1 - \bar{x}_j) < 1$ for some cover $C$, then the inequality $\sum_{j \in C} x_j \le |C| - 1$ cuts away the fractional solution $\bar{x}$ with the proportion $1 - \sum_{j \in C} (1 - \bar{x}_j)$. This gives a measure of the strength of an inequality.
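A quick check of Definition 14.6 on a made-up knapsack constraint:

```python
def is_cover(C, a, b):
    """C is a cover of {x in {0,1}^n : sum_j a_j x_j <= b} if its weights exceed b."""
    return sum(a[j] for j in C) > b

def is_minimal_cover(C, a, b):
    # Minimal: removing any single element must destroy the cover property.
    return is_cover(C, a, b) and all(not is_cover(C - {k}, a, b) for k in C)

# Knapsack constraint 3*x1 + 5*x2 + 4*x3 + 2*x4 <= 8 (illustrative data)
a = {1: 3, 2: 5, 3: 4, 4: 2}
C1 = {2, 3}                         # weight 9 > 8: a cover, and minimal
C2 = {1, 2, 3}                      # weight 12 > 8: a cover, but not minimal
print(is_minimal_cover(C1, a, 8), is_minimal_cover(C2, a, 8))   # True False
```

For the minimal cover $C_1 = \{2, 3\}$ the cover inequality $x_2 + x_3 \le 1$ is then valid for the knapsack set.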
17.1 Lagrangian duality
- General formulation of an optimization problem (P):
  $\min f(x)$ s.t. $g_i(x) \le b_i$, $i = 1, \dots, m$, $x \in X$
  The functions $f(x)$ and $g_i(x)$, $i = 1, \dots, m$, are general nonlinear functions and may also be non-convex functions.
- The Lagrangian function for the problem is
  $L(x, v) = f(x) + \sum_{i=1}^{m} v_i (g_i(x) - b_i)$
  where $v_i \ge 0$ are Lagrangian multipliers (or dual variables).
- (Lagrange) dual function: $h(v) = \min_{x \in X} L(x, v)$
- (Lagrange) dual problem: $\max_{v \ge 0} h(v)$
- Theorem 17.1 (Weak duality)
  Suppose that $\bar{x}$ is an optimal solution to (P). For each $v \ge 0$, we have $h(v) \le f(\bar{x})$; the difference $f(\bar{x}) - h(v)$ is the duality gap.
- Theorem 17.2: The dual function $h(v)$ is concave.
17.2 Lagrangian relaxation
- If the original problem is convex, we can guarantee finding a pair $\bar{x}$ and $\bar{v}$ where $h(\bar{v}) = f(\bar{x})$ (strong duality; no duality gap).
- With a non-convex problem, we often have a duality gap $f(\bar{x}) - h(\bar{v}) > 0$.
- In a minimization problem, Lagrangian relaxation provides an optimistic estimate (lower bound) of $z^*$. The Lagrangian relaxation bound is never worse than the linear programming relaxation bound.
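Weak duality and the Lagrangian bound can be verified by brute force on a tiny ILP (hypothetical data; the crude grid search over multipliers stands in for a proper subgradient method):

```python
from itertools import product

# Illustrative ILP (written as a minimization):
#   min f(x) = -5*x1 - 4*x2   s.t.  2*x1 + 3*x2 <= 4,  x in {0,1}^2
X = list(product((0, 1), repeat=2))
f = lambda x: -5 * x[0] - 4 * x[1]
g = lambda x: 2 * x[0] + 3 * x[1]

# Optimal value of the ILP by enumeration.
z_star = min(f(x) for x in X if g(x) <= 4)

# Lagrangian dual function: relax the constraint into the objective.
def h(v):
    return min(f(x) + v * (g(x) - 4) for x in X)

# Maximize the dual function over a grid of non-negative multipliers.
dual_bound = max(h(v / 100) for v in range(0, 500))

assert dual_bound <= z_star + 1e-9     # weak duality: h(v) <= z* for all v >= 0
print(round(dual_bound, 3), z_star)
```

Here the dual bound stays strictly below $z^* = -5$: a duality gap, as expected for a non-convex (integer) problem.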
17.4 Applications
- There are two ways to generate an optimistic bound of the optimal objective function value:
  - Relax the integer requirements and solve the LP relaxation.
  - Relax one or several of the linear constraints by using Lagrangian relaxation.
- The general experience of solving integer programming problems is that the optimistic bound obtained from Lagrangian relaxation is stronger than the bound from the LP relaxation:
  $z_{LP} \le \max_{v \ge 0} h(v) \le z^*$
  where $z_{LP}$ is the optimal objective function value of the LP relaxation.
- If the set $X$ has the integrality property (i.e., the LP relaxation of $X$ has integral extreme points), then $z_{LP} = \max_{v \ge 0} h(v)$.
13.10 Traveling salesman problem (TSP)
- Many applications can be formulated as a traveling salesman problem
- The problem is easy to model but difficult to solve
- Two types of problem:
- Symmetric version of the problem: only one arc is needed.
- Asymmetric version of the problem: the arc cost between two nodes depends on the direction
13.11 Route planning
- Vehicle Routing Problem (VRP)
15.4 Branch and bound methods for structured problems
15.4.1 Land-Doig-Dakin for the knapsack problem
15.4.2 Branch and bound for the traveling salesman problem
15.4.3 Branch and bound for the job scheduling problem
Lecture 11 - Integer linear optimization: (non-) convexity, local/global optimality, heuristics and approximation algorithms (Chapter 16, 8.3)
- Non-convex optimization problems do not fulfil strong duality, so (efficient) algorithms cannot generally be constructed to fulfil optimality conditions.
16.1 Introduction
- Each heuristic is adapted to fit the specific problem structure.
- Heuristics can also be used as part of advanced methods to find feasible solutions. Examples are branch and bound, Lagrangian relaxation.
16.2 Combinatorial formulations
16.3 Constructive heuristics
- Greedy algorithms
16.3.1 Heuristics for the traveling salesman problem
- Nearest neighbor algorithm
- Nearest addition algorithm
- Nearest merger algorithm
16.3.2 Heuristics for matching problems
- Greedy matching algorithm
16.3.3 Heuristics for the set covering problem
- Greedy covering algorithm
16.3.4 Heuristics for the vehicle routing problem
- Clarke & Wright algorithm
- Sweep algorithm
16.4 Local search
- Local search is based on defining a neighbourhood $N(x)$, which consists of all solutions that in some sense are "close" to $x$.
- Each solution $y \in N(x)$ is called a neighbour of $x$.
16.5 Metaheuristics
16.5.1 Tabu search
16.5.2 Simulated annealing
16.5.3 Metaheuristics and infeasible solutions
16.6 Approximation algorithms
16.6.1 Spanning tree heuristic for the traveling salesman problem
16.6.2 Christofides' heuristic for the traveling salesman problem
8.3 Minimum spanning tree
- Select a subset of the arcs in the graph so that they constitute a tree spanning all nodes and where the sum of the arc costs in the tree is minimal.
- The optimality conditions for the problem can be formulated as: a tree $T$ is a minimum spanning tree if
  - for every arc $(i,j) \notin T$ we have that $c_{ij} \ge c_{kl}$ for all arcs $(k,l)$ included in the unique path between $i$ and $j$ in $T$, and
  - for every arc $(i,j) \in T$ we have that $c_{ij} \le c_{kl}$ for all arcs $(k,l)$ included in the cut obtained if arc $(i,j)$ is removed from $T$.
- The cardinality of the selected subset of arcs should be $n - 1$, where $n$ is the number of nodes in the graph.
- Two algorithms for determining a minimum spanning tree:
- Kruskal's algorithm is based on the first group of conditions
- Prim's algorithm is based on the second group of conditions
- Kruskal's and Prim's algorithm both belong to a type of algorithm known as greedy algorithms.
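A minimal sketch of Kruskal's algorithm (one of the greedy algorithms mentioned above), using a union-find structure to detect cycles; the edge data are made up:

```python
def kruskal(n, edges):
    """n nodes (0..n-1), edges as (cost, i, j); returns (total_cost, tree_edges)."""
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i
    total, tree = 0, []
    for cost, i, j in sorted(edges):        # greedy: cheapest edges first
        ri, rj = find(i), find(j)
        if ri != rj:                        # edge joins two components: no cycle
            parent[ri] = rj
            total += cost
            tree.append((i, j))
    return total, tree

edges = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 1, 3), (5, 2, 3)]
total, tree = kruskal(4, edges)
print(total, tree)   # the tree has n - 1 = 3 edges, as the conditions require
```

Each accepted edge is the cheapest one crossing the cut between the two components it joins, which is exactly the second group of optimality conditions above.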
Lecture 12 - Network flows: linear optimization formulations (Chapter 8.1-8.2, 8.4, (8.5), 18.1-18.5, (18.6-18.7), 13.5)
8.1 Network optimization problem preview
- A network consists of nodes and arcs.
- Network flow problems can be divided into two classes:
- Minimum cost network flow problems (LP problems, considered easy to solve; the focus of this lecture)
- Combinatorial problems
8.2 Basic concepts
- A graph $G = (N, A)$ is defined by a set of nodes $N$, a set of arcs $A$ and a relation (mapping) between the nodes and arcs.
- A directed arc only allows flow in the specified direction. An undirected arc (also called a link) allows flow in both directions and can be interpreted as two (opposite) directed arcs.
- Undirected graph: only the node numbers are needed to specify an arc.
- Directed graph: with $n$ nodes and $m$ arcs, we can define a so-called node-arc incidence matrix of dimension $n \times m$. Each row in the matrix corresponds to one node, and each column corresponds to one arc and includes exactly two non-zero elements: one $-1$ and one $+1$.
- Each node has an in-degree (the number of arcs into the node) and an out-degree (the number of arcs out of the node).
- A graph is bipartite if we can partition the node set into two subsets $N_1$ and $N_2$ such that for each arc $(i,j)$ either $i \in N_1$ and $j \in N_2$, or $i \in N_2$ and $j \in N_1$.
- A cut is a partition of the node set into two subsets, $S$ and $\bar{S} = N \setminus S$.
- A path (sometimes called a route) in a directed graph is a connected sequence of arcs from one node to another, where the starting node of each arc is the termination node of the preceding arc in the sequence.
- If we disregard the direction of the arcs, we talk about a chain.
- A flow problem is defined by specifying for each node $i$ the total net flow to/from the node. The net flow (also called node strength) $b_i$ is defined as the flow generated at the node minus the flow consumed at it. If $b_i$ is positive, we say that node $i$ is a source (or supply node), and if $b_i$ is negative we say that node $i$ is a sink (or demand node). A node where $b_i = 0$ is called a transhipment node.
8.4 Shortest path problems
8.4.1 Problem description and Bellman's equations
- When solving a shortest path problem we assume that our network satisfies the following conditions:
- All arcs are directed
- Node $n$ can be reached from node $1$
- There are no cycles with negative costs
- Bellman's equations: $y_1 = 0$ and $y_j = \min_{i : (i,j) \in A} \{ y_i + c_{ij} \}$, $j = 2, \dots, n$, where $y_j$ denotes the length of the shortest path from node $1$ to node $j$.
8.4.2 An algorithm for the shortest-path problem
- Theorem 8.1: Dijkstra's algorithm will give an optimal solution to the problem of finding a shortest path in a network with non-negative arc costs only.
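A minimal Dijkstra sketch (valid only under the non-negative-cost assumption of Theorem 8.1); the graph data are hypothetical:

```python
import heapq

def dijkstra(graph, source):
    """graph: {node: [(neighbour, cost), ...]}; returns shortest distances."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, i = heapq.heappop(heap)
        if d > dist.get(i, float("inf")):
            continue                      # stale heap entry: already improved
        for j, c in graph.get(i, []):
            if d + c < dist.get(j, float("inf")):
                dist[j] = d + c           # relax arc (i, j)
                heapq.heappush(heap, (d + c, j))
    return dist

graph = {1: [(2, 4), (3, 1)], 3: [(2, 2), (4, 5)], 2: [(4, 1)], 4: []}
print(dijkstra(graph, 1))   # {1: 0, 2: 3, 3: 1, 4: 4}
```

The relaxation step `dist[j] = d + c` is exactly Bellman's equation applied one arc at a time, with nodes processed in order of increasing distance.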
8.4.3 Shortest path - all to all
- Algorithm - Floyd-Warshall
8.4.4 Other types of shortest path problems
- Longest path
- Multiplicative objective function
- Path with maximal capacity
18.1 Introduction
- Dynamic programming is a solution strategy which can be used to solve many structured optimization problems.
- An optimal decision is often called a control and the variables are called control variables. Given the current state in one stage, the decision will give a new state in the next stage.
18.2 An initial example
- The shortest path can be computed sequentially (recursively) with respect to defined stages as follows.
(backward recursion)
18.3 Network interpretation and Bellman's equations
- The cost on which we base the decision has two components:
- The first component is the arc cost to go from a node in the current stage to a new node in the next stage
- The second component is the remaining (best) cost to continue to the end node.
- The last stage computed provides the overall best objective function value.
- Principle of optimality: Given the current state, the optimal decision for each of the remaining stages must not depend on previously reached states or previously chosen decisions.
18.4 General formulation
- The total cost is given by the recursion function denoted $f_s(x_s)$. The recursion relation corresponds to Bellman's equations in a standard shortest path problem: the first term is the arc cost and the second is the node price.
- Backward recursion and forward recursion are equivalent; both provide the optimal solution. The difference is the side information:
  - Backward recursion offers the optimal way to continue from each possible state.
  - Forward recursion offers the optimal way to get to each state.
18.5 Formulation of the shortest path problem
- Formulation (backward recursion) for Example 18.1
  - Stages: the legs of the trip, $s = 1, \dots, S$
  - Control variables: $u_s$ = city which we travel to in stage $s$
  - States: $x_s$ = city we start from in stage $s$
  - Transition function: $x_{s+1} = u_s$
  - Recursion function: $f_s(x_s)$ = shortest path from city $x_s$ to LA
  - Recursion relation: $f_s(x_s) = \min_{u_s} \{ c(x_s, u_s) + f_{s+1}(u_s) \}$, where $c(x_s, u_s)$ gives the distance between city $x_s$ and city $u_s$
  - Search order: backward, from the last stage to the first
  - Initial conditions: $x_1$ = NY, with the recursion started from the value 0 at LA
  - Restrictions: $u_s$ must be a city with a direct connection from $x_s$
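The backward recursion can be sketched on a small staged network (the city names follow the NY-to-LA setting, but the intermediate cities and distances are made up):

```python
# Backward recursion f_s(x) = min_u { c(x, u) + f_{s+1}(u) } on a staged network.
stages = [
    {("NY", "A"): 3, ("NY", "B"): 5},          # stage 1: NY -> intermediate city
    {("A", "C"): 4, ("A", "D"): 6,
     ("B", "C"): 3, ("B", "D"): 2},            # stage 2
    {("C", "LA"): 5, ("D", "LA"): 7},          # stage 3: last leg into LA
]

f = {"LA": 0}                                  # initial condition at the end node
for arcs in reversed(stages):                  # backward over the stages
    new_f = {}
    for (x, u), cost in arcs.items():
        if u in f:
            new_f[x] = min(new_f.get(x, float("inf")), cost + f[u])
    f = new_f

print(f["NY"])   # 12
```

After the last (first-stage) computation, the recursion function at the start state gives the overall best objective function value, as stated in 18.3.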
13.5 Assignment problems
- Assignment problem is a special case of the transportation problem. Hence it can be classified as a network flow problem.
- Define the variables as: $x_{ij} = 1$ if person $i$ is assigned to job $j$, and $x_{ij} = 0$ otherwise. The model can be formulated as:
  $\min \sum_{i=1}^{n} \sum_{j=1}^{n} c_{ij} x_{ij}$
  s.t. $\sum_{j=1}^{n} x_{ij} = 1$, $i = 1, \dots, n$
  $\sum_{i=1}^{n} x_{ij} = 1$, $j = 1, \dots, n$
  $x_{ij} \in \{0, 1\}$
Lecture 13 - Linear optimization formulations and algorithms for minimum cost network flows (Chapter 8.6-8.7)
8.6 Minimum cost network flow problems
- The problem is to determine a flow with minimum cost in a network, where
  - the supplies and demands (strengths) of the nodes have to be satisfied, and
  - the capacity constraints of the arcs have to be fulfilled.
- Integer properties of the MCNFP (see 8.6.3)
8.6.1 Examples of models
- A network flow problem
- Maximum flow problem
- Time dynamic networks
8.6.2 LP formulation
- Given a network with $n$ nodes and $m$ arcs, where $c_{ij}$, $l_{ij}$ and $u_{ij}$ denote the cost, lower bound and upper bound on the flow on arc $(i,j)$, and where $b_i$ denotes the net flow to/from node $i$, we can formulate a minimum cost network flow problem as:
  $\min \sum_{(i,j) \in A} c_{ij} x_{ij}$
  s.t. $\sum_{j:(i,j) \in A} x_{ij} - \sum_{j:(j,i) \in A} x_{ji} = b_i$ for all $i \in N$
  $l_{ij} \le x_{ij} \le u_{ij}$ for all $(i,j) \in A$
8.6.3 Integer properties
- The integer property holds for the minimum cost network flow problem, stated as in 8.6.2.
- Theorem 8.2: If a minimum cost network flow problem has integer bounds $l_{ij}$ and $u_{ij}$ (and integer node strengths $b_i$), all extreme points of the feasible region are integer.
- Definition 8.1: A matrix is totally unimodular if each square sub-matrix has a determinant with value $-1$, $0$ or $+1$.
- Theorem 8.3: The node-arc incidence matrix (a concept from graph theory) of a directed graph is totally unimodular.
8.6.4 Extensions of the minimum cost network flow problem
- Network with several commodities
- Network with gains and losses
8.7 The simplex method for network flow problems
- MCFP problems can be solved with a special version of the standard simplex method for LP problems.
- In a minimum cost network flow problem we may have both lower and upper bounds of the variables (flows) and the simplex method for network flow problem takes these into account.
8.7.1 Optimality conditions
- Let us study the following LP formulation of the minimum cost network flow problem
- Summary of the optimality conditions
A solutionis an optimal solution to a minimum cost network flow problem is the following relations are satisfied (two alternative sets of conditions are provided)
where is the reduced cost for arc
8.7.2 Solution principle
8.7.3 Algorithm - Simplex method for the network flow problem
8.7.4 Determining a basic feasible solution
When solving a minimum cost network flow problem with the simplex method we require a basic feasible solution, i.e. a solution that satisfies the node balance constraints and the capacity constraints.
Lecture 14 - Overview of nonlinear optimization (Chapter 2.5.1, 9-12)
2.5.1 Search method for continuous problems
- For continuous problems (linear programming problems and nonlinear problems), the sequence of points is determined by the general formula
  $x^{(k+1)} = x^{(k)} + t^{(k)} d^{(k)}$
  where $x^{(k)}$ and $x^{(k+1)}$ are the current and next points, $d^{(k)}$ is a search direction and $t^{(k)}$ is a step length.
- The search direction $d^{(k)}$ should be:
  - a feasible search direction
  - an improving search direction
9 Introduction to nonlinear optimization
- Nonlinear optimization problems can be separated into problems with and without constraints:
  - unconstrained optimization
  - constrained optimization
- If the problem has a quadratic objective function and linear constraints, we call it a quadratic programming problem.
- Nonlinear optimization problems can also be separated into convex and non-convex problems.
9.1 Examples of nonlinear models
- Least square problem
- Two dimensional cutting pattern
- Construction of a bar-truss structure
9.2 Approximations of functions
- First and second order approximations based on a Taylor expansion.
- Other equivalent names for these approximations are linear and quadratic approximations.
- Definition 9.1: The gradient $\nabla f(x)$ is a vector consisting of all partial derivatives of the function $f(x)$ and can be expressed as
  $\nabla f(x) = \left( \frac{\partial f(x)}{\partial x_1}, \dots, \frac{\partial f(x)}{\partial x_n} \right)^T$
- Definition 9.2: The Hessian $H(x)$ (also written $\nabla^2 f(x)$) is a square matrix consisting of all second-order partial derivatives of the function $f(x)$; its elements are
  $H(x)_{jk} = \frac{\partial^2 f(x)}{\partial x_j \partial x_k}$
- Definition 9.3: A first-order Taylor approximation of the function $f(x)$ around the point $\bar{x}$ is given by
  $f(x) \approx f(\bar{x}) + \nabla f(\bar{x})^T (x - \bar{x})$
- Definition 9.4: A second-order Taylor approximation of the function $f(x)$ around the point $\bar{x}$ is given by
  $f(x) \approx f(\bar{x}) + \nabla f(\bar{x})^T (x - \bar{x}) + \frac{1}{2} (x - \bar{x})^T H(\bar{x}) (x - \bar{x})$
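Definitions 9.3-9.4 can be checked numerically on a small quadratic function (a hypothetical example with analytic gradient and Hessian; for a quadratic function the second-order approximation is exact):

```python
# f(x, y) = x^2 + 2y^2 + xy, expanded around the point (1, 1).
def f(x, y):
    return x**2 + 2*y**2 + x*y

def grad(x, y):
    return (2*x + y, 4*y + x)       # (df/dx, df/dy)

H = [[2, 1], [1, 4]]                # constant Hessian of this quadratic

x0, y0 = 1.0, 1.0
dx, dy = 0.1, -0.2                  # displacement from the expansion point

g = grad(x0, y0)
first  = f(x0, y0) + g[0]*dx + g[1]*dy
second = first + 0.5 * (H[0][0]*dx*dx + 2*H[0][1]*dx*dy + H[1][1]*dy*dy)

exact = f(x0 + dx, y0 + dy)
print(exact, first, second)         # second-order term recovers f exactly here
```

The first-order value differs from the exact one by the curvature term, which the second-order approximation restores.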
9.3 Convex analysis
- The convexity properties decide whether we can state with certainty that the solution we have found is the global optimum.
- The problem $\min f(x)$ s.t. $x \in X$ is a convex problem if $f(x)$ is a convex function and if the feasible region $X$ defined by the constraints is a convex set.
- Theorem 9.1: If the problem is convex, then each local minimum is also a global minimum.
Convexity of a set
- Theorem 9.2: Let $X_1, \dots, X_m$ be convex sets. Then the intersection $X = X_1 \cap \dots \cap X_m$ is a convex set.
- Theorem 9.3: The set $X = \{ x : g(x) \le b \}$ is a convex set if the function $g(x)$ is convex.
Convexity of a function
- Theorem 9.4: If $f_1(x), \dots, f_m(x)$ are convex functions and we have $\lambda_i \ge 0$, $i = 1, \dots, m$, then the function $f(x) = \sum_{i=1}^{m} \lambda_i f_i(x)$ is convex.
- Definition 9.6: The square matrix $H$ is positive definite (positive semi-definite) if $x^T H x > 0$ ($x^T H x \ge 0$) for all vectors $x \ne 0$.
- Theorem 9.5: Suppose the function $f(x)$ is twice differentiable on a convex set $X$. Then we have:
  - $f(x)$ is a convex function on $X$ if the Hessian matrix $H(x)$ is positive semi-definite for all $x \in X$
  - $f(x)$ is a strictly convex function on $X$ if the Hessian matrix $H(x)$ is positive definite for all $x \in X$
  - $f(x)$ is a concave function on $X$ if the Hessian matrix $H(x)$ is negative semi-definite for all $x \in X$
  - $f(x)$ is a strictly concave function on $X$ if the Hessian matrix $H(x)$ is negative definite for all $x \in X$
  Note: if none of these hold, we have that $H(x)$ is indefinite and the function is neither convex nor concave.
- To determine the eigenvalues of the matrix $H$, find the roots of the so-called characteristic equation of $H$, i.e. the equation $\det(H - \lambda I) = 0$.
- Let $\lambda_1, \dots, \lambda_n$ denote the eigenvalues of the Hessian matrix $H$.
  Theorem 9.6:
  - If $\lambda_i \ge 0$ for all $i$, then $H$ is positive semi-definite
  - If $\lambda_i > 0$ for all $i$, then $H$ is positive definite
  - If $\lambda_i \le 0$ for all $i$, then $H$ is negative semi-definite
  - If $\lambda_i < 0$ for all $i$, then $H$ is negative definite
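For a symmetric 2x2 matrix the characteristic equation is a quadratic with closed-form roots, so Theorem 9.6 can be applied directly (the matrices below are illustrative):

```python
import math

# For H = [[a, b], [b, c]], det(H - lam*I) = 0 gives
#   lam = ((a + c) +/- sqrt((a - c)^2 + 4 b^2)) / 2.
def eigenvalues_2x2(a, b, c):
    root = math.sqrt((a - c)**2 + 4*b*b)
    return (a + c - root) / 2, (a + c + root) / 2

def classify(a, b, c):
    lo, hi = eigenvalues_2x2(a, b, c)
    if lo > 0:  return "positive definite"
    if hi < 0:  return "negative definite"
    if lo >= 0: return "positive semi-definite"
    if hi <= 0: return "negative semi-definite"
    return "indefinite"

print(classify(2, 1, 4))    # Hessian of x^2 + 2y^2 + xy: positive definite
print(classify(1, 2, 1))    # eigenvalues -1 and 3: indefinite
```

By Theorem 9.5, the first matrix shows its function is strictly convex, while the second belongs to a function that is neither convex nor concave.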
- Theorem 9.8: Let $f(x)$ and $g(y)$ be convex functions, and let also $g(y)$ be a non-decreasing function. Then the compound function $h(x) = g(f(x))$ is convex.
10 Methods for unconstrained nonlinear optimization
- In this chapter, we consider problems written as $\min_x f(x)$, i.e. nonlinear problems without constraints. It is assumed that the function $f(x)$ is differentiable and that there exists an optimal solution to the problem.
- An optimal solution to an unconstrained problem is always a stationary point which may be a local minimum, local maximum or a saddle point.
- Theorem 10.1: A necessary condition for the unconstrained problem $\min f(x)$ to have an optimal solution in the point $x^*$ is that it is a stationary point, i.e. $\nabla f(x^*) = 0$.
- Theorem 10.2: A sufficient condition for the unconstrained problem $\min f(x)$ to have an optimal solution in the point $x^*$ is that it is a stationary point and the function $f(x)$ is convex.
- If the function is strictly convex (concave) there is only one stationary point. This point is hence both a local and global minimum (maximum).
- Theorem 10.3: A sufficient condition for the unconstrained problem $\min f(x)$ to have a local optimal solution in the point $x^*$ is that it is a stationary point and that the Hessian matrix $H(x^*)$ in the point is positive definite.
- With the first-order Taylor series approximation of the function $f(x)$ in the point $x^{(k)}$, we get the following function:
  $f(x^{(k)} + t d) \approx f(x^{(k)}) + t \nabla f(x^{(k)})^T d$
- Definition 10.1: A direction $d$ such that $\nabla f(x^{(k)})^T d < 0$ is called a descent direction and is a direction in which $f(x)$ decreases.
- Definition 10.2: A direction $d$ such that $\nabla f(x^{(k)})^T d > 0$ is called an ascent direction and is a direction in which $f(x)$ increases.
- Global convergence - "global" here means that the methods find a local minimum regardless of the (global) starting point.
- In all these search methods, the step length is determined through a one-dimensional search (line search).
10.1 Steepest descent method
- The search direction is computed using only the gradient $\nabla f(x^{(k)})$ in the point $x^{(k)}$. The gradient $\nabla f(x^{(k)})$ defines the direction in which the function $f(x)$ increases most rapidly (is steepest) locally. In the same way, $d^{(k)} = -\nabla f(x^{(k)})$ defines the direction in which it decreases most rapidly locally.
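A minimal steepest descent sketch on a hypothetical quadratic, with a fixed step length standing in for a proper line search:

```python
# Steepest descent on f(x, y) = x^2 + 2y^2, starting from an arbitrary point.
def grad(x, y):
    return (2*x, 4*y)

x, y, t = 4.0, -2.0, 0.1           # fixed step length (a line search would adapt t)
for _ in range(200):
    gx, gy = grad(x, y)
    x, y = x - t*gx, y - t*gy      # move in the direction -grad f

print(x, y)                        # converges towards the minimum (0, 0)
```

Each coordinate shrinks geometrically here; with badly scaled (ill-conditioned) Hessians, steepest descent instead zig-zags and converges slowly, which motivates Newton-type methods.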
10.2 Newton methods
- If we use a second-order Taylor series approximation to determine a search direction, we get Newton's method and Newton's modified method. A second-order Taylor series approximation of the function $f(x)$ around the point $x^{(k)}$ can be written as:
  $f(x) \approx f(x^{(k)}) + \nabla f(x^{(k)})^T (x - x^{(k)}) + \frac{1}{2} (x - x^{(k)})^T H(x^{(k)}) (x - x^{(k)})$
  The gradient of the approximate function is $\nabla f(x^{(k)}) + H(x^{(k)})(x - x^{(k)})$. We know from Theorem 10.1 that the optimal solution is found in a stationary point. We can therefore set the gradient of the approximation to zero and express the search direction as
  $d^{(k)} = -H(x^{(k)})^{-1} \nabla f(x^{(k)})$
  This search direction is called the Newton direction. It is valid both for minimization and maximization problems. The interpretation of using the Newton direction is that we use second-order information to rotate the gradient direction so that it points towards the stationary point of the approximate function.
- If we always use the step length $t^{(k)} = 1$, i.e. no line search, we get Newton's method. If we use a normal line search, we get Newton's modified method.
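For a quadratic function the Newton direction with step length 1 lands exactly on the stationary point; a sketch on a hypothetical two-variable quadratic, inverting the 2x2 Hessian in closed form:

```python
# f(x, y) = x^2 + 2y^2 + xy: gradient (2x + y, x + 4y), Hessian [[2, 1], [1, 4]].
def newton_direction(x, y):
    gx, gy = 2*x + y, x + 4*y
    a, b, c = 2.0, 1.0, 4.0                  # H = [[a, b], [b, c]]
    det = a*c - b*b
    # d = -H^{-1} grad, using the closed-form inverse of a 2x2 matrix
    dx = -( c*gx - b*gy) / det
    dy = -(-b*gx + a*gy) / det
    return dx, dy

x, y = 3.0, -1.0
dx, dy = newton_direction(x, y)
x, y = x + dx, y + dy                        # Newton's method: step length t = 1
print(x, y)   # 0.0 0.0
```

For non-quadratic functions the Taylor model is only local, so several Newton iterations (or the modified method with a line search) are needed.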
10.3 Extensions for Newton methods
- A more general search method can be expressed with the search direction $d^{(k)} = -B^{(k)} \nabla f(x^{(k)})$, where $B^{(k)}$ is a positive definite matrix; choosing $B^{(k)} = I$ gives steepest descent and $B^{(k)} = H(x^{(k)})^{-1}$ gives the Newton direction.
10.4 One-dimensional optimization
- A one-dimensional optimization problem is a problem with one variable.
- Line search or one-dimensional search.
- Definition 10.3: A function $f(t)$ to be minimized on the interval $a \le t \le b$, with its minimum at $t^*$, is unimodal on the interval if all $t_1$, $t_2$ such that $a \le t_1 < t_2 \le b$ have the properties: if $t_2 \le t^*$ then $f(t_1) > f(t_2)$, and if $t_1 \ge t^*$ then $f(t_1) < f(t_2)$.
  The interpretation of a unimodal function in minimization is that the function (for increasing values of $t$) is non-increasing followed by non-decreasing.
  Note: a function $f(t)$ can be unimodal despite being non-convex. A convex function is always unimodal.
- Golden section method
- Bi-section method
- Newton-Raphson's method
- Armijo's method
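A minimal golden section sketch for a unimodal (here deliberately non-convex) test function:

```python
import math

def golden_section(f, a, b, tol=1e-6):
    """Minimize a unimodal function f on [a, b] by golden section search."""
    r = (math.sqrt(5) - 1) / 2              # golden ratio factor, about 0.618
    x1, x2 = b - r*(b - a), a + r*(b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > tol:
        if f1 > f2:                          # minimum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + r*(b - a)
            f2 = f(x2)
        else:                                # minimum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - r*(b - a)
            f1 = f(x1)
    return (a + b) / 2

# sqrt(|t - 2|) is unimodal but non-convex on [0, 4]; minimum at t = 2.
t_star = golden_section(lambda t: math.sqrt(abs(t - 2.0)), 0.0, 4.0)
print(round(t_star, 4))   # 2.0
```

Only one new function evaluation is needed per iteration, because the golden ratio lets one interior point be reused when the interval shrinks.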
11 Optimality conditions for nonlinear problems
11.1 Geometric interpretation of the KKT conditions
- Definition 11.1: A cone $C$ which is spanned by the vectors $h_1, \dots, h_s$ consists of all vectors $y$ that can be written as a non-negative linear combination of the vectors $h_i$, $i = 1, \dots, s$, i.e. $C = \{ y : y = \sum_{i=1}^{s} \mu_i h_i, \ \mu_i \ge 0 \}$.
- For $x^*$ to be a local maximum in a maximization problem, the gradient of the objective function must lie in the cone spanned by the gradients of the active constraints. In other words, we should be able to write $\nabla f(x^*)$ as a non-negative linear combination of the gradients of the active constraints.
- If we have a nonlinear problem with $m$ inequality constraints we can generalize the above conditions and formulate them as follows:
  $\nabla f(x^*) = \sum_{i=1}^{m} v_i \nabla g_i(x^*)$, $v_i \ge 0$, $i = 1, \dots, m$ (1)
  $g_i(x^*) \le b_i$, $i = 1, \dots, m$ (2)
  $v_i (g_i(x^*) - b_i) = 0$, $i = 1, \dots, m$ (3)
  These conditions are called the Karush-Kuhn-Tucker (KKT) conditions, where (1) denotes dual feasibility, (2) denotes primal feasibility and (3) complementarity.
11.2 KKT conditions derived using the Lagrangian function
- A general nonlinear optimization problem can be formulated as:
  $\min f(x)$ s.t. $g_i(x) \le b_i$, $i = 1, \dots, m$
  The Lagrangian function for problem (ILP) can be stated as:
  $L(x, v) = f(x) + \sum_{i=1}^{m} v_i (g_i(x) - b_i)$
  Note:
  - Lagrangian multipliers are also called dual variables.
  - In our formulation, all multipliers are non-negative as we have a minimization problem and all constraints are $\le$ constraints.
  - The terms that are added to $f(x)$ and that define the Lagrangian function can be interpreted as penalty terms for not satisfying the constraints, and the multiplier $v_i$ can be interpreted as a penalty parameter for constraint $i$.
- Definition 11.2: The point $(x^*, v^*)$ is a saddle point of $L(x, v)$ if
  $L(x^*, v) \le L(x^*, v^*) \le L(x, v^*)$
  for all $x$ and all $v \ge 0$.
- The definition of a saddle point states that $L$ has a minimum with respect to $x$ and a maximum with respect to $v$ in the point $(x^*, v^*)$. The point $(x^*, v^*)$ is a solution to the problem $\max_{v \ge 0} \min_{x} L(x, v)$.
- Theorem 11.1 (Saddle point theorem): If the Lagrangian function $L(x, v)$ has a saddle point in $(x^*, v^*)$, then $x^*$ is an optimal solution to the problem (ILP).
- The saddle point theorem provides sufficient conditions for optimality but not necessary conditions. It may happen that a point is optimal without being a saddle point to the Lagrangian function.
- A necessary condition for a point to be optimal is that it is a stationary point of the Lagrangian function. It can be proved using the Lagrangian multiplier method that each stationary point satisfies the Karush-Kuhn-Tucker conditions.
11.3 Necessary and sufficient conditions for optimality
- Theorem 11.2 (Necessary conditions for optimality)
  Suppose that the point $x^*$ is a local minimum to the problem (ILP) and satisfies some regularity conditions. Then there is a point $(x^*, v^*)$ that satisfies the KKT conditions.
  The theorem states that the KKT conditions must be satisfied if a point $x^*$ is to be a local minimum.
  There are several types of "some regularity conditions", and the most common (and strongest) is that the gradients of the active and non-redundant constraints are linearly independent (also called a Constraint Qualification condition).
  Note: more constraint qualifications will be covered in nonlinear optimization. Some implicit regularity conditions are met in LP or IP (e.g., in Lecture 14 slide 39, it is assumed that there exists a point $x$ such that all constraints hold strictly).
- Theorem 11.3 (Sufficient conditions for optimality)
  Suppose that there is a point $(x^*, v^*)$ that satisfies the KKT conditions. If the problem is convex, then $x^*$ is both a local and global optimum.
- (A comparison of necessary (like components in series) and sufficient (like components in parallel) conditions): https://www.zhihu.com/question/30469121
11.4 Formulation of the KKT conditions
- Sign restrictions of the dual variables (also called Lagrangian multipliers), with the Lagrangian written as above:

Primal problem | $\le$-constraint | $\ge$-constraint | $=$-constraint
---|---|---|---
Min problem | $v_i \ge 0$ | $v_i \le 0$ | $v_i$ free
Max problem | $v_i \le 0$ | $v_i \ge 0$ | $v_i$ free
- The interpretation of the Lagrangian multiplier values in optimum is the same as for dual values in LP theory.
12 Methods for constrained nonlinear optimization
- Two methods that can be used to solve nonlinear problems with linear constraints.
- Frank-Wolfe method linearizes the objective function and solves a sequence of LP problems to find the optimal solution.
- Quadratic programming based on KKT conditions
- A nonlinear problem with nonlinear constraints can be reformulated and then solved by using a sequence of unconstrained nonlinear problems. Two methods:
- Reformulating the problem using penalty functions
- Using barrier functions
12.1 The Frank-Wolfe method
- This method is primarily used for convex nonlinear problems with linear constraints.
- The idea is to linearize the function (make a first order Taylor series expansion) around the current point. By solving the resulting LP problem, we can determine a search direction and then perform a line search between the current point and the solution to the LP problem.
- Consider $f(x)$, the convex objective function of a minimization problem, and let $\hat{f}(x)$ denote the linear approximation of the function in the given point $x^{(k)}$:
  $\hat{f}(x) = f(x^{(k)}) + \nabla f(x^{(k)})^T (x - x^{(k)})$
  If we solve the LP problem with $\hat{f}(x)$ as the objective function, we will obtain a solution in the point $\bar{x}^{(k)}$. The objective function values in the points $x^{(k)}$ and $\bar{x}^{(k)}$, which are $f(x^{(k)})$ and $\hat{f}(\bar{x}^{(k)})$ respectively, can be used to get bounds on the optimal objective function value $z^*$:
  - $f(x^{(k)})$ gives a pessimistic bound of the optimal objective function value $z^*$. This is an upper bound, UBD.
  - $\hat{f}(\bar{x}^{(k)})$ gives an optimistic bound of the optimal objective function value $z^*$. This is a lower bound, LBD.
- The search direction is $d^{(k)} = \bar{x}^{(k)} - x^{(k)}$. We need to ensure that we stay between the points $x^{(k)}$ and $\bar{x}^{(k)}$ by limiting the step length $t$ to $0 \le t \le 1$.
- If the solution found satisfies UBD = LBD, then we have found the optimal solution.
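A Frank-Wolfe sketch on a small hypothetical problem; since the LP subproblem minimizes a linear function, its optimum can be found by checking the extreme points of the feasible region:

```python
# min f(x) = (x1 - 2)^2 + (x2 - 2)^2  s.t.  x1 + x2 <= 2, x >= 0.
# The feasible region is a triangle with the extreme points listed below.
vertices = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0)]
p = (2.0, 2.0)                                    # unconstrained minimizer of f

def f(x):
    return (x[0] - p[0])**2 + (x[1] - p[1])**2

def grad(x):
    return (2*(x[0] - p[0]), 2*(x[1] - p[1]))

x = (0.0, 0.0)                                    # feasible starting point
for _ in range(100):
    g = grad(x)
    # LP subproblem: minimize g . y over the polytope -> check its vertices
    y = min(vertices, key=lambda v: g[0]*v[0] + g[1]*v[1])
    d = (y[0] - x[0], y[1] - x[1])                # search direction
    lbd = f(x) + g[0]*d[0] + g[1]*d[1]            # optimistic bound f_hat(y)
    if f(x) - lbd < 1e-9:                         # UBD - LBD small: stop
        break
    # exact line search for this quadratic f, restricted to 0 <= t <= 1
    t = -((x[0]-p[0])*d[0] + (x[1]-p[1])*d[1]) / (d[0]**2 + d[1]**2)
    t = max(0.0, min(1.0, t))
    x = (x[0] + t*d[0], x[1] + t*d[1])

print(x, f(x))   # converges to (1.0, 1.0) with f = 2.0
```

On this instance the method reaches the optimum $(1, 1)$ after a couple of iterations, at which point UBD and LBD coincide and the loop terminates.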
12.2 Quadratic Programming
- Quadratic programming problems have a quadratic objective function and linear constraints
- A quadratic programming problem with equality constraints can be written in matrix form as
  $\min \frac{1}{2} x^T Q x + c^T x$ s.t. $Ax = b$
  It is assumed that the matrix $Q$ is symmetric and positive definite, which means that the problem is convex. It is also assumed that the matrix $A$ has full rank, which means there is always a feasible solution to the system of linear equations $Ax = b$.
  The KKT conditions for the problem above state that the gradient of the Lagrangian vanishes, $Qx + c + A^T v = 0$, together with primal feasibility $Ax = b$. The system can also be expressed as:
  $\begin{pmatrix} Q & A^T \\ A & 0 \end{pmatrix} \begin{pmatrix} x \\ v \end{pmatrix} = \begin{pmatrix} -c \\ b \end{pmatrix}$
  By assuming that the matrix $Q$ is positive definite and that the matrix $A$ has full rank, there is always a unique solution to these equations.
- Algorithm: Active set method (the active set at $x$ is made up of those constraints that are active, i.e. satisfied with equality, at the current point).
12.3 Penalty function methods
- In this section, the constrained nonlinear problem is reformulated as an unconstrained optimization problem by adding a penalty term that measures and penalizes the distance from any infeasible point to the feasible region. The new function is a penalty function, and by using different values on a penalty parameter, we solve a sequence of unconstrained problems.
- Referring to the Table 12.1 on Page 32, the larger the value of the penalty parameter, the closer we get to the optimal solution and the optimal objective function value. However, for really large values of the parameter, we get numerical problems.
- A general nonlinear problem with equality constraints can be formulated as
  $\min f(x)$ s.t. $g_i(x) = b_i$, $i = 1, \dots, m$
  The penalty function can be written as
  $P(x, \mu) = f(x) + \mu \sum_{i=1}^{m} (g_i(x) - b_i)^2$
  where the penalty term is $\sum_{i=1}^{m} (g_i(x) - b_i)^2$ and the penalty parameter is $\mu > 0$.
  The unconstrained optimization problem becomes $\min_x P(x, \mu)$.
- A quadratic penalty term such as $(g_i(x) - b_i)^2$ is recommended, since it is differentiable and zero exactly at feasible points.
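On a one-variable hypothetical example the penalty minimizer has a closed form, which makes the convergence as the penalty parameter grows easy to see:

```python
# Penalty method sketch for  min f(x) = x^2  s.t.  x = 1.
# With the quadratic penalty P(x, mu) = x^2 + mu*(x - 1)^2, setting the
# derivative 2x + 2*mu*(x - 1) to zero gives the minimizer x(mu) = mu / (1 + mu),
# which approaches the constrained optimum x* = 1 (from outside the feasible
# point) as mu grows.
for mu in (1.0, 10.0, 100.0, 1000.0):
    x_mu = mu / (1.0 + mu)
    print(mu, x_mu)
```

Each printed iterate is infeasible ($x_\mu < 1$) but gets arbitrarily close to the constraint as $\mu$ increases, illustrating the exterior character of penalty methods and the numerical trouble of very large $\mu$.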
12.4 Barrier function methods
- Instead of penalizing infeasible solutions, barrier function methods penalize the deviation from the border of the feasible region.
- Penalty functions (also called exterior penalty functions) generate a sequence of infeasible points that converge to a feasible and optimal point.
- Barrier functions (also called interior penalty functions) generate a sequence of feasible points that converge to the optimal solution.
- A general nonlinear problem with inequality constraints can be formulated as
  $\min f(x)$ s.t. $g_i(x) \le b_i$, $i = 1, \dots, m$
  The barrier function can be written as
  $B(x, \mu) = f(x) + \mu \sum_{i=1}^{m} \phi_i(x)$
  where the barrier term $\phi_i(x)$ is $-\ln(b_i - g_i(x))$ or $1 / (b_i - g_i(x))$, and the barrier parameter is $\mu > 0$.
  The unconstrained optimization problem becomes $\min_x B(x, \mu)$.
  Note: barrier functions can only be used when all constraints are inequality constraints, because we need the feasible solutions to be interior points.
- For initial large values of $\mu$, the optimal solution will be an interior point relatively far from the border of the feasible region. For smaller values of $\mu$, the penalty will be smaller and the solution will get closer to the border. Referring to the Table on Page 318, we can see that we get closer to the optimal point and to the optimal objective function value as $\mu$ decreases.
- Barrier functions always provide feasible solutions which can be used to generate pessimistic bounds of the optimal objective function value. In the same way, penalty functions provide optimistic bounds.
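The corresponding barrier sketch on a one-variable hypothetical example, where the minimizer of the barrier function stays strictly feasible and approaches the optimum as the parameter decreases:

```python
import math

# Barrier method sketch for  min f(x) = x  s.t.  x >= 1.
# With the logarithmic barrier B(x, mu) = x - mu*ln(x - 1), setting the
# derivative 1 - mu/(x - 1) to zero gives the minimizer x(mu) = 1 + mu:
# a strictly feasible (interior) point that approaches the constrained
# optimum x* = 1 as mu decreases.
for mu in (1.0, 0.1, 0.01, 0.001):
    x_mu = 1.0 + mu
    print(mu, x_mu, x_mu - mu * math.log(x_mu - 1.0))
```

Every iterate $x_\mu = 1 + \mu$ is feasible, so $f(x_\mu) = 1 + \mu \ge z^* = 1$ is a pessimistic bound, exactly as the last bullet states.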