Day 4: Linear Programs

Authors: Nauka

Started: Last Modification: 2022-12-04, 1451 words, est. reading time: 7 minutes

Edit from the future: I got covid on day 4 of writing this, which means I will move the schedule to “as soon as done”.


Today we do the first deeper dive into the technical details of a method. And given its ubiquity, I think it’s a good thing to start with the lingua franca of combinatorial optimization: linear programming. This post will quickly recap the beginnings and then rush through the two main algorithmic templates currently in use: the Simplex algorithm and interior point methods.

We will cover duality, another important concept, tomorrow when we briefly touch on convex optimization in general.

Born from conflict

Linear programming developed in parallel in both the West and the USSR, driven by the need to solve more and more complex operations problems - hence the name “operations research” for a specific subset of optimization theory.

If you aren’t familiar with the term “Operations” in business and other organisational contexts, it refers to the decision making that figures out how something can be done, purely based on a goal and the surrounding constraints, contrasting with executive decision making, which is about what should be done. For a personal example: I have ADHD. I can focus and run the operations of e.g. writing, but I struggle directing this focus. For Kantorovitch, Dantzig, Koopmans and Hitchcock (the big names dropped yesterday), the operations were military and industrial ones: how to maximize production of a specific set of goods, how to maximize sales by selling in different markets, etc.

The canonical form of writing down these problems has become

\[\begin{align} \max\quad &c^Tx\\ \text{s.t.}\quad &Ax\leq b\\ &x\geq 0\\ &c,x\in \mathbb{R}^n,\quad b\in \mathbb{R}^m,\quad A\in\mathbb{R}^{m\times n} \end{align}\]

There are also the standard form, the augmented form (used in the Simplex algorithm) and a few other presentations, but they are all equivalent and can be transformed into one another.
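To make the notation concrete, here is a minimal sketch that feeds a made-up toy instance of the canonical form to SciPy’s linprog (not something this post relies on; note that linprog minimizes by convention, so we negate \(c\), and \(x\geq 0\) is its default bound):

import numpy as np
from scipy.optimize import linprog

# Toy instance of the canonical form:
#   max 3*x1 + 2*x2   s.t.   x1 + x2 <= 4,   x1 + 3*x2 <= 6,   x >= 0
c = np.array([3.0, 2.0])
A = np.array([[1.0, 1.0],
              [1.0, 3.0]])
b = np.array([4.0, 6.0])

# linprog minimizes c^T x, so pass -c to maximize instead.
res = linprog(-c, A_ub=A, b_ub=b, method="highs")
print(res.x, -res.fun)  # optimum x = (4, 0) with value 12 -- a vertex of the polytope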

Solving LPs requires one to first find feasible solutions (all the points fulfilling the constraints) and then, from amongst these, find the ones that maximise \(c^Tx\). Since without the constraints we could simply set all \(x_i=\pm\infty\), it might be intuitive that the solutions of the LP will be found at the boundary of the set the constraints box us into - a shape called a polytope (or polyhedron). This intuition can be formalized in the language of duality (see tomorrow’s post) and is also exploited by the Simplex algorithm, Dantzig’s famous method for solving LPs.

The Simplex algorithm

The Simplex algorithm exploits the fact that when optimising a linear function \(c^Tx\) over a set of linear constraints, we will find a solution at a vertex of the constraint polytope (where we bump up against the barriers keeping us from infinity). Each such vertex corresponds to what is referred to as a “basic feasible solution” (bfs). The question is how to find the best one - or even a first such vertex.

Simplex does this by first converting the problem into a Simplex tableau, which does the following things:

  * introduces one slack variable per constraint, turning the inequalities \(Ax\leq b\) into equalities \(Ax+s=b\) (the augmented form),
  * keeps track of which variables are currently basic (allowed to be non-zero) and which are non-basic (fixed to zero),
  * carries the objective \(c^Tx\) along as an extra row, so that each pivot’s effect on the objective stays visible.

We can then always find a bfs by setting all \(x_i=0\) (so each slack variable simply takes the value \(b_i\)) and will then perform the following procedure:

  1. While there are still positive coefficients in the objective row, the corresponding columns are possible pivot columns. Select one of them (which one exactly is an implementation detail :-D).
  2. Choose a pivot row (corresponding to a basic variable) via the ratio test, s.t. all basic variables remain non-negative after the pivot.
  3. Perform the pivot on the tableau.
  4. Go back to 1.

This will iteratively visit different vertices until we can no longer pivot, which means we have found an optimal solution.
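For the curious, the whole procedure fits in a short sketch. This is a minimal, illustration-only tableau implementation of the steps above for the canonical form, assuming \(b\geq 0\) so that the slack basis is feasible; there is no anti-cycling rule, and the function name and layout are my own, not a reference implementation:

import numpy as np

def simplex_tableau(A, b, c, tol=1e-9):
  # Minimal tableau Simplex for  max c^T x  s.t.  Ax <= b, x >= 0,  assuming b >= 0.
  m, n = A.shape
  T = np.zeros((m + 1, n + m + 1))
  T[:m, :n] = A
  T[:m, n:n + m] = np.eye(m)       # slack variables turn Ax <= b into equalities
  T[:m, -1] = b
  T[-1, :n] = c                    # objective row: pivot while it has positive entries
  basis = list(range(n, n + m))    # the slacks form the initial basic feasible solution

  while True:
    col = int(np.argmax(T[-1, :-1]))
    if T[-1, col] <= tol:          # step 1: no positive coefficient left -> optimal
      break
    ratios = np.full(m, np.inf)    # step 2: ratio test keeps the basic variables >= 0
    rows = T[:m, col] > tol
    ratios[rows] = T[:m, -1][rows] / T[:m, col][rows]
    row = int(np.argmin(ratios))
    if not np.isfinite(ratios[row]):
      raise ValueError("LP is unbounded")
    T[row] /= T[row, col]          # step 3: the pivot itself
    for r in range(m + 1):
      if r != row:
        T[r] -= T[r, col] * T[row]
    basis[row] = col               # step 4: repeat

  x = np.zeros(n + m)
  x[basis] = T[:m, -1]
  return x[:n], -T[-1, -1]         # optimal point and objective value

On the toy instance from the linprog sketch earlier, this returns x = (4, 0) with objective value 12, as it should.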


Is this efficient?

The answer to this is surprisingly ambiguous: in general and in the real world, a well-crafted Simplex implementation (with appropriate pivot rules etc.) runs very efficiently and scales to thousands of variables. BUT, you can prove (with the Klee-Minty cube) that its worst-case complexity is exponential in the dimension you are working in.

How much does this matter? Well, that’s an executive decision. For us as researchers who want to understand this apparent contradiction (bad worst-case behaviour, good real-world behaviour), it serves as evidence that real-world problems have some additional structure which the Klee-Minty cube lacks and which happens to favour the Simplex algorithm.
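Since the Klee-Minty cube is just another LP in the canonical form above, it is easy to write down. A sketch using one common presentation of the cube (the variant with powers of 5 on the right-hand side; the function name is my own):

import numpy as np

def klee_minty(n):
  # One common presentation of the Klee-Minty cube in canonical form:
  #   max  sum_j 2^(n-j) x_j
  #   s.t. sum_{j<i} 2^(i-j+1) x_j + x_i <= 5^i   for i = 1..n,   x >= 0
  c = np.array([2.0 ** (n - j) for j in range(1, n + 1)])
  A = np.zeros((n, n))
  for i in range(1, n + 1):
    for j in range(1, i):
      A[i - 1, j - 1] = 2.0 ** (i - j + 1)
    A[i - 1, i - 1] = 1.0
  b = np.array([5.0 ** i for i in range(1, n + 1)])
  return A, b, c

The optimum is simply \(x=(0,\dots,0,5^n)\), yet with Dantzig’s classic “largest coefficient” pivot rule the Simplex walk visits all \(2^n\) vertices of this distorted cube.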

Interior point methods

However, you might not want to rely on this hidden structure always being there. And as a researcher, well, you want proofs of efficiency! Leonid Khachiyan devised a method which gives you these guarantees (the Ellipsoid method), with a provable worst-case runtime of \(\mathcal{O}(n^2L)\cdot\mathrm{poly}(n,m,L)\). Here \(\mathcal{O}(n^2L)\) is the number of iterations of the algorithm and \(\mathrm{poly}(n,m,L)\) means that each iteration requires solving a subproblem with polynomial complexity in \(n,m,L\) (e.g., if each iteration takes time proportional to \(\mathcal{O}(n^4)\), the total complexity would be \(\mathcal{O}(n^6L)\)). \(L\) here is the number of bits needed to encode the problem input, which captures a bunch of possible structure in the input domain (e.g. if you only care about a small range with a specific distribution, \(L\) will shrink), but also prevents cheating by encoding constraints and dimensions into special values in an attempt to shrink \(n\) or \(m\). The dependence on \(L\) will generally be a fixed \(L^2\).
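Written out, that example composition of iteration count and per-iteration cost is simply

\[
\underbrace{\mathcal{O}(n^2L)}_{\text{iterations}}\;\times\;\underbrace{\mathcal{O}(n^4)}_{\text{work per iteration}}\;=\;\mathcal{O}(n^6L),
\]

and, roughly speaking, carrying out the arithmetic to \(L\) bits of precision is what contributes the second factor of \(L\) behind the \(\mathcal{O}(n^6L^2)\) figure quoted further down (my rough accounting; the exact bookkeeping varies between sources).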

We won’t go into details on the Ellipsoid method (Prof. Arora has you covered, though), because while it was a big theoretical breakthrough and ushered in a whole line of follow-up research, the ellipsoid method is almost never used in practice due to numerical instability (the rate at which small errors accumulate).

Instead, Narendra Karmarkar’s algorithm (Karmarkar 1984) not only has a (provable) worst-case complexity of \(\mathcal{O}(n^{3.5}L^2)\) instead of the original ellipsoid method’s \(\mathcal{O}(n^{6}L^2)\), it is also actually amenable to an efficient and numerically stable implementation. Karmarkar notes:

In the ellipsoid algorithm, the round-off errors in numerical computation accumulate from step to step. The amount of precision required in arithmetic operations grows with the number of steps. In our algorithm, errors do not accumulate. On the contrary, the algorithm is self-correcting in the sense that the round-off error made in one step gets compensated for by future steps. If we allow \(\epsilon\) error in computation of the vector \(x^{(k)}\) in each step, we can compensate for the errors by running the algorithm \(2\epsilon\) steps.

The basic idea is that instead of staying on the boundary of the polytope as in Simplex, we construct a sphere around our current iterate which is guaranteed to contain only points in the feasible set, and then select the best point within this sphere. To see the cleverness of this, you should appreciate the following facts:

  1. We can “easily” optimize a linear objective over homogeneous linear equalities of the shape \(Ax=0\) - essentially a projection onto the null space of \(A\), possible as long as \(AA^T\) is invertible - because we don’t have inequality constraints to deal with.
  2. We can map optimization over a sphere into this setting by adding the equality constraint \(\sum_{i=1}^n x_i=0\), which we can do by simply adding a row of ones to \(A\).
  3. It is easy to find the radius of a sphere which will stay within a simplex (a special polytope) using the formula \(r=\frac{1}{\sqrt{n(n-1)}}\), since we are dealing with a high dimensional generalisation of the equilateral triangle (a quick check follows after this list).
  4. It is possible (as presented in the paper) to map the constraint polytope onto a simplex centered around the current point, in a way that ensures a point within the simplex is also a feasible point.
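As a quick sanity check of the radius formula in fact 3, take the standard simplex with vertices \(e_1,\dots,e_n\) and set \(n=3\):

\[
r=\frac{1}{\sqrt{3\cdot 2}}=\frac{1}{\sqrt{6}}\approx 0.408,
\]

which is exactly the inradius of the equilateral triangle with vertices \(e_1,e_2,e_3\): its side length is \(\sqrt{2}\), and an equilateral triangle with side \(s\) has inradius \(\frac{s}{2\sqrt{3}}=\frac{\sqrt{2}}{2\sqrt{3}}=\frac{1}{\sqrt{6}}\).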

Which leads us to the (surprisingly simple!) algorithm:

def karmarkars_algorithm(x, A, c, alpha):
  # Pseudo-python: the helpers stand in for the projective transformation
  # and the projected step described in the facts above.
  while c.dot(x) > alpha:                      # stop once the objective is small enough
    T, T_invert = find_transformation(x, A)    # projective map sending x to the simplex centre
    c_simplex = T(c)                           # objective in the transformed coordinates
    A_simplex = transform_and_augment(T, A)    # transformed constraints plus the row of ones
    x_simplex = solve_for_new_x(A_simplex, c_simplex)  # n^3 or n^2.5 complexity, most expensive operation here
    x = T_invert(x_simplex)                    # map the new iterate back to the original space
  return x

where the pseudo-python actually doesn’t hide that much mathematical detail (which I might add to this post once I have time).
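To give a flavour of the detail hiding in solve_for_new_x, here is a sketch following the standard projected-gradient presentation of the step (my own rendering, not lifted from the paper): project the transformed objective onto the null space of the constraint matrix (which, per fact 2, already has the row of ones appended) and move from the simplex centre by a fixed fraction of the inscribed radius from fact 3. The solve against \(A_{\text{simplex}}A_{\text{simplex}}^T\) is where the \(n^3\) cost in the comment above comes from.

import numpy as np

def solve_for_new_x(A_simplex, c_simplex, step_fraction=0.25):
  # Sketch only: assumes A_simplex has full row rank and already contains
  # the appended row of ones, so the step stays on the simplex.
  n = A_simplex.shape[1]
  center = np.full(n, 1.0 / n)                 # the current iterate sits at the simplex centre
  r = 1.0 / np.sqrt(n * (n - 1))               # radius of the inscribed sphere (fact 3)
  # Project c onto the null space of A_simplex; this linear solve is the bottleneck.
  AAt = A_simplex @ A_simplex.T
  c_proj = c_simplex - A_simplex.T @ np.linalg.solve(AAt, A_simplex @ c_simplex)
  return center - step_fraction * r * c_proj / np.linalg.norm(c_proj)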



And that’s it for today. Tomorrow we will dig a bit deeper into why both the Simplex and the interior point methods look for solutions at the boundaries, and what this means when we think about solving constrained problems in general - in other words: “What the hell are primal and dual problems?”

See you tomorrow. And Слава Україні!

Karmarkar, N. 1984. “A New Polynomial-Time Algorithm for Linear Programming.” In Proceedings of the Sixteenth Annual ACM Symposium on Theory of Computing, 302–11. STOC ’84. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/800057.808695.