Day 3: History and fields

Authors: Nauka

Started: Last Modification: 2022-12-03, 1782 words, est. reading time: 9 minutes

Yesterday I tried to give you a quick intuition on what optimization in general and combinatorial optimization in particular are, as well as to hint a bit at the historical developments that brought them to their current form. Today I want to go a bit deeper on this. I should note that this specific perspective on the timeline is my own - and thus you might want to compare and contrast it with that of more venerable thinkers in the field, like Prof. Schrijver.

Primordial times

As I tried to convey in the last post of this series, optimization is really the study of how we can write down puzzles in a way that allows us to solve them numerically, as well as the study and development of the procedures we use to do so. Before we had computers, i.e. machinery to perform the numerics for us so that we can focus on thinking about the procedures and problem structure at a meta-level, humans needed to do all the actual numerics and careful bookkeeping that comprises actually solving problems.

This isn’t easy to do, so I think it’s worth highlighting the societal and intellectual innovations that needed to happen before optimization as a general discipline could emerge:

  1. we needed to develop computing machinery that can take an algorithm and execute it for us
  2. Which required two things: the hardware to actually run the algorithm, and the mental tools to specify the algorithm in a way a machine can run (software, both for the machine and in our brains)
  3. For the hardware, we needed to develop the electrical and mechanical precision engineering that made the bomba and, before that, Babbage’s computing machines possible. That in turn required a lot of logistics, communication etc., all of which had to develop without the aid of machines to help with the computation and analysis
  4. For the software, we needed to bootstrap math and formal languages, which itself involved the same auto-regressive, snake-feeding-on-its-tail type of development that hardware did: in order to discuss and communicate ideas you need to develop a formal language, but that language needs you to share the underlying ideas and concepts, and for people to make the effort of understanding you, they will need to care at least roughly about the problems you are thinking about - or be giant nerds that will jump onto a puzzle just because they care about it (Heyo!). These nerds will then have juicy drama about who invented things first. I should also note that I will be talking from a very Euro-centric perspective because that’s the lineage I grew up in, but there are theories that the Europeans didn’t independently come up with all of this. Oh, and of course, this was and is mainly a rich people’s/privileged game: I spent 30 years learning about mathematics and other “useless” things before I felt I was able to do useful synthesis and contributions. If you need to feed your family, you either need to be gifted by nature so you can do this much faster, or be lucky enough to quickly find monetizable problems to work on, or you will probably not be able to study this at all

So we have at least two main branches merging into modern optimization theory, while the output of this theory will of course influence the very fields that gave birth to it (even in its nascent forms). We will see some examples in the following section.

Proto-optimization

In the last post I dropped the names of some of the western ancestors of our modern theory - Newton and Cauchy - who were both working on problems that were directly or indirectly inspired by astronomy, navigation and construction (although Newton did a lot of other stuff as well, and Cauchy was already a mathematician in the modern sense of thinking about abstract problems who just happened to also have concrete motivations).

Newton made a lot of contributions to western thought, but the one that bears his name is Newton’s method. The special case that made the method famous was written up by Newton in 1669, presented by others in 1711, and was one of the earliest ways of finding roots (i.e. $x$ s.t. $f(x)=0$). The generalized, modern template looks like this

$$ x_{n+1}=x_n - \frac{f(x_n)}{\dot{f}(x_n)} $$

While I’m a Leibniz or Euler notation man myself, I felt it’s appropriate to use Newton’s notation in this post. From tomorrow onwards, we’ll be using $\frac{d f(x)}{dx}$, $\frac{\partial f(x)}{\partial x}$ or $\nabla f(x)$ for derivatives, partial derivatives and gradients respectively.

and can be used to find the root of the derivative when looking for minima or maxima (since extrema have derivative/gradient zero), leading to the form

$$ x_{n+1}=x_n - \frac{\dot{f}(x_n)}{\ddot{f}(x_n)} $$
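To make this concrete, here is a minimal sketch in Python (my own illustration, not anything from Newton’s writings; the function and starting point are made up) that applies the update to a simple quadratic:

```python
# Minimal sketch of Newton's method for 1D minimization (illustrative only).
# We minimize f(x) = (x - 3)^2 + 1, which has its minimum at x = 3.

def f_prime(x):
    return 2 * (x - 3)       # first derivative of f

def f_double_prime(x):
    return 2.0               # second derivative of f (constant for a quadratic)

x = 0.0                      # arbitrary starting guess
for _ in range(10):
    x = x - f_prime(x) / f_double_prime(x)

print(x)  # 3.0 - for a quadratic, Newton's method hits the minimum in one step
```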

This template influenced, or at least was known to, Cauchy when he wrote about gradient descent in 1847 (this section basically summarizes [@lemarechalCauchyGradientMethod2012]). He described and analyzed the template

$$ x_{n+1}=x_n - \eta\dot{f}(x_n) $$

which for a “small enough” $\eta$ will converge to a local minimum, or at least get close enough to it to finish finding the solution with Newton’s method.
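For contrast with the Newton sketch above, here is the same made-up toy problem with Cauchy’s template (again my own illustration; the step size $\eta$ is just a value I picked):

```python
# Minimal sketch of gradient descent on f(x) = (x - 3)^2 + 1.

def f_prime(x):
    return 2 * (x - 3)       # derivative of f

eta = 0.1                    # a "small enough" step size
x = 0.0                      # starting guess
for _ in range(100):
    x = x - eta * f_prime(x)

print(x)  # approaches the minimum at x = 3
```

Unlike the Newton update, this only needs first derivatives, but it trades that for many more (smaller) steps.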

I find it interesting that the simpler method was “developed” over 100 years later (or was at least not known to be well studied). It might be even more surprising that it took until 1951 before we have a record of someone trying to apply a similar scheme to fitting curves to data [@robbinsStochasticApproximationMethod1951]. The modernized template of this algorithm (stochastic gradient descent, or the Robbins-Monro algorithm) looks like:

$$ x_{n+1}=x_n - \eta\dot{f}(x_n;\mathbf{y}_n);\quad \mathbf{y}_n\sim p(\mathbf{y}) $$

Here $\mathbf{y}$ captures the stochasticity we deal with in estimating the derivatives, and we assume that $\mathbb{E}_{\mathbf{y}}[\dot{f}(x;\mathbf{y})]=\dot{f}^*(x)$, i.e. averaging over the stochastic derivatives will yield the true derivative.

Or at least, it might be surprising until you go back to the dependency tree I sketched above: to even have access to a stochastic function like this, we need to be able to access data samples and perform updates while those samples are still relevant. For Newton and Cauchy, these were really tools to find coefficients of functions which led to insight by construction. Data was extremely expensive and rare, and you had no real way of dealing with unreliable measurements by simply collecting more of them, as we do above with the expected-value assumption. You had to do your best to measure precisely, then maybe perform some auxiliary computation to refine your estimate, and only then would you use your proto-optimization algorithm to help you solve your system of equations for what you wanted to find.
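To make the Robbins-Monro template a bit more tangible, here is a minimal sketch of the curve-fitting use case (my own toy construction: the model $y = w t$, the noise level and the step size are all made up for illustration):

```python
import random

# Minimal sketch of stochastic gradient descent: estimate the slope w of
# y = w * t from noisy samples, using one sample per update.
random.seed(0)
true_w = 2.0

def sample():
    t = random.uniform(0.0, 1.0)
    y = true_w * t + random.gauss(0.0, 0.1)   # noisy observation
    return t, y

def stochastic_grad(w, t, y):
    # derivative of the per-sample squared error 0.5 * (w * t - y)^2 w.r.t. w
    return (w * t - y) * t

w = 0.0      # starting guess
eta = 0.5    # step size
for _ in range(5000):
    t, y = sample()
    w = w - eta * stochastic_grad(w, t, y)

print(w)  # hovers around the true slope 2.0
```

Each update only ever sees a single noisy sample; it is the averaging effect over many updates that pulls $w$ towards the true value.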

Towards combinatorial optimization

In between Newton and Cauchy are two other luminaries of western mathematics: Carl Gauss and Leonhard Euler.

Both were incredibly influential mathematicians (really polymaths, but at the time that came with the territory), although very different: Euler wrote prolifically (866 publications plus correspondence), while Gauss had the motto “few but ripe” and didn’t publish most of his discoveries. For our purposes, Euler is important because he laid the foundations of (amongst many other things) graph theory, which is one area where combinatorial optimization problems occur frequently. Euler also worked on a special version of the Hamiltonian path problem called the knight’s tour (have a knight visit every square on a chess board), the finding and optimization of which is a classical combinatorial optimization problem.

Gauss, a few decades later, mentioned a method of finding solutions to systems of linear equations of the form

$$ Ax=b $$

which is still in use today and marks the moment where systems of linear equations became “manageable” by those in the know. However, this is not optimization, but closed form solving.
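For comparison with the iterative templates above, solving such a system today is a single library call (a quick sketch using numpy, which is of course an anachronism relative to the hand computations of Gauss’s era; the numbers are made up):

```python
import numpy as np

# Solving a small linear system A x = b directly - no iteration, no step sizes.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

x = np.linalg.solve(A, b)    # elimination-based (LU) solve under the hood
print(x)                     # [0.8 1.4]
```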

Later, in the 1940s, Leonid Kantorovich, Tjalling Koopmans (who jointly got an economics Nobel out of this), Frank Hitchcock and George Dantzig all (apparently independently?) developed this further into what we now call Linear Programming. (And apparently Fourier, another O.G. of mathematics, was already thinking about it in 1827? The things you learn when researching while blogging!) They were all trying to figure out allocations of investments and other planning problems like transportation and assignment - all problems that are now solidly in the domain of combinatorial optimization as soon as you perform them on integers (see the last post).
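To give a flavour of what such planning problems look like once written down as linear programs, here is a minimal sketch of a tiny transportation problem (the supplies, demands and costs are numbers I made up, and scipy’s `linprog` is used purely for convenience):

```python
from scipy.optimize import linprog

# Tiny transportation problem: ship from 2 plants to 2 destinations at
# minimal cost. Variables x = [x11, x12, x21, x22], where xij is the amount
# shipped from plant i to destination j.
cost = [2.0, 3.0, 4.0, 1.0]          # per-unit shipping costs

A_eq = [
    [1, 1, 0, 0],   # plant 1 ships exactly its supply
    [0, 0, 1, 1],   # plant 2 ships exactly its supply
    [1, 0, 1, 0],   # destination 1 receives exactly its demand
    [0, 1, 0, 1],   # destination 2 receives exactly its demand
]
b_eq = [20, 30, 25, 25]              # supplies (20, 30) and demands (25, 25)

res = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * 4)
print(res.x, res.fun)                # optimal shipping plan and its cost (85.0)
```

Restricting the $x_{ij}$ to integers is what pushes problems like this into combinatorial territory (here the optimum happens to be integral anyway, a well-known property of transportation problems with integral supplies and demands).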

Schrijver’s tour

This is where my little excursion joins Prof. Schrijver’s [@Schrijvers], except that he starts in 1784 with Monge’s assignment problem, which we will see in 3 days when we start talking about transportation theory (i.e. figuring out how to most efficiently assign and move stuff from $n$ locations to $m$ destinations to fulfill some form of demand with some form of production). I recommend reading the actual document, but the TL;DR is:

where I’ve omitted a bit of the TSP and transportation developments. Read the paper!

Beyond 1960

Prof. Schrijver ends his survey in 1960 and mentions that this excludes the complexity work of Karp (which we’ll see on days 13-16) and important work by Jack Edmonds on matching involving matroids (mathematical objects which allow efficient solutions to the matching problem). Originally I wanted to continue beyond this point, but I do not have the time or energy to do so. Next year!