Seminars

NO.125 Piecewise smooth system and optimization with piecewise linearization via algorithmic differentiation

Shonan Village Center

June 25 - 28, 2018 (Check-in: June 24, 2018 )

Organizers

  • Andreas Griewank
    • Universidad Yachay Tech, Ecuador
  • Andrea Walther
    • University of Paderborn, Germany
  • Koichi Kubota
    • Chuo University, Japan
  • Siegfried Rump
    • Hamburg University, Germany

Overview

Description of the Meeting

1.1 Overview and background

Functions with parameters that appear frequently as mathematical models in science, engineering and social phe-nomena often show a non-differentiable behavior, for example due to upper limits for temperatures, pressure, and the like. However, we usually simplify them to smooth and differentiable models in those fields because we analyze the models mathematically with well established methods suitable for smooth functions and need derivatives of the considered function with respect to the parameters to optimize the underlying system. Given the recent progress of algorithmic differentiation (AD) [1], it is now possible to implement software systems that compute the sub-gradient or Clarke generalized Jacobian at non-differentiable points of a certain class of non-smooth function called frequently “piecewise” smooth (PS) functions [3, 4]. In the simplest case, these functions are even piecewise linear (PL). Using the extend AD technology, we could adopt more general models that are described as large programs with some non-differentiable intrinsic functions, e.g., min(x, y), max(x, y) and abs(x), in nonlinear equation solving, integration of differential equations and optimization tasks. The proposed meeting will provide a unique possibility to discuss between worldwide leading researchers the recent progress within the above mentioned related fields from the view point of practical non-differentiable techniques. This discussion should initiate the development of new methods to optimize PS and PL functions, to solve systems of PL and PS nonlinear equations, and to integrate differential equations with PS right-hand sides. Moreover, the recent revival of artificial intelligence and machine learning, re-spectively, is remarkable. It is widely accepted that their fundamental and mathematical technique is the nonlinear optimization, where it was also well known that the back-propagation algorithm by D.E. Rumelhart was an instance of the reverse mode of algorithmic differentiation [1]. Hence, we also expect influence of the meeting in this direction of science, since also there non-smoothness plays an important role.
The starting point for the targeted non-smooth nonlinear optimization is the local linearization by differentiation as the mainstay of mathematical analysis and scientific computing. The local linearization of non-smooth continuous functions corresponds to the piecewise linearization by sub-gradients, the Clarke generalized Jacobian, and similar concepts. Using an adapted algorithmic differentiation, the piecewise linearizations of PS functions can be derived algorithmically, i.e., without additional effort. For their analysis and further usage, the abs-normal form (ANF) plays an important role. It was developed by A. Griewank and A. Walther [3], i.e., by two of the proposed organizers. This formalization of the local piecewise linear model will form a common basis throughout the proposed meeting for the discussion of further developments. For example, it has been used to derive first and second order optimality conditions for piecewise smooth objective functions, assuring the use of the piecewise smooth models for describing more complex phenomena.

1.2 Purpose

The purpose of the proposed meeting is twofold: First, the meeting will concentrate in the development of new algorithms for PS and PL system solving and optimization building on recent progress made with respect to piecewise linearization by means of algorithmic differentiation. This will also include substantial contributions from the users of these new techniques. Second, the heterogeneous background of the participants as described below should made it possible to identify new challenges or problems common to many research areas that may be solved with algorithmic differentiation of PL and PS functions.
Because the appropriate handling of non-differentiabilities is one of the fundamental current challenges, the participants will come from a wide range of research areas including linear programming, discrete programming, algorithmic differentiation, neural network, machine learning, numerical verifications, interval computations, physics, chemistry, control theory, and operations research.

1.3 Topics

Introduction   Some concepts are known under different names and/or imply different properties in the various communities. Furthermore, some fundamental facts are not generally understood. Hence one can expect some quick benefits from simply agreeing amongst each other to a common understanding of basic non-smooth concepts. It is well known, see, e.g., [5], that for square PL systems with as many dependent as independent variables one has
bijectivityopennesssurjectivity
where the first inclusion holds because of the invariance of domains for any continuous map. These three properties are equivalent to det(A) = 0 in the linear case, when the matrix A describes this mapping.
Discussion: Can we computationally verify any of this like coherent orientation, which is equivalent to openness?
      This discussion can build on partial results that are available. Thin includes for example the fact that coherent orientation follows from a certain Schur complement matrix having a signed real spectral radius [7] less than one. The decision whether this condition is met is NP hard. Sufficient conditions for coherent orientation in the context of Mathematical Programming Problems with Equilibrium Constraints are given in [6].

Representation and conversion of PS functions   As we have just done, even mathematicians tend to identify linear(=affine) maps with their matrix representation, assuming a more or less natural system of coordinates possibly taking sparsity into account. In the piecewise linear case, there are at least three fundamentally different representations, namely by linear pieces, by nodal values on a triangularisation, or as a straight line code. There are corresponding data structures and conversion procedures, none of them being obvious.
Discussion: An intriguing question is whether there are in a certain sense minimal canonical representations.

Approximate functions and solvers
    To reduce the number of linear pieces and the overall combinatorial complexity one may look for approximate representations and solvers.
Discussion: Is there a need to estimate and systematically control the approximation errors?

Basic PL subroutines   Not only the conversions mentioned just before require a lot of linear algebra calculations but one may also search for some factorizations and normal forms. It can for example be shown that almost all square piecewise linear systems of equations are transformable to a linear complementarity problem [9], for which questions of surjectivity and injectivity are fairly well understood. Moreover, there are well established solution algorithms and reductions to some canonical normal forms. Similar utilities should be provided for the general PL case.
Discussion: In some applications one needs to calculate the full solution set, which may consist of several disconnected component. Is there any way to provide it?

PS to PL and back   We have already noticed that some but certainly not all properties of PS functions and their PL models correspond to each other. For example, injectivity of the local PL approximation ensures only surjectivity, but not even openness of the underlying PS function.
Discussion: These connections must be elucidated.

Dealing with discontinuities   Even least squares projections onto ranges of PL functions are not necessarily continuous, and of course many right hand sides of dynamical systems have jumps, not only kinks.
Discussion: The corresponding PL approximations will also be discontinuous and one may have to solve algebraic as wells differential inclusions. What are suitable PL models?

Complexity bounds   Since the classical satisfiability problem (SAT) can easily be cast as a PL optimization problem, most basic numerical task appears to be NP hard in the number of variables.
Discussion: Is this still true if we impose certain restrictions or parameterize the problems for example with respect to the so-called switching depth?

Interval Computation   One way to deal with non-differentiable functions are interval methods. Here left and right derivatives can be included by AD or automatic slopes can be used. In the latter case the zeros of even nowhere differentiable functions can be included with result verification [8].
Discussion: What is a good approach to combine interval computation with piecewise linearization?

Semi-definite programming   Another way to handle non-differentiabilities are relaxations. In particular verified semi-definite programming seems very promising. Recently large problems arising in electronic structure analysis have solved with result verification with up to 20 million unknowns and up to 20 thousand constraints [2]. An appealing property is that the additional time for rigorous verification of bounds for the true result is very small compared to the time to compute an approximation.
Discussion: How do semi-definite programming and piecewise linearization benefit from each other?

2 Organization of the meeting

Role of participants   Invitees are nominated that are deeply related to the differentiation and piecewise lineariza-tion in their field. They are expected to bring their own problems or test cases that are related to non-differentiability.

Outline of the program   After introductory talks on recent progress of AD with respect to non-differentiabilities, some participants will give talks on problems in their field. Not only theoretical topics with respect to the piecewise linearization of PS functions but also practical software tools that will help to solve their problems are discussed in order to launch new project(s) for developing new theories, tools, or new capabilities.

References

[1] A. Griewank and A. Walther. Evaluating Derivatives: Principles and Techniques of Algorithmic Differentiation. SIAM, Philadelphia, 2008.

[2] D. Chaykin, C. Jansson, F. Keil, M. Lange, K.T. Ohlhus, and S.M. Rump. Rigorous results in electronic structure calculations. Optimization online, 2016.

[3] A. Griewank On stable piecewise linearization and generalized algorithmic differentiation, Optimization Methods and Software, vol. 28 (2013), pp. 1139–1178. doi = 10.1080/10556788.2013.796683.

[4] A. Griewank, J.U. Bernt, M. Radons, and T. Streubel, Solving piecewise linear equations in abs-normal form, Linear Algebra and its Application, vol. 471 (2015), pp. 500–530.

[5] D. Ralph. A new proof of Robinson’s homeomorphism theorem for PL-normal maps. Linear Algebra and Its Applications, 178:249-260, 1993.

[6] Z.Q. Luo, J. S. Pang, and D. Ralph. Mathematical Programs with Equilibrium Constraints. Cambridge University Press, 1996.

[7] S.M. Rump. Theorems of perron-frobenius type for matrices without sign restrictions. Linear Algebra and Its Applications, 266:1–42, 1997.

[8] S.M. Rump. Inclusion of zeros of nowhere differentiable n-dimensional functions. Reliable Computing, 3(1):5–16, 1997.

[9] B. De Schutter, W.P.M.H. Heemels, and A. Bemporad. On the equivalence of linear complementarity problems. Operations Research Letters, 30(4):211-222, 2002.

Report

No-125.pdf