# NO.125 Piecewise smooth system and optimization with piecewise linearization via algorithmic differentiation

June 25 - 28, 2018 (Check-in: June 24, 2018 )

## Organizers

- Andreas Griewank
- Universidad Yachay Tech, Ecuador

- Andrea Walther
- University of Paderborn, Germany

- Koichi Kubota
- Chuo University, Japan

- Siegfried Rump
- Hamburg University, Germany

## Overview

### Description of the Meeting

#### 1.1 Overview and background

Functions with parameters that appear frequently as mathematical models in science, engineering and social phe-nomena often show a non-diﬀerentiable behavior, for example due to upper limits for temperatures, pressure, and the like. However, we usually simplify them to smooth and diﬀerentiable models in those ﬁelds because we analyze the models mathematically with well established methods suitable for smooth functions and need derivatives of the considered function with respect to the parameters to optimize the underlying system. Given the recent progress of algorithmic diﬀerentiation (AD) [1], it is now possible to implement software systems that compute the sub-gradient or Clarke generalized Jacobian at non-diﬀerentiable points of a certain class of non-smooth function called frequently “piecewise” smooth (PS) functions [3, 4]. In the simplest case, these functions are even piecewise linear (PL). Using the extend AD technology, we could adopt more general models that are described as large programs with some non-diﬀerentiable intrinsic functions, e.g., min(x, y), max(x, y) and abs(x), in nonlinear equation solving, integration of diﬀerential equations and optimization tasks. The proposed meeting will provide a unique possibility to discuss between worldwide leading researchers the recent progress within the above mentioned related ﬁelds from the view point of practical non-diﬀerentiable techniques. This discussion should initiate the development of new methods to optimize PS and PL functions, to solve systems of PL and PS nonlinear equations, and to integrate diﬀerential equations with PS right-hand sides. Moreover, the recent revival of artiﬁcial intelligence and machine learning, re-spectively, is remarkable. It is widely accepted that their fundamental and mathematical technique is the nonlinear optimization, where it was also well known that the back-propagation algorithm by D.E. Rumelhart was an instance of the reverse mode of algorithmic diﬀerentiation [1]. Hence, we also expect inﬂuence of the meeting in this direction of science, since also there non-smoothness plays an important role.

The starting point for the targeted non-smooth nonlinear optimization is the local linearization by diﬀerentiation as the mainstay of mathematical analysis and scientiﬁc computing. The local linearization of non-smooth continuous functions corresponds to the piecewise linearization by sub-gradients, the Clarke generalized Jacobian, and similar concepts. Using an adapted algorithmic diﬀerentiation, the piecewise linearizations of PS functions can be derived algorithmically, i.e., without additional eﬀort. For their analysis and further usage, the abs-normal form (ANF) plays an important role. It was developed by A. Griewank and A. Walther [3], i.e., by two of the proposed organizers. This formalization of the local piecewise linear model will form a common basis throughout the proposed meeting for the discussion of further developments. For example, it has been used to derive ﬁrst and second order optimality conditions for piecewise smooth objective functions, assuring the use of the piecewise smooth models for describing more complex phenomena.

#### 1.2 Purpose

The purpose of the proposed meeting is twofold: First, the meeting will concentrate in the development of new algorithms for PS and PL system solving and optimization building on recent progress made with respect to piecewise linearization by means of algorithmic diﬀerentiation. This will also include substantial contributions from the users of these new techniques. Second, the heterogeneous background of the participants as described below should made it possible to identify new challenges or problems common to many research areas that may be solved with algorithmic diﬀerentiation of PL and PS functions.

Because the appropriate handling of non-diﬀerentiabilities is one of the fundamental current challenges, the participants will come from a wide range of research areas including linear programming, discrete programming, algorithmic diﬀerentiation, neural network, machine learning, numerical veriﬁcations, interval computations, physics, chemistry, control theory, and operations research.

#### 1.3 Topics

** Introduction **Some concepts are known under diﬀerent names and/or imply diﬀerent properties in the various communities. Furthermore, some fundamental facts are not generally understood. Hence one can expect some quick beneﬁts from simply agreeing amongst each other to a common understanding of basic non-smooth concepts. It is well known, see, e.g., [5], that for square PL systems with as many dependent as independent variables one has

*bijectivity*⇒*openness*⇒*surjectivity*where the ﬁrst inclusion holds because of the invariance of domains for any continuous map. These three properties are equivalent to det(A) = 0 in the linear case, when the matrix A describes this mapping.

*This discussion can build on partial results that are available. Thin includes for example the fact that coherent orientation follows from a certain Schur complement matrix having a signed real spectral radius [7] less than one. The decision whether this condition is met is NP hard. Suﬃcient conditions for coherent orientation in the context of Mathematical Programming Problems with Equilibrium Constraints are given in [6].*

**Discussion: Can we computationally verify any of this like coherent orientation, which is equivalent to openness?**** Representation and conversion of PS functions **As we have just done, even mathematicians tend to identify linear(=aﬃne) maps with their matrix representation, assuming a more or less natural system of coordinates possibly taking sparsity into account. In the piecewise linear case, there are at least three fundamentally diﬀerent representations, namely by linear pieces, by nodal values on a triangularisation, or as a straight line code. There are corresponding data structures and conversion procedures, none of them being obvious.

*Discussion: An intriguing question is whether there are in a certain sense minimal canonical representations.*

**To reduce the number of linear pieces and the overall combinatorial complexity one may look for approximate representations and solvers.**

*Approximate functions and solvers*

*Discussion: Is there a need to estimate and systematically control the approximation errors?*** Basic PL subroutines **Not only the conversions mentioned just before require a lot of linear algebra calculations but one may also search for some factorizations and normal forms. It can for example be shown that almost all square piecewise linear systems of equations are transformable to a linear complementarity problem [9], for which questions of surjectivity and injectivity are fairly well understood. Moreover, there are well established solution algorithms and reductions to some canonical normal forms. Similar utilities should be provided for the general PL case.

*Discussion: In some applications one needs to calculate the full solution set, which may consist of several disconnected component. Is there any way to provide it?*

**We have already noticed that some but certainly not all properties of PS functions and their PL models correspond to each other. For example, injectivity of the local PL approximation ensures only surjectivity, but not even openness of the underlying PS function.**

*PS to PL and back*

*Discussion: These connections must be elucidated.*** Dealing with discontinuities **Even least squares projections onto ranges of PL functions are not necessarily continuous, and of course many right hand sides of dynamical systems have jumps, not only kinks.

*Discussion: The corresponding PL approximations will also be discontinuous and one may have to solve algebraic as wells diﬀerential inclusions. What are suitable PL models?*

**Since the classical satisﬁability problem (SAT) can easily be cast as a PL optimization problem, most basic numerical task appears to be NP hard in the number of variables.**

*Complexity bounds*

**Discussion: Is this still true if we impose certain restrictions or parameterize the problems for example with respect to the so-called switching depth?**** Interval Computation **One way to deal with non-diﬀerentiable functions are interval methods. Here left and right derivatives can be included by AD or automatic slopes can be used. In the latter case the zeros of even nowhere diﬀerentiable functions can be included with result veriﬁcation [8].

*Discussion: What is a good approach to combine interval computation with piecewise linearization?*** Semi-deﬁnite programming **Another way to handle non-diﬀerentiabilities are relaxations. In particular veriﬁed semi-deﬁnite programming seems very promising. Recently large problems arising in electronic structure analysis have solved with result veriﬁcation with up to 20 million unknowns and up to 20 thousand constraints [2]. An appealing property is that the additional time for rigorous veriﬁcation of bounds for the true result is very small compared to the time to compute an approximation.

**Discussion: How do semi-deﬁnite programming and piecewise linearization beneﬁt from each other?**#### 2 Organization of the meeting

** Role of participants **Invitees are nominated that are deeply related to the diﬀerentiation and piecewise lineariza-tion in their ﬁeld. They are expected to bring their own problems or test cases that are related to non-diﬀerentiability.

** Outline of the program **After introductory talks on recent progress of AD with respect to non-diﬀerentiabilities, some participants will give talks on problems in their ﬁeld. Not only theoretical topics with respect to the piecewise linearization of PS functions but also practical software tools that will help to solve their problems are discussed in order to launch new project(s) for developing new theories, tools, or new capabilities.

#### References

[1] A. Griewank and A. Walther. Evaluating Derivatives: Principles and Techniques of Algorithmic Diﬀerentiation. SIAM, Philadelphia, 2008.

[2] D. Chaykin, C. Jansson, F. Keil, M. Lange, K.T. Ohlhus, and S.M. Rump. Rigorous results in electronic structure calculations. Optimization online, 2016.

[3] A. Griewank On stable piecewise linearization and generalized algorithmic diﬀerentiation, Optimization Methods and Software, vol. 28 (2013), pp. 1139–1178. doi = 10.1080/10556788.2013.796683.

[4] A. Griewank, J.U. Bernt, M. Radons, and T. Streubel, Solving piecewise linear equations in abs-normal form, Linear Algebra and its Application, vol. 471 (2015), pp. 500–530.

[5] D. Ralph. A new proof of Robinson’s homeomorphism theorem for PL-normal maps. Linear Algebra and Its Applications, 178:249-260, 1993.

[6] Z.Q. Luo, J. S. Pang, and D. Ralph. Mathematical Programs with Equilibrium Constraints. Cambridge University Press, 1996.

[7] S.M. Rump. Theorems of perron-frobenius type for matrices without sign restrictions. Linear Algebra and Its Applications, 266:1–42, 1997.

[8] S.M. Rump. Inclusion of zeros of nowhere diﬀerentiable n-dimensional functions. Reliable Computing, 3(1):5–16, 1997.

[9] B. De Schutter, W.P.M.H. Heemels, and A. Bemporad. On the equivalence of linear complementarity problems. Operations Research Letters, 30(4):211-222, 2002.