This webpage contains a number of information, tools and applications for piecewise linear functions in abs-normal form. Feel free to contact us if you have any suggestions or questions concerning the presented material or content. Please be aware that we do not guarante the correctness of any software or theoretical result that is provided or referred to on this webpage.

Piecewise linear approximations of piecewise smooth functions

In applied mathematics and numerical optimization, one often employs linear first-order models \[ F(x+\Delta x) \approx F(x) + \frac{\partial F}{\partial x}(x)\cdot (\Delta x) \] or quadratic second order models according to Taylors expansion theorem \[ F(x+\Delta x) \approx F(x) + \Delta F(x) \Delta x + \tfrac{1}{2} (\Delta x)^\top \frac{\partial^2 F}{\partial^2 x}(x) (\Delta x) \] to approximate a nonlinear function $F:\mathbb{R}^n \to \mathbb{R}^m$. The approximations are typically used within some iterative method to determine a suitable search direction such as some gradient descent algorithm or (a variante of ) Newton's method to find a minimum of the function $F$ (with $m=1$) or a root $x^\ast$ with $F(x^\ast)=0$ (with $m=n$), respectively. Obvioulsy, these models require the computation of first or second order derivative information in terms of the Jacobian or Hessian matrix and the solution of a linear system.

Obviously, the convergence of any method that is based on these approximations is strongly dependend on the quality of the simplified models. However, in several cases it turns out that a simple linear or quadratic approximation is insufficient, for example, if $F$ is only piecewise smooth. This is mainly due to the fact that the Taylor approximation do not contain any information about the non-smoothness of the original function.

Recent developments in Algorithmic Differentiation now allow the computation of piecewise linear models, which contain valuable information about the non-smoothness of the underlying function and yield qualitatively better approximations. However, the piecewise linear approximations cannot be represented by a simple Jacobian or Hessian matrix but require more elaborate structure. A suitable algebraic structure seems to be the abs-normal form (ANF), which will be described in the following. PBLAS is a generalization of standard basic linear algebra systems for exactly this structure. The goal is to provide highly efficient, easy-to-use interfaces and methods that avoid the neccessity of recoding all operations for ANFs from scratch.

Notation and basic assumptions

In the sequel, we will only consider piecewise smooth functions that are abs-factorable, namely Lipschitz-continuous functions $F:X\subseteq \mathbb{R}^n \to \mathbb{R}^m $ that can be represented by an evaluation procedure \[ \begin{aligned} \hline v_{i-n} &= x_i & \qquad i =1\dots n\\ \hline v_{i} &= \varphi_i(v_j)_{j\prec i} &\qquad i =1\dots l\\ \hline y_{i-l} &= v_{i-m} &\qquad i=l+1 \dots l+m\\ \hline \end{aligned} \] In detail, we assume that the function $y=F(x)$ is given by a straight line code, which consists of finitely many intermediate values $v_i=v_i(x)\in \mathbb{R}$ and only smooth elemental functions $\varphi_i \in \Phi$ over the considered domain $X$ besides the absolute value function, i.e. $\Phi = \{\pm,*, \text{rec}, \text{sin}, \text{cos}, \text{exp}, \text{log}, \dots, \textbf{abs}, \dots\}$. Note that this also includes the minimum and maximum, which can be written in terms of the absolute value, e.g. $\min(u,v)=\tfrac{1}{2}(u+v-|u-v|)$.

For any abs-factorable function one can derive a piecewise linear approximation $\Delta F(\cdot, \cdot): \mathbb{R}^{n} \times \mathbb{R}^{n} \to \mathbb{R}^{m}$, which satisfies the second order approximation \[ \|F(x+\Delta x)-F(x)-\Delta F(x,\Delta x)\|\leq \gamma \|\Delta x\|^2, \quad \gamma \in \mathbb{R}_+, \] for any fixed basepoint $x \in X$ and directional increment $\Delta x \in \mathbb{R}^n$. The piecewise linear approximation $\Delta y = \Delta F(x, \Delta x) \in \mathbb{R}^m$ of $F(x)$ is obtained by augmenting each line within the original evaluation procedure following the rules for piecewise linear differentiation:

\[ v_{i} = \varphi_i(v_j)_{j\prec i} \Longrightarrow \Delta v_{i} =\Delta \varphi_i(v_j,\Delta v_j)_{j\prec i}= \begin{cases} abs(v_j+ \Delta v_j) -abs(v_j) & \text{if } \varphi_i(v_j)_{j\prec i} \equiv abs(v_j),\\ \sum_{j\prec i} \left(\frac{\partial\varphi_i}{\partial v_j}(v_j)\right) \Delta v_j & \text{else.} \end{cases} \]The resulting extended evaluation procedure is similar to the standard general tangent procedure for smooth functions. \[ \begin{aligned} \hline [v_{i-n}, \Delta v_{i-n}] &= [x_i,\; \Delta x_i] &i =1 \dots n\\ \hline [v_{i},\Delta v_{i}] &= [\varphi_i(v_j)_{j\prec i},\Delta \varphi_i(v_j,\Delta v_j)_{j\prec i}]&i =1 \dots l\\ \hline [ y_{i-l}, \Delta y_{i-l}] &= [v_{i-m}, \Delta v_{i-m}] &i=l+1 \dots l+m\\ \hline \end{aligned} \]

Note that the proposed extended evaluation procedure is a generalization of the standard general tangent procedure known from Algorithmic Differentiation and reduces to it if $F$ is smooth. In fact, if the evaluation procedure of $F$ does not contain any abs, min, max or any other non-smooth function calls, then the extended evaluation procedure simply yields the usual Jacobian-vector product $F'(x)\Delta x$ in the variable $\Delta y$.

If the evaluation routine of $F$ contains some absolute value function calls, then it can be represented by the reduced evaluation procedure \[ \begin{aligned} \hline v_{i-n} &= x_i & i =1 \dots n\\ \hline z_{i} &= \psi_i(v_j)_{j\prec i} &i =1 \dots s\\ v_{i} &= abs(z_i) &\\ \hline y_{i-s} &= \psi_i(v_j)_{j\prec i} & i=s+1 \dots s+m\\ \hline \end{aligned} \] This reduced representation contains only the arguments $v_j \in \mathbb{R}$ and results $v_i=|v_j|$ of all $s\in \mathbb{N}$ absolute value function calls within the evaluation procedure besides the (in-)dependent variables $x$ and $y$. It can be obtained by preaccumulating all occurring smooth elemental functions $\varphi_i$ to some complex functions $\psi_i$.

Similar to the reduced evaluation routine, one can derive a reduced representation for the corresponding extended evaluation procedure:

\[ \begin{aligned} \hline [v_{i-n}, \Delta v_{i-n}] &= [x_i,\Delta x_i] & i =1 \dots n\\ \hline [z_{i},\Delta z_{i} ] &= [\psi_i(v_j)_{j\prec i},\Delta \psi_i(v_j,\Delta v_j)_{j\prec i}] &i =1 \dots s\\ [v_{i},\Delta v_{i}] &= [abs(z_i),abs(\Delta z_i) -abs(z_i)] & \\ \hline [y_{i-s},\Delta y_{i-s}] &= [\psi_i(v_j)_{j\prec i},\Delta \psi_i(v_j,\Delta v_j)_{j\prec i}] & i=s+1 \dots s+m\\ \hline \end{aligned}, \]where $\Delta z_i=\Delta z_i(x,\Delta x)$ now denotes the argument $v_i+\Delta v_i$ and $z_i=z_i(x)\equiv v_i$ of all absolute values for all $i=1,\dots,s$. The reduced version of the extended evaluation procedure then induces the algebraic formulation that will be presented shortly.

All evaluation procedures can also be understood in terms of computational graphs, which will help to interpret the entries of the ANF. Namely, each evaluation procedure induces a directed, acyclic graph $G=(V,E)$, where each intermediate variable $v_i$ corresponds to exactly one vertex $v_i \in V$ and the edges $E\subset V\times V$ represent the underlying direct data dependence according to the precedence relation, i.e., there exists an directed edge $(v_j,v_i) \in E$ from the vertex $v_j$ to $v_i$ if and only if $j \prec i$.

Analogously, all intermediate variables $\Delta v_i$ and their precedence relation induce a computational graph for the extended evaluation procedure.

The reduced evaluation procedure then correspond to graphs that is obtained by vertex elimination. In detail, all vertices $v \in V$ (and accumulate the corresponding edges) are eliminate that do not belong to any (in-)dependent variable or intermediate variable, which is either the argument or the result of an absolute value function call.

The reduced computational of the extended evaluation procedure is derived in a similar way.

An algebraic representation for piecewise linear approximations

The piecewise linear approximations $\Delta F$ of $F$ allow for a compact algebraic representation. Namely, the piecewise linear approximations $\Delta y =\Delta F(x, \Delta x)$ can be represented by the set of piecewise affine equations \[ \begin{bmatrix} \Delta z \\ \Delta y \end{bmatrix} =\begin{bmatrix} a \\b \end{bmatrix} +\begin{bmatrix} Z & L\\J&Y \end{bmatrix} \begin{bmatrix} \Delta x \\|\Delta z| \end{bmatrix} \] using an additional vector of switching variables $\Delta z \in \mathbb{R}^s$ of suitable dimension $s\in \mathbb{N}$ and its component-wise absolute value $|\Delta z|\in \mathbb{R}^s$. The vectors and matrices $a=a(x) \in \mathbb{R}^s$, $b=b(x)\in \mathbb{R}^m$, $Z=Z(x)\in \mathbb{R}^{s\times n}$, $L=L(x)\in \mathbb{R}^{s\times s}$, $J=J(x)\in \mathbb{R}^{m\times n}$, $Y=Y(x)\in \mathbb{R}^{m\times s}$ represent certain derivative information that might depend on the base point $x$ and can be computed by techniques from Algorithmic Differentiation. Furthermore, it was observed that the matrix $L\in \mathbb{R}^{s\times s}$ can be assumed to be strictly lower (left) triangular, i.e. the typical sparsity pattern of the ANF for piecewise linear function $\Delta F_i:\mathbb{R}^{n_i}\times \mathbb{R}^{n_i} \to \mathbb{R}^{m_i}$ with $s_i$ switching variables typically looks like follows:

The reduced evaluation procedures and the corresponding computational graphs can be subdivided in to four different parts, each of which corresponds to one of the four mappings $F_1$, $F_2$, $F_3$ and $F_4$ as can be seen from the following figures:

and

In detail, $F_1$ corresponds to the mapping $x\to z(x)$ (orange), which only involves the direct computations that do not contain any absolute value calls. Accordingly, the function $F_2$ (yellow) contains the parts of $F$ that directly map the result of all absolute values functions $|z_j(x)|$ to the argument $z_i(x)$ of the other absolute value calls. Hence, the function $F_3$ (green) coincides with the direct mapping $|z(x)| \to y(x,z(x))$ and $F_4$ (blue) belongs to the direct mapping $x \to y(x,z(x))$. It turns out that these parts correspond exactly to the entries of the ANF, namely, the matrices of the ANF are given by the Jacobi-matrices \[ Z = F_1'(x) \equiv\frac{\partial z(x)}{\partial x}(x),\; L = F_2'(x) \equiv\frac{\partial z(x)}{\partial |z(x)|}(x),\; Y = F_3'(x) \equiv\frac{\partial y(x,z(x))}{\partial |z(x)|}(x),\; \text{ and } J = F_4'(x) \equiv\frac{\partial y(x,z(x))}{\partial x}(x), \] where the partial derivatives now only considers direct dependencies, i.e. only the parts of $F_i$ that do not belong to paths in the computational graph, which are interrupted by some absolute value call. Also the two vectors $a=a(x)$ and $b=b(x,z(x))$ can be expressed using those functions: \[ a(x) = z(x)-L|z(x)| \text{ and } b= -Y|\Delta z(x,\Delta z(x,0))|=-F_3'(x)|\Delta z(x,0)|=-F_3'(x)|z(x)|. \]

The (parts of an) ANF for an abs-factorable function can be obtained in several ways. The exact way always depends on the programming language and the used AD tool. In most cases only some minor modifications of the provided standard drivers are required if not already some specialized drivers exist. A tool, which supports Piecewise Linear Algorithmic Differentiation and the computation of ANFs, is ADOL-C. It only requires that the source code of the abs-factorable function is present. Namely, $F$ could be given by the routine

` ````
void eval(int m, int n, double* x, double* y){
... // Original code for y = F(x) using double variables
}
```

besides a copy of the original source code, which uses the adouble type provided by ADOL-C.

` ````
#include "<adolc/adolc.h>"
void eval(int m, int n, adouble* x, adouble* y){
... // Original code for y=F(x) using adoubles
}
```

The tracing of the function is similar to the smooth case - see the ADOL-C documentation for more details.

` ````
#include "<adolc/adolc.h>"
int main( int argc, char **argv ){
int n=..., m=...; // Number of (in-)dependents variables
double* x = new double[n]; // Vector of independent variables
double* y = new double[m]; // Vector of dependent variables
... // Set values for x and other computations
int tape = ...; // Tape-number - has to be specified by the user
enableMinMaxUsingAbs(); // Required to use tracing with min and max
trace_on(tape); // Set tape for current mode and start tracing
adouble *ax = new adouble[ n ]; // adouble vector for independent variables
adouble *ay = new adouble[ m ]; // adouble vector for dependent variables
for (int i=0; i<n; i++) // Set independent variables
ax[i] <<= x[i];
eval(m, n, ax, ay); // Call original function with adoubles
for (int i=0; i<m; i++) // Set independent variables
ay[i] >>= y[i];
delete[] ax; // Free vector of intermediate adoubles ax
delete[] ay; // Free vector of intermediate adoubles ay
trace_off(); // Stop tracing
... // Some other operations
delete[] x; // Free vector of intermediate doubles x
delete[] y; // Free vector of intermediate doubles y
}
```

ADOL-C then provides several simple drivers that can be used to determine the number of switching variables $s$, to evaluate the vector of switching variables $z$ and each row of the ANF:

` ````
int keep = 1;
int s = get_num_switches(tag);
zos_an_forward(tape,m,n,keep,x,y,z); // Return all arguments z of fabs
...
for(int i = m + s; i > 0; i--)
fos_an_reverse(tag,m,n,s,i-1,row); // Return the i-th row of the ANF in the array "row"
...
```

Standard tasks and basic calculus rules for ANFs

A standard task is the evaluation of ANFs, i.e. evaluate the piecewise linear approximation $\Delta y = \Delta F(x, \Delta x)$ of the function $F$ at a given base point $x$ for some given directional increment $\Delta x$, which is equivalent to the evaluation of the corresponding ANF. Here the strict triangular structure of $L$ allows a sequential evaluation for the components of the intermediate switching variable $\Delta z \in \mathbb{R}^s$, which is needed to compute the result $\Delta y$. In detail, a basic implementation of the ANF evaluation of an ANF is described by the following algorithm \[ \begin{aligned} &\text{Evaluate } u = a+Z\Delta x \\ &\text{Initialize } \Delta z = 0 \\ &\text{For }i=1,\dots,s \\ &\;\;\;\text{Evaluate } \Delta z_i= u_i +\langle L_{(i,:)}, |\Delta z|\rangle , L_{(i,:)}-i\text{th row of }L\\ &\text{EndFor} \\ &\text{Evaluate } v = Y|\Delta z| \\ &\text{Evaluate } w = b+J\Delta x \\ &\text{Evaluate } \Delta y = v+w \\ &\text{Return } \Delta y \text{ and }\Delta z \end{aligned} \] Naturally, parts of the algorithm exhibit parallel structures, which can be (partially) exploited within a numerical efficient realization.

Another standard task is the solution of piecewise linear functions in abs-normal form, i.e. compute $\Delta x$ such that $\Delta y=\Delta F(x,\Delta x)$ for given $\Delta y \in \mathbb{R}^m$. For example, two simple methods are the simple Gauss-like iteration \[ \Delta z^{j+1}=[a-ZJ^{-1}b]+S|\Delta z^j|, \quad \text{for } j =0,1,\dots \] and the modulus fixed-point iteration \[ \Delta z^{j+1}=[I-S\Sigma_{\Delta z^j}]^{-1}[a-ZJ^{-1}b],\quad \text{for } j =0,1,\dots, \] which can be used if $m=n$, $\det(J)\neq 0$ and the matrix $S$ satisfies certain further assumptions. Here \[ S = L - ZJ^{-1}Y \in \mathbb{R}^{s \times s} \] represents the Schur complement of the ANF system matrix and $\Sigma_{\Delta z}\in \mathbb{R}^{s \times s}$ is the diagonal matrix containing the signatures of the current switching variable $\Delta z$ on its main diagonal such that $\Sigma_{\Delta z}=|\Delta z|$. The fixed-point $\Delta z^\ast$, if existing and found, then yields \[ \Delta x^\ast=-J^{-1}[b+Y|\Delta z^\ast|], \] which satisfies $\Delta y=\Delta F(x,\Delta x^\ast)$.

PBLAS also provides standard operations such as the summation $\sum_{i=1}^r \Delta F_i(x,\Delta x)$ of several piecewise linear functions $\Delta F_i:\mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^m$ in ANF or the compositon $\Delta F_r(x_r,\dots \Delta F_2(x_2, \Delta F_1(x_1, \Delta x))\dots)$ of several piecewise linear functions $\Delta F_i:\mathbb{R}^{n_i}\times\mathbb{R}^{n_i} \to \mathbb{R}^{m_i}$ in ANF with $n_i=m_{i-1}$, $x_i=F(x_{i-1})$ and $\Delta x_i=F(x_{i-1},\Delta x_{i-1})$ for $i=2\dots r$. The operations can be regarded as a generalization of general matrix addition and matrix multiplication from standard BLAS and provide convenient, user-friendly drivers to compute the resulting ANFs such as the ANF for the sum of several ANFs \[ \begin{bmatrix} \Delta z_1\\\vdots\\\Delta z_r\\ \Delta y \end{bmatrix} = \begin{bmatrix} a_1\\\vdots\\a_r\\\sum_{k=1}^r b_k \end{bmatrix} + \begin{bmatrix} Z_1 & L_1 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ Z_r & 0 & \cdots & L_r\\ \sum_{k=1}^r J_k& Y_1 & \cdots & Y_r \end{bmatrix} \begin{bmatrix} \Delta x\\|\Delta z_1|\\\vdots\\|\Delta z_r| \end{bmatrix} \]

Related research topics and goals of the PBLAS project

Piecewise linear Algorithmic Differentiation and piecewise linear approximations can be used within several applications such as the numerical simulation of non-smooth physical processes. It also opens the door to several new optimization methods that benefit from the additional information given by the piecewise linear approximations. In our current research, we develop the theoretical framework for an extended linear algebra system that can store these piecewise linear approximations in terms of ANFs and naturally blend into existing frameworks, for example, cuBLAS, Armadillo,... . The main focus is on a highly efficient implementation for all standard tasks for ANFs (evaluation, solution, summation, multiplication,...) that is able to run on state-of-the-art hardware and exploit parallelism.