The algebra of truncated polynomials
Here we introduce, formally, a basic algebraic structure over the set of truncated polynomials and we show how such a structure allows us to compute the partial derivatives of multivariate functions up to arbitrary order.
Formal definition
Consider the set \(\mathcal{P}_{n,m}\) of all polynomials of order \(\le m\) in \(n\) variables and having coefficients in \(\mathbf K\). We indicate with the symbols \(T_f, T_g, T_h\), etc. the generic members of such a set. Such a set is an algebra over the field \(\mathbf K\) if we introduce the truncated multiplication as the standard polynomial multiplication truncated at order \(m\). When needed, we will indicate such a multiplication with the symbol \(T_f \cdot T_g\).
This algebra is commonly referred to as the algebra of truncated polynomials. A first important property of this algebra is that, under the multiplication, all polynomials having a zero constant coefficient are nilpotent of order \(m+1\), as easily verified. We will indicate the generic truncated polynomial \(\in \mathcal{P}_{n,m}\) as \(T_f\) and often we will consider its constant part separated from the rest, writing \(T_f = f_0 + \hat f\). It is worth noting at this point how such an algebra is unitary and associative, and how the multiplicative inverse of \(T_f\) exists whenever \(f_0 \neq 0\). This last property, in particular, deserves a few more words as it is a property that the algebra of (non-truncated) polynomials does not possess. Formally, \(\forall T_f \in \mathcal{P}_{n,m}\) with \(f_0 \neq 0\), \(\exists !\: T_g\in \mathcal{P}_{n,m}\) such that \(T_g\cdot T_f = 1\). In practice:
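\[
T_g = \frac{1}{f_0}\left(1 - \frac{\hat f}{f_0} + \left(\frac{\hat f}{f_0}\right)^{2} - \ldots + (-1)^{m}\left(\frac{\hat f}{f_0}\right)^{m}\right)
\]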
as is easily verified by performing the truncated multiplication \(T_g \cdot T_f\) and accounting for the nilpotency of \(\hat f\).
The link to Taylor expansions
We make use of the multi-index notation according to which
\(\alpha = (\alpha_1, \ldots, \alpha_n)\) and \(\mathbf x = (x_1, \ldots, x_n)\) are \(n\)-tuples and the Taylor expansion around the point \(\mathbf a\) to order \(m\) of a multivariate function \(f\) of \(\mathbf x\) is written as:
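\[
T_f(\mathbf{dx}) = \sum_{|\alpha| = 0}^{m} \frac{\left(\partial^{\alpha} f\right)(\mathbf a)}{\alpha!}\, \mathbf{dx}^{\alpha}
\]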
where:
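\[
|\alpha| = \sum_{j=1}^{n} \alpha_j, \qquad \alpha! = \prod_{j=1}^{n} \alpha_j!, \qquad \mathbf{dx}^{\alpha} = \prod_{j=1}^{n} dx_j^{\alpha_j}, \qquad \partial^{\alpha} f = \frac{\partial^{|\alpha|} f}{\partial x_1^{\alpha_1} \cdots \partial x_n^{\alpha_n}}
\]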
The summation \(\sum_{|\alpha| = 0}^m\) must then be taken over all possible combinations of \(\alpha_j \in \mathbb N\) such that \(\sum_{j=1}^n \alpha_j = |\alpha|\). The expression above, i.e. the Taylor expansion truncated at order \(m\) of a generic function \(f\), is a polynomial \(\in \mathcal{P}_{n,m}\) in the variables \(\mathbf{dx} = \mathbf x-\mathbf a\). We now show that if \(T_f, T_g \in \mathcal{P}_{n,m}\) are the Taylor expansions of two functions \(f, g\), then the Taylor expansions of \(f\pm g, fg, f/g\) can be found by operating on the algebra \(\mathcal{P}_{n,m}\), that is by computing \(T_f\pm T_g, T_f\cdot T_g, T_f/T_g\). We may thus compute high order derivatives of multivariate functions by computing their Taylor expansions and then extracting the desired coefficients.
Multiplication
We here prove that the product of two truncated Taylor expansions is the truncated Taylor expansion of the product. We perform the proof for \(n=1\) as the notation is clearer there. The multivariate case is formally identical, only requiring a more complex notation. The truncated Taylor expansion of the product between two functions \(f\) and \(g\) is:
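\[
T_{fg} = \sum_{k=0}^{m} \frac{(fg)^{(k)}(a)}{k!}\, dx^{k}
\]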
where we indicate with the superscript \((i)\) the \(i\)-th derivative with respect to the independent variable. We show how the same expression is derived by multiplying the Taylor expansions of \(f\) and \(g\):
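\[
T_f \cdot T_g = \left(\sum_{i=0}^{m} \frac{f^{(i)}(a)}{i!}\, dx^{i}\right)\cdot\left(\sum_{j=0}^{m} \frac{g^{(j)}(a)}{j!}\, dx^{j}\right) = \sum_{k=0}^{m} c_k\, dx^{k}
\]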
The coefficients \(c_k\) in the last power series are determined as the Cauchy product of the two Taylor series (or discrete convolution) and are:
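\[
c_k = \sum_{i=0}^{k} \frac{f^{(i)}(a)}{i!}\,\frac{g^{(k-i)}(a)}{(k-i)!}
\]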
applying now the general Leibniz rule to the last expression we get:
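\[
c_k = \frac{1}{k!}\sum_{i=0}^{k} \binom{k}{i} f^{(i)}(a)\, g^{(k-i)}(a) = \frac{(fg)^{(k)}(a)}{k!}
\]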
which allows us to conclude:
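\[
T_f \cdot T_g = T_{fg}
\]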
Reciprocal
We here prove that the reciprocal of a truncated Taylor expansion, as defined in the algebra \(\mathcal{P}_{n,m}\), is the Taylor expansion of the reciprocal. Consider the generic function \(f\) and its truncated Taylor expansion \(T_f\). We denote with \(T_{(1/f)}\) the truncated Taylor expansion of the reciprocal and apply the multiplication rule to derive that, necessarily, \(T_f T_{(1/f)} = 1\). We separate the constant part of \(T_f\) from the rest, writing \(T_f = f_0 +\hat f\), and we compute the product between \(T_f\) and the definition of the reciprocal:
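\[
T_f \cdot T_{(1/f)} = (f_0 + \hat f)\cdot T_{(1/f)} = 1
\]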
which allows us to conclude:
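\[
T_{(1/f)} = \frac{1}{f_0}\sum_{k=0}^{m} (-1)^{k}\left(\frac{\hat f}{f_0}\right)^{k}
\]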
Elementary functions
Consider the MacLaurin expansion of a generic function \(g(x) = \sum g_n x^n\). Consider now a multivariate function \(\hat f(\mathbf x) = \sum_{|\alpha|=1}^{\infty} f_\alpha \mathbf x^\alpha\) whose MacLaurin expansion does not have a constant term. The composition of these two functions will then be, trivially, \((g \circ \hat f) (\mathbf x) = \sum g_n (\sum_{|\alpha|=1}^{\infty} f_\alpha \mathbf x^\alpha)^n\). If we now truncate such an expansion at order \(m\), we get \(T_{g\circ f}= \sum_{n=0}^m g_n (\sum_{|\alpha|=1}^m f_\alpha \mathbf x^\alpha)^n\), which can be written as:
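\[
T_{g\circ f} = \sum_{n=0}^{m} g_n\, T_{\hat f}^{\,n}
\]

where the powers \(T_{\hat f}^{\,n}\) are computed with the truncated multiplication.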
The above equation is called the composition rule and is only valid for functions whose Taylor expansion does not have a constant term and which are thus nilpotent of order \(m+1\) in \(\mathcal{P}_{n,m}\). In general, we cannot compute the truncated Taylor expansion of a composite function by directly composing the truncated Taylor expansions. For most elementary functions, though, we can consider \(T_f = f_0 + \hat f\) and use some addition formula to "extract" \(\hat f\) and thus exploit its nilpotency. The details of how this is done differ for each particular \(f\) considered and are reported in the following subsections for some commonly used functions.
Other functions such as \(\tan\), \(\cosh\), etc. can also be treated similarly and are not reported here for brevity.
Exponential
Let us consider the case of the exponential.
We want to compute the truncated Taylor expansion of \(\exp(f(\mathbf x))\) starting from the truncated Taylor expansion \(T_f = f_0 + \hat f\). We thus write:
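\[
\exp(f(\mathbf x)) = \exp(f_0 + f(\mathbf x) - f_0) = \exp(f_0)\exp(f(\mathbf x) - f_0)
\]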
Note that we can now apply the composition rule to \(\exp (f(\mathbf x) - f_0)\), since the MacLaurin Taylor expansion of \(f(\mathbf x) - f_0\) does not have a constant term. Hence:
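\[
T_{\exp(f - f_0)} = \sum_{i=0}^{m} \frac{\hat f^{\,i}}{i!}
\]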
and, finally:
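\[
T_{\exp(f)} = \exp(f_0)\sum_{i=0}^{m} \frac{\hat f^{\,i}}{i!} = \exp(f_0)\left(1 + \hat f + \frac{\hat f^{\,2}}{2!} + \ldots + \frac{\hat f^{\,m}}{m!}\right)
\]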
Logarithm
Let us consider the case of the natural logarithm.
We want to compute the truncated Taylor expansion of \(\log(f(\mathbf x))\) starting from the truncated Taylor expansion \(T_f = f_0 + \hat f\). We thus write:
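\[
\log(f(\mathbf x)) = \log(f_0 + f(\mathbf x) - f_0) = \log(f_0) + \log\left(1 + \frac{f(\mathbf x) - f_0}{f_0}\right)
\]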
We can now apply the composition rule to get:
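\[
T_{\log(f)} = \log(f_0) + T_{\log\left(1 + \frac{\hat f}{f_0}\right)}
\]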
and, using the known expression for the MacLaurin expansion of \(\log(1+x)\), we get:
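\[
T_{\log(f)} = \log(f_0) + \sum_{k=1}^{m} \frac{(-1)^{k+1}}{k}\left(\frac{\hat f}{f_0}\right)^{k}
\]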
Note that the above expression is only defined if \(f_0 > 0\).
Sine and cosine
Let us consider the case of the sine and cosine functions.
We want to compute the truncated Taylor expansion of \(\sin(f(\mathbf x))\), \(\cos(f(\mathbf x))\) starting from the truncated Taylor expansion \(T_f = f_0 + \hat f\). We thus write:
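\[
\begin{array}{l}
\sin(f(\mathbf x)) = \sin(f_0)\cos(f(\mathbf x) - f_0) + \cos(f_0)\sin(f(\mathbf x) - f_0)\\[4pt]
\cos(f(\mathbf x)) = \cos(f_0)\cos(f(\mathbf x) - f_0) - \sin(f_0)\sin(f(\mathbf x) - f_0)
\end{array}
\]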
and, applying the composition rule to \(\cos(f(\mathbf x) - f_0)\) and \(\sin(f(\mathbf x) - f_0)\), we get:
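\[
\begin{array}{l}
T_{\sin(f)} = \sin(f_0)\sum_{0 \le 2k \le m}\frac{(-1)^{k}}{(2k)!}\hat f^{\,2k} + \cos(f_0)\sum_{0 \le 2k+1 \le m}\frac{(-1)^{k}}{(2k+1)!}\hat f^{\,2k+1}\\[6pt]
T_{\cos(f)} = \cos(f_0)\sum_{0 \le 2k \le m}\frac{(-1)^{k}}{(2k)!}\hat f^{\,2k} - \sin(f_0)\sum_{0 \le 2k+1 \le m}\frac{(-1)^{k}}{(2k+1)!}\hat f^{\,2k+1}
\end{array}
\]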
Exponentiation
Let us consider the case of the power function.
We want to compute the truncated Taylor expansion of \(f(\mathbf x)^\alpha\), assuming we have access to the truncated Taylor expansion of \(f\), \(T_f = f_0 + \hat f\). We thus write:
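\[
f(\mathbf x)^\alpha = (f_0 + f(\mathbf x) - f_0)^\alpha = f_0^\alpha\left(1 + \frac{f(\mathbf x) - f_0}{f_0}\right)^{\alpha}
\]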
We can now apply the composition rule to get:
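\[
T_{(f^\alpha)} = f_0^\alpha \sum_{k=0}^{m} {\alpha \choose k}\left(\frac{\hat f}{f_0}\right)^{k}
\]

where \({\alpha \choose k} = \frac{\alpha(\alpha-1)\cdots(\alpha-k+1)}{k!}\) denotes the generalized binomial coefficient.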
Inverse functions
The case of the inverse trigonometric and inverse hyperbolic functions is more complex and deserves some extended discussion.
Inverse hyperbolic tangent
We consider the inverse hyperbolic tangent function.
As before, we want to compute the truncated Taylor expansion of \(\mbox{atanh} (f(\mathbf x))\), assuming we have access to the truncated Taylor expansion of \(f\), \(T_f = f_0 + \hat f\). Since there is no convenient addition formula, we must seek a different expression. We start from the identity:
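\[
\mbox{atanh}(x) = \frac 12 \log\left(\frac{1+x}{1-x}\right)
\]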
which allows us to write:
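\[
\mbox{atanh}(f(\mathbf x)) = \frac 12 \log\left(\frac{1 + f_0 + \hat f}{1 - f_0 - \hat f}\right)
\]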
and, using the addition formula for the logarithms, we get:
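\[
\mbox{atanh}(f(\mathbf x)) = \frac 12\left[\log\left(\frac{1+f_0}{1-f_0}\right) + \log\left(1 + \frac{\hat f}{1+f_0}\right) - \log\left(1 - \frac{\hat f}{1-f_0}\right)\right]
\]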
we may now apply the composition theorem and find:
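\[
T_{\mbox{atanh}(f)} = \mbox{atanh}(f_0) + \frac 12\left[T_{\log\left(1 + \frac{\hat f}{1+f_0}\right)} - T_{\log\left(1 - \frac{\hat f}{1-f_0}\right)}\right]
\]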
Using the Taylor expansions for the logarithmic functions we get the final expression used in AuDi:
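\[
T_{\mbox{atanh}(f)} = \mbox{atanh}(f_0) + \frac 12\sum_{k=1}^{m}\frac 1k\left[\left(\frac{\hat f}{1-f_0}\right)^{k} - \left(-\frac{\hat f}{1+f_0}\right)^{k}\right]
\]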
Inverse tangent
We consider the inverse tangent function.
As before, we want to compute the truncated Taylor expansion of \(\mbox{atan} (f(\mathbf x))\), assuming we have access to the truncated Taylor expansion of \(f\), \(T_f = f_0 + \hat f\). We can apply the following identity involving the imaginary unit \(i\):
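\[
\mbox{atan}(x) = \frac{1}{2i}\log\left(\frac{1+ix}{1-ix}\right)
\]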
and therefore write:
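\[
T_{\mbox{atan}(f)} = \mbox{atan}(f_0) + \frac{1}{2i}\sum_{k=1}^{m}\frac 1k\left[\left(\frac{i\hat f}{1-if_0}\right)^{k} - \left(-\frac{i\hat f}{1+if_0}\right)^{k}\right]
\]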
We may avoid computing the above expression with complex arithmetic by expanding its terms explicitly. Separating the even from the odd powers of \(\hat f\) we get:
and, expanding further:
Using the binomial theorem it is possible to show:
and
which, substituted in the expression above, returns a formula no longer involving the imaginary unit:
This formula is rather friendly to a computer implementation and is used in AuDi.
The other inverse functions
To compute the other inverse functions we may now use identities such as the following:
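\[
\begin{array}{l}
\mbox{asin}(x) = \mbox{atan}\left(\frac{x}{\sqrt{1-x^2}}\right)\\[4pt]
\mbox{acos}(x) = \frac{\pi}{2} - \mbox{asin}(x)\\[4pt]
\mbox{asinh}(x) = \mbox{atanh}\left(\frac{x}{\sqrt{1+x^2}}\right)\\[4pt]
\mbox{acosh}(x) = \mbox{asinh}\left(\sqrt{x^2-1}\right)
\end{array}
\]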
which also apply to the corresponding Taylor expansions.
Practical Examples (to be done by hand)
In the above sections we derived a number of results that allow operating on simple Taylor expansions to compute Taylor expansions of increasingly complex expressions. We summarize here those results (keep in mind that \(T_f = f_0 + \hat f\)):
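\[
\begin{array}{l}
T_{f\pm g} = T_f \pm T_g\\[4pt]
T_{fg} = T_f \cdot T_g\\[4pt]
T_{(1/f)} = \frac{1}{f_0}\sum_{k=0}^{m}(-1)^{k}\left(\frac{\hat f}{f_0}\right)^{k}\\[4pt]
T_{\exp(f)} = \exp(f_0)\sum_{k=0}^{m}\frac{\hat f^{\,k}}{k!}\\[4pt]
T_{\log(f)} = \log(f_0) + \sum_{k=1}^{m}\frac{(-1)^{k+1}}{k}\left(\frac{\hat f}{f_0}\right)^{k}\\[4pt]
T_{\sin(f)} = \sin(f_0)\sum_{0 \le 2k \le m}\frac{(-1)^{k}}{(2k)!}\hat f^{\,2k} + \cos(f_0)\sum_{0 \le 2k+1 \le m}\frac{(-1)^{k}}{(2k+1)!}\hat f^{\,2k+1}\\[4pt]
T_{\cos(f)} = \cos(f_0)\sum_{0 \le 2k \le m}\frac{(-1)^{k}}{(2k)!}\hat f^{\,2k} - \sin(f_0)\sum_{0 \le 2k+1 \le m}\frac{(-1)^{k}}{(2k+1)!}\hat f^{\,2k+1}\\[4pt]
T_{(f^\alpha)} = f_0^\alpha\sum_{k=0}^{m}{\alpha\choose k}\left(\frac{\hat f}{f_0}\right)^{k}
\end{array}
\]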
It is worth mentioning here that other functions, such as the inverse functions, the hyperbolic functions, etc., can also be treated in this way. The above equations can be used to find Taylor expansions of increasingly complex functions by simply operating on the algebra \(\mathcal{P}_{n,m}\). Once a Taylor expansion is computed, its coefficients can be extracted to obtain the value of any desired derivative. We have thus built an automated differentiation system. While the formalism presented can, at first, appear complex, the system is rather simple, as we hope will become apparent from the following examples.
Example 1 - A multiplication
Consider the simple function of two variables:
Its Taylor expansion \(T_f \in \mathcal{P}_{2,2}\) can be computed as:
Let us explicitly compute such an expression at the point \(x=3\), \(y=7\). The exact sequence of computations to be performed is:
and
We can then derive the final expression:
and we may easily extract the derivatives by comparing this expression to the generic form of a Taylor expansion:
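The same mechanics can also be checked programmatically with pyaudi, the Python package of AuDi. The sketch below uses the hypothetical stand-in \(f(x, y) = xy\) (the function of this example is not reproduced here), expanded at \(x=3\), \(y=7\):

```python
# Minimal sketch using pyaudi (AuDi's Python package).
# f(x, y) = x * y is a hypothetical stand-in for the function of Example 1.
from pyaudi import gdual_double as gdual

# Elements of P_{2,2}: two variables, truncation order 2,
# expanded around the point x = 3, y = 7.
x = gdual(3, "x", 2)
y = gdual(7, "y", 2)

f = x * y  # all arithmetic is performed in the truncated algebra

# Printing a gdual shows its polynomial in dx, dy; the coefficient of
# dx^i * dy^j equals the derivative d^(i+j)f / dx^i dy^j divided by i! j!.
print(f)  # e.g. 21 + 7*dx + 3*dy + dx*dy
```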
Example 2 - A division
Consider the simple function of two variables:
Its Taylor expansion \(T_f \in \mathcal{P}_{2,2}\) in (say) \(x=0\), \(y=1\) can be computed as follows:
and, applying the reciprocal rule, we conclude
where \(\hat p = 3\,dx + 2\,dy + 2\,dxdy + 0\,dx^2 + 1\,dy^2\), hence:
which allows us, as in the previous example, to compute all derivatives up to order two:
Example 3 - An exponential
Consider the elementary function of two variables:
Its Taylor expansion \(T_f \in \mathcal{P}_{2,2}\) in (say) \(x=1\), \(y=0\) can be computed as follows:
and, applying the rule for the exponential of Taylor series, we conclude:
and,
Taylor models
Taylor series can be arbitrarily good approximations of functions and, as such, can provide derivatives and solutions to ODEs to high orders. However, by definition a Taylor series does not account for its truncated part. While this part is very often small, neglecting it does not allow for a rigorous assessment of the set of reachable states of a system/function. Such knowledge is useful for anybody who wants a mathematical guarantee on the range of the function. A simple example is of course the initial uncertainty of a satellite state propagated through space: with a mathematical description of the remainder, one can guarantee that two objects will not collide (assuming the dynamics of the propagation perfectly resembles reality).
To describe this remainder term, which represents the magnitude of all the terms truncated by the Taylor series, interval arithmetic is used: the truncated polynomial is paired with a remainder interval in a new object called a Taylor model.
Taylor’s theorem
A core part of the Taylor model arithmetic is Taylor's theorem (stated here for one variable only, for simplicity):
Let \(k \geq 1\) be an integer and let the function \(f: R \rightarrow R\) be \(k+1\) times differentiable on the interval \([a, x]\). Then there exists a real number \(\xi \in [a, x]\) such that:
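\[
f(x) = f(a) + f'(a)(x-a) + \frac{f''(a)}{2!}(x-a)^2 + \ldots + \frac{f^{(k)}(a)}{k!}(x-a)^{k} + \frac{f^{(k+1)}(\xi)}{(k+1)!}(x-a)^{k+1}
\]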
This tells us that the range of a function over a given domain can be bounded using the \(k+1\)-th derivative of \(f\) evaluated somewhere within that domain. A simple intuition: to bound a function with a 0th order model on the domain \([0, x]\), one can take the first order derivative and state that \(f(x) = f(0) + f'(\xi) \cdot x\); the first derivative then indicates the maximum deviation from \(f(0)\). In reality more steps are required for a mathematically sound reasoning, and details can be found in the referenced literature.
Formal definition
Now, we define a Taylor model as a tuple of a (now multi-dimensional) Taylor polynomial \(P_f\) of order \(k\) expanded around \(\vec{x_0}\) and valid over a domain \([\vec{a}, \vec{b}]\) and an interval \(I_f\). In other words:
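\[
T_f = (P_f, I_f), \qquad f(\vec x) \in P_f(\vec x - \vec{x_0}) + I_f \quad \forall\, \vec x \in [\vec a, \vec b]
\]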
Arithmetic
While all arithmetic operations can be found in [1], I will show a few examples here. For addition, you would have:
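\[
(f+g)(\vec x) \in P_f(\vec x - \vec{x_0}) + P_g(\vec x - \vec{x_0}) + I_f + I_g \quad \forall\, \vec x \in [\vec a, \vec b]
\]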
so that a Taylor model \(T_{f+g}\) for \(f+g\) can be obtained via
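\[
T_{f+g} = (P_f + P_g,\; I_f + I_g)
\]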
It should be noted that gduals are used for all operations involving the Taylor polynomials, while the remainder terms follow the rules of interval arithmetic. Thus we define, for intervals \(I_f = [\underline{I_f}, \overline{I_f}]\) and \(I_g = [\underline{I_g}, \overline{I_g}]\):
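\[
I_f + I_g = \left[\,\underline{I_f} + \underline{I_g},\; \overline{I_f} + \overline{I_g}\,\right]
\]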
And for multiplication:
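\[
(f \cdot g)(\vec x) \in (P_f + I_f)\cdot(P_g + I_g) = P_f \cdot P_g + P_f \cdot I_g + P_g \cdot I_f + I_f \cdot I_g
\]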
Note that \(P_{f} \cdot P_{g}\) is a polynomial of order \(2k\). We split it into the part up to order \(k\), which agrees with the Taylor polynomial \(P_{f \cdot g}\) of order \(k\) of \(f \cdot g\), and the excess polynomial \(P_e\), so that we have:
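\[
P_f \cdot P_g = P_{f \cdot g} + P_e
\]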
A Taylor model for \(f \cdot g\) can now be obtained by finding a bound interval for all the terms except \(P_{f \cdot g}\). For this purpose, let \(B(P)\) denote an interval bound of the polynomial \(P:[\vec{a}, \vec{b}] \subset R^v \rightarrow R\), namely:
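\[
P(\vec x) \in B(P) \quad \forall\, \vec x \in [\vec a, \vec b]
\]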
Altogether, interval remainder bounds for \(f \cdot g\) can be found via
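\[
I_{f \cdot g} = B(P_e) + B(P_f)\cdot I_g + B(P_g)\cdot I_f + I_f \cdot I_g
\]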
Definitions are also given of all intrinsic functions such as exponential, logarithm, sine etc. (again, see [1]).
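To make the interval bookkeeping above concrete, the following is a minimal, self-contained Python sketch; the Interval class and the tm_mul_remainder helper are illustrative names introduced here, not AuDi API:

```python
# Illustrative sketch of the interval arithmetic behind the remainder
# bounds; not AuDi's actual implementation.
from dataclasses import dataclass

@dataclass
class Interval:
    lo: float
    hi: float

    def __add__(self, other):
        # [a, b] + [c, d] = [a + c, b + d]
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        # [a, b] * [c, d]: min/max over all endpoint products
        p = (self.lo * other.lo, self.lo * other.hi,
             self.hi * other.lo, self.hi * other.hi)
        return Interval(min(p), max(p))

def tm_mul_remainder(B_Pe, B_Pf, B_Pg, I_f, I_g):
    # I_{f*g} = B(P_e) + B(P_f)*I_g + B(P_g)*I_f + I_f*I_g
    return B_Pe + B_Pf * I_g + B_Pg * I_f + I_f * I_g

# Example with tiny remainders and hypothetical polynomial bounds:
print(tm_mul_remainder(Interval(-0.1, 0.1), Interval(0.5, 2.0),
                       Interval(-1.0, 1.0), Interval(-1e-3, 1e-3),
                       Interval(-1e-3, 1e-3)))
```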
It was just shown that bounds of the excess polynomial \(P_e\) need to be calculated. For this, we use Bernstein polynomials. The Bernstein form is a rewriting of a given polynomial in the Bernstein basis, and it turns out that the coefficients of this form carry information about the range of the polynomial. In particular, the Bernstein form theorem (for univariate polynomials) is:
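If a polynomial \(P\) of degree \(k\) is written in Bernstein form on \([0, 1]\), \(P(x) = \sum_{i=0}^{k} b_i {k \choose i} x^{i}(1-x)^{k-i}\), then

\[
\min_{i} b_i \;\le\; P(x) \;\le\; \max_{i} b_i \qquad \forall\, x \in [0, 1]
\]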
In particular, the minimum and maximum coefficients give a tight bound on the range of the polynomial over the given domain if the minimum and maximum actually correspond to the first or last Bernstein coefficient. More about the basic Bernstein form can be read in [2].
Exhaustively evaluating these Bernstein coefficients is a computationally intensive task, especially with growing dimension and monomial count. To remedy this, the recently developed matrix method of [3] was implemented.