
6.2 The regularized least squares time-varying cost function

Due to the large number of possible solutions given by Eq. (6.7), additional constraints are needed to reduce this freedom. In this particular case, this is achieved by a two-dimensional extended regularization.

In the case of a two-dimensional IRF, two types of prior information are available. The first is that the considered systems are smooth. This smoothing is applied once over the system time $\tau$, which refers to the "classical" impulse response time, and once over the global time $t$, which provides the support to handle the time-varying behavior.

In addition to the smoothness property, another property of stable impulse response functions can be used: the IRFs are exponentially decaying.


In order to include the above-mentioned prior information in our nonparametric representation, an extended LTV cost function ($V$) must be defined.

DEFINITION 6.1: The extended regularized LS LTV cost function ($V$) consists of the ordinary least squares cost function ($V_{\mathrm{LS}}$, see Eq. (6.6)) and the regularization cost function ($V_{\mathrm{reg}}$), i.e.:

$$V = V_{\mathrm{LS}} + V_{\mathrm{reg}} = V_{\mathrm{LS}} + \sigma^{2}\,\mathbf{h}_{\times}^{T}\,\mathbf{P}^{-1}\,\mathbf{h}_{\times} \quad (6.8)$$

where $\mathbf{P}$ is a (two-dimensional) covariance hypermatrix (or block matrix) containing the prior information. The minimization of Eq. (6.8) with respect to $\mathbf{h}_{\times}$ provides the solution which estimates the two-dimensional IRF of an LTV system (as shown in Appendix A.2):

$$\hat{\mathbf{h}}_{\times} = \left(\mathbf{K}^{T}\mathbf{K} + \sigma^{2}\mathbf{P}^{-1}\right)^{-1}\mathbf{K}^{T}\mathbf{y} \quad (6.9)$$

The detailed statistical properties of this estimator can be found in Appendices A.3.1–A.3.2.
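For illustration, a minimal numerical sketch of Eq. (6.9) is given below, assuming $\mathbf{K}$ is available as a dense regressor matrix (cf. Eq. (6.5)), $\mathbf{P}$ as the covariance hypermatrix, and $\sigma^{2}$ as a known noise variance. The function name and the use of numpy are illustrative choices, not part of the thesis.

```python
import numpy as np

def regularized_ltv_estimate(K, y, P, sigma2):
    """Regularized LS estimate of the stacked 2D IRF, cf. Eq. (6.9):
    h = (K^T K + sigma^2 * P^{-1})^{-1} K^T y."""
    # P^{-1} is obtained by solving P X = I; for large N*L a
    # Cholesky-based solver would be preferable to this plain solve.
    P_inv = np.linalg.solve(P, np.eye(P.shape[0]))
    A = K.T @ K + sigma2 * P_inv
    # Solve the normal equations instead of inverting A explicitly.
    return np.linalg.solve(A, K.T @ y)
```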

To show the power of the regularized estimation, the simulation made for the simple ML estimation shown in Figure 6.3 is repeated with the regularization technique, but this time 10% noise (relative to the output RMS) is added to the true output. In this case the prior information – smoothness and exponential decay – is used. The result is shown in Figure 6.5. The method used for this simulation (which will be called the flexible approach) is explained in Section 6.4.2.

Figure 6.5: The regularized estimate of the two-dimensional IRF shown in Figure 6.3. The system is excited by white noise with a high SNR.

(Axes of the surface plot: global time $t$ [samples], system time $\tau$ [samples]; vertical axis: $h[t,\tau]$.)



In the LTI case, when the measurement is of very good quality – or the measurement is sufficiently long – the variance becomes so small that the regularization term can be neglected.

However, in the case of LTV systems, the regularization term must always be active. This is because the constraints on the system of linear equations describing the LTV model are defined here, and they are fundamental to the estimation procedure.

In this particular case – as shown in the previously defined kernel functions – $\mathbf{P}$ consists of a matrix (denoted here as $\mathbf{P}_{0}$) and a scalar multiplier (denoted here as $c$), i.e. $\mathbf{P} = c\,\mathbf{P}_{0}$, so that $\sigma^{2}\,\mathbf{h}_{\times}^{T}\mathbf{P}^{-1}\mathbf{h}_{\times} = (\sigma^{2}/c)\,\mathbf{h}_{\times}^{T}\mathbf{P}_{0}^{-1}\mathbf{h}_{\times}$. Using this information, the cost function given by Eq. (6.8) can be rewritten as:

$$V = V_{\mathrm{LS}} + V_{\mathrm{reg}} = V_{\mathrm{LS}} + \frac{\sigma^{2}}{c}\,\mathbf{h}_{\times}^{T}\,\mathbf{P}_{0}^{-1}\,\mathbf{h}_{\times} \quad (6.10)$$

where $\sigma^{2}$ and $c$ cannot be estimated separately, since only their ratio $\sigma^{2}/c$ appears in the cost function (they are stuck together).

6.3 The covariance hypermatrix

The covariance hypermatrix $\mathbf{P}$ contains the prior knowledge of the properties of the observed system, and its structure determines the expected behavior of the LTV system dynamics.

Note that the novelty of this chapter is the formulation of the regularization technique for the special case of estimating a two-dimensional time-varying impulse response function. In the next sections, the practical implementation of the inclusion of the prior knowledge into the covariance matrix is addressed.

It is important to remark that there is a similar formulation of the two-dimensional impulse response function for nonlinear systems in [104], where Volterra kernels are identified by using regularization.

The structure and the indexing system of the covariance matrix are very important. Unlike in the ordinary one-dimensional case, where the construction of $\mathbf{P}$ is straightforward, here $\mathbf{P}$ becomes a very large matrix consisting of different sub-matrices, and it can therefore be seen as a (two-dimensional) hypermatrix (a 2D hypermatrix is also known as a block matrix). It consists of $N \times N$ sub-matrices, and every sub-matrix contains $L \times L$ elements.

The reason is that the covariance must be computed between every pair of $t$ instants, and within each such pair it must be evaluated for every pair of $\tau$ instants.

Considering the structure of $\mathbf{h}_{\times}$ (see Eq. (6.3)) and $\mathbf{K}$ (see Eq. (6.5)), it can be observed that the first block of length $L$ in $\mathbf{h}_{\times}$ stands for the first instant of $t$, and within this block there are $L$ instants of $\tau$. The second block of length $L$ in $\mathbf{h}_{\times}$ belongs to the second time instant of $t$, which also has $L$ instants of $\tau$, and so on. This is illustrated in Eq. (6.3) and in Figure 6.3, and sketched in code below.
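As a small numerical sketch of this block structure – assuming, for illustration only, that Eq. (6.3) stacks the rows of the 2D IRF matrix one after another; the sizes below are hypothetical:

```python
import numpy as np

N, L = 4, 3                                       # hypothetical sizes
H = np.arange(N * L, dtype=float).reshape(N, L)   # H[t, tau]

h_vec = H.reshape(N * L)                          # stacked vector h_x
# The first block of length L holds all tau values of the first t instant,
# the second block those of the second t instant, and so on.
assert np.array_equal(h_vec[:L], H[0])
assert np.array_equal(h_vec[L:2 * L], H[1])
```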

Using this information, the structure of $\mathbf{P}$ is:

$$\mathbf{P} = \mathbb{E}\{\mathbf{h}_{\times}\mathbf{h}_{\times}^{T}\} = \begin{bmatrix} \mathbf{P}_{\{1,1\}} & \cdots & \mathbf{P}_{\{1,N\}} \\ \vdots & \ddots & \vdots \\ \mathbf{P}_{\{N,1\}} & \cdots & \mathbf{P}_{\{N,N\}} \end{bmatrix} \quad (6.11)$$

Every $\mathbf{P}_{\{t_i,t_j\}}$ represents a sub-matrix as follows:

$$\mathbf{P}_{\{t_i,t_j\}} = \begin{bmatrix} \mathbb{E}\{h[t_i,\tau_1]\,h[t_j,\tau_1]\} & \cdots & \mathbb{E}\{h[t_i,\tau_1]\,h[t_j,\tau_L]\} \\ \vdots & \ddots & \vdots \\ \mathbb{E}\{h[t_i,\tau_L]\,h[t_j,\tau_1]\} & \cdots & \mathbb{E}\{h[t_i,\tau_L]\,h[t_j,\tau_L]\} \end{bmatrix} \quad (6.12)$$

Every element at indices $\{\tau_k,\tau_l\}$ in $\mathbf{P}_{\{t_i,t_j\}}$ can be computed by the expected value as follows:

$$\mathbf{P}_{\tau_k,\tau_l,\{t_i,t_j\}} = \mathbb{E}\{h[t_i,\tau_k]\,h[t_j,\tau_l]\} \quad (6.13)$$

The numerical values of these elements are defined by kernel functions, which link the entries of $\mathbf{P}_{\{t_i,t_j\}}$ to the desired behavior. This is explained in the next sections.

The numerical values in $\mathbf{P}$ are derived by using kernel functions. The kernels used in this thesis are explained in Section 5.4; an illustrative construction is sketched below.
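If the prior is assumed separable in $t$ and $\tau$ – an assumption made here purely for illustration, not stated by the thesis – Eq. (6.13) factors as $\mathbf{P}_{\tau_k,\tau_l,\{t_i,t_j\}} = k_t(t_i,t_j)\,k_\tau(\tau_k,\tau_l)$ and the hypermatrix becomes a Kronecker product. A sketch with a DC-type kernel over $\tau$ (smooth and exponentially decaying) and an RBF kernel over $t$ (smooth time variation); the kernels and hyperparameters are stand-ins, not the ones of Section 5.4:

```python
import numpy as np

N, L = 50, 20                   # hypothetical sizes
lam, rho, ell = 0.9, 0.5, 10.0  # illustrative hyperparameters

# Kernel over tau: DC-type, encoding smoothness and exponential decay.
tau = np.arange(L)
Tk, Tl = np.meshgrid(tau, tau, indexing="ij")
K_tau = lam ** ((Tk + Tl) / 2) * rho ** np.abs(Tk - Tl)

# Kernel over t: RBF, encoding smooth variation of the dynamics over t.
t = np.arange(N)
Ti, Tj = np.meshgrid(t, t, indexing="ij")
K_t = np.exp(-((Ti - Tj) ** 2) / (2 * ell ** 2))

# Block {i, j} of P is the L x L sub-matrix K_t[i, j] * K_tau, cf. Eq. (6.12).
P = np.kron(K_t, K_tau)         # (N*L) x (N*L) covariance hypermatrix
```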

6.4 Building the covariance hypermatrix

6.4.1 General observations

By observing the structure of Figure 6.3 and Eq. (6.3), it can be clearly seen that the main and co-diagonals in $\mathbf{H}_{\times}$ are the impulse responses.

The first identifiable IRF can be found on the main diagonal. Every element before the main diagonal refers to (transient) impulse responses originating before the observation window. Due to the insufficient knowledge of the past values of the input, these elements will not be estimated (see ASSUMPTION 6.7), and the corresponding elements in the covariance hypermatrix $\mathbf{P}$ are accordingly not taken into account; a sketch of this masking is given below. A method to eliminate the transient is shown in Section 7.2.
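A minimal sketch of how these transient elements could be masked out, assuming the coefficient $h[t,\tau]$ multiplies the input sample $u[t-\tau]$; the sizes and variable names are hypothetical:

```python
import numpy as np

N, L = 6, 4                      # hypothetical sizes
t = np.arange(N)[:, None]
tau = np.arange(L)[None, :]

# h[t, tau] weights the input sample u[t - tau]; if t - tau < 0, that sample
# lies before the observation window, so the coefficient is a transient term.
estimable = (t - tau) >= 0       # boolean N x L mask
keep = estimable.reshape(-1)     # same stacking order as h_x

# The transient entries are then dropped from the estimation problem, e.g.:
# K_red = K[:, keep]
# P_red = P[np.ix_(keep, keep)]
```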

Next, two different ways of constructing the covariance matrix will be shown.

The key idea and the goal are the same: pushing down the degrees of freedom by introducing constraints (see Appendix A.5). The first approach gives slightly better results when the time variations are less smooth, because fewer constraints are implemented in the covariance matrix.

The second approach is a robust method which performs slightly better when the time variations are systematic – and not random. In this case, more constraints are used, at the price of a higher computational load, as detailed in Appendix A.4.