Parallelism across time

Block methods, special predictor-corrector methods and Runge-Kutta methods allow only a limited amount of parallelism, known as method parallelism. The only way in which it seems possible to exploit large-scale parallelism on small problems is via parallelism across time using iterative methods.

In order to improve the efficiency of numerical integrators, it has been proposed to use continuous-time iteration methods to decouple the ODE system and then to discretize the resulting subsystems. One such approach is the waveform relaxation (WR) method.

A method that is massively parallel across time for ODEs must simultaneously evaluate the right-hand side of the differential equation y' = f(t,y) at many different time points tn. These evaluations require approximations of the solution y(tn) in order to compute a better approximation, so such a method is essentially an iterative method. Let the successive iterates be denoted by yn^(m), m = 0, 1, .... For each value of m the approximation can be thought of as a discrete approximation to a waveform, so such a method is really a waveform method. The number N of discrete points tn, n = M+1, ..., M+N, handled simultaneously is the degree of parallelism of the method. The obvious approach for nonlinear ODEs is to linearize and iterate. The simplest such iterative method is the Picard method.

Example: Picard's method

There is no natural parallelism across time when solving IVPs, so if the system size is only moderate it is necessary to use temporal techniques in order to exploit any massive parallelism. The simplest technique in this respect is the Picard method, which generates a sequence of iterates y^(0)(t), y^(1)(t), ... on the region of integration satisfying the equations

y^(k+1)'(t) = f(t, y^(k)(t)),    y^(k+1)(t0) = y0,

starting from y^(0)(t) = y0. If the dimension of the system is m, this iterative approach decouples the problem into m independent quadrature problems that can be solved in parallel. Unfortunately, the iterations converge very slowly unless ||df/dy|| is small. Once the decoupling has taken place, different quadrature methods can be used to discretize the resulting subsystems. One way to improve the convergence is the technique of time-windowing, in which the region of integration is split into a series of windows and the iterative process takes place on each window in turn. Another way to improve the convergence behaviour is to split the right-hand side; this leads to the shifted Picard method (Skeel), which takes the form

y^(k+1)'(t) - Mk y^(k+1)(t) = f(t, y^(k)(t)) - Mk y^(k)(t).

Clearly there are difficulties in knowing how to choose the window size and the splitting matrix Mk automatically.
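
The following sketch (an illustrative problem, grid and iteration count, not taken from the text) implements the Picard iteration by applying the composite trapezoidal rule to the previous iterate; each of the m quadratures, and each evaluation of f on the grid, is independent and could be distributed across processors. On a long interval the same loop would be applied window by window.

```python
import numpy as np

def picard(f, t0, tf, y0, n_steps=200, n_iter=25):
    # Picard iteration: y^(k+1)(t) = y0 + integral_{t0}^{t} f(s, y^(k)(s)) ds.
    # Each component of the integral is an independent quadrature, so the m
    # quadratures (and the f-evaluations at the grid points) can run in parallel.
    t = np.linspace(t0, tf, n_steps + 1)
    y0 = np.asarray(y0, dtype=float)
    y = np.tile(y0, (t.size, 1))                         # y^(0)(t) = y0
    for _ in range(n_iter):
        g = np.array([f(ti, yi) for ti, yi in zip(t, y)])    # f(t, y^(k)(t))
        inc = 0.5 * np.diff(t)[:, None] * (g[1:] + g[:-1])   # trapezoidal panels
        y = np.vstack([y0, y0 + np.cumsum(inc, axis=0)])     # y^(k+1) on the grid
    return t, y

# illustrative test: y1' = -y2, y2' = y1, exact solution (cos t, sin t)
t, y = picard(lambda t, y: np.array([-y[1], y[0]]), 0.0, 2.0, [1.0, 0.0])
print(y[-1], [np.cos(2.0), np.sin(2.0)])
```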

Waveform relaxation method

General form. The Picard method is a particular example of a much more general class of iterative methods known as waveform relaxation (WR) methods.

Waveform relaxation, also called dynamic iteration or Picard-Lindelöf iteration, is a technique for solving large systems of ordinary differential equations. These methods are based on an iterative scheme extended over the whole interval of integration [t0,tN]; as part of this scheme, decoupled subsystems are solved concurrently. Waveform relaxation methods were originally developed for circuit analysis. The system is decomposed into

yi' = fi(t, yi, pi),    yi(t0) = y0i,    i = 1, ..., r,

with the decoupling vectors pi = (y1, ..., yi-1, yi+1, ..., yr)^T. Treating the vectors pi as known parameters allows the simultaneous solution of the r independent IVPs. The relaxation process starts with an initial approximation over the whole interval [t0,tN], needed to initialize the decoupling vectors p1, ..., pr.
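
As a concrete sketch (the linear test system, the two-way partition and the explicit Euler discretization are all illustrative assumptions), the loop below freezes each decoupling vector pi at the waveform of the previous sweep, so the r = 2 subsystem integrations inside a sweep are independent of one another and could run on separate processors.

```python
import numpy as np

# illustrative linear system y' = A y, partitioned into r = 2 subsystems
A = np.array([[-2.0,  0.3,  0.1,  0.0],
              [ 0.2, -1.5,  0.0,  0.1],
              [ 0.1,  0.0, -1.0,  0.2],
              [ 0.0,  0.1,  0.3, -2.5]])
blocks = [slice(0, 2), slice(2, 4)]            # the two subsystems
y0 = np.array([1.0, 0.0, 0.0, 1.0])
t = np.linspace(0.0, 1.0, 101)
h = t[1] - t[0]

Y = np.tile(y0, (t.size, 1))                   # initial waveform: constant y0
for sweep in range(20):
    Y_new = np.empty_like(Y)
    Y_new[0] = y0
    for b in blocks:                           # independent -> one subsystem per processor
        y = y0[b].copy()
        for n in range(t.size - 1):
            p = Y[n].copy()                    # decoupling values from the previous sweep
            p[b] = y                           # the subsystem's own current values
            y = y + h * (A @ p)[b]             # explicit Euler step for subsystem b
            Y_new[n + 1, b] = y
    Y = Y_new

# the fixed point of the sweeps is the explicit Euler solution of the coupled system
y_ref = y0.copy()
for n in range(t.size - 1):
    y_ref = y_ref + h * (A @ y_ref)
print(Y[-1], y_ref)
```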

WR techniques can perform extremely efficiently on problems arising from electrical network modelling. They have been generalized to arbitrary systems of equations with less success, because the automatic reordering and efficient grouping of the subsystems is usually done on the basis of physical insight, which is not always available. Slow convergence of the iterations can occur when there is strong coupling between subsystems.

To achieve large-scale parallelism we have to consider iterations that encompass the whole system or a significant part of the time interval. Information from a previous iterate is used to compute the next iterate by replacing the ODE y' = f(t,y) with

y^(m+1)'(t) = f(y^(m)(t)) + G(y^(m+1)(t), y^(m)(t)),    y^(m+1)(t0) = y0,

where G(y,z) is any function such that G(z,z) = 0; for example, G(y,z) = A(y - z) for some matrix A.

The general form of a continuous-time WR method is given by

z^(k+1)'(t) = G(t, z^(k+1)(t), y^(k+1)(t), y^(k)(t)),    z^(k+1)(t0) = y0,

y^(0)(t) = y0,    y^(k+1)(t) = g(t, z^(k+1)(t), y^(k)(t)),

where G: [t0,T] x R^m x R^m x R^m -> R^m and g: [t0,T] x R^m x R^m -> R^m satisfy G(t,y,y,y) = f(t,y) and g(t,y,y) = f(t,y). The functions G and g are called the splitting functions; they are chosen in an attempt to decouple (as much as possible) the original system into independent subsystems, which can then be solved independently by a variety of numerical methods.

A natural generalization of the Picard method is the simpler WR method

y^(k+1)'(t) = F(t, y^(k+1)(t), y^(k)(t)),    y^(k+1)(t0) = y0,

where the splitting function F: [t0,T] x R^m x R^m -> R^m satisfies F(t,y,y) = f(t,y). The Picard-Lindelöf iteration is obtained for F(t,y,z) = f(t,z). One of the main purposes of the splitting function is to decouple the original system as much as possible; in other words, the aim is to obtain an iterative process with as many independent subsystems as possible. If f is continuously differentiable and Lipschitz continuous in y, and F is continuously differentiable and Lipschitz continuous in its second and third arguments, then the sequence y^(k) converges uniformly on [t0,T] to the solution y of the ODE, and the rate of convergence is superlinear. If the interval of integration is long, however, a very large number of iterations may be needed before reasonable approximations of y(t) are obtained far from t0.
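
A minimal sketch of this iteration for a linear test problem y' = Ay (the matrix, grid and number of sweeps are illustrative assumptions): the Picard-Lindelöf choice F(t,y,z) = Az and a Jacobi-type choice F(t,y,z) = Dy + (A - D)z with D = diag(A) both satisfy F(t,y,y) = Ay, and the continuous iteration is approximated here with implicit Euler on a fixed grid.

```python
import numpy as np

A = np.array([[-3.0,  1.0],
              [ 1.0, -2.0]])
y0 = np.array([1.0, 2.0])
t = np.linspace(0.0, 1.0, 201)
h = t[1] - t[0]
D = np.diag(np.diag(A))            # Jacobi splitting: F(t, y, z) = D y + (A - D) z

def wr_sweep(Z, use_jacobi):
    # one WR sweep: integrate y' = F(t, y, z(t)) with implicit Euler,
    # where z is the waveform from the previous iterate
    Y = np.empty_like(Z)
    Y[0] = y0
    for n in range(t.size - 1):
        z = Z[n + 1]
        if use_jacobi:
            # (I - h D) y_{n+1} = y_n + h (A - D) z; D is diagonal, so divide componentwise
            Y[n + 1] = (Y[n] + h * (A - D) @ z) / (1.0 - h * np.diag(A))
        else:
            # Picard-Lindelof: F(t, y, z) = A z, no implicit part at all
            Y[n + 1] = Y[n] + h * A @ z
    return Y

for use_jacobi in (False, True):
    Y = np.tile(y0, (t.size, 1))   # y^(0)(t) = y0
    for k in range(30):
        Y = wr_sweep(Y, use_jacobi)
    print("Jacobi" if use_jacobi else "Picard", Y[-1])
```

Both splittings converge to the same fixed point here (the implicit Euler solution of the coupled system); the Jacobi splitting simply treats each diagonal term implicitly, which is what makes the components decouple within a sweep.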

In waveform iteration, a processor obtains the current values of the waveforms from the other processors prior to the start of the next iteration and then integrates its subsystem of equations over the integration interval independently of the progress of the other processors. Thus, stepsizes appropriate to the individual subsystems can be used.
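
A toy illustration of this multirate aspect (the two scalar subsystems, their grids and stepsizes are invented for the example): each subsystem keeps its own time grid and, before its sweep, interpolates the other subsystem's waveform onto that grid.

```python
import numpy as np

# coupled pair: u' = -50 u + v (fast), v' = u - v (slow)
t_end = 1.0
tu = np.linspace(0.0, t_end, 501)      # fine grid for the fast subsystem
tv = np.linspace(0.0, t_end, 51)       # coarse grid for the slow subsystem
hu, hv = tu[1] - tu[0], tv[1] - tv[0]
u0, v0 = 1.0, 0.0
u = np.full_like(tu, u0)               # initial waveforms: constant initial values
v = np.full_like(tv, v0)

for sweep in range(15):
    v_on_tu = np.interp(tu, tv, v)     # other processor's waveform on the local grid
    u_on_tv = np.interp(tv, tu, u)
    # the two explicit Euler integrations below are independent of each other
    u_new = np.empty_like(tu)
    u_new[0] = u0
    for n in range(tu.size - 1):
        u_new[n + 1] = u_new[n] + hu * (-50.0 * u_new[n] + v_on_tu[n])
    v_new = np.empty_like(tv)
    v_new[0] = v0
    for n in range(tv.size - 1):
        v_new[n + 1] = v_new[n] + hv * (u_on_tv[n] - v_new[n])
    u, v = u_new, v_new
print(u[-1], v[-1])
```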

The main weakness of waveform relaxation is the iteration process, which may suffer from slow convergence or even divergence. Another disadvantage is the need to store the waveforms, which adds substantially to the memory requirements.

The WR approach has also been extended to the scalar and parallel solution of PDEs.

There are a number of possibilities for the waveform relaxation iteration. In the Jacobi WR iteration each component (or subsystem) is integrated using its own new waveform together with the old waveforms of all the other components; in the Gauss-Seidel WR iteration (WRGS) the waveforms already updated in the current sweep are used as soon as they become available; SOR-type WR variants introduce an overrelaxation parameter into the Gauss-Seidel sweep.

The advantage of the Jacobi approach is that each component of the system can be solved independently in parallel, while a drawback is that a great deal of past information must be stored if m is large. Storage is not such a problem in the WRGS case, but there is no obvious decoupling of the system. The SOR methods can be parallelized by suitable orderings of the components (the so-called chequer-board ordering).
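
The following sketch (again an illustrative linear system with explicit Euler) shows the only structural difference between the two sweeps: the Gauss-Seidel version reads the components already updated in the current sweep, while the Jacobi version reads only the previous iterate and can therefore process all components in parallel.

```python
import numpy as np

A = np.array([[-2.0,  1.0,  0.0],
              [ 1.0, -3.0,  1.0],
              [ 0.0,  1.0, -2.0]])
y0 = np.array([1.0, 0.0, -1.0])
t = np.linspace(0.0, 1.0, 201)
h = t[1] - t[0]
m = y0.size

def sweep(Y_prev, gauss_seidel):
    Y = Y_prev.copy()
    for i in range(m):                  # Jacobi: this loop could run in parallel over i
        w = np.empty(t.size)
        w[0] = y0[i]
        for n in range(t.size - 1):
            # GS sees components j < i already updated in this sweep, Jacobi does not
            z = (Y[n] if gauss_seidel else Y_prev[n]).copy()
            z[i] = w[n]                 # this component's own current value
            w[n + 1] = w[n] + h * (A[i] @ z)   # explicit Euler for component i
        Y[:, i] = w
    return Y

for gs in (False, True):
    Y = np.tile(y0, (t.size, 1))
    for k in range(10):
        Y = sweep(Y, gs)
    print("Gauss-Seidel" if gs else "Jacobi", Y[-1])
```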

A different approach from the previous iterations is to linearize the problem, as in the waveform Newton approach (WRN), in which the splitting function F is

F(t, y^(k+1), y^(k)) = f(t, y^(k)) + fy(t, y^(k)) (y^(k+1) - y^(k)).

Each iteration is then a linear problem, so, for example, quadrature techniques can be used, or techniques based on the fast parallel solution of linear recurrences.
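
A minimal sketch of this linearization (the nonlinear test problem, the grid and implicit Euler as the inner solver are illustrative assumptions): every sweep solves the linear time-varying ODE y' = f(t, y^(k)) + fy(t, y^(k))(y - y^(k)), which discretizes into the linear recurrence shown in the comments and could equally be handled by parallel recurrence solvers.

```python
import numpy as np

# illustrative nonlinear problem: y1' = -y1 + y1*y2, y2' = -2*y2 - y1*y2
def f(t, y):
    return np.array([-y[0] + y[0] * y[1], -2.0 * y[1] - y[0] * y[1]])

def jac(t, y):
    # Jacobian fy(t, y) of the right-hand side above
    return np.array([[-1.0 + y[1],  y[0]],
                     [-y[1],       -2.0 - y[0]]])

y0 = np.array([1.0, 1.0])
t = np.linspace(0.0, 1.0, 201)
h = t[1] - t[0]
I = np.eye(2)

Y = np.tile(y0, (t.size, 1))                   # y^(0)(t) = y0
for k in range(8):                             # waveform Newton sweeps
    Y_new = np.empty_like(Y)
    Y_new[0] = y0
    for n in range(t.size - 1):
        # implicit Euler for y' = f(t, Y) + J(t, Y)(y - Y) with Y = y^(k)(t) known:
        #   (I - h J) y_{n+1} = y_n + h (f - J Y), a linear recurrence in n
        Jn = jac(t[n + 1], Y[n + 1])
        rhs = Y_new[n] + h * (f(t[n + 1], Y[n + 1]) - Jn @ Y[n + 1])
        Y_new[n + 1] = np.linalg.solve(I - h * Jn, rhs)
    Y = Y_new
print(Y[-1])
```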

One of the difficulties associated with the Jacobi and Gauss-Seidel schemes is their poor convergence behaviour. One way of accelerating the convergence is to use a multigrid WR approach. In particular, if a system of ODEs is derived from a parabolic PDE by semi-discretization of the spatial variables, a multigrid acceleration of WR is possible which exploits the geometry of the problem. Standard time-stepping techniques for solving parabolic PDEs cannot be parallelized or vectorized efficiently unless the problem to be solved is very large and the number of processors is small. New parallel algorithms are therefore needed for solving relatively small parabolic problems, i.e. problems with few unknowns per time level, or problems solved on large-scale parallel machines.