Parallelism across the system

Depending on the number of available processors, the system of differential equations is divided into subsystems that can be computed in parallel. In contrast to parallel methods, massive parallelism can be obtained here for very large systems of ODEs (for instance, PDEs discretized using the method of lines). The degree of parallelism depends on the number of differential equations. If the system of ODEs can be decoupled into independent subsystems, the resulting subsystems can be solved in parallel without the otherwise unavoidable communication delays, possibly with different methods.
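As a minimal sketch (with a hypothetical pair of already decoupled equations y1' = -y1 and y2' = -2 y2), two independent subsystems can be integrated concurrently, even with different methods:

```python
import math
from concurrent.futures import ThreadPoolExecutor

# Hypothetical decoupled system: two independent subsystems
#   y1' = -y1,     y1(0) = 1   -> exact solution exp(-t)
#   y2' = -2 y2,   y2(0) = 1   -> exact solution exp(-2t)
# Since f1 does not depend on y2 and vice versa, the subsystems can be
# integrated concurrently without any communication, and each may use
# a different integration method.

def euler(f, y0, t0, t1, n):
    """Explicit Euler with n steps."""
    h = (t1 - t0) / n
    t, y = t0, y0
    for _ in range(n):
        y += h * f(t, y)
        t += h
    return y

def rk2(f, y0, t0, t1, n):
    """Explicit midpoint rule (second-order Runge-Kutta) with n steps."""
    h = (t1 - t0) / n
    t, y = t0, y0
    for _ in range(n):
        k1 = f(t, y)
        y += h * f(t + h / 2, y + h / 2 * k1)
        t += h
    return y

f1 = lambda t, y: -y
f2 = lambda t, y: -2.0 * y

with ThreadPoolExecutor(max_workers=2) as pool:
    # No data exchange is needed between the two integrations.
    r1 = pool.submit(euler, f1, 1.0, 0.0, 1.0, 10000)
    r2 = pool.submit(rk2, f2, 1.0, 0.0, 1.0, 1000)
    y1, y2 = r1.result(), r2.result()

print(y1, math.exp(-1.0))  # Euler approximation vs exact value
print(y2, math.exp(-2.0))  # RK2 approximation vs exact value
```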

Parallelization in this way is normally not possible for systems of stiff differential equations, since their solution requires implicit methods. At every step of a parallel implicit method, every processor has to communicate with all other processors. However, all available parallel linear algebra software and tools can be used in the stiff case.

For parallelism across space it is useful to distinguish the following task levels at which the computation can be parallelized:

Parallel processing on a large task level: the equation segmentation method

Having available an array of processing elements, a straightforward approach to solving the set of differential equations is to partition the set and to allocate a certain part of the n equations to each of the available processing elements (PE: a processor and its local memory). Each PE is responsible for performing the function evaluations and the integration associated with its assigned equations. These arithmetic operations can occur in parallel, with the y-values necessary for succeeding function evaluations being communicated periodically between the PEs (equation segmentation method). How much time is saved over a uniprocessor procedure depends on the multiprocessor system concerned (especially the intercommunication delay at each integration step), the chosen integration algorithm, the interrelation structure of the system of equations to be solved, and the way the equations are allocated to the PEs.
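A minimal sketch of the equation segmentation method, assuming a small linear test system y' = Ay split over two PEs; the per-block loop is simulated sequentially here, but on a real multiprocessor each block would be advanced concurrently, with a communication phase between steps:

```python
# Coupled linear test system y' = A y with weak coupling between blocks
# (matrix and allocation are illustrative assumptions).
A = [[-1.0,  0.1,  0.0,  0.0],
     [ 0.1, -1.0,  0.0,  0.1],
     [ 0.0,  0.0, -1.0,  0.1],
     [ 0.1,  0.0,  0.1, -1.0]]

blocks = [[0, 1], [2, 3]]       # allocation of the 4 equations to two PEs
y = [1.0, 1.0, 1.0, 1.0]
h, steps = 0.001, 1000

for _ in range(steps):
    # Each PE evaluates f and advances only its own equations with an
    # explicit Euler step; the loop over blocks would run in parallel
    # on a real machine.
    updates = {}
    for block in blocks:
        for i in block:
            fi = sum(A[i][j] * y[j] for j in range(4))
            updates[i] = y[i] + h * fi
    # Communication phase: the new y-values are exchanged between the
    # PEs before the next round of function evaluations.
    for i, v in updates.items():
        y[i] = v

print(y)  # all components decay from 1 toward 0
```

Because the exchange happens once per step, each PE always sees consistent y-values; the communication delay of this phase is exactly the overhead mentioned above.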

Parallel processing on a medium task level

The above-described ES procedure can be augmented with a partitioning of the function evaluations themselves: compound parts of the function evaluations are distributed over the PEs involved in the ES algorithm in such a way that the PE workload is, to some extent, equally spread.
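One way to spread the workload is a greedy scheduling heuristic. The following sketch assumes hypothetical cost estimates (e.g. flop counts) for the compound parts of the function evaluations and always hands the next-largest part to the currently least-loaded PE:

```python
import heapq

# Hypothetical cost estimates for compound parts of the function
# evaluations f_1 ... f_4 (names and numbers are illustrative).
part_costs = {"f1a": 40, "f1b": 10, "f2": 35, "f3a": 25, "f3b": 20, "f4": 30}
num_pes = 3

# Longest-processing-time heuristic: process parts in decreasing cost
# order, assigning each to the PE with the smallest current load.
heap = [(0, pe, []) for pe in range(num_pes)]  # (load, PE id, assigned parts)
heapq.heapify(heap)
for name, cost in sorted(part_costs.items(), key=lambda kv: -kv[1]):
    load, pe, parts = heapq.heappop(heap)     # least-loaded PE
    parts.append(name)
    heapq.heappush(heap, (load + cost, pe, parts))

for load, pe, parts in sorted(heap, key=lambda x: x[1]):
    print(f"PE {pe}: load {load}, parts {parts}")
```

For these costs the resulting loads are 50, 55 and 55, i.e. the total work of 160 is spread to within a few percent of the ideal balance.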

Parallel processing on a small task level

The evaluation of the right-hand side of the differential equation system, i.e. the function evaluation plus the integration, is partitioned at the level of basic operators. Note that operator parallelism is exactly the concept underlying data flow computers. When solving ODEs, those integration methods are preferable which lead to arithmetic expressions that can be parallelized well, such as expressions with a scalar-product-type structure. In this respect semi-analytic integration methods might be attractive.
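As an illustration of why a scalar-product-type structure parallelizes well, a dot product can be split into partial sums that are evaluated concurrently and then combined in a single reduction (a sketch using Python threads; the chunking scheme is illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def partial_dot(a, b, lo, hi):
    """Partial scalar product over the index range [lo, hi)."""
    return sum(a[i] * b[i] for i in range(lo, hi))

def parallel_dot(a, b, num_workers=4):
    """Split the index range into chunks, evaluate the partial sums
    concurrently, then combine them in one reduction step."""
    n = len(a)
    bounds = [(k * n // num_workers, (k + 1) * n // num_workers)
              for k in range(num_workers)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = pool.map(lambda r: partial_dot(a, b, *r), bounds)
    return sum(partials)

a = [1.0] * 1000
b = [2.0] * 1000
print(parallel_dot(a, b))  # 2000.0
```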

Multirate methods

The basic idea of multirate methods is the decomposition of the system with respect to different kinds of variation, in particular for stiff systems (with different time scales). For example, the system is partitioned into two coupled subsystems y1' = f1(t,y1,y2), y2' = f2(t,y1,y2), where y1 in R^l comprises the rapidly varying components of y and y2 in R^(m-l) the slowly varying components. Multirate methods numerically integrate such a partitioned system with different step sizes, adapted to the rapidity of variation of the respective solution components. The decoupled computations for the subsystems are synchronized in such a way that the largest step size is an integral multiple of all smaller step sizes (for instance H = qh). All computations can be structured into compound steps, each of which involves one step for the components with the largest step size H and the respective number of steps needed for the more rapidly varying components to proceed from Ti to Ti + H (q steps).
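A compound step can be sketched for a hypothetical fast/slow pair y1' = -50(y1 - y2), y2' = -y2, using a macro step H for the slow component and q micro steps of size h = H/q for the fast one; the slow values needed at intermediate times are obtained here by linear interpolation (model problem, step sizes and interpolation choice are illustrative):

```python
# Minimal multirate sketch for the assumed model problem
#   y1' = -50 (y1 - y2)   (fast component)
#   y2' = -y2             (slow component)
H, q = 0.01, 10          # macro step H, micro step h = H / q
h = H / q
y1, y2, t = 0.0, 1.0, 0.0

for _ in range(100):     # 100 compound steps: integrate to t = 1
    # One macro step for the slow component (explicit Euler).
    y2_new = y2 + H * (-y2)
    # q micro steps for the fast component; the slow value at the
    # intermediate times is obtained by linear interpolation between
    # the old and the new macro-step value.
    for k in range(q):
        theta = (k * h) / H
        y2_mid = (1 - theta) * y2 + theta * y2_new
        y1 += h * (-50.0) * (y1 - y2_mid)
    y2 = y2_new
    t += H

print(y1, y2)  # the fast component relaxes onto the slow one
```

Note that within one compound step the two integrations are only loosely coupled through the interpolated values, which is precisely what makes the subsystems attractive for concurrent execution.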

There are two strategies to integrate the complete system y' = f(t,y): in the fastest-first strategy the rapidly varying components are integrated first, the slowly varying components being obtained by extrapolation; in the slowest-first strategy the slowly varying components are integrated first, the values needed by the fast components then being obtained by interpolation.

The generalization of these strategies to a larger number of subsystems of y' = f(t,y) is straightforward.

For problems y' = f(t,y) with a loose coupling between the subsystems, multirate formulas are more efficient, even on uniprocessor machines, than the respective conventional method which treats the system as a whole. On parallel computers all extrapolation steps are performed simultaneously, and subsequently the subsystems are integrated concurrently on different processors.