As I’m quite sure every OrcaFlex user knows by now, OrcaFlex 9.0 introduced an implicit time domain solver in addition to the existing explicit solver. This change led to huge improvements in run times and thus prompted us to award the release ‘major upgrade status’ and promote it from version 8 to version 9!
In August 2007 we released OrcaFlex 9.1, which introduced many further improvements to the implicit solver; version 9.1 is now significantly faster than 9.0. The purpose of this blog post is to discuss some of the improvements in version 9.1, to present a run time comparison with Flexcom, and to discuss some future developments that we plan in order to make OrcaFlex faster still.
Integration schemes and dissipation
In version 9.0 we used the Newmark integration scheme with γ = ½ and β = ¼. This choice of parameters is known as the constant average acceleration (CAA) scheme.
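For reference, the Newmark update can be written in its usual textbook form as below. The symbols are my own notation rather than anything taken from the OrcaFlex documentation: u, v and a are displacement, velocity and acceleration, h is the time step and n is the step index.

```latex
\begin{aligned}
u_{n+1} &= u_n + h\,v_n + h^2\left[\left(\tfrac{1}{2}-\beta\right)a_n + \beta\,a_{n+1}\right],\\
v_{n+1} &= v_n + h\left[(1-\gamma)\,a_n + \gamma\,a_{n+1}\right].
\end{aligned}
% With \gamma = 1/2 and \beta = 1/4 (constant average acceleration):
\begin{aligned}
u_{n+1} &= u_n + h\,v_n + \tfrac{h^2}{4}\left(a_n + a_{n+1}\right),\\
v_{n+1} &= v_n + \tfrac{h}{2}\left(a_n + a_{n+1}\right).
\end{aligned}
```

The second pair of equations shows where the name comes from: over each step the acceleration is replaced by the average of its values at the start and end of the step.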
Compared to OrcaFlex’s explicit integration scheme, this implicit scheme resulted in huge improvements in run times. However, we weren’t completely happy with the performance.
The principal cause for concern was the fact that some models would only run to completion with relatively small time steps. We implemented a variable time step algorithm which allowed OrcaFlex to use longer time steps if possible. For the time frame of the 9.0 release we were simply unable to investigate and implement a better solution.
After releasing version 9.0 we then had time to study the problem in more depth. The key is a feature present in most integration schemes (but not Newmark CAA) called numerical dissipation. This dissipation is designed to damp out spurious responses in non-physical high-frequency modes of the numerical model. Such spurious high-frequency responses are inevitable in structural finite element programs.
There are versions of the Newmark scheme which include numerical dissipation, but these are known to over-damp responses at lower frequencies. In other words they can affect the signal as well as removing the noise. Other commonly used integration schemes which include dissipation are:
- The Wilson-θ method. This has very good dissipation of the spurious high-frequency response but unfortunately over-damps the lower frequency modes.
- The α method of Hilber, Hughes and Taylor (HHT).
- The Generalised-α method of Chung and Hulbert.
For OrcaFlex 9.1 we chose to use the Generalised-α scheme. It has some rather obscure technical advantages over the HHT method, but in reality there is little to choose between Generalised-α and HHT in terms of run time and accuracy.
The Generalised-α scheme results in much better and more stable convergence of the implicit iteration. The net result is simulations that are much faster even than those of OrcaFlex 9.0.
The OrcaFlex “High frequency dissipation” parameter
OrcaFlex includes a data item called “High frequency dissipation”. This is the parameter denoted ρ∞ in Chung and Hulbert’s paper. It controls how much numerical high frequency dissipation is provided by the Generalised-α integration scheme.
The high frequency dissipation must take a value between 0 and 1. Perhaps counter-intuitively, larger values correspond to lower levels of dissipation. A value of 1 gives no dissipation and a value of 0 gives asymptotic annihilation, whereby high frequency response is annihilated after one time step.
The default value is 0.4 which has been chosen to give fast simulation run times without compromising accuracy. We have yet to come across an OrcaFlex model for which it is better to use some other value. So it is likely that in a future release of the program we will remove this data item and use an in-built value of 0.4.
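To make the effect of this parameter concrete, here is a minimal sketch in Python. It is not OrcaFlex code and says nothing about how OrcaFlex implements the scheme internally. It assumes the parameter relations published by Chung and Hulbert, and applies them to a single undamped oscillator u'' + ω²u = 0. The function names, the test oscillator and the energy measure are all my own choices for illustration.

```python
def generalised_alpha_parameters(rho_inf):
    """Map rho_inf (0..1) to (alpha_m, alpha_f, gamma, beta).

    Relations as given by Chung and Hulbert; assumed here, not taken
    from OrcaFlex itself.
    """
    if not 0.0 <= rho_inf <= 1.0:
        raise ValueError("rho_inf must lie between 0 and 1")
    alpha_m = (2.0 * rho_inf - 1.0) / (rho_inf + 1.0)
    alpha_f = rho_inf / (rho_inf + 1.0)
    gamma = 0.5 - alpha_m + alpha_f
    beta = 0.25 * (1.0 - alpha_m + alpha_f) ** 2
    return alpha_m, alpha_f, gamma, beta


def energy_ratio(rho_inf, omega, h, steps):
    """Integrate u'' + omega^2 u = 0 and return final energy / initial energy."""
    alpha_m, alpha_f, gamma, beta = generalised_alpha_parameters(rho_inf)
    u, v = 1.0, 0.0
    a = -omega**2 * u  # initial acceleration consistent with the equation of motion
    e0 = 0.5 * v**2 + 0.5 * omega**2 * u**2
    for _ in range(steps):
        # Part of the Newmark displacement update that does not involve a_{n+1}
        u_pred = u + h * v + h**2 * (0.5 - beta) * a
        # Equation of motion balanced at the alpha-weighted intermediate point,
        # rearranged to give the new acceleration directly (linear problem)
        lhs = (1.0 - alpha_m) + omega**2 * (1.0 - alpha_f) * beta * h**2
        rhs = -alpha_m * a - omega**2 * ((1.0 - alpha_f) * u_pred + alpha_f * u)
        a_new = rhs / lhs
        u_new = u_pred + beta * h**2 * a_new
        v_new = v + h * ((1.0 - gamma) * a + gamma * a_new)
        u, v, a = u_new, v_new, a_new
    return (0.5 * v**2 + 0.5 * omega**2 * u**2) / e0


if __name__ == "__main__":
    h, steps = 1.0, 50
    for rho_inf in (1.0, 0.4):
        low = energy_ratio(rho_inf, omega=0.1, h=h, steps=steps)     # well resolved mode
        high = energy_ratio(rho_inf, omega=100.0, h=h, steps=steps)  # poorly resolved mode
        print(f"rho_inf={rho_inf}: low-frequency energy ratio {low:.4f}, "
              f"high-frequency energy ratio {high:.3e}")
```

With a value of 1 the parameters reduce to γ = ½ and β = ¼ and, for this linear test problem, the energy of both modes is preserved. With 0.4 the poorly resolved high frequency mode is removed within a few steps, while the well resolved low frequency mode is barely affected.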
Variable and constant time step schemes
The other problem with the Newmark implicit solver of OrcaFlex 9.0 is due to its use of a variable time step algorithm.
Variable time step schemes can introduce high frequency noise into a system which in turn can lead to inaccurate results, for example noisy time histories, non-physical spikes in results etc. This is a feature of all variable time step algorithms and not something particular to OrcaFlex. For the majority of systems no problems arise when using a variable time step. However, if you are using variable time steps then we do recommend that you check the quality of your results.
For OrcaFlex 9.0 we had no choice but to use a variable time step scheme, because we had no numerical dissipation built in to the integration scheme. Changing to use the Generalised-α scheme allowed us to offer a constant time step option in OrcaFlex. This is now the default setting and you should use a constant time step whenever possible.
Run Times
After all that theory, what difference do these changes actually make in terms of run times? Well, that’s an impossible question to answer comprehensively because the answer varies for different models. However, I can say that the majority of cases are many times faster in 9.1 than in 9.0, and we have not seen any case that is slower in 9.1 than in 9.0.
We have performed a comprehensive timing comparison for the deepwater SCR case as described in our validation document 99/101. This is a model of a 12″ SCR in 1800m water depth. Results of various analyses are compared against Flexcom, and extremely close agreement is achieved for all cases considered.
For the timing comparison we considered three environmental variants on the basic model as follows:
Case | Wave | Hs (m) | Tz (s) |
---|---|---|---|
Extreme | ISSC | 15 | 11 |
Fatigue#1 | ISSC | 1 | 4 |
Fatigue#2 | ISSC | 5 | 8 |
These were run in OrcaFlex 9.1 and Flexcom 7.3. The models had identical spatial discretisation and a variety of time steps were used. Here we present results for time steps 0.1s and 0.25s.
Time step (s) | Case | OrcaFlex runtime (mins) | Flexcom runtime (mins) |
---|---|---|---|
0.1 | Extreme | 47 | 202 |
0.1 | Fatigue#1 | 37 | 182 |
0.1 | Fatigue#2 | 37 | 191 |
0.25 | Extreme | 26 | 79 |
0.25 | Fatigue#1 | 15 | 73 |
0.25 | Fatigue#2 | 16 | 77 |
For this model OrcaFlex is between 3 and 6 times faster than Flexcom, depending on the time step and case considered. Although we have only varied the time step and the wave, we would expect run times to be sensitive to other model parameters. So it is impossible to make any general statement comparing the run times of the two programs. However, it is clear from this that the simulation run times for OrcaFlex 9.1 are extremely competitive.
These comparisons were performed by an independent consultant with access to both OrcaFlex and Flexcom.
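The speed-up factors quoted above follow directly from the run time table. As a quick check, the short Python sketch below divides each Flexcom run time by the corresponding OrcaFlex run time. The numbers are copied from the table; the variable and function names are just for illustration.

```python
# Run times in minutes, keyed by (time step in seconds, case name).
# Each value is (OrcaFlex run time, Flexcom run time), copied from the table.
RUNTIMES_MINS = {
    (0.1, "Extreme"): (47, 202),
    (0.1, "Fatigue#1"): (37, 182),
    (0.1, "Fatigue#2"): (37, 191),
    (0.25, "Extreme"): (26, 79),
    (0.25, "Fatigue#1"): (15, 73),
    (0.25, "Fatigue#2"): (16, 77),
}


def speed_ups(runtimes):
    """Return Flexcom run time divided by OrcaFlex run time for each entry."""
    return {key: flexcom / orcaflex for key, (orcaflex, flexcom) in runtimes.items()}


if __name__ == "__main__":
    ratios = speed_ups(RUNTIMES_MINS)
    for (time_step, case), ratio in ratios.items():
        print(f"{time_step}s {case}: {ratio:.1f} times faster")
    print(f"Range: {min(ratios.values()):.1f} to {max(ratios.values()):.1f}")
```

The smallest ratio comes from the Extreme case at 0.25s and the largest from Fatigue#2 at 0.1s.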
Future improvements
We have recently made various improvements to the linear solvers that the implicit solver uses for each iteration of a time step. These changes, to be released as part of OrcaFlex 9.2, yield improvements of between 10% and 20% on the run times of 9.1.
Another area where we can see room for improvement is in the multi-threading performance of OrcaFlex. By multi-threading we mean the capability of processing a single job in parallel on each processor core of a computer.
OrcaFlex has been able to do this since version 8.7. However the facility is limited by the fact that to achieve best speed-ups you need at least as many OrcaFlex Lines in your model as processor cores. So as computers with more and more cores become commonplace, there will be more cases where the hardware is not being used as effectively as possible. In addition the speed-ups are not as significant when using implicit as when using explicit.
This is an area which we need to improve as multi-core machines are now becoming widespread. I hope that we can make significant improvements to our multi-threading capability over the next 18 months.
Of course, it’s quite common to have a lot more load cases than processor cores. In this scenario you can run multiple copies of OrcaFlex and process the load cases in parallel that way. This becomes difficult to manage if you have a large number of processor cores available and so one option is to use Distributed OrcaFlex (DOF).
As a simpler alternative to DOF, we are currently changing the batch facility in OrcaFlex to process multiple simulations in parallel, one per processor core. We expect to release this in version 9.2. This form of parallel processing (one simulation per core) allows pretty much optimal speed-up, i.e. n processor cores gives an n times speed-up.
In a similar vein we have also recently revamped the fatigue calculation to make use of multiple cores. This also yields close to optimal speed-up.
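To put a rough number on "close to optimal", here is a small idealised estimate in Python. It is not a description of the OrcaFlex batch facility. It assumes that every job takes the same time, that each core runs exactly one job at a time, and that there is no overhead; the function names are hypothetical.

```python
import math


def ideal_batch_wall_time(n_jobs, n_cores, minutes_per_job):
    """Wall-clock time when jobs are run in rounds of at most n_cores at a time."""
    if n_jobs < 0 or n_cores < 1:
        raise ValueError("need n_jobs >= 0 and n_cores >= 1")
    rounds = math.ceil(n_jobs / n_cores)
    return rounds * minutes_per_job


def ideal_speed_up(n_jobs, n_cores):
    """Speed-up relative to running the same jobs one after another on one core."""
    if n_jobs < 1 or n_cores < 1:
        raise ValueError("need n_jobs >= 1 and n_cores >= 1")
    return n_jobs / math.ceil(n_jobs / n_cores)


if __name__ == "__main__":
    for n_jobs in (8, 5):
        print(f"{n_jobs} jobs on 4 cores: speed-up {ideal_speed_up(n_jobs, 4)}")
```

Under these assumptions the full n times speed-up is reached when the number of jobs is a multiple of the number of cores. Otherwise some cores sit idle during the last round, so 5 equal jobs on 4 cores give a speed-up of 2.5 rather than 4.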
Finally, we are also planning to implement a frequency domain option and possibly a linear time domain option (OrcaFlex is currently fully non-linear). These obviously improve run time at the expense of accuracy. At the moment we anticipate that these options would be available as part of the standard OrcaFlex package — in other words we don’t expect to charge extra for frequency domain or linear time domain capabilities.
As always we would encourage and welcome any feedback you have on this topic. So please do e-mail us or post comments on the blog.