OrcaFlex 9.1, the fastest OrcaFlex yet!
Wednesday, January 23rd, 2008 by David Heffernan
As I’m quite sure every OrcaFlex user knows by now, OrcaFlex 9.0 introduced an implicit time domain solver in addition to the existing explicit solver. This change led to huge improvements in run times and thus prompted us to award the release ‘major upgrade status’ and promote it from version 8 to version 9!
In August 2007 we released OrcaFlex 9.1, which introduced many further improvements to the implicit solver; version 9.1 is now significantly faster than 9.0. The purpose of this blog post is to discuss some of the improvements in version 9.1, present a run time comparison with Flexcom, and outline some future developments that we plan in order to make OrcaFlex faster still.
Integration schemes and dissipation
In version 9.0 we used the Newmark integration scheme with γ = ½ and β = ¼. This choice of parameters is known as the constant average acceleration (CAA) scheme.
Compared to OrcaFlex’s explicit integration scheme this resulted in huge improvements in run times. However, we weren’t completely happy with the performance.
The principal cause for concern was the fact that some models would only run to completion with relatively small time steps. We implemented a variable time step algorithm which allowed OrcaFlex to use longer time steps if possible. For the time frame of the 9.0 release we were simply unable to investigate and implement a better solution.
After releasing version 9.0 we then had time to study the problem in more depth. The key is a feature present in most integration schemes (but not Newmark CAA) called numerical dissipation. This dissipation is designed to damp out spurious responses in non-physical high-frequency modes of the numerical model. Such spurious high-frequency responses are inevitable in structural finite element programs.
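To make the absence of dissipation in Newmark CAA concrete, here is a minimal sketch for a linear single-degree-of-freedom oscillator. It is a textbook form of the Newmark update and not OrcaFlex's implementation; the function names and the mass, stiffness and time step values are illustrative assumptions.

```python
def newmark_step(x, v, a, h, m, c, k, f_next, gamma=0.5, beta=0.25):
    """Advance m*x'' + c*x' + k*x = f by one step of size h (linear SDOF).

    The defaults gamma = 1/2, beta = 1/4 give constant average acceleration.
    """
    # Predictors: the parts of the Newmark update that use only known values.
    x_pred = x + h * v + h * h * (0.5 - beta) * a
    v_pred = v + h * (1.0 - gamma) * a
    # Enforce equilibrium at the end of the step to get the new acceleration.
    a_next = (f_next - c * v_pred - k * x_pred) / (m + gamma * h * c + beta * h * h * k)
    # Correctors: add the contribution of the new acceleration.
    x_next = x_pred + beta * h * h * a_next
    v_next = v_pred + gamma * h * a_next
    return x_next, v_next, a_next


def energy(x, v, m, k):
    """Kinetic plus strain energy of the oscillator."""
    return 0.5 * m * v * v + 0.5 * k * x * x


def free_vibration_energy_ratio(steps=1000, h=0.1, m=1.0, k=4.0):
    """Final energy divided by initial energy for undamped free vibration."""
    x, v = 1.0, 0.0
    a = -k * x / m  # initial acceleration from equilibrium
    e0 = energy(x, v, m, k)
    for _ in range(steps):
        x, v, a = newmark_step(x, v, a, h, m, 0.0, k, 0.0)
    return energy(x, v, m, k) / e0
```

For this undamped linear case the energy ratio stays at 1 to within rounding error, whatever the frequency of the mode. That is accurate for the physical modes, but it also means a spurious high-frequency mode, once excited, is never damped out by the scheme itself.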
There are versions of the Newmark scheme which include numerical dissipation, but these are known to over-damp responses at lower frequencies. In other words they can affect the signal as well as removing the noise. Other commonly used integration schemes which include dissipation are:
- The Wilson-θ method. This has very good dissipation of the spurious high-frequency response but unfortunately over-damps the lower frequency modes.
- The α method of Hilber, Hughes and Taylor (HHT).
- The Generalised-α method of Chung and Hulbert.
For OrcaFlex 9.1 we chose to use the Generalised-α scheme. It has some rather obscure technical advantages over the HHT method, but in reality there is little to choose between Generalised-α and HHT in terms of run time and accuracy.
The Generalised-α scheme results in much better and more stable convergence of the implicit iteration. The net result is simulations that are much faster even than in OrcaFlex 9.0.
The OrcaFlex “High frequency dissipation” parameter
OrcaFlex includes a data item called “High frequency dissipation”. This is the parameter denoted ρ∞ in Chung and Hulbert’s paper. It controls how much numerical high frequency dissipation is provided by the Generalised-α integration scheme.
The high frequency dissipation must take a value between 0 and 1. Perhaps counter-intuitively, larger values correspond to lower levels of dissipation. A value of 1 gives no dissipation and a value of 0 gives asymptotic annihilation, whereby high frequency response is annihilated after one time step.
The default value is 0.4 which has been chosen to give fast simulation run times without compromising accuracy. We have yet to come across an OrcaFlex model for which it is better to use some other value. So it is likely that in a future release of the program we will remove this data item and use an in-built value of 0.4.
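For readers who want to see how ρ∞ feeds into the scheme, the sketch below evaluates the usual relations from Chung and Hulbert's formulation for second-order accuracy and optimal high-frequency dissipation. The function name is hypothetical, and whether OrcaFlex computes its coefficients in exactly this form is an assumption.

```python
def generalised_alpha_parameters(rho_inf):
    """Return (alpha_m, alpha_f, gamma, beta) for a given rho_inf in [0, 1]."""
    if not 0.0 <= rho_inf <= 1.0:
        raise ValueError("rho_inf must lie between 0 and 1")
    # Weights applied to the inertia and to the internal/external force terms.
    alpha_m = (2.0 * rho_inf - 1.0) / (rho_inf + 1.0)
    alpha_f = rho_inf / (rho_inf + 1.0)
    # Newmark constants chosen for second-order accuracy and
    # maximal high-frequency dissipation.
    gamma = 0.5 - alpha_m + alpha_f
    beta = 0.25 * (1.0 - alpha_m + alpha_f) ** 2
    return alpha_m, alpha_f, gamma, beta


if __name__ == "__main__":
    for rho in (1.0, 0.4, 0.0):
        print(rho, generalised_alpha_parameters(rho))
```

With ρ∞ = 1 the Newmark constants come out as γ = ½ and β = ¼, the same values as the CAA scheme, which is consistent with that setting giving no dissipation. With ρ∞ = 0 they come out as γ = 1.5 and β = 1, the asymptotic annihilation case.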
Variable and constant time step schemes
The other problem with the Newmark implicit solver of OrcaFlex 9.0 is due to its use of a variable time step algorithm.
Variable time step schemes can introduce high frequency noise into a system which in turn can lead to inaccurate results, for example noisy time histories, non-physical spikes in results etc. This is a feature of all variable time step algorithms and not something particular to OrcaFlex. For the majority of systems no problems arise when using a variable time step. However, if you are using variable time steps then we do recommend that you check the quality of your results.
For OrcaFlex 9.0 we had no choice but to use a variable time step scheme, because we had no numerical dissipation built in to the integration scheme. Changing to use the Generalised-α scheme allowed us to offer a constant time step option in OrcaFlex. This is now the default setting and you should use a constant time step whenever possible.
Run Times
After all that theory, what difference do these changes actually make to run times? Well, that’s an impossible question to answer comprehensively because the answer varies for different models. However I can say that the majority of cases are many times faster in 9.1 than in 9.0, and we have not seen any case that is slower in 9.1 than in 9.0.
We have performed a comprehensive timing comparison for the deepwater SCR case as described in our validation document 99/101. This is a model of a 12″ SCR in 1800m water depth. Results of various analyses are compared against Flexcom, and extremely close agreement is achieved for all cases considered.
For the timing comparison we considered three environmental variants on the basic model as follows:
| Case | Wave | Hs (m) | Tz (s) |
|---|---|---|---|
| Extreme | ISSC | 15 | 11 |
| Fatigue#1 | ISSC | 1 | 4 |
| Fatigue#2 | ISSC | 5 | 8 |
These were run in OrcaFlex 9.1 and Flexcom 7.3. The models had identical spatial discretisation and a variety of time steps were used. Here we present results for time steps 0.1s and 0.25s.
| Time step (s) | Case | OrcaFlex runtime (mins) | Flexcom runtime (mins) |
|---|---|---|---|
| 0.1 | Extreme | 47 | 202 |
| 0.1 | Fatigue#1 | 37 | 182 |
| 0.1 | Fatigue#2 | 37 | 191 |
| 0.25 | Extreme | 26 | 79 |
| 0.25 | Fatigue#1 | 15 | 73 |
| 0.25 | Fatigue#2 | 16 | 77 |
For this model OrcaFlex is between 3 and 6 times faster than Flexcom, depending on the time step and case considered. Although we have only varied the time step and the wave, we would expect run times to be sensitive to other model parameters. So it is impossible to make any general statement comparing the run times of the two programs. However, it is clear from this that the simulation run times for OrcaFlex 9.1 are extremely competitive.
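The speed-up factors can be reproduced directly from the quoted run times. The short sketch below simply divides the Flexcom run time by the OrcaFlex run time for each case; the dictionary layout and function name are illustrative choices.

```python
# Run times in minutes, keyed by (time step in seconds, case name),
# stored as (OrcaFlex, Flexcom) pairs taken from the timing table.
RUNTIMES_MIN = {
    (0.1, "Extreme"): (47, 202),
    (0.1, "Fatigue#1"): (37, 182),
    (0.1, "Fatigue#2"): (37, 191),
    (0.25, "Extreme"): (26, 79),
    (0.25, "Fatigue#1"): (15, 73),
    (0.25, "Fatigue#2"): (16, 77),
}


def speed_ups(runtimes):
    """Return Flexcom run time divided by OrcaFlex run time for each case."""
    return {key: flexcom / orcaflex for key, (orcaflex, flexcom) in runtimes.items()}


if __name__ == "__main__":
    for (step, case), ratio in speed_ups(RUNTIMES_MIN).items():
        print(f"{step:>5} s  {case:<10} {ratio:.1f}x")
```

The ratios range from about 3.0 (Extreme at 0.25s) to about 5.2 (Fatigue#2 at 0.1s).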
These comparisons were performed by an independent consultant with access to both OrcaFlex and Flexcom.
Future improvements
We have recently made various improvements to the linear solvers that the implicit solver uses for each iteration of a time step. These changes, to be released as part of OrcaFlex 9.2, yield between 10% and 20% improvements on the run times of 9.1.
Another area where we can see room for improvement is in the multi-threading performance of OrcaFlex. By multi-threading we mean the capability of processing a single job in parallel on each processor core of a computer.
OrcaFlex has been able to do this since version 8.7. However the facility is limited by the fact that to achieve best speed-ups you need at least as many OrcaFlex Lines in your model as processor cores. So as computers with more and more cores become commonplace, there will be more cases where the hardware is not being used as effectively as possible. In addition the speed-ups are not as significant when using implicit as when using explicit.
This is an area which we need to improve as multi-core machines are now becoming widespread. I hope that we can make significant improvements to our multi-threading capability over the next 18 months.
Of course, it’s quite common to have a lot more load cases than processor cores. In this scenario you can run multiple copies of OrcaFlex and process the load cases in parallel that way. This becomes difficult to manage if you have a large number of processor cores available and so one option is to use Distributed OrcaFlex (DOF).
As a simpler alternative to DOF, we are currently changing the batch facility in OrcaFlex to process multiple simulations in parallel, one per processor core. We expect to release this in version 9.2. This form of parallel processing (one simulation per core) allows pretty much optimal speed-up, i.e. n processor cores gives an n times speed-up.
In a similar vein we have recently revamped the fatigue calculation also to make use of multiple cores. This also yields close to optimal speed-up.
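The "n cores gives an n times speed-up" estimate for one simulation per core can be checked with some simple arithmetic. The sketch below assumes every load case takes the same time and ignores any scheduling or disk overhead; both are simplifying assumptions, and the function names are hypothetical.

```python
import math


def batch_wall_time(n_cases, n_cores, minutes_per_case):
    """Wall-clock minutes to run n_cases, one simulation per core at a time."""
    if n_cases < 0 or n_cores < 1:
        raise ValueError("need n_cases >= 0 and n_cores >= 1")
    # Cases are processed in rounds of up to n_cores simulations each.
    rounds = math.ceil(n_cases / n_cores)
    return rounds * minutes_per_case


def batch_speed_up(n_cases, n_cores):
    """Speed-up relative to running the same cases one after another on one core."""
    if n_cases < 1 or n_cores < 1:
        raise ValueError("need n_cases >= 1 and n_cores >= 1")
    return n_cases / math.ceil(n_cases / n_cores)


if __name__ == "__main__":
    # Illustrative numbers only: 20 cases of 37 minutes each.
    for cores in (1, 2, 8):
        print(cores, batch_wall_time(20, cores, 37), round(batch_speed_up(20, cores), 2))
```

Under these assumptions the speed-up is exactly n when the number of cases is a multiple of the number of cores, and slightly less otherwise because some cores sit idle in the final round.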
Finally, we are also planning to implement a frequency domain option and possibly a linear time domain option (OrcaFlex is currently fully non-linear). These obviously improve run time at the expense of accuracy. At the moment we anticipate that these options would be available as part of the standard OrcaFlex package — in other words we don’t expect to charge extra for frequency domain or linear time domain capabilities.
As always we would encourage and welcome any feedback you have on this topic. So please do e-mail us or post comments on the blog.
January 23rd, 2008 at 16:18
I am very thankful for the speed increase that has been achieved with Version 9.1. However, an area where a big speed increase would be welcome is in post-processing. It would be great if it were possible to distribute the gathering of results over multiple cores, whether on one machine or over multiple machines. I am currently post-processing 8 load cases from a flowline analysis and it takes about a day to get all the results. This is all done using one core, while I have 32 cores available. I know that I can break up the post-processing of different runs over different files, but that requires additional time and effort.
January 23rd, 2008 at 19:43
Caspar,
That’s a very interesting suggestion.
Within the post-processing spreadsheet, which runs as a VBA macro, multi-threading is impossible (VBA doesn’t support it). Potentially we could multi-thread the results extraction in our DLL but I’m sure that would not offer significant or scalable benefits.
The right approach is, as you suggest, to process each simulation file in a separate thread (perhaps even distributed across several workstations).
We’ll have to think about this one because there is no easy solution to offer you.
In the meantime I would be interested in looking at your spreadsheet (and of course a typical dat file to run it with) to see if there’s any obvious way we can optimise what we currently have.
Cheers, David.
October 20th, 2008 at 04:18
I just want to clarify: if I’m running 20 cases of a model with two lines in it, and have only a two-core machine (no DOF etc.), I can run:
a) a batch script with all 20 cases, running one at a time, multi-threading,
b) two OrcaFlex windows, each running 10 of the cases,
c) one OrcaFlex window, running two at a time in parallel (cores set to 2).
I understood b) and c) to be as fast as each other, but from the recent user group meeting I think option a) is slower.
Does option b) multi thread in both windows or use one core for each?
Which (if any) of these cases are equivalent?
October 20th, 2008 at 08:38
Phil,
Your options b) and c) will run at roughly the same speed. If anything I would expect c) to be slightly faster. You are right that option a) will be around twice as slow as the others.
Option b) does multi-thread in both windows so there will be 4 threads competing for resources on your dual core machine. This is why option c) is a little more efficient.
The best way to handle multiple load cases now is:
1. Upgrade to version 9.2 (9.2b is better than 9.2a).
2. Make your batch script use the “SaveData” command rather than “Run” so that it simply generates a .dat file for each load case.
3. Add each .dat file to the batch window and run.
Of course if you can get a quad core, or even a dual quad core (=8 cores) then your load cases will be processed even faster.
Hope this is all clear.
January 6th, 2009 at 12:04
I am an OrcaFlex user at Technip and am really impressed by the speed of the new 9.1 implicit integration method.
It has helped me a lot in speeding up my analyses. However, it seems there is still room for improvement to increase the speed further.
The fixed time step implicit scheme is not stable for some rapidly changing systems, and the variable time step scheme also gets stuck switching frequently between two time step values, which itself slows down the process.
The last analysis I did, modelling the complete spooling of a reel lay, took me more than 2 months on a super fast 2-core processor!
Good luck to all the Orcina team; I hope the new year brings good stuff to users.
January 6th, 2009 at 20:11
VNikzad,
Thanks for your comment. Of course you can now use 9.2 which is just a little faster still than 9.1.
You mention that the implicit time step tends to go unstable when you have rapid changes, impulses, high frequency responses etc. Such systems do generally need shorter time steps and as far as I am aware there is no getting around that. Variable time step schemes can help by using longer time steps when possible.
It sounds like you have an interesting model, and if you would like us to take a look at it we’d be more than happy to try to speed it up, either by making modelling changes or by trying possible changes to the software.