Results: Extreme value statistics results

There is often a requirement to predict the extreme responses of a system, for example to determine the likelihood of a load exceeding a critical value that may lead to failure. Such values are needed when using standards such as DNV OS F201 and API RP 2SK.

OrcaFlex can estimate extreme values for any given time domain result by analysing the simulated time history of the variable using extreme value statistical methods. You may, for instance, perform a mooring analysis in an irregular sea-state and then estimate the maximum mooring line tension for a 3-hour storm.

The statistical theory for this estimation is well-established and is described in the theory section. The procedure is essentially this:

You select the statistical distribution to be used to model the distribution of extremes.
OrcaFlex estimates the distribution model parameters that best fit the simulation time history of the variable.
OrcaFlex uses the fitted distribution to estimate and report the required extreme statistic (e.g. return level), for a specified period of exposure.
OrcaFlex provides diagnostic graphs that you should use to judge the reliability of the results.

The extreme value statistics results form is designed to lead you through this process.

When you open the extreme value statistics results form for a selected result, you first see the data page, where you select the distribution. Moving to either of the other pages (results or diagnostic graphs) causes OrcaFlex to carry out the estimation part of the procedure. The diagnostic graphs help you test the model, and the results page reports the estimated statistics, such as the return level for the specified period and the estimation uncertainty in that value.

Data

For convenience, the time history result graph is reproduced on the data page. The data required for the fitting of the model are entered on this page, and are as follows.

Distributions

These fall into two groups, according to the statistical method with which they are applied. For details see the extreme value statistics theory section.

Rayleigh distribution. This method assumes that the variable is a stationary Gaussian process. This is perhaps a reasonable assumption for waves, particularly in deep water, and for responses which are approximately linear with respect to wave height. However, for many other variables of interest, the Gaussian assumption is invalid and leads to poor estimates of extreme values.
Weibull and generalised Pareto (GPD) distributions. These distributions are both fitted using the maximum likelihood method. Historically, the Weibull distribution has often been used for marine systems, but the generalised Pareto is preferred by the extreme value statistics community because of its sound mathematical foundations.

Extremes to analyse

Specifies whether maxima (upper tail) or minima (lower tail) are to be analysed.

Threshold and decluster period

These data are required only when using the Weibull and GPD distributions. These distributions are fitted to the extremes of the time history, and those extremes are selected using the peaks-over-threshold method with optional declustering.

The threshold controls the peaks-over-threshold method. This allows you to control the extent to which the analysis is based on only the extreme values in the data (the tail of the distribution).

The decluster period controls the declustering. This helps avoid or reduce any statistical dependence between the extreme data values used in the analysis. It can be set to one of the following:

Zero, in which case no declustering will be done, and all values above the specified threshold will be included. This is generally not recommended since the values are unlikely to be independent.
A positive value. In this case OrcaFlex will break the sequence of time history values into clusters of successive values that stay above the threshold. It will then decluster by merging successive clusters that are separated by periods (during which the variable is less than the threshold) that last no longer than the specified decluster period. The most extreme value of each of the resulting merged clusters will then be included in the analysis.
'~'. This special value may be used to tell OrcaFlex to take the clusters to be the groups of values between successive up-crossings of the mean value (or down-crossings if analysing lower tail). The most extreme value of each such cluster will then be included in the analysis, but ignoring any that do not exceed the threshold.
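The three declustering options can be sketched in Python as follows. This is a minimal illustration that assumes a time history sampled at a constant interval `dt` and analyses the upper tail only. The function name `select_extremes` is hypothetical and is not part of any OrcaFlex interface. Details such as how the gap between clusters is measured, or how the partial cluster before the first mean up-crossing is treated, are assumptions that may differ from OrcaFlex.

```python
from typing import List, Sequence, Tuple, Union


def select_extremes(values: Sequence[float], dt: float, threshold: float,
                    decluster: Union[float, str]) -> List[float]:
    """Return the upper-tail values that would be fitted (hypothetical helper).

    values: time history samples at a constant sample interval dt
    threshold: peaks-over-threshold level
    decluster: 0 (no declustering), a positive period in the same time
               units as dt, or '~' (clusters between mean up-crossings)
    """
    n = len(values)

    if decluster == '~':
        # Split the record at every up-crossing of the mean value, take the
        # maximum of each piece and keep only maxima above the threshold.
        mean = sum(values) / n
        starts = [0] + [i for i in range(1, n)
                        if values[i - 1] <= mean < values[i]]
        ends = starts[1:] + [n]
        maxima = [max(values[s:e]) for s, e in zip(starts, ends)]
        return [m for m in maxima if m > threshold]

    if decluster == 0:
        # No declustering: every sample above the threshold is included.
        return [x for x in values if x > threshold]

    # Clusters of successive samples above the threshold,
    # stored as (first index, last index, cluster maximum).
    clusters: List[Tuple[int, int, float]] = []
    for i, x in enumerate(values):
        if x <= threshold:
            continue
        if clusters and clusters[-1][1] == i - 1:
            first, _, peak = clusters[-1]
            clusters[-1] = (first, i, max(peak, x))
        else:
            clusters.append((i, i, x))

    # Merge a cluster into its predecessor when the intervening period below
    # the threshold (approximated as number of samples below times dt) is no
    # longer than the decluster period.
    merged: List[Tuple[int, int, float]] = []
    for first, last, peak in clusters:
        if merged and (first - merged[-1][1] - 1) * dt <= decluster:
            prev_first, _, prev_peak = merged[-1]
            merged[-1] = (prev_first, last, max(prev_peak, peak))
        else:
            merged.append((first, last, peak))

    return [peak for _, _, peak in merged]
```

For the lower tail, the same sketch can be applied to the negated time history and negated threshold, with the results negated back. Up-crossings of the negated series are down-crossings of the original, which matches the '~' rule for lower-tail analysis.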

The threshold is drawn on the time history graph, to help visualise its value relative to the extremes of the data. The number of data points that will be included in the analysis (after the threshold and declustering have been done) is also displayed. This helps with setting the threshold and decluster period.

The best threshold balances two competing effects. A value that is not extreme enough increases the number of data points fitted, but may bias the fit by allowing less extreme values to influence it too much. A value that is too extreme restricts the fit to the more relevant extreme data points, but may give very wide confidence intervals if the data contain too few such extremes.

Note:

OrcaFlex provides a default value for threshold. This is calculated as $\mu + 3\sigma$ where $\mu$ and $\sigma$ are the mean and standard deviation, respectively, of the time history. This value is provided because OrcaFlex needs to have an initial value. However, there is no reason to believe that this initial value will be an appropriate threshold value and so we do not recommend that you use this value in your analysis.
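As a small illustration of the default value and of the threshold trade-off, the sketch below computes $\mu + 3\sigma$ and counts the samples lying above $\mu + k\sigma$ for a few multiples $k$. The function names are hypothetical. The use of the population standard deviation is an assumption; for a long simulation the difference from the sample standard deviation is negligible.

```python
import statistics
from typing import Dict, Sequence


def default_threshold(values: Sequence[float]) -> float:
    """mu + 3*sigma of the time history (population standard deviation assumed)."""
    return statistics.fmean(values) + 3.0 * statistics.pstdev(values)


def exceedance_counts(values: Sequence[float],
                      multiples: Sequence[float] = (1.0, 2.0, 3.0)) -> Dict[float, int]:
    """Number of samples above mu + k*sigma for each multiple k."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return {k: sum(1 for x in values if x > mu + k * sigma) for k in multiples}
```

Watching how quickly the count falls as $k$ increases gives a rough feel for how many data points a candidate threshold leaves for fitting.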

Results

The following data items, found on the results page, do not affect the fitting of the statistical model. Rather, they are applied to the fitted model to obtain the reported results.

Rayleigh

Storm duration is the return period for which the return level is reported. The length of the simulation, relative to this duration, will determine the accuracy of the estimate for the return level.

Risk factor is the probability of exceeding (or falling below, for lower tail) the estimated extreme value. For example, you may ask for the 3-hour extreme value that is exceeded with a probability of 0.01 (i.e. a risk factor of 1%).
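The Rayleigh estimate can be sketched using a common textbook formulation, which is not necessarily identical to OrcaFlex's implementation. The formulation makes three assumptions. First, the process is narrow-band, stationary and Gaussian, so peaks above the mean $\mu$ follow a Rayleigh distribution with parameter $\sigma$. Second, the storm contains $N = T/T_z$ independent peaks, where $T$ is the storm duration and $T_z$ is the mean up-crossing period estimated from the record. Third, solving $P(\text{storm maximum} > x) = \alpha$, where $\alpha$ is the risk factor, gives $x = \mu + \sigma\sqrt{-2\ln\left(1 - (1-\alpha)^{1/N}\right)}$. The function name `rayleigh_extreme` is hypothetical.

```python
import math
import statistics
from typing import Sequence


def rayleigh_extreme(values: Sequence[float], dt: float, storm_duration: float,
                     risk_factor: float, upper: bool = True) -> float:
    """Textbook Rayleigh extreme for a narrow-band stationary Gaussian process.

    Returns the level exceeded (or, for the lower tail, fallen below) by the
    storm maximum (minimum) with probability risk_factor, 0 < risk_factor < 1.
    """
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)

    # Mean up-crossing period Tz estimated from the simulated record.
    upcrossings = sum(1 for a, b in zip(values, values[1:]) if a <= mu < b)
    record_duration = dt * (len(values) - 1)
    n_peaks = storm_duration * upcrossings / record_duration  # N = T / Tz

    # Peaks are Rayleigh: P(peak - mu > a) = exp(-a^2 / (2 sigma^2)).
    # With N independent peaks, P(max > mu + a) = 1 - (1 - exp(...))^N.
    # Setting this equal to the risk factor and solving for a:
    tail = -math.expm1(math.log1p(-risk_factor) / n_peaks)  # 1 - (1 - risk)^(1/N)
    amplitude = sigma * math.sqrt(-2.0 * math.log(tail))
    return mu + amplitude if upper else mu - amplitude
```

For example, `rayleigh_extreme(tension, 0.1, 3 * 3600, 0.01)` would give the 3-hour extreme exceeded with probability 0.01 for a tension history sampled every 0.1 s, where `tension` is a hypothetical list of samples. Using `math.expm1` and `math.log1p` avoids loss of precision when $N$ is large and the risk factor is small.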

Weibull and GPD

Storm duration is defined as for the Rayleigh distribution.

The maximum likelihood fitting procedure used for these distributions allows the estimation of a confidence interval for the return level, for a specified confidence level. OrcaFlex reports this estimated confidence interval in addition to the estimated return level.

The reported return level is defined to be the level whose expected number of exceedances in the specified storm duration is one. The fitted values of the model parameters and their standard errors are also reported.

Note:

For some values of storm duration (usually small values) it might not be possible to calculate the return level. This is indicated by the value 'N/A' (meaning 'not available'). Similarly, for some combinations of storm duration and confidence level, the calculation may fail to determine the confidence limits, and again these are then denoted by 'N/A'.
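The return level definition can be written out for the GPD. Assume the GPD with scale $\sigma$ and shape $\xi$ is fitted to the $n$ retained peaks above the threshold $u$ from a simulation of duration $T_{sim}$. The expected number of exceedances of level $x$ in a storm of duration $T$ is then $m\left(1 + \xi (x-u)/\sigma\right)^{-1/\xi}$, where $m = nT/T_{sim}$. Setting this equal to one gives $x = u + \frac{\sigma}{\xi}\left(m^{\xi} - 1\right)$, or $x = u + \sigma \ln m$ in the limit $\xi \to 0$. The sketch below evaluates this for already-fitted parameters. It assumes this particular parameterisation and rate estimate, which may not match OrcaFlex's internal formulation, and `gpd_return_level` is a hypothetical name. It does not cover the Weibull model or the confidence interval.

```python
import math
from typing import Optional


def gpd_return_level(threshold: float, scale: float, shape: float,
                     n_peaks: int, record_duration: float,
                     storm_duration: float) -> Optional[float]:
    """Upper-tail level with one expected exceedance in storm_duration.

    threshold, scale, shape: fitted GPD parameters (u, sigma, xi)
    n_peaks: number of declustered peaks above the threshold in the record
    record_duration, storm_duration: in the same time units
    """
    # Expected number of peaks above the threshold during the storm.
    m = n_peaks * storm_duration / record_duration
    if m < 1.0:
        # The level would fall below the threshold, outside the tail model.
        return None
    log_m = math.log(m)
    if abs(shape) < 1e-12:
        return threshold + scale * log_m          # exponential limit, xi -> 0
    # (m**xi - 1) written with expm1 for accuracy when xi is small.
    return threshold + scale / shape * math.expm1(shape * log_m)
```

When $m < 1$ the level would lie below the threshold, outside the range described by the fitted tail model, so the sketch returns `None`. This is one way in which short storm durations can leave the return level undefined, comparable to the 'N/A' case noted above.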

Diagnostic graphs

The diagnostic graphs help you assess the goodness of fit of the model, and hence whether the fitted distribution is appropriate. They should be interpreted together, not in isolation, as follows.

The quantile plot displays quantiles of the empirical data plotted against model quantiles. If the model is a good fit, then the points should lie close to the superimposed diagonal line, and any significant departure from this (for example a systematic trend away from the diagonal) indicates poor model fit. The vertical lines are pointwise 95% tolerance intervals and may be used as a guide to deciding whether any departure from the diagonal is statistically significant. If all the points lie within their associated tolerance intervals, then the modelled values are probably close enough to the empirical values for the model fit not to be of concern. If, however, a significant number of the points fall outside their associated tolerance intervals, then that may raise concerns about the validity of the fitted model.
The return level plot shows return level against return period (i.e. storm duration), with the latter on a logarithmic scale to highlight the effect of extrapolation. The central line on the graph is the return level for the fitted model, and the pair of outer lines the corresponding pointwise 95% confidence limits. The points are the empirical return levels, based upon the data, and should lie between the confidence limits if the model fits the data well. As with the quantile plot, a significant number of points contravening these limits indicates poor model fit. Again, OrcaFlex may sometimes be unable to determine the confidence limits for some return periods – this may result in gaps in the confidence limit lines, or even in their not appearing at all.
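The points of the quantile plot can be sketched for a GPD fit by pairing the sorted peaks with the fitted model quantiles at plotting positions $i/(n+1)$. The plotting position, the GPD parameterisation and the function names `gpd_quantile` and `quantile_plot_points` are assumptions for illustration. OrcaFlex's own choices may differ, and the tolerance intervals are not computed here.

```python
import math
from typing import List, Sequence, Tuple


def gpd_quantile(p: float, threshold: float, scale: float, shape: float) -> float:
    """Inverse CDF of the GPD for values above the threshold, 0 <= p < 1."""
    if abs(shape) < 1e-12:
        return threshold - scale * math.log1p(-p)  # exponential limit
    return threshold + scale / shape * ((1.0 - p) ** (-shape) - 1.0)


def quantile_plot_points(peaks: Sequence[float], threshold: float, scale: float,
                         shape: float) -> List[Tuple[float, float]]:
    """(model quantile, empirical quantile) pairs using plotting positions i/(n+1)."""
    ordered = sorted(peaks)
    n = len(ordered)
    return [(gpd_quantile(i / (n + 1), threshold, scale, shape), x)
            for i, x in enumerate(ordered, start=1)]
```

If the fit is good, the pairs lie close to the line on which the model quantile equals the empirical quantile.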

An example of diagnostic graphs indicating a good model fit is shown below:

Figure: Diagnostic graphs for a good model fit

If either of these graphs indicates a poor model fit, then you should reconsider the entries on the data page:

Distribution. The distribution may be inappropriate – the data may simply not conform to the selected distribution.
Threshold. The threshold may be too low, hence including too many points which are not in the tail of the distribution; or too high, resulting in too few data points for the analysis and consequent large variation in the results.
Decluster period. This may be too long (so too few data points), or too short (so successive data points might not be independent).

Automation

The extreme value statistics capabilities can be automated in a number of different ways.

OrcaFlex spreadsheet

The OrcaFlex spreadsheet post-processing facility supports analysis using the Rayleigh distribution via the Rayleigh extremes command. The Weibull and GPD distributions are not available in the current version due to the complexity of threshold selection.

OrcaFlex programming interface

The C/C++, Delphi, Python and MATLAB programming interfaces to OrcaFlex all support automation of extreme value statistics. As with all other functionality, the Python and MATLAB interfaces are the easiest to use.

The full analysis capability is available via the programming interface. That is, in contrast to the OrcaFlex spreadsheet, analysis using the Weibull and GPD distributions is available.