A MULTI-DISTRIBUTION ADAPTATION OF THE EXISTING POT FRAMEWORK
The work presented in section 1.4.1 spanned the period from 2007 to 2014. It began with a simple update of a methodology for determining extreme wave heights, to bring it in line with the state of the art. The methodology used at the time by SOGREAH consisted of the following four steps, as per the recommendations of the IAHR Working Group (Mathiesen et al., 1994):
processing of the time series and identification of directional wave sectors;
selection of storm peaks using the Peaks-Over-Threshold (POT) approach;
fitting of a Weibull distribution to the peaks using the least-squares method;
computation of quantiles (extreme wave heights).
This approach was updated by incorporating the Extreme Value Theory (EVT) described in detail in Coles (2001). At the end of the internship, the following methodological improvements had been implemented:
introduction of the Generalized Pareto Distribution (GPD);
use of the Maximum Likelihood Estimator (MLE);
extension to a multi-distribution framework by considering other distributions such as Weibull and Gamma/Pearson-III;
goodness-of-fit assessment using the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC).
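As an illustration of the multi-distribution idea, the following minimal sketch fits two candidate laws to a sample of peak excesses by maximum likelihood and ranks them with AIC and BIC. It is not the code used in the work described here. The function names, the synthetic sample and the choice of candidates are assumptions made for the example: only the exponential law (the GPD with zero shape parameter) and the two-parameter Weibull law are shown, because both can be fitted with the Python standard library alone; the GPD with free shape and the Gamma/Pearson-III law would be added to the comparison in the same way.

```python
import math
import random


def fit_exponential(y):
    """MLE of the exponential law (GPD with zero shape): scale = sample mean."""
    n = len(y)
    scale = sum(y) / n
    loglik = -n * math.log(scale) - n
    return {"scale": scale}, loglik, 1  # 1 fitted parameter


def fit_weibull(y, lo=0.05, hi=50.0, tol=1e-10):
    """MLE of the two-parameter Weibull law; the shape is found by bisection."""
    n = len(y)
    logs = [math.log(v) for v in y]
    mean_log = sum(logs) / n

    def score(k):
        # Profile score equation for the shape k; it increases with k
        # and vanishes at the maximum likelihood estimate.
        powers = [v ** k for v in y]
        weighted = sum(p * l for p, l in zip(powers, logs)) / sum(powers)
        return weighted - 1.0 / k - mean_log

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    k = 0.5 * (lo + hi)
    scale = (sum(v ** k for v in y) / n) ** (1.0 / k)
    loglik = (n * math.log(k) - n * k * math.log(scale)
              + (k - 1.0) * sum(logs) - sum((v / scale) ** k for v in y))
    return {"shape": k, "scale": scale}, loglik, 2  # 2 fitted parameters


def aic(loglik, n_params):
    return 2.0 * n_params - 2.0 * loglik


def bic(loglik, n_params, n_obs):
    return n_params * math.log(n_obs) - 2.0 * loglik


# Synthetic excesses over a threshold (hypothetical data, in metres).
random.seed(0)
excesses = [random.weibullvariate(1.5, 1.3) for _ in range(500)]

results = {}
for name, fit in (("exponential", fit_exponential), ("weibull", fit_weibull)):
    params, loglik, n_params = fit(excesses)
    results[name] = {
        "params": params,
        "loglik": loglik,
        "aic": aic(loglik, n_params),
        "bic": bic(loglik, n_params, len(excesses)),
    }

# The candidate with the lowest criterion value is preferred.
best_by_aic = min(results, key=lambda name: results[name]["aic"])
best_by_bic = min(results, key=lambda name: results[name]["bic"])
```

Both criteria penalise the maximised log-likelihood by the number of fitted parameters, BIC more heavily as the sample grows, so a richer law is retained only if it improves the fit enough to offset its extra parameter.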
These improvements did not significantly alter the general framework of the methodology. In particular, the POT approach was kept, as it is particularly well suited to the maritime field where the number of significant storms per year is generally large enough for it to be deemed preferable to the annual maxima method (Cunnane, 1973). However, almost every step of the methodology was closely examined and new ideas arose from the discussions with fellow scientists and engineers. It must also be said that the dual approach of academic research on one hand and everyday coastal engineering on the other hand was particularly fruitful for setting up methods capable of blending the rigor of the theory and the flexibility required to deal with real-world projects.
A TWO-STEP FRAMEWORK FOR OVER-THRESHOLD MODELLING
The multi-distribution model presented above, and in particular its double-threshold approach, seemed very convenient, and it appeared to be rather popular in the literature in the following years. However, at the time it was merely considered practical, with no conceptual consequences. Nevertheless, discussions within the OSSË working group and with other engineers and researchers led to work focusing on that double-threshold approach and on the related concepts. First of all, it was obvious that this approach was not restricted to wave heights, or to the meteo-oceanic field, but could be applied to hydrology and to all kinds of environmental data.
Secondly, an in-depth literature review showed that the choice of threshold was based sometimes on physical arguments, sometimes on statistical ones, and sometimes on both (see for instance the paper of Lang (1999) for a literature review). The need for clarification became obvious. We based our reasoning on the observation that the analyst deals with a time series of observations of the variable (from measurements or modelling) at a given time step, while the conventional tools provided by the EVT assume that the dataset is independent and identically distributed (i.i.d.). Consequently, a step is needed to go from the former to the latter. This step is a declustering process (Smith, 1989). Once the i.i.d. sample has been set up, a new step of statistical optimization can begin in order to find the best statistical model to fit to the data. At this point, the solution becomes clear: independent and identically distributed peaks must be extracted from the original time series using physical considerations, and statistical tools may then be used in order to determine the optimal threshold above which a statistical law is fitted to the exceedances.
The first step thus consists in extracting i.i.d. peaks from the time series. Of course, considering the exceedances over a threshold is a very practical tool to achieve this goal. But it is necessary to think further about the significance of this operation. The time series provides discrete values, at a given time step, of a continuous environmental variable, i.e. discrete observations of a physical quantity describing a physical phenomenon: wave height, temperature, wind speed, river discharge, rainfall, etc. The basic laws of physics (ultimately the conservation of mass and energy) are such that these quantities are temporally autocorrelated, and their temporal rate of variation is usually bounded. The finer the time step of the series, the stronger the correlation. So setting a threshold will not extract individual values: it will identify time intervals within which the observation is far from its average value. In other words, it will identify a storm, a flood, a heatwave, etc. These anomalies have a duration and a magnitude, and the peak is a very partial description of these.
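The two steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used in the publications: the function `extract_events`, the `min_gap` merging rule, the toy series and both threshold values are hypothetical. Runs of consecutive exceedances of a physical threshold are treated as events; runs separated by fewer than `min_gap` time steps are merged as a crude stand-in for an independence criterion; each event is summarised by its duration and its peak. A higher statistical threshold is then applied to the peaks only.

```python
from collections import namedtuple

Event = namedtuple("Event", "start end duration peak_index peak_value")


def extract_events(series, physical_threshold, min_gap=1):
    """Step 1: identify events (storms, floods...) and their peaks."""
    # Runs of consecutive values above the physical threshold.
    runs = []
    start = None
    for i, value in enumerate(series):
        if value > physical_threshold:
            if start is None:
                start = i
        elif start is not None:
            runs.append([start, i - 1])
            start = None
    if start is not None:
        runs.append([start, len(series) - 1])

    # Merge runs separated by fewer than min_gap steps below the threshold,
    # so that one fluctuating storm is not counted as several events.
    merged = []
    for run in runs:
        if merged and run[0] - merged[-1][1] - 1 < min_gap:
            merged[-1][1] = run[1]
        else:
            merged.append(run)

    # Describe each event by its duration (in time steps) and its peak.
    events = []
    for first, last in merged:
        peak_index = max(range(first, last + 1), key=lambda i: series[i])
        events.append(
            Event(first, last, last - first + 1, peak_index, series[peak_index])
        )
    return events


# Toy series of significant wave heights in metres (hypothetical values).
hs = [1.0, 2.5, 3.2, 2.8, 1.1, 2.6, 3.9, 1.0, 0.8, 0.9, 1.2, 2.2, 2.1, 1.0]

# Step 1: physical threshold of 2.0 m, events closer than 2 steps are merged.
events = extract_events(hs, physical_threshold=2.0, min_gap=2)
peaks = [event.peak_value for event in events]

# Step 2: statistical threshold of 3.0 m, applied to the peaks only.
statistical_threshold = 3.0
selected = [p for p in peaks if p > statistical_threshold]
```

With this toy series, the first two runs of exceedances are merged into a single six-step event peaking at 3.9 m, and a second event peaks at 2.2 m; only the first peak exceeds the statistical threshold. Changing the statistical threshold requires no new pass over the time series.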
MAXIMUM LIKELIHOOD ESTIMATOR AND ITS VIRGAE
A technical point also arose in relation to over-threshold modelling. The use of a double threshold allows for a detailed sensitivity study with respect to the statistical threshold, since it is no longer necessary to decluster the time series at each threshold value, a process that can be rather time-consuming.
Consider the classic case study of Haltenbanken wave height peaks provided by the IAHR Working Group on Extreme Wave Analysis (van Vledder et al., 1994). The accuracy of the data is 0.01 m and the dataset is such that the threshold can range between 7 and 10 m. Instead of letting the threshold value vary with an approximate step of, say, 0.2 m, we can now make the step match the accuracy of the data and let the threshold vary between 7 and 10 m every 0.01 m, i.e. a total of 301 values to be examined. More specifically, for each value, a GPD is fitted to the peak excesses over this value, and the changes in the shape parameter, the modified scale parameter and the quantile value (e.g. the 100-yr peak) with respect to the threshold are analysed. In accordance with the literature (in particular Coles, 2001), in Mazas and Hamm (2011) we had used the Maximum Likelihood Estimator (MLE), notably for its asymptotic properties of robustness, consistency and efficiency. But using MLE for such refined sensitivity studies led to the surprising plots shown in Figure 17.
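The structure of such a sensitivity study can be sketched as below. Several points are assumptions made for the example and should not be read as the method of the papers cited above: the peaks are synthetic (the Haltenbanken data are not reproduced here), the record length `n_years` is hypothetical, and, to keep the sketch within the Python standard library, the GPD is fitted with the method of moments as a plainly labelled stand-in for the MLE discussed in the text. The modified scale is taken as sigma - xi * u, following the usual definition in Coles (2001).

```python
import math
import random
import statistics


def gpd_moments(excesses):
    """Method-of-moments GPD fit (stand-in for the MLE); valid for xi < 0.5."""
    m = statistics.fmean(excesses)
    v = statistics.variance(excesses)
    ratio = m * m / v
    xi = 0.5 * (1.0 - ratio)          # shape
    sigma = 0.5 * m * (1.0 + ratio)   # scale
    return xi, sigma


def return_level(u, xi, sigma, rate, period):
    """Peak value exceeded on average once every `period` years.

    `rate` is the mean number of peaks above u per year.
    """
    if abs(xi) < 1e-9:
        return u + sigma * math.log(rate * period)
    return u + sigma / xi * ((rate * period) ** xi - 1.0)


def threshold_scan(peaks, n_years, u_min=7.0, u_max=10.0, step=0.01,
                   period=100.0, min_sample=10):
    """Fit a GPD above each threshold and record the quantities of interest."""
    n_steps = round((u_max - u_min) / step)
    rows = []
    for i in range(n_steps + 1):
        u = round(u_min + i * step, 2)
        excesses = [p - u for p in peaks if p > u]
        if len(excesses) < min_sample:
            continue  # too few peaks left for a meaningful fit
        xi, sigma = gpd_moments(excesses)
        rows.append({
            "u": u,
            "n": len(excesses),
            "xi": xi,
            "sigma": sigma,
            "sigma_star": sigma - xi * u,  # modified scale
            "q": return_level(u, xi, sigma, len(excesses) / n_years, period),
        })
    return rows


# Synthetic storm peaks recorded at 0.01 m accuracy (hypothetical data).
random.seed(1)
peaks = [round(6.5 + random.expovariate(1.0 / 1.2), 2) for _ in range(2000)]
n_years = 100.0  # hypothetical record length

rows = threshold_scan(peaks, n_years)
```

Plotting `xi`, `sigma_star` and `q` against `u` gives the usual threshold-stability plots: above a suitable threshold, the shape and modified scale should remain roughly constant. Because the peaks were extracted once beforehand, each of the 301 fits only requires filtering the list of peaks.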
Table of contents:
CONTEXT AND ACKNOWLEDGEMENTS
PART 1 PRESENTATION OF THE RESEARCH WORK
PRESENTATION OF THE DATASETS
1. AN INTRODUCTION TO METOCEAN EVENTS
1.1. WHAT IS METEO-OCEANOGRAPHY?
1.1.1. Spatial variability: a useful distinction in geographical domains
1.1.2. A far-reaching variety of time scales
1.1.3. Input data: measurements and model databases
1.2. METEO-OCEANIC EXTREMES IN ENGINEERING, RISK AND SOCIETY
1.2.1. Analyses for engineering
1.2.2. A simple definition of risk
1.2.3. Illustrative examples, at home
1.3. PHYSICS AND STATISTICS: A MATTER OF TERMINOLOGY
1.3.1. Physical definitions and… non-definitions
1.3.2. Statistics: probabilities of… what exactly?
1.3.3. A first approach to events: etymology and definitions
1.4. BRIEF DESCRIPTION OF PUBLICATIONS
1.4.1. A multi-distribution adaptation of the existing POT framework
1.4.2. A two-step framework for over-threshold modelling
1.4.3. Maximum Likelihood Estimator and its virgae
1.4.4. Extreme sea levels: a first approach to bivariate analysis
1.4.5. Joint occurrence of extreme waves and sea levels: from bivariate to multivariate
2. FROM STORM PEAKS TO EXTREME UNIVARIATE EVENTS
2.1. A MULTI-DISTRIBUTION ADAPTATION OF THE EXISTING POT FRAMEWORK
2.2. A TWO-STEP FRAMEWORK FOR OVER-THRESHOLD MODELLING
2.3. MAXIMUM LIKELIHOOD ESTIMATOR AND ITS VIRGAE
2.4. CONCLUSIONS
3. EXTREME MULTIVARIATE EVENTS: FROM SAMPLING TO RETURN PERIOD, A MATTER OF POINT OF VIEW
3.1. EXTREME SEA LEVELS: A FIRST APPROACH TO BIVARIATE ANALYSIS
3.2. JOINT OCCURRENCE OF EXTREME WAVES AND SEA LEVELS: FROM BIVARIATE TO MULTIVARIATE
3.2.1. A new classification for multivariate analyses
3.2.2. Sampling: a description of events
3.2.3. Dependence: assessment and modelling
3.2.4. Joint distribution: a first interpretation
3.3. CONSIDERATIONS ON RETURN PERIODS
3.3.1. What is a multivariate return period?
3.3.2. Bivariate return period of source variables vs. univariate return periods of response variables
3.3.3. Return periods and contours
3.3.3.1. Contours for event-describing values
3.3.3.2. Contours for sequential values
4. CONCLUSIONS AND PERSPECTIVES
4.1. MAIN RESULTS
4.2. DISCUSSION
4.3. PERSPECTIVES
GLOSSARY
REFERENCES