Distribution functions of extreme sea waves and river discharges

Journal of Hydraulic Research Vol. 46, Extra Issue 2 (2008), pp International Association of Hydraulic Engineering and Research Distribution functions of extreme sea waves and river discharges


PIETER H.A.J.M. VAN GELDER, IAHR Member, Section of Hydraulic Engineering, Faculty of Civil Engineering and Geosciences, Delft University of Technology, Stevinweg 1, 2628 CN Delft, P.O. Box 5048, 2600 GA Delft, The Netherlands.

CONG V. MAI, Faculty of Coastal Engineering, Water Resources University of Vietnam, 175 Tayson, Dong Da, Hanoi, Vietnam & Section of Hydraulic Engineering, Faculty of Civil Engineering and Geosciences, Delft University of Technology, Stevinweg 1, 2628 CN Delft, P.O. Box 5048, 2600 GA Delft, The Netherlands.

ABSTRACT

Designers of water defenses and hydraulic structures must determine the maximum environmental loads, such as maximum wave height, maximum water level and maximum river discharge, at the locations of interest. These maxima can be estimated from observations by various statistical methods. The main point of interest is how each method behaves when predicting these extremes and how far its predictions deviate from the true value. An overview is given of statistical methods for determining the extreme values of river- and sea-related variables. The method of Regional Frequency Analysis (RFA) is proposed to predict the occurrence probabilities of these extremes. RFA extends the dataset by pooling the data of neighboring locations and establishes the regional growth curve of a so-called homogeneous region. An important aspect is the formation of the homogeneous regions, because data can be pooled only within such regions. Statistical methods for deriving homogeneous clusters are also presented.
Applications include the prediction of extreme river discharges in Northwest and Central Europe and of extreme waves along the Dutch North Sea coast.

Keywords: Extreme values, goodness of fit, homogeneity, L-moments, maximum wave height, probability distribution, robust statistics, wind setup

1 Introduction

In water defense and hydraulic engineering, important elements to be considered by the designer include the determination of the maximum environmental variables, such as the significant and the maximum wave height, the maximum water level (sea, river, and lake), and the maximum river discharge with its corresponding water level at a location of interest.
The design of hydraulic engineering structures and insurance risk calculations usually rely on knowledge of the occurrence frequency of these extreme events. The estimation of these frequencies is, however, difficult because extreme events are by definition rare and data records are often short. In other words, the uncertainties related to the distribution analysis of extreme values are high. The parameters of the distribution functions used for extreme values can be estimated by various methods. The main point of interest is the behaviour of each method in predicting the $p$-quantile, i.e., the value which is exceeded by the random variable with probability $p$, where $p \ll 1$ (van Gelder, 2000). The estimation of extreme quantiles corresponding to a small probability of exceedance is commonly required in the risk analysis of hydraulic structures. Such extreme quantiles may represent design values of environmental loads (wind, waves, snow, and earthquake), river discharges, and flood levels specified by design codes and regulations.

(Revision received August 28, 2007; open for discussion until December 31.)

It is desirable that the quantile estimate is unbiased, that is, its expected value should be equal to the true value. It is also desirable that an unbiased estimate be efficient, i.e., its variance should be as small as possible. The problem of unbiased and efficient estimation of extreme quantiles from small samples is commonly encountered in civil engineering practice. For example, annual flood discharge data may be available for the past 50 to 100 years, and on that basis one may have to estimate a design flood level corresponding to a return period of 1000 to 10,000 years. As a result, the statistical extrapolation used to predict extreme values is often contaminated by sampling and model uncertainty.
This has motivated the development of approaches to enlarge the sample size in extreme value analysis. This paper presents current research on estimating the distributions of extreme waves, surges and river discharges, including techniques to enlarge the dataset by using available information. The method of Regional Frequency Analysis (RFA) is applied to predict the occurrence probabilities of extreme values of these variables. Two specific applications are addressed in this paper: (i) determination of probability distributions of river data and (ii) estimation of extreme maximum values for sea data.

2 Overview of study approach

It has long been recognized that many annual environmental datasets are too short to allow reliable estimation of extreme events. The difficulties relate both to identifying the appropriate statistical distribution for describing the data and to estimating the parameters of the selected distribution. The distribution for one sample can therefore be estimated more accurately by using information not just from that sample but also from other related samples. In the environmental sciences the data samples are typically measurements of the same kind of data made at different sites, and the process of using data from several sites to estimate the frequency distribution is known as regional frequency analysis. Regionalization provides a means to cope with this problem by assisting in the identification of the shape of potential parent distributions, leaving only a measure of scale to be estimated from the at-site data. This approach is used in this study to overcome the previously mentioned difficulties.

Regional flood frequency analysis involves two major steps: (1) grouping of sites into homogeneous regions and (2) regional quantile estimation at the sites of interest. The performance of any regional estimation method strongly depends on the grouping of sites into homogeneous regions.
Geographically contiguous regions were used for a long time in hydrology, but have been criticized as arbitrary, because geographical proximity does not guarantee hydrological similarity. During the last ten years researchers have attempted to develop methods in which similarity between sites is defined in a multidimensional space of catchment-related or statistical characteristics. A significant contribution to solving the delineation issue is the region-of-influence approach (e.g., Burn, 1990; Feaster and Tasker, 2002). This method dispenses completely with the classical notion of regions in that each site is allowed to have its own region. The site of interest is located at the center of gravity in a space of relevant flood and/or catchment characteristics, each weighted according to its relevance. The method also involves the choice of a distance threshold; only sites whose distance to the target site (in the weighted attribute space) does not exceed this threshold are included in the region of influence. An advantage of the region-of-influence method is that each site can be weighted according to its proximity to the site of interest in the estimation of a regional growth curve.

In this paper, cluster analysis is used as a first attempt to group sites into homogeneous regions. The delineation of a homogeneous region is closely related to the identification of the common regional distribution that applies within each region. A region can only be considered homogeneous if sufficient evidence can be established that the data at different sites of the region are drawn from the same parent distribution, except for the scale parameter.
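The distance-threshold selection behind the region-of-influence approach can be illustrated with a minimal sketch. It assumes a weighted Euclidean distance on standardized attributes; the function name `region_of_influence`, the attribute names, the weights and the threshold are all hypothetical choices for illustration and are not taken from Burn (1990) or from this paper.

```python
import math


def region_of_influence(target, sites, weights, threshold):
    """Return (name, distance) pairs for sites within `threshold` of `target`.

    target    : dict mapping attribute name -> value for the site of interest
    sites     : dict mapping site name -> dict of attribute values
    weights   : dict mapping attribute name -> relevance weight (hypothetical)
    threshold : largest weighted distance at which a site is still included
    """
    selected = []
    for name, attrs in sites.items():
        # Weighted Euclidean distance in the attribute space (an assumption).
        d2 = sum(weights[a] * (attrs[a] - target[a]) ** 2 for a in weights)
        dist = math.sqrt(d2)
        if dist <= threshold:
            selected.append((name, dist))
    # Closest sites first, so they can later receive larger pooling weights.
    return sorted(selected, key=lambda pair: pair[1])


# Hypothetical standardized catchment attributes for three candidate sites.
target = {"log_area": 0.0, "rainfall": 0.0}
sites = {
    "A": {"log_area": 0.2, "rainfall": -0.1},
    "B": {"log_area": 1.5, "rainfall": 1.0},
    "C": {"log_area": -0.4, "rainfall": 0.3},
}
weights = {"log_area": 1.0, "rainfall": 1.0}
roi = region_of_influence(target, sites, weights, threshold=1.0)
print(roi)
```

With these made-up numbers, sites A and C fall inside the threshold and site B is left out. The returned distances could then be turned into pooling weights for the regional growth curve.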
L-moment ratio diagrams and L-moment diagrams have been used in numerous studies as tools for identifying a regional distribution and testing for outlier sites (Chowdhury et al., 1989; Pilon and Adamowski, 1992; Vogel and Fennessey, 1993; Hosking and Wallis, 1993). Further, Hosking and Wallis (1997) developed several tests for regional studies. They gave guidelines for judging the degree of homogeneity of a group of sites, and for choosing and estimating a regional distribution. These L-moment techniques are also applied in this study to test the sea wave and river discharge datasets.

3 Statistical background for RFA

3.1 L-moment statistics

L-moments are summary statistics for probability distributions and data samples. They are analogous to ordinary moments in that they provide measures of location, dispersion, skewness, kurtosis, and other aspects of the shape of probability distributions or data samples, but they are computed from linear combinations of the ordered data. The theoretical advantages of L-moments over ordinary moments, and details on determining L-moments for sample data and probability distributions, can be found in Hosking and Wallis (1997).

3.2 Discordance measure

Suppose that there are $N$ sites in the region. Let $u_i = [t^{(i)} \; t_3^{(i)} \; t_4^{(i)}]^T$ be a vector containing the $t$, $t_3$, and $t_4$ values (the sample L-CV, L-skewness, and L-kurtosis) for site $i$, with the superscript $T$ denoting transposition of a vector. Let $\bar{u}$ be the unweighted group average. The matrix $A$ of sums of squares and cross-products is defined by

$$A = \sum_{i=1}^{N} (u_i - \bar{u})(u_i - \bar{u})^T. \tag{1}$$

Then the discordance measure $D_i$ is

$$D_i = \frac{1}{3} N (u_i - \bar{u})^T A^{-1} (u_i - \bar{u}). \tag{2}$$

A site is declared discordant if its $D_i$ value exceeds the critical value. The critical value depends on the number of sites considered; for regions with more than 15 sites, the critical value is 3.0.
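As a rough illustration of Eqs. (1) and (2), the sketch below computes sample L-moment ratios per site and then the discordance measure. It assumes the standard unbiased probability-weighted-moment estimators of the first four sample L-moments; the site records are made-up numbers, and the helper names `sample_l_moment_ratios`, `solve` and `discordance` are hypothetical.

```python
def sample_l_moment_ratios(data):
    """Return (l1, t, t3, t4): mean, L-CV, L-skewness and L-kurtosis (n >= 4)."""
    x = sorted(data)
    n = len(x)
    b = [0.0, 0.0, 0.0, 0.0]  # unbiased probability-weighted moments b0..b3
    for j, xj in enumerate(x, start=1):
        w = 1.0
        for r in range(4):
            b[r] += w * xj / n
            if r < 3:
                # Raise the weight from order r to order r + 1.
                w *= (j - 1 - r) / (n - 1 - r)
    l1 = b[0]
    l2 = 2 * b[1] - b[0]
    l3 = 6 * b[2] - 6 * b[1] + b[0]
    l4 = 20 * b[3] - 30 * b[2] + 12 * b[1] - b[0]
    return l1, l2 / l1, l3 / l2, l4 / l2


def solve(a, rhs):
    """Solve the small linear system a y = rhs by Gaussian elimination."""
    n = len(rhs)
    m = [row[:] + [v] for row, v in zip(a, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    y = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * y[c] for c in range(r + 1, n))
        y[r] = (m[r][n] - s) / m[r][r]
    return y


def discordance(u):
    """Discordance D_i of Eq. (2) for a list of [t, t3, t4] vectors."""
    n_sites = len(u)
    mean = [sum(ui[k] for ui in u) / n_sites for k in range(3)]
    dev = [[ui[k] - mean[k] for k in range(3)] for ui in u]
    # Matrix A of Eq. (1): sums of squares and cross-products.
    a = [[sum(d[r] * d[c] for d in dev) for c in range(3)] for r in range(3)]
    return [n_sites / 3.0 * sum(dk * yk for dk, yk in zip(d, solve(a, d)))
            for d in dev]


# Hypothetical annual maxima at six sites (illustrative numbers only).
sites = [
    [12.0, 15.5, 9.8, 20.1, 14.2, 11.7, 17.3, 13.0],
    [30.2, 25.1, 41.0, 28.7, 33.3, 26.4, 36.8, 29.9],
    [5.1, 7.9, 6.2, 12.4, 5.8, 6.9, 8.3, 7.0],
    [101.0, 88.0, 140.0, 95.0, 120.0, 91.0, 160.0, 99.0],
    [2.2, 2.9, 2.4, 2.6, 3.8, 2.5, 2.7, 3.1],
    [55.0, 61.0, 48.0, 90.0, 52.0, 58.0, 66.0, 50.0],
]
u = [list(sample_l_moment_ratios(s)[1:]) for s in sites]
d_values = discordance(u)
print(d_values)
```

A useful sanity check on any implementation is that the $D_i$ values average to one, because the quadratic forms in Eq. (2) sum to the dimension of $u_i$.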
The critical value decreases as the number of sites decreases (Hosking and Wallis, 1997).

3.3 K-means clustering

Suppose that there are already hypotheses concerning the number of clusters in the cases or variables considered. One may wish to form exactly K clusters that are as distinct as possible. This is the type of research question that can be addressed by the K-means clustering algorithm, which in general produces exactly K different clusters of the greatest possible distinction. For a river map, for example, one may have a hunch that the sites fall into eight different categories with regard to their physical aspects. The question is whether this intuition can be quantified, i.e., whether a K-means cluster analysis of the physical quantities indeed produces the eight clusters expected. If so, the means of the physical quantities for each cluster would represent a quantitative way of expressing the hypothesis or intuition, such as mountainous sites in cluster 1 or densely populated sites in cluster 2.

Computationally, one may think of this method as an analysis of variance (ANOVA) in reverse. The method starts with K random clusters, and then moves objects between those clusters with the goal to (i) minimize the variability within the clusters and (ii) maximize the variability between the clusters. This is analogous to ANOVA in reverse in the sense that the significance test in ANOVA evaluates the between-group variability against the within-group variability when testing the hypothesis that the means in the groups are different from each other. In K-means clustering, objects are moved in and out of groups (clusters) to get the most significant ANOVA results. Usually, as a result of the K-means clustering analysis, the means for each cluster on each dimension are examined to assess how distinct the K clusters are. Ideally, very different means are obtained for most, if not all, dimensions used in the analysis.
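The move-and-update idea can be sketched with a plain Lloyd-type iteration. This is a simplified stand-in, not the Hartigan and Wong (1979) algorithm cited in this section, and it takes the starting centroids from the caller, whereas the description above starts from K random clusters. The function name `k_means` and the two-dimensional points are hypothetical.

```python
def k_means(points, centroids, max_iter=100):
    """Lloyd-type K-means on a list of tuples; returns (labels, centroids)."""
    centroids = [tuple(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each object joins the cluster with the nearest centroid.
        new_labels = [
            min(range(len(centroids)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])))
            for p in points
        ]
        if new_labels == labels:
            break  # no object changed cluster, so the partition is stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centroid
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Two hypothetical groups of sites described by two physical quantities.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0)]
# Two observations serve as starting centroids; a random draw is the usual choice.
labels, centroids = k_means(points, [points[0], points[3]])
print(labels, centroids)
```

Even though both starting centroids lie in the first group, the iteration separates the two groups after two passes. With less clearly separated data the result can depend on the starting centroids, which is why several starts are commonly compared.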
The magnitude of the F values from the analysis of variance performed on each dimension is another indication of how well the respective dimension discriminates between clusters (Hartigan and Wong, 1979; Everitt, 1993).

3.4 Robust distances

One may think of the variables as defining a multidimensional space in which each observation can be plotted. One can also plot a point representing the means of all variables. This mean point in the multidimensional space is called the centroid. The Mahalanobis distance is the distance of a case from the centroid in the multidimensional space. Thus, this measure provides an indication of whether or not an observation is an outlier. The classical Mahalanobis distance is defined as (Mahalanobis, 1934)

$$MD_i^2 = (x_i - T(X))^T C(X)^{-1} (x_i - T(X)) \tag{3}$$

where $T(X)$ and $C(X)$ are the usual mean and covariance estimates

$$T(X) = \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad C(X) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(x_i - \bar{x})^T \tag{4}$$

and $x_i = (x_{i1}, x_{i2}, \ldots, x_{ip})^T$ for $i = 1, \ldots, n$ is the $i$th row of the data matrix $X$; $n$ is the number of observations; and $p$ is the dimension of the space. Points whose $MD_i^2$ is large are flagged.

It is well known, however, that the sample mean and covariance in a multivariate dataset are extremely sensitive to outliers. Rousseeuw and Leroy (1987) proposed to replace the classical mean and covariance in the expression of the Mahalanobis distance by their high breakdown point (HBP) robust analogues, such as the MVE (Minimum Volume Ellipsoid) or the MCD (Minimum Covariance Determinant). The breakdown point (BP) is the smallest percentage of contaminated data that can cause the estimator to take on arbitrarily large aberrant values. The BP of the classical estimates based on the method of maximum likelihood, the method of moments, or the method of least squares is zero. The MCD estimator for location $T(X)$ is defined as the mean of the $k$ points of $X$, where $k$ is equal to $[(n+p+1)/2]$, for which the determinant of the covariance matrix is minimal.
Moreover, Vandev and Neykov (1993) showed that the breakdown point of the MCD is equal to $(n-k)/n$ if $k$ is within the bounds $(n+p+1)/2 \le k \le n-p-1$ and $n \ge 3(p+1)$. Note that the number of observations should therefore be at least three times the dimensionality plus one. If $k = (n+p+1)/2$, then the BP is equal to 1/2 asymptotically. For more information on efficient algorithms for calculating the MCD and other robust covariance estimates with high breakdown points, see Rocke and Woodruff (1996).

To improve the efficiency of the estimates, Rousseeuw and van Zomeren (1990) performed one-step improvements for the location and scatter as

$$T_1(X) = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i}, \qquad C_1(X) = \frac{\sum_{i=1}^{n} w_i (x_i - T_1(X))(x_i - T_1(X))^T}{\sum_{i=1}^{n} w_i} \tag{5}$$

where

$$w_i = \begin{cases} 1 & \text{if } (x_i - T(X))^T C(X)^{-1} (x_i - T(X)) \le c \\ 0 & \text{otherwise} \end{cases} \tag{6}$$

with the cut-off value $c = \chi^2_{p,0.975}$. The observations with zero weights can be interpreted as outliers.

Other weights, which are based on one-step M-estimates taking $T(X)$ and $C(X)$ as initial values, are

Re-weighting (REW)

$$w_i(t) = \begin{cases} 1 & \text{if } t \le c \\ 0 & \text{if } t > c \end{cases} \quad \text{with } c = \chi^2(p, 0.975)$$

Huber's weight (HUB)

$$w_i(t) = \begin{cases} 1 & \text{if } t \le c \\ c/t & \text{if } t > c \end{cases} \quad \text{with } c = \left\{\sqrt{2p-1} + e\right\} / \sqrt{2} \text{ and } e = 2.25 \tag{7}$$

Hampel's weight (HAM)

$$w_i(t) = \begin{cases} 1 & \text{if } t \le c \\ \dfrac{c}{t} \exp\left\{-\tfrac{1}{2}(t-c)^2/b^2\right\} & \text{if } t > c \end{cases} \quad \text{with } c = \left\{\sqrt{2p-1} + e\right\} / \sqrt{2}, \; e = 2.25, \; b = 1.25$$

Hereafter, $RD_i(\mathrm{MCD})$, $RD_i(\mathrm{REW})$, $RD_i(\mathrm{HUB})$, $RD_i(\mathrm{HAM})$, and $RD_i(\mathrm{T\text{-}BW})$ denote the robust distances (analogs of the Mahalanobis distance) based on the MCD estimator of multivariate location and scatter and on its one-step improvements using the REW, HUB and HAM weights and constrained M-estimates (T-BW), respectively (Rousseeuw and van Zomeren, 1990; Rocke and Woodruff, 1996).
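To make the contrast between classical and robust distances concrete, the sketch below evaluates Eq. (3) with the classical estimates and with an MCD estimate, then applies the 0/1 weights of Eq. (6) and the one-step location and scatter of Eq. (5). It makes several assumptions: the data are two-dimensional and made up, the MCD is found by brute-force enumeration of all $k$-subsets (feasible only for tiny samples, unlike the algorithms of Rocke and Woodruff), no consistency correction is applied to the MCD scatter, and the $\chi^2$ quantile uses the closed form valid for $p = 2$. All function names are hypothetical.

```python
import math
from itertools import combinations


def mean_cov(pts, ddof=1):
    """Mean and 2x2 covariance (sxx, sxy, syy) of 2-D points."""
    n = len(pts)
    mx = sum(p[0] for p in pts) / n
    my = sum(p[1] for p in pts) / n
    sxx = sum((p[0] - mx) ** 2 for p in pts) / (n - ddof)
    syy = sum((p[1] - my) ** 2 for p in pts) / (n - ddof)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in pts) / (n - ddof)
    return (mx, my), (sxx, sxy, syy)


def det(cov):
    sxx, sxy, syy = cov
    return sxx * syy - sxy ** 2


def dist2(p, mean, cov):
    """Squared distance (x - T)^T C^{-1} (x - T) of Eq. (3) in two dimensions."""
    sxx, sxy, syy = cov
    dx, dy = p[0] - mean[0], p[1] - mean[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det(cov)


def mcd(pts):
    """Exact MCD by enumerating every k-subset (tiny samples only)."""
    n, p = len(pts), 2
    k = (n + p + 1) // 2
    best = min(combinations(pts, k), key=lambda s: det(mean_cov(s)[1]))
    return mean_cov(best)


# Seven hypothetical observations near a line plus one gross outlier.
data = [(1.0, 1.1), (1.5, 1.4), (2.0, 2.2), (2.5, 2.4),
        (3.0, 3.1), (3.5, 3.4), (4.0, 4.2), (10.0, -5.0)]

cutoff = -2.0 * math.log(1.0 - 0.975)  # chi-square 0.975 quantile for p = 2

classical = [dist2(x, *mean_cov(data)) for x in data]
t0, c0 = mcd(data)
robust = [dist2(x, t0, c0) for x in data]

# Eq. (6): zero weight for observations beyond the cut-off.
weights = [1 if d <= cutoff else 0 for d in robust]
kept = [x for x, w in zip(data, weights) if w == 1]
# Eq. (5) with 0/1 weights: mean and scatter of the retained points,
# using the sum of the weights as denominator (ddof=0).
t1, c1 = mean_cov(kept, ddof=0)

print(classical[-1], robust[-1], weights)
```

In this example the classical squared distance of the outlier stays below the cut-off, since with $n = 8$ it cannot exceed $(n-1)^2/n$, while the MCD-based distance flags it. Because the raw MCD scatter is not rescaled here, some regular observations may also receive zero weight; a practical implementation would apply a consistency factor.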
4 Regional frequency analysis

The main idea of regional frequency analysis is to extend the environmental dataset by using information from neighboring locations that are considered to belong to a homogeneous region. The RFA addresses the problem of short records by trading space for time, because data from several sites are used to extend datasets in estimating the event frequencies at any site. L-moments are used to facilitate the estimation process in this regional frequency analysis.

In RFA the data are required to originate from homogeneous regions. To aid the presentation, a formal definition is given as follows. Let $Q_{ij}$, $j = 1, \ldots, n_i$, be observed data at $N$ sites of a region, with sample size $n_i$ at site $i$, and let $Q_i(F)$, $0 < F < 1$, be the quantile function of the distribution at site $i$. A region of $N$ sites is called homogeneous if

$$Q_i(F) = \mu_i \, q(F), \quad i = 1, \ldots, N. \tag{8}$$

The quantile function of the regional frequency distribution is

$$q(F) = Q_i(F) / \mu_i, \quad i = 1, \ldots, N, \tag{9}$$

where $\mu_i$ is the site-dependent scale factor and $q(F)$ is the quantile of the regional frequency distribution. The site-dependent scale is naturally estimated as $\hat{\mu}_i = \bar{Q}_i$, the sample mean of the data at site $i$. The dimensionless rescaled data $q_{ij} = Q_{ij} / \hat{\mu}_i$, $j = 1, \ldots, n_i$, $i = 1, \ldots, N$, are used for estimating the regional growth curve $q(F)$, $0 < F < 1$.

How can homogeneous regions be derived on the basis of statistical techniques and physics-based considerations? Hosking and Wallis (1997) developed a unified robust approach to the RFA, based on the L-moments described by Hosking (1990), involving objective and subjective techniques for defining homogeneous regions, assigning sites to regions, identifying and fitting regional probability distributions to data, and testing hypotheses about distributions. By robustness Hosking and Wallis (1997) refer to statistics that work well even if the data are contaminated or the model assumptions are slightly violated.
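The rescaling in Eqs. (8) and (9) can be sketched as follows. This is a deliberately simplified illustration: it pools the rescaled data and reads the growth curve off the pooled empirical quantiles, whereas the Hosking and Wallis approach fits a regional distribution through L-moments. The site records are made-up numbers and the function names are hypothetical.

```python
def rescale_sites(records):
    """Divide each site's data by its sample mean: q_ij = Q_ij / mu_i."""
    scale = {site: sum(q) / len(q) for site, q in records.items()}
    rescaled = {site: [v / scale[site] for v in q] for site, q in records.items()}
    return scale, rescaled


def empirical_quantile(sample, f):
    """Quantile at non-exceedance probability f by linear interpolation."""
    x = sorted(sample)
    pos = f * (len(x) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(x) - 1)
    return x[lo] + (pos - lo) * (x[hi] - x[lo])


# Hypothetical annual maxima at two sites that differ only in scale.
records = {
    "site_a": [80.0, 100.0, 120.0, 90.0, 110.0],
    "site_b": [160.0, 200.0, 240.0, 180.0, 220.0],
}

scale, rescaled = rescale_sites(records)

# Pool the dimensionless data and estimate the regional growth factor q(F).
pooled = [v for q in rescaled.values() for v in q]
growth = empirical_quantile(pooled, 0.9)

# Eq. (8): at-site quantile = site scale factor times regional growth factor.
estimates = {site: mu * growth for site, mu in scale.items()}
print(growth, estimates)
```

Because the two made-up records differ only by a factor of two, their rescaled values coincide and the at-site estimates differ by exactly that factor, which is the behaviour Eq. (8) describes for a homogeneous region.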
The advantages of their approach over the conventional method of moments and the maximum likelihood method are the smaller effect of outliers and more reliable inference from small samples, as the L-moments are linear combinations of the data. Physics-based considerations support the data screening by checking whether the datasets share similar physical characteristics, as required for a homogeneous region, and can later serve as evidence for excluding heterogeneous sites. In general the four main steps of the RFA procedure are: (i) screening of data; (ii) identification of homogeneous regions; (iii) choice of a