applet-magic.com
Thayer Watkins
Silicon Valley
USA

The Probability Distributions of Rainstorm Intensities: Lévy Stable Distributions?

## Background

The general topic is the nature of the statistical distribution of rainstorm intensities. Hydrologists use some standard types of probability distributions to describe the histograms of precipitation. These standard types include the normal, the log-normal and the negative exponential distributions. Based upon these distributions, and upon their parameters as estimated from past data, hydrologists calculate the probability of extreme storms. Usually this analysis is stated in terms of the size of the storm that would have a specified level of probability; for example, a 100-year storm is one whose probability of occurrence in a year is 1/100. More precisely, a 100-year storm is one such that the probability of the occurrence of a storm of that severity or greater in one year is 1/100.
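As a small worked illustration of this definition, the probability of at least one T-year storm in N independent years is 1 - (1 - 1/T)^N. And if, purely as an assumption, annual maximum rainfall followed a negative exponential distribution with mean μ, the T-year storm would be the depth x_T with exp(-x_T/μ) = 1/T, that is, x_T = μ ln T. The function names and the 2-inch mean below are hypothetical.

```python
import math

def prob_at_least_one(T, years):
    """Probability that a storm with annual exceedance probability 1/T
    occurs at least once in `years` independent years."""
    return 1.0 - (1.0 - 1.0 / T) ** years

def exponential_return_level(mean, T):
    """T-year storm depth if annual maximum rainfall were negative
    exponential with the given mean: solves exp(-x / mean) = 1 / T."""
    return mean * math.log(T)

# Chance of seeing a 100-year storm at least once in 100 years.
print(round(prob_at_least_one(100, 100), 3))      # 0.634
# Hypothetical mean annual-maximum storm of 2 inches.
print(round(exponential_return_level(2.0, 100), 2))   # 9.21
```

Even a structure designed for the 100-year storm has roughly a 63% chance of meeting one over a 100-year life, and the depth assigned to that storm depends entirely on the assumed shape of the distribution's tail.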

The perplexing problem is that extreme storms seem to occur more frequently than the probability analysis indicates. The catastrophic floods in China in August of 1975 are a case in point. Dams in the Huai River Valley in North Central China were supposedly built to withstand 1000-year floods, but a typhoon that moved into the area dropped so much rain over a three-day period, more than three times the design limit, that two major dams failed and took with them a string of sixty-two dams, causing much loss of life and property. Eighty-five thousand people lost their lives, eleven million people were severely affected by the catastrophe, and many hundreds of thousands were homeless for months. There were contributing factors in the way the dams were managed, but the root cause was probably the underdesign of the dams for the extreme storms of the area, and this most likely came from flaws in the probability model used to estimate the probabilities of storms of different sizes.

Using a particular probability distribution such as the negative exponential curve is inherently risky. The parameters of the curve are estimated on the basis of past data which might not include any extreme occurrences. Using this result to estimate the probabilities of extreme cases, without any basis for believing that the probability distribution is in fact a negative exponential curve, is very shaky.
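To make the extrapolation risk concrete, here is a sketch comparing two hypothetical models that share the same mean: a negative exponential and a Pareto (power-law) distribution. Both could describe the bulk of a modest record, yet they disagree enormously about extremes. The tail index 2.5 and the unit mean are illustrative assumptions, not estimates from rainfall data.

```python
import math

def exp_tail(x, mean):
    """P(X > x) for a negative exponential distribution with the given mean."""
    return math.exp(-x / mean)

def pareto_tail(x, alpha, x_min):
    """P(X > x) for a Pareto (power-law) distribution with index alpha and minimum x_min."""
    return 1.0 if x < x_min else (x_min / x) ** alpha

MEAN = 1.0                            # both models share this mean (arbitrary units)
ALPHA = 2.5                           # hypothetical Pareto tail index
X_MIN = MEAN * (ALPHA - 1) / ALPHA    # makes the Pareto mean alpha*x_min/(alpha-1) equal MEAN

for x in (2, 5, 10, 20):
    e, p = exp_tail(x, MEAN), pareto_tail(x, ALPHA, X_MIN)
    print(f"x = {x:>2}: exponential {e:.2e}, Pareto {p:.2e}, ratio {p / e:.3g}")
```

Near the bulk of the distribution the exponential tail is actually the larger of the two (at x = 2 and x = 5), but at x = 20 the power-law model assigns roughly 75,000 times as much probability.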

There is, however, sometimes a theoretical basis for expecting that phenomena will have certain types of probability distributions. This basis is what is known as the Central Limit Theorem. The classical Central Limit Theorem says that, under certain conditions, if a variable is the sum of a large number of independent random variables then it will have approximately a normal distribution. The conditions are in fact quite general, and for a long time it was thought that the Central Limit Theorem was universally true. The key condition is that the independent random variables must have finite variance. Other than that they can have any probability distribution whatsoever.
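A small simulation sketch (standard library only; the seed, sample sizes and thresholds are arbitrary choices) contrasts the two situations. Standardized sums of uniform variables, which have finite variance, behave like a standard normal variable, with about 95% of values inside ±1.96. Averages of Cauchy variables, which have infinite variance, never settle down: the average of n standard Cauchy variables has the same Cauchy distribution as a single one.

```python
import math
import random

rng = random.Random(1)   # fixed seed so the sketch is reproducible

def standardized_uniform_sum(n):
    """Sum of n Uniform(0, 1) draws, centered and scaled to unit variance."""
    s = sum(rng.random() for _ in range(n))
    return (s - n / 2) / math.sqrt(n / 12)

def cauchy_mean(n):
    """Average of n standard Cauchy draws (tangent of a uniform random angle)."""
    return sum(math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)) / n

TRIALS, N = 20_000, 30
inside = sum(abs(standardized_uniform_sum(N)) < 1.96 for _ in range(TRIALS)) / TRIALS
far_out = sum(abs(cauchy_mean(N)) > 10 for _ in range(TRIALS)) / TRIALS

print(f"standardized uniform sums within +/-1.96: {inside:.3f}")   # close to 0.95
print(f"Cauchy averages beyond +/-10:             {far_out:.3f}")  # close to 0.063
```

For a single standard Cauchy draw, P(|X| > 10) = 1 - (2/π)·arctan(10) ≈ 0.063; averaging 30 draws does not shrink that probability at all.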

The normal distribution is, however, only one example of a stable distribution. A stable distribution is one such that if two independent random variables have that type of distribution, then their sum will also have a distribution of that same type.
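In symbols (a standard way of stating the property; the notation here is ours), if X₁ and X₂ are independent copies of a stable random variable X, then their sum has the same distribution as a rescaled and shifted copy of X. The normal and Cauchy cases show the pattern c = 2^{1/α}:

$$
\begin{aligned}
&X_1 + X_2 \overset{d}{=} c\,X + d, \qquad c > 0,\\
&\text{normal } (\alpha = 2):\quad N(\mu,\sigma^2) + N(\mu,\sigma^2) = N(2\mu,\,2\sigma^2), \qquad c = 2^{1/2},\ d = (2-\sqrt{2})\,\mu,\\
&\text{Cauchy } (\alpha = 1,\ \text{location } 0,\ \text{scale } \nu):\quad \text{the sum has scale } 2\nu, \qquad c = 2^{1/1},\ d = 0.
\end{aligned}
$$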

In the 1920s the French mathematician Paul Lévy discovered that there were other types of stable distributions besides the normal distribution. The Italian sociologist Vilfredo Pareto had discovered that the income distributions in a wide variety of economic systems (capitalist, socialist and fascist) could all be described by the same type of distribution, a power law (now called the Pareto distribution), whose upper tail falls off much more slowly than a negative exponential. Because of Pareto's early investigation of such probability distributions, the stable distributions are sometimes called Lévy-Pareto or stable Paretian distributions.

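The significance of the Lévy stable distributions is that there is a generalization of the classical Central Limit Theorem: the suitably centered and scaled sum of a large number of independent, identically distributed random variables, if it converges to a limiting distribution at all, approaches a Lévy stable distribution, even when the variables have infinite variance.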

The Lévy stable distributions are characterized by four parameters:

• Alpha: α is usually called the stability parameter. For the normal distribution α = 2. Alpha has to be in the interval 0<α≤2.
• Beta: β is usually called the skewness parameter. For the normal distribution and any other symmetric distribution β = 0. Beta can have any value in the interval -1≤β≤+1.
• Nu: ν is called the scale parameter or the dispersion parameter. For the normal distribution ν is proportional to the standard deviation σ (in the common parameterization of Samorodnitsky and Taqqu, ν = σ/√2). For non-normal stable distributions ν has a finite positive value, but it is not the same as the standard deviation, which for non-normal stable distributions is infinite. Nu can have any positive real value; i.e., ν>0.
• Delta: δ is called the location parameter or the measure of centrality. For the normal distribution and other stable distributions for which α>1, δ is the same as the mean value of the distribution. When α≤1 the mean value is not defined, so δ cannot be interpreted as the mean. δ may have any real value; i.e., -∞<δ<∞.
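For reference, one common way to tie these four parameters together is through the characteristic function, written here in the parameterization of Samorodnitsky and Taqqu (listed in the bibliography). Other parameterizations exist and change the meaning of δ, so it is worth checking which one any fitting software assumes.

$$
\varphi(t) = E\,e^{itX} =
\begin{cases}
\exp\!\left\{-\nu^{\alpha}|t|^{\alpha}\left[1 - i\beta\,\operatorname{sign}(t)\tan\dfrac{\pi\alpha}{2}\right] + i\delta t\right\}, & \alpha \neq 1,\\[8pt]
\exp\!\left\{-\nu|t|\left[1 + i\beta\,\dfrac{2}{\pi}\operatorname{sign}(t)\ln|t|\right] + i\delta t\right\}, & \alpha = 1.
\end{cases}
$$

Setting α = 2 gives exp(-ν²t² + iδt), the characteristic function of a normal distribution with mean δ and variance 2ν²; setting α = 1 and β = 0 gives exp(-ν|t| + iδt), a Cauchy distribution centered at δ with scale ν.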

Some cases for particular values of the parameters are shown below:

Prior to Paul Lévy's mathematical analysis, empirical investigators were finding cases in which the histograms of some variable, while generally looking like normal distributions, deviated from the normal distribution in a systematic manner. For example, the economist Wesley Clair Mitchell in 1915 found that the distribution of the percentage changes in stock prices, when compared to the best-fitting normal distribution, consistently deviated from it as shown below:

This sort of deviation means that there are too many very small deviations from the average, too many very large deviations and too few moderate deviations. The extremely large changes were of particular interest because those were the cases of stock market booms and busts. Because a higher proportion of the probability was in the tails of the distribution than in the case of the normal distribution, such distributions were called fat-tailed distributions. They were also given a name derived from Greek: leptokurtic.
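The pattern Mitchell found (too many small deviations, too many large ones, too few moderate ones) appears in any leptokurtic distribution. The sketch below uses a Laplace (double exponential) distribution rescaled to unit variance purely as a convenient hypothetical example; it is leptokurtic but it is not a stable distribution. All probabilities come from closed-form distribution functions, so no simulation is involved.

```python
import math

B = 1 / math.sqrt(2)   # Laplace scale chosen so the variance 2 * B**2 equals 1

def normal_within(a):
    """P(|Z| < a) for a standard normal Z."""
    return math.erf(a / math.sqrt(2))

def laplace_within(a):
    """P(|X| < a) for a zero-mean Laplace X with scale B (unit variance)."""
    return 1 - math.exp(-a / B)

bands = {
    "small (|x| < 0.5)": lambda f: f(0.5),
    "moderate (1 <= |x| < 2)": lambda f: f(2) - f(1),
    "large (|x| >= 3)": lambda f: 1 - f(3),
}
for name, prob in bands.items():
    print(f"{name:<24} normal {prob(normal_within):.4f}   Laplace {prob(laplace_within):.4f}")
```

With the same variance, the Laplace model puts more probability than the normal into the small band (about 0.51 vs 0.38) and the large band (about 0.014 vs 0.003), and less into the moderate band (about 0.18 vs 0.27).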

There are Lévy stable distributions that are leptokurtic; indeed, every stable distribution with α<2 is fat-tailed, with tail probabilities that fall off like a power of the distance from the center. Furthermore, as mentioned above, there is a generalization of the Central Limit Theorem that says that the sum of a large number of independent random variables will approximately have a stable distribution. Thus if some phenomenon, such as changes in stock prices or rain from a storm, is the result of a large number of independent influences, then its distribution would be expected to be a stable distribution. If that distribution is fat-tailed, that fact would account for the unexpectedly frequent occurrence of catastrophes.
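Stable distributions generally have no closed-form density, but they are easy to simulate. The sketch below uses the Chambers-Mallows-Stuck method, restricted for simplicity to the symmetric case β = 0, to draw samples with scale ν = 1 and count how often values exceed a large threshold for α = 2 (the normal case, which at this scale has variance 2) and α = 1.5. The seed, sample size and threshold are arbitrary choices.

```python
import math
import random

def symmetric_stable(alpha, rng, scale=1.0, loc=0.0):
    """One draw from a symmetric (beta = 0) alpha-stable distribution using the
    Chambers-Mallows-Stuck method. Valid for 0 < alpha <= 2; with scale = 1,
    alpha = 2 gives a normal with variance 2 and alpha = 1 a standard Cauchy."""
    v = math.pi * (rng.random() - 0.5)    # uniform angle on [-pi/2, pi/2)
    w = rng.expovariate(1.0)               # standard exponential
    if alpha == 1.0:
        z = math.tan(v)                    # Cauchy case
    else:
        z = (math.sin(alpha * v) / math.cos(v) ** (1.0 / alpha)
             * (math.cos((1.0 - alpha) * v) / w) ** ((1.0 - alpha) / alpha))
    return loc + scale * z

rng = random.Random(7)
N, THRESHOLD = 50_000, 6.0
tail_fraction = {}
for alpha in (2.0, 1.5):
    draws = [symmetric_stable(alpha, rng) for _ in range(N)]
    tail_fraction[alpha] = sum(abs(x) > THRESHOLD for x in draws) / N
    print(f"alpha = {alpha}: fraction with |x| > {THRESHOLD} is {tail_fraction[alpha]:.5f}")
```

With these settings the α = 1.5 sample should have a few percent of its values beyond ±6, while the normal sample should have almost none (its theoretical probability is about 2 × 10⁻⁵).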

The proposed course project is to use rainfall distribution data to estimate the stable distribution parameters that give the best fit. The first problem is to formulate what data are needed and compare that with the nature of the data that are available. The ideal data would have storms as the basic unit and give the total rainfall of each storm. The data actually collected, however, are for specific units of time, and the longer time units involve many, many storms. Some examples of the rainfall distributions for four different areas are shown below.

In economics, probability distributions that have an unusually high probability of extreme events are called fat-tailed distributions. Benoit Mandelbrot found that these types of distributions are relevant for explaining financial market statistics.
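For the estimation step of the project, one family of methods in the bibliography (Press; Koutrouvelis) works with the empirical characteristic function. The sketch below is a deliberately simplified version of that idea for the symmetric case only (β = 0 is assumed). For a symmetric stable law |φ(t)| = exp(-(ν|t|)^α), so log(-log|φ(t)|) = α log ν + α log|t| is a straight line in log|t| with slope α. The code fits that line by ordinary least squares after rescaling the data by half the interquartile range. The t grid, the rescaling and the unweighted regression are simplifying assumptions; Koutrouvelis's published procedure chooses the grid by sample size and also estimates β and δ.

```python
import math
import random
import statistics

def estimate_symmetric_stable(data, t_grid=None):
    """Rough (alpha, nu) estimates for a sample assumed to come from a
    symmetric (beta = 0) stable law, by regressing log(-log|phi_hat(t)|)
    on log(t), where phi_hat is the empirical characteristic function."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    s0 = (q3 - q1) / 2                         # preliminary scale from the IQR
    z = [x / s0 for x in data]                 # rescale so a fixed t grid is sensible
    if t_grid is None:
        t_grid = [0.3 + 0.15 * k for k in range(9)]   # t = 0.30, 0.45, ..., 1.50
    n = len(z)
    xs, ys = [], []
    for t in t_grid:
        re = sum(math.cos(t * v) for v in z) / n    # empirical characteristic function
        im = sum(math.sin(t * v) for v in z) / n
        modulus = math.hypot(re, im)                # unaffected by the location delta
        xs.append(math.log(t))
        ys.append(math.log(-math.log(modulus)))
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))      # ordinary least squares
    alpha = slope
    nu = s0 * math.exp((my - slope * mx) / alpha)   # undo the rescaling
    return alpha, nu

# Synthetic test data (not rainfall): a normal sample and a Cauchy sample.
rng = random.Random(11)
normal_sample = [rng.gauss(10.0, 1.0) for _ in range(20_000)]
cauchy_sample = [3.0 + math.tan(math.pi * (rng.random() - 0.5)) for _ in range(20_000)]
alpha_n, nu_n = estimate_symmetric_stable(normal_sample)
alpha_c, nu_c = estimate_symmetric_stable(cauchy_sample)
print(f"normal sample: alpha ~ {alpha_n:.2f}, nu ~ {nu_n:.2f}")   # expect near 2 and 0.71
print(f"Cauchy sample: alpha ~ {alpha_c:.2f}, nu ~ {nu_c:.2f}")   # expect near 1 and 1
```

For the normal sample with σ = 1 the scale should come out near σ/√2 ≈ 0.71, consistent with the parameterization in which α = 2 gives variance 2ν². Rainfall amounts are nonnegative and skewed, so a real fit would also need β, which this sketch does not attempt.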

## The Data

Below is a presentation of the results of fitting a Lévy stable distribution to the rainfall statistics for San Jose, California, using the records for 1908 to 2000. San Jose is not a heavy rainfall area, but it has had its own case of an unexpectedly severe storm. In San Jose the rain typically comes from November to April; the summers and early fall have very little rain. On September 11-13, 1918, San Jose had a heavy rainfall of about six inches. This is not a heavy rainfall by the standards of heavy rainfall areas around the world, but for San Jose it was a catastrophe. San Jose at that time had an economic base in prune production; it was the prune capital of the world. The orchardists grew French plums, which were collected and put out to dry into prunes. The September 11-13, 1918 rains came when the entire prune crop was out on the ground drying. Instead of drying, the plums fermented and the entire crop was lost.

Rainfall data are usually reported for fixed periods of time such as months and years. How storm data aggregate into monthly data depends on the extent of temporal correlation. Temporal correlation of weather is an important matter, but information relating to the extreme cases may be lost in the process of aggregating the data into months or years.

Fortunately, daily rainfall totals are available. Thanks to the generous effort of Eric Olson of the Santa Clara Valley Water District, the daily rainfall figures for San Jose over the period from 1908 to 2000 were provided in computer files for analysis. From these data some semblance of the distribution of rainfall by storms can be computed.
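One simple way to get from daily totals to storm totals, offered as a sketch and not as the procedure actually used for the San Jose record, is to treat each run of consecutive wet days as one storm. The 0.01-inch wet-day threshold, the function name and the one-week record below are hypothetical.

```python
from datetime import date, timedelta

def daily_to_storms(daily, wet_threshold=0.01):
    """Group chronologically ordered (date, inches) pairs into storms, taking a
    storm to be a run of consecutive days each with rain >= wet_threshold.
    Returns a list of (start_date, duration_days, total_inches)."""
    storms = []
    start, total, days, prev = None, 0.0, 0, None
    for day, inches in daily:
        wet = inches >= wet_threshold
        follows_prev = prev is not None and day - prev == timedelta(days=1)
        if wet and start is not None and follows_prev:
            total += inches                      # the storm continues
            days += 1
        else:
            if start is not None:                # a dry day or a gap ends the storm
                storms.append((start, days, round(total, 2)))
                start = None
            if wet:                              # a wet day after a break starts one
                start, total, days = day, inches, 1
        prev = day
    if start is not None:                        # storm still open at end of record
        storms.append((start, days, round(total, 2)))
    return storms

# Hypothetical one-week record in inches per day (not real San Jose data).
d0 = date(2000, 1, 1)
rain = [0.0, 0.4, 1.2, 0.05, 0.0, 0.0, 0.3]
daily = [(d0 + timedelta(days=i), r) for i, r in enumerate(rain)]
print(daily_to_storms(daily))   # two storms: Jan 2-4 (1.65 in.) and Jan 7 (0.3 in.)
```

A multi-day gap rule (for example, allowing one dry day inside a storm) would be an equally reasonable design choice; the point is that the storm definition must be fixed before fitting any distribution.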

Hydrologists characterize rainstorms by three aspects: (1) intensity of rainfall, (2) areal distribution, and (3) temporal pattern and duration. Another variable needs to be considered as well, the storm's speed, because the amount of water dropped on a fixed area depends on how rapidly the storm passes over that area as well as on the storm's intensity. A pattern has been discerned in the spatial distribution of a rainstorm such that two parameters, the maximum intensity and a gradient, adequately describe the spatial distribution of rainfall intensity. Likewise, the time pattern of a rainstorm has a reasonable degree of regularity, perhaps differing only by duration. Thus the crucial characteristic of a storm is its maximum intensity.
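The role of storm speed can be made concrete with a toy model. Suppose, purely as an assumption for illustration, that intensity falls off exponentially with distance r from the storm center, I(r) = I_max·exp(-g·r), so that the maximum intensity I_max and the gradient g are the two spatial parameters. For a point directly on the storm's track, a storm moving at speed v then delivers a total depth equal to the integral of I_max·exp(-g·v·|t|) over time, which is 2·I_max/(g·v): halving the speed doubles the depth. The sketch integrates numerically to check this; the function name and units are hypothetical.

```python
import math

def point_depth(i_max, g, v, offset=0.0, dt=0.01, half_window=50.0):
    """Total rain at a fixed point while a storm passes at speed v, assuming
    (hypothetically) intensity i_max * exp(-g * r) at distance r from the
    storm center. `offset` is the point's perpendicular distance from the
    storm track. Midpoint-rule integration over t in [-half_window, half_window];
    half_window should be much larger than 1 / (g * v)."""
    steps = int(round(2 * half_window / dt))
    total = 0.0
    for k in range(steps):
        t = -half_window + (k + 0.5) * dt     # time relative to closest approach
        r = math.hypot(v * t, offset)          # distance from the storm center
        total += i_max * math.exp(-g * r) * dt
    return total

# Hypothetical units: i_max in inches/hour, g per km, v in km/hour, t in hours.
for speed in (20.0, 10.0):
    numeric = point_depth(1.0, 0.1, speed)
    closed_form = 2 * 1.0 / (0.1 * speed)
    print(f"v = {speed:>4} km/h: numeric {numeric:.4f} in, closed form {closed_form:.4f} in")
```

The `offset` argument also shows why points off the track receive less rain from the same storm.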

The frequency and intensity of rainstorms in a particular area depend upon the time of year. Fortunately, the available data allow for the computation of seasonal distributions. For the project only monthly distributions are tabulated, but more detailed or more aggregate distributions can be determined if there is any evidence that they would be more relevant than monthly tabulations.
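Once storms have been identified, a monthly tabulation is straightforward. The sketch below assumes a hypothetical input format of (start date, total inches) pairs and classifies each storm by the calendar month in which it began; the storms listed are made up.

```python
from collections import defaultdict
from datetime import date

def monthly_summary(storms):
    """Group (start_date, total_inches) pairs by calendar month of the start
    date. Returns {month: (storm_count, largest_total)} ordered by month."""
    by_month = defaultdict(list)
    for start, total in storms:
        by_month[start.month].append(total)
    return {m: (len(v), max(v)) for m, v in sorted(by_month.items())}

storms = [(date(2001, 1, 3), 1.2), (date(2001, 1, 20), 0.4),
          (date(2001, 2, 8), 2.1), (date(2002, 1, 15), 0.9)]
print(monthly_summary(storms))   # {1: (3, 1.2), 2: (1, 2.1)}
```

Keeping every storm total per month (rather than only the count and maximum) would allow a separate distribution to be fitted for each month.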

Rainfall data are for rainfall at a fixed location rather than for a particular location within a rainstorm, such as the point of maximum intensity. Analysis indicates that the data for a fixed location will show more variability than the data for the maximum rainfall intensity would show.
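A short simulation sketch suggests one reason a fixed gauge sees more variability than the storm maxima. Assume, hypothetically, that each storm's maximum intensity is lognormal, that the gauge lies a uniformly random distance (up to 2 units) from the storm's center, and that intensity decays as exp(-g·r). Gauge rainfall is then the product of two independent positive random factors, and for independent factors the squared coefficients of variation combine as CV² = (1 + CV₁²)(1 + CV₂²) - 1, which always exceeds CV₁².

```python
import math
import random
import statistics

rng = random.Random(42)
G = 1.0        # hypothetical decay rate per unit distance
R_MAX = 2.0    # gauge lies uniformly within this distance of the storm center

def cv(values):
    """Coefficient of variation: population standard deviation divided by mean."""
    return statistics.pstdev(values) / statistics.fmean(values)

maxima = [rng.lognormvariate(0.0, 0.5) for _ in range(20_000)]     # storm maxima
gauge = [m * math.exp(-G * rng.uniform(0.0, R_MAX)) for m in maxima]

print(f"CV of storm maxima:   {cv(maxima):.3f}")   # theory: about 0.53
print(f"CV of gauge rainfall: {cv(gauge):.3f}")    # theory: about 0.83
```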

## Preliminary Bibliography

### Hydrology

• Elizabeth M. Shaw, Hydrology in Practice, 2nd edition, Van Nostrand Reinhold (International) Co., London, 1988.
• Nathaniel B. Guttman, "The Use of L-Moments in the Determination of Regional Precipitation Climates," in Journal of Climate Vol. 6 (December 1993) pp. 2309-2325.
• Nathaniel B. Guttman, J.R.M. Hosking and James R. Wallis, "Regional Precipitation Quantile Values for the Continental United States Computed from L-Moments," in Journal of Climate Vol. 6 (December 1993) pp. 2326-2340.
• Armand Nzeukou and Henri Sauvageot, "Distribution of Rainfall Parameters Near the Coasts of France and Senegal," in Journal of Applied Meteorology, Vol. 41, No. 1 (January 2002), pp. 69-82.
• Johnny Wei-Bing and J. David Neelin, "Considerations for Stochastic Convective Parameterization," in Journal of the Atmospheric Sciences, Vol. 59, No. 5 (March 2002), pp. 959-975.

### Stable Distributions

• Paul Lévy, Calcul des Probabilités, Gauthier-Villars, Paris, 1925.
• Paul Lévy, Processus Stochastiques et Mouvement Brownien, Gauthier-Villars, Paris, 1948.
• Paul Lévy, Théorie de l'Addition des Variables Aléatoires, Gauthier-Villars, Paris, 1954.
• Paul Lévy, Quelques Aspects de la Pensée d'un Mathématicien, Albert Blanchard, Paris, 1970.
• B.V. Gnedenko and A.N. Kolmogorov, Limit Distributions for Sums of Independent Random Variables, (tr) K.L. Chung, Addison-Wesley Publishing, Cambridge, MA, 1954.
• Gennady Samorodnitsky and Murad S. Taqqu, Stable Non-Gaussian Random Processes, Chapman & Hall, New York, 1994.
• William J. Adams, The Life and Times of the Central Limit Theorem, Kaedmon Publishing Co., New York, 1974.
• William Feller, An Introduction to Probability Theory and Its Applications, Vol. II, John Wiley & Sons, New York, 1966.

### Estimation of Distribution Parameters and Applications

• Benoit Mandelbrot, "The Variation of Certain Speculative Prices," Journal of Business, Vol. 36, No. 4 (October 1963), pp. 394-419.
• William H. DuMouchel, "On the Asymptotic Normality of the Maximum Likelihood Estimate When Sampling from a Stable Distribution," in The Annals of Statistics, Vol. 1, No. 5 (1973), pp. 948-957.
• Bruce M. Hill, "A Simple General Approach to Inference About the Tail of a Distribution," in Annals of Statistics, Vol. 3, No. 5 (1975), pp. 1163-1174.
• Eugene Fama and Richard Roll, "Parameter Estimates for Symmetric Stable Distributions," in Journal of the American Statistical Association, Vol. 66, No. 334 (June 1971), pp. 331-338.
• Stephen M. Kogon and Douglas B. Williams, "Characteristic Function-Based Estimation of Stable Distribution Parameters," in Robert J. Adler et al. (eds.), A Practical Guide to Heavy Tails, Birkhäuser, Boston, 1998, pp. 311-335.
• S. James Press, "Estimation of Univariate and Multivariate Stable Distributions," Journal of the American Statistical Association, Vol. 67, No. 340 (December 1972), pp. 842-846.
• I.A. Koutrouvelis, "Regression-type Estimation of the Parameters of Stable Laws," Journal of the American Statistical Association, Vol. 75, No. , (1980), pp. 918-928.
• I.A. Koutrouvelis, "An Iterative Procedure for the Estimation of the Parameters of Stable Communications in Statistics-Simulation and Computation, Vol. B10, (1981), pp. 17-28.
• Steve Mittnik and Marc S. Paolella, "A Tail Estimator for the Index of the Stable Paretian Distribution," in Communications in Statistics--Theory and Methods, Vol.27 No. 5 (1998) pp. 1239-1262.