Statistical Concepts for Risk Management
This article reviews the statistical concepts used in risk management. The risk manager should have a working knowledge of certain statistical concepts as a basis for understanding risk management theory. The discussion here is not intended to be complete; basic books on statistics should be consulted for a more elaborate treatment.
Terminologies Used in Statistical Concepts for Risk Management
Probability: Probability is the chance that a given event will occur. In insurance, probability is often expressed as the percentage of times that a loss-producing event will happen in the long run. Thus, fire frequency may be stated as a probability of .5 percent per year (.005) in a given territory for a certain type of construction.
Probability Distribution: A probability distribution is a listing of all possible events in a set together with the probability that each event will occur. Suppose, for example, we are interested in studying how accidents are distributed in a given plant which employs 1,000 men.
From past records over several years, the risk manager discovers that in 60 percent of the years there were no accidents. In 20 percent of the years there was one accident, in 10 percent of the years there were two accidents, in 6 percent of the years there were three accidents, and in 4 percent of the years there were four accidents. A probability distribution describing these findings would appear as follows:
Possible event (accidents) | Probability of occurrence |
0 | .60 |
1 | .20 |
2 | .10 |
3 | .06 |
4 | .04 |
Total | 1.00 |
Theoretical Probability Distribution: Theoretical probability distributions are those whose shape is established by a mathematical formula. These distributions are useful because they possess known characteristics which can facilitate the analysis of the loss frequencies they describe. Examples of theoretical distributions often used in insurance problems are the binomial, the normal, and the Poisson. Each of these has a complex formula which will not be given here. Examples of how theoretical probability distributions may be useful follow.
Mean: The mean is the arithmetic average of a group of numbers. For example, the mean of a binomial probability loss distribution is given by np, where n is the number of possible events and p is the probability of loss. Thus, if there are 100 automobiles, n would be 100, since it is theoretically possible for all 100 autos to be involved in a loss. If the annual probability of loss is found to be 5 percent, the mean annual loss would be .05(100), or 5 autos.
Standard Deviation: Standard deviation is a measure of dispersion of a probability distribution. It is also the most widely accepted measure of risk. The larger (smaller) the variation of numbers in a probability distribution from the mean, the larger (smaller) will be the standard deviation. For example, if a risk manager learns that each year the number of deaths in a work force of 10,000 is, say, 10 and that this number has never been less than 9 or more than 11, it is obvious that the dispersion, and standard deviation, will be less than if the deaths ranged, say, from 5 to 15, averaging out to be 10.
In actually calculating standard deviation, one proceeds as follows.
Assume that for the past five years deaths in a work force have numbered 10, 8, 12, 13, and 7, respectively. The total is 50 and the mean is 10 deaths per year. Now calculate the deviation of each year’s deaths from 10 and square the results. The deviations are 0, -2, 2, 3, and -3, and the squared deviations are 0, 4, 4, 9, and 9. Next, sum these numbers, take the average, and extract the square root. The sum is 26, the mean of the squared deviations is 5.2 (also known as the variance), and the square root of 5.2 is 2.28, the standard deviation.
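The arithmetic can be checked with a short Python sketch. It follows the article's method of dividing by the number of years (a population variance); the variable names are illustrative only.

```python
import math

deaths = [10, 8, 12, 13, 7]  # annual deaths from the example

mean = sum(deaths) / len(deaths)
deviations = [x - mean for x in deaths]        # 0, -2, 2, 3, -3
squared = [d ** 2 for d in deviations]         # 0, 4, 4, 9, 9
variance = sum(squared) / len(squared)         # divide by n, as in the text
std_dev = math.sqrt(variance)

print(mean, variance, round(std_dev, 2))  # 10.0 5.2 2.28
```

Dividing by n - 1 instead of n would give the sample standard deviation, which is slightly larger; the text uses n.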
Standard error is the standard deviation of mean values taken from successive samples of data drawn from a given population.
Coefficient of Variation: A way to gauge the importance of any standard deviation and to compare different standard deviations as a measure of relative risk in different situations is to divide the standard deviation by the mean. This measure is known as the coefficient of variation. In the above example, the coefficient of variation is the ratio of 2.28 to 10, or .228, or 22.8 percent. A coefficient of variation is also a useful way to express risk (uncertainty) and to compare the risk attaching to different sets of loss exposures. In a typical situation, for example, the risk in automobile liability losses is much higher than the risk in workers’ compensation losses because auto liability losses are usually less frequent, but more severe than industrial injury losses.
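As a rough illustration of comparing relative risk, the sketch below computes the coefficient of variation for the five-year deaths series and for a second series. The second series (`steadier`) is hypothetical and is included only to show the comparison.

```python
import statistics

def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    return statistics.pstdev(values) / statistics.fmean(values)

deaths = [10, 8, 12, 13, 7]       # series from the text
steadier = [10, 9, 11, 10, 10]    # hypothetical series with the same mean

cv_deaths = coefficient_of_variation(deaths)
cv_steadier = coefficient_of_variation(steadier)
print(round(cv_deaths, 3), round(cv_steadier, 3))
```

Both series average 10 deaths a year, but the lower coefficient for the hypothetical series marks it as the less risky exposure.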
Confidence Intervals: In theoretical loss probability distributions, the analyst may state in advance the number of losses which are expected to occur within different ranges of the mean, that is, within so many standard deviations on either side of the mean. In the normal distribution, for example, which is bell-shaped, 68.27 percent of all of the numbers in the distribution fall within one standard deviation of the mean, 95.45 percent fall within two standard deviations of the mean, and 99.73 percent fall within three standard deviations of the mean.
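The three percentages for the normal distribution can be reproduced from the error function in Python's standard library. This is a minimal check; the helper name `coverage_within` is made up for the example.

```python
import math

def coverage_within(k):
    """Share of a normal distribution within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, f"{coverage_within(k):.2%}")
# 1 68.27%
# 2 95.45%
# 3 99.73%
```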
When the problem is to estimate the mean number of losses in a “population” using sample information, the concept of confidence intervals is useful. The risk manager can select the degree of confidence he wishes in making such estimates. If certain statistical conditions are met in selecting the sample, the risk manager may behave as if the mean number of losses occurring in the sample represents the true mean number of losses in the total population, within a given error range and with a given probability of being correct.
Thus, the risk manager may be able to state, “I can be 95.45 percent sure that the population’s mean number of losses will fall within the range of two standard errors from the sample mean.” If the mean is 10 losses and the standard error is 1.02 losses, the risk analyst can predict that the population mean number of losses will be within the range of 10 plus or minus 2(1.02), or between 7.96 and 12.04. The probability that he will be right is .9545.
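The range in this example can be sketched as a small helper. The function name and parameters are illustrative, and the multiplier `k` is the number of standard errors chosen by the analyst.

```python
def confidence_range(sample_mean, standard_error, k):
    """Range of k standard errors on either side of the sample mean."""
    margin = k * standard_error
    return sample_mean - margin, sample_mean + margin

low, high = confidence_range(10, 1.02, 2)
print(round(low, 2), round(high, 2))  # 7.96 12.04
```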
Expected Value: One of the most useful statistical concepts in risk management is that of expected value, the result obtained by multiplying the value of each possible event by its respective probability and then summing. For example, assume there are only two possible events, “fire” and “no fire,” with corresponding probabilities of .01 and .99, respectively. Assume that if fire occurs the loss is $10,000, but if no fire occurs the loss is zero. The expected value of loss by fire is $10,000(.01) + $0(.99) = $100. The expected value is the mean of the above probability distribution. It expresses the average long-run loss which an insurer would have to pay if it insured this event, and thus summarizes the “pure premium” calculation which is the starting point for determining the final premium.
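A minimal sketch of the expected value calculation follows. The function name and the check that probabilities total 1 are additions for illustration; the fire figures are those in the text.

```python
def expected_value(outcomes):
    """Sum of value * probability over (value, probability) pairs."""
    total_probability = sum(p for _, p in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must total 1")
    return sum(value * p for value, p in outcomes)

fire = [(10_000, 0.01), (0, 0.99)]  # (loss in dollars, probability)
pure_premium = expected_value(fire)
print(round(pure_premium, 2))  # 100.0
```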
Rules of Probability Analysis in Statistical Concepts for Risk Management
There are certain basic underlying assumptions of probability analysis which should be observed if sampling technique is to be successfully employed in loss prediction.
The sample (or set) from which conclusions are drawn must be randomly selected from the larger population comprising the universe of all possible events. If this requirement is not met, in the case of the simple random sample, for example, not all the items have an equal chance of being drawn and generalizations about the larger population of events will not necessarily be true.
All weights assigned to probability statements must be positive. Probability is so defined that it cannot be a negative number. Rather, probability is expressed as a number between 0 and 1. Probabilities assigned to a set of mutually exclusive and collectively exhaustive events must total to 1. Events are mutually exclusive when there is no possibility that if one event occurs, the other can also occur. A set of events is collectively exhaustive if it represents all possible events in the set. We will illustrate this situation below.
If events in the sample occur independently of one another and are randomly selected, certain calculations become possible which are of great value in risk management and decision making. Events are said to occur independently of one another if the outcome in one event does not affect the probability of occurrence of another event. Thus, if it may be assumed that because a fire has occurred once, there is no necessary change in the probability of having a second (or third) fire, we can say that fire losses are independent of one another.
If we know the probabilities of the events in a set of mutually exclusive and collectively exhaustive events, we may employ certain rules such as the additive and compound probability rules. Under the additive rule, for example, if the probability of occurrence of each of four events in such a set is .25, the probability of occurrence of either of two events is .50 (.25 + .25); of any one of three events, .75; and of any one of the four events, 1.0. Thus, the probabilities assigned to all the events must total 1.
The compound probability rule states that the probability of simultaneous or consecutive occurrence of two or more independent events is the product of their individual probabilities. For example, assume there are two decks of well-mixed cards and we wish to know the probability of drawing an ace from each deck on the first draw. The events would be independent of each other since drawing an ace from one deck would not influence the probability of drawing an ace from the second deck. The probability of this occurrence would be the product of the separate probabilities, or 4/52 x 4/52 = 16/2704. However, if we draw two cards from one deck only and we obtain an ace on the first draw, the event “draw an ace on the second draw” is not independent of the first, since there are now only 51 cards left to draw from, three of them aces. Accordingly, the probability of drawing two aces from the same deck would be 4/52 x 3/51, or 12/2652. This second example is an example of conditional probability. The probabilities of all possible events in this example must total 1.0, as shown in Table 3-1.
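The additive and compound rules in this passage can be checked with exact fractions. The sketch below is illustrative; note that Python prints fractions in lowest terms, so 16/2704 appears as 1/169 and 12/2652 as 1/221.

```python
from fractions import Fraction

# Additive rule: four mutually exclusive, collectively exhaustive events at .25 each.
p = Fraction(1, 4)
either_of_two = p + p
all_four = 4 * p

# Compound rule, independent events: an ace from each of two separate decks.
ace = Fraction(4, 52)
ace_from_each_deck = ace * ace                          # 16/2704

# Dependent events: two aces in a row from one deck, without replacement.
two_aces_one_deck = Fraction(4, 52) * Fraction(3, 51)   # 12/2652

# The four possible two-card outcomes from one deck must total 1.
outcomes = {
    "two aces": Fraction(4, 52) * Fraction(3, 51),
    "no aces": Fraction(48, 52) * Fraction(47, 51),
    "ace on first draw only": Fraction(4, 52) * Fraction(48, 51),
    "ace on second draw only": Fraction(48, 52) * Fraction(4, 51),
}
print(either_of_two, all_four, ace_from_each_deck, two_aces_one_deck,
      sum(outcomes.values()))
# 1/2 1 1/169 1/221 1
```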
Conditional probability is the probability of some event, given the occurrence of some other event or some combination of events. The event in question is no longer independent, but depends on some prior condition being fulfilled. For example, assume that there are four possible events with the probabilities given in Table 3-2.
Assume that we wish to know what the probability of two or more collisions will be, if there are any collisions at all. By the additive rule, we know that the probability of two or more collisions is .10 (the sum of the probabilities of events 3 and 4). However, we are redefining events and are imposing a limitation that involves only a part of the sample set by the conditions set forth. This part is restricted to the sum of the probabilities involving any collision (the sum of events 2, 3, and 4). These probabilities total .30. The denominator of the probability equation is therefore .30, and the conditional probability is the ratio .10/.30, or 1/3. In other words, if there are any collisions at all, the probability of two or more collisions is one in three.
Table 3-1
Event | 1^{st} Draw | 2^{nd} Draw | Probability |
Draw two aces | 4/52 x | 3/51 | 12/2652 |
Draw no aces | 48/52 x | 47/51 | 2256/2652 |
Draw one ace only, 1^{st} draw | 4/52 x | 48/51 | 192/2652 |
Draw one ace only, 2^{nd} draw | 48/52 x | 4/51 | 192/2652 |
Total | | | 2652/2652 |
Table 3-2
Event | Probability |
1. No collisions occur in one year | .70 |
2. One collision occurs | .20 |
3. Two collisions occur | .06 |
4. Three or more collisions occur | .04 |
Total | 1.00 |
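In symbols,
$$P(E_{1} \mid E_{2}) = \frac{P(E_{1})}{P(E_{2})} = \frac{.10}{.30} = \frac{1}{3}$$
where E_{1} is the event that two or more collisions will occur (sum of events 3 and 4 above), and E_{2} is the event that one or more collisions will occur (sum of events 2, 3, and 4 above). The numerator is simply P(E_{1}) because every outcome in E_{1} is also in E_{2}.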
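The same conditional probability can be sketched in Python using the figures from Table 3-2. The outcome labels and the `conditional` helper are illustrative names, not part of the original material.

```python
from fractions import Fraction

# Table 3-2: annual collision outcomes and their probabilities.
collisions = {
    "none": Fraction(70, 100),
    "one": Fraction(20, 100),
    "two": Fraction(6, 100),
    "three or more": Fraction(4, 100),
}

def conditional(event, given):
    """P(event | given), with each event expressed as a set of outcome labels."""
    p_given = sum(collisions[o] for o in given)
    p_both = sum(collisions[o] for o in event & given)
    return p_both / p_given

e1 = {"two", "three or more"}           # two or more collisions
e2 = {"one", "two", "three or more"}    # one or more collisions
print(conditional(e1, e2))  # 1/3
```

Intersecting the two sets before summing keeps the helper correct even when the first event is not wholly contained in the second.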
Applying Probability Rules in Statistical Concepts for Risk Management
The above rules of probability analysis must be used with care because in actual practice the assumptions on which these rules are based may not be met, or may be met only approximately. Often one of the principal weaknesses of data available to the risk manager is that the data may not be truly representative of the larger population from which they are drawn and thus may lead to inaccurate conclusions. For example, a risk manager may observe accidents in a plant over a period of five years and calculate the mean and standard deviation of losses. From this he reasons that the best single estimate of loss for the next year is the average of the past five years.
How might this be misleading? Lack of representativeness may be due to several factors: (1) the sample may not be large enough; (2) the sample may be biased; (3) the sample may not have been drawn at random; or (4) the events may not be independent of one another. In other words, there may be large sampling error which should be recognized in interpreting the results. Sampling error might be caused by generalizing from a sample which is too small and hence unstable. (The number of losses may vary 100 percent from year to year.) The sample may be drawn from a single month, such as January, or from a single plant, neither of which may be a typical month for accidents or a typical plant for working conditions. In some plants losses may be unusually high because they are not independent; e.g., unsafe acts of some workers may be copied by other workers in the plant. Accident frequency may increase over time as long as this condition exists.
The firm may utilize sources other than its own records in order to increase the accuracy of estimates of probability and variation of losses. Insurance industry records, trade association data, and governmental studies are among the sources which might be utilized.
In the absence of objective data on which to base estimates of probability and variance of losses, subjective estimates can be made, based on prior general experience of managers.
Another basic rule of probability, the central limit theorem, may be utilized to forecast losses by statistical means.
Forecasting Losses by Statistics.
In most risk management problems involving loss estimation, it is not practical to take a large number of samples. Sampling experiments have shown that when a number of random samples are taken from a population, the mean of each sample will vary from the mean of the population. However, if the number of samples taken is large enough, and the sample means are plotted on graph paper, a normal curve of error will result, i.e., it will be bell shaped. This happens even if the data in the original population or from a single sample are not distributed normally. This result has been proved mathematically and is the essence of the central limit theorem, of which the law of large numbers is a special case.
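The behaviour described here can be illustrated with a small simulation. The sketch below is an assumption-laden example, not part of the original material: the population is a made-up, heavily skewed set of losses (exponential, averaging about 40), and the sample size, number of samples, and seed are arbitrary choices.

```python
import random
import statistics

random.seed(42)  # fixed seed so repeated runs give the same numbers

# Hypothetical skewed loss population; clearly not normally distributed.
population = [random.expovariate(1 / 40) for _ in range(20_000)]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

sample_size = 100
num_samples = 2_000
sample_means = [
    statistics.fmean(random.choices(population, k=sample_size))
    for _ in range(num_samples)
]

# Standard deviation of the sample means predicted from the population.
standard_error = pop_sd / sample_size ** 0.5
within_one = sum(abs(m - pop_mean) <= standard_error for m in sample_means) / num_samples
within_two = sum(abs(m - pop_mean) <= 2 * standard_error for m in sample_means) / num_samples

print(f"population mean {pop_mean:.1f}, mean of sample means {statistics.fmean(sample_means):.1f}")
print(f"share within 1 SE: {within_one:.1%}, within 2 SE: {within_two:.1%}")
```

Although the population is far from bell-shaped, the shares of sample means falling within one and two standard errors should come out close to the 68 and 95 percent figures of the normal curve.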
Standard Error.
The standard deviation of these sample means has a special name, standard error. The standard error is used to draw inferences about the universe. These inferences include: (1) the mean of the random samples approaches the mean value of the population from which they are drawn when the number of samples is large enough; (2) a range of one standard error on either side of the mean includes 68.27 percent, two standard errors include 95.45 percent, and three standard errors include 99.73 percent of the area under a normal curve. The formula for standard error is
$$SE = \frac{s}{\sqrt{n}}$$
where n is the sample size and s is the assumed standard deviation of the population. Let us illustrate the use of standard error in the following risk management problem: A risk manager observes a sample of loss data (see below) from a sample of 1,000 workers (n) in one year, and he wishes to draw inferences about future losses which might be expected over a large number of years. He calculates the mean (M) and the standard deviation (s) of losses in the sample. The risk manager does not know what M and s are for the whole “population,” but the mean and standard deviation of his sample are the best single estimate of the mean and standard deviation of all losses in the population—i.e., all losses to be expected in a large number of years in the future. What variation in losses can be expected in all future years? To answer this question, the risk manager first calculates the standard error, by the above formula. The steps are as follows:
Step 1. Calculation of the mean of losses in the sample is:
Dollars of Loss X | Number of workers n | Total loss |
0 | 800 | 0 |
110 | 100 | 11,000 |
200 | 70 | 14,000 |
500 | 30 | 15,000 |
Total | 1,000 | $40,000 |
Mean loss (M) = $40,000 / 1,000 = $40
Calculation of the standard deviation of this sample is:
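X-M | (X-M)^{2} | n | Weighted squared deviation |
-40 | 1,600 x | 800 | 1,280,000 |
70 | 4,900 x | 100 | 490,000 |
160 | 25,600 x | 70 | 1,792,000 |
460 | 211,600 x | 30 | 6,348,000 |
Total | | 1,000 | 9,910,000 |
Mean squared deviation = 9,910,000 / 1,000 = 9,910. Standard deviation (s) = $\sqrt{9910}$ = $99.55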
Step 2. Calculate the standard error (SE):
$$SE = \frac{s}{\sqrt{n}} = \frac{99.55}{\sqrt{1000}} = 3.15$$
Note that although the standard deviation ($99.55) is relatively large when compared to the mean loss ($40), the standard error ($3.15) is quite small.
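Steps 1 and 2 can be reproduced from the grouped loss data in a few lines of Python. This is a sketch of the same arithmetic; the variable names are illustrative, and the variance is divided by n as in the worked example.

```python
import math

# (loss in dollars, number of workers) from the sample
loss_table = [(0, 800), (110, 100), (200, 70), (500, 30)]

n = sum(count for _, count in loss_table)
mean_loss = sum(loss * count for loss, count in loss_table) / n
variance = sum(count * (loss - mean_loss) ** 2 for loss, count in loss_table) / n
std_dev = math.sqrt(variance)
standard_error = std_dev / math.sqrt(n)

print(n, mean_loss, variance, round(std_dev, 2), round(standard_error, 2))
# 1000 40.0 9910.0 99.55 3.15
```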
The risk analyst can assume in the above case that for all possible future periods, the mean loss will lie within a known range around $40. Specifically, the risk analyst can be 95.45 percent confident that the mean loss will fall in the range $40 ± $6.30 (two standard errors). At the 99.73 percent confidence level, the mean loss will fall within the range $40 ± $9.45 (three standard errors). Note that in this calculation the risk analyst may draw inferences only about the mean loss for a large number of years. The mean loss for the next year or for any single year may vary more than shown above. Thus, in the above example the distribution of losses from 1,000 workers in a single year could look very different from the above.
Setting Loss Reserves. Using the above analysis and knowing that the firm has 1,000 workers, the risk analyst may predict that the average annual losses will not exceed $49.45 x 1,000, or $49,450, 99.73 percent of the time. Using this information, the risk manager can estimate the size of a loss reserve fund in the event he plans to recommend self-insurance for the risk. In this case, he might recommend a self-insurance fund of $50,000, even though the best estimate of losses for any one year is $40,000.
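The reserve arithmetic can be sketched as follows, taking the mean loss of $40 and standard error of $3.15 per worker as given. The function and variable names are illustrative.

```python
mean_loss = 40.00        # mean loss per worker, in dollars
standard_error = 3.15    # standard error per worker, in dollars
workers = 1_000

def upper_bound(k):
    """Mean loss plus k standard errors, per worker."""
    return mean_loss + k * standard_error

upper_per_worker = upper_bound(3)     # 99.73 percent level
expected_total = mean_loss * workers
upper_total = upper_per_worker * workers
print(round(upper_per_worker, 2), round(expected_total), round(upper_total))
# 49.45 40000 49450
```

Rounding the upper figure up to $50,000 gives the reserve suggested in the text.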
Rate Negotiation. The risk manager may also use the above analysis in his negotiations with commercial insurers on premium rates. The pure premium, or expected value of the loss, is $40; the relative risk (at the 99.73 percent confidence level) is $9.45/$40, or 24 percent. If the insurer requires an expense loading of 35 percent, the gross premium should approximate $40/(1 - .35), or $61.54, plus whatever charge for risk the insurer might make. Because the risk manager can demonstrate that there is little chance that in the long run the pure premium will exceed the expected value by more than $9.45 per worker, the final premium quoted per worker should not exceed
Some insurers may quote less than this amount because their aversion toward risk may be less than that of other insurers. (Measurement of subjective risk attitudes is discussed below.)
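The relative risk and the loaded premium in the rate negotiation example can be checked with a brief sketch. It covers only the two figures stated in the text, and the variable names are illustrative.

```python
pure_premium = 40.00      # expected loss per worker, in dollars
risk_margin = 9.45        # three standard errors, in dollars
expense_loading = 0.35    # insurer's expense loading

relative_risk = risk_margin / pure_premium
gross_premium = pure_premium / (1 - expense_loading)
print(f"{relative_risk:.1%}", round(gross_premium, 2))  # 23.6% 61.54
```

The relative risk of 23.6 percent is the figure the text rounds to 24 percent.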
Conclusion on Statistical Concepts for Risk Management
Understanding the calculations above requires basic statistical knowledge.