Areas covered in the fourth edition include a new chapter on risk prediction, risk reclassification and evaluation of biomarkers, new material on propensity analyses, and a vastly expanded chapter on genetic epidemiology, which is particularly relevant to those who wish to understand the epidemiological and statistical aspects of scientific articles in this rapidly advancing field. Biostatistics and Epidemiology was written to be accessible for readers without backgrounds in mathematics.

It provides clear explanations of underlying principles, as well as practical guidelines of "how to do it" and "how to interpret it."

Since the publication of the first edition, Biostatistics and Epidemiology has attracted loyal readers from across specialty areas in the biomedical community. This book presents the basics of biostatistics and epidemiology in biomedical science and incorporates new information on genetic epidemiology and risk prediction based on new biomarkers.

This may be a mistake, since in some cases a type II error is a more serious one than a type I error. In designing a study, if you aim to lower the type I error you automatically raise the type II error probability. To lower the probabilities of both the type I and type II error in a study, it is necessary to increase the number of observations.

It is interesting to note that the rules of the Food and Drug Administration (FDA) are set up to lower the probability of making type I errors. In order for a drug to be approved for marketing, the drug company must be able to demonstrate that it does no harm and that it is effective. Thus, many drugs are rejected because their effectiveness cannot be adequately demonstrated.

In other words, the FDA doesn't want a lot of useless drugs on the market. Drug companies, however, also give weight to guarding against type II error (i.e., failing to show that an effective drug works). Remember, a type I error (also known as alpha error) means you are stating something is really there (an effect) when it actually is not, and a type II error (also known as beta error) means you are missing something that is really there.

If you are looking for a cure for cancer, a type II error would be quite serious: you would miss finding useful treatments. If you are considering an expensive drug to treat a cold, clearly you would want to avoid a type I error, that is, you would not want to make false claims for a cold remedy. It is difficult to remember the distinction between type I and type II errors. Perhaps this small parable will help us. Once there was a King who was very jealous of his Queen.


He had two knights, Alpha, who was very handsome, and Beta, who was very ugly. It happened that the Queen was in love with Beta. The King, however, suspected the Queen was having an affair with Alpha and had him beheaded. Thus, the King made both kinds of errors: he suspected a relationship with Alpha where there was none, and he failed to detect a relationship with Beta where there really was one.

The Queen fled the kingdom with Beta and lived happily ever after, while the King suffered torments of guilt about his mistaken and fatal rejection of Alpha. More on alpha, beta, power, and sample size appears in Chapter 6. Since hypothesis testing is based on probabilities, we will first present some basic concepts of probability in Chapter 2. The probability of the occurrence of an event is indicated by a number ranging from 0 to 1.

An event whose probability of occurrence is 0 is certain not to occur, whereas an event whose probability is 1 is certain to occur. This is an a priori definition of probability, that is, one determines the probability of an event before it has happened. Assume one were to toss a die and wanted to know the probability of obtaining a number divisible by three. There are six possible ways that the die can land.

Of these, there are two ways in which the number on the face of the die is divisible by three, a 3 and a 6, so the probability is 2/6, or 1/3. In many cases, however, we are not able to enumerate all the possible ways in which an event can occur, and, therefore, we use the relative frequency definition of probability. This is defined as the number of times that the event of interest has occurred divided by the total number of trials (or opportunities for the event to occur).
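The a priori calculation above can be sketched in a few lines of Python (a minimal illustration; the variable names are ours, not the book's):

```python
from fractions import Fraction

# Enumerate the six equally likely faces of a fair die.
faces = [1, 2, 3, 4, 5, 6]

# A priori probability: favourable outcomes / all possible outcomes.
favourable = [f for f in faces if f % 3 == 0]   # faces divisible by 3
p_div_by_3 = Fraction(len(favourable), len(faces))

print(favourable)      # [3, 6]
print(p_div_by_3)      # 1/3
```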

Since it is based on previous data, it is called the a posteriori definition of probability. For instance, if you select at random a white American female, the probability of her dying of heart disease can be estimated from the number of white American females who died of coronary heart disease per population at risk (estimates from the National Center for Health Statistics). When you consider a white American female who is between ages 45 and 64, the figure drops.

A corresponding figure can be found for white men 65 or older. The two important points are (1) to determine a probability, you must specify the population to which you refer, for example, all white females, white males between 65 and 74, nonwhite females between 65 and 74, and so on; and (2) the probability figures are constantly revised as new data become available. This brings us to the notion of expected frequency.

When in tossing a coin we say the probability of it landing on heads is .50, we mean that over a long run of tosses we expect heads to come up about half the time. For example, when TV announcers proclaim there will be, say, a certain number of fatal accidents in State X on the Fourth of July, it is impossible to say whether any individual person will in fact have such an accident, but we can be pretty certain that the number of such accidents will be very close to the predicted number, based on probabilities derived from previous Fourth of July statistics.

First, if there are two mutually exclusive events (i.e., events that cannot occur together), the probability that either one or the other occurs is the sum of their individual probabilities. A useful thing to know is that the sum of the individual probabilities of all possible mutually exclusive events must equal 1. Second, if there are two independent events (i.e., the occurrence of one is in no way related to the occurrence of the other), the probability that both occur jointly is the product of their individual probabilities. On a toss of a die, for example, the only number both even and divisible by 3 is the number 6. The joint probability law is used to test whether events are independent. If they are independent, the product of their individual probabilities should equal the joint probability. If it does not, they are not independent. It is the basis of the chi-square test of significance, which we will consider in the next section.
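The die example makes the multiplication law concrete: "even" and "divisible by 3" are independent, since the only face that is both is the 6. A short Python check (our own illustration):

```python
from fractions import Fraction

faces = range(1, 7)
p_even  = Fraction(sum(1 for f in faces if f % 2 == 0), 6)                 # 3/6
p_div3  = Fraction(sum(1 for f in faces if f % 3 == 0), 6)                 # 2/6
p_joint = Fraction(sum(1 for f in faces if f % 2 == 0 and f % 3 == 0), 6)  # only 6

# Independent events: the joint probability equals the product
# of the individual probabilities.
print(p_even, p_div3, p_joint)   # 1/2 1/3 1/6
print(p_joint == p_even * p_div3)  # True
```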

Let us apply these concepts to a medical example. Suppose past experience in a coronary care unit gives the probability that a patient with a heart attack admitted to this unit will die. If two men are admitted to the coronary care unit on a particular day, let A be the event that the first man dies and let B be the event that the second man dies. Note, however, that the probability that either one or the other will die from the heart attack is not the sum of their probabilities, because these two events are not mutually exclusive.

It is possible that both will die (i.e., the events can occur jointly). To make this clearer, a good way to approach probability is through the use of Venn diagrams, as shown in Figure 2. Venn diagrams consist of squares that represent the universe of possibilities and circles that define the events of interest. In diagrams 1, 2, and 3, the space inside the square represents all N possible outcomes. The circle marked A represents all the outcomes that constitute event A; the circle marked B represents all the outcomes that constitute event B. Diagram 1 illustrates two mutually exclusive events; an outcome in circle A cannot also be in circle B.

Diagram 2 illustrates two events that can occur jointly: an outcome in circle A can also be an outcome belonging to circle B. The shaded area marked AB represents outcomes that are the occurrence of both A and B. Diagram 3 represents two events where one (B) is a subset of the other (A); an outcome in circle B must also be an outcome constituting event A, but the reverse is not necessarily true. Therefore, we must subtract the outcomes in the shaded area (A and B, also written as AB) once to arrive at the correct answer. The probability of A, given that B has occurred, is called the conditional probability of A given B, and is written symbolically as P(A|B).
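The need to subtract the overlap (the shaded AB area) exactly once can be verified by counting outcomes. Here is a small sketch with an invented pair of events on a die (not from the book):

```python
# A = die shows an even number, B = die shows a number greater than 3.
universe = set(range(1, 7))
A = {f for f in universe if f % 2 == 0}      # {2, 4, 6}
B = {f for f in universe if f > 3}           # {4, 5, 6}

n = len(universe)
p_A, p_B = len(A) / n, len(B) / n
p_AB = len(A & B) / n                        # the overlap, counted once

# Without subtracting the overlap we would double-count it.
p_A_or_B = p_A + p_B - p_AB
print(p_A_or_B == len(A | B) / n)            # True
print(p_A_or_B)                              # 4 of the 6 faces
```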

An illustration of this is provided by Venn diagram 2. When we speak of conditional probability, the denominator becomes all the outcomes in circle B instead of all N possible outcomes and the numerator consists of those outcomes that are in that part of A which also contains outcomes belonging to B.

This is the shaded area in the diagram labeled AB. It is difficult to obtain that probability directly, because one would have to study the vast number of persons with memory loss (which in most cases comes from other causes) and determine what proportion of them have brain tumors. In medicine, odds are often used to calculate an odds ratio. An odds ratio is simply the ratio of two odds. Note, however, that we cannot determine from such an analysis what the probability of getting lung cancer is for smokers, because in order to do that we would have to know how many people out of all smokers developed lung cancer, and we haven't studied all smokers; all we do know is how many out of all our lung cancer cases were smokers.

Nor can we get the probability of lung cancer among nonsmokers, because we would have to look at a population of nonsmokers and see how many of them developed lung cancer. All we do know is that smokers have much greater odds of having lung cancer than nonsmokers. More on this topic is presented in Section 4. For example, if a patient has a sudden loss of memory, we might want to know the likelihood ratio of that symptom for, say, a brain tumor. What we want to know is the likelihood that the memory loss arose out of the brain tumor in relation to the likelihood that it arose from some other condition.
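The case-control odds-ratio arithmetic described above can be computed directly from a 2 × 2 table. The counts below are invented for illustration; they are not the study data referred to in the text:

```python
# Hypothetical case-control counts (illustrative only):
#                smokers   nonsmokers
# lung cancer      a=90        b=10
# controls         c=40        d=60
a, b, c, d = 90, 10, 40, 60

odds_cases    = a / b        # odds of smoking among cases
odds_controls = c / d        # odds of smoking among controls
odds_ratio    = odds_cases / odds_controls   # equivalently (a*d)/(b*c)

print(round(odds_ratio, 2))  # 13.5
```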

The likelihood ratio is a ratio of conditional probabilities. It may sometimes be quite difficult to establish the denominator of the likelihood ratio, because we would need to know the prevalence of memory loss in the general population. The LR is perhaps more practical to use than Bayes' theorem, which gives the probability of a particular disease given a particular symptom.

In any case, it is widely used in a variety of situations because it addresses this important question: if a patient presents with a symptom, what is the likelihood that the symptom is due to a particular disease rather than to some other cause? We will first consider the case of discrete variables and present the chi-square test, and then we will discuss methods applicable to continuous variables.

Let us, therefore, consider the example of testing an anticoagulant drug on female patients with myocardial infarction. Alternate Hypothesis: The mortality in the treated group is lower than in the control group. The data for our example come from a study done a long time ago and refer to a specific high-risk group.

But could this difference have arisen by chance? We use the chi-square test to answer this question.


What we are really asking is whether the two categories of classification (control vs. treated) are independent of the outcome. If they are independent, what frequencies would we expect in each of the cells? And how different are our observed frequencies from the expected ones? How do we measure the size of the difference? These expectations differ, as we see, from the observed frequencies noted earlier; that is, those patients treated did, in fact, have a lower mortality than those in the control group. Well, now that we have a table of observed frequencies and a table of expected values, how do we know just how different they are?

Do they differ just by chance, or is there some other factor that causes them to differ? A measure of the difference is obtained by taking the observed value in each cell, subtracting from it the expected value in each cell, squaring this difference, and dividing by the expected value for each cell. When this is done for each cell, the four resulting quantities are added together to give a number called chi-square. Chi-square is a statistic that has a known distribution.

The particular value of chi-square that we get for our example happens to be quite large. From our knowledge of the distribution of values of chi-square, we know that if our null hypothesis is true, that is, if there is no difference in mortality between the control and treated groups, then the probability that we get a value of chi-square as large as or larger than the one observed is very small. Since it is not likely that we would get such a large value of chi-square by chance under the assumption of our null hypothesis, it must have arisen not by chance but because our null hypothesis is incorrect.
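The cell-by-cell computation described above (observed minus expected, squared, divided by expected, summed over the four cells) can be sketched as follows. The counts are hypothetical, not those of the anticoagulant study:

```python
# Hypothetical 2x2 table (treated vs control, died vs lived).
observed = {("control", "died"): 60, ("control", "lived"): 140,
            ("treated", "died"): 40, ("treated", "lived"): 160}

row_tot = {r: sum(v for (rr, _), v in observed.items() if rr == r)
           for r in ("control", "treated")}
col_tot = {c: sum(v for (_, cc), v in observed.items() if cc == c)
           for c in ("died", "lived")}
n = sum(observed.values())

# Expected cell count under independence: row total * column total / N.
expected = {(r, c): row_tot[r] * col_tot[c] / n for (r, c) in observed}

# Chi-square: sum over cells of (observed - expected)^2 / expected.
chi_square = sum((observed[cell] - expected[cell]) ** 2 / expected[cell]
                 for cell in observed)
print(round(chi_square, 2))  # 5.33
```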

We, therefore, reject the null hypothesis at the .05 level. Therefore, the probability of rejecting the null hypothesis when it is in fact true (type I error) is less than .05. The probabilities for obtaining various values of chi-square are tabled in most standard statistics texts, so the procedure is to calculate the value of chi-square and then look it up in the table to determine whether or not it is significant.

The value of chi-square that must be obtained from the data in order to be significant is called the critical value. The critical value of chi-square at the .05 level for a 2 × 2 table (one degree of freedom) is 3.84. This means that when we get a value of 3.84 or greater, we can reject the null hypothesis at the .05 level. Appendix A provides some critical values for chi-square and for other tests.

The corrected chi-square so calculated is somewhat smaller than the uncorrected value. The chi-square test should not be used if the numbers in the cells are too small. The rules of thumb are as follows: when the total N is greater than 40, use the chi-square test with Yates' correction; when N is between 20 and 40 and the expected frequency in each of the four cells is 5 or more, use the corrected chi-square test.

If the smallest expected frequency is less than 5, or if N is less than 20, use Fisher's exact test. While the chi-square test approximates the probability, Fisher's exact test gives the exact probability of getting a table with values like those obtained, or even more extreme ones.
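Fisher's exact test sums hypergeometric probabilities for the observed table and all more extreme tables with the same margins. Below is a one-sided sketch using only the standard library (the table is invented; in practice a statistics package's Fisher routine would be used):

```python
from math import comb

def fisher_exact_one_sided(a, b, c, d):
    """P of a table as extreme or more extreme (toward fewer events
    in the first row), with all margins held fixed."""
    row1, row2 = a + b, c + d
    col1, n = a + c, a + b + c + d
    total = comb(n, col1)
    # Sum hypergeometric probabilities for a' = 0 .. a.
    return sum(comb(row1, x) * comb(row2, col1 - x)
               for x in range(0, a + 1)) / total

# Tiny hypothetical table where expected counts are too small for chi-square.
p = fisher_exact_one_sided(1, 9, 6, 4)
print(round(p, 4))  # 0.0286
```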

A sample calculation is shown in Appendix B. The calculations are unwieldy, but Fisher's exact test is usually included in most statistics programs for personal computers. The important thing is to know when the chi-square test is or is not appropriate. A special case arises when the two sets of observations are not independent, as when before and after measures are taken on the same individuals, or when two judges rate the same cases; in such situations, the before and after measures, or the opinions of the two judges, are not independent of each other, since they pertain to the same individuals. Instead, we can use the McNemar test. Consider the following example. Case histories of patients who were suspected of having ischemic heart disease (a decreased blood flow to the heart because of clogging of the arteries) were presented to two cardiology experts.

The doctors were asked to render an opinion on the basis of the available information about the patient. They could recommend either (1) that the patient should be on medical therapy or (2) that the patient have an angiogram, which is an invasive test, to determine if the patient is a suitable candidate for coronary artery bypass graft surgery (known as CABG).

The results are shown in Table 3. Cells a and d represent patients about whom the two doctors agree. A small, nonsignificant chi-square would mean we cannot reject the null hypothesis; were the chi-square test significant, we would have to reject the null hypothesis and say the experts significantly disagree. However, such a test does not tell us about the strength of their agreement, which can be evaluated by a statistic called Kappa. Kappa is a statistic that tells us the extent of the agreement between the two experts above and beyond chance agreement. The cells a and d represent the observed agreement, from which the agreement expected by chance is subtracted in calculating Kappa.
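Kappa compares observed agreement with the agreement expected by chance from the marginal totals. A sketch with hypothetical counts (not the table referred to in the text):

```python
# Agreement between two raters on a 2x2 table (hypothetical counts):
#                   rater 2: medical   rater 2: angiogram
# rater 1 medical         a=40              b=10
# rater 1 angiogram       c=5               d=45
a, b, c, d = 40, 10, 5, 45
n = a + b + c + d

observed_agreement = (a + d) / n            # cells where the raters agree

# Chance agreement: products of marginal proportions, summed over categories.
p_med_1, p_med_2 = (a + b) / n, (a + c) / n
p_ang_1, p_ang_2 = (c + d) / n, (b + d) / n
chance_agreement = p_med_1 * p_med_2 + p_ang_1 * p_ang_2

kappa = (observed_agreement - chance_agreement) / (1 - chance_agreement)
print(round(kappa, 3))  # 0.7
```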

The topic of Kappa is thoroughly described in the book by Fleiss listed in the Suggested Readings. When we wish to describe a population with regard to some characteristic, we generally use the mean, or average, as an index of central tendency of the data. Other measures of central tendency are the median and the mode. The median is the middle value, or the 50th percentile.

To find the median of a set of scores, we arrange them in ascending or descending order and locate the middle value if there is an odd number of scores, or the average of the two middle scores if there is an even number of scores. The mode is the value that occurs with the greatest frequency. There may be several modes in a set of scores, but only one median and one mean value. These definitions are illustrated below. The mean is the measure of central tendency most often used in inferential statistics. The sample mean is written as x̄ (read as "x bar").
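The three indices of central tendency can be computed with Python's standard `statistics` module; the scores below are invented for illustration:

```python
import statistics

scores = [7, 8, 8, 9, 10, 11, 12]   # small illustrative data set

mean   = statistics.mean(scores)    # arithmetic average
median = statistics.median(scores)  # middle value (50th percentile)
mode   = statistics.mode(scores)    # most frequent value

print(median, mode)  # 9 8
```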

We must be careful to specify exactly the population from which we take a sample. For instance, in the general population the average I.Q. is 100. What is also needed is some measure of variability of the data around the mean. Two groups can have the same mean but be very different. For instance, consider a hypothetical group of children each of whose individual I.Q. is exactly 100.

Compare this to another group whose mean is also 100 but which includes individuals with both very low and very high I.Q.s. Different statements must be made about these two groups: one is composed of all average individuals; the other includes both retardates and geniuses. The most commonly used index of variability is the standard deviation (s.d.). The square of the standard deviation is called the variance. When the standard deviation is calculated from a sample, it is written as s.

The mathematical reason is complex and beyond the scope of this book. Consider a variable like I.Q., which is approximately normally distributed, as illustrated in Figure 3. Approximately 2.5% of the values lie in each tail of the distribution, beyond about two standard deviations from the mean; this is indicated by the shaded areas at the tails of the curves. If we are estimating from a sample and if there are a large number of observations, the standard deviation can be estimated from the range of the data, that is, the difference between the smallest and the highest value.

For instance, suppose as a physician you are faced with an adult male who has a hematocrit reading of 39. (Hematocrit is a measure of the amount of packed red cells in a measured amount of blood.) A low hematocrit may imply anemia, which in turn may imply a more serious condition. Suppose you also know the average hematocrit reading for adult males. Do you know whether the patient with a reading of 39 is normal (in the sense of healthy) or abnormal? You need to know the standard deviation of the distribution of hematocrits in people before you can determine whether 39 is a normal value.

In point of fact, the standard deviation is approximately 3. For adult females, the mean hematocrit is 42 with a standard deviation of 2. Standard error and standard deviation are often confused, but they serve quite different functions. To understand the concept of standard error, you must remember that the purpose of statistics is to draw inferences from samples of data to the population from which these samples came. Specifically, we are interested in estimating the true mean of a population for which we have a sample mean based on, say, 25 cases.

Imagine that we draw a sample of 25 people at random from that population and calculate the sample mean x̄. If we took another sample of 25 individuals, we would probably get a slightly different sample mean. Suppose we did this repeatedly, an infinite (or at least a very large) number of times, each time throwing the sample we just drew back into the population pool from which we would sample 25 people again.

We would then have a very large number of such sample means. These sample means would form a normal distribution. This distribution of sample means would have its own standard deviation, that is, a measure of the spread of the data around the mean of the data. In this case, the data are sample means rather than individual values. The standard deviation of this distribution of means is called the standard error of the mean. It should be pointed out that this distribution of means, which is also called the sampling distribution of means, is a theoretical construct.

Obviously, we don't go around measuring samples of the population to construct such a distribution. Usually, in fact, we just take one sample of 25 people and imagine what this distribution might be. However, due to certain mathematical derivations, we know a lot about this theoretical distribution of population means and therefore we can draw important inferences based on just one sample mean. What we do know is that the distribution of means is a normal distribution, that its mean is the same as the population mean of the individual values, that is, the mean of the means is m, and that its standard deviation is equal to the standard deviation of the original individual values divided by the square root of the number of people in the sample.

When we talk about values calculated from samples, we refer to the mean as x̄ and the standard deviation as s. The mean of these means is also m, but its dispersion, or standard error, is smaller. It is easily seen that if we take a sample of 25 individuals, their mean is likely to be closer to the true mean than the value of a single individual, and if we draw a sample of 64 individuals, their mean is likely to be even closer to the true mean than was the mean we obtained from the sample of 25. Thus, the larger the sample size, the better is our estimate of the true population mean.
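The sampling distribution of means is a theoretical construct, but it can be approximated by simulation: draw repeated samples of 25 from a synthetic population and look at the spread of the sample means. This sketch (with invented parameters loosely matching the I.Q. example) shows that the spread agrees with s divided by the square root of n:

```python
import random
import statistics

random.seed(42)

# A large synthetic "population" with mean 100 and s.d. 16 (I.Q.-like).
population = [random.gauss(100, 16) for _ in range(100_000)]
sigma = statistics.pstdev(population)

# Draw many samples of n = 25 and record each sample mean.
n, means = 25, []
for _ in range(2_000):
    sample = random.sample(population, n)
    means.append(statistics.mean(sample))

observed_se  = statistics.pstdev(means)   # spread of the sample means
predicted_se = sigma / n ** 0.5           # s.d. of individuals / sqrt(n)

print(round(observed_se, 2), round(predicted_se, 2))
```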

The standard deviation is used to describe the dispersion or variability of the scores. The standard error is used to draw inferences about the population mean from which we have a sample. We draw such inferences by constructing confidence intervals, which are discussed in Section 3. The standard error of the differences between two means is the standard deviation of a theoretical distribution of differences between two means.

Imagine a population of men and a population of women, each of which has a distribution of I.Q. scores. Suppose we take a sample of 64 men and a sample of 64 women, calculate the mean I.Q.s of these samples, and obtain the difference between them. If we were to do this an infinite number of times, we would get a distribution of differences between sample means of two groups of 64 each. These difference scores would be normally distributed; their mean would be the true average difference between the populations of men and women (which we are trying to infer from the samples), and the standard deviation of this distribution is called the standard error of the differences between two means.

In some cases we know or assume that the variances of the two populations are equal to each other and that the variances that we calculate from the samples we have drawn are both estimates of a common variance. In such a situation, we would want to pool these estimates to get a better estimate of the common variance. The standard normal distribution looks like Figure 3. On the abscissa, instead of x we have a transformation of x called the standard score, Z.

Let us look at the I.Q. variable again as an example. An individual I.Q. score can be converted to a Z score by subtracting the population mean and dividing by the standard deviation. Accept this on faith: it happens that the area under the standard normal curve to the right of 1.96 is .025, and since the curve is symmetrical, the probability of Z being to the left of −1.96 is also .025. Transforming back to the scale of x, we can make the corresponding probability statement about someone's I.Q. falling beyond those limits.
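The tail areas quoted here can be reproduced from the standard normal cumulative distribution, which the Python standard library lets us build from the error function (a sketch, not the book's method):

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal cumulative distribution via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Area to the right of z = 1.96 (and, by symmetry, to the left of -1.96).
upper_tail = 1 - normal_cdf(1.96)
lower_tail = normal_cdf(-1.96)

print(round(upper_tail, 3), round(lower_tail, 3))  # 0.025 0.025
```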

Commonly, the Z value of 1.96 is used because it cuts off .025 of the area in each tail, or .05 in all. A very important use of Z derives from the fact that we can also convert a sample mean (rather than just a single individual value) to a Z score; in that case the standard error of the mean takes the place of the standard deviation in the denominator. A Z score can also be calculated for the difference between two means. This becomes very useful later on when we talk about confidence intervals in Section 3. But often our sample is not large enough. We can obtain the probability of getting certain t values similarly to the way we obtained probabilities of Z values, from an appropriate table.

But it happens that while the t distribution looks like a normal Z distribution, it is just a little different, thereby giving slightly different probabilities.


In fact, there are many t distributions, not just one as for Z; there is a different t distribution for each sample size. More will be explained about this in Section 3. In our example, where we have a mean based on 25 cases, we would need a t value of 2.06 (rather than a Z of 1.96) for significance at the .05 level. Translating this back to the scale of sample means, the resulting interval is slightly wider than the one based on Z. This may seem like nit-picking, since the differences are so small.
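The t critical values quoted in this section can be checked by integrating the t density numerically. This is a sketch using only the standard library; in practice a statistics package's t functions would be used:

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def tail_area(crit, df, hi=60.0, steps=50_000):
    """Area under the t density to the right of crit (trapezoidal rule)."""
    h = (hi - crit) / steps
    total = 0.5 * (t_pdf(crit, df) + t_pdf(hi, df))
    total += sum(t_pdf(crit + i * h, df) for i in range(1, steps))
    return total * h

# With 24 df, t = 2.06 cuts off about 2.5% in the upper tail,
# just as z = 1.96 does for the normal distribution.
print(round(tail_area(2.064, 24), 3))  # 0.025
print(round(tail_area(2.776, 4), 3))   # 0.025
```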

In fact, as the sample size approaches infinity, the t distribution becomes exactly like the Z distribution, but the differences between Z and t get larger as the sample size gets smaller, and it is always safe to use the t distribution. For example, for a mean based on five cases, the t value would be 2.78. Some t values are tabled in Appendix A; more detailed tables are in standard statistics books. Here are the points to remember: (1) we are always interested in estimating population values from samples.

We of course don't know the actual population values, but if we have very large samples, we can estimate them quite well from our sample data.



This is not a particularly good assumption in small samples. The exact multiplying factor depends on how large the sample is. If the sample is very large, we would multiply the s.e. by 1.96 to obtain the 95% confidence limits; if the sample is smaller, we should look up the multiplier in tables of t values, which appear in many texts. Some t values are shown in Appendix A; also refer back to Section 3. Note that for a given sample size we trade off degree of certainty for size of the interval.

We can be more certain that our true mean lies within a wider range, but if we want to pin down the range more precisely, we are less certain about it (Figure 3). To achieve more precision and maintain a high probability of being correct in estimating the true mean, it is necessary to increase the sample size. The main point here is that when you report a sample mean as an estimate of a population mean, it is most desirable to report the confidence limits. The interval between these limits is called the confidence interval. Of course, in real life we only take one sample and construct confidence intervals from it.

As you can see, we never know anything for sure. When we have one sample, in order to find the appropriate t value to calculate the confidence limits, we enter the tables with n − 1 degrees of freedom, where n is the sample size. A confidence interval can also be constructed for a proportion, such as the mortality rate in the anticoagulant study described in Section 3. A proportion may assume values along the continuum between 0 and 1, and we can construct a confidence interval around a proportion in a similar way to constructing confidence intervals around means.
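Putting the pieces together, a 95% confidence interval for a mean uses the standard error and the t multiplier for n − 1 degrees of freedom. The sample below is invented for illustration, and the t value of 2.064 (24 df) is taken from a standard table:

```python
import statistics

# Hypothetical sample of 25 measurements (e.g., ages), illustrative only.
sample = [28, 31, 30, 27, 33, 29, 32, 30, 28, 31,
          34, 26, 29, 30, 32, 27, 31, 33, 28, 30,
          29, 31, 30, 32, 27]
n = len(sample)

mean = statistics.mean(sample)
sd   = statistics.stdev(sample)        # sample s.d. (n - 1 denominator)
se   = sd / n ** 0.5                   # standard error of the mean

t_mult = 2.064                         # t for a 95% CI with n - 1 = 24 df
lower, upper = mean - t_mult * se, mean + t_mult * se
print(round(lower, 2), round(upper, 2))
```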

To calculate the standard error of a proportion, we must first calculate the standard deviation of a proportion and divide it by the square root of n. Remember that this refers to the population from which our sample was drawn; we cannot generalize this to all women having a heart attack. The multiplier is the Z value that corresponds to the desired level of confidence. As an example, consider that we have a sample of 25 female and 25 male medical students. The mean I.Q. of each sample is calculated and a confidence interval is constructed around the difference between the means. This interval includes a difference of 0, so we would have to conclude that the difference in I.Q. between the sexes is not significant. We may wish to determine, for instance, whether (1) administering a certain drug lowers blood pressure, (2) drug A is more effective than drug B in lowering blood sugar levels, or (3) teaching first-grade children to read by method I produces higher reading achievement scores at the end of the year than teaching them to read by method II.

The critical value is the minimum value of the test statistic that we must get in order to reject the null hypothesis at a given level of significance. The critical value of Z that we need to reject H0 at the .05 level (two-tailed) is 1.96. The value we obtained exceeds 3, which is clearly a large enough Z to reject H0 at the .05 level; it is also large enough to reject H0 at the more stringent .01 level, for which the critical value of Z is 2.58. Note that we came to the same conclusion using the chi-square test in Section 3.

Some values appear in Appendix A. Or we could use a test between proportions as described in Section 3. We now discuss a method of comparing two groups when the measure of interest is a continuous variable. Let us take as an example the comparison of the ages at first pregnancy of two groups of women: those who are lawyers and those who are paralegals. Such a study might be of sociological interest, or it might be of interest to law firms, or perhaps to a baby foods company that is seeking to focus its advertising strategy more effectively.


Assuming we have taken proper samples of each group, we now have two sets of values: the ages of the lawyers (group A) and the ages of the paralegals (group B), and we have a mean age for each sample. We are subject to the same kinds of type I and type II errors we discussed before. The general approach is as follows. We know there is variability of the scores within group A around the mean for group A and within group B around the mean for group B, simply because even within a given population, people vary.

What we want to find out is whether the variability between the two sample means around the grand mean of all the scores is greater than the variability of the ages within the groups around their own means. If there is as much variability within the groups as between the groups, then they probably come from the same population. The appropriate test here is the t-test. We calculate a value known as t, which is equal to the difference between the two sample means divided by an appropriate standard error. The appropriate standard error is called the standard error of the difference between two means.

If this probability is small (i.e., it is unlikely that a difference this large would arise by chance), we reject the null hypothesis. This statistical test is performed to compare the means of two groups under the assumption that both samples are random, independent, and come from normally distributed populations with unknown but equal variances. When the direction of the difference is specified, it is called a one-tailed test. More on this topic appears in Section 5. The numerator of t is the difference between the two means. In our example the resulting t exceeded its critical value, so we reject the null hypothesis of no difference, accept the alternate hypothesis, and conclude that the lawyers are older at first pregnancy than the paralegals.
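The pooled-variance t statistic described above can be sketched as follows; the ages are invented for illustration, not the study data:

```python
import statistics

# Hypothetical ages at first pregnancy (illustrative only).
group_a = [31, 33, 29, 34, 32, 30, 35, 31, 33, 32]   # lawyers
group_b = [28, 27, 29, 26, 30, 28, 27, 29, 28, 26]   # paralegals

na, nb = len(group_a), len(group_b)
ma, mb = statistics.mean(group_a), statistics.mean(group_b)
va, vb = statistics.variance(group_a), statistics.variance(group_b)

# Pooled variance assumes the two populations share a common variance.
pooled_var = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
se_diff = (pooled_var * (1 / na + 1 / nb)) ** 0.5   # s.e. of the difference

t = (ma - mb) / se_diff
print(round(t, 2))   # compare with the critical t for na + nb - 2 = 18 df
```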

This situation arises when you take two measures on the same individual. For instance, suppose group A represents reading scores of a group of children taken at time 1. These children have then been given special instruction in reading over a period of six months, and their reading achievement is again measured at time 2 to see if they have made any gains. In such a situation you would use a matched-pair t-test. Alternate Hypothesis: The mean difference is greater than 0.

Or it may be that the study was not large enough to detect a difference, and we have committed a type II error. When the actual difference between matched pairs is not in itself a meaningful number, but the researcher can rank the difference scores as being larger or smaller for given pairs, the appropriate test is the Wilcoxon matched-pairs rank sums test.

This is known as a nonparametric test, and along with other such tests it is described with exquisite clarity in the classic book by Sidney Siegel, Nonparametric Statistics for the Behavioral Sciences, listed in the Suggested Readings. When there are three or more group means to be compared, the t-test is not appropriate.

To understand why, we need to invoke our knowledge of combining probabilities from Section 2. Suppose you are testing the effects of three different treatments for high blood pressure. Patients in one group (A) receive one medication, a diuretic; patients in group B receive another medication, a beta-blocker; and patients in group C receive a placebo pill.

You want to know whether either drug is better than placebo in lowering blood pressure and whether the two drugs differ from each other in their blood-pressure-lowering effect. There are three comparisons that can be made: group A versus group C (to see if the diuretic is better than placebo), group B versus group C (to see if the beta-blocker is better than placebo), and group A versus group B (to see which of the two active drugs has more effect).

We set our significance level at .05, so in any single comparison the probability of not committing a type I error is .95. We are looking here at the joint occurrence of three events (the three ways of not committing a type I error), and we combine these probabilities by multiplying the individual probabilities: .95 × .95 × .95 = .857. So now we know that the overall probability of not committing a type I error in any of the three possible comparisons is .857. Therefore, the probability of committing such an error is 1 minus the probability of not committing it, or 1 − .857 = .143.

Thus, the overall probability of a type I error would be considerably greater than the .05 we intended. In actual fact, the numbers are a little different because the three comparisons are not independent events, since the same groups are used in more than one comparison, so combining probabilities in this situation would not involve the simple multiplication rule for the joint occurrence of independent events. However, it is close enough to illustrate the point that making multiple comparisons in the same experiment results in quite a different significance level.
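The multiplication argument above can be checked in a couple of lines; this assumes, as the text notes, that the three comparisons were independent, which is only approximately true here:

```python
alpha = 0.05   # significance level used for each single comparison
k = 3          # number of pairwise comparisons among the three groups

# Probability of avoiding a type I error in all k comparisons at once.
p_no_error = (1 - alpha) ** k          # .95 * .95 * .95
# Probability of at least one false positive somewhere among the comparisons.
p_at_least_one = 1 - p_no_error        # well above the intended 0.05
```

With three comparisons the familywise error rate is already about .14, nearly three times the nominal level.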

When there are more than three groups to compare, the situation gets worse, and the appropriate technique is the analysis of variance (ANOVA). An example might be comparing the blood-pressure-reduction effects of the three drugs. Under the null hypothesis we would have the following situation: there would be one big population, and if we picked samples of a given size from that population we would have a bunch of sample means that would vary due to chance around the grand mean of the whole population.

If it turns out they vary around the grand mean more than we would expect just by chance alone, then perhaps something other than chance is operating. Perhaps they don't all come from the same population. Perhaps something distinguishes the groups we have picked. We would then reject the null hypothesis that all the means are equal and conclude that the means differ from each other by more than just chance.

Essentially, we want to know whether the variability of all the group means is substantially greater than the variability within each of the groups around their own means. We calculate a quantity known as the between-groups variance, which is the variability of the group means around the grand mean of all the data. We calculate another quantity called the within-groups variance, which is the variability of the scores within each group around its own mean. One of the assumptions of the analysis of variance is that the extent of the variability of individuals within groups is the same for each of the groups, so we can pool the estimates of the individual within-group variances to obtain a more reliable estimate of the overall within-groups variance.

If there is as much variability of individuals within the groups as there is variability of means between the groups, the means probably come from the same population. This would be consistent with the hypothesis of no true difference among means; that is, we could not reject the null hypothesis of no difference among means. The ratio of the between-groups variance to the within-groups variance is known as the F ratio.

Values of the F distribution appear in tables in many statistical texts, and if the value obtained from our experiment is greater than the critical value that is tabled, we can then reject the hypothesis of no difference. There are different critical values of F depending on how many groups are compared and on how many scores there are in each group.
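The between-groups and within-groups mean squares described above can be sketched directly; the function name and the blood-pressure-drop numbers below are invented for illustration:

```python
from statistics import mean

def f_ratio(groups):
    """Between-groups over within-groups mean square for a one-way ANOVA."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    k = len(groups)
    n_total = len(all_scores)
    # Between-groups: variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ms_between = ss_between / (k - 1)          # numerator df = k - 1
    # Within-groups: pooled variation of scores around their own group means.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_within = ss_within / (n_total - k)      # denominator df = N - k
    return ms_between / ms_within

# Hypothetical drops in diastolic pressure for three treatment groups.
f = f_ratio([[12, 10, 14], [9, 8, 10], [3, 4, 5]])
```

The obtained F is then compared with the tabled critical value for (k − 1, N − k) degrees of freedom.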

To read the tables of F, one must know the two values of degrees of freedom (df). The df corresponding to the between-groups variance, which is the numerator of the F ratio, is equal to k − 1, where k is the number of groups; the df corresponding to the within-groups variance, the denominator, is N − k, where N is the total number of observations. For our example, an F ratio would have to be at least about 3 before we could reject the null hypothesis.

If there were four groups being compared, then the numerator degrees of freedom would be 3, and the critical value of F would need to be at least 2 (the exact value depends on the denominator degrees of freedom). We will not present here the actual calculations necessary to do an F test because nowadays these are rarely done by hand. There are a large number of programs available for personal computers that can perform F tests, t-tests, and most other statistical analyses.

However, shown below is the kind of output that can be expected from these programs. The TAIM study was designed to evaluate the effect of diet and drugs, used alone or in combination with each other, to treat overweight persons with mild hypertension (high blood pressure).

[Table: drug group (A. Diuretic, B. Beta-blocker, C. Placebo), n, mean drop in diastolic blood pressure after 6 months of treatment, and standard deviation.]

The analysis-of-variance output reports a mean square for between-groups and a mean square for within-groups. For between-groups, it is the variation of the group means around the grand mean, while for within-groups it is the pooled estimate of the variation of the individual scores around their respective group means. The within-groups mean square is also called the error mean square.

An important point is that the square root of the error mean square is the pooled estimate of the within-groups standard deviation; it is roughly equivalent to the average standard deviation of the groups. F is the ratio of the between-groups to the within-groups mean square, and in this example the F ratio is statistically significant. However, we do not know where the difference lies. Is group A different from group C but not from group B?

We should not simply make all the pairwise comparisons possible, because of the problem of multiple comparisons discussed above.

But there are ways to handle this problem. One of them is the Bonferroni procedure, described in the next section. The Bonferroni procedure rests on the following fact: if, for example, we make five comparisons, each tested at the .01 level, the probability that none of the five p values falls below .01 by chance alone is at least .95. That means that there is a probability of up to .05 that at least one comparison will appear significant purely by chance.

To get around this, we divide the chosen overall significance level by the number of pairwise comparisons to be made, consider this value to be the significance level for any single comparison, and reject the null hypothesis of no difference only if a comparison achieves this new significance level. For example, if we want an overall significance level of .05 and we are making five comparisons, each comparison must achieve a p value of .05/5 = .01 before we declare it significant. The Bonferroni procedure does not require a prior F test.
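The division rule is simple enough to express directly; the function names and the p values below are invented for illustration:

```python
def bonferroni_alpha(overall_alpha, n_comparisons):
    """Per-comparison significance level under the Bonferroni correction."""
    return overall_alpha / n_comparisons

def bonferroni_significant(p_values, overall_alpha=0.05):
    """Flag each comparison significant only if its p falls below alpha / k."""
    threshold = bonferroni_alpha(overall_alpha, len(p_values))
    return [p < threshold for p in p_values]

# Three hypothetical pairwise p values tested at an overall .05 level:
flags = bonferroni_significant([0.004, 0.03, 0.2])
```

Note that a p value of .03, nominally "significant," fails the stricter per-comparison threshold of .05/3 ≈ .017.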

Let us apply the Bonferroni procedure to our data. First we compare each of the drugs to placebo. We calculate the t for the difference between the means of group A and group C, using the square root of the error mean square from the analysis of variance as an estimate of the common standard deviation. The resulting t exceeds the Bonferroni criterion, so we can safely say the diuretic reduces blood pressure more than the placebo. The same holds true for the comparison between the beta-blocker and placebo.

It might be tempting to declare a significant difference at the usual .05 level, but under the Bonferroni criterion we cannot. Recently,12 there has been some questioning of the routine adjustment for multiple comparisons on the grounds that we may thereby commit more type II errors and miss important effects. In any case, p levels should be reported so that the informed reader may evaluate the evidence. There may, however, be another factor that classifies individuals, and in that case we would have a two-way, or two-factor, ANOVA. In the experiment we used as an example, patients were assigned to one of the three drugs noted above, as well as to one of three diet regimens: weight reduction, sodium (salt) restriction, or no change from their usual diet, which is analogous to a placebo diet condition.

The diagram below illustrates this two-factor design, with the mean drop in blood pressure in each group, as well as the numbers of cases in each group, shown in parentheses. We now explain the concept of interaction. For example, maybe one drug produces better effects when combined with a weight-reduction diet than when combined with a sodium-restricted diet. There may not be a significant effect of that drug when all diet groups are lumped together, but if we look at the effects separately for each diet group we may discover an interaction between the two factors: diet and drug.

The diagrams below illustrate the concept of interaction effects. WR means weight reduction and SR means sodium (salt) restriction. If we just compared the average for drug A, combining diets, with the average for drug B, we would have to say there is no difference between drug A and drug B; but if we look at the two diets separately, we see quite different effects of the two drugs.

In example 2, there is no difference between the two drugs for those who restrict salt, but drug A has less effect than drug B for those in weight reduction. In example 3, there is no interaction: there is an equal effect for both diets; the two lines are parallel and their slopes are the same.

Drug B is better than drug A both for those in weight reduction and for those in salt restriction. Thus, we can again use the square root of the error mean square, which is about 8, as the pooled standard deviation. We have already made the three pairwise comparisons among drugs by t-tests for the difference between two means (i.e., diuretic versus placebo, beta-blocker versus placebo, and diuretic versus beta-blocker); we can do the same for the three diets. Their mean values are displayed below. [Table: diet group (weight reduction, sodium restriction, usual diet), n, and mean drop in diastolic blood pressure.] The t for this pairwise comparison is about 2, and it must be judged against the Bonferroni-adjusted significance level.

Often, however, we must deal with situations in which we want to compare several groups on a variable that does not meet all of the above conditions. This might be a case where we can say one person is better than another, but we can't say exactly how much better. In such a case we would rank people and compare the groups by using the Kruskal–Wallis test to determine whether it is likely that all the groups come from a common population.

This test is analogous to the one-way analysis of variance, but instead of using the original scores it uses their rankings. It is called a nonparametric test. This test is available in many computer programs, but an example appears in Appendix C.

Is there an association between poverty and drug addiction?

Is emotional stress associated with cardiovascular disease? To determine association, we must first quantify both variables. For instance, emotional stress may be quantified by using an appropriate psychological test of stress or by clearly defining, evaluating, and rating on a scale the stress factor in an individual's life situation, whereas hypertension (defined as a blood-pressure reading) may be considered the particular aspect of cardiovascular disease to be studied.

When the variables have been quantified, a measure of association needs to be calculated to determine the strength of the relationship. The method of calculation appears in Appendix D. A correlation coefficient of 0 indicates no relationship at all; an example of this might be the correlation between blood pressure and the number of hairs on the head. A correlation coefficient of +1 indicates a perfect positive relationship; this kind of correlation exists only in deterministic models, where there is really a functional relationship. An example might be the correlation between the age of a tree and the number of rings it has. A correlation coefficient of −1 indicates a perfect inverse relationship, where a high score on one variable means a low score on the other and where, as in perfect positive correlation, there is no error of measurement.
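A minimal sketch of the Pearson correlation coefficient (the usual measure of linear association) is shown below; the function name is ours, and the toy data simply demonstrate the +1 and −1 extremes described above:

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    # Sum of cross-products of deviations, over the product of the
    # square roots of the two sums of squared deviations.
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x)
                      * sum((b - my) ** 2 for b in y))

# A perfect positive and a perfect inverse relationship:
r_pos = pearson_r([1, 2, 3], [2, 4, 6])   # +1
r_neg = pearson_r([1, 2, 3], [6, 4, 2])   # -1
```

Real data fall between these extremes, with the coefficient shrinking as measurement error and scatter increase.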

These correlation coefficients apply when the basic relationship between the two variables is linear. Consider a group of people for each of whom we have a measurement of weight and height; we will find that we can draw a straight line through the points that fits them reasonably well. There is a linear association between weight and height, and the correlation coefficient would be positive but less than 1. The diagrams in Figure 3 illustrate various degrees of correlation. How high must a correlation be to be considered meaningful? The answer depends upon the field of application as well as on many other factors. Among psychological variables, which are difficult to measure precisely and are affected by many other variables, the correlations are generally (though not necessarily) lower than among biological variables, where more accurate measurement is possible.

The following example may give you a feel for the orders of magnitude. The correlations between verbal aptitude and nonverbal aptitude, as measured for Philadelphia schoolchildren by standardized national tests, are moderate. Note that only in diagrams (1), (2), and (6) does the correlation between W and B arise from a causal relationship between the two variables.

In diagram (1), W entirely determines B; in diagram (2), W is a partial cause of B; in diagram (6), W is one of several determinants of B. In all of the other structural relationships, the correlation between W and B arises from common influences on both variables. Thus, it must be stressed that the existence of a correlation between two variables does not necessarily imply causation. Correlations may arise because one variable is a partial cause of another or because the two correlated variables have a common cause.

Other factors, such as sampling and the variation in the two populations, also affect the size of the correlation coefficient, so care must be taken in interpreting these coefficients. Lines drawn through the points of a scattergram to describe the relationship are called regression lines. In the top scattergram, labeled (a), Y (weight) is the dependent variable and X (height) is the independent variable. We say that weight is a function of height: the line is described by the equation Y = a + bX. The quantity a is the intercept; it is where the line crosses the Y axis. The quantity b is the slope, the rate of change in Y for a unit change in X.

If the slope is 0, we have a straight line parallel to the X axis, as in illustration (d). It also means that we cannot predict Y from a knowledge of X, since there is no relationship between Y and X. If we have the situation shown in scattergram (b) or (c), we know exactly how Y changes when X changes, and we can perfectly predict Y from a knowledge of X with no error. In scattergram (a), we can see that as X increases Y increases, but we can't predict Y perfectly because the points are scattered around the line we have drawn.

We can, however, put confidence limits around our prediction, but first we must determine the form of the line we should draw through the points, that is, estimate the values for the intercept and slope. These are chosen so that the sum of the squared vertical distances of the points from the line is as small as possible; this is called the least-squares fit. Consider the data below, where Y could be a score on one test and X could be a score on another test. Most statistical computer packages provide a linear regression program that does these calculations. The intercept a is 2.

The slope b is likewise computed from the data. It is the sum of these squared distances that is smaller for this line than it would be for any other line we might draw. The correlation coefficient for these data can also be calculated, and the square of the correlation coefficient, r², can be interpreted as the proportion of the variance in Y that is explained by X.
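The least-squares estimates have a closed form, which can be sketched as follows; the function name and the test-score pairs are invented for illustration:

```python
from statistics import mean

def least_squares(x, y):
    """Intercept and slope minimizing the sum of squared vertical distances."""
    mx, my = mean(x), mean(y)
    # Slope: sum of cross-products of deviations over the sum of
    # squared deviations of X.
    b = (sum((a - mx) * (c - my) for a, c in zip(x, y))
         / sum((a - mx) ** 2 for a in x))
    a_int = my - b * mx   # the fitted line passes through (mean of X, mean of Y)
    return a_int, b

# Hypothetical scores on two tests: Y rises by 2 for each unit of X.
intercept, slope = least_squares([1, 2, 3, 4], [3, 5, 7, 9])
```

Any other choice of intercept and slope would give a larger sum of squared vertical distances for these points.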

In our example, r² tells us what fraction of the variability in the Y scores is accounted for by the X scores. The same idea extends to multiple linear regression, where Y is predicted from several independent variables; we can have as many variables as appropriate, where the last variable is the kth variable. Note that a variable such as family history of high blood pressure is not a continuous variable; it can be either yes or no. Statistical computer programs usually include multiple linear regression.