Hypothesis Testing

· From the textbook, Business Statistics in Practice, read the following chapters:

· Hypothesis Testing

· Decision Theory

 

CHAPTER 9: Hypothesis Testing

Chapter Outline

9.1 The Null and Alternative Hypotheses and Errors in Hypothesis Testing

9.2 z Tests about a Population Mean: σ Known

9.3 t Tests about a Population Mean: σ Unknown

9.4 z Tests about a Population Proportion

9.5 Type II Error Probabilities and Sample Size Determination (Optional)

9.6 The Chi-Square Distribution (Optional)

9.7 Statistical Inference for a Population Variance (Optional)

Hypothesis testing is a statistical procedure used to provide evidence in favor of some statement (called a hypothesis). For instance, hypothesis testing might be used to assess whether a population parameter, such as a population mean, differs from a specified standard or previous value. In this chapter we discuss testing hypotheses about population means, proportions, and variances.

In order to illustrate how hypothesis testing works, we revisit several cases introduced in previous chapters and also introduce some new cases:

The Payment Time Case: The consulting firm uses hypothesis testing to provide strong evidence that the new electronic billing system has reduced the mean payment time by more than 50 percent.

The Cheese Spread Case: The cheese spread producer uses hypothesis testing to supply extremely strong evidence that fewer than 10 percent of all current purchasers would stop buying the cheese spread if the new spout were used.

The Electronic Article Surveillance Case: A company that sells and installs EAS systems claims that at most 5 percent of all consumers would never shop in a store again if the store subjected them to a false EAS alarm. A store considering the purchase of such a system uses hypothesis testing to provide extremely strong evidence that this claim is not true.

The Trash Bag Case: A marketer of trash bags uses hypothesis testing to support its claim that the mean breaking strength of its new trash bag is greater than 50 pounds. As a result, a television network approves use of this claim in a commercial.

The Valentine’s Day Chocolate Case: A candy company projects that this year’s sales of its special valentine box of assorted chocolates will be 10 percent higher than last year. The candy company uses hypothesis testing to assess whether it is reasonable to plan for a 10 percent increase in sales of the valentine box.

 9.1: The Null and Alternative Hypotheses and Errors in Hypothesis Testing

One of the authors’ former students is employed by a major television network in the standards and practices division. One of the division’s responsibilities is to reduce the chances that advertisers will make false claims in commercials run on the network. Our former student reports that the network uses a statistical methodology called hypothesis testing to do this.

To see how this might be done, suppose that a company wishes to advertise a claim, and suppose that the network has reason to doubt that this claim is true. The network assumes for the sake of argument that the claim is not valid. This assumption is called the null hypothesis. The statement that the claim is valid is called the alternative, or research, hypothesis. The network will run the commercial only if the company making the claim provides sufficient sample evidence to reject the null hypothesis that the claim is not valid in favor of the alternative hypothesis that the claim is valid. Explaining the exact meaning of sufficient sample evidence is quite involved and will be discussed in the next section.

The Null Hypothesis and the Alternative Hypothesis

In hypothesis testing:

1 The null hypothesis, denoted H0, is the statement being tested. Usually this statement represents the status quo and is not rejected unless there is convincing sample evidence that it is false.

2 The alternative, or research, hypothesis, denoted Ha, is a statement that will be accepted only if there is convincing sample evidence that it is true.

Setting up the null and alternative hypotheses in a practical situation can be tricky. In some situations there is a condition for which we need to attempt to find supportive evidence. We then formulate (1) the alternative hypothesis to be the statement that this condition exists and (2) the null hypothesis to be the statement that this condition does not exist. To illustrate this, we consider the following case studies.

EXAMPLE 9.1: The Trash Bag Case

A leading manufacturer of trash bags produces the strongest trash bags on the market. The company has developed a new 30-gallon bag using a specially formulated plastic that is stronger and more biodegradable than other plastics. This plastic’s increased strength allows the bag’s thickness to be reduced, and the resulting cost savings will enable the company to lower its bag price by 25 percent. The company also believes the new bag is stronger than its current 30-gallon bag.

The manufacturer wants to advertise the new bag on a major television network. In addition to promoting its price reduction, the company also wants to claim the new bag is better for the environment and stronger than its current bag. The network is convinced of the bag’s environmental advantages on scientific grounds. However, the network questions the company’s claim of increased strength and requires statistical evidence to justify this claim. Although there are various measures of bag strength, the manufacturer and the network agree to employ “breaking strength.” A bag’s breaking strength is the amount of a representative trash mix (in pounds) that, when loaded into a bag suspended in the air, will cause the bag to rip or tear. Tests show that the current bag has a mean breaking strength that is very close to (but does not exceed) 50 pounds. The new bag’s mean breaking strength μ is unknown and in question. The alternative hypothesis Ha is the statement for which we wish to find supportive evidence. Because we hope the new bags are stronger than the current bags, Ha says that μ is greater than 50. The null hypothesis states that Ha is false. Therefore, H0 says that μ is less than or equal to 50. We summarize these hypotheses by stating that we are testing

H0: μ ≤ 50   versus   Ha: μ > 50

The network will run the manufacturer’s commercial if a random sample of n new bags provides sufficient evidence to reject H0: μ ≤ 50 in favor of Ha: μ > 50.

 EXAMPLE 9.2: The Payment Time Case

Recall that a management consulting firm has installed a new computer-based, electronic billing system for a Hamilton, Ohio, trucking company. Because of the system’s advantages, and because the trucking company’s clients are receptive to using this system, the management consulting firm believes that the new system will reduce the mean bill payment time by more than 50 percent. The mean payment time using the old billing system was approximately equal to, but no less than, 39 days. Therefore, if μ denotes the mean payment time using the new system, the consulting firm believes that μ will be less than 19.5 days. Because it is hoped that the new billing system reduces mean payment time, we formulate the alternative hypothesis as Ha: μ < 19.5 and the null hypothesis as H0: μ ≥ 19.5. The consulting firm will randomly select a sample of n invoices and determine if their payment times provide sufficient evidence to reject H0: μ ≥ 19.5 in favor of Ha: μ < 19.5. If such evidence exists, the consulting firm will conclude that the new electronic billing system has reduced the Hamilton trucking company’s mean bill payment time by more than 50 percent. This conclusion will be used to help demonstrate the benefits of the new billing system both to the Hamilton company and to other trucking companies that are considering using such a system.

EXAMPLE 9.3: The Valentine’s Day Chocolate Case

A candy company annually markets a special 18-ounce box of assorted chocolates to large retail stores for Valentine’s Day. This year the candy company has designed an extremely attractive new valentine box and will fill the box with an especially appealing assortment of chocolates. For this reason, the candy company subjectively projects, based on past experience and knowledge of the candy market, that sales of its valentine box will be 10 percent higher than last year. However, since the candy company must decide how many valentine boxes to produce, the company needs to assess whether it is reasonable to plan for a 10 percent increase in sales.

Before the beginning of each Valentine’s Day sales season, the candy company sends large retail stores information about its newest valentine box of assorted chocolates. This information includes a description of the box of chocolates, as well as a preview of advertising displays that the candy company will provide to help retail stores sell the chocolates. Each retail store then places a single (nonreturnable) order of valentine boxes to satisfy its anticipated customer demand for the Valentine’s Day sales season. Last year the mean order quantity of large retail stores was 300 boxes per store. If the projected 10 percent sales increase occurs, the mean order quantity, μ, of large retail stores this year will be 330 boxes per store. Therefore, the candy company wishes to test the null hypothesis H0: μ = 330 versus the alternative hypothesis Ha: μ ≠ 330.

To perform the hypothesis test, the candy company will randomly select a sample of n large retail stores and will make an early mailing to these stores promoting this year’s valentine box. The candy company will then ask each retail store to report how many valentine boxes it anticipates ordering. If the sample data do not provide sufficient evidence to reject H0: μ = 330 in favor of Ha: μ ≠ 330, the candy company will base its production on the projected 10 percent sales increase. On the other hand, if there is sufficient evidence to reject H0: μ = 330, the candy company will change its production plans.

We next summarize the sets of null and alternative hypotheses that we have thus far considered.

Trash bag case: H0: μ ≤ 50 versus Ha: μ > 50

Payment time case: H0: μ ≥ 19.5 versus Ha: μ < 19.5

Valentine’s Day chocolate case: H0: μ = 330 versus Ha: μ ≠ 330

The alternative hypothesis Ha: μ > 50 is called a one-sided, greater than alternative hypothesis, whereas Ha: μ < 19.5 is called a one-sided, less than alternative hypothesis, and Ha: μ ≠ 330 is called a two-sided, not equal to alternative hypothesis. Many of the alternative hypotheses we consider in this book are one of these three types. Also, note that each null hypothesis we have considered involves an equality. For example, the null hypothesis H0: μ ≤ 50 says that μ is either less than or equal to 50. We will see that, in general, the approach we use to test a null hypothesis versus an alternative hypothesis requires that the null hypothesis involve an equality.

The idea of a test statistic

Suppose that in the trash bag case the manufacturer randomly selects a sample of n = 40 new trash bags. Each of these bags is tested for breaking strength, and the sample mean x̄ of the 40 breaking strengths is calculated. In order to test H0: μ ≤ 50 versus Ha: μ > 50, we utilize the test statistic

z = (x̄ − 50) / (σ/√n)

The test statistic z measures the distance between x̄ and 50. The division by σ/√n says that this distance is measured in units of the standard deviation of all possible sample means. For example, a value of z equal to, say, 2.4 would tell us that x̄ is 2.4 such standard deviations above 50. In general, a value of the test statistic that is less than or equal to zero results when x̄ is less than or equal to 50. This provides no evidence to support rejecting H0 in favor of Ha because the point estimate x̄ indicates that μ is probably less than or equal to 50. However, a value of the test statistic that is greater than zero results when x̄ is greater than 50. This provides evidence to support rejecting H0 in favor of Ha because the point estimate x̄ indicates that μ might be greater than 50. Furthermore, the farther the value of the test statistic is above 0 (the farther x̄ is above 50), the stronger is the evidence to support rejecting H0 in favor of Ha.
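To make the behavior of the test statistic concrete, the following minimal Python sketch computes z for a few sample means. The function name z_statistic, the sample means, and the value σ = 1.65 are assumptions chosen only for illustration; they are not results from the trash bag study.

```python
import math

def z_statistic(xbar, mu0, sigma, n):
    """Distance between xbar and mu0, measured in units of sigma / sqrt(n)."""
    standard_error = sigma / math.sqrt(n)  # standard deviation of all possible sample means
    return (xbar - mu0) / standard_error

# Hypothetical sample means (assumed values), with sigma = 1.65 and n = 40 assumed
for xbar in (49.8, 50.0, 50.3, 50.6):
    z = z_statistic(xbar, mu0=50, sigma=1.65, n=40)
    print(f"xbar = {xbar}: z = {z:.2f}")
```

Sample means at or below 50 give a z that is zero or negative (no evidence against H0), and z grows as the sample mean moves farther above 50.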

Hypothesis testing and the legal system

If the value of the test statistic z is far enough above 0, we reject H0 in favor of Ha. To see how large z must be in order to reject H0, we must understand that a hypothesis test rejects a null hypothesis H0 only if there is strong statistical evidence against H0. This is similar to our legal system, which rejects the innocence of the accused only if evidence of guilt is beyond a reasonable doubt. For instance, the network will reject H0: μ ≤ 50 and run the trash bag commercial only if the test statistic z is far enough above 0 to show beyond a reasonable doubt that H0: μ ≤ 50 is false and Ha: μ > 50 is true. A test statistic that is only slightly greater than 0 might not be convincing enough. However, because such a test statistic would result from a sample mean x̄ that is slightly greater than 50, it would provide some evidence to support rejecting H0: μ ≤ 50, and it certainly would not provide strong evidence supporting H0: μ ≤ 50. Therefore, if the value of the test statistic is not large enough to convince us to reject H0, we do not say that we accept H0. Rather we say that we do not reject H0 because the evidence against H0 is not strong enough. Again, this is similar to our legal system, where the lack of evidence of guilt beyond a reasonable doubt results in a verdict of not guilty, but does not prove that the accused is innocent.

Type I and Type II errors and their probabilities

To determine exactly how much statistical evidence is required to reject H0, we consider the errors and the correct decisions that can be made in hypothesis testing. These errors and correct decisions, as well as their implications in the trash bag advertising example, are summarized in Tables 9.1 and 9.2. Across the top of each table are listed the two possible “states of nature.” Either H0: μ ≤ 50 is true, which says the manufacturer’s claim that μ is greater than 50 is false, or H0 is false, which says the claim is true. Down the left side of each table are listed the two possible decisions we can make in the hypothesis test. Using the sample data, we will either reject H0: μ ≤ 50, which implies that the claim will be advertised, or we will not reject H0, which implies that the claim will not be advertised.

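Table 9.1: Type I and Type II Errors

Decision | State of nature: H0 true | State of nature: H0 false
Reject H0 | Type I error | Correct decision
Do not reject H0 | Correct decision | Type II error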

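Table 9.2: The Implications of Type I and Type II Errors in the Trash Bag Example

Decision | State of nature: H0: μ ≤ 50 true (claim is false) | State of nature: H0: μ ≤ 50 false (claim is true)
Reject H0 (advertise the claim) | Advertise a false claim (Type I error) | Advertise a true claim
Do not reject H0 (do not advertise the claim) | Do not advertise a false claim | Fail to advertise a true claim (Type II error)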

In general, the two types of errors that can be made in hypothesis testing are defined here:

Type I and Type II Errors

If we reject H0 when it is true, this is a Type I error.

If we do not reject H0 when it is false, this is a Type II error.

As can be seen by comparing Tables 9.1 and 9.2, if we commit a Type I error, we will advertise a false claim. If we commit a Type II error, we will fail to advertise a true claim.

We now let the symbol α (pronounced alpha) denote the probability of a Type I error, and we let β (pronounced beta) denote the probability of a Type II error. Obviously, we would like both α and β to be small. A common (but not the only) procedure is to base a hypothesis test on taking a sample of a fixed size (for example, n = 40 trash bags) and on setting α equal to a small prespecified value. Setting α low means there is only a small chance of rejecting H0 when it is true. This implies that we are requiring strong evidence against H0 before we reject it.

We sometimes choose α as high as .10, but we usually choose α between .05 and .01. A frequent choice for α is .05. In fact, our former student tells us that the network often tests advertising claims by setting the probability of a Type I error equal to .05. That is, the network will run a commercial making a claim if the sample evidence allows it to reject a null hypothesis that says the claim is not valid in favor of an alternative hypothesis that says the claim is valid with α set equal to .05. Since a Type I error is deciding that the claim is valid when it is not, the policy of setting α equal to .05 says that, in the long run, the network will advertise only 5 percent of all invalid claims made by advertisers.
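The long-run meaning of α can be checked with a small simulation. The sketch below assumes, purely for illustration, that breaking strengths are normally distributed with μ exactly equal to 50 (so H0 is true) and σ = 1.65. It repeatedly draws samples of n = 40 and records how often a right-tailed z test at α = .05 rejects H0. The function name, the number of trials, and the random seed are all hypothetical choices.

```python
import math
import random
from statistics import NormalDist

def simulate_type_i_rate(true_mu=50.0, mu0=50.0, sigma=1.65, n=40,
                         alpha=0.05, trials=20000, seed=1):
    """Fraction of simulated samples in which H0 is rejected."""
    rng = random.Random(seed)
    critical = NormalDist().inv_cdf(1 - alpha)  # right-tail cutoff for z
    standard_error = sigma / math.sqrt(n)
    rejections = 0
    for _ in range(trials):
        # Draw one sample of n breaking strengths and compute its mean
        xbar = sum(rng.gauss(true_mu, sigma) for _ in range(n)) / n
        z = (xbar - mu0) / standard_error
        if z > critical:
            rejections += 1
    return rejections / trials

rate = simulate_type_i_rate()
print(f"Proportion of wrongly rejected true null hypotheses: {rate:.3f}")
```

Under these assumptions the printed proportion should be close to .05, which is the sense in which only about 5 percent of invalid claims would be advertised.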

One might wonder why the network does not set α lower, say at .01. One reason is that it can be shown that, for a fixed sample size, the lower we set α, the higher is β, and the higher we set α, the lower is β. Setting α at .05 means that β, the probability of failing to advertise a true claim (a Type II error), will be smaller than it would be if α were set at .01. As long as (1) the claim to be advertised is plausible and (2) the consequences of advertising the claim even if it is false are not terribly serious, then it is reasonable to set α equal to .05. However, if either (1) or (2) is not true, then we might set α lower than .05. For example, suppose a pharmaceutical company wishes to advertise that it has developed an effective treatment for a disease that has formerly been very resistant to treatment. Such a claim is (perhaps) difficult to believe. Moreover, if the claim is false, patients suffering from the disease would be subjected to false hope and needless expense. In such a case, it might be reasonable for the network to set α at .01 because this would lower the chance of advertising the claim if it is false. We usually do not set α lower than .01 because doing so often leads to an unacceptably large value of β. We explain some methods for computing the probability of a Type II error in optional Section 9.5. However, β can be difficult or impossible to calculate in many situations, and we often must rely on our intuition when deciding how to set α.
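The trade-off between α and β can be illustrated numerically. The sketch below is not the method developed later in the chapter; it is a minimal calculation that assumes a right-tailed z test with known σ = 1.65 and n = 40, a normal sampling distribution, and a hypothetical true mean of 50.5 pounds. Under those assumptions, β is the probability that z falls at or below the critical value when μ = 50.5.

```python
import math
from statistics import NormalDist

def beta(alpha, mu_alt, mu0=50.0, sigma=1.65, n=40):
    """Probability of not rejecting H0: mu <= mu0 when the true mean is mu_alt."""
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1 - alpha)          # cutoff for the test statistic z
    shift = (mu_alt - mu0) / (sigma / math.sqrt(n))   # mean of z when mu = mu_alt
    return std_normal.cdf(critical - shift)

# mu_alt = 50.5 is an assumed alternative value, chosen only for illustration
for alpha in (0.05, 0.01):
    print(f"alpha = {alpha}: beta = {beta(alpha, mu_alt=50.5):.3f}")
```

For the same sample size and the same assumed alternative, the smaller α gives the larger β, which is the relationship described above.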

Exercises for Section 9.1

CONCEPTS

9.1 Which hypothesis (the null hypothesis, H0, or the alternative hypothesis, Ha) is the “status quo” hypothesis (that is, the hypothesis that states that things are remaining “as is”)? Which hypothesis is the hypothesis that says that a “hoped for” or “suspected” condition exists?

9.2 Which hypothesis (H0 or Ha) is not rejected unless there is convincing sample evidence that it is false? Which hypothesis (H0 or Ha) will be accepted only if there is convincing sample evidence that it is true?

9.3 Define each of the following:

a Type I error

b Type II error

c α

d β

9.4 For each of the following situations, indicate whether an error has occurred and, if so, indicate what kind of error (Type I or Type II) has occurred.

a We do not reject H0 and H0 is true.

b We reject H0 and H0 is true.

c We do not reject H0 and H0 is false.

d We reject H0 and H0 is false.

9.5 If we reject H0, what is the only type of error that we could be making? Explain.

9.6 If we do not reject H0, what is the only type of error that we could be making? Explain.

9.7 When testing a hypothesis, why don’t we set the probability of a Type I error to be extremely small? Explain.

METHODS AND APPLICATIONS

9.8 THE VIDEO GAME SATISFACTION RATING CASE  VideoGame

Recall that “very satisfied” customers give the XYZ-Box video game system a rating that is at least 42. Suppose that the manufacturer of the XYZ-Box wishes to use the random sample of 65 satisfaction ratings to provide evidence supporting the claim that the mean composite satisfaction rating for the XYZ-Box exceeds 42.

a Letting μ represent the mean composite satisfaction rating for the XYZ-Box, set up the null and alternative hypotheses needed if we wish to attempt to provide evidence supporting the claim that μ exceeds 42.

b In the context of this situation, interpret making a Type I error; interpret making a Type II error.

9.9 THE BANK CUSTOMER WAITING TIME CASE  WaitTime

Recall that a bank manager has developed a new system to reduce the time customers spend waiting for teller service during peak hours. The manager hopes the new system will reduce waiting times from the current 9 to 10 minutes to less than 6 minutes.

Suppose the manager wishes to use the random sample of 100 waiting times to support the claim that the mean waiting time under the new system is shorter than six minutes.

a Letting μ represent the mean waiting time under the new system, set up the null and alternative hypotheses needed if we wish to attempt to provide evidence supporting the claim that μ is shorter than six minutes.

b In the context of this situation, interpret making a Type I error; interpret making a Type II error.

9.10 An automobile parts supplier owns a machine that produces a cylindrical engine part. This part is supposed to have an outside diameter of three inches. Parts with diameters that are too small or too large do not meet customer requirements and must be rejected. Lately, the company has experienced problems meeting customer requirements. The technical staff feels that the mean diameter produced by the machine is off target. In order to verify this, a special study will randomly sample 100 parts produced by the machine. The 100 sampled parts will be measured, and if the results obtained cast a substantial amount of doubt on the hypothesis that the mean diameter equals the target value of three inches, the company will assign a problem-solving team to intensively search for the causes of the problem.

a The parts supplier wishes to set up a hypothesis test so that the problem-solving team will be assigned when the null hypothesis is rejected. Set up the null and alternative hypotheses for this situation.

b In the context of this situation, interpret making a Type I error; interpret making a Type II error.

c Suppose it costs the company $3,000 a day to assign the problem-solving team to a project. Is this $3,000 figure the daily cost of a Type I error or a Type II error? Explain.

9.11 The Crown Bottling Company has just installed a new bottling process that will fill 16-ounce bottles of the popular Crown Classic Cola soft drink. Both overfilling and underfilling bottles are undesirable: Underfilling leads to customer complaints and overfilling costs the company considerable money. In order to verify that the filler is set up correctly, the company wishes to see whether the mean bottle fill, μ, is close to the target fill of 16 ounces. To this end, a random sample of 36 filled bottles is selected from the output of a test filler run. If the sample results cast a substantial amount of doubt on the hypothesis that the mean bottle fill is the desired 16 ounces, then the filler’s initial setup will be readjusted.

a The bottling company wants to set up a hypothesis test so that the filler will be readjusted if the null hypothesis is rejected. Set up the null and alternative hypotheses for this hypothesis test.

b In the context of this situation, interpret making a Type I error; interpret making a Type II error.

9.12 Consolidated Power, a large electric power utility, has just built a modern nuclear power plant. This plant discharges waste water that is allowed to flow into the Atlantic Ocean. The Environmental Protection Agency (EPA) has ordered that the waste water may not be excessively warm so that thermal pollution of the marine environment near the plant can be avoided. Because of this order, the waste water is allowed to cool in specially constructed ponds and is then released into the ocean. This cooling system works properly if the mean temperature of waste water discharged is 60°F or cooler. Consolidated Power is required to monitor the temperature of the waste water. A sample of 100 temperature readings will be obtained each day, and if the sample results cast a substantial amount of doubt on the hypothesis that the cooling system is working properly (the mean temperature of waste water discharged is 60°F or cooler), then the plant must be shut down and appropriate actions must be taken to correct the problem.

a Consolidated Power wishes to set up a hypothesis test so that the power plant will be shut down when the null hypothesis is rejected. Set up the null and alternative hypotheses that should be used.

b In the context of this situation, interpret making a Type I error; interpret making a Type II error.

c The EPA periodically conducts spot checks to determine whether the waste water being discharged is too warm. Suppose the EPA has the power to impose very severe penalties (for example, very heavy fines) when the waste water is excessively warm. Other things being equal, should Consolidated Power set the probability of a Type I error equal to α = .01 or α = .05? Explain.

9.13 Consider Exercise 9.12, and suppose that Consolidated Power has been experiencing technical problems with the cooling system. Because the system has been unreliable, the company feels it must take precautions to avoid failing to shut down the plant when its waste water is too warm. Other things being equal, should Consolidated Power set the probability of a Type I error equal to α = .01 or α = .05? Explain.

 9.2: z Tests about a Population Mean: σ Known

In this section we discuss hypothesis tests about a population mean that are based on the normal distribution. These tests are called z tests, and they require that the true value of the population standard deviation σ is known. Of course, in most real-world situations the true value of σ is not known. However, the concepts and calculations of hypothesis testing are most easily illustrated using the normal distribution. Therefore, in this section we will assume that—through theory or history related to the population under consideration—we know σ. When σ is unknown, we test hypotheses about a population mean by using the t distribution. In Section 9.3 we study t tests, and we will revisit the examples of this section assuming that σ is unknown.

Testing a “greater than” alternative hypothesis by using a critical value rule

In Section 9.1 we explained how to set up appropriate null and alternative hypotheses. We also discussed how to specify a value for α, the probability of a Type I error (also called the level of significance) of the hypothesis test, and we introduced the idea of a test statistic. We can use these concepts to begin developing a seven-step hypothesis testing procedure. We will introduce these steps in the context of the trash bag case and testing a “greater than” alternative hypothesis.

Step 1: State the null hypothesis H0 and the alternative hypothesis Ha. In the trash bag case, we will test H0: μ ≤ 50 versus Ha: μ > 50. Here, μ is the mean breaking strength of the new trash bag.

Step 2: Specify the level of significance α. The television network will run the commercial stating that the new trash bag is stronger than the former bag if we can reject H0: μ ≤ 50 in favor of Ha: μ > 50 by setting α equal to .05.

Step 3: Select the test statistic. In order to test H0: μ ≤ 50 versus Ha: μ > 50, we will test the modified null hypothesis H0: μ = 50 versus Ha: μ > 50. The idea here is that if there is sufficient evidence to reject the hypothesis that μ equals 50 in favor of μ > 50, then there is certainly also sufficient evidence to reject the hypothesis that μ is less than or equal to 50. In order to test H0: μ = 50 versus Ha: μ > 50, we will randomly select a sample of n = 40 new trash bags and calculate the mean x̄ of the breaking strengths of these bags. We will then utilize the test statistic

z = (x̄ − 50) / (σ/√n) = (x̄ − 50) / (σ/√40)

A positive value of this test statistic results from an x̄ that is greater than 50 and thus provides evidence against H0: μ = 50 and in favor of Ha: μ > 50.

Step 4: Determine the critical value rule for deciding whether to reject H0. To decide how large the test statistic z must be to reject H0 in favor of Ha by setting the probability of a Type I error equal to α, we note that different samples would give different sample means and thus different values of z. Because the sample size n = 40 is large, the Central Limit Theorem tells us that the sampling distribution of z is (approximately) a standard normal distribution if the null hypothesis H0: μ = 50 is true. Therefore, we do the following:

· Place the probability of a Type I error, α, in the right-hand tail of the standard normal curve and use the normal table (see Table A.3, page 863) to find the normal point zα. Here zα, which we call a critical value, is the point on the horizontal axis under the standard normal curve that gives a right-hand tail area equal to α.

· Reject H0: μ = 50 in favor of Ha: μ > 50 if and only if the test statistic z is greater than the critical value zα. (This is the critical value rule.)

Figure 9.1 illustrates that since we have set α equal to .05, we should use the critical value zα = z.05 = 1.645 (see Table A.3). This says that we should reject H0 if z > 1.645 and we should not reject H0 if z ≤ 1.645.

Figure 9.1: The Critical Value for Testing H0: μ = 50 versus Ha: μ > 50 by Setting α = .05

To better understand the critical value rule, consider the standard normal curve in Figure 9.1. The area of .05 in the right-hand tail of this curve implies that values of the test statistic z that are greater than 1.645 are unlikely to occur if the null hypothesis H0: μ = 50 is true. There is a 5 percent chance of observing one of these values—and thus wrongly rejecting H0—if H0 is true. However, we are more likely to observe a value of z greater than 1.645—and thus correctly reject H0—if H0 is false. Therefore, it is intuitively reasonable to reject H0 if the value of the test statistic z is greater than 1.645.
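Instead of reading the critical value from a normal table, we can compute it. The following minimal sketch uses the NormalDist class from Python’s standard statistics module; the function name right_tail_critical_value is a hypothetical helper introduced here for illustration.

```python
from statistics import NormalDist

def right_tail_critical_value(alpha):
    """Point on the standard normal curve with right-hand tail area alpha."""
    return NormalDist().inv_cdf(1 - alpha)

z_05 = right_tail_critical_value(0.05)
tail_area = 1 - NormalDist().cdf(z_05)  # should recover alpha

print(f"critical value: {z_05:.3f}")   # 1.645
print(f"right-hand tail area: {tail_area:.3f}")
```

The second printed value confirms that the area to the right of the critical value equals the α that was specified.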

Step 5: Collect the sample data and compute the value of the test statistic. When the sample of n = 40 new trash bags is randomly selected, the mean of the breaking strengths is calculated to be x̄ = 50.575. Assuming that σ is known to equal 1.65, the value of the test statistic is

z = (x̄ − 50) / (σ/√n) = (50.575 − 50) / (1.65/√40) = 2.20

Step 6: Decide whether to reject H0 by using the test statistic value and the critical value rule. Since the test statistic value z = 2.20 is greater than the critical value z.05 = 1.645, we can reject H0: μ = 50 in favor of Ha: μ > 50 by setting α equal to .05. Furthermore, we can be intuitively confident that H0: μ = 50 is false and Ha: μ > 50 is true. This is because, since we have rejected H0 by setting α equal to .05, we have rejected H0 by using a test that allows only a 5 percent chance of wrongly rejecting H0. In general, if we can reject a null hypothesis in favor of an alternative hypothesis by setting the probability of a Type I error equal to α, we say that we have statistical significance at the α level.

Step 7: Interpret the statistical results in managerial (real-world) terms and assess their practical importance. Since we have rejected H0: μ = 50 in favor of Ha: μ > 50 by setting α equal to .05, we conclude (at an α of .05) that the mean breaking strength of the new trash bag exceeds 50 pounds. Furthermore, this conclusion has practical importance to the trash bag manufacturer because it means that the television network will approve running commercials claiming that the new trash bag is stronger than the former bag. Note, however, that the point estimate of μ, x̄ = 50.575, indicates that μ is not much larger than 50. Therefore, the trash bag manufacturer can claim only that its new bag is slightly stronger than its former bag. Of course, this might be practically important to consumers who feel that, because the new bag is 25 percent less expensive and is more environmentally sound, it is definitely worth purchasing if it has any strength advantage. However, to customers who are looking only for a substantial increase in bag strength, the statistical results would not be practically important. This illustrates that, in general, a finding of statistical significance (that is, concluding that the alternative hypothesis is true) can be practically important to some people but not to others. Notice that the point estimate of the parameter involved in a hypothesis test can help us to assess practical importance. We can also use confidence intervals to help assess practical importance.
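The computational steps of the critical value approach (steps 3 through 6) can be sketched in a few lines of Python. The function name z_test_greater is hypothetical, and the sample mean 50.575 is an assumed value, chosen to be consistent with the test statistic z = 2.20 reported for the trash bag case when σ = 1.65 and n = 40.

```python
import math
from statistics import NormalDist

def z_test_greater(xbar, mu0, sigma, n, alpha):
    """Right-tailed z test of H0: mu = mu0 versus Ha: mu > mu0 (sigma known)."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))      # test statistic
    critical = NormalDist().inv_cdf(1 - alpha)     # critical value z_alpha
    return z, critical, z > critical               # reject H0 only if z > z_alpha

# xbar = 50.575 is an assumed sample mean (see the note above)
z, critical, reject = z_test_greater(xbar=50.575, mu0=50, sigma=1.65, n=40, alpha=0.05)
print(f"z = {z:.2f}, critical value = {critical:.3f}, reject H0: {reject}")
```

Step 7, the managerial interpretation, is a judgment about practical importance and is deliberately left out of the code.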

Considerations in setting α

We have reasoned in Section 9.1 that the television network has set α equal to .05 rather than .01 because doing so means that β, the probability of failing to advertise a true claim (a Type II error), will be smaller than it would be if α were set at .01. It is informative, however, to see what would have happened if the network had set α equal to .01. Figure 9.2 illustrates that as we decrease α from .05 to .01, the critical value zα increases from z.05 = 1.645 to z.01 = 2.33. Because the test statistic value z = 2.20 is less than z.01 = 2.33, we cannot reject H0: μ = 50 in favor of Ha: μ > 50 by setting α equal to .01. This illustrates the point that, the smaller we set α, the larger is the critical value, and thus the stronger is the statistical evidence that we are requiring to reject the null hypothesis H0. Some statisticians have concluded (somewhat subjectively) that (1) if we set α equal to .05, then we are requiring strong evidence to reject H0; and (2) if we set α equal to .01, then we are requiring very strong evidence to reject H0.

Figure 9.2: The Critical Values for Testing H0: μ = 50 versus Ha: μ > 50 by Setting α = .05 and .01

p-value for testing a “greater than” alternative hypothesis

To decide whether to reject the null hypothesis H0 at level of significance α, steps 4, 5, and 6 of the seven-step hypothesis testing procedure compare the test statistic value with a critical value. Another way to make this decision is to calculate a p-value, which measures the likelihood of the sample results if the null hypothesis H0 is true. Sample results that are not likely if H0 is true are evidence that H0 is not true. To test H0 by using a p-value, we use the following steps 4, 5, and 6:

Step 4: Collect the sample data and compute the value of the test statistic. In the trash bag case, we have computed the value of the test statistic to be z = 2.20.

Step 5: Calculate the p-value by using the test statistic value. The p-value for testing H0: μ = 50 versus Ha: μ > 50 in the trash bag case is the area under the standard normal curve to the right of the test statistic value z = 2.20. As illustrated in Figure 9.3(b), this area is 1 − .9861 = .0139. The p-value is the probability, computed assuming that H0: μ = 50 is true, of observing a value of the test statistic that is greater than or equal to the value z = 2.20 that we have actually computed from the sample data. The p-value of .0139 says that, if H0: μ = 50 is true, then only 139 in 10,000 of all possible test statistic values are at least as large, or extreme, as the value z = 2.20. That is, if we are to believe that H0 is true, we must believe that we have observed a test statistic value that can be described as a 139 in 10,000 chance. Because it is difficult to believe that we have observed a 139 in 10,000 chance, we intuitively have strong evidence that H0: μ = 50 is false and Ha: μ > 50 is true.

Figure 9.3: Testing H0: μ = 50 versus Ha: μ > 50 by Using Critical Values and the p-Value

Step 6: Reject H0 if the p-value is less than α. Recall that the television network has set α equal to .05. The p-value of .0139 is less than the α of .05. Comparing the two normal curves in Figures 9.3(a) and (b), we see that this implies that the test statistic value z = 2.20 is greater than the critical value z.05 = 1.645. Therefore, we can reject H0 by setting α equal to .05. As another example, suppose that the television network had set α equal to .01. The p-value of .0139 is greater than the α of .01. Comparing the two normal curves in Figures 9.3(b) and (c), we see that this implies that the test statistic value z = 2.20 is less than the critical value z.01 = 2.33. Therefore, we cannot reject H0 by setting α equal to .01. Generalizing these examples, we conclude that the value of the test statistic z will be greater than the critical value zα if and only if the p-value is less than α. That is, we can reject H0 in favor of Ha at level of significance α if and only if the p-value is less than α.
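The p-value calculation in steps 5 and 6 can be reproduced with a short sketch. It assumes Python's standard-library NormalDist as a stand-in for the normal table, and the helper name p_value_right is hypothetical.

```python
from statistics import NormalDist


def p_value_right(z):
    """Area under the standard normal curve to the right of z."""
    return 1 - NormalDist().cdf(z)


p_value = p_value_right(2.20)  # trash bag case; close to the table value .0139
print(f"p-value = {p_value:.4f}")

for alpha in (0.05, 0.01):
    # Step 6: reject H0 if and only if the p-value is less than alpha.
    decision = "reject H0" if p_value < alpha else "do not reject H0"
    print(f"alpha = {alpha}: {decision}")
```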

© NBC, Inc. Used with permission.

Note: This logo appears on an NBC advertising standards booklet. This booklet, along with other information provided by NBC and CBS, forms the basis for much of the discussion in the paragraph to the right.

Comparing the critical value and p-value methods

Thus far we have considered two methods for testing H0: μ = 50 versus Haμ > 50 at the .05 and .01 values of α. Using the first method, we determine if the test statistic value z = 2.20 is greater than the critical values z.05 = 1.645 and z.01 = 2.33. Using the second method, we determine if the p-value of .0139 is less than .05 and .01. Whereas the critical value method requires that we look up a different critical value for each different α value, the p-value method requires only that we calculate a single p-value and compare it directly with the different α values. It follows that the p-value method is the most efficient way to test a hypothesis at different α values. This can be useful when there are different decision makers who might use different α values. For example, television networks do not always evaluate advertising claims by setting α equal to .05. The reason is that the consequences of a Type I error (advertising a false claim) are more serious for some claims than for others. For example, the consequences of a Type I error would be fairly serious for a claim about the effectiveness of a drug or for the superiority of one product over another. However, these consequences might not be as serious for a noncomparative claim about an inexpensive and safe product, such as a cosmetic. Networks sometimes use α values between .01 and .04 for claims having more serious Type I error consequences, and they sometimes use α values between .06 and .10 for claims having less serious Type I error consequences. Furthermore, one network’s policies for setting α can differ somewhat from those of another. As a result, reporting an advertising claim’s p-value to each network is the most efficient way to tell the network whether to allow the claim to be advertised. For example, most networks would evaluate the trash bag claim by choosing an α value between .025 and .10. 
Since the p-value of .0139 is less than all these α values, most networks would allow the trash bag claim to be advertised.
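The efficiency argument can be made concrete with a small sketch: one p-value is computed once and then compared with each decision maker's α. The list of α values below is illustrative, built from the values .01, .025, .05, and .10 mentioned in the text; it is not a statement of any network's actual policy.

```python
p_value = 0.0139  # single p-value reported for the trash bag claim

# Illustrative alpha values that different decision makers might use.
alphas = [0.01, 0.025, 0.05, 0.10]

# One comparison per alpha; no new critical value has to be looked up.
decisions = {alpha: p_value < alpha for alpha in alphas}

for alpha, reject in decisions.items():
    print(f"alpha = {alpha}: {'reject H0' if reject else 'do not reject H0'}")
```

With the critical value method, each α in the list would instead require its own table lookup.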

A summary of the seven steps of hypothesis testing

For almost every hypothesis test discussed in this book, statisticians have developed both a critical value rule and a p-value that can be used to perform the hypothesis test. Furthermore, it can be shown that for each hypothesis test the p-value has been defined so that we can reject the null hypothesis at level of significance α if and only if the p-value is less than α. We now summarize a seven-step procedure for performing a hypothesis test.

The Seven Steps of Hypothesis Testing

1 State the null hypothesis H0 and the alternative hypothesis Ha.

2 Specify the level of significance α.

3 Select the test statistic.

Using a critical value rule:

4 Determine the critical value rule for deciding whether to reject H0. Use the specified value of α to find the critical value in the critical value rule.

5 Collect the sample data and compute the value of the test statistic.

6 Decide whether to reject H0 by using the test statistic value and the critical value rule.

Using a p-value:

4 Collect the sample data and compute the value of the test statistic.

5 Calculate the p-value by using the test statistic value.

6 Reject H0 at level of significance α if the p-value is less than α.

7 Interpret your statistical results in managerial (real-world) terms and assess their practical importance.

In the real world both critical value rules and p-values are used to carry out hypothesis tests. For example, NBC uses critical value rules, whereas CBS uses p-values, to statistically verify the validity of advertising claims. Throughout this book we will continue to present both the critical value and the p-value approaches to hypothesis testing.
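The claim that the two versions of steps 4 through 6 always lead to the same decision can be spot-checked numerically. The sketch below does this for a “greater than” alternative over a grid of hypothetical test statistic values and a few α values; it is a check under these assumptions, not a proof, and the function names are invented for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()


def reject_by_critical_value(z, alpha):
    """Critical value rule: reject H0 if z exceeds z_alpha."""
    return z > std_normal.inv_cdf(1 - alpha)


def reject_by_p_value(z, alpha):
    """p-value rule: reject H0 if the right-hand tail area beyond z is below alpha."""
    return (1 - std_normal.cdf(z)) < alpha


# Hypothetical test statistic values from -3.0 to 3.0 in steps of 0.1.
z_grid = [i / 10 for i in range(-30, 31)]
alphas = [0.01, 0.025, 0.05, 0.10]

rules_agree = all(
    reject_by_critical_value(z, a) == reject_by_p_value(z, a)
    for z in z_grid
    for a in alphas
)
print("Both rules give the same decision:", rules_agree)
```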

Testing a “less than” alternative hypothesis

We next consider the payment time case and testing a “less than” alternative hypothesis:

Step 1: In order to study whether the new electronic billing system reduces the mean bill payment time by more than 50 percent, the management consulting firm will test H0: μ ≥ 19.5 versus Ha: μ < 19.5.

Step 2: The management consulting firm wishes to make sure that it truthfully describes the benefits of the new system both to the Hamilton, Ohio, trucking company and to other companies that are considering installing such a system. Therefore, the firm will require very strong evidence to conclude that μ is less than 19.5, which implies that it will test H0: μ ≥ 19.5 versus Ha: μ < 19.5 by setting α equal to .01.

Step 3: In order to test H0: μ ≥ 19.5 versus Ha: μ < 19.5, we will test the modified null hypothesis H0: μ = 19.5 versus Ha: μ < 19.5. The idea here is that if there is sufficient evidence to reject the hypothesis that μ equals 19.5 in favor of μ < 19.5, then there is certainly also sufficient evidence to reject the hypothesis that μ is greater than or equal to 19.5. In order to test H0: μ = 19.5 versus Ha: μ < 19.5, we will randomly select a sample of n = 65 invoices paid using the billing system and calculate the mean x̄ of the payment times of these invoices. Since the sample size is large, the Central Limit Theorem applies, and we will utilize the test statistic

z = (x̄ − 19.5) / (σ/√n)

A value of the test statistic z that is less than zero results when x̄ is less than 19.5. This provides evidence to support rejecting H0 in favor of Ha because the point estimate x̄ indicates that μ might be less than 19.5.

Step 4: To decide how much less than zero the test statistic must be to reject H0 in favor of Ha by setting the probability of a Type I error equal to α, we do the following:

· Place the probability of a Type I error, α, in the left-hand tail of the standard normal curve and use the normal table to find the critical value −zα. Here −zα is the negative of the normal point zα. That is, −zα is the point on the horizontal axis under the standard normal curve that gives a left-hand tail area equal to α.

· Reject H0: μ = 19.5 in favor of Ha: μ < 19.5 if and only if the test statistic z is less than the critical value −zα. Because α equals .01, the critical value −zα is −z.01 = −2.33 [see Fig. 9.4(a)].

Figure 9.4: Testing H0: μ = 19.5 versus Ha: μ < 19.5 by Using Critical Values and the p-Value

Step 5: When the sample of n = 65 invoices is randomly selected, the mean of the payment times of these invoices is calculated to be x̄ ≈ 18.11. Assuming that σ is known to equal 4.2, the value of the test statistic is

z = (x̄ − 19.5) / (σ/√n) = (18.11 − 19.5) / (4.2/√65) = −2.67

Step 6: Since the test statistic value z = −2.67 is less than the critical value −z.01 = −2.33, we can reject H0: μ = 19.5 in favor of Ha: μ < 19.5 by setting α equal to .01.

Step 7: We conclude (at an α of .01) that the mean payment time for the new electronic billing system is less than 19.5 days. This, along with the fact that the sample mean x̄ is slightly less than 19.5, implies that it is reasonable for the management consulting firm to conclude that the new electronic billing system has reduced the mean payment time by slightly more than 50 percent (a substantial improvement over the old system).

p-value for testing a “less than” alternative hypothesis

To test H0: μ = 19.5 versus Ha: μ < 19.5 in the payment time case by using a p-value, we use the following steps 4, 5, and 6:

Step 4: We have computed the value of the test statistic in the payment time case to be z = −2.67.

Step 5: The p-value for testing H0: μ = 19.5 versus Ha: μ < 19.5 is the area under the standard normal curve to the left of the test statistic value z = −2.67. As illustrated in Figure 9.4(b), this area is .0038. The p-value is the probability, computed assuming that H0: μ = 19.5 is true, of observing a value of the test statistic that is less than or equal to the value z = −2.67 that we have actually computed from the sample data. The p-value of .0038 says that, if H0: μ = 19.5 is true, then only 38 in 10,000 of all possible test statistic values are at least as negative, or extreme, as the value z = −2.67. That is, if we are to believe that H0 is true, we must believe that we have observed a test statistic value that can be described as a 38 in 10,000 chance.

Step 6: The management consulting firm has set α equal to .01. The p-value of .0038 is less than the α of .01. Therefore, we can reject H0 by setting α equal to .01.
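For a “less than” alternative, both the critical value and the p-value come from the left-hand tail. The sketch below, which again assumes Python's standard-library NormalDist in place of the normal table, reproduces both decisions for the payment time case starting from the reported test statistic z = −2.67.

```python
from statistics import NormalDist

std_normal = NormalDist()

alpha = 0.01
z_observed = -2.67  # test statistic value from the payment time case

# Critical value -z_alpha: the point with a left-hand tail area equal to alpha.
critical_value = std_normal.inv_cdf(alpha)  # about -2.326 (table value -2.33)

# p-value: area under the standard normal curve to the left of z_observed.
p_value = std_normal.cdf(z_observed)  # close to the table value .0038

reject_by_critical_value = z_observed < critical_value
reject_by_p_value = p_value < alpha

print(f"critical value = {critical_value:.3f}, p-value = {p_value:.4f}")
print("reject H0:", reject_by_critical_value and reject_by_p_value)
```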

Testing a “not equal to” alternative hypothesis

We next consider the Valentine’s Day chocolate case and testing a “not equal to” alternative hypothesis.

Step 1: To assess whether this year’s sales of its valentine box of assorted chocolates will be ten percent higher than last year’s, the candy company will test H0: μ = 330 versus Ha: μ ≠ 330. Here, μ is the mean order quantity of this year’s valentine box by large retail stores.

Step 2: If the candy company does not reject H0: μ = 330 and H0: μ = 330 is false—a Type II error—the candy company will base its production of valentine boxes on a 10 percent projected sales increase that is not correct. Since the candy company wishes to have a reasonably small probability of making this Type II error, the company will set α equal to .05. Setting α equal to .05 rather than .01 makes the probability of a Type II error smaller than it would be if α were set at .01. Note that in optional Section 9.5 we will verify that the probability of a Type II error in this situation is reasonably small. Therefore, if the candy company ends up not rejecting H0: μ = 330 and therefore decides to base its production of valentine boxes on the ten percent projected sales increase, the company can be intuitively confident that it has made the right decision.

Step 3: The candy company will randomly select n = 100 large retail stores and will make an early mailing to these stores promoting this year’s valentine box of assorted chocolates. The candy company will then ask each sampled retail store to report its anticipated order quantity of valentine boxes and will calculate the mean x̄ of the reported order quantities. Since the sample size is large, the Central Limit Theorem applies, and we will utilize the test statistic

z = (x̄ − 330) / (σ/√n)

A value of the test statistic that is greater than 0 results when x̄ is greater than 330. This provides evidence to support rejecting H0 in favor of Ha because the point estimate x̄ indicates that μ might be greater than 330. Similarly, a value of the test statistic that is less than 0 results when x̄ is less than 330. This also provides evidence to support rejecting H0 in favor of Ha because the point estimate x̄ indicates that μ might be less than 330.

Step 4: To decide how different from zero (positive or negative) the test statistic must be in order to reject H0 in favor of Ha by setting the probability of a Type I error equal to α, we do the following:

· Divide the probability of a Type I error, α, into two equal parts, and place the area α/2 in the right-hand tail of the standard normal curve and the area α/2 in the left-hand tail of the standard normal curve. Then use the normal table to find the critical values zα/2 and −zα/2. Here zα/2 is the point on the horizontal axis under the standard normal curve that gives a right-hand tail area equal to α/2, and −zα/2 is the point giving a left-hand tail area equal to α/2.

· Reject H0: μ = 330 in favor of Ha: μ ≠ 330 if and only if the test statistic z is greater than the critical value zα/2 or less than the critical value −zα/2. Note that this is equivalent to saying that we should reject H0 if and only if the absolute value of the test statistic, | z |, is greater than the critical value zα/2. Because α equals .05, the critical values are [see Figure 9.5(a)]

zα/2 = z.025 = 1.96 and −zα/2 = −z.025 = −1.96

Figure 9.5: Testing H0: μ = 330 versus Ha: μ ≠ 330 by Using Critical Values and the p-Value

Step 5: When the sample of n = 100 large retail stores is randomly selected, the mean of their reported order quantities is calculated to be x̄ = 326. Assuming that σ is known to equal 40, the value of the test statistic is

z = (x̄ − 330) / (σ/√n) = (326 − 330) / (40/√100) = −1

Step 6: Since the test statistic value z = −1 is greater than −z.025 = −1.96 (or, equivalently, since | z | = 1 is less than z.025 = 1.96), we cannot reject H0: μ = 330 in favor of Ha: μ ≠ 330 by setting α equal to .05.

Step 7: We cannot conclude (at an α of .05) that the mean order quantity of this year’s valentine box by large retail stores will differ from 330 boxes. Therefore, the candy company will base its production of valentine boxes on the ten percent projected sales increase.

p-value for testing a “not equal to” alternative hypothesis

To test H0: μ = 330 versus Ha: μ ≠ 330 in the Valentine’s Day chocolate case by using a p-value, we use the following steps 4, 5, and 6:

Step 4: We have computed the value of the test statistic in the Valentine’s Day chocolate case to be z = −1.

Step 5: Note from Figure 9.5(b) that the area under the standard normal curve to the right of | z | = 1 is .1587. Twice this area, that is, 2(.1587) = .3174, is the p-value for testing H0: μ = 330 versus Ha: μ ≠ 330. To interpret the p-value as a probability, note that the symmetry of the standard normal curve implies that twice the area under the curve to the right of | z | = 1 equals the area under this curve to the right of 1 plus the area under the curve to the left of −1 [see Figure 9.5(b)]. Also, note that since both positive and negative test statistic values count against H0: μ = 330, a test statistic value that is either greater than or equal to 1 or less than or equal to −1 is at least as extreme as the observed test statistic value z = −1. It follows that the p-value of .3174 says that, if H0: μ = 330 is true, then 31.74 percent of all possible test statistic values are at least as extreme as z = −1. That is, if we are to believe that H0 is true, we must believe that we have observed a test statistic value that can be described as a 31.74 percent chance.

Step 6: The candy company has set α equal to .05. The p-value of .3174 is greater than the α of .05. Therefore, we cannot reject H0 by setting α equal to .05.
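The two-sided calculations can be gathered into one short sketch. The function name two_sided_z_test is hypothetical, and the sample mean of 326 used below is the value implied by z = −1 with σ = 40 and n = 100. Python's NormalDist does not round the way the normal table does, so it gives a p-value of about .3173 rather than the table-based 2(.1587) = .3174.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()


def two_sided_z_test(xbar, mu0, sigma, n, alpha):
    """z test of H0: mu = mu0 versus Ha: mu != mu0 when sigma is known."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    # z_{alpha/2}: right-hand tail area alpha/2; by symmetry the other critical value is its negative.
    critical = std_normal.inv_cdf(1 - alpha / 2)
    # Twice the area to the right of |z| covers both tails.
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    reject = abs(z) > critical
    return z, critical, p_value, reject


z, critical, p_value, reject = two_sided_z_test(xbar=326, mu0=330, sigma=40, n=100, alpha=0.05)
print(f"z = {z:.2f}, critical values = ±{critical:.2f}, p-value = {p_value:.4f}")
print("reject H0" if reject else "do not reject H0")
```

Because | z | = 1 is well inside the interval from −1.96 to 1.96, both rules agree that H0 cannot be rejected at α = .05.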

A general procedure for testing a hypothesis about a population mean

In the trash bag case we have tested H0: μ ≤ 50 versus Ha: μ > 50 by testing H0: μ = 50 versus Ha: μ > 50. In the payment time case we have tested H0: μ ≥ 19.5 versus Ha: μ < 19.5 by testing H0: μ = 19.5 versus Ha: μ < 19.5. In general, the usual procedure for testing a “less than or equal to” null hypothesis or a “greater than or equal to” null hypothesis is to change the null hypothesis to an equality. We then test the “equal to” null hypothesis versus the alternative hypothesis. Furthermore, the critical value and p-value procedures for testing a null hypothesis versus an alternative hypothesis depend upon whether the alternative hypothesis is a “greater than,” a “less than,” or a “not equal to” alternative hypothesis. The following summary box gives the appropriate procedures. Specifically, letting μ0 be a particular number, the summary box shows how to test H0: μ = μ0 versus either Ha: μ > μ0, Ha: μ < μ0, or Ha: μ ≠ μ0:

Testing a Hypothesis about a Population Mean when σ Is Known

Define the test statistic

z = (x̄ − μ0) / (σ/√n)

and assume that the population sampled is normally distributed, or that the sample size n is large. We can test H0: μ = μ0 versus a particular alternative hypothesis at level of significance α by using the appropriate critical value rule, or, equivalently, the corresponding p-value.