# Hypothesis Testing

From the textbook, *Business Statistics in Practice*, read the following chapters:

CHAPTER 9: Hypothesis Testing

Chapter Outline

__9.1 The Null and Alternative Hypotheses and Errors in Hypothesis Testing__

__9.2 *z* Tests about a Population Mean: *σ* Known__

__9.3 *t* Tests about a Population Mean: *σ* Unknown__

__9.4 *z* Tests about a Population Proportion__

__9.5 Type II Error Probabilities and Sample Size Determination (Optional)__

__9.6 The Chi-Square Distribution (Optional)__

__9.7 Statistical Inference for a Population Variance (Optional)__

**Hypothesis testing** is a statistical procedure used to provide evidence in favor of some statement (called a *hypothesis*). For instance, hypothesis testing might be used to assess whether a population parameter, such as a population mean, differs from a specified standard or previous value. In this chapter we discuss testing hypotheses about population means, proportions, and variances.

In order to illustrate how hypothesis testing works, we revisit several cases introduced in previous chapters and also introduce some new cases:

**The Payment Time Case:** The consulting firm uses hypothesis testing to provide strong evidence that the new electronic billing system has reduced the mean payment time by more than 50 percent.

**The Cheese Spread Case:** The cheese spread producer uses hypothesis testing to supply extremely strong evidence that fewer than 10 percent of all current purchasers would stop buying the cheese spread if the new spout were used.

**The Electronic Article Surveillance Case:** A company that sells and installs EAS systems claims that at most 5 percent of all consumers would never shop in a store again if the store subjected them to a false EAS alarm. A store considering the purchase of such a system uses hypothesis testing to provide extremely strong evidence that this claim is not true.

**The Trash Bag Case:** A marketer of trash bags uses hypothesis testing to support its claim that the mean breaking strength of its new trash bag is greater than 50 pounds. As a result, a television network approves use of this claim in a commercial.

**The Valentine’s Day Chocolate Case:** A candy company projects that this year’s sales of its special valentine box of assorted chocolates will be 10 percent higher than last year. The candy company uses hypothesis testing to assess whether it is reasonable to plan for a 10 percent increase in sales of the valentine box.

9.1: The Null and Alternative Hypotheses and Errors in Hypothesis Testing

One of the authors’ former students is employed by a major television network in the standards and practices division. One of the division’s responsibilities is to reduce the chances that advertisers will make false claims in commercials run on the network. Our former student reports that the network uses a statistical methodology called **hypothesis testing** to do this.


To see how this might be done, suppose that a company wishes to advertise a claim, and suppose that the network has reason to doubt that this claim is true. The network assumes for the sake of argument that **the claim is not valid.** This assumption is called the **null hypothesis.** The statement that **the claim is valid** is called the **alternative,** or **research, hypothesis.** The network will run the commercial only if the company making the claim provides **sufficient sample evidence** to reject the null hypothesis that the claim is not valid in favor of the alternative hypothesis that the claim is valid. Explaining the exact meaning of *sufficient sample evidence* is quite involved and will be discussed in the next section.

The Null Hypothesis and the Alternative Hypothesis

In hypothesis testing:

**1** The **null hypothesis,** denoted *H*0, is the statement being tested. Usually this statement represents the *status quo* and is not rejected unless there is convincing sample evidence that it is false.

**2** The **alternative,** or **research, hypothesis,** denoted *Ha*, is a statement that will be accepted only if there is convincing sample evidence that it is true.

Setting up the null and alternative hypotheses in a practical situation can be tricky. In some situations there is a condition for which we need to attempt to find supportive evidence. We then formulate (1) the alternative hypothesis to be the statement that this condition exists and (2) the null hypothesis to be the statement that this condition does not exist. To illustrate this, we consider the following case studies.

EXAMPLE 9.1: The Trash Bag Case__1__

A leading manufacturer of trash bags produces the strongest trash bags on the market. The company has developed a new 30-gallon bag using a specially formulated plastic that is stronger and more biodegradable than other plastics. This plastic’s increased strength allows the bag’s thickness to be reduced, and the resulting cost savings will enable the company to lower its bag price by 25 percent. The company also believes the new bag is stronger than its current 30-gallon bag.

The manufacturer wants to advertise the new bag on a major television network. In addition to promoting its price reduction, the company also wants to claim the new bag is better for the environment and stronger than its current bag. The network is convinced of the bag’s environmental advantages on scientific grounds. However, the network questions the company’s claim of increased strength and requires statistical evidence to justify this claim. Although there are various measures of bag strength, the manufacturer and the network agree to employ “breaking strength.” A bag’s breaking strength is the amount of a representative trash mix (in pounds) that, when loaded into a bag suspended in the air, will cause the bag to rip or tear. Tests show that the current bag has a mean breaking strength that is very close to (but does not exceed) 50 pounds. The new bag’s mean breaking strength *μ* is unknown and in question. The alternative hypothesis *Ha* is the statement for which we wish to find supportive evidence. Because we hope the new bags are stronger than the current bags, *Ha* says that *μ* is greater than 50. The null hypothesis states that *Ha* is false. Therefore, *H*0 says that *μ* is less than or equal to 50. We summarize these hypotheses by stating that we are testing

*H*0: *μ* ≤ 50 versus *Ha*: *μ* > 50

The network will run the manufacturer’s commercial if a random sample of *n* new bags provides sufficient evidence to reject *H*0: *μ* ≤ 50 in favor of *Ha*: *μ* > 50.

EXAMPLE 9.2: The Payment Time Case

Recall that a management consulting firm has installed a new computer-based, electronic billing system for a Hamilton, Ohio, trucking company. Because of the system’s advantages, and because the trucking company’s clients are receptive to using this system, the management consulting firm believes that the new system will reduce the mean bill payment time by more than 50 percent. The mean payment time using the old billing system was approximately equal to, but no less than, 39 days. Therefore, if *μ* denotes the mean payment time using the new system, the consulting firm believes that *μ* will be less than 19.5 days. Because it is hoped that the new billing system *reduces* mean payment time, we formulate the alternative hypothesis as *Ha*: *μ* < 19.5 and the null hypothesis as *H*0: *μ* ≥ 19.5. The consulting firm will randomly select a sample of *n* invoices and determine if their payment times provide sufficient evidence to reject *H*0: *μ* ≥ 19.5 in favor of *Ha*: *μ* < 19.5. If such evidence exists, the consulting firm will conclude that the new electronic billing system has reduced the Hamilton trucking company’s mean bill payment time by more than 50 percent. This conclusion will be used to help demonstrate the benefits of the new billing system both to the Hamilton company and to other trucking companies that are considering using such a system.

EXAMPLE 9.3: The Valentine’s Day Chocolate Case __2__

A candy company annually markets a special 18-ounce box of assorted chocolates to large retail stores for Valentine’s Day. This year the candy company has designed an extremely attractive new valentine box and will fill the box with an especially appealing assortment of chocolates. For this reason, the candy company subjectively projects, based on past experience and knowledge of the candy market, that sales of its valentine box will be 10 percent higher than last year. However, since the candy company must decide how many valentine boxes to produce, the company needs to assess whether it is reasonable to plan for a 10 percent increase in sales.

Before the beginning of each Valentine’s Day sales season, the candy company sends large retail stores information about its newest valentine box of assorted chocolates. This information includes a description of the box of chocolates, as well as a preview of advertising displays that the candy company will provide to help retail stores sell the chocolates. Each retail store then places a single (nonreturnable) order of valentine boxes to satisfy its anticipated customer demand for the Valentine’s Day sales season. Last year the mean order quantity of large retail stores was 300 boxes per store. If the projected 10 percent sales increase occurs, the mean order quantity, *μ*, of large retail stores this year will be 330 boxes per store. Therefore, the candy company wishes to test the null hypothesis *H*0: *μ* = 330 versus the alternative hypothesis *Ha*: *μ* ≠ 330.

To perform the hypothesis test, the candy company will randomly select a sample of *n* large retail stores and will make an early mailing to these stores promoting this year’s valentine box. The candy company will then ask each retail store to report how many valentine boxes it anticipates ordering. If the sample data do not provide sufficient evidence to reject *H*0: *μ* = 330 in favor of *Ha*: *μ* ≠ 330, the candy company will base its production on the projected 10 percent sales increase. On the other hand, if there is sufficient evidence to reject *H*0: *μ* = 330, the candy company will change its production plans.

We next summarize the sets of null and alternative hypotheses that we have thus far considered.

*H*0: *μ* ≤ 50 versus *Ha*: *μ* > 50

*H*0: *μ* ≥ 19.5 versus *Ha*: *μ* < 19.5

*H*0: *μ* = 330 versus *Ha*: *μ* ≠ 330

The alternative hypothesis *Ha*: *μ* > 50 is called a **one-sided, greater than alternative** hypothesis, whereas *Ha*: *μ* < 19.5 is called a **one-sided, less than alternative** hypothesis, and *Ha*: *μ* ≠ 330 is called a **two-sided, not equal to alternative** hypothesis. Many of the alternative hypotheses we consider in this book are one of these three types. Also, note that each null hypothesis we have considered involves an **equality.** For example, the null hypothesis *H*0: *μ* ≤ 50 says that *μ* is either less than or **equal to** 50. We will see that, in general, the approach we use to test a null hypothesis versus an alternative hypothesis requires that the null hypothesis involve an equality.

The idea of a test statistic

Suppose that in the trash bag case the manufacturer randomly selects a sample of *n* = 40 new trash bags. Each of these bags is tested for breaking strength, and the sample mean $\bar{x}$ of the 40 breaking strengths is calculated. In order to test *H*0: *μ* ≤ 50 versus *Ha*: *μ* > 50, we utilize the __test statistic__

$$z = \frac{\bar{x} - 50}{\sigma_{\bar{x}}} = \frac{\bar{x} - 50}{\sigma / \sqrt{n}}$$

The test statistic *z* measures the distance between $\bar{x}$ and 50. The division by $\sigma_{\bar{x}}$ says that this distance is measured in units of the standard deviation of all possible sample means. For example, a value of *z* equal to, say, 2.4 would tell us that $\bar{x}$ is 2.4 such standard deviations above 50. In general, a value of the test statistic that is less than or equal to zero results when $\bar{x}$ is less than or equal to 50. This provides no evidence to support rejecting *H*0 in favor of *Ha* because the point estimate $\bar{x}$ indicates that *μ* is probably less than or equal to 50. However, a value of the test statistic that is greater than zero results when $\bar{x}$ is greater than 50. This provides evidence to support rejecting *H*0 in favor of *Ha* because the point estimate $\bar{x}$ indicates that *μ* might be greater than 50. Furthermore, the farther the value of the test statistic is above 0 (the farther $\bar{x}$ is above 50), the stronger is the evidence to support rejecting *H*0 in favor of *Ha*.
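As a rough illustration of how this test statistic behaves, the following Python sketch computes *z* as the sample mean minus 50, divided by *σ* over the square root of *n*. The function name `z_statistic` and the three sample means are hypothetical and chosen only to show the sign of *z*; *σ* = 1.65 and *n* = 40 are the values used for the trash bag case later in the chapter.

```python
import math

def z_statistic(xbar, mu0, sigma, n):
    """Distance between the sample mean and mu0, in units of sigma / sqrt(n)."""
    standard_error = sigma / math.sqrt(n)
    return (xbar - mu0) / standard_error

# Hypothetical sample means, chosen only to show how the sign of z changes.
for xbar in (49.8, 50.0, 50.5):
    z = z_statistic(xbar, mu0=50, sigma=1.65, n=40)
    print(f"xbar = {xbar:5.1f}  ->  z = {z:6.2f}")
```

A sample mean below 50 gives a negative *z*, a sample mean of exactly 50 gives *z* = 0, and a sample mean above 50 gives a positive *z*.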

Hypothesis testing and the legal system

If the value of the test statistic *z* is far enough above 0, we reject *H*0 in favor of *Ha*. To see how large *z* must be in order to reject *H*0, we must understand that **a hypothesis test rejects a null hypothesis H0 only if there is strong statistical evidence against H0.** This is similar to our legal system, which rejects the innocence of the accused only if evidence of guilt is beyond a reasonable doubt. For instance, the network will reject *H*0: *μ* ≤ 50 and run the trash bag commercial only if the test statistic *z* is far enough above 0 to show beyond a reasonable doubt that *H*0: *μ* ≤ 50 is false and *Ha*: *μ* > 50 is true. A test statistic that is only slightly greater than 0 might not be convincing enough. However, because such a test statistic would result from a sample mean that is slightly greater than 50, it would provide some evidence to support rejecting *H*0: *μ* ≤ 50, and it certainly would not provide strong evidence supporting *H*0: *μ* ≤ 50. Therefore, if the value of the test statistic is not large enough to convince us to reject *H*0, **we do not say that we accept H0. Rather we say that we do not reject H0** because the evidence against *H*0 is not strong enough. Again, this is similar to our legal system, where the lack of evidence of guilt beyond a reasonable doubt results in a verdict of **not guilty,** but does not prove that the accused is innocent.

Type I and Type II errors and their probabilities

To determine exactly how much statistical evidence is required to reject *H*0, we consider the errors and the correct decisions that can be made in hypothesis testing. These errors and correct decisions, as well as their implications in the trash bag advertising example, are summarized in __Tables 9.1__ and __9.2__. Across the top of each table are listed the two possible “states of nature.” Either *H*0: *μ* ≤ 50 is true, which says the manufacturer’s claim that *μ* is greater than 50 is false, or *H*0 is false, which says the claim is true. Down the left side of each table are listed the two possible decisions we can make in the hypothesis test. Using the sample data, we will either reject *H*0: *μ* ≤ 50, which implies that the claim will be advertised, or we will not reject *H*0, which implies that the claim will not be advertised.

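Table 9.1: Type I and Type II Errors

| Decision | *H*0 true | *H*0 false |
| --- | --- | --- |
| Reject *H*0 | Type I error | Correct decision |
| Do not reject *H*0 | Correct decision | Type II error |

Table 9.2: The Implications of Type I and Type II Errors in the Trash Bag Example

| Decision | Claim false (*H*0 true) | Claim true (*H*0 false) |
| --- | --- | --- |
| Advertise the claim (reject *H*0) | Advertise a false claim | Advertise a true claim |
| Do not advertise the claim (do not reject *H*0) | Do not advertise a false claim | Fail to advertise a true claim |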

In general, the two types of errors that can be made in hypothesis testing are defined here:

**Type I and Type II Errors**

If we reject *H*0 when it is true, this is a __Type I error__.

If we do not reject *H*0 when it is false, this is a __Type II error__.

As can be seen by comparing __Tables 9.1__ and __9.2__, if we commit a Type I error, we will advertise a false claim. If we commit a Type II error, we will fail to advertise a true claim.
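The four combinations of decision and state of nature can be written out as a small lookup. The following Python sketch is only an illustration; the function name `classify_decision` is hypothetical, and in practice the true state of *H*0 is unknown when the decision is made.

```python
def classify_decision(reject_h0: bool, h0_is_true: bool) -> str:
    """Label a test outcome given the decision and the (usually unknown) truth."""
    if reject_h0 and h0_is_true:
        return "Type I error"       # rejected a true null hypothesis
    if not reject_h0 and not h0_is_true:
        return "Type II error"      # failed to reject a false null hypothesis
    return "correct decision"

# Print all four combinations of decision and state of nature.
for reject in (True, False):
    for truth in (True, False):
        label = classify_decision(reject, truth)
        print(f"reject H0 = {reject!s:5}  H0 true = {truth!s:5}  ->  {label}")
```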

We now let the symbol ***α*** (pronounced **alpha**) **denote the probability of a Type I error,** and we let ***β*** (pronounced **beta**) **denote the probability of a Type II error.** Obviously, we would like both *α* and *β* to be small. A common (but not the only) procedure is to base a hypothesis test on taking a sample of a fixed size (for example, *n* = 40 trash bags) and on setting *α* equal to a small prespecified value. Setting *α* low means there is only a small chance of rejecting *H*0 when it is true. This implies that we are requiring strong evidence against *H*0 before we reject it.

We sometimes choose *α* as high as .10, but we usually choose *α* between .05 and .01. A frequent choice for *α* is .05. In fact, our former student tells us that the network often tests advertising claims by setting the probability of a Type I error equal to .05. That is, the network will run a commercial making a claim if the sample evidence allows it to reject a null hypothesis that says the claim is not valid in favor of an alternative hypothesis that says the claim is valid with *α* set equal to .05. Since a Type I error is deciding that the claim is valid when it is not, the policy of setting *α* equal to .05 says that, in the long run, the network will advertise only 5 percent of all invalid claims made by advertisers.
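The long-run reading of *α* can be checked with a small simulation. The following Python sketch is a rough illustration under stated assumptions: breaking strengths are assumed to be normally distributed, the true mean is set to exactly 50 (so that the null hypothesis is true), and the cutoff 1.645 is the *α* = .05 critical value developed in Section 9.2. The number of repetitions and all names are hypothetical.

```python
import math
import random

random.seed(0)  # fixed seed so the run is repeatable

MU0, SIGMA, N = 50.0, 1.65, 40   # H0 boundary value; sigma and n as in the trash bag case
CRITICAL_VALUE = 1.645           # right-tail cutoff for alpha = .05
REPETITIONS = 5000               # hypothetical number of simulated studies

def simulated_type_i_rate():
    """Fraction of samples drawn with mu = 50 (H0 true) that wrongly reject H0."""
    rejections = 0
    for _ in range(REPETITIONS):
        sample = [random.gauss(MU0, SIGMA) for _ in range(N)]
        xbar = sum(sample) / N
        z = (xbar - MU0) / (SIGMA / math.sqrt(N))
        if z > CRITICAL_VALUE:
            rejections += 1
    return rejections / REPETITIONS

rate = simulated_type_i_rate()
print(f"Share of true null hypotheses rejected: {rate:.3f}")
```

The printed share should land near .05, which is what setting *α* equal to .05 means in the long run.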

One might wonder why the network does not set *α* lower, say at .01. One reason is that **it can be shown that, for a fixed sample size, the lower we set α, the higher is β, and the higher we set α, the lower is β.** Setting *α* at .05 means that *β*, the probability of failing to advertise a true claim (a Type II error), will be smaller than it would be if *α* were set at .01. As long as (1) the claim to be advertised is plausible and (2) the consequences of advertising the claim even if it is false are not terribly serious, then it is reasonable to set *α* equal to .05. However, if either (1) or (2) is not true, then we might set *α* lower than .05. For example, suppose a pharmaceutical company wishes to advertise that it has developed an effective treatment for a disease that has formerly been very resistant to treatment. Such a claim is (perhaps) difficult to believe. Moreover, if the claim is false, patients suffering from the disease would be subjected to false hope and needless expense. In such a case, it might be reasonable for the network to set *α* at .01 because this would lower the chance of advertising the claim if it is false. We usually do not set *α* lower than .01 because doing so often leads to an unacceptably large value of *β*. We explain some methods for computing the probability of a Type II error in optional __Section 9.5__. However, *β* can be difficult or impossible to calculate in many situations, and we often must rely on our intuition when deciding how to set *α*.
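The trade-off between *α* and *β* for a fixed sample size can be sketched numerically. The following Python example is an illustration only: it assumes a right-tailed *z* test of *H*0: *μ* = 50, a normal sampling distribution for the sample mean, *σ* = 1.65, *n* = 40, and a hypothetical true mean of 50.5 (a value that does not come from the text). The function name `beta_right_tailed` is also hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1

def beta_right_tailed(alpha, mu0, mu_true, sigma, n):
    """P(do not reject H0) for a right-tailed z test when the true mean is mu_true."""
    critical_value = STD_NORMAL.inv_cdf(1 - alpha)
    # How far the true mean sits above mu0, measured in standard errors.
    shift = (mu_true - mu0) / (sigma / math.sqrt(n))
    return STD_NORMAL.cdf(critical_value - shift)

# mu_true = 50.5 is a hypothetical alternative value, not a figure from the text.
for alpha in (0.10, 0.05, 0.01):
    beta = beta_right_tailed(alpha, mu0=50, mu_true=50.5, sigma=1.65, n=40)
    print(f"alpha = {alpha:.2f}  ->  beta = {beta:.3f}")
```

Under these assumptions, each reduction in *α* produces a larger *β*, which is the pattern described above.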

Exercises for Section 9.1

CONCEPTS

**9.1** Which hypothesis (the null hypothesis, *H*0, or the alternative hypothesis, *Ha*) is the “status quo” hypothesis (that is, the hypothesis that states that things are remaining “as is”)? Which hypothesis is the hypothesis that says that a “hoped for” or “suspected” condition exists?

**9.2** Which hypothesis (*H*0 or *Ha*) is not rejected unless there is convincing sample evidence that it is false? Which hypothesis (*H*0 or *Ha*) will be accepted only if there is convincing sample evidence that it is true?

**9.3** Define each of the following:

**a** Type I error

**b** Type II error

**c** *α*

**d** *β*

**9.4** For each of the following situations, indicate whether an error has occurred and, if so, indicate what kind of error (Type I or Type II) has occurred.

**a** We do not reject *H*0 and *H*0 is true.

**b** We reject *H*0 and *H*0 is true.

**c** We do not reject *H*0 and *H*0 is false.

**d** We reject *H*0 and *H*0 is false.

**9.5** If we reject *H*0, what is the only type of error that we could be making? Explain.

**9.6** If we do not reject *H*0, what is the only type of error that we could be making? Explain.

**9.7** When testing a hypothesis, why don’t we set the probability of a Type I error to be extremely small? Explain.

METHODS AND APPLICATIONS

**9.8** **THE VIDEO GAME SATISFACTION RATING CASE** VideoGame

Recall that “very satisfied” customers give the XYZ-Box video game system a rating that is at least 42. Suppose that the manufacturer of the XYZ-Box wishes to use the random sample of 65 satisfaction ratings to provide evidence supporting the claim that the mean composite satisfaction rating for the XYZ-Box exceeds 42.

**a** Letting *μ* represent the mean composite satisfaction rating for the XYZ-Box, set up the null and alternative hypotheses needed if we wish to attempt to provide evidence supporting the claim that *μ* exceeds 42.

**b** In the context of this situation, interpret making a Type I error; interpret making a Type II error.

**9.9** **THE BANK CUSTOMER WAITING TIME CASE** WaitTime

Recall that a bank manager has developed a new system to reduce the time customers spend waiting for teller service during peak hours. The manager hopes the new system will reduce waiting times from the current 9 to 10 minutes to less than 6 minutes.

Suppose the manager wishes to use the random sample of 100 waiting times to support the claim that the mean waiting time under the new system is shorter than six minutes.

**a** Letting *μ* represent the mean waiting time under the new system, set up the null and alternative hypotheses needed if we wish to attempt to provide evidence supporting the claim that *μ* is shorter than six minutes.

**b** In the context of this situation, interpret making a Type I error; interpret making a Type II error.

**9.10** An automobile parts supplier owns a machine that produces a cylindrical engine part. This part is supposed to have an outside diameter of three inches. Parts with diameters that are too small or too large do not meet customer requirements and must be rejected. Lately, the company has experienced problems meeting customer requirements. The technical staff feels that the mean diameter produced by the machine is off target. In order to verify this, a special study will randomly sample 100 parts produced by the machine. The 100 sampled parts will be measured, and if the results obtained cast a substantial amount of doubt on the hypothesis that the mean diameter equals the target value of three inches, the company will assign a problem-solving team to intensively search for the causes of the problem.

**a** The parts supplier wishes to set up a hypothesis test so that the problem-solving team will be assigned when the null hypothesis is rejected. Set up the null and alternative hypotheses for this situation.

**b** In the context of this situation, interpret making a Type I error; interpret making a Type II error.

**c** Suppose it costs the company $3,000 a day to assign the problem-solving team to a project. Is this $3,000 figure the daily cost of a Type I error or a Type II error? Explain.

**9.11** The Crown Bottling Company has just installed a new bottling process that will fill 16-ounce bottles of the popular Crown Classic Cola soft drink. Both overfilling and underfilling bottles are undesirable: Underfilling leads to customer complaints and overfilling costs the company considerable money. In order to verify that the filler is set up correctly, the company wishes to see whether the mean bottle fill, *μ*, is close to the target fill of 16 ounces. To this end, a random sample of 36 filled bottles is selected from the output of a test filler run. If the sample results cast a substantial amount of doubt on the hypothesis that the mean bottle fill is the desired 16 ounces, then the filler’s initial setup will be readjusted.

**a** The bottling company wants to set up a hypothesis test so that the filler will be readjusted if the null hypothesis is rejected. Set up the null and alternative hypotheses for this hypothesis test.

**b** In the context of this situation, interpret making a Type I error; interpret making a Type II error.

**9.12** Consolidated Power, a large electric power utility, has just built a modern nuclear power plant. This plant discharges waste water that is allowed to flow into the Atlantic Ocean. The Environmental Protection Agency (EPA) has ordered that the waste water may not be excessively warm so that thermal pollution of the marine environment near the plant can be avoided. Because of this order, the waste water is allowed to cool in specially constructed ponds and is then released into the ocean. This cooling system works properly if the mean temperature of waste water discharged is 60°F or cooler. Consolidated Power is required to monitor the temperature of the waste water. A sample of 100 temperature readings will be obtained each day, and if the sample results cast a substantial amount of doubt on the hypothesis that the cooling system is working properly (the mean temperature of waste water discharged is 60°F or cooler), then the plant must be shut down and appropriate actions must be taken to correct the problem.

**a** Consolidated Power wishes to set up a hypothesis test so that the power plant will be shut down when the null hypothesis is rejected. Set up the null and alternative hypotheses that should be used.

**b** In the context of this situation, interpret making a Type I error; interpret making a Type II error.

**c** The EPA periodically conducts spot checks to determine whether the waste water being discharged is too warm. Suppose the EPA has the power to impose very severe penalties (for example, very heavy fines) when the waste water is excessively warm. Other things being equal, should Consolidated Power set the probability of a Type I error equal to *α* = .01 or *α* = .05? Explain.

**9.13** Consider __Exercise 9.12__, and suppose that Consolidated Power has been experiencing technical problems with the cooling system. Because the system has been unreliable, the company feels it must take precautions to avoid failing to shut down the plant when its waste water is too warm. Other things being equal, should Consolidated Power set the probability of a Type I error equal to *α* = .01 or *α* = .05? Explain.

9.2: *z* Tests about a Population Mean: *σ* Known

In this section we discuss hypothesis tests about a population mean that are *based on the normal distribution*. These tests are called ***z* tests,** and they require that the *true value of the population standard deviation σ is known.* Of course, in most real-world situations the true value of *σ* is not known. However, the concepts and calculations of hypothesis testing are most easily illustrated using the normal distribution. Therefore, in this section we will assume that (through theory or history related to the population under consideration) we know *σ*. When *σ* is unknown, we test hypotheses about a population mean by using the *t distribution*. In __Section 9.3__ we study ***t* tests,** and we will revisit the examples of this section assuming that *σ* is unknown.


Testing a “greater than” alternative hypothesis by using a critical value rule

In __Section 9.1__ we explained how to set up appropriate null and alternative hypotheses. We also discussed how to specify a value for *α*, the probability of a Type I error (also called the **level of significance**) of the hypothesis test, and we introduced the idea of a test statistic. We can use these concepts to begin developing a seven-step hypothesis testing procedure. We will introduce these steps in the context of the trash bag case and testing a “greater than” alternative hypothesis.

**Step 1: State the null hypothesis H0 and the alternative hypothesis Ha.** In the trash bag case, we will test *H*0: *μ* ≤ 50 versus *Ha*: *μ* > 50. Here, *μ* is the mean breaking strength of the new trash bag.

**Step 2: Specify the level of significance α.** The television network will run the commercial stating that the new trash bag is stronger than the former bag if we can reject *H*0: *μ* ≤ 50 in favor of *Ha*: *μ* > 50 by setting *α* equal to .05.

**Step 3: Select the test statistic.** In order to test *H*0: *μ* ≤ 50 versus *Ha*: *μ* > 50, we will test the modified null hypothesis *H*0: *μ* = 50 versus *Ha*: *μ* > 50. The idea here is that if there is sufficient evidence to reject the hypothesis that *μ* equals 50 in favor of *μ* > 50, then there is certainly also sufficient evidence to reject the hypothesis that *μ* is less than or equal to 50. In order to test *H*0: *μ* = 50 versus *Ha*: *μ* > 50, we will randomly select a sample of *n* = 40 new trash bags and calculate the mean $\bar{x}$ of the breaking strengths of these bags. We will then utilize the **test statistic**

$$z = \frac{\bar{x} - 50}{\sigma_{\bar{x}}} = \frac{\bar{x} - 50}{\sigma / \sqrt{n}}$$

A positive value of this test statistic results from an $\bar{x}$ that is greater than 50 and thus provides evidence against *H*0: *μ* = 50 and in favor of *Ha*: *μ* > 50.

**Step 4: Determine the critical value rule for deciding whether to reject H0.** To decide how large the test statistic *z* must be to reject *H*0 in favor of *Ha* by setting the probability of a Type I error equal to *α*, we note that different samples would give different sample means and thus different values of *z*. Because the sample size *n* = 40 is large, the Central Limit Theorem tells us that the sampling distribution of *z* is (approximately) a standard normal distribution if the null hypothesis *H*0: *μ* = 50 is true. Therefore, we do the following:

Place the probability of a Type I error, *α*, in the right-hand tail of the standard normal curve and use the normal table (see __Table A.3__, __page 863__) to find the normal point *zα*. Here *zα*, which we call a **critical value,** is the point on the horizontal axis under the standard normal curve that gives a right-hand tail area equal to *α*.

**Reject H0: μ = 50 in favor of Ha: μ > 50 if and only if the test statistic z is greater than the critical value zα.** (This is the **critical value rule.**)

__Figure 9.1__ illustrates that since we have set *α* equal to .05, we should use the critical value *zα* = *z*.05 = 1.645 (see __Table A.3__). This says that we should reject *H*0 if *z* > 1.645 and we should not reject *H*0 if *z* ≤ 1.645.

Figure 9.1: The Critical Value for Testing *H*0: *μ* = 50 versus *Ha*: *μ* > 50 by Setting *α* = .05

To better understand the critical value rule, consider the standard normal curve in __Figure 9.1__. The area of .05 in the right-hand tail of this curve implies that values of the test statistic *z* that are greater than 1.645 are unlikely to occur if the null hypothesis *H*0: *μ* = 50 is true. There is a 5 percent chance of observing one of these values—and thus wrongly rejecting *H*0—if *H*0 is true. However, we are more likely to observe a value of *z* greater than 1.645—and thus correctly reject *H*0—if *H*0 is false. Therefore, it is intuitively reasonable to reject *H*0 if the value of the test statistic *z* is greater than 1.645.
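The table lookup for the critical value can also be done in software. The following Python sketch uses the standard library's `statistics.NormalDist` to find the point with right-hand tail area *α* under the standard normal curve; the function name `right_tail_critical_value` is hypothetical.

```python
from statistics import NormalDist

def right_tail_critical_value(alpha):
    """Point on the standard normal curve with right-hand tail area alpha."""
    # A right-hand tail area of alpha leaves an area of 1 - alpha to the left.
    return NormalDist().inv_cdf(1 - alpha)

z_05 = right_tail_critical_value(0.05)
print(f"z_.05 = {z_05:.3f}")
```

For *α* = .05 the result agrees with the tabled value of 1.645 to three decimal places.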

**Step 5: Collect the sample data and compute the value of the test statistic.** When the sample of *n* = 40 new trash bags is randomly selected, the mean of the breaking strengths is calculated to be $\bar{x} = 50.575$. Assuming that *σ* is known to equal 1.65, the value of the test statistic is

$$z = \frac{\bar{x} - 50}{\sigma / \sqrt{n}} = \frac{50.575 - 50}{1.65 / \sqrt{40}} = 2.20$$

**Step 6: Decide whether to reject H0 by using the test statistic value and the critical value rule.** Since the test statistic value *z* = 2.20 is greater than the critical value *z*.05 = 1.645, we can reject *H*0: *μ* = 50 in favor of *Ha*: *μ* > 50 by setting *α* equal to .05. Furthermore, we can be intuitively confident that *H*0: *μ* = 50 is false and *Ha*: *μ* > 50 is true. This is because, since we have rejected *H*0 by setting *α* equal to .05, we have rejected *H*0 by using a test that allows only a 5 percent chance of wrongly rejecting *H*0. In general, if we can reject a null hypothesis in favor of an alternative hypothesis by setting the probability of a Type I error equal to *α*, we say that we have __statistical significance at the *α* level__.

**Step 7: Interpret the statistical results in managerial (real-world) terms and assess their practical importance.** Since we have rejected *H*0: *μ* = 50 in favor of *Ha*: *μ* > 50 by setting *α* equal to .05, we conclude (at an *α* of .05) that the mean breaking strength of the new trash bag exceeds 50 pounds. Furthermore, this conclusion has practical importance to the trash bag manufacturer because it means that the television network will approve running commercials claiming that the new trash bag is stronger than the former bag. Note, however, that the point estimate of *μ*, $\bar{x} = 50.575$, indicates that *μ* is not much larger than 50. Therefore, the trash bag manufacturer can claim only that its new bag is slightly stronger than its former bag. Of course, this might be practically important to consumers who feel that, because the new bag is 25 percent less expensive and is more environmentally sound, it is definitely worth purchasing if it has any strength advantage. However, to customers who are looking only for a substantial increase in bag strength, the statistical results would not be practically important. This illustrates that, in general, a finding of statistical significance (that is, concluding that the alternative hypothesis is true) can be practically important to some people but not to others. Notice that the point estimate of the parameter involved in a hypothesis test can help us to assess practical importance. We can also use confidence intervals to help assess practical importance.
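The computational steps of the critical value approach can be collected into one short routine. The following Python sketch is a minimal illustration, not a definitive implementation: the function name `right_tailed_z_test` is hypothetical, and the sample mean 50.575 is an assumed value chosen because it agrees with the reported *z* = 2.20 when *σ* = 1.65 and *n* = 40.

```python
import math
from statistics import NormalDist

def right_tailed_z_test(xbar, mu0, sigma, n, alpha):
    """Return (z, critical value, reject?) for H0: mu = mu0 versus Ha: mu > mu0."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))          # Step 5: test statistic
    critical_value = NormalDist().inv_cdf(1 - alpha)   # Step 4: critical value
    return z, critical_value, z > critical_value       # Step 6: decision

# xbar = 50.575 is an assumed sample mean, consistent with z = 2.20.
z, critical_value, reject = right_tailed_z_test(
    xbar=50.575, mu0=50, sigma=1.65, n=40, alpha=0.05
)
print(f"z = {z:.2f}, critical value = {critical_value:.3f}, reject H0: {reject}")
```

The routine reports only the statistical decision. Assessing practical importance, as in Step 7, still requires looking at how far the point estimate is from 50.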

Considerations in setting *α*

We have reasoned in __Section 9.1__ that the television network has set *α* equal to .05 rather than .01 because doing so means that *β*, the probability of failing to advertise a true claim (a Type II error), will be smaller than it would be if *α* were set at .01. It is informative, however, to see what would have happened if the network had set *α* equal to .01. __Figure 9.2__ illustrates that as we decrease *α* from .05 to .01, the critical value *zα* increases from *z*.05 = 1.645 to *z*.01 = 2.33. Because the test statistic value *z* = 2.20 is less than *z*.01 = 2.33, we cannot reject *H*0: *μ* = 50 in favor of *Ha*: *μ* > 50 by setting *α* equal to .01. This illustrates the point that, the smaller we set *α*, the larger is the critical value, and thus the stronger is the statistical evidence that we are requiring to reject the null hypothesis *H*0. Some statisticians have concluded (somewhat subjectively) that (1) **if we set α equal to .05, then we are requiring strong evidence to reject H0**; and (2) **if we set α equal to .01, then we are requiring very strong evidence to reject H0**.

Figure 9.2: The Critical Values for Testing *H*0: *μ* = 50 versus *Ha*: *μ* > 50 by Setting *α* = .05 and .01
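The comparison above can be sketched in a few lines of code, computing the critical values instead of reading them from a normal table. This is only an illustration: it assumes Python's standard-library `statistics.NormalDist`, and the function and variable names are hypothetical, not part of the text.

```python
from statistics import NormalDist

z = 2.20  # test statistic value from the trash bag case


def right_tail_critical_value(alpha):
    """Return z_alpha, the point with right-hand tail area alpha
    under the standard normal curve."""
    return NormalDist().inv_cdf(1 - alpha)


decisions = {}
for alpha in (0.05, 0.01):
    z_alpha = right_tail_critical_value(alpha)
    # Reject H0: mu = 50 in favor of Ha: mu > 50 only if z exceeds z_alpha
    decisions[alpha] = z > z_alpha
    print(f"alpha = {alpha}: z_alpha = {z_alpha:.3f}, reject H0: {decisions[alpha]}")
```

The computed critical value for *α* = .01 is about 2.326, which the normal table rounds to 2.33. The decision is the same either way: reject at .05, do not reject at .01.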

A *p*-value for testing a “greater than” alternative hypothesis

To decide whether to reject the null hypothesis *H*0 at level of significance *α*, steps 4, 5, and 6 of the seven-step hypothesis testing procedure compare the test statistic value with a critical value. Another way to make this decision is to calculate a **p-value**, which measures the likelihood of the sample results if the null hypothesis *H*0 is true. Sample results that are not likely if *H*0 is true are evidence that *H*0 is not true. To test *H*0 by using a *p*-value, we use the following steps 4, 5, and 6:

**Step 4: Collect the sample data and compute the value of the test statistic.** In the trash bag case, we have computed the value of the test statistic to be *z* = 2.20.

**Step 5: Calculate the p-value by using the test statistic value.** The *p*-value for testing *H*0: *μ* = 50 versus *Ha*: *μ* > 50 in the trash bag case is the area under the standard normal curve to the right of the test statistic value *z* = 2.20. As illustrated in __Figure 9.3(b)__, this area is 1 − .9861 = .0139. The *p*-value is the probability, computed assuming that *H*0: *μ* = 50 is true, of observing a value of the test statistic that is greater than or equal to the value *z* = 2.20 that we have actually computed from the sample data. The *p*-value of .0139 says that, if *H*0: *μ* = 50 is true, then only 139 in 10,000 of all possible test statistic values are at least as large, or extreme, as the value *z* = 2.20. That is, if we are to believe that *H*0 is true, we must believe that we have observed a test statistic value that can be described as a 139 in 10,000 chance. Because it is difficult to believe that we have observed a 139 in 10,000 chance, we intuitively have strong evidence that *H*0: *μ* = 50 is false and *Ha*: *μ* > 50 is true.

Figure 9.3: Testing *H*0: *μ* = 50 versus *Ha*: *μ* > 50 by Using Critical Values and the *p*-Value

**Step 6: Reject H0 if the p-value is less than α.** Recall that the television network has set *α* equal to .05. **The p-value of .0139 is less than the α of .05.** Comparing the two normal curves in __Figures 9.3(a)__ and __(b)__, we see that this implies that the test statistic value *z* = 2.20 is greater than the critical value *z*.05 = 1.645. Therefore, **we can reject H0 by setting α equal to .05.**

As another example, suppose that the television network had set *α* equal to .01. **The p-value of .0139 is greater than the α of .01.** Comparing the two normal curves in __Figures 9.3(b)__ and __(c)__, we see that this implies that the test statistic value *z* = 2.20 is less than the critical value *z*.01 = 2.33. Therefore, **we cannot reject H0 by setting α equal to .01.**

Generalizing these examples, we conclude that the value of the test statistic *z* will be greater than the critical value *zα* if and only if the *p*-value is less than *α*. **That is, we can reject H0 in favor of Ha at level of significance α if and only if the p-value is less than α.**

© NBC, Inc. Used with permission.

Note: This logo appears on an NBC advertising standards booklet. This booklet, along with other information provided by NBC and CBS, forms the basis for much of the discussion in the section that follows.
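As a minimal sketch of steps 5 and 6 for the trash bag case, the following code computes the right-tail *p*-value and checks that the *p*-value rule and the critical value rule agree at both *α* levels. It assumes Python's standard-library `statistics.NormalDist`; the variable names are illustrative only.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
z = 2.20  # test statistic value from the trash bag case

# p-value for the "greater than" alternative: area to the right of z
p_value = 1 - std_normal.cdf(z)
print(f"p-value = {p_value:.4f}")

# The p-value rule and the critical value rule should give the same decision
agreement = {}
for alpha in (0.05, 0.01):
    z_alpha = std_normal.inv_cdf(1 - alpha)
    agreement[alpha] = (p_value < alpha) == (z > z_alpha)
    print(f"alpha = {alpha}: p-value < alpha is {p_value < alpha}, "
          f"z > z_alpha is {z > z_alpha}")
```

For each *α*, both comparisons give the same answer, which is the "if and only if" relationship stated above.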

Comparing the critical value and *p*-value methods

Thus far we have considered two methods for testing *H*0: *μ* = 50 versus *Ha*: *μ* > 50 at the .05 and .01 values of *α*. Using the first method, we determine if the test statistic value *z* = 2.20 is greater than the critical values *z*.05 = 1.645 and *z*.01 = 2.33. Using the second method, we determine if the *p*-value of .0139 is less than .05 and .01. Whereas the critical value method requires that we look up a different critical value for each different *α* value, the *p*-value method requires only that we calculate a single *p*-value and compare it directly with the different *α* values. *It follows that the p-value method is the most efficient way to test a hypothesis at different α values.* This can be useful when there are different decision makers who might use different *α* values. For example, television networks do not always evaluate advertising claims by setting *α* equal to .05. The reason is that the consequences of a Type I error (advertising a false claim) are more serious for some claims than for others. For example, the consequences of a Type I error would be fairly serious for a claim about the effectiveness of a drug or for the superiority of one product over another. However, these consequences might not be as serious for a noncomparative claim about an inexpensive and safe product, such as a cosmetic. Networks sometimes use *α* values between .01 and .04 for claims having more serious Type I error consequences, and they sometimes use *α* values between .06 and .10 for claims having less serious Type I error consequences. Furthermore, one network’s policies for setting *α* can differ somewhat from those of another. As a result, reporting an advertising claim’s *p*-value to each network is the most efficient way to tell the network whether to allow the claim to be advertised. For example, most networks would evaluate the trash bag claim by choosing an *α* value between .025 and .10. 
Since the *p*-value of .0139 is less than all these *α* values, most networks would allow the trash bag claim to be advertised.
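The efficiency argument can be illustrated with a short sketch: one *p*-value is compared against several *α* values with no further table lookups. The particular *α* values below are hypothetical examples chosen from the .025 to .10 range mentioned above, not actual network policies.

```python
p_value = 0.0139  # p-value reported for the trash bag claim

# Hypothetical alpha values spanning the .025-to-.10 range mentioned in the text
network_alphas = [0.025, 0.05, 0.075, 0.10]

# A claim is allowed when H0 is rejected, that is, when p-value < alpha
approvals = {alpha: p_value < alpha for alpha in network_alphas}

for alpha, approved in approvals.items():
    outcome = "allow claim" if approved else "do not allow claim"
    print(f"alpha = {alpha}: {outcome}")
```

A network using a stricter *α* of .01 would reach the opposite decision, since .0139 is greater than .01.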

A summary of the seven steps of hypothesis testing

For almost every hypothesis test discussed in this book, statisticians have developed both a critical value rule and a *p*-value that can be used to perform the hypothesis test. Furthermore, it can be shown that for each hypothesis test the *p*-value has been defined so that **we can reject the null hypothesis at level of significance α if and only if the p-value is less than α**. We now summarize a seven-step procedure for performing a hypothesis test.

**The Seven Steps of Hypothesis Testing**

**1** State the null hypothesis *H*0 and the alternative hypothesis *Ha*.

**2** Specify the level of significance *α*.

**3** Select the test statistic.

**Using a critical value rule:**

**4** Determine the critical value rule for deciding whether to reject *H*0. Use the specified value of *α* to find the critical value in the critical value rule.

**5** Collect the sample data and compute the value of the test statistic.

**6** Decide whether to reject *H*0 by using the test statistic value and the critical value rule.

**Using a p-value:**

**4** Collect the sample data and compute the value of the test statistic.

**5** Calculate the *p*-value by using the test statistic value.

**6** Reject *H*0 at level of significance *α* if the *p*-value is less than *α*.

**7** Interpret your statistical results in managerial (real-world) terms and assess their practical importance.

In the real world both critical value rules and *p*-values are used to carry out hypothesis tests. For example, NBC uses critical value rules, whereas CBS uses *p*-values, to statistically verify the validity of advertising claims. Throughout this book we will continue to present both the critical value and the *p*-value approaches to hypothesis testing.

Testing a “less than” alternative hypothesis

We next consider the payment time case and testing a “less than” alternative hypothesis:

**Step 1:** In order to study whether the new electronic billing system reduces the mean bill payment time by more than 50 percent, the management consulting firm will test *H*0: *μ* ≥ 19.5 versus *Ha*: *μ* < 19.5.

**Step 2:** The management consulting firm wishes to make sure that it truthfully describes the benefits of the new system both to the Hamilton, Ohio, trucking company and to other companies that are considering installing such a system. Therefore, the firm will require very strong evidence to conclude that *μ* is less than 19.5, which implies that it will test *H*0: *μ* ≥ 19.5 versus *Ha*: *μ* < 19.5 by setting *α* equal to .01.

**Step 3:** In order to test *H*0: *μ* ≥ 19.5 versus *Ha*: *μ* < 19.5, we will test the modified null hypothesis *H*0: *μ* = 19.5 versus *Ha*: *μ* < 19.5. The idea here is that if there is sufficient evidence to reject the hypothesis that *μ* equals 19.5 in favor of *μ* < 19.5, then there is certainly also sufficient evidence to reject the hypothesis that *μ* is greater than or equal to 19.5. In order to test *H*0: *μ* = 19.5 versus *Ha*: *μ* < 19.5, we will randomly select a sample of *n* = 65 invoices paid using the billing system and calculate the mean *x̄* of the payment times of these invoices. Since the sample size is large, the Central Limit Theorem applies, and we will utilize the test statistic

$$z = \frac{\bar{x} - 19.5}{\sigma/\sqrt{65}}$$

A value of the test statistic *z* that is less than zero results when *x̄* is less than 19.5. This provides evidence to support rejecting *H*0 in favor of *Ha* because the point estimate *x̄* indicates that *μ* might be less than 19.5.

**Step 4:** To decide how much less than zero the test statistic must be to reject *H*0 in favor of *Ha* by setting the probability of a Type I error equal to α, we do the following:

Place the probability of a Type I error, *α*, in the left-hand tail of the standard normal curve and use the normal table to find the critical value −*zα*. Here −*zα* is the negative of the normal point *zα*. That is, −*zα* is the point on the horizontal axis under the standard normal curve that gives a left-hand tail area equal to *α*.

**Reject H0: μ = 19.5 in favor of Ha: μ < 19.5 if and only if the test statistic z is less than the critical value −zα.** Because *α* equals .01, the critical value −*zα* is −*z*.01 = −2.33 [see __Fig. 9.4(a)__].

Figure 9.4: Testing *H*0: *μ* = 19.5 versus *Ha*: *μ* < 19.5 by Using Critical Values and the *p*-Value

**Step 5:** When the sample of *n* = 65 invoices is randomly selected, the mean *x̄* of the payment times of these invoices is calculated. Assuming that *σ* is known to equal 4.2, the value of the test statistic is

$$z = \frac{\bar{x} - 19.5}{4.2/\sqrt{65}} = -2.67$$

**Step 6:** Since the test statistic value *z* = −2.67 is less than the critical value −*z*.01 = −2.33, we can reject *H*0: *μ* = 19.5 in favor of *Ha*: *μ* < 19.5 by setting *α* equal to .01.

**Step 7:** We conclude (at an *α* of .01) that the mean payment time for the new electronic billing system is less than 19.5 days. This, along with the fact that the sample mean is slightly less than 19.5, implies that it is reasonable for the management consulting firm to conclude that the new electronic billing system has reduced the mean payment time by slightly more than 50 percent (a substantial improvement over the old system).
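The left-tailed critical value rule used in steps 4 through 6 can be sketched as follows. The code takes the reported test statistic *z* = −2.67 as given and assumes Python's standard-library `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

alpha = 0.01
z = -2.67  # test statistic value from the payment time case

# -z_alpha: the point with left-hand tail area alpha under the standard normal curve
critical_value = NormalDist().inv_cdf(alpha)

# Reject H0: mu = 19.5 in favor of Ha: mu < 19.5 only if z is below -z_alpha
reject_h0 = z < critical_value
print(f"critical value = {critical_value:.2f}, reject H0: {reject_h0}")
```

Note that for a "less than" alternative the tail area *α* is passed directly to `inv_cdf`, because the rejection region lies in the left-hand tail.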

A *p*-value for testing a “less than” alternative hypothesis

To test *H*0: *μ* = 19.5 versus *Ha*: *μ* < 19.5 in the payment time case by using a *p*-value, we use the following steps 4, 5, and 6:

**Step 4:** We have computed the value of the test statistic in the payment time case to be *z* = −2.67.

**Step 5:** The *p*-value for testing *H*0: *μ* = 19.5 versus *Ha*: *μ* < 19.5 is the area under the standard normal curve to the left of the test statistic value *z* = −2.67. As illustrated in __Figure 9.4(b)__, this area is .0038. The *p*-value is the probability, computed assuming that *H*0: *μ* = 19.5 is true, of observing a value of the test statistic that is less than or equal to the value *z* = −2.67 that we have actually computed from the sample data. The *p*-value of .0038 says that, if *H*0: *μ* = 19.5 is true, then only 38 in 10,000 of all possible test statistic values are at least as negative, or extreme, as the value *z* = −2.67. That is, if we are to believe that *H*0 is true, we must believe that we have observed a test statistic value that can be described as a 38 in 10,000 chance.

**Step 6:** The management consulting firm has set *α* equal to .01. **The p-value of .0038 is less than the α of .01. Therefore, we can reject H0 by setting α equal to .01.**
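A brief sketch of the same decision made with a *p*-value: for a "less than" alternative the *p*-value is the area to the left of the test statistic. As before, `statistics.NormalDist` and the variable names are assumptions made for illustration.

```python
from statistics import NormalDist

alpha = 0.01
z = -2.67  # test statistic value from the payment time case

# p-value for the "less than" alternative: area to the left of z
p_value = NormalDist().cdf(z)

reject_h0 = p_value < alpha
print(f"p-value = {p_value:.4f}, reject H0 at alpha = {alpha}: {reject_h0}")
```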

Testing a “not equal to” alternative hypothesis

We next consider the Valentine’s Day chocolate case and testing a “not equal to” alternative hypothesis.

**Step 1:** To assess whether this year’s sales of its valentine box of assorted chocolates will be ten percent higher than last year’s, the candy company will test *H*0: *μ* = 330 versus *Ha*: *μ* ≠ 330. Here, *μ* is the mean order quantity of this year’s valentine box by large retail stores.

**Step 2:** If the candy company does not reject *H*0: *μ* = 330 and *H*0: *μ* = 330 is false—a Type II error—the candy company will base its production of valentine boxes on a 10 percent projected sales increase that is not correct. Since the candy company wishes to have a reasonably small probability of making this Type II error, the company will set *α* equal to .05. Setting *α* equal to .05 rather than .01 makes the probability of a Type II error smaller than it would be if *α* were set at .01. Note that in optional __Section 9.5__ we will verify that the probability of a Type II error in this situation is reasonably small. Therefore, if the candy company ends up not rejecting *H*0: *μ* = 330 and therefore decides to base its production of valentine boxes on the ten percent projected sales increase, the company can be intuitively confident that it has made the right decision.

**Step 3:** The candy company will randomly select *n* = 100 large retail stores and will make an early mailing to these stores promoting this year’s valentine box of assorted chocolates. The candy company will then ask each sampled retail store to report its anticipated order quantity of valentine boxes and will calculate the mean *x̄* of the reported order quantities. Since the sample size is large, the Central Limit Theorem applies, and we will utilize the test statistic

$$z = \frac{\bar{x} - 330}{\sigma/\sqrt{100}}$$

A value of the test statistic that is greater than 0 results when *x̄* is greater than 330. This provides evidence to support rejecting *H*0 in favor of *Ha* because the point estimate *x̄* indicates that *μ* might be greater than 330. Similarly, a value of the test statistic that is less than 0 results when *x̄* is less than 330. This also provides evidence to support rejecting *H*0 in favor of *Ha* because the point estimate *x̄* indicates that *μ* might be less than 330.

**Step 4:** To decide how different from zero (positive or negative) the test statistic must be in order to reject *H*0 in favor of *Ha* by setting the probability of a Type I error equal to *α*, we do the following:

Divide the probability of a Type I error, *α*, into two equal parts, and place the area *α*/2 in the right-hand tail of the standard normal curve and the area *α*/2 in the left-hand tail of the standard normal curve. Then use the normal table to find the critical values *zα*/2 and −*zα*/2. Here *zα*/2 is the point on the horizontal axis under the standard normal curve that gives a right-hand tail area equal to *α*/2, and −*zα*/2 is the point giving a left-hand tail area equal to *α*/2.

**Reject H0: μ = 330 in favor of Ha: μ ≠ 330 if and only if the test statistic z is greater than the critical value zα/2 or less than the critical value −zα/2.** Note that this is equivalent to saying that we should **reject H0 if and only if the absolute value of the test statistic, | z |, is greater than the critical value zα/2.** Because *α* equals .05, the critical values are [see __Figure 9.5(a)__]

$$z_{.025} = 1.96 \quad \text{and} \quad -z_{.025} = -1.96$$

Figure 9.5: Testing *H*0: *μ* = 330 versus *Ha*: *μ* ≠ 330 by Using Critical Values and the *p*-Value

**Step 5:** When the sample of *n* = 100 large retail stores is randomly selected, the mean of their reported order quantities is calculated to be *x̄* = 326. Assuming that *σ* is known to equal 40, the value of the test statistic is

$$z = \frac{326 - 330}{40/\sqrt{100}} = -1$$

**Step 6:** Since the test statistic value *z* = −1 is greater than − *z*.025 = −1.96 (or, equivalently, since | *z* | = 1 is less than *z*.025 = 1.96), we cannot reject *H*0: *μ* = 330 in favor of *Ha*: *μ* ≠ 330 by setting *α* equal to .05.

**Step 7:** We cannot conclude (at an *α* of .05) that the mean order quantity of this year’s valentine box by large retail stores will differ from 330 boxes. Therefore, the candy company will base its production of valentine boxes on the ten percent projected sales increase.
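The two-sided critical value rule can be sketched as follows. The sample mean of 326 used here is an assumption: it is the value implied by the reported *z* = −1 together with *σ* = 40 and *n* = 100. The code relies on Python's standard-library `statistics.NormalDist`, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu_0, sigma, n = 330, 40, 100
x_bar = 326  # assumed: the sample mean implied by z = -1, sigma = 40, n = 100
alpha = 0.05

# Test statistic: distance of the sample mean from mu_0 in standard errors
z = (x_bar - mu_0) / (sigma / sqrt(n))

# z_{alpha/2}: the point with right-hand tail area alpha/2
z_half_alpha = NormalDist().inv_cdf(1 - alpha / 2)

# Reject H0: mu = 330 in favor of Ha: mu != 330 only if |z| exceeds z_{alpha/2}
reject_h0 = abs(z) > z_half_alpha
print(f"z = {z:.2f}, critical values = ±{z_half_alpha:.2f}, reject H0: {reject_h0}")
```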

A *p*-value for testing a “not equal to” alternative hypothesis

To test *H*0: *μ* = 330 versus *Ha*: *μ* ≠ 330 in the Valentine’s Day chocolate case by using a *p*-value, we use the following steps 4, 5, and 6:

**Step 4:** We have computed the value of the test statistic in the Valentine’s Day chocolate case to be *z* = −1.

**Step 5:** Note from __Figure 9.5(b)__ that the area under the standard normal curve to the right of | *z* | = 1 is .1587. Twice this area—that is, 2(.1587) = .3174—is the *p*-value for testing *H*0: *μ* = 330 versus *Ha*: *μ* ≠ 330. To interpret the *p*-value as a probability, note that the symmetry of the standard normal curve implies that twice the area under the curve to the right of | *z* | = 1 equals the area under this curve to the right of 1 plus the area under the curve to the left of −1 [see __Figure 9.5(b)__]. Also, note that since both positive and negative test statistic values count against *H*0: *μ* = 330, a test statistic value that is either greater than or equal to 1 or less than or equal to −1 is at least as extreme as the observed test statistic value *z* = −1. It follows that the *p*-value of .3174 says that, if *H*0: *μ* = 330 is true, then 31.74 percent of all possible test statistic values are at least as extreme as *z* = −1. That is, if we are to believe that *H*0 is true, we must believe that we have observed a test statistic value that can be described as a 31.74 percent chance.

**Step 6:** The candy company has set *α* equal to .05. **The p-value of .3174 is greater than the α of .05. Therefore, we cannot reject H0 by setting α equal to .05.**
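The symmetry argument in Step 5 can be checked with a short sketch: the area to the right of | *z* | equals the area to the left of −| *z* |, so the two-sided *p*-value is twice either tail. The code assumes Python's standard-library `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
alpha = 0.05
z = -1.0  # test statistic value from the Valentine's Day chocolate case

right_tail = 1 - std_normal.cdf(abs(z))  # area to the right of |z|
left_tail = std_normal.cdf(-abs(z))      # area to the left of -|z|

# p-value for the "not equal to" alternative: both tails together
p_value = 2 * right_tail

reject_h0 = p_value < alpha
print(f"right tail = {right_tail:.4f}, left tail = {left_tail:.4f}")
print(f"p-value = {p_value:.4f}, reject H0 at alpha = {alpha}: {reject_h0}")
```

The computed *p*-value is about .3173. The text's .3174 comes from doubling the four-decimal table value .1587, and the small difference does not affect the decision.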

A general procedure for testing a hypothesis about a population mean

In the trash bag case we have tested *H*0: *μ* ≤ 50 versus *Ha*: *μ* > 50 by testing *H*0: *μ* = 50 versus *Ha*: *μ* > 50. In the payment time case we have tested *H*0: *μ* ≥ 19.5 versus *Ha*: *μ* < 19.5 by testing *H*0: *μ* = 19.5 versus *Ha*: *μ* < 19.5. In general, the usual procedure for testing a “less than or equal to” null hypothesis or a “greater than or equal to” null hypothesis is to change the null hypothesis to an equality. We then test the “equal to” null hypothesis versus the alternative hypothesis. Furthermore, the critical value and *p*-value procedures for testing a null hypothesis versus an alternative hypothesis depend upon whether the alternative hypothesis is a “greater than,” a “less than,” or a “not equal to” alternative hypothesis. The following summary box gives the appropriate procedures. Specifically, letting *μ*0 be a particular number, the summary box shows how to test *H*0: *μ* = *μ*0 versus either *Ha*: *μ* > *μ*0, *Ha*: *μ* < *μ*0, or *Ha*: *μ* ≠ *μ*0:

Testing a Hypothesis about a Population Mean when *σ* Is Known

Define the test statistic

$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$

and assume that the population sampled is normally distributed, or that the sample size *n* is large. We can test *H*0: *μ* = *μ*0 versus a particular alternative hypothesis at level of significance *α* by using the appropriate critical value rule, or, equivalently, the corresponding *p*-value.