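Chapter 10 Correlation and Regression. After completing this chapter, you will be able to: draw a scatter plot for a set of paired data; compute the correlation coefficient; test the hypothesis H0: ρ = 0; compute the equation of the regression line; compute the coefficient of determination; compute the standard error of the estimate; find a prediction interval. Chapter outline: Objectives; Introduction; 10-1 Scatter Plots and Correlation; 10-2 Regression; 10-3 Coefficient of Determination and Standard Error of the Estimate; Summary. © Getty RF.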

Simple Linear Regression from Blumen in Chinese


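Chapter 10 Correlation and Regression

After completing this chapter, you will be able to:

1. Draw a scatter plot for a set of paired data.
2. Compute the correlation coefficient.
3. Test the hypothesis H0: ρ = 0.
4. Compute the equation of the regression line.
5. Compute the coefficient of determination.
6. Compute the standard error of the estimate.
7. Find a prediction interval.

Chapter Outline

Objectives
Introduction
10-1 Scatter Plots and Correlation
10-2 Regression
10-3 Coefficient of Determination and Standard Error of the Estimate
Summary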

Objectives

After completing this chapter, you should be able to

1 Draw a scatter plot for a set of ordered pairs.
2 Compute the correlation coefficient.
3 Test the hypothesis H0: ρ = 0.
4 Compute the equation of the regression line.
5 Compute the coefficient of determination.
6 Compute the standard error of the estimate.
7 Find a prediction interval.
8 Be familiar with the concept of multiple regression.

Outline

Introduction
10–1 Scatter Plots and Correlation
10–2 Regression
10–3 Coefficient of Determination and Standard Error of the Estimate
10–4 Multiple Regression (Optional)
Summary

CHAPTER 10

Correlation and Regression

lu38582_ch10_533-590.qxd 9/16/10 11:33 AM Page 533

© Getty RF.

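Introduction

Chapters 7 and 8 explained two areas of inferential statistics: confidence intervals and hypothesis testing. Another area of inferential statistics involves determining whether a relationship exists between two or more numerical or quantitative variables. For example, a businessperson may want to know whether the sales for a given month are related to the amount of advertising the company did that month. Educators are interested in determining whether the number of hours spent studying is related to the grade in a course. Medical researchers are interested in questions such as whether caffeine is related to heart disease, or whether age is related to blood pressure. A zoologist may want to know whether the birth weight of a certain animal is related to its life span. These are only a few of the many questions that can be answered by using correlation or regression. Correlation is a statistical method used to determine whether a linear relationship exists between variables. Regression is a statistical method used to describe the nature of the relationship between variables, that is, whether it is positive or negative, linear or nonlinear.

The purpose of this chapter is to answer these questions statistically:

1. Are two or more variables linearly related?

2. If so, what is the strength of the relationship?

3. What type of relationship exists?

4. What kind of predictions can be made from the relationship?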

為了回答前兩個問題,統計學家使用一種數值測度,決定兩個變數是否有

線性關係,並且決定變數間線性關係的強度。這一項測度叫做相關係數。比如

說,有許多變數和心臟病有關,諸如缺乏運動、抽菸、遺傳、年紀、壓力和飲

食。這些變數之中有些比其他重要;因此,醫生希望能夠幫助病人認識哪一些

變數最重要。

為了回答第三個問題,你必須先確定存在哪一種關係。有兩種關係:簡單

關係與複關係。在簡單關係 (simple relationship) 裡,有兩個變數,一個是獨

立變數 (independent variable),也叫做解釋變數或是預測變數,而第二個變數

叫做依變數 (dependent variable),也叫做反應變數。有一種簡單關係分析叫

做簡單迴歸,它有一個被用來預測依變數的獨立變數。比如說,一位經理想要

知道業務的年資是否對業績有任何幫助。這一類的研究涉及一種簡單關係,因

為只有兩個變數:年資和業績。

有一種複關係 (multiple relationship) 叫做複迴歸(multiple regression),

用兩個以上的獨立變數來預測一個依變數。比如說,有一位教育學家可能想要

研究大學成就和花多少時間念書、GPA 和高中背景等因素的關係。這一類的

研究涉及數個變數。

簡單關係可以是正的,也可以是負的。當兩變數同時增加或同時減少,存

人的一生大概走了 100,000 英哩,也就是每天約走 3.4 英哩。

非凡數字

Objective 1: Draw a scatter plot for a set of paired data.

在一種正關係 (positive relationship)。比如說,身高與體重是有關係的;而且

關係是正的,因為一般而言,身高愈高的人,體重愈重。當一個變數增加,另

一個變數減少,或是反過來,則存在一種負關係 (negative relationship)。比如

說,如果測量年紀超過 60 歲民眾的力氣,你會發現年紀愈大,一般而言力氣

愈小。在這裡使用「一般」這樣的字眼是因為會有例外。

最後,第四個問題問到可以進行哪一種形式的預測。所有領域每天都有預

測,包括氣象預測、股市預測、業績預測、收成預測、油價預測和運動賽事預

測。有些預測比較準確,因為關係比較強。也就是說,變數關係愈強,預測就

愈準確。

10-1 散佈圖與相關

在簡單相關與迴歸研究中,研究員會收集兩數值或屬量變數的數據,藉以

求出兩變數間是否有某種關係。比如說,有一位研究員希望知道花多少時間念

書和某一次考試成績的關係,她必須收集一組學生的隨機樣本,決定每一位學

生的念書時數,以及取得每一位學生該科的考試成績。為數據做個表格,如下

所示。

3. What type of relationship exists?

4. What kind of predictions can be made from the relationship?

To answer the first two questions, statisticians use a numerical measure to determine whether two or more variables are linearly related and to determine the strength of the relationship between or among the variables. This measure is called a correlation coefficient. For example, there are many variables that contribute to heart disease, among them lack of exercise, smoking, heredity, age, stress, and diet. Of these variables, some are more important than others; therefore, a physician who wants to help a patient must know which factors are most important.

To answer the third question, you must ascertain what type of relationship exists. There are two types of relationships: simple and multiple. In a simple relationship, there are two variables: an independent variable, also called an explanatory variable or a predictor variable, and a dependent variable, also called a response variable. A simple relationship analysis is called simple regression, and there is one independent variable that is used to predict the dependent variable. For example, a manager may wish to see whether the number of years the salespeople have been working for the company has anything to do with the amount of sales they make. This type of study involves a simple relationship, since there are only two variables: years of experience and amount of sales.

In a multiple relationship, called multiple regression, two or more independent variables are used to predict one dependent variable. For example, an educator may wish to investigate the relationship between a student’s success in college and factors such as the number of hours devoted to studying, the student’s GPA, and the student’s high school background. This type of study involves several variables.

Simple relationships can also be positive or negative. A positive relationship exists when both variables increase or decrease at the same time. For instance, a person’s height and weight are related; and the relationship is positive, since the taller a person is, generally, the more the person weighs. In a negative relationship, as one variable increases, the other variable decreases, and vice versa. For example, if you measure the strength of people over 60 years of age, you will find that as age increases, strength generally decreases. The word generally is used here because there are exceptions.

Finally, the fourth question asks what type of predictions can be made. Predictions are made in all areas and daily. Examples include weather forecasting, stock market analyses, sales predictions, crop predictions, gasoline price predictions, and sports predictions. Some predictions are more accurate than others, due to the strength of the relationship. That is, the stronger the relationship is between variables, the more accurate the prediction is.

Unusual Stat

A person walks on average 100,000 miles in his or her lifetime. This is about 3.4 miles per day.

10–1 Scatter Plots and Correlation

In simple correlation and regression studies, the researcher collects data on two numerical or quantitative variables to see whether a relationship exists between the variables. For example, if a researcher wishes to see whether there is a relationship between number of hours of study and test scores on an exam, she must select a random sample of students, determine the hours each studied, and obtain their grades on the exam. A table can be made for the data, as shown here.

Student   Hours of study x   Grade y (%)
A         6                  82
B         2                  63
C         1                  57
D         5                  88
E         2                  68
F         3                  75

Objective 1: Draw a scatter plot for a set of ordered pairs.

學生 念書時數 x 成績 y (%)

如前述,這一項研究的兩個變數稱為獨立變數和依變數。迴歸裡的獨立變

數是可以被控制或操作的變數。這時候,念書時數是獨立變數,記作 x 變數。

迴歸裡的依變數是無法被控制或操作的變數。學生的考試成績是依變數,記

作 y 變數。這樣區別變數的原因在於假設學生的考試成績是根據學生的念書時

數。同時,某種程度上,我們也假設學生可以因應考試安排念書時數。

決定哪一個變數是 x 變數,哪一個變數是 y 變數不會都如此明確,有時候

是任意決定的。比如說,如果有一位研究員研究年紀對血壓的效果。一般而

言,研究員會假設年紀影響血壓。因此年紀這一個變數被認為是獨立變數,而

血壓變數就會被認為是依變數。另一方面,如果研究夫妻對某一件事的態度,

這時候決定誰的態度是獨立變數、誰的態度是依變數是很困難的。這時候,研

究員可能會任意決定這一件事。

獨立變數與依變數可以被畫在一張圖上,這張圖叫做散佈圖。獨立變數 x


是位在圖的 x 軸(橫軸)上,而依變數 y 是位在圖的 y 軸(縱軸)上。

散佈圖 (scatter plot) 是把獨立變數 x 和依變數 y 配成有序對 (x, y),然後

把每一對看做是二維平面上的一點,再描點繪圖。

散佈圖是一種視覺工具,可以用來描述獨立變數與依變數關係的本質。變

數的單位可以不一樣,而且利用個別變數的最大值和最小值決定個別座標軸的

範圍。

繪製散佈圖的程序顯示在例題 10-1 至例題 10-3。

為以下美國租車公司的數據建構一張散佈圖。

As stated previously, the two variables for this study are called the independent variable and the dependent variable. The independent variable is the variable in regression that can be controlled or manipulated. In this case, the number of hours of study is the independent variable and is designated as the x variable. The dependent variable is the variable in regression that cannot be controlled or manipulated. The grade the student received on the exam is the dependent variable, designated as the y variable. The reason for this distinction between the variables is that you assume that the grade the student earns depends on the number of hours the student studied. Also, you assume that, to some extent, the student can regulate or control the number of hours he or she studies for the exam.

The determination of the x and y variables is not always clear-cut and is sometimes an arbitrary decision. For example, if a researcher studies the effects of age on a person’s blood pressure, the researcher can generally assume that age affects blood pressure. Hence, the variable age can be called the independent variable, and the variable blood pressure can be called the dependent variable. On the other hand, if a researcher is studying the attitudes of husbands on a certain issue and the attitudes of their wives on the same issue, it is difficult to say which variable is the independent variable and which is the dependent variable. In this study, the researcher can arbitrarily designate the variables as independent and dependent.

The independent and dependent variables can be plotted on a graph called a scatter plot. The independent variable x is plotted on the horizontal axis, and the dependent variable y is plotted on the vertical axis.

A scatter plot is a graph of the ordered pairs (x, y) of numbers consisting of the independent variable x and the dependent variable y.

The scatter plot is a visual way to describe the nature of the relationship between the independent and dependent variables. The scales of the variables can be different, and the coordinates of the axes are determined by the smallest and largest data values of the variables.

The procedure for drawing a scatter plot is shown in Examples 10–1 through 10–3.

Example 10–1 Car Rental Companies

Construct a scatter plot for the data shown for car rental companies in the United States for a recent year.

Company   Cars (in ten thousands)   Revenue (in billions)
A         63.0                      $7.0
B         29.0                      3.9
C         20.8                      2.1
D         19.1                      2.8
E         13.4                      1.4
F         8.5                       1.5

Source: Auto Rental News.

Solution

Step 1 Draw and label the x and y axes.

Step 2 Plot each point on the graph, as shown in Figure 10–1.
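The two steps above can be sketched in plain Python. This is only an illustration, not part of the original text: the names `axis_range` and `ascii_scatter` and the 40 by 12 character grid are assumptions made for this sketch. It takes the axis limits from the smallest and largest data values, as described earlier, and marks each ordered pair (x, y) on a text grid. It assumes each variable takes at least two distinct values.

```python
# Example 10-1 data as ordered pairs (cars in ten thousands, revenue in billions).
car_rental = [(63.0, 7.0), (29.0, 3.9), (20.8, 2.1),
              (19.1, 2.8), (13.4, 1.4), (8.5, 1.5)]


def axis_range(values):
    """Axis limits taken from the smallest and largest data values."""
    return min(values), max(values)


def ascii_scatter(pairs, width=40, height=12):
    """Return a text scatter plot: x on the horizontal axis, y on the vertical."""
    x_min, x_max = axis_range([x for x, _ in pairs])
    y_min, y_max = axis_range([y for _, y in pairs])
    grid = [[" "] * width for _ in range(height)]
    for x, y in pairs:
        # Scale each coordinate to a column / row index on the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        # Row 0 is printed first, so flip it to put large y values on top.
        grid[height - 1 - row][col] = "*"
    return "\n".join("".join(line) for line in grid)


print(ascii_scatter(car_rental))
```

The points rise from the lower left to the upper right of the grid, which is the pattern a positive relationship produces.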


公司 車輛數(以萬輛計) 收益(以十億美元計)

資料來源:Auto Rental News.

■解答

步驟 1

畫出 x 軸和 y 軸,並且加上標示。

步驟 2

在圖上描點,如圖 10-1 所示。

租車公司例題 10-1

[Figure 10-1: Scatter plot for Example 10-1. x axis: Cars (in 10,000s); y axis: Revenue (billions of dollars).]

從一份缺席次數與統計學期末成績的研究取得以下這一組隨機樣本,為數據繪製一張散佈

圖。

[Figure 10–1: Scatter Plot for Example 10–1. x axis: Cars (in 10,000s); y axis: Revenue (billions).]

[Figure 10–2: Scatter Plot for Example 10–2. x axis: Number of absences; y axis: Final grade.]

Example 10–2 Absences and Final Grades

Construct a scatter plot for the data obtained in a study on the number of absences and the final grades of seven randomly selected students from a statistics class.

The data are shown here.

Student   Number of absences x   Final grade y (%)
A         6                      82
B         2                      86
C         15                     43
D         9                      74
E         12                     58
F         5                      90
G         8                      78

Solution

Step 1 Draw and label the x and y axes.

Step 2 Plot each point on the graph, as shown in Figure 10–2.


學生 缺席次數 x 期末成績 y (%)

■解答

步驟 1

畫出 x 和 y 軸,並且加上標示。

步驟 2

在圖上描點,如圖 10-2 所示。

有一位研究員希望知道美國有錢人的年紀和財產之間是不是有關係。某一年的數據如下所

示。

資料來源:Forbes magazine.

缺席與期末成績例題 10-2

年紀與財產例題 10-3

[Figure 10-2: Scatter plot for Example 10-2. x axis: Number of absences; y axis: Final grade.]

After the plot is drawn, it should be analyzed to determine which type of relationship, if any, exists. For example, the plot shown in Figure 10–1 suggests a positive relationship, since as the number of cars rented increases, revenue tends to increase also. The plot of the data shown in Figure 10–2 suggests a negative relationship, since as the number of absences increases, the final grade decreases. Finally, the plot of the data shown in Figure 10–3 shows no specific type of relationship, since no pattern is discernible.

Note that the data shown in Figures 10–1 and 10–2 also suggest a linear relationship, since the points seem to fit a straight line, although not perfectly. Sometimes a scatter plot, such as the one in Figure 10–4, shows a curvilinear relationship between the data. In this situation, the methods shown in this section and in Section 10–2 cannot be used. Methods for curvilinear relationships are beyond the scope of this book.

Correlation

Correlation Coefficient As stated in the Introduction, statisticians use a measure called the correlation coefficient to determine the strength of the linear relationship between two variables. There are several types of correlation coefficients. The one

Example 10–3 Age and Wealth

A researcher wishes to see if there is a relationship between the ages and net worth of the wealthiest people in America. The data for a specific year are shown.

Person   Age x   Net wealth y ($ billions)
A        73      16
B        65      26
C        53      50
D        54      21.5
E        79      40
F        69      16
G        61      19.6
H        65      19

Source: Forbes magazine.

Solution

Step 1 Draw and label the x and y axes.

Step 2 Plot each point on the graph, as shown in Figure 10–3.

[Figure 10–3: Scatter Plot for Example 10–3. x axis: Age; y axis: Wealth ($ billions).]

Objective 2: Compute the correlation coefficient.

人 年紀 x 財產 y(以十億美元計)

■解答

步驟 1

畫出 x 軸和 y 軸,並且加上標示。

步驟 2

在圖上描點,如圖 10-3 所示。

[Figure 10-3: Scatter plot for Example 10-3. x axis: Age; y axis: Wealth (billions of dollars).]

畫完圖之後,透過分析決定可能是哪一種關係,如果關係存在的話。比如

說,圖 10-1 的散佈圖顯現一種正關係,因為出租車數增加,收益也增加。圖

10-2 的散佈圖則是顯現一種負關係,因為缺席次數增加,期末成績降低了。最

後,圖 10-3 的散佈圖看不出來有什麼特定的關係,因為圖上無明顯的模式。

注意,圖 10-1 和圖 10-2 也顯現一種線性關係,因為圖上的點和某一條線

段看起來相當符合,雖然不是完全符合。有時候,像圖 10-4 的散佈圖會顯示

數據間的某種曲線關係。這時候,本節和第 10-2 節所介紹的方法並不適用。

曲線關係的方法不在本書的範圍。

Correlation

相關係數 如簡介所述,統計學家用一種叫做「相關係數」的測度決定兩變數

的線性關係強度。相關係數有好幾種。本節要解釋的叫做 Pearson 動差相關係

數 (Pearson product moment correlation coefficient, PPMC),根據這個領域的

研究先鋒 Karl Pearson 命名。

相關係數 (correlation coefficient) 是一個從樣本數據得到且用來測量兩屬

量變數的線性關係強度與方向的數字。樣本相關係數的符號是 r。母體相

關係數的符號是 ρ(希臘字母 rho)。

相關係數的範圍是從−1 到+1。如果變數間有某種強烈的正線性關係,

r 的數值會接近+1。如果變數間有某種強烈的負線性關係,r 的數值會接近

−1。當變數間沒有線性關係,或是只有微弱的線性關係,r 的數值會接近 0。

見圖 10-5。

圖 10-6 內的圖形顯示相關係數與對應的散佈圖。注意,當相關係數從 0

漸增到+1((a)、(b)、(c) 小圖),數據愈來愈靠近某種強烈的正線性關係。

當相關係數從 0 漸減到−1((d)、(e)、(f) 小圖),數據愈來愈靠近某種強烈的

負線性關係。這再一次顯現某種強烈的關係。

計算相關係數的數值有數種方式,其中一種就是使用以下的公式。

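Objective 2: Compute the correlation coefficient.

[Figure 10-4: Scatter plot suggesting a curvilinear relationship.]

[Figure 10-5: Range of values for the correlation coefficient, from −1 (strong negative linear relationship) through 0 (no linear relationship) to +1 (strong positive linear relationship).]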

相關係數 r 的公式

[Figure 10–4: Scatter Plot Suggesting a Curvilinear Relationship.]

explained in this section is called the Pearson product moment correlation coefficient (PPMC), named after statistician Karl Pearson, who pioneered the research in this area.

The correlation coefficient computed from the sample data measures the strength and direction of a linear relationship between two quantitative variables. The symbol for the sample correlation coefficient is r. The symbol for the population correlation coefficient is ρ (Greek letter rho).

The range of the correlation coefficient is from −1 to +1. If there is a strong positive linear relationship between the variables, the value of r will be close to +1. If there is a strong negative linear relationship between the variables, the value of r will be close to −1. When there is no linear relationship between the variables or only a weak relationship, the value of r will be close to 0. See Figure 10–5.

The graphs in Figure 10–6 show the relationship between the correlation coefficients and their corresponding scatter plots. Notice that as the value of the correlation coefficient increases from 0 to +1 (parts a, b, and c), data values become closer to an increasingly strong relationship. As the value of the correlation coefficient decreases from 0 to −1 (parts d, e, and f), the data values also become closer to a straight line. Again this suggests a stronger relationship.

There are several ways to compute the value of the correlation coefficient. One method is to use the formula shown here.

[Figure 10–5: Range of Values for the Correlation Coefficient, from −1 (strong negative linear relationship) through 0 (no linear relationship) to +1 (strong positive linear relationship).]

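Formula for the Correlation Coefficient r

$$
r = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\left(\sum x^2\right) - \left(\sum x\right)^2\right]\left[n\left(\sum y^2\right) - \left(\sum y\right)^2\right]}}
$$

where n is the number of data pairs.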
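As a rough illustration of the formula for r, the sums it needs can be accumulated directly from the data pairs. The function name `pearson_r` and the two tiny data sets below are assumptions made for this sketch; they do not come from the text. The data sets lie exactly on a rising line and on a falling line, so they should give the two extreme values of r.

```python
from math import sqrt


def pearson_r(pairs):
    """Sample correlation coefficient r computed from (x, y) data pairs."""
    n = len(pairs)                           # number of data pairs
    sum_x = sum(x for x, _ in pairs)
    sum_y = sum(y for _, y in pairs)
    sum_xy = sum(x * y for x, y in pairs)
    sum_x2 = sum(x * x for x, _ in pairs)
    sum_y2 = sum(y * y for _, y in pairs)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Hypothetical data: points exactly on a rising and on a falling straight line.
line_up = [(1, 2), (2, 4), (3, 6)]
line_down = [(1, 6), (2, 4), (3, 2)]

print(pearson_r(line_up))    # perfect positive linear relationship
print(pearson_r(line_down))  # perfect negative linear relationship
```

The denominator is zero when all x values or all y values are equal, so the sketch assumes each variable actually varies.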

其中 n 是成對數據的個數。

相關係數的假設

1. 樣本是隨機樣本。

2. 成對數據大約落在一條直線上,而且以區間或是比例尺度取得數據。

3. 兩變數是某種聯合常態分配。(這意味著對於任意已知的 x,y 的分配是常態

的;而且對於任意已知的 y,x 的分配是常態的。)

相關係數的四捨五入原則 將 r 四捨五入到 3 位小數。

相關係數的公式看起來有點複雜,但是使用例題 10-4 建議的表格輔助計

算,會讓計算變得比較簡單一點。

相關係數 r 沒有單位,而且如果對調 x 值和 y 值,r 不會改變。

[Figure 10-6: Relationship between the correlation coefficient and the scatter plot. Panels: (a) r = 0.50, (b) r = 0.90, (c) r = 1.00, (d) r = –0.50, (e) r = –0.90, (f) r = –1.00.]

計算例題 10-1 數據的相關係數。

■解答

步驟 1

製作一張如下所示的表格。

[Figure 10–6: Relationship Between the Correlation Coefficient and the Scatter Plot. Panels: (a) r = 0.50, (b) r = 0.90, (c) r = 1.00, (d) r = –0.50, (e) r = –0.90, (f) r = –1.00.]

Rounding Rule for the Correlation Coefficient Round the value of r to three decimal places.

The formula looks somewhat complicated, but using a table to compute the values, as shown in Example 10–4, makes it somewhat easier to determine the value of r.

There are no units associated with r, and the value of r will remain unchanged if the x and y values are switched.
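The remark about units and about switching x and y can be checked numerically. The sketch below is an illustration only: `pearson_r` is a helper name assumed here, and the unit change (cars in thousands, revenue in millions) is an example chosen for this sketch rather than something taken from the text. It uses the car rental data of Example 10–1.

```python
from math import sqrt


def pearson_r(pairs):
    """Sample correlation coefficient r from (x, y) pairs (sums formula)."""
    n = len(pairs)
    sx = sum(x for x, _ in pairs)
    sy = sum(y for _, y in pairs)
    sxy = sum(x * y for x, y in pairs)
    sx2 = sum(x * x for x, _ in pairs)
    sy2 = sum(y * y for _, y in pairs)
    return (n * sxy - sx * sy) / sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))


# Example 10-1 data: (cars in ten thousands, revenue in billions).
cars_revenue = [(63.0, 7.0), (29.0, 3.9), (20.8, 2.1),
                (19.1, 2.8), (13.4, 1.4), (8.5, 1.5)]

r = pearson_r(cars_revenue)

# Switch the roles of x and y.
r_swapped = pearson_r([(y, x) for x, y in cars_revenue])

# Change units: cars in thousands, revenue in millions (illustrative choice).
r_rescaled = pearson_r([(x * 10, y * 1000) for x, y in cars_revenue])

print(round(r, 3), round(r_swapped, 3), round(r_rescaled, 3))
```

All three values agree up to floating-point rounding, because the formula treats x and y symmetrically and a change of units scales the numerator and the denominator by the same factor.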

Example 10–4 Car Rental Companies

Compute the correlation coefficient for the data in Example 10–1.

Solution

Step 1 Make a table as shown here.

Company   Cars x (in ten thousands)   Revenue y (in billions)   xy   x²   y²
A         63.0                        7.0
B         29.0                        3.9
C         20.8                        2.1
D         19.1                        2.8
E         13.4                        1.4
F         8.5                         1.5

Assumptions for the Correlation Coefficient

1. The sample is a random sample.
2. The data pairs fall approximately on a straight line and are measured at the interval or ratio level.
3. The variables have a joint normal distribution. (This means that given any specific value of x, the y values are normally distributed; and given any specific value of y, the x values are normally distributed.)


公司

車輛數

(以萬輛計)

收益

(以十億美元計)

Step 2 Find the values of xy, x², and y² and place these values in the corresponding columns of the table.

The completed table is shown.

Company   Cars x (in 10,000s)   Revenue y (in billions)   xy       x²        y²
A         63.0                  7.0                       441.00   3969.00   49.00
B         29.0                  3.9                       113.10   841.00    15.21
C         20.8                  2.1                       43.68    432.64    4.41
D         19.1                  2.8                       53.48    364.81    7.84
E         13.4                  1.4                       18.76    179.56    1.96
F         8.5                   1.5                       12.75    72.25     2.25

Σx = 153.8   Σy = 18.7   Σxy = 682.77   Σx² = 5859.26   Σy² = 80.67

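Step 3 Substitute in the formula and solve for r.

$$
r = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\left(\sum x^2\right) - \left(\sum x\right)^2\right]\left[n\left(\sum y^2\right) - \left(\sum y\right)^2\right]}}
= \frac{(6)(682.77) - (153.8)(18.7)}{\sqrt{\left[(6)(5859.26) - (153.8)^2\right]\left[(6)(80.67) - (18.7)^2\right]}} = 0.982
$$

The correlation coefficient suggests a strong relationship between the number of cars a rental agency has and its annual revenue.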
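The table-based calculation of Example 10–4 can be mirrored in a few lines of Python. This is a sketch under stated assumptions, not the book's own procedure: the variable names (`data`, `table`, `sum_xy`, and so on) are chosen here for illustration. Each row gets the xy, x², and y² columns, the columns are totaled, and the totals are substituted into the formula.

```python
from math import sqrt

# Example 10-4 data: (company, cars in ten thousands, revenue in billions).
data = [("A", 63.0, 7.0), ("B", 29.0, 3.9), ("C", 20.8, 2.1),
        ("D", 19.1, 2.8), ("E", 13.4, 1.4), ("F", 8.5, 1.5)]

# Step 2: add the xy, x^2, and y^2 columns to every row.
table = [(name, x, y, x * y, x * x, y * y) for name, x, y in data]

# Column totals, in the order x, y, xy, x^2, y^2.
sum_x, sum_y, sum_xy, sum_x2, sum_y2 = (
    sum(row[i] for row in table) for i in range(1, 6)
)

# Step 3: substitute the totals into the formula for r.
n = len(table)
r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

for name, x, y, xy, x2, y2 in table:
    print(f"{name}  {x:5.1f}  {y:4.1f}  {xy:7.2f}  {x2:8.2f}  {y2:6.2f}")
print(f"sums: {sum_x:.1f}  {sum_y:.1f}  {sum_xy:.2f}  {sum_x2:.2f}  {sum_y2:.2f}")
print(f"r = {r:.3f}")
```

Rounded to three decimal places, as the rounding rule asks, the result matches the value obtained by hand.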

Example 10–5 Absences and Final Grades

Compute the value of the correlation coefficient for the data obtained in the study of the number of absences and the final grade of the seven students in the statistics class given in Example 10–2.

Solution

Step 1 Make a table.

Step 2 Find the values of xy, x², and y²; place these values in the corresponding columns of the table.

Student   Number of absences x   Final grade y (%)   xy    x²    y²
A         6                      82                  492   36    6,724
B         2                      86                  172   4     7,396
C         15                     43                  645   225   1,849
D         9                      74                  666   81    5,476
E         12                     58                  696   144   3,364
F         5                      90                  450   25    8,100
G         8                      78                  624   64    6,084

Σx = 57   Σy = 511   Σxy = 3745   Σx² = 579   Σy² = 38,993

Step 3 Substitute in the formula and solve for r.

$$
r = \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\left(\sum x^2\right) - \left(\sum x\right)^2\right]\left[n\left(\sum y^2\right) - \left(\sum y\right)^2\right]}}
= \frac{(7)(3745) - (57)(511)}{\sqrt{\left[(7)(579) - (57)^2\right]\left[(7)(38{,}993) - (511)^2\right]}} = -0.944
$$
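As a cross-check on Example 10–5, r can also be computed from deviations about the means. This deviation form is an algebraically equivalent way of writing the correlation coefficient; it is not the computational formula used in the text, and the function name `pearson_r_deviations` is an assumption made for this sketch.

```python
from math import sqrt

# Example 10-5 data: (number of absences x, final grade y in percent).
absences_grades = [(6, 82), (2, 86), (15, 43), (9, 74),
                   (12, 58), (5, 90), (8, 78)]


def pearson_r_deviations(pairs):
    """Correlation coefficient written in terms of deviations from the means."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    # Sum of cross-products and sums of squares of the deviations.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    s_xx = sum((x - mean_x) ** 2 for x, _ in pairs)
    s_yy = sum((y - mean_y) ** 2 for _, y in pairs)
    return s_xy / sqrt(s_xx * s_yy)


r = pearson_r_deviations(absences_grades)
print(f"r = {r:.3f}")
```

The sign of the cross-product sum gives the direction of the relationship: students with above-average absences tend to have below-average grades, so the sum, and therefore r, is negative.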

公司

車輛數

(以萬輛計)

收益

(以十億美元計)

步驟 3

代入公式解得 r。


  相關係數推測車輛數和收益之間有一種強烈的關係。

租車公司例題 10-4


計算例題 10-2 提供的缺席次數與統計學期末成績樣本數據的相關係數。

■解答

步驟 1

製作一張表格。

步驟 2

求出 xy, x2 和 y2 的數值,並且把結果放在表內適當的行內。

  完成的表格如下所示。


學生 缺席次數 x 期末成績 y (%)

步驟 3

代入公式解得 r。


相關係數推測缺席次數與統計學期末成績之間有一種強烈的負關係。也就是說,缺席次數愈

多的學生,期末成績愈低。

缺席與期末成績例題 10-5

計算例題 10-3 美國有錢人年紀和財產數據的相關係數。

■解答

步驟 1

製作一張表格。

步驟 2

求出 xy, x2 和 y2 的數值,並且把結果放在表內適當的行內。

年紀與財產例題 10-6


在例題 10-4,r 的數值很高(接近 1.00);在例題 10-6,r 的數值很低

(接近 0)。接著你可能會問,當 r 值和機會有關,什麼時候會推測變數間某

種顯著的線性關係?我們接下來回答這個問題。

相關係數的顯著性 如前所述,相關係數的範圍落在−1 和+1 之間。當 r 值

接近+1,或是−1,表示有一種強烈的線性關係。當 r 值接近 0,線性關係是

微弱的,或是不存在的。因為 r 值是用樣本數據計算,當 r 不等於 0 的時候,

有兩種可能性:若不是因為 r 值夠高,讓我們可以結論為變數間有顯著的線性

關係;就是因為某種機會才讓我們看到現在的 r 值。

為了作出決策,你會使用某一種假設檢定。傳統法和前面章節用過的類

似。

步驟 1 陳述假設。

The value of r suggests a strong negative relationship between a student’s final grade and the number of absences a student has. That is, the more absences a student has, the lower is his or her grade.

Example 10–6 Age and Wealth

Compute the value of the correlation coefficient for the data given in Example 10–3 for the age and wealth of the richest persons in the United States.

Solution

Step 1 Make a table.

Step 2 Find the values of xy, x², and y², and place these values in the corresponding columns of the table.

Person   Age x   Net wealth y   xy        x²      y²
A        73      16             1,168     5,329   256
B        65      26             1,690     4,225   676
C        53      50             2,650     2,809   2,500
D        54      21.5           1,161     2,916   462.25
E        79      40             3,160     6,241   1,600
F        69      16             1,104     4,761   256
G        61      19.6           1,195.6   3,721   384.16
H        65      19             1,235     4,225   361

Σx = 519   Σy = 208.1   Σxy = 13,363.6   Σx² = 34,227   Σy² = 6,495.41

Step 3 Substitute in the formula and solve for r.

$$
\begin{aligned}
r &= \frac{n\left(\sum xy\right) - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\left(\sum x^2\right) - \left(\sum x\right)^2\right]\left[n\left(\sum y^2\right) - \left(\sum y\right)^2\right]}} \\
  &= \frac{8(13{,}363.6) - (519)(208.1)}{\sqrt{\left[8(34{,}227) - (519)^2\right]\left[8(6495.41) - (208.1)^2\right]}} \\
  &= \frac{-1095.1}{\sqrt{(4455)(8657.67)}} = \frac{-1095.1}{6210.469} = -0.176
\end{aligned}
$$

The value of r indicates a very weak negative relationship between the variables.

In Example 10–4, the value of r was high (close to 1.00); in Example 10–6, the value of r was much lower (close to 0). This question then arises, When is the value of r due to chance, and when does it suggest a significant linear relationship between the variables? This question will be answered next.

Objective 3: Test the hypothesis H0: ρ = 0.

The Significance of the Correlation Coefficient As stated before, the range of the correlation coefficient is between −1 and +1. When the value of r is near +1 or −1, there is a strong linear relationship. When the value of r is near 0, the linear relationship is weak or nonexistent. Since the value of r is computed from data obtained from samples, there are two possibilities when r is not equal to zero: either the value of r is high enough to conclude that there is a significant linear relationship between the variables, or the value of r is due to chance.
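The intermediate values in Step 3 of Example 10–6 can be reproduced one at a time, which makes it easier to see where a hand calculation goes wrong. The sketch below is illustrative only; the variable names (`numerator`, `term_x`, `term_y`, `denominator`) are assumptions chosen here and do not appear in the text.

```python
from math import sqrt

# Example 10-6 data: (age x, net wealth y in billions of dollars).
age_wealth = [(73, 16), (65, 26), (53, 50), (54, 21.5),
              (79, 40), (69, 16), (61, 19.6), (65, 19)]

n = len(age_wealth)
sum_x = sum(x for x, _ in age_wealth)
sum_y = sum(y for _, y in age_wealth)
sum_xy = sum(x * y for x, y in age_wealth)
sum_x2 = sum(x * x for x, _ in age_wealth)
sum_y2 = sum(y * y for _, y in age_wealth)

# Each piece of the formula, kept separate so it can be inspected.
numerator = n * sum_xy - sum_x * sum_y   # n(sum xy) - (sum x)(sum y)
term_x = n * sum_x2 - sum_x ** 2         # n(sum x^2) - (sum x)^2
term_y = n * sum_y2 - sum_y ** 2         # n(sum y^2) - (sum y)^2
denominator = sqrt(term_x * term_y)
r = numerator / denominator

print(f"numerator   = {numerator:.1f}")
print(f"term_x      = {term_x:.0f}")
print(f"term_y      = {term_y:.2f}")
print(f"denominator = {denominator:.3f}")
print(f"r           = {r:.3f}")
```

Keeping the numerator and the two bracketed terms as separate names mirrors the hand calculation; a value of r this close to 0 is the case that the significance test discussed next is meant to settle.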

人 年紀 x 財產 y

步驟 3

代入公式解得 r。


相關係數推測兩變數間有一種非常微弱的負線性關係。

學習目標 毧

檢定這項假設 H0:ρ =0。
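The arithmetic in Example 10–6 is easy to check by machine. The following is a minimal sketch, not part of the textbook: the function name `pearson_r` is our own choice, and the data are the eight age and net-wealth pairs from the Example 10–6 table.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample correlation coefficient via the computational formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Data from Example 10-6 (persons A through H)
ages = [73, 65, 53, 54, 79, 69, 61, 65]
wealth = [16, 26, 50, 21.5, 40, 16, 19.6, 19]

r = pearson_r(ages, wealth)
print(round(r, 3))  # agrees with the hand computation, -0.176
```

The function mirrors the hand formula term by term, so each intermediate sum can be compared with the column totals of the table.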

To make this decision, you use a hypothesis-testing procedure. The traditional method is similar to the one used in previous chapters.

Step 1 State the hypotheses.

Step 2 Find the critical values.

Step 3 Compute the test value.

Step 4 Make the decision.

Step 5 Summarize the results.

The population correlation coefficient is computed from all possible (x, y) pairs; it is designated by the Greek letter ρ (rho). The sample correlation coefficient can then be used as an estimator of ρ if the following assumptions are valid.

1. The variables x and y are linearly related.

2. The variables are random variables.

3. The two variables have a bivariate normal distribution.

A bivariate normal distribution means that for the pairs of (x, y) data values, the corresponding y values have a bell-shaped distribution for any given x value, and the x values for any given y value have a bell-shaped distribution.

Formally defined, the population correlation coefficient ρ is the correlation computed by using all possible pairs of data values (x, y) taken from a population.

In hypothesis testing, one of these is true:

H0: ρ = 0   This null hypothesis means that there is no correlation between the x and y variables in the population.

H1: ρ ≠ 0   This alternative hypothesis means that there is a significant correlation between the variables in the population.

When the null hypothesis is rejected at a specific level, it means that there is a significant difference between the value of r and 0. When the null hypothesis is not rejected, it means that the value of r is not significantly different from 0 and is probably due to chance.

Several methods can be used to test the significance of the correlation coefficient. Three methods will be shown in this section. The first uses the t test.

Formula for the t Test for the Correlation Coefficient

$$t = r\sqrt{\frac{n-2}{1-r^2}}$$

with degrees of freedom equal to n − 2.

Interesting Fact
Scientists think that a person is never more than 3 feet away from a spider at any given time!

Although hypothesis tests can be one-tailed, most hypotheses involving the correlation coefficient are two-tailed. Recall that ρ represents the population correlation coefficient. Also, if there is no linear relationship, the value of the correlation coefficient will be 0. Hence, the hypotheses will be

H0: ρ = 0 and H1: ρ ≠ 0

You do not have to identify the claim here, since the question will always be whether there is a significant linear relationship between the variables.

The two-tailed critical values are used. These values are found in Table E in Appendix C. Also, when you are testing the significance of a correlation coefficient, both variables x and y must come from normally distributed populations.

Historical Note
A mathematician named Karl Pearson (1857–1936) became interested in Francis Galton's work and saw that the correlation and regression theory could be applied to other areas besides heredity. Pearson developed the correlation coefficient that bears his name.

Example 10–7
Test the significance of the correlation coefficient found in Example 10–4. Use α = 0.05 and r = 0.982.

Solution

Step 1 State the hypotheses.

H0: ρ = 0 and H1: ρ ≠ 0

Step 2 Find the critical values. Since α = 0.05 and there are 6 − 2 = 4 degrees of freedom, the critical values obtained from Table E are ±2.776, as shown in Figure 10–7.

Figure 10–7 Critical Values for Example 10–7 (critical values at −2.776 and +2.776, centered at 0)

Step 3 Compute the test value.

$$t = r\sqrt{\frac{n-2}{1-r^2}} = 0.982\sqrt{\frac{6-2}{1-(0.982)^2}} = 10.4$$

Step 4 Make the decision. Reject the null hypothesis, since the test value falls in the critical region, as shown in Figure 10–8.

Figure 10–8 Test Value for Example 10–7 (the test value +10.4 lies to the right of the critical value +2.776)

Step 5 Summarize the results. There is a significant relationship between the number of cars a rental agency owns and its annual income.
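The traditional method in Example 10–7 reduces to one formula and one comparison. The sketch below is illustrative only: the helper name `t_for_correlation` is hypothetical, and the critical value 2.776 is simply copied from the example rather than looked up in code.

```python
from math import sqrt

def t_for_correlation(r, n):
    """Test value t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1 - r ** 2))

r, n = 0.982, 6          # values given in Example 10-7
critical_value = 2.776   # two-tailed, alpha = 0.05, d.f. = 4 (quoted from the example)

t = t_for_correlation(r, n)
reject_h0 = abs(t) > critical_value  # two-tailed decision rule

print(round(t, 1), reject_h0)
```

Comparing the absolute value of t with the positive critical value covers both tails at once, which matches the two-tailed hypotheses H0: ρ = 0 and H1: ρ ≠ 0.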

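The second method that can be used to test the significance of r is the P-value method. The method is the same as that shown in Chapters 8 and 9. It uses the following steps.

Step 1 State the hypotheses.

Step 2 Find the test value. (In this case, use the t test.)

Step 3 Find the P-value. (In this case, use Table E.)

Step 4 Make the decision.

Step 5 Summarize the results.

Consider an example where t = 4.059 and d.f. = 4. Using Table E with d.f. = 4 and the two-tailed row, the value 4.059 falls between 3.747 and 4.604; hence, 0.01 < P-value < 0.02. (The P-value obtained from a calculator is 0.015.) The decision is to reject the null hypothesis, since P-value < 0.05.

The third method of testing the significance of r is to use Table H in Appendix C. This table shows the values of the correlation coefficient that are significant for a specific α level and a specific number of degrees of freedom. For example, for 7 degrees of freedom and α = 0.05, the table gives a critical value of 0.666. Any value of r greater than +0.666 or less than −0.666 will be significant, and the null hypothesis will be rejected. See Figure 10–9. When Table H is used, you need not compute the t test value. Table H is for two-tailed tests only.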
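The calculator P-value quoted for t = 4.059 with d.f. = 4 can be approximated without a statistics library. The following is a rough sketch under our own assumptions: the function names are hypothetical, and the tail area is obtained by integrating the Student t density numerically with Simpson's rule rather than by whatever method a calculator uses.

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Two-tailed P-value: 1 minus the area between -|t| and +|t|."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    area = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half_area = area * h / 3  # area under the density from 0 to |t|
    return 1 - 2 * half_area

p = two_tailed_p(4.059, 4)
print(round(p, 3))  # falls between 0.01 and 0.02, as the table lookup indicates
```

Because the t density is symmetric about 0, doubling the area from 0 to |t| gives the central area, and the remainder is the combined area of both tails.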

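Figure 10–9 Finding the Critical Value from Table H

d.f.   α = 0.05   α = 0.01
1
2
3
4
5
6
7      0.666

Example 10–8
Using Table H, test the significance at α = 0.01 of the correlation coefficient r = −0.176 obtained in Example 10–6.

Solution

H0: ρ = 0 and H1: ρ ≠ 0

Since the sample size is 8, there are n − 2, or 8 − 2 = 6, degrees of freedom. When α = 0.01 and d.f. = 6, the critical value obtained from Table H is 0.834. For a significant linear relationship, the value of r must be greater than +0.834 or less than −0.834. Since r = −0.176 is greater than −0.834, the null hypothesis is not rejected. Hence, there is not enough evidence to support a significant linear relationship between age and wealth.

Figure 10–10 Rejection and Nonrejection Regions for Example 10–8 (on the scale from −1 to +1: reject below −0.834, do not reject between −0.834 and +0.834, reject above +0.834; r = −0.176 falls in the nonrejection region)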
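Solving the t test formula for r gives r = t / sqrt(t² + d.f.), so a critical value for r can be computed from a critical value for t. The sketch below assumes that the Table H entries quoted in this section (0.666 and 0.834) correspond to the two-tailed t critical values 2.365 (d.f. = 7, α = 0.05) and 3.707 (d.f. = 6, α = 0.01) from a standard t table; the function name `critical_r` is our own.

```python
from math import sqrt

def critical_r(t_critical, df):
    """Critical value of r implied by a two-tailed t critical value."""
    return t_critical / sqrt(t_critical ** 2 + df)

# Two-tailed t critical values taken from a standard t table (assumed inputs)
r_05_df7 = critical_r(2.365, 7)   # alpha = 0.05, d.f. = 7
r_01_df6 = critical_r(3.707, 6)   # alpha = 0.01, d.f. = 6

# Decision for Example 10-8: r = -0.176 with d.f. = 6 at alpha = 0.01
significant = abs(-0.176) > r_01_df6

print(round(r_05_df7, 3), round(r_01_df6, 3), significant)
```

This is why the table method needs no separate t computation: comparing |r| with the tabled value is equivalent to comparing |t| with the t critical value.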

Correlation and Causation
Researchers must understand the nature of the linear relationship between the independent variable x and the dependent variable y. When a hypothesis test indicates that a significant linear relationship exists between the variables, researchers must consider the possibilities outlined next.

Possible Relationships Between Variables

When the null hypothesis has been rejected for a specific α value, any of the following five possibilities can exist.

1. There is a direct cause-and-effect relationship between the variables. That is, x causes y. For example, water causes plants to grow, poison causes death, and heat causes ice to melt.

2. There is a reverse cause-and-effect relationship between the variables. That is, y causes x. For example, suppose a researcher believes excessive coffee consumption causes nervousness, but the researcher fails to consider that the reverse situation may occur. That is, it may be that an extremely nervous person craves coffee to calm his or her nerves.

3. The relationship between the variables may be caused by a third variable. For example, if a statistician correlated the number of deaths due to drowning and the number of cans of soft drink consumed daily during the summer, he or she would probably find a significant relationship. However, the soft drink is not necessarily responsible for the deaths, since both variables may be related to heat and humidity.

4. There may be a complexity of interrelationships among many variables. For example, a researcher may find a significant relationship between students' high school grades and college grades. But there probably are many other variables involved, such as IQ, hours of study, influence of parents, motivation, age, and instructors.

5. The relationship may be coincidental. For example, a researcher may be able to find a significant relationship between the increase in the number of people who are exercising and the increase in the number of people who are committing crimes. But common sense dictates that any relationship between these two values must be due to coincidence.

When two variables are highly correlated, item 3 in the box states that there exists a possibility that the correlation is due to a third variable. If this is the case and the third variable is unknown to the researcher or not accounted for in the study, it is called a lurking variable. An attempt should be made by the researcher to identify such variables and to use methods to control their influence.

It is important to restate the fact that even if the correlation between two variables is high, it does not necessarily mean causation. There are other possibilities, such as lurking variables or just a coincidental relationship.

Also, you should be cautious when the data for one or both of the variables involve averages rather than individual data. It is not wrong to use averages, but the results cannot be generalized to individuals, since averaging tends to smooth out the variability among individual data values. The result could be a higher correlation than actually exists.

Thus, when the null hypothesis is rejected, the researcher must consider all possibilities and select the appropriate one as determined by the study. Remember, correlation does not necessarily imply causation.
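The caution about averages can be seen with a small made-up data set. In the sketch below the numbers are hypothetical and chosen only for illustration: three individuals are observed at each x value, and the correlation is computed once from the individual values and once from the group averages.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample correlation coefficient via the computational formula."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sx2 = sum(x * x for x in xs)
    sy2 = sum(y * y for y in ys)
    return (n * sxy - sx * sy) / sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))

# Hypothetical data: y values of three individuals at each x value
groups = {1: [1, 5, 9], 2: [2, 6, 10], 3: [3, 7, 11], 4: [4, 8, 12]}

# Correlation computed from the twelve individual (x, y) pairs
xs_individual = [x for x, ys in groups.items() for _ in ys]
ys_individual = [y for ys in groups.values() for y in ys]
r_individual = pearson_r(xs_individual, ys_individual)

# Correlation computed from the four (x, average y) pairs
xs_avg = list(groups)
ys_avg = [sum(ys) / len(ys) for ys in groups.values()]
r_average = pearson_r(xs_avg, ys_avg)

print(round(r_individual, 3), round(r_average, 3))
```

Averaging removes the spread within each group, so the averaged points fall on a straight line even though the individual values are only weakly related to x.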

Applying the Concepts 10–1

Stopping Distances
In a study on speed control, it was found that the main reasons for regulations were to make traffic flow more efficient and to minimize the risk of danger. An area that was focused on in the study was the distance required to completely stop a vehicle at various speeds. Use the following table to answer the questions.

MPH    Braking distance (feet)
20     20
30     45
40     81
50     133
60     205
80     411

Assume MPH is going to be used to predict stopping distance.

1. Which of the two variables is the independent variable?

2. Which is the dependent variable?

3. What type of variable is the independent variable?

4. What type of variable is the dependent variable?

5. Construct a scatter plot for the data.

6. Is there a linear relationship between the two variables?

7. Redraw the scatter plot, and change the distances between the independent-variable numbers. Does the relationship look different?

8. Is the relationship positive or negative?

9. Can braking distance be accurately predicted from MPH?

10. List some other variables that affect braking distance.

11. Compute the value of r.

12. Is r significant at α = 0.05?

See page 509 for the answers.

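Exercises 10–1

1. What is meant by the statement that two variables are related?

2. What is the symbol for the sample correlation coefficient? The population correlation coefficient?

3. What is meant when the relationship between two variables is called positive? Negative?

4. Give an example of a correlation study, and identify the independent and dependent variables.

5. What is the name of the correlation coefficient used in this section?

6. When two variables are correlated, can the researcher be sure which one causes the other?

For Exercises 7 through 14, perform the following steps.

a. Draw the scatter plot for the variables.

b. Compute the value of the correlation coefficient.

c. State the hypotheses.

d. Test the significance of the correlation coefficient at α = 0.05, using Table H in Appendix C.

e. Give a brief explanation of the type of relationship.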

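7. Commercial Movie Releases The yearly data have been published showing the number of releases for each of the commercial movie studios and the gross receipts for those studios thus far. Based on these data, can it be concluded that there is a relationship between the number of releases and the gross receipts?

No. of releases x               361   270   306   22    35   10   8    12   21
Gross receipts y (million $)    3844  1962  1371  1064  334  241  188  154  125

Source: www.showbizdata.com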

12. Gas Tax and Fuel Use The data below indicate the state gas tax in cents per gallon and the fuel use per registered vehicle (in gallons). Is there a significant relationship between these two variables?

Tax     21.5  23   18   24.5  26.4  19
Usage   1062  631  920  686   736   684

(The information in this exercise will be used for Exercise 12 in Section 10–2.)

Source: World Almanac.

13. Commercial Movie Releases The yearly data have been published showing the number of releases for each of the commercial movie studios and the gross receipts for those studios thus far. Based on these data, can it be concluded that there is a relationship between the number of releases and the gross receipts?

No. of releases x               361   270   306   22    35   10   8    12   21
Gross receipts y (million $)    3844  1962  1371  1064  334  241  188  154  125

(The information in this exercise will be used for Exercises 13 and 36 in Section 10–2 and Exercises 15 and 19 in Section 10–3.)

Source: www.showbizdata.com

14. Forest Fires and Acres Burned An environmentalist wants to determine the relationships between the numbers (in thousands) of forest fires over the year and the number (in hundred thousands) of acres burned. The data for 8 recent years are shown. Describe the relationship.

Number of fires x           72  69  58  47  84  62  57  45
Number of acres burned y    62  42  19  26  51  15  30  15

Source: National Interagency Fire Center.

(The information in this exercise will be used for Exercise 14 in Section 10–2 and Exercises 16 and 20 in Section 10–3.)

15. Alumni Contributions The director of an alumni association for a small college wants to determine whether there is any type of relationship between the amount of an alumnus's contribution (in dollars) and the years the alumnus has been out of school. The data follow. (The information is used for Exercises 15, 36, and 37 in Section 10–2 and Exercises 17 and 21 in Section 10–3.)

Years x           1    5    3    10  7   6
Contribution y    500  100  300  50  75  80

16. State Debt and Per Capita Tax An economics student wishes to see if there is a relationship between the amount of state debt per capita and the amount of tax per capita at the state level. Based on the following data, can she or he conclude that per capita state debt and per capita state taxes are related? Both amounts are in dollars and represent five randomly selected states. (The information in this exercise will be used for Exercises 16 and 37 in Section 10–2 and Exercises 18 and 22 in Section 10–3.)

Per capita debt x    1924  907   1445  1608  661
Per capita tax y     1685  1838  1734  1842  1317

Source: World Almanac.

17. School Districts and Secondary Schools A random sample of states yielded the following numbers of local school districts and the corresponding numbers of secondary schools. Is there a significant relationship between the data?

School districts     53  19  24   17  95   68
Secondary schools    50  27  187  84  143  216

Source: World Almanac.

(The information in this exercise will be used for Exercise 17 of Section 10–2.)

18. Triples and Home Runs The data below show the number of three-base hits (triples) and the number of home runs hit during the season by a random sample of MLB teams. Is there a significant relationship between the data?

Triples      25   23   51   19   20   43
Home runs    212  199  144  160  149  122

Source: New York Times Almanac.

(The information in this exercise will be used for Exercises 18 and 38 in Section 10–2.)

19. Egg Production Recent agricultural data showed the number of eggs produced and the price received per dozen for a given year. Based on the following data for a random selection of states, can it be concluded that a relationship exists between the number of eggs produced and the price per dozen? (The information in this exercise will be used for Exercise 19 in Section 10–2.)

No. of eggs (millions) x       957    1332   1163   1865   119    273
Price per dozen (dollars) y    0.770  0.697  0.617  0.652  1.080  1.420

Source: World Almanac.

8. Alumni Contributions The director of an alumni association for a small college wants to determine whether there is any type of relationship between the amount of an alumnus's contribution (in dollars) and the years the alumnus has been out of school. The data follow.

Years x           1    5    3    10  7   6
Contribution y    500  100  300  50  75  80

9. 學區與高中 一組隨機樣本產生以下的資

訊,地方學區的個數和它有幾所高中。數據

之間有顯著的關係嗎?

資料來源:World Almanac.

c. State the hypotheses.d. Test the significance of the correlation coefficient ata � 0.05, using Table I.

e. Give a brief explanation of the type of relationship.

12. Gas Tax and Fuel Use The data below indicatethe state gas tax in cents per gallon and the fuel use per

registered vehicle (in gallons). Is there a significantrelationship between these two variables?

Tax 21.5 23 18 24.5 26.4 19

Usage 1062 631 920 686 736 684

(The information in this exercise will be used forExercise 12 in Section 10–2.)

Source: World Almanac.

13. Commercial Movie Releases The yearlydata have been published showing the number of

releases for each of the commercial movie studiosand the gross receipts for those studios thus far. Basedon these data, can it be concluded that there is arelationship between the number of releases and thegross receipts?

No. of releases x 361 270 306 22 35 10 8 12 21

Gross receipts y(million $) 3844 1962 1371 1064 334 241 188 154 125

(The information in this exercise will be used forExercises 13 and 36 in Section 10–2 and Exercises 15and 19 in Section 10–3.)

Source: www.showbizdata.com

14. Forest Fires and Acres Burned Anenvironmentalist wants to determine the relationships

between the numbers (in thousands) of forest fires overthe year and the number (in hundred thousands) of acresburned. The data for 8 recent years are shown. Describethe relationship.

Number of fires x 72 69 58 47 84 62 57 45

Number of acres burned y 62 42 19 26 51 15 30 15

Source: National Interagency Fire Center.

(The information in this exercise will be used forExercise 14 in Section 10–2 and Exercises 16 and 20 inSection 10–3.)

15. Alumni Contributions The director of analumni association for a small college wants to

determine whether there is any type of relationshipbetween the amount of an alumnus’s contribution(in dollars) and the years the alumnus has beenout of school. The data follow. (The information is usedfor Exercises 15, 36, and 37 in Section 10–2 andExercises 17 and 21 in Section 10–3.)

Years x 1 5 3 10 7 6

Contribution y 500 100 300 50 75 80

16. State Debt and Per Capita Tax An economicsstudent wishes to see if there is a relationship between

the amount of state debt per capita and the amount oftax per capita at the state level. Based on the followingdata, can she or he conclude that per capita state debtand per capita state taxes are related? Both amounts arein dollars and represent five randomly selected states.(The information in this exercise will be used forExercises 16 and 37 in Section 10–2 and Exercises 18and 22 in Section 10–3.)

Per capita debt x 1924 907 1445 1608 661

Per capita tax y 1685 1838 1734 1842 1317

Source: World Almanac.

17. School Districts and Secondary Schools Arandom sample of states yielded the following

numbers of local school districts and the correspondingnumbers of secondary schools. Is there a significantrelationship between the data?

School districts 53 19 24 17 95 68

Secondary schools 50 27 187 84 143 216

Source: World Almanac.

(The information in this exercise will be used forExercise 17 of Section 10–2.)

18. Triples and Home Runs The data below showthe number of three-base hits (triples) and the number

of home runs hit during the season by a random sampleof MLB teams. Is there a significant relationshipbetween the data?

Triples 25 23 51 19 20 43

Home runs 212 199 144 160 149 122

Source: New York Times Almanac.

(The information in this exercise will be used forExercises 18 and 38 in Section 10–2.)

19. Egg Production Recent agricultural datashowed the number of eggs produced and the

price received per dozen for a given year. Based onthe following data for a random selection of states,can it be concluded that a relationship existsbetween the number of eggs produced and the priceper dozen? (The information in this exercise will beused for Exercise 19 in Section 10–2.)

No. of eggs (millions) x 957 1332 1163 1865 119 273

Price per dozen (dollars) y 0.770 0.697 0.617 0.652 1.080 1.420

Source: World Almanac.

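School districts x

Secondary schools y

10. Egg Production Recent agricultural data show the number of eggs produced and the price per dozen for a given year. Based on the following data for a random selection of states, can it be concluded that a relationship exists between the number of eggs produced and the price per dozen?

Source: World Almanac.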

c. State the hypotheses.

d. Test the significance of the correlation coefficient at α = 0.05, using Table I.

e. Give a brief explanation of the type of relationship.

12. Gas Tax and Fuel Use The data below indicate the state gas tax in cents per gallon and the fuel use per registered vehicle (in gallons). Is there a significant relationship between these two variables?

Tax 21.5 23 18 24.5 26.4 19

Usage 1062 631 920 686 736 684

(The information in this exercise will be used for Exercise 12 in Section 10–2.)

Source: World Almanac.

13. Commercial Movie Releases The yearly data have been published showing the number of releases for each of the commercial movie studios and the gross receipts for those studios thus far. Based on these data, can it be concluded that there is a relationship between the number of releases and the gross receipts?

No. of releases x 361 270 306 22 35 10 8 12 21

Gross receipts y (million $) 3844 1962 1371 1064 334 241 188 154 125

(The information in this exercise will be used for Exercises 13 and 36 in Section 10–2 and Exercises 15 and 19 in Section 10–3.)

Source: www.showbizdata.com

14. Forest Fires and Acres Burned An environmentalist wants to determine the relationships between the numbers (in thousands) of forest fires over the year and the number (in hundred thousands) of acres burned. The data for 8 recent years are shown. Describe the relationship.

Number of fires x 72 69 58 47 84 62 57 45

Number of acres burned y 62 42 19 26 51 15 30 15

Source: National Interagency Fire Center.

(The information in this exercise will be used for Exercise 14 in Section 10–2 and Exercises 16 and 20 in Section 10–3.)

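No. of eggs x (millions)

Price per dozen y (dollars)

11. Faculty and Students The number of faculty and the number of students are shown for a random selection of small colleges. Is there a significant relationship between the two variables? Switch x and y and repeat the process. Which do you think is really the independent variable?

Source: World Almanac.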

20. Emergency Calls and Temperature An emergency service wishes to see whether a relationship exists between the outside temperature and the number of emergency calls it receives for a 7-hour period. The data are shown. (The information in this exercise will be used for Exercises 20 and 38 in Section 10–2.)

Temperature x 68 74 82 88 93 99 101

No. of calls y 7 4 8 10 11 9 13

21. Faculty and Students The number of faculty and the number of students are shown for a random selection of small colleges. Is there a significant relationship between the two variables? Switch x and y and repeat the process. Which do you think is really the independent variable?

Faculty 99 110 113 116 138 174 220

Students 1353 1290 1091 1213 1384 1283 2075

Source: World Almanac.

(The information in this exercise will be used for Exercises 21 and 36 in Section 10–2.)

22. Precipitation and Snow/Sleet For a random selection of U.S. cities, the following data show the number of days for which the precipitation is greater than or equal to 0.01 inch and the number of days for which there is at least 1 inch of snow and/or sleet. Is there a significant linear relationship between the variables?

Precipitation ≥ 0.01 inch 61 111 140 116 88 136

Snow/sleet ≥ 1 inch 2 15 21 8 11 13

Source: World Almanac.

(The information in this exercise will be used for Exercise 22 in Section 10–2.)

23. Average Temperature and Precipitation The average normal daily temperature (in degrees Fahrenheit) and the corresponding average monthly precipitation (in inches) for the month of June are shown here for seven randomly selected cities in the United States. Determine if there is a relationship between the two variables. (The information in this exercise will be used for Exercise 23 in Section 10–2.)

Avg. daily temp. x 86 81 83 89 80 74 64

Avg. mo. precip. y 3.4 1.8 3.5 3.6 3.7 1.5 0.2

Source: New York Times Almanac.

24. NHL Assists and Total Points A random sample of scoring leaders from the NHL showed the following numbers of assists and total points. Based on these data, can it be concluded that there is a significant relationship between the two?

Assists 26 29 32 34 36 37 40

Total points 48 68 66 69 76 67 84

Source: Associated Press.

(The information in this exercise will be used for Exercise 24 in Section 10–2.)

25. Fat Grams and Secondary Schools The numbers of fat calories and grams of saturated fat in a number of fast-food nonbreakfast entrees are shown below. Is there sufficient evidence to conclude a significant relationship between the two variables?

Fat calories 190 220 270 360 460 540

Sat. fat (g) 9 8 13 17 23 27

Source: www.fatcalories.com

(The information in this exercise will be used in Exercise 25 in Section 10–2.)

26. Tall Buildings An architect wants to determine the relationship between the heights (in feet) of a building and the number of stories in the building. The data for a sample of 10 buildings in Pittsburgh are shown. Explain the relationship.

Stories x 64 54 40 31 45 38 42 41 37 40

Height y 841 725 635 616 615 582 535 520 511 485

Source: World Almanac Book of Facts.

(The information in this exercise will be used for Exercise 26 of Section 10–2.)

27. Hospital Beds A hospital administrator wants to see if there is a relationship between the number of licensed beds and the number of staffed beds in local hospitals. The data for a specific day are shown. Describe the relationship.

Licensed beds x 144 32 175 185 208 100 169

Staffed beds y 112 32 162 141 103 80 118

Source: Pittsburgh Tribune-Review.

(The information in this exercise will be used for Exercise 28 of this section and Exercise 27 in Section 10–2.)

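Number of faculty x

Number of students y

12. Average Temperature and Precipitation The average normal daily temperature (in degrees Fahrenheit) and the corresponding average monthly precipitation (in inches) for the month of June are shown here for seven randomly selected cities in the United States. Determine if there is a relationship between the two variables.

Source: New York Times Almanac.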

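Avg. daily temp. x

Avg. monthly precip. y

13. Fat The fat calories and the grams of saturated fat in a number of fast-food entrees are shown below. Is there sufficient evidence to conclude that there is a significant relationship between the two variables?

Source: www.fatcalories.com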

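Fat calories x

Saturated fat (g) y

14. Hospital Beds A hospital administrator wants to see if there is a relationship between the number of licensed beds and the number of staffed beds in local hospitals. The data for a specific day are shown. Describe the relationship between the two variables.

Source: Pittsburgh Tribune-Review.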

Licensed beds x

Staffed beds y

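10-2 Regression

To study the relationship between two variables, we collect the data and then construct a scatter plot. As stated previously, the purpose of the scatter plot is to determine the nature of the relationship. The possibilities include a positive linear relationship, a negative linear relationship, a curvilinear relationship, or no discernible relationship. After the scatter plot is drawn, the next steps are to compute the value of the correlation coefficient and to test the significance of the relationship. If the correlation coefficient is significant, the next step is to determine the equation of the regression line, which is the line that best fits the data. (Note: When r is not significant, determining the equation of the regression line and using it to make predictions is meaningless.) The purpose of the regression line is to enable the researcher to see the trend and to make predictions on the basis of the data.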
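The order of steps described above (compute r, test its significance, and only then fit a line) can be sketched in Python. This is a minimal illustration under stated assumptions, not the book's own procedure: it uses the temperature and emergency-call data from Exercise 20, assumes the usual t test for H0: ρ = 0 with n − 2 degrees of freedom, and takes 2.571 as the two-tailed critical value for α = 0.05 and 5 degrees of freedom from a standard t table. The function names are hypothetical.

```python
import math


def correlation_coefficient(xs, ys):
    """Pearson correlation coefficient r computed from the raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


def t_statistic(r, n):
    """t test value for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Data from Exercise 20: temperature x, number of emergency calls y.
temperature = [68, 74, 82, 88, 93, 99, 101]
calls = [7, 4, 8, 10, 11, 9, 13]

r = correlation_coefficient(temperature, calls)
t = t_statistic(r, len(temperature))

# Assumed critical value: two-tailed, alpha = 0.05, d.f. = 5 (standard t table).
CRITICAL_T = 2.571
significant = abs(t) > CRITICAL_T

print(f"r = {r:.3f}, t = {t:.3f}")
if significant:
    print("r is significant; fitting a regression line is meaningful.")
else:
    print("r is not significant; do not fit or use a regression line.")
```

The critical value is passed in as a constant so that the sketch needs only the standard library; with a different sample size or α it would have to be looked up again.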

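Line of Best Fit

Figure 10-11 shows a scatter plot for two variables. Several lines are drawn close to the data. Given such a scatter plot, you must be able to draw the line of best fit. Best fit means that the sum of the squares of the vertical distances from the data points to the line is at a minimum. The reason you need a line of best fit is that the values of y will be predicted from the values of x; hence, the closer the points are to the line, the better the fit and the prediction will be. See Figure 10-12. When r is positive, the line slopes upward from left to right. When r is negative, the line slopes downward from left to right.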
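The best-fit criterion can be checked numerically. The sketch below is only an illustration: the five data points and the two rival lines are made up for the purpose, and the helper name is hypothetical. It compares the sum of squared vertical distances for three candidate lines, one of which (y′ = 2.2 + 0.6x) is the least-squares line for these points.

```python
def sum_squared_vertical_distances(xs, ys, a, b):
    """Sum of squared vertical distances from the points to y' = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, chosen only to illustrate the criterion.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Candidate lines as (intercept a, slope b).
candidates = {
    "y' = 1.0 + 1.0x": (1.0, 1.0),
    "y' = 3.0 + 0.3x": (3.0, 0.3),
    "y' = 2.2 + 0.6x": (2.2, 0.6),  # least-squares line for these points
}

scores = {
    label: sum_squared_vertical_distances(xs, ys, a, b)
    for label, (a, b) in candidates.items()
}
best = min(scores, key=scores.get)

for label, score in scores.items():
    print(f"{label}: sum of squares = {score:.2f}")
print("Best fit among the candidates:", best)
```

Any other line tried against these points gives a larger sum of squares than the least-squares line, which is what "best fit" means here.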

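Determination of the Regression Line Equation

Objective 4: Compute the equation of the regression line.

In algebra, the equation of a line is usually given as y = mx + b, where m is the slope of the line and b is the y intercept. (Students who need an algebraic review should read Appendix A, Section A-3, on the accompanying CD before continuing.) In statistics, the equation of the regression line is written as y′ = a + bx, where a is the y′ intercept and b is the slope of the line. See Figure 10-13.

Figure 10-11: Scatter plot with three lines fit to the data (axes: x, y).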

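There are several methods for finding the equation of the regression line. Two formulas are given here. These formulas use the same values that are used in computing the value of the correlation coefficient. The mathematical development of these formulas is beyond the scope of this book.

Rounding Rule for the Intercept and Slope Round the values of a and b to three decimal places.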

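Figure 10-12: Line of best fit for a set of data points. The vertical distances d1 through d7 separate each observed value from the predicted value on the line (axes: x, y).

Figure 10-13: A line as represented in algebra and in statistics. (a) Algebra of a line: y = mx + b, for example y = 0.5x + 5, with y intercept 5 and slope m = Δy/Δx = 2/4 = 0.5. (b) Statistical notation for a regression line: y′ = a + bx, for example y′ = 5 + 0.5x, with y′ intercept 5 and slope b = Δy′/Δx = 2/4 = 0.5.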

Formulas for the Regression Line y′ = a + bx

$$b = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}$$

$$a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2}$$

where a is the y′ intercept and b is the slope of the line.
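The two formulas translate directly into code. The following Python sketch is an illustration only (the function name is hypothetical); it evaluates both formulas from the five sums and checks them against the values the text reports for Example 10-9 (n = 6, Σx = 153.8, Σy = 18.7, Σxy = 682.77, Σx² = 5859.26).

```python
def regression_coefficients(n, sum_x, sum_y, sum_xy, sum_x2):
    """Return (a, b) for the regression line y' = a + b*x.

    a is the y' intercept and b is the slope, computed from the sums
    that are also used for the correlation coefficient.
    """
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x values are equal; the slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / denominator
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    return a, b


# Sums reported in the text for Example 10-9.
a, b = regression_coefficients(
    n=6, sum_x=153.8, sum_y=18.7, sum_xy=682.77, sum_x2=5859.26
)

# Rounding rule: three decimal places for the intercept and the slope.
a_rounded, b_rounded = round(a, 3), round(b, 3)
print(f"y' = {a_rounded} + {b_rounded}x")
```

Both formulas share the same denominator, so it is computed once; a zero denominator means every x value is the same and no slope can be determined.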

Page 21: Simple Linear Regression from Blumen in Chinese

相關與迴歸10

487

求出例題 10-4 數據的迴歸線方程式,並且在散佈圖上畫出這一條直線。

■解答

解方程式需要的數字有 n=6,∑x=153.8,∑y=18.7,∑xy=682.77 和 ∑x2 =5859.26。代入公式

你會得到

$$b = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2} = \frac{(6)(682.77) - (153.8)(18.7)}{(6)(5859.26) - (153.8)^2} = 0.106$$

$$a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2} = \frac{(18.7)(5859.26) - (153.8)(682.77)}{(6)(5859.26) - (153.8)^2} = 0.396$$

Rounding Rule for the Intercept and Slope: Round the values of a and b to three decimal places.

Hence, the equation of the regression line y′ = a + bx is

$$y' = 0.396 + 0.106x$$
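The substitution in Example 10-9 can be checked numerically. The following is a minimal sketch that uses only the summary values quoted in the example; the variable names are illustrative and not part of the text.

```python
# Summary statistics quoted in Example 10-9 (car rental companies).
n = 6
sum_x = 153.8      # Σx
sum_y = 18.7       # Σy
sum_xy = 682.77    # Σxy
sum_x2 = 5859.26   # Σx²

# Both formulas share the same denominator: n(Σx²) − (Σx)².
denominator = n * sum_x2 - sum_x ** 2

b = (n * sum_xy - sum_x * sum_y) / denominator       # slope
a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator  # y' intercept

# Apply the rounding rule: three decimal places.
a_rounded = round(a, 3)
b_rounded = round(b, 3)
print(f"y' = {a_rounded} + {b_rounded}x")
```

Computing the shared denominator once mirrors the hand calculation, where the same quantity appears under both a and b.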

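To graph the line, select any two values of x and find the corresponding values of y′. Use any x values between 10 and 60. For example, let x = 15. Substitute in the equation and find the corresponding y′ value.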

$$y' = 0.396 + 0.106x = 0.396 + 0.106(15) = 1.986$$

Let x = 40; then

$$y' = 0.396 + 0.106x = 0.396 + 0.106(40) = 4.636$$

Then plot the two points (15, 1.986) and (40, 4.636) and draw a line connecting the two points. See Figure 10-14.

Figure 10-14 Regression Line for Example 10-9: y′ = 0.396 + 0.106x (x axis: cars, in 10,000s; y axis: revenue, in billions of dollars)

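Note: When you draw the regression line, it is sometimes necessary to truncate the graph (see Chapter 2). This is done when the distance between the origin and the first labeled coordinate on the x axis is not the same as the distance between the rest of the labeled x coordinates, or when the distance between the origin and the first labeled y′ coordinate is not the same as the distance between the other labeled y′ coordinates. When the x axis or the y′ axis has been truncated, do not use the y′ intercept value to graph the line. When you graph the regression line, always select x values between the smallest x data value and the largest x data value.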

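Example 10-10 Absences and Final Grades

Find the equation of the regression line for the data in Example 10-5, and graph the line on the scatter plot.

Solution

The values needed for the equation are n = 7, Σx = 57, Σy = 511, Σxy = 3745, and Σx² = 579. Substituting in the formulas, you get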

$$b = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2} = \frac{(7)(3745) - (57)(511)}{(7)(579) - (57)^2} = -3.622$$

$$a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2} = \frac{(511)(579) - (57)(3745)}{(7)(579) - (57)^2} = 102.493$$

Historical Note: In 1795, Adrien-Marie Legendre (1752–1833) measured the meridian arc on the earth's surface from Barcelona, Spain, to Dunkirk, France. This measure was used as the basis for the measure of the meter. Legendre developed the least-squares method around the year 1805.

Hence, the equation of the regression line y′ = a + bx is

$$y' = 102.493 - 3.622x$$

The graph of the line is shown in Figure 10-15.

Figure 10-15 Regression Line for Example 10-10: y′ = 102.493 − 3.622x (x axis: number of absences; y axis: final grade)

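The sign of the correlation coefficient and the sign of the slope of the regression line will always be the same. That is, if r is positive, then b will be positive; if r is negative, then b will be negative. The reason is that the numerators of the two formulas are the same and determine the signs of r and b, since the denominators are always positive. The regression line will also always pass through the point whose x coordinate is the mean of the x values and whose y coordinate is the mean of the y values, that is, $(\bar{x}, \bar{y})$.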
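Both properties can be checked against the summary values of Example 10-10. This is a small sketch under the assumption that unrounded a and b are used; with the rounded coefficients the match at the mean point is only approximate. Variable names are illustrative.

```python
# Summary statistics quoted in Example 10-10 (absences and final grades).
n = 7
sum_x, sum_y = 57, 511
sum_xy, sum_x2 = 3745, 579

# This numerator is shared by the formulas for r and for b,
# so its sign fixes the sign of both.
shared_numerator = n * sum_xy - sum_x * sum_y
denominator = n * sum_x2 - sum_x ** 2  # always positive when the x values vary

b = shared_numerator / denominator
a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator

# The fitted line evaluated at the mean of x should return the mean of y.
x_bar = sum_x / n
y_bar = sum_y / n
predicted_at_mean = a + b * x_bar

print(shared_numerator, round(b, 3))
print(round(predicted_at_mean, 3), round(y_bar, 3))
```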

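The regression line can be used to make predictions for the dependent variable. The method for making predictions is shown in Example 10-11.

Example 10-11 Car Rental Companies

Use the equation of the regression line to predict the income of a car rental agency that has 200,000 automobiles.

Solution

Since the x values are in 10,000s, divide 200,000 by 10,000 to get 20, and then substitute 20 for x in the equation.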

$$y' = 0.396 + 0.106x = 0.396 + 0.106(20) = 2.516$$

Hence, when a rental agency has 200,000 automobiles, its revenue will be approximately $2.516 billion.
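The unit conversion is the step most easily missed in this kind of prediction. A minimal sketch, assuming the rounded coefficients from Example 10-9; the function name and its parameters are hypothetical:

```python
def predict_revenue_billions(number_of_cars, a=0.396, b=0.106):
    """Point prediction of revenue (in billions of dollars).

    The regression line was fitted with x measured in units of 10,000 cars,
    so the raw car count must be rescaled before substitution.
    """
    x = number_of_cars / 10_000
    return a + b * x


revenue = predict_revenue_billions(200_000)
print(round(revenue, 3))
```

The result is a point prediction only; it carries no statement about accuracy or confidence.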

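The value obtained in Example 10-11 is a point prediction, and with point predictions, no degree of accuracy or confidence can be determined. More information on prediction is given in Section 10-3.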

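The magnitude of the change in one variable when the other variable changes exactly 1 unit is called a marginal change. The value of the slope b of the regression line equation represents the marginal change. For example, in Example 10-9 the slope of the regression line is 0.106, which means that for each increase of 10,000 cars, the value of y changes 0.106 unit ($106 million) on average.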
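That the slope equals the marginal change follows directly from the equation of the line. A short derivation, using only the symbols already defined:

$$y'(x + 1) - y'(x) = [a + b(x + 1)] - [a + bx] = b$$

With b = 0.106 and x measured in 10,000s of cars, one additional unit of x changes y′ by 0.106 billion dollars, which is the $106 million quoted for Example 10-9.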

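When r is not significantly different from 0, the best predictor of y is the mean of the data values of y. For valid predictions, the value of the correlation coefficient must be significant. Also, three assumptions must be met.

Assumptions for Valid Predictions in Regression

1. The sample is a random sample.

2. For any specific value of the independent variable x, the value of the dependent variable y must be normally distributed about the regression line. See Figure 10-16(a).

3. The standard deviation of each of the dependent variables must be the same for each value of the independent variable. See Figure 10-16(b).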

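Extrapolation, or making predictions beyond the bounds of the data, must be interpreted cautiously. For example, in 1979, some experts predicted that the United States would run out of oil by the year 2003. This prediction was based on the consumption at that time and on the oil reserves then known. However, since then, the automobile industry has produced many new fuel-efficient vehicles. Also, there are many as yet undiscovered oil fields. Finally, science may someday discover a way to run a car on something as unlikely but as common as peanut oil. In addition, the price of a gallon of gasoline was predicted to reach $10 a few years later. Fortunately this has not come to pass. Remember that when predictions are made, they are based on present conditions or on the premise that present trends will continue. This assumption may or may not prove true in the future.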
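One way to keep extrapolation visible in practice is to have the prediction routine report whether x lies outside the range used to fit the line. The sketch below is an illustration only: the function name is hypothetical, the coefficients are the rounded values from Example 10-9, and the bounds 10 and 60 are taken from that example's advice on which x values to use, as a stand-in for the smallest and largest x data values.

```python
def predict_with_range_check(x, a=0.396, b=0.106, x_min=10.0, x_max=60.0):
    """Return (predicted y', is_extrapolation).

    is_extrapolation is True when x falls outside [x_min, x_max],
    meaning the prediction rests on the trend continuing beyond the data.
    """
    is_extrapolation = not (x_min <= x <= x_max)
    return a + b * x, is_extrapolation


inside_value, inside_flag = predict_with_range_check(20)
outside_value, outside_flag = predict_with_range_check(80)
print(round(inside_value, 3), inside_flag)
print(round(outside_value, 3), outside_flag)
```

Returning a flag rather than refusing to predict leaves the judgment to the analyst, which matches the text's advice to interpret such predictions cautiously rather than never make them.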

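The steps for finding the value of the correlation coefficient and the regression line equation are summarized in the following Procedure Table.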

Figure 10-16 Assumptions for Predictions: (a) dependent variable y normally distributed about the line y′ = a + bx for each value of x; (b) equal standard deviations, σ1 = σ2 = . . . = σn

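Procedure Table: Finding the Correlation Coefficient and the Regression Line Equation

Step 1 Make a table, as shown in step 2.

Step 2 Find the values of xy, x², and y². Place them in the appropriate columns and sum each column.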

x        y        xy        x²        y²
.        .        .         .         .
.        .        .         .         .
.        .        .         .         .
Σx =     Σy =     Σxy =     Σx² =     Σy² =

Step 3 Substitute in the formula to find the value of r.

$$r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n(\sum x^2) - (\sum x)^2][n(\sum y^2) - (\sum y)^2]}}$$

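Step 4 When r is significant, substitute in the formulas to find the values of a and b for the regression line equation y′ = a + bx.

$$a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2} \qquad b = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}$$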
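The procedure can be sketched as one function that builds the column sums and applies the three formulas. This is an illustration under stated assumptions: the function name and the five data pairs are hypothetical, and the significance test on r that Step 4 requires (a comparison against a table critical value) is not performed here.

```python
from math import sqrt


def correlation_and_regression(xs, ys):
    """Return (r, a, b) for paired data, using the computational formulas."""
    n = len(xs)
    # Step 2: column sums of x, y, xy, x², y².
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    numerator = n * sum_xy - sum_x * sum_y
    x_term = n * sum_x2 - sum_x ** 2
    y_term = n * sum_y2 - sum_y ** 2

    # Step 3: correlation coefficient.
    r = numerator / sqrt(x_term * y_term)
    # Step 4: slope and intercept (meaningful only if r is significant).
    b = numerator / x_term
    a = (sum_y * sum_x2 - sum_x * sum_xy) / x_term
    return r, a, b


# Hypothetical data, chosen only so the arithmetic is easy to follow.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r, a, b = correlation_and_regression(xs, ys)
print(round(r, 3), round(a, 3), round(b, 3))
```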

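A scatter plot should be checked for outliers. An outlier is a point that seems out of place when compared with the other points (see Chapter 3). Some of these points can affect the equation of the regression line. When this happens, the points are called influential points or influential observations.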

Interesting Fact

It is estimated that wearing a motorcycle helmet reduces the risk of a fatal accident by 30%.

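When a point on the scatter plot appears to be an outlier, it should be checked to see if it is an influential point. An influential point tends to "pull" the regression line toward the point itself. To check for an influential point, the regression line should be graphed with the point included in the data set. Then a second regression line should be graphed that excludes the point from the data set. If the position of the second line is changed considerably, the point is said to be an influential point. Points that are outliers in the x direction tend to be influential points.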
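The with-and-without comparison can be done numerically as well as graphically. The sketch below uses hypothetical data in which the last point is an outlier in the x direction; the helper name is illustrative, and what counts as a "considerable" change is left to the analyst.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y' = a + bx."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2
    b = (n * sum_xy - sum_x * sum_y) / denominator
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    return a, b


# Hypothetical data: the last point sits far to the right of the others.
xs = [1, 2, 3, 4, 5, 20]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
suspect = 5  # index of the point being checked

# First line: suspect point included.
a_with, b_with = fit_line(xs, ys)

# Second line: suspect point excluded.
xs_without = xs[:suspect] + xs[suspect + 1:]
ys_without = ys[:suspect] + ys[suspect + 1:]
a_without, b_without = fit_line(xs_without, ys_without)

slope_shift = abs(b_with - b_without)
print(round(b_with, 3), round(b_without, 3), round(slope_shift, 3))
```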

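Researchers should use their judgment as to whether to include influential observations in the final analysis of the data. If the researcher feels that the observation is not necessary, then it should be excluded so that it does not influence the results of the study. However, if the researcher feels that it is necessary, then he or she may want to obtain additional data values whose x values are near the x value of the influential point and then include them in the study.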


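Applying the Concepts 10-2

Stopping Distances Revisited

In a study on speed and braking distance, researchers looked for a method to estimate how fast a person was traveling before an accident by measuring the length of the skid marks. An area that was focused on in the study was the distance required to completely stop a vehicle at various speeds. Use the following table to answer the questions.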

MPH    Braking distance (feet)
20     20
30     45
40     81
50     133
60     205
80     411

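Assume MPH is going to be used to predict stopping distance.

1. Find the linear regression equation.

2. What does the slope tell you about MPH and the braking distance?

3. Find the braking distance when MPH = 45.

4. Find the braking distance when MPH = 100.

5. Comment on predicting beyond the given data values.

See page 509 for the answers.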

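Exercises 10-2

1. What two things should be done before one performs a regression analysis?

2. What is the general form for the regression line used in statistics?

3. What is meant by the line of best fit?

4. What is the relationship between the sign of the correlation coefficient and the sign of the slope of the regression line?

5. How is the value of the correlation coefficient related to the accuracy of the predicted value for a specific value of x?

6. When the value of r is not significant, what value should be used to predict y?

For Exercises 7 through 14, use the same data as for the corresponding exercises in Section 10-1. For each exercise, find the equation of the regression line and find the y′ value for the specified x value. Remember that no regression should be done when r is not significant.

7. Commercial Movie Releases New movie releases per studio and gross receipts are as follows: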

No. of releases x               361   270   306   22    35   10   8    12   21
Gross receipts y (million $)    3844  1962  1371  1064  334  241  188  154  125

Find y′ when x = 200 new releases.

8. Alumni Contributions Years since graduation and contribution data are as follows:

Years x               1    5    3    10   7    6
Contribution y ($)    500  100  300  50   75   80

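Find y′ when x = 4 years.

9. School Districts and Secondary Schools The number of school districts and the number of secondary schools in the district are shown.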


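School districts x    53    19    24    17    95    68

High schools y    50    27    187    84    143    216

Find y′ when x = 70.

10. Egg Production The number of eggs produced and the price per dozen are shown.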


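No. of eggs x (millions)    957    1332    1163    1865    119    273

Price per dozen y ($)    0.770    0.697    0.617    0.652    1.080    1.420

Find y′ when x = 1600.

11. Faculty and Students The number of faculty and the number of students in a random sample of small colleges are shown.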


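Faculty x    99    110    113    116    138    174    220

Students y    1353    1290    1091    1213    1384    1283    2075

Now find the equation of the regression line when x and y are interchanged.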
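As a rough check on this kind of exercise, the sketch below fits a least-squares line y′ = a + bx in both orientations for the faculty and student counts. It assumes the standard sum formulas for the intercept a and slope b; the function name `regression_line` and the variable names are illustrative and not part of the text.

```python
# Minimal sketch: least-squares line y' = a + b*x from raw sums.
# The helper name and variable names are hypothetical, chosen for illustration.

def regression_line(x, y):
    """Return (a, b), the intercept and slope of y' = a + b*x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    denom = n * sum_x2 - sum_x ** 2
    if denom == 0:
        raise ValueError("all x values are identical; slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / denom
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denom
    return a, b

faculty = [99, 110, 113, 116, 138, 174, 220]
students = [1353, 1290, 1091, 1213, 1384, 1283, 2075]

# Original orientation: faculty is x, students is y.
a_xy, b_xy = regression_line(faculty, students)

# Interchanged orientation: students is x, faculty is y.
a_yx, b_yx = regression_line(students, faculty)

print(f"students on faculty: y' = {a_xy:.3f} + {b_xy:.3f}x")
print(f"faculty on students: y' = {a_yx:.3f} + {b_yx:.3f}x")
```

With the roles interchanged, the fit should come out near y′ = −14.974 + 0.111x. Note that this is not the algebraic inverse of the first line: each fit minimizes vertical distances for its own choice of y, so the two slopes multiply to r² rather than to 1.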

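12. Average Temperature and Precipitation Average daily temperatures (in degrees Fahrenheit) and average monthly precipitation (in inches) are shown.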


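Avg. daily temp. x    86    81    83    89    80    74    64

Avg. monthly precip. y    3.4    1.8    3.5    3.6    3.7    1.5    0.2

Find y′ when x = 70.

13. Fat Fat calories and saturated fat weight (in grams) are shown.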


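Fat calories x    190    220    270    360    460    540

Saturated fat y (g)    9    8    13    17    23    27

Find y′ when x = 400.

14. Hospital Beds The number of licensed beds and the number of staffed beds are shown.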

Section 10–2 Regression 559

10–27

14. Forest Fires and Acres Burned Number of fires andnumber of acres burned are as follows:

Fires x 72 69 58 47 84 62 57 45

Acres y 62 41 19 26 51 15 30 15

Find y� when x � 60. y� � �31.46 � 1.036x; 30.7

15. Years and contribution data are as follows:

Years x 1 5 3 10 7 6

Contribution y, $ 500 100 300 50 75 80

Find y� when x � 4 years. y� � 453.176 � 50.439x; 251.42

16. State Debt and Per Capita Taxes  Data for per capita state debt and per capita state tax are as follows:

Per capita debt 1924 907 1445 1608 661

Per capita tax 1685 1838 1734 1842 1317

Find y′ when x = $1500 in per capita debt. Not significant, so no regression should be done.

17. School Districts and Secondary Schools  The number of school districts and the number of secondary schools in the district are shown.

School districts 53 19 24 17 95 68

Secondary schools 50 27 187 84 143 216

Find y′ when x = 70. Since r is not significant, no regression should be done.

18. Triples and Home Runs  The number of triples and the number of home runs obtained by a selected sample of MLB players are shown.

Triples 25 23 51 19 20 43

Home runs 212 199 144 160 149 122

Find y′ when x = 33. Since r is not significant, no regression should be done.

19. Egg Production  Number of eggs and price per dozen are shown.

No. of eggs (million) 957 1332 1163 1865 119 273

Price per dozen ($) 0.770 0.697 0.617 0.652 1.080 1.420

Find y′ when x = 1600 million eggs. y′ = 1.252 − 0.000398x; y′ = 0.615 per dozen

20. Emergency Calls and Temperature  Temperature in degrees Fahrenheit and number of emergency calls are shown.

Temperature x 68 74 82 88 93 99 101

No. of calls y 7 4 8 10 11 9 13

Find y′ when x = 80°F. y′ = −7.544 + 0.190x; 7.656, or 8 calls

21. Faculty and Students  The number of faculty and the number of students in a random selection of small colleges are shown.

Faculty 99 110 113 116 138 174 220

Students 1353 1290 1091 1213 1384 1283 2075

Now find the equation of the regression line when x and y are interchanged. y′ = −14.974 + 0.111x

22. Precipitation and Snowfall/Sleet  The number of days of precipitation and snowfall/sleet are shown.

Precipitation 61 111 140 116 88 136

Snow/sleet 2 15 21 8 11 13

Find y′ when x = 100 days. y′ = −7.327 + 0.175x; 10.173 in

23. Average Temperature and Precipitation  Temperatures (in degrees Fahrenheit) and precipitation (in inches) are as follows:

Avg. daily temp. x 86 81 83 89 80 74 64

Avg. mo. precip. y 3.4 1.8 3.5 3.6 3.7 1.5 0.2

Find y′ when x = 70°F. y′ = −8.994 + 0.1448x; 1.1

24. NHL Assists and Total Points  The number of assists and the total number of points for a sample of NHL scoring leaders are shown.

Assists 26 29 32 34 36 37 40

Total points 48 68 66 69 76 67 84

Find y′ when x = 30 assists. y′ = 2.693 + 1.962x; 62

25. Fat Calories and Fat Grams  The number of fat calories and the number of saturated fat grams for a random selection of breakfast entrees are shown.

Fat calories 190 220 270 360 460 540

Sat. fat (g) 9 8 13 17 23 27

Find y′ when x = 400 fat calories. y′ = −2.417 + 0.055x; 19.6 grams

26. Tall Buildings  Stories and heights of buildings data follow:

Stories x 64 54 40 31 45 38 42 41 37 40

Heights y 841 725 635 616 615 582 535 520 511 485

Find y′ when x = 44. y′ = 206.399 + 9.262x; 613.9

27. Hospital Beds  Licensed beds and staffed beds data follow:

Licensed beds x 144 32 175 185 208 100 169

Staffed beds y 112 32 162 141 103 80 118

Find y′ when x = 44. y′ = 22.659 + 0.582x; 48.267
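As a rough check on the answer printed for the hospital beds exercise, the sketch below fits the least-squares line with the usual sum formulas for the intercept a and slope b, then predicts at x = 44. The function name `fit_line` is an assumption made for illustration and does not come from the text.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y' = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denom = n * sum_x2 - sum_x ** 2
    b = (n * sum_xy - sum_x * sum_y) / denom          # slope
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denom     # y intercept
    return a, b


# Data from the hospital beds exercise.
licensed = [144, 32, 175, 185, 208, 100, 169]
staffed = [112, 32, 162, 141, 103, 80, 118]

a, b = fit_line(licensed, staffed)
prediction = a + b * 44
print(f"y' = {a:.3f} + {b:.3f}x; y'(44) = {prediction:.2f}")
```

With unrounded coefficients the prediction is about 48.26. Rounding a and b to three decimals before substituting gives the 48.267 shown with the exercise.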

For Exercises 28 through 33, do a complete regression analysis by performing these steps.

a. Draw a scatter plot.

b. Compute the correlation coefficient.

c. State the hypotheses.

d. Test the hypotheses at α = 0.05. Use Table I.

e. Determine the regression line equation.

f. Plot the regression line on the scatter plot.

g. Summarize the results.

28. Fireworks and Injuries  These data were obtained for the years 1993 through 1998 and indicate the number


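Licensed beds x 144 32 175 185 208 100 169

Staffed beds y 112 32 162 141 103 80 118

Find y′ when x = 44.

For Exercises 15 through 17, do a complete regression analysis by performing these steps.

a. Draw a scatter plot.

b. Compute the correlation coefficient.

c. State the hypotheses.

d. Test the hypotheses at α = 0.05. Use Table H.

e. Determine the regression line equation.

f. Plot the regression line on the scatter plot.

g. Summarize the results.

Exercises 10-2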


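15. Farm Acreage  Is there a relationship between the number of farms in a state and the acreage per farm? A random selection of states across the country, both eastern and western, produced the following results. Can a relationship between these two variables be concluded?

Source: World Almanac.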

of fireworks (in millions) used and the related injuries. Predict the number of injuries if 100 million fireworks are used during a given year.

Fireworks in use x 67.6 87.1 117 115 118 113

Related injuries y 12,100 12,600 12,500 10,900 7800 7000

Source: National Council of Fireworks Safety, American Pyrotechnic Assoc.

29. Farm Acreage  Is there a relationship between the number of farms in a state and the acreage per farm? A random selection of states across the country, both eastern and western, produced the following results. Can a relationship between these two variables be concluded?

No. of farms (thousands) x 77 52 20.8 49 28 58.2

Acreage per farm y 347 173 173 218 246 132

Source: World Almanac.

30. SAT Scores  Educational researchers desired to find out if a relationship exists between the average SAT verbal score and the average SAT mathematical score. Several states were randomly selected, and their SAT average scores are recorded below. Is there sufficient evidence to conclude a relationship between the two scores?

Verbal x 526 504 594 585 503 589

Math y 530 522 606 588 517 589

Source: World Almanac.

31. Coal Production  These data were obtained from a sample of counties in southwestern Pennsylvania and indicate the number (in thousands) of tons of bituminous coal produced in each county and the number of employees working in coal production in each county. Predict the amount of coal produced for a county that has 500 employees.

No. of employees x 110 731 1031 20 118 1162 103 752

Tons y 227 5410 5328 147 729 8095 635 6157

32. Television Viewers  A television executive selects 10 television shows and compares the average number of viewers the show had last year with the average number of viewers this year. The data (in millions) are shown. Describe the relationship.

Viewers last year x 26.6 17.85 20.3 16.8 20.8 16.7 19.1 18.9 16.0 15.8

Viewers this year y 28.9 19.2 26.4 13.7 20.2 18.8 25.0 21.0 16.8 15.3

Source: Nielsen Media Research.

33. Absences and Final Grades  An educator wants to see how the number of absences for a student in her class affects the student’s final grade. The data obtained from a sample are shown.

No. of absences x 10 12 2 0 8 5

Final grade y 70 65 96 94 75 82

For Exercises 34 and 35, do a complete regression analysis and test the significance of r at α = 0.05, using the P-value method.

34. Father’s and Son’s Weights  A physician wishes to know whether there is a relationship between a father’s weight (in pounds) and his newborn son’s weight (in pounds). The data are given here.

Father’s weight x 176 160 187 210 196 142 205 215

Son’s weight y 6.6 8.2 9.2 7.1 8.8 9.3 7.4 8.6

35. Age and Net Worth  Is a person’s age related to his or her net worth? A sample of 10 billionaires is selected, and the person’s age and net worth are compared. The data are given here.

Age x 56 39 42 60 84 37 68 66 73 55

Net worth (billion $) y 18 14 12 14 11 10 10 7 7 5

Source: The Associated Press.

Extending the Concepts

36. For Exercises 13, 15, and 21 in Section 10–1, find the mean of the x and y variables. Then substitute the mean of the x variable into the corresponding regression line equations found in Exercises 13, 15, and 21 in this section and find y′. Compare the value of y′ with ȳ for each exercise. Generalize the results.

37. The y intercept value a can also be found by using the equation

a = ȳ − b x̄

Verify this result by using the data in Exercises 15 and 16 of Sections 10–1 and 10–2. 453.173; regression should not be done.

38. The value of the correlation coefficient can also be found by using the formula

r = b s_x / s_y

where s_x is the standard deviation of the x values and s_y is the standard deviation of the y values. Verify this result for Exercises 18 and 20 of Section 10–1. r = −0.543; 0.812
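The two alternative formulas in Exercises 37 and 38 can be checked numerically. The following is a minimal sketch, assuming the years and contribution data from Exercise 15 of this section. It computes a, b, and r from the sum formulas and compares them with ȳ − b x̄ and b s_x / s_y. The variable names are illustrative only.

```python
import math
from statistics import mean, stdev

# Years and contribution data (Exercise 15 of this section).
years = [1, 5, 3, 10, 7, 6]
contribution = [500, 100, 300, 50, 75, 80]
n = len(years)

sum_x, sum_y = sum(years), sum(contribution)
sum_xy = sum(x * y for x, y in zip(years, contribution))
sum_x2 = sum(x * x for x in years)
sum_y2 = sum(y * y for y in contribution)

denom = n * sum_x2 - sum_x ** 2
b = (n * sum_xy - sum_x * sum_y) / denom
a = (sum_y * sum_x2 - sum_x * sum_xy) / denom
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(denom * (n * sum_y2 - sum_y ** 2))

# Alternative forms from Exercises 37 and 38.
a_alt = mean(contribution) - b * mean(years)
r_alt = b * stdev(years) / stdev(contribution)

print(f"a = {a:.3f}, a_alt = {a_alt:.3f}")
print(f"r = {r:.3f}, r_alt = {r_alt:.3f}")
```

Both pairs agree to floating-point precision. Small differences such as 453.176 versus 453.173 arise only when b and the means are rounded before substitution.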

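No. of farms (thousands) x 77 52 20.8 49 28 58.2

Acreage per farm y 347 173 173 218 246 132

16. Coal Production  These data were obtained from a random sample of counties in southwestern Pennsylvania and indicate the number (in thousands) of tons of bituminous coal produced in each county and the number of employees working in coal production in each county. Predict the amount of coal produced for a county that has 500 employees.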


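No. of employees x 110 731 1031 20 118 1162 103 752

Tons y 227 5410 5328 147 729 8095 635 6157

17. Absences and Final Grades  An educator wants to see how the number of absences for a student in her class affects the student’s final grade. The data obtained from a sample are shown.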


No. of absences x 10 12 2 0 8 5

Final grade y 70 65 96 94 75 82
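Steps b through e of the complete regression analysis can be sketched for the absences and final grades data as follows. This is only an illustration: it tests H0: ρ = 0 with the t statistic t = r√((n − 2)/(1 − r²)) and compares it with 2.776, the standard two-tailed critical value for α = 0.05 and 4 degrees of freedom, rather than with the critical-value table named in the exercise instructions, which is not reproduced here.

```python
import math

absences = [10, 12, 2, 0, 8, 5]
grades = [70, 65, 96, 94, 75, 82]
n = len(absences)

sum_x, sum_y = sum(absences), sum(grades)
sum_xy = sum(x * y for x, y in zip(absences, grades))
sum_x2 = sum(x * x for x in absences)
sum_y2 = sum(y * y for y in grades)

# Step b: correlation coefficient.
sxy = n * sum_xy - sum_x * sum_y
sxx = n * sum_x2 - sum_x ** 2
syy = n * sum_y2 - sum_y ** 2
r = sxy / math.sqrt(sxx * syy)

# Steps c and d: H0: rho = 0 versus H1: rho != 0, alpha = 0.05, d.f. = n - 2.
t = r * math.sqrt((n - 2) / (1 - r ** 2))
T_CRITICAL = 2.776  # assumed from a standard t table, d.f. = 4, two-tailed
reject_h0 = abs(t) > T_CRITICAL

# Step e: regression line y' = a + b*x (meaningful only if r is significant).
b = sxy / sxx
a = (sum_y * sum_x2 - sum_x * sum_xy) / sxx

print(f"r = {r:.3f}, t = {t:.2f}, reject H0: {reject_h0}")
print(f"y' = {a:.3f} + ({b:.3f})x")
```

For these data r is close to −1, so the null hypothesis is rejected and the fitted line has a negative slope: more absences go with lower final grades.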

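18. Age and Net Worth  Is a person’s age related to his or her net worth? A sample of 10 billionaires is selected, and the person’s age and net worth (in billions of dollars) are compared. The data are given here. Do a complete regression analysis and test the significance of r at α = 0.05, using the P-value method.

Source: The Associated Press.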


Age x 56 39 42 60 84 37 68 66 73 55

Net worth (billion $) y 18 14 12 14 11 10 10 7 7 5

To test the significance of b and r:

1. Press STAT and move the cursor to TESTS.

2. Press E (ALPHA SIN) for LinRegTTest. Make sure the Xlist is L1, the Ylist is L2, and the Freq is 1. (Use F for TI-84.)

3. Select the appropriate alternative hypothesis.

4. Move the cursor to Calculate and press ENTER.

Example TI10–4

Test the hypothesis H0: ρ = 0 for the data in Example TI10–1. Use α = 0.05.

In this case, the t test value is 4.050983638. The P-value is 0.0154631742, which is significant. The decision is to reject the null hypothesis at α = 0.05, since 0.0154631742 < 0.05; r = 0.8966728145, r² = 0.8040221364.

There are two other ways to store the equation for the regression line in Y1 for graphing.

1. Type Y1 after the LinReg(a+bx) command.

2. Type Y1 in the RegEQ: spot in the LinRegTTest.

To get Y1 do this: Press VARS for variables, move cursor to Y-VARS, press 1 for Function, press 1 for Y1.

Excel Step by Step

Scatter Plots

Creating a scatter plot is straightforward when you use the Chart Wizard.

1. You must have at least two columns of data to use the Scatter Plot option.

2. Highlight the data to be plotted. Select the Insert tab from the toolbar. Then select the Scatter chart and the first type (Scatter with only markers).

3. By left-clicking anywhere on the chart, you automatically bring up the Chart Tools group on the toolbar. The Chart Tools menu includes three additional tabs for editing your chart: Design, Layout, and Format.

4. You can add titles to your chart and to the axes by selecting the Layout tab, then selecting the appropriate option from the Labels group.

Correlation Coefficient

The CORREL function in Excel returns the correlation coefficient without regression analysis.

1. Enter the data in columns A and B.

2. Select a blank cell, and then select the Formulas tab from the toolbar.

3. Select the Insert Function icon from the toolbar.

4. Select the Statistical function category and select the CORREL function.

5. Enter the data range A1:AN, where N is the number of sample data pairs, for the first variable in Array1. Enter the data range B1:BN for the second variable in Array2, and then click [OK].
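The quantities reported by CORREL and by the calculator's LinRegTTest can be reproduced without either tool. The sketch below is illustrative only. It assumes the six data pairs of the worksheet example in this section (x = 43, 48, 56, 61, 67, 70 and y = 128, 120, 135, 143, 141, 152), which appear to reproduce the calculator values quoted above. The helper names `t_pdf` and `two_tailed_p` are hypothetical, and the P-value is obtained by numerically integrating the t density with Simpson's rule instead of calling a statistics library.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=2000):
    """Two-tailed P-value, integrating the density on [0, |t|] by Simpson's rule."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 1 - 2 * area


x = [43, 48, 56, 61, 67, 70]
y = [128, 120, 135, 143, 141, 152]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(u * v for u, v in zip(x, y))
sum_x2 = sum(u * u for u in x)
sum_y2 = sum(v * v for v in y)

# Pearson correlation coefficient (the quantity CORREL returns).
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
t = r * math.sqrt((n - 2) / (1 - r ** 2))
p = two_tailed_p(t, n - 2)

print(f"r = {r:.7f}, r^2 = {r * r:.7f}, t = {t:.6f}, P-value = {p:.7f}")
```

Because the P-value falls below α = 0.05, the decision matches the one stated in the calculator example: reject H0: ρ = 0.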

Technology Step by Step


Correlation and Regression

This procedure lets you compute the Pearson product moment correlation coefficient without performing a regression analysis.

1. Enter the data shown in the example below in a new worksheet. Enter the six values of the x variable in column A and the corresponding y values in column B.

Example

x    43   48   56   61   67   70
y   128  120  135  143  141  152

2. Select Data from the toolbar. Then select Data Analysis. Under Analysis Tools, select Correlation.

3. In the Correlation dialog box, type A1:B6 for the Input Range and check the Grouped By: Columns option.

4. Under Output options, select Output Range, and type D2. Then click [OK].
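As a cross-check on the spreadsheet output, the same coefficient can be computed directly from the sums of the data. The following is a minimal Python sketch and is not part of the Excel procedure; it assumes the six (x, y) pairs from the worksheet example, and the function name pearson_r is only illustrative.

```python
import math

def pearson_r(xs, ys):
    """Pearson product moment correlation coefficient for paired data."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Data assumed to match the worksheet example (columns A and B)
x = [43, 48, 56, 61, 67, 70]
y = [128, 120, 135, 143, 141, 152]
r = pearson_r(x, y)
print(round(r, 3))  # 0.897
```

The value printed should agree, up to rounding, with the off-diagonal entry of the correlation table that the spreadsheet places at the chosen output range.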

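This procedure lets you conduct a regression analysis and compute the correlation coefficient. Use the data from Example 10–2.

1. Select the Data tab on the toolbar, then Data Analysis>Regression.

2. In the Regression dialog box, type B1:B6 in the Input Y Range and type A1:A6 in the Input X Range.

3. Under Output options, select Output Range, and type D6. Then click [OK].

Note: To see all of the decimal places for the statistics in the Summary Output, expand the width of columns D to L.

1. Highlight columns D through L.

2. Select the Home tab, and then select Format>AutoFit Column Width.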
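The intercept and slope reported in the Summary Output can also be reproduced outside the spreadsheet. The sketch below is an assumption-laden illustration rather than the book's procedure: it assumes that columns A and B hold the six (x, y) pairs from the worksheet example, and the helper name least_squares_line is hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (intercept a, slope b) of the least-squares line y' = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b

# Data assumed to be in A1:A6 (x) and B1:B6 (y)
x = [43, 48, 56, 61, 67, 70]
y = [128, 120, 135, 143, 141, 152]
a, b = least_squares_line(x, y)
print(round(a, 3), round(b, 3))  # 81.048 0.964
```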


10–3 Coefficient of Determination and Standard Error of the Estimate

The previous sections stated that if the correlation coefficient is significant, the equation of the regression line can be determined. Also, for various values of the independent variable x, the corresponding values of the dependent variable y can be predicted. Several other measures are associated with the correlation and regression techniques. They include the coefficient of determination, the standard error of the estimate, and the prediction interval. But before these concepts can be explained, the different types of variation associated with the regression model must be defined.

Types of Variation for the Regression Model

Consider the following hypothetical regression model.

x    1    2    3    4    5
y   10    8   12   16   20

The equation of the regression line is y′ = 4.8 + 2.8x, and r = 0.919. The sample y values are 10, 8, 12, 16, and 20. The predicted values, designated by y′, for each x can be found by substituting each x value into the regression equation and finding y′. For example, when x = 1,

y′ = 4.8 + 2.8x = 4.8 + (2.8)(1) = 7.6

Now, for each x, there is an observed y value and a predicted y′ value; for example, when x = 1, y = 10 and y′ = 7.6. Recall that the closer the observed values are to the predicted values, the better the fit is and the closer r is to +1 or −1.

The total variation Σ(y − ȳ)² is the sum of the squares of the vertical distances each point is from the mean. The total variation can be divided into two parts: that which is attributed to the relationship of x and y and that which is due to chance. The variation obtained from the relationship (i.e., from the predicted y′ values) is Σ(y′ − ȳ)² and is called the explained variation. When the linear relationship is strong, most of the variation can be explained by the relationship. The closer the value r is to +1 or −1, the better the points fit the line and the closer Σ(y′ − ȳ)² is to Σ(y − ȳ)². In fact, if all points fall on the regression line, Σ(y′ − ȳ)² will equal Σ(y − ȳ)², since y′ is equal to y in each case.

On the other hand, the variation due to chance, found by Σ(y − y′)², is called the unexplained variation. This variation cannot be attributed to the relationship. When the unexplained variation is small, the value of r is close to +1 or −1. If all points fall on the regression line, the unexplained variation Σ(y − y′)² will be 0. Hence, the total variation is equal to the sum of the explained variation and the unexplained variation. That is,

Σ(y − ȳ)² = Σ(y′ − ȳ)² + Σ(y − y′)²

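These values are shown in Figure 10–17. For a single point, the differences are called deviations. For the hypothetical regression model given earlier, for x = 1 and y = 10, you get y′ = 7.6 and ȳ = 13.2.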


The procedure for finding the three types of variation is illustrated next.

Step 1 Find the predicted y′ values.

For x = 1    y′ = 4.8 + 2.8x = 4.8 + (2.8)(1) = 7.6
For x = 2    y′ = 4.8 + (2.8)(2) = 10.4
For x = 3    y′ = 4.8 + (2.8)(3) = 13.2
For x = 4    y′ = 4.8 + (2.8)(4) = 16.0
For x = 5    y′ = 4.8 + (2.8)(5) = 18.8

Hence, the values for this example are as follows:

x    y    y′
1   10    7.6
2    8   10.4
3   12   13.2
4   16   16.0
5   20   18.8
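The predicted values for the hypothetical model can be reproduced with a few lines of code. This is a minimal sketch that takes the regression line y′ = 4.8 + 2.8x from the text as given; the function name predict is only illustrative.

```python
def predict(x, a=4.8, b=2.8):
    """Predicted value y' = a + b*x for the hypothetical model in the text."""
    return a + b * x

xs = [1, 2, 3, 4, 5]
predicted = [round(predict(x), 1) for x in xs]
print(predicted)  # [7.6, 10.4, 13.2, 16.0, 18.8]
```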

Step 2 Find the mean of the y values.

ȳ = (10 + 8 + 12 + 16 + 20)/5 = 13.2

Step 3 Find the total variation Σ(y − ȳ)².

(10 − 13.2)² = 10.24
(8 − 13.2)² = 27.04
(12 − 13.2)² = 1.44
(16 − 13.2)² = 7.84
(20 − 13.2)² = 46.24
Σ(y − ȳ)² = 92.8

Figure 10–17 Deviations for the Regression Equation (for a point (x, y): total deviation y − ȳ, explained deviation y′ − ȳ, and unexplained deviation y − y′)


Step 4 Find the explained variation Σ(y′ − ȳ)².

(7.6 − 13.2)² = 31.36
(10.4 − 13.2)² = 7.84
(13.2 − 13.2)² = 0.00
(16 − 13.2)² = 7.84
(18.8 − 13.2)² = 31.36
Σ(y′ − ȳ)² = 78.4

Step 5 Find the unexplained variation Σ(y − y′)².

(10 − 7.6)² = 5.76
(8 − 10.4)² = 5.76
(12 − 13.2)² = 1.44
(16 − 16)² = 0.00
(20 − 18.8)² = 1.44
Σ(y − y′)² = 14.4

Notice that

Total variation = explained variation + unexplained variation

92.8 = 78.4 + 14.4
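The three sums of squares can be verified numerically. The following sketch uses the hypothetical data (x = 1 to 5, y = 10, 8, 12, 16, 20) and the regression line y′ = 4.8 + 2.8x from the text; the variable names are illustrative only.

```python
xs = [1, 2, 3, 4, 5]
ys = [10, 8, 12, 16, 20]
a, b = 4.8, 2.8  # regression line y' = 4.8 + 2.8x from the text

y_mean = sum(ys) / len(ys)
y_pred = [a + b * x for x in xs]

# Total, explained, and unexplained variation
total = sum((y - y_mean) ** 2 for y in ys)
explained = sum((yp - y_mean) ** 2 for yp in y_pred)
unexplained = sum((y - yp) ** 2 for y, yp in zip(ys, y_pred))

print(round(total, 1), round(explained, 1), round(unexplained, 1))  # 92.8 78.4 14.4
```

The identity holds exactly only when the line is the least-squares line for the data; with rounded coefficients the two sides may differ slightly.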

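Note: The values (y − y′) are called residuals. A residual is the difference between the actual value of y and the predicted value y′ for a given x value. The mean of the residuals is always zero. As stated previously, the regression line determined by the formulas in Section 10–2 is the line that best fits the points of the scatter plot. The sum of the squares of the residuals computed by using the regression line is the smallest possible value. For this reason, a regression line is also called a least-squares line.

Historical Note

In the 19th century, astronomers such as Gauss and Laplace used what is called the principle of least squares based on measurement errors to determine the shape of Earth. It is now used in regression theory.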
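The least-squares property can be illustrated by comparing the regression line with a few nearby lines. This is only a spot check under stated assumptions, not a proof: the alternative intercepts and slopes below are arbitrary values chosen for comparison, and the function name sse is hypothetical.

```python
def sse(xs, ys, a, b):
    """Sum of squared residuals for the line y' = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

xs = [1, 2, 3, 4, 5]
ys = [10, 8, 12, 16, 20]

best = sse(xs, ys, 4.8, 2.8)  # the regression line from the text

# Nearby lines (arbitrary alternatives, chosen only for comparison)
others = [sse(xs, ys, 4.8 + da, 2.8 + db)
          for da in (-0.5, 0.0, 0.5)
          for db in (-0.2, 0.0, 0.2)
          if (da, db) != (0.0, 0.0)]

residuals = [y - (4.8 + 2.8 * x) for x, y in zip(xs, ys)]
mean_residual = sum(residuals) / len(residuals)

print(round(best, 1))                 # 14.4
print(all(s > best for s in others))  # True
print(abs(mean_residual) < 1e-9)      # True
```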

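Residual Plots

As previously stated, the values y − y′ are called residuals (sometimes called the prediction errors). These values can be plotted with the x values, and the plot, called a residual plot, can be used to determine how well the regression line can be used to make predictions.

The residuals for the previous example are calculated as shown.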

x    y    y′     y − y′ = residual
1   10    7.6    10 − 7.6 = 2.4
2    8   10.4    8 − 10.4 = −2.4
3   12   13.2    12 − 13.2 = −1.2
4   16   16.0    16 − 16 = 0
5   20   18.8    20 − 18.8 = 1.2

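Unusual Stat

There are 1,929,770,126,028,800 different color combinations for Rubik's cube and only one correct solution in which all the colors of the squares on each face are the same.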


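The x values are plotted using the horizontal axis, and the residuals are plotted using the vertical axis. Since the mean of the residuals is always zero, a horizontal line with a y coordinate of zero is drawn across the plot, as shown in Figure 10–18.

Plot the x and residual values as shown in Figure 10–18.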

x         1      2      3     4     5
y − y′    2.4   −2.4   −1.2   0     1.2

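To interpret a residual plot, you need to determine if the residuals form a pattern. Figure 10–19 shows four examples of residual plots. If the residual values are more or less evenly distributed about the line, as shown in Figure 10–19(a), then the relationship between x and y is linear and the regression line can be used to make predictions. This pattern is consistent with the standard deviation of the dependent variable being the same for each value of the independent variable. This is called the homoscedasticity assumption. See assumption 3 on page 489.

Figure 10–19(b) shows that the variance of the residuals increases as the values of x increase. This means that the regression line is not suitable for predictions.

Figure 10–19(c) shows a curvilinear relationship between the x values and the residual values; hence, the regression line is not suitable for making predictions.

Figure 10–19(d) shows that as the x values increase, the residuals increase and become more dispersed. This means that the regression line is not suitable for making predictions.

The residual plot in Figure 10–18 shows that the regression line y′ = 4.8 + 2.8x is somewhat questionable for making predictions because the sample is so small.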
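When no plotting software is at hand, a rough residual plot can be drawn as text. The sketch below is only an illustration for the hypothetical model: it is turned on its side (one row per x value, with the zero line running vertically), and the function names and the scale parameter are assumptions made for this example.

```python
xs = [1, 2, 3, 4, 5]
ys = [10, 8, 12, 16, 20]

def residuals(xs, ys, a=4.8, b=2.8):
    """Return y - y' for each point, using y' = a + b*x."""
    return [round(y - (a + b * x), 10) for x, y in zip(xs, ys)]

def text_residual_plot(xs, res, scale=4):
    """One row per x value: '*' marks the residual, '|' marks zero."""
    width = 3 * scale  # character columns on either side of the zero line
    rows = []
    for x, e in zip(xs, res):
        row = [" "] * (2 * width + 1)
        row[width] = "|"
        # Clamp so that large residuals stay inside the drawing area
        pos = max(0, min(2 * width, width + round(e * scale)))
        row[pos] = "*"
        rows.append(f"x={x} " + "".join(row))
    return "\n".join(rows)

res = residuals(xs, ys)
print(text_residual_plot(xs, res))
```

With only five points, the marks scattered on both sides of the zero line give little evidence for or against a pattern, which matches the caution expressed about the small sample.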

Figure 10–18 Residual Plot (residuals y − y′ on the vertical axis plotted against x on the horizontal axis)

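Coefficient of Determination

The coefficient of determination is the ratio of the explained variation to the total variation and is denoted by r². That is,

r² = explained variation / total variation

For the example, r² = 78.4/92.8 = 0.845. The term r² is usually expressed as a percentage. So in this case, 84.5% of the total variation is explained by the regression line using the independent variable.

Another way to arrive at the value for r² is to square the correlation coefficient. In this case, r = 0.919 and r² = 0.845, which is the same value found by using the variation ratio.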
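Both routes to r² can be checked side by side. The following sketch recomputes r and the regression line from the hypothetical data rather than taking the rounded values from the text; the variable names are illustrative.

```python
import math

xs = [1, 2, 3, 4, 5]
ys = [10, 8, 12, 16, 20]
n = len(xs)
x_mean, y_mean = sum(xs) / n, sum(ys) / n

sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
sxx = sum((x - x_mean) ** 2 for x in xs)
syy = sum((y - y_mean) ** 2 for y in ys)  # total variation

r = sxy / math.sqrt(sxx * syy)  # correlation coefficient

b = sxy / sxx               # slope of the least-squares line
a = y_mean - b * x_mean     # intercept
explained = sum((a + b * x - y_mean) ** 2 for x in xs)

r2_from_ratio = explained / syy  # explained variation / total variation
r2_from_r = r ** 2               # square of the correlation coefficient
print(round(r, 3), round(r2_from_ratio, 3), round(r2_from_r, 3))  # 0.919 0.845 0.845
```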

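The coefficient of determination is a measure of the variation of the dependent variable that is explained by the regression line and the independent variable. The symbol for the coefficient of determination is r².

Of course, it is usually easier to find the coefficient of determination by squaring r and converting it to a percentage. Therefore, if r = 0.90, then r² = 0.81, which is equivalent to 81%. This result means that 81% of the variation in the dependent variable is accounted for by the variation in the independent variable. The rest of the variation, 0.19, or 19%, is unexplained. This value is called the coefficient of nondetermination and is found by subtracting the coefficient of determination from 1. As the value of r approaches 0, r² decreases more rapidly. For example, if r = 0.6, then r² = 0.36, which means that only 36% of the variation in the dependent variable can be attributed to the variation in the independent variable.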
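How quickly r² falls as r moves toward 0 is easy to tabulate. In this small sketch the values r = 0.9 and r = 0.6 come from the text, while r = 0.3 is an extra value added only for illustration; the function name is hypothetical.

```python
def determination_summary(r):
    """Return (r squared, 1 - r squared) for a correlation coefficient r."""
    r2 = r * r
    return r2, 1.0 - r2

for r in (0.9, 0.6, 0.3):
    r2, non = determination_summary(r)
    print(f"r = {r:.1f}: explained {r2:.0%}, unexplained {non:.0%}")
# r = 0.9: explained 81%, unexplained 19%
# r = 0.6: explained 36%, unexplained 64%
# r = 0.3: explained 9%, unexplained 91%
```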

Figure 10–19 Examples of Residual Plots (four panels, (a) through (d), each plotting the residuals y − y′ against x around the zero line)

Objective 5

Compute the coefficient of determination.


Coefficient of Nondetermination

1.00 − r²

Standard Error of the Estimate

When a y′ value is predicted for a specific x value, the prediction is a point prediction. However, a prediction interval about the y′ value can be constructed, just as a confidence interval was constructed for an estimate of the population mean. The prediction interval uses a statistic called the standard error of the estimate.

The standard error of the estimate, denoted by s_est, is the standard deviation of the observed y values about the predicted y′ values. The formula for the standard error of the estimate is

s_est = √[ Σ(y − y′)² / (n − 2) ]

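The standard error of the estimate is similar to the standard deviation, but the mean is not used. As can be seen from the formula, the standard error of the estimate is the square root of the unexplained variation (that is, the variation due to the difference between the observed values and the predicted values) divided by n − 2. So the closer the observed values are to the predicted values, the smaller the standard error of the estimate will be.

Example 10–12 shows how to compute the standard error of the estimate.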
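Before turning to the example, the formula can be tried on the small hypothetical model used earlier in this section, whose unexplained variation is 14.4 with n = 5. This is a minimal sketch; the function name standard_error_of_estimate is only illustrative.

```python
import math

def standard_error_of_estimate(xs, ys, a, b):
    """s_est = sqrt( sum (y - y')^2 / (n - 2) ) for the line y' = a + b*x."""
    n = len(xs)
    unexplained = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(unexplained / (n - 2))

xs = [1, 2, 3, 4, 5]
ys = [10, 8, 12, 16, 20]
s_est = standard_error_of_estimate(xs, ys, 4.8, 2.8)
print(round(s_est, 2))  # 2.19
```

Here s_est is the square root of 14.4/3 = 4.8, so the observed y values lie, roughly speaking, about 2.19 units from the regression line.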

Objective 6

Compute the standard error of the estimate.

Example 10–12 Copy Machine Maintenance Costs

A researcher collects the following data and determines that there is a significant relationship between the age of a copy machine and its monthly maintenance cost. The regression equation is y′ = 55.57 + 8.13x. Find the standard error of the estimate.

Machine   Age x (years)   Monthly cost y
A         1               $62
B         2               78
C         3               70
D         4               90
E         4               93
F         6               103

Solution

Step 1 Make a table, as shown.

Page 35: Simple Linear Regression from Blumen in Chinese

相關與迴歸10

501

570 Chapter 10 Correlation and Regression

10–38

Coefficient of Nondetermination

1.00 � r 2

Standard Error of the EstimateWhen a value is predicted for a specific x value, the prediction is a point prediction.However, a prediction interval about the value can be constructed, just as a confidenceinterval was constructed for an estimate of the population mean. The prediction intervaluses a statistic called the standard error of the estimate.

The standard error of the estimate, denoted by sest, is the standard deviation of theobserved y values about the predicted values. The formula for the standard error ofthe estimate is

The standard error of the estimate is similar to the standard deviation, but the meanis not used. As can be seen from the formula, the standard error of the estimate is thesquare root of the unexplained variation—that is, the variation due to the difference ofthe observed values and the expected values—divided by n � 2. So the closer theobserved values are to the predicted values, the smaller the standard error of the estimatewill be.

Example 10–12 shows how to compute the standard error of the estimate.

sest � ���y � y��2

n � 2

y�

y�y�

Objective

Compute the standarderror of the estimate.

6

Example 10–12 Copy Machine Maintenance CostsA researcher collects the following data and determines that there is a significantrelationship between the age of a copy machine and its monthly maintenance cost.

The regression equation is � 55.57 � 8.13x. Find the standard error of the estimate.

Machine Age x (years) Monthly cost y

A 1 $ 62B 2 78C 3 70D 4 90E 4 93F 6 103

Solution

Step 1 Make a table, as shown.

x y y� y � y� (y � y�)2

1 622 783 704 904 936 103

y�

lu38582_ch10_533-590.qxd 9/13/10 2:18 PM Page 570

Step 2

Using the regression line equation y′ = 55.57 + 8.13x, compute the predicted value y′ for each x and place the results in the column labeled y′.

x = 1    y′ = 55.57 + (8.13)(1) = 63.70
x = 2    y′ = 55.57 + (8.13)(2) = 71.83
x = 3    y′ = 55.57 + (8.13)(3) = 79.96
x = 4    y′ = 55.57 + (8.13)(4) = 88.09
x = 6    y′ = 55.57 + (8.13)(6) = 104.35

Step 3

For each y, subtract y′ and place the answer in the column labeled y − y′.

62 − 63.70 = −1.70      90 − 88.09 = 1.91
78 − 71.83 = 6.17       93 − 88.09 = 4.91
70 − 79.96 = −9.96      103 − 104.35 = −1.35

Step 4

Square the numbers found in step 3 and place the squares in the column labeled (y − y′)².

Step 5

Find the sum of the numbers in the last column. The completed table is shown.

x    y      y′        y − y′    (y − y′)²
1    62     63.70     −1.70       2.89
2    78     71.83      6.17      38.0689
3    70     79.96     −9.96      99.2016
4    90     88.09      1.91       3.6481
4    93     88.09      4.91      24.1081
6    103    104.35    −1.35       1.8225

Σ(y − y′)² = 169.7392

Step 6

Substitute in the formula and find s_est.

$$s_{est} = \sqrt{\frac{\sum (y - y')^2}{n - 2}} = \sqrt{\frac{169.7392}{6 - 2}} = 6.51$$

In this case, the standard deviation of the observed values about the predicted values is 6.51.
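The six steps of Example 10-12 can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name `standard_error_of_estimate` and the variable names are made up for this sketch, and the coefficients are the rounded values quoted in the example.

```python
import math

# Data from Example 10-12: machine age (years) and monthly maintenance cost.
ages = [1, 2, 3, 4, 4, 6]
costs = [62, 78, 70, 90, 93, 103]

# Rounded regression coefficients quoted in the example: y' = a + b*x.
a, b = 55.57, 8.13


def standard_error_of_estimate(xs, ys, intercept, slope):
    """Return sqrt(sum((y - y')^2) / (n - 2)) for the line y' = intercept + slope*x."""
    # Steps 2 and 3: predicted values and the differences y - y'.
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    # Steps 4 and 5: square the differences and add them up.
    unexplained = sum(res * res for res in residuals)
    # Step 6: divide by n - 2 and take the square root.
    return math.sqrt(unexplained / (len(xs) - 2))


s_est = standard_error_of_estimate(ages, costs, a, b)
print(round(s_est, 2))  # 6.51
```

The division by n − 2 happens inside the square root, which is the point that is easiest to get wrong when doing the computation by hand.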

The standard error of the estimate can also be found by using the formula

$$s_{est} = \sqrt{\frac{\sum y^2 - a \sum y - b \sum xy}{n - 2}}$$

Example 10-13

Find the standard error of the estimate for the data for Example 10-12 by using the preceding formula. The equation of the regression line is y′ = 55.57 + 8.13x.

Solution

Step 1

Make a table.

Step 2

Find the product of the x and y values, and place the results in the third column.

Step 3

Square the y values, and place the results in the fourth column.

Step 4

Find the sums of the second, third, and fourth columns. The completed table is shown here.

x    y      xy      y²
1    62     62       3,844
2    78     156      6,084
3    70     210      4,900
4    90     360      8,100
4    93     372      8,649
6    103    618     10,609

Σy = 496    Σxy = 1778    Σy² = 42,186

Step 5

From the regression equation y′ = 55.57 + 8.13x, a = 55.57 and b = 8.13.

Step 6

Substitute in the formula and solve for s_est.

$$s_{est} = \sqrt{\frac{\sum y^2 - a \sum y - b \sum xy}{n - 2}} = \sqrt{\frac{42{,}186 - (55.57)(496) - (8.13)(1778)}{6 - 2}} = 6.48$$

This value is close to the value found in Example 10-12. The difference is due to rounding: a and b are used here rounded to two decimal places.
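To see where the small gap between 6.51 and 6.48 comes from, the shortcut formula can be evaluated twice: once with the rounded a and b from the example, and once with unrounded coefficients. The sketch below assumes the usual least-squares formulas for a and b (taken here to be the ones from Section 10-2, which is not part of this passage); the helper name `shortcut_s_est` is made up for illustration.

```python
import math

# Data from Example 10-12.
ages = [1, 2, 3, 4, 4, 6]
costs = [62, 78, 70, 90, 93, 103]
n = len(ages)

sum_x = sum(ages)
sum_y = sum(costs)                                 # 496
sum_xy = sum(x * y for x, y in zip(ages, costs))   # 1778
sum_x2 = sum(x * x for x in ages)
sum_y2 = sum(y * y for y in costs)                 # 42186


def shortcut_s_est(a, b):
    """Shortcut formula: sqrt((sum y^2 - a*sum y - b*sum xy) / (n - 2))."""
    return math.sqrt((sum_y2 - a * sum_y - b * sum_xy) / (n - 2))


# Rounded coefficients, as used in Example 10-13.
rounded = shortcut_s_est(55.57, 8.13)

# Unrounded least-squares coefficients (assumed standard formulas).
denom = n * sum_x2 - sum_x ** 2
b_full = (n * sum_xy - sum_x * sum_y) / denom
a_full = (sum_y * sum_x2 - sum_x * sum_xy) / denom
unrounded = shortcut_s_est(a_full, b_full)

print(round(rounded, 2), round(unrounded, 2))  # 6.48 6.51
```

The numerator of the shortcut formula is a small difference of large numbers, so it is more sensitive to rounding in a and b than the residual-based formula of Example 10-12.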

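Prediction Interval

The standard error of the estimate can be used for constructing a prediction interval (similar to a confidence interval) about a y′ value.

When a specific value x is substituted into the regression equation, the y′ that you get is a point prediction for y. For example, if the regression line equation for the age of a machine and the monthly maintenance cost is y′ = 55.57 + 8.13x (Example 10-12), then the predicted maintenance cost for a 3-year-old machine would be y′ = 55.57 + 8.13(3), or $79.96. Since this is a point prediction, you have no idea how accurate it is. But you can construct a prediction interval about it. By selecting an α value, you can obtain an interval that you are (1 − α) · 100% confident contains the actual value of the response variable y for the given value of x.

An interval is needed because there are several possible sources of prediction error in finding the regression line equation. One source is the estimation of the standard error of the estimate s_est. Two others are the errors made in estimating the slope and the y′ intercept, since the equation of the regression line will change somewhat if a different random sample is used to calculate it.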

Formula for the Prediction Interval about a Value y′

$$y' - t_{\alpha/2}\, s_{est} \sqrt{1 + \frac{1}{n} + \frac{n (x - \bar{X})^2}{n \sum x^2 - (\sum x)^2}} < y < y' + t_{\alpha/2}\, s_{est} \sqrt{1 + \frac{1}{n} + \frac{n (x - \bar{X})^2}{n \sum x^2 - (\sum x)^2}}$$

with d.f. = n − 2.
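As a side note on reading the formula (standard algebra, not taken from the text), the fraction under the square root can be rewritten so that it compares the squared distance of x from the mean with the total spread of the observed x values. The subscript i is introduced only for this sketch.

```latex
\sum_i (x_i - \bar{X})^2 = \sum x^2 - \frac{\left(\sum x\right)^2}{n}
\quad\Longrightarrow\quad
n \sum x^2 - \left(\sum x\right)^2 = n \sum_i (x_i - \bar{X})^2

\frac{n (x - \bar{X})^2}{n \sum x^2 - \left(\sum x\right)^2}
  = \frac{(x - \bar{X})^2}{\sum_i (x_i - \bar{X})^2}
```

Read this way, the interval is narrowest when x equals the mean of the observed x values and widens as x moves away from it.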

Objective 7: Find a prediction interval.

Example 10-14

For the data in Example 10-12, find the 95% prediction interval for the monthly maintenance cost of a machine that is 3 years old.

Solution

Step 1

Find Σx, Σx², and X̄.

Σx = 20    Σx² = 82    X̄ = 20/6 ≈ 3.3

Step 2

Find y′ for x = 3.

y′ = 55.57 + 8.13x = 55.57 + 8.13(3) = 79.96

Step 3

Find s_est.

s_est = 6.48

as shown in Example 10-13.

Step 4

Substitute in the formula and solve: t_{α/2} = 2.776 for 95% confidence with d.f. = 6 − 2 = 4.

$$79.96 - (2.776)(6.48)\sqrt{1 + \frac{1}{6} + \frac{6(3 - 3.3)^2}{6(82) - (20)^2}} < y < 79.96 + (2.776)(6.48)\sqrt{1 + \frac{1}{6} + \frac{6(3 - 3.3)^2}{6(82) - (20)^2}}$$

79.96 − (2.776)(6.48)(1.08) < y < 79.96 + (2.776)(6.48)(1.08)

79.96 − 19.43 < y < 79.96 + 19.43

60.53 < y < 99.39

Hence, you can be 95% confident that the interval 60.53 < y < 99.39 contains the actual value of y.
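The substitution in Example 10-14 can be sketched as a small Python function. The function name `prediction_interval` and its parameters are made up for this illustration, and the critical value 2.776 is passed in from a t table rather than computed. Because the sketch does not round X̄ or the square root along the way, its endpoints come out near 60.47 and 99.45 rather than the 60.53 and 99.39 obtained by hand.

```python
import math


def prediction_interval(x0, xs, y_pred, s_est, t_crit):
    """Return (low, high) for the prediction interval about y_pred at x = x0.

    xs      observed x values used to fit the regression line
    y_pred  point prediction y' at x0
    s_est   standard error of the estimate
    t_crit  t critical value for the chosen confidence level, d.f. = n - 2
    """
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    x_bar = sum_x / n
    root = math.sqrt(
        1 + 1 / n + n * (x0 - x_bar) ** 2 / (n * sum_x2 - sum_x ** 2)
    )
    margin = t_crit * s_est * root
    return y_pred - margin, y_pred + margin


ages = [1, 2, 3, 4, 4, 6]          # x values from Example 10-12
y_pred = 55.57 + 8.13 * 3          # 79.96, as in step 2
low, high = prediction_interval(3, ages, y_pred, s_est=6.48, t_crit=2.776)
print(round(low, 2), round(high, 2))  # about 60.47 and 99.45
```

Passing the critical value as an argument keeps the sketch free of third-party libraries; a statistics package could supply it instead.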

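Applying the Concepts 10-3

Interpreting Simple Linear Regression

Answer the questions about the following computer-generated information.

Linear correlation coefficient r = 0.794556
Coefficient of determination = 0.631319
Standard error of estimate = 12.9668
Explained variation = 5182.41
Unexplained variation = 3026.49
Total variation = 8208.90
Equation of regression line: y′ = 0.725983X + 16.5523
Level of significance = 0.1
Test statistic = 0.794556
Critical value = 0.378419

1. Are both variables moving in the same direction?

2. Which number measures the distances from the prediction line to the actual values?

3. Which number is the slope of the regression line?

4. Which number is the y intercept of the regression line?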


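5. Which number is the critical value that can be found in a table?

6. Which number is the allowable risk of making a type I error?

7. Which number measures the variation explained by the regression?

8. Which number measures the scatter of the data points about the regression line?

9. What is the null hypothesis?

10. Which number is compared with the critical value to decide whether the null hypothesis should be rejected?

11. Should the null hypothesis be rejected?

The answers are on page 509.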
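The figures in the report above are tied together by the definitions in this section, and a short consistency check makes those links concrete. The sketch below only copies numbers from the report; the variable names are assumptions chosen for readability, and the sample size is not stated in the report, so the degrees of freedom are inferred rather than given.

```python
# Values copied from the computer report.
r = 0.794556
r_squared_reported = 0.631319
s_est_reported = 12.9668
explained = 5182.41
unexplained = 3026.49
total = 8208.90

# Total variation is the sum of explained and unexplained variation.
total_check = explained + unexplained

# The coefficient of determination is explained / total variation,
# which should agree with the square of r.
r_squared_from_variation = explained / total
r_squared_from_r = r ** 2

# s_est = sqrt(unexplained / (n - 2)), so the report implies a value for n - 2.
implied_df = unexplained / s_est_reported ** 2

print(f"explained + unexplained = {total_check:.2f}")
print(f"explained / total       = {r_squared_from_variation:.6f}")
print(f"r squared               = {r_squared_from_r:.6f}")
print(f"implied n - 2           = {implied_df:.2f}")
```

Small differences in the last decimal places are expected because the report rounds each figure independently.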

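Exercises 10-3

1. What is meant by the explained variation? How is it computed?

2. What is meant by the unexplained variation? How is it computed?

3. What is meant by the total variation? How is it computed?

4. How is the coefficient of determination found?

5. How is the coefficient of nondetermination found?

For Exercises 6 through 8, find the coefficients of determination and nondetermination and explain the meaning of each.

6. r = 0.75

7. r = 0.42

8. r = 0.91

9. Compute the standard error of the estimate for Exercise 7 in Section 10-1. The regression line equation was found in Exercise 7 in Section 10-2.

10. Compute the standard error of the estimate for Exercise 8 in Section 10-1. The regression line equation was found in Exercise 8 in Section 10-2.

11. For the data in Exercise 7 in Sections 10-1 and 10-2 and in Exercise 9 in Section 10-3, find the 90% prediction interval when x = 200.

12. For the data in Exercise 8 in Sections 10-1 and 10-2 and in Exercise 10 in Section 10-3, find the 90% prediction interval when x = 4.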

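Summary

Many relationships exist among variables in the real world. One way to determine whether a linear relationship exists is to use the statistical techniques known as correlation and regression. The strength and direction of a linear relationship are usually measured by the correlation coefficient, which takes values from −1 to +1 inclusive. The closer the correlation coefficient is to −1 or +1, the stronger the linear relationship between the variables. A value of −1 or +1 indicates a perfect linear relationship. A positive relationship between two variables means that small values of the independent variable go with small values of the dependent variable, and large values of the independent variable go with large values of the dependent variable. A negative relationship means that small values of the independent variable go with large values of the dependent variable, and large values of the independent variable go with small values of the dependent variable. (10-1)

Remember that a significant relationship between two variables does not necessarily mean that one variable directly causes the other. In some cases it does, but other possibilities should also be considered: a complex relationship involving other (perhaps unknown) variables, a third variable that interacts with both variables, or a relationship due solely to chance. (10-1)

Relationships can be linear or curvilinear. To determine the shape, draw a scatter plot of the variables. If the relationship is linear, the data can be approximated by a straight line, called the regression line or the line of best fit. The closer the correlation coefficient r is to −1 or +1, the more closely the data points fit the regression line. (10-2)

A residual plot can be used to determine whether the regression line is an appropriate model for making predictions. (10-3)

The coefficient of determination is a better indicator of the strength of a linear relationship than the correlation coefficient, because it gives the percentage of the variation in the dependent variable that is accounted for by the variation in the independent variable. It is obtained by squaring the correlation coefficient and converting the result to a percent. (10-3)

Another statistic used in correlation and regression is the standard error of the estimate, which is an estimate of the standard deviation of the y values about the predicted y′ values. The standard error of the estimate can be used to construct a prediction interval for a specific value of x. (10-3)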
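The quantities recapped above (the regression line, the coefficient of determination, and the standard error of the estimate) can be computed together in a few lines. The following is a minimal sketch, assuming the twelve typing-speed and learning-time pairs listed among this chapter's review exercises; the function name `regression_summary` and the variable names are illustrative choices, not notation from the text.

```python
import math

# Twelve (typing speed in words per minute, learning time in hours) pairs
# taken from this chapter's review exercises.
speed = [48, 74, 52, 79, 83, 56, 85, 63, 88, 74, 90, 92]
hours = [7, 4, 8, 3.5, 2, 6, 2.3, 5, 2.1, 4.5, 1.9, 1.5]


def regression_summary(x, y):
    """Return intercept a, slope b, r squared, and the standard error of the estimate."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(v * v for v in x)
    sum_y2 = sum(v * v for v in y)
    sum_xy = sum(u * v for u, v in zip(x, y))

    denom_x = n * sum_x2 - sum_x ** 2
    denom_y = n * sum_y2 - sum_y ** 2

    a = (sum_y * sum_x2 - sum_x * sum_xy) / denom_x
    b = (n * sum_xy - sum_x * sum_y) / denom_x
    r = (n * sum_xy - sum_x * sum_y) / math.sqrt(denom_x * denom_y)

    # Standard error of the estimate, computed from the residuals y - y'.
    residual_ss = sum((v - (a + b * u)) ** 2 for u, v in zip(x, y))
    s_est = math.sqrt(residual_ss / (n - 2))
    return a, b, r ** 2, s_est


a, b, r_squared, s_est = regression_summary(speed, hours)
print(f"y' = {a:.3f} + ({b:.3f})x")
print(f"coefficient of determination = {r_squared:.1%}")
print(f"standard error of the estimate = {s_est:.3f}")
```

Computing the standard error directly from the residuals avoids the rounding that builds up when hand-rounded values of a and b are substituted into the shortcut formula.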

Important Terms

coefficient of determination 499
correlation 468
correlation coefficient 473
dependent variable 468
extrapolation 489
independent variable 468
influential point or observation 490
least-squares line 497
lurking variable 482
marginal change 489
multiple relationship 468
negative relationship 469
Pearson product moment correlation coefficient 473
population correlation coefficient 478
positive relationship 468
prediction interval 503
regression 468
regression line 485
residual 497
residual plot 497
scatter plot 470
simple relationship 468
standard error of the estimate 500

Important Formulas

Formula for the correlation coefficient:

$$
r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n(\sum x^2) - (\sum x)^2][n(\sum y^2) - (\sum y)^2]}}
$$

Formula for the t test for the correlation coefficient:

$$
t = r\sqrt{\frac{n - 2}{1 - r^2}} \qquad \text{d.f.} = n - 2
$$
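As a hedged illustration of the correlation coefficient and its t test, the sketch below applies both formulas to the seven touchdown and quarterback-rating pairs that appear among this chapter's review exercises. The function names `pearson_r` and `t_statistic` are assumptions made for this example; the critical value still has to be looked up in a table.

```python
import math

# Seven (touchdown passes, quarterback rating) pairs from the review exercises.
touchdowns = [34, 21, 15, 22, 34, 26, 23]
qb_rating = [106, 89, 82, 81, 96, 91, 86]


def pearson_r(x, y):
    """Correlation coefficient computed from the raw sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_x2 = sum(v * v for v in x)
    sum_y2 = sum(v * v for v in y)
    sum_xy = sum(u * v for u, v in zip(x, y))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


def t_statistic(r, n):
    """t test value for the correlation coefficient, with d.f. = n - 2."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


r = pearson_r(touchdowns, qb_rating)
t = t_statistic(r, len(touchdowns))
print(f"r = {r:.3f}, t = {t:.3f}, d.f. = {len(touchdowns) - 2}")
```

The t value is then compared with the tabled critical value for the chosen significance level and n − 2 degrees of freedom.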

針對練習題 1 到 4,透過執行以下每一步驟完成

迴歸分析。

a. 繪製一張散佈圖。

b. 計算相關係數。

c. 使用表 I 在 α = 0.01之下檢定相關係數的

顯著性。

d. 決定迴歸線方程式。

e. 在散佈圖上畫出迴歸線。

f. 對於特定值 x,預測 y′ 值。

1. 旅客數與機票價格 美國交通部的航空分析

局提供每週每班飛機的平均旅客數以及經濟

艙單程機票的平均價格。隨機挑選的班機和

它們的數據如下所示。有證據支持這兩個變

數之間有關係嗎?(10-1)(10-2)

資料來源:www.fedstats.gov

航班

平均旅客數

單程機票的

平均價格

Pittsburgh–Washington, DC 310 $236Chicago–Pittsburgh 1388 105Cincinnati–New York City 750 339Denver–Phoenix 3019 96Denver–Los Angeles 2151 176Houston–Philadelphia 1104 180

2. 達陣次數和四分衛排名 以下數據顯示一組

NFL 隨機樣本的四分衛排名和其在球季的達

陣次數。這兩個變數之間有顯著的關係嗎?

(10-1)(10-2)

資料來源:New York Times Almanac.

586 Chapter 10 Correlation and Regression

10–54

Avg. no. of Avg. one-Flight passengers x way fare y

Pittsburgh–Washington, DC 310 $236Chicago–Pittsburgh 1388 105Cincinnati–New York City 750 339Denver–Phoenix 3019 96Denver–Los Angeles 2151 176Houston–Philadelphia 1104 180

Source: www.fedstats.gov

2. Elementary and Secondary Schools Schooldistrict information was examined for a random

selection of states. The data below show the numberof elementary schools and the number of secondaryschools for each particular state. Is there a significantrelationship between the variables? Predict the numberof secondary schools when the number of elementaryschools is 300. (10–1)(10–2)

Elementary 201 766 148 218 519 396 274

Secondary 50 280 27 41 108 82 63

Source: World Almanac.

3. Touchdowns and QB Ratings Listed below are thenumber of touchdown passes thrown in the season andthe quarterback rating for a random sample of NFLquarterbacks. Is there a significant linear relationshipbetween the variables? (10–1)(10–2)

TDs 34 21 15 22 34 26 23

QB rating 106 89 82 81 96 91 86

Source: New York Times Almanac.

4. Driver’s Age and Accidents A study is conductedto determine the relationship between a driver’s age

and the number of accidents he or she has over a 1-yearperiod. The data are shown here. (This informationwill be used for Exercise 8.) If there is a significantrelationship, predict the number of accidents of a driverwho is 28. (10–1)(10–2)

Driver’s age x 16 24 18 17 23 27 32

No. of accidents y 3 2 5 2 0 1 1

5. Typing Speed and Word Processing A researcherdesires to know whether the typing speed of a

secretary (in words per minute) is related to thetime (in hours) that it takes the secretary to learn touse a new word processing program. The data areshown.

Speed x 48 74 52 79 83 56 85 63 88 74 90 92

Time y 7 4 8 3.5 2 6 2.3 5 2.1 4.5 1.9 1.5

If there is a significant relationship, predict the time itwill take the average secretary who has a typing speedof 72 words per minute to learn the word processingprogram. (This information will be used for Exercises 9and 11.) (10–1)(10–2)

6. Protein and Diastolic Blood Pressure A studywas conducted with vegetarians to see whether the

number of grams of protein each ate per day was relatedto diastolic blood pressure. The data are given here.(This information will be used for Exercises 10 and 12.)If there is a significant relationship, predict the diastolicpressure of a vegetarian who consumes 8 grams ofprotein per day. (10–1)(10–2)

Grams x 4 6.5 5 5.5 8 10 9 8.2 10.5

Pressure y 73 79 83 82 84 92 88 86 95

7. Medical Specialties and Gender Although moreand more women are becoming physicians each year, it

is well known that men outnumber women in manyspecialties. Randomly selected specialties are listedbelow with the numbers of male and female physiciansin each. Can it be concluded that there is a significantrelationship between the two variables? Predict thenumber of male specialists when there are 2000 femalespecialists. (10–1)(10–2)

Specialty Female x Male y

Dermatology 3,482 6,506Emergency medicine 5,098 20,429Neurology 2,895 10,088Pediatric cardiology 459 1,241Radiology 1,218 7,574Forensic pathology 181 399Radiation oncology 968 3,215

Source: World Almanac.

8. For Exercise 4, find the standard error of the estimate.(10–3) 1.417* For calculation purposes only. No regressionshould be done.

9. For Exercise 5, find the standard error of the estimate. (10–3) 0.468* (TI value 0.513)

10. For Exercise 6, find the standard error of the estimate. (10–3) 2.89 (TI value 2.845)

11. For Exercise 5, find the 90% prediction interval fortime when the speed is 72 words per minute. (10–3)3.34 � y � 5.10*

12. For Exercise 6, find the 95% prediction intervalfor pressure when the number of grams is 8. (10–3)79 � y � 93

13. (Opt.) A study found a significant relationship among aperson’s years of experience on a particular job x1, thenumber of workdays missed per month x2, and theperson’s age y. The regression equation is y� � 12.8 �2.09x1 � 0.423x2. Predict a person’s age if he or she hasbeen employed for 4 years and has missed 2 workdays amonth. (10–4) 22.01*

14. (Opt.) Find R when ryx1� 0.681 and ryx2

� 0.872 andrx1x2

� 0.746. (10–4) R � 0.873

15. (Opt.) Find R2adj when R � 0.873, n � 10, and k � 3.

(10–4) R2adj � 0.643*

*Answers may vary due to rounding.

lu38582_ch10_533-590.qxd 9/13/10 2:18 PM Page 586

586 Chapter 10 Correlation and Regression

10–54

Avg. no. of Avg. one-Flight passengers x way fare y

Pittsburgh–Washington, DC 310 $236Chicago–Pittsburgh 1388 105Cincinnati–New York City 750 339Denver–Phoenix 3019 96Denver–Los Angeles 2151 176Houston–Philadelphia 1104 180

Source: www.fedstats.gov

2. Elementary and Secondary Schools Schooldistrict information was examined for a random

selection of states. The data below show the numberof elementary schools and the number of secondaryschools for each particular state. Is there a significantrelationship between the variables? Predict the numberof secondary schools when the number of elementaryschools is 300. (10–1)(10–2)

Elementary 201 766 148 218 519 396 274

Secondary 50 280 27 41 108 82 63

Source: World Almanac.

3. Touchdowns and QB Ratings Listed below are thenumber of touchdown passes thrown in the season andthe quarterback rating for a random sample of NFLquarterbacks. Is there a significant linear relationshipbetween the variables? (10–1)(10–2)

TDs 34 21 15 22 34 26 23

QB rating 106 89 82 81 96 91 86

Source: New York Times Almanac.

4. Driver’s Age and Accidents A study is conductedto determine the relationship between a driver’s age

and the number of accidents he or she has over a 1-yearperiod. The data are shown here. (This informationwill be used for Exercise 8.) If there is a significantrelationship, predict the number of accidents of a driverwho is 28. (10–1)(10–2)

Driver’s age x 16 24 18 17 23 27 32

No. of accidents y 3 2 5 2 0 1 1

5. Typing Speed and Word Processing A researcherdesires to know whether the typing speed of a

secretary (in words per minute) is related to thetime (in hours) that it takes the secretary to learn touse a new word processing program. The data areshown.

Speed x 48 74 52 79 83 56 85 63 88 74 90 92

Time y 7 4 8 3.5 2 6 2.3 5 2.1 4.5 1.9 1.5

If there is a significant relationship, predict the time itwill take the average secretary who has a typing speedof 72 words per minute to learn the word processingprogram. (This information will be used for Exercises 9and 11.) (10–1)(10–2)

6. Protein and Diastolic Blood Pressure A studywas conducted with vegetarians to see whether the

number of grams of protein each ate per day was relatedto diastolic blood pressure. The data are given here.(This information will be used for Exercises 10 and 12.)If there is a significant relationship, predict the diastolicpressure of a vegetarian who consumes 8 grams ofprotein per day. (10–1)(10–2)

Grams x 4 6.5 5 5.5 8 10 9 8.2 10.5

Pressure y 73 79 83 82 84 92 88 86 95

7. Medical Specialties and Gender Although moreand more women are becoming physicians each year, it

is well known that men outnumber women in manyspecialties. Randomly selected specialties are listedbelow with the numbers of male and female physiciansin each. Can it be concluded that there is a significantrelationship between the two variables? Predict thenumber of male specialists when there are 2000 femalespecialists. (10–1)(10–2)

Specialty Female x Male y

Dermatology 3,482 6,506Emergency medicine 5,098 20,429Neurology 2,895 10,088Pediatric cardiology 459 1,241Radiology 1,218 7,574Forensic pathology 181 399Radiation oncology 968 3,215

Source: World Almanac.

8. For Exercise 4, find the standard error of the estimate.(10–3) 1.417* For calculation purposes only. No regressionshould be done.

9. For Exercise 5, find the standard error of the estimate. (10–3) 0.468* (TI value 0.513)

10. For Exercise 6, find the standard error of the estimate. (10–3) 2.89 (TI value 2.845)

11. For Exercise 5, find the 90% prediction interval fortime when the speed is 72 words per minute. (10–3)3.34 � y � 5.10*

12. For Exercise 6, find the 95% prediction intervalfor pressure when the number of grams is 8. (10–3)79 � y � 93

13. (Opt.) A study found a significant relationship among aperson’s years of experience on a particular job x1, thenumber of workdays missed per month x2, and theperson’s age y. The regression equation is y� � 12.8 �2.09x1 � 0.423x2. Predict a person’s age if he or she hasbeen employed for 4 years and has missed 2 workdays amonth. (10–4) 22.01*

14. (Opt.) Find R when ryx1� 0.681 and ryx2

� 0.872 andrx1x2

� 0.746. (10–4) R � 0.873

15. (Opt.) Find R2adj when R � 0.873, n � 10, and k � 3.

(10–4) R2adj � 0.643*

*Answers may vary due to rounding.

lu38582_ch10_533-590.qxd 9/13/10 2:18 PM Page 586

達陣次數 (TD)

四分衛排名

3. 打字速度和文件處理 有一位研究員很想知

道祕書的打字速度(每分鐘幾個字)和學習

新文件處理程式的時間(以小時計)是不是

有關係?數據如下所示。

586 Chapter 10 Correlation and Regression

10–54

Avg. no. of Avg. one-Flight passengers x way fare y

Pittsburgh–Washington, DC 310 $236Chicago–Pittsburgh 1388 105Cincinnati–New York City 750 339Denver–Phoenix 3019 96Denver–Los Angeles 2151 176Houston–Philadelphia 1104 180

Source: www.fedstats.gov

2. Elementary and Secondary Schools Schooldistrict information was examined for a random

selection of states. The data below show the numberof elementary schools and the number of secondaryschools for each particular state. Is there a significantrelationship between the variables? Predict the numberof secondary schools when the number of elementaryschools is 300. (10–1)(10–2)

Elementary 201 766 148 218 519 396 274

Secondary 50 280 27 41 108 82 63

Source: World Almanac.

3. Touchdowns and QB Ratings Listed below are thenumber of touchdown passes thrown in the season andthe quarterback rating for a random sample of NFLquarterbacks. Is there a significant linear relationshipbetween the variables? (10–1)(10–2)

TDs 34 21 15 22 34 26 23

QB rating 106 89 82 81 96 91 86

Source: New York Times Almanac.

4. Driver’s Age and Accidents A study is conductedto determine the relationship between a driver’s age

and the number of accidents he or she has over a 1-yearperiod. The data are shown here. (This informationwill be used for Exercise 8.) If there is a significantrelationship, predict the number of accidents of a driverwho is 28. (10–1)(10–2)

Driver’s age x 16 24 18 17 23 27 32

No. of accidents y 3 2 5 2 0 1 1

5. Typing Speed and Word Processing A researcherdesires to know whether the typing speed of a

secretary (in words per minute) is related to thetime (in hours) that it takes the secretary to learn touse a new word processing program. The data areshown.

Speed x 48 74 52 79 83 56 85 63 88 74 90 92

Time y 7 4 8 3.5 2 6 2.3 5 2.1 4.5 1.9 1.5

If there is a significant relationship, predict the time itwill take the average secretary who has a typing speedof 72 words per minute to learn the word processingprogram. (This information will be used for Exercises 9and 11.) (10–1)(10–2)

6. Protein and Diastolic Blood Pressure A studywas conducted with vegetarians to see whether the

number of grams of protein each ate per day was relatedto diastolic blood pressure. The data are given here.(This information will be used for Exercises 10 and 12.)If there is a significant relationship, predict the diastolicpressure of a vegetarian who consumes 8 grams ofprotein per day. (10–1)(10–2)

Grams x 4 6.5 5 5.5 8 10 9 8.2 10.5

Pressure y 73 79 83 82 84 92 88 86 95

7. Medical Specialties and Gender Although moreand more women are becoming physicians each year, it

is well known that men outnumber women in manyspecialties. Randomly selected specialties are listedbelow with the numbers of male and female physiciansin each. Can it be concluded that there is a significantrelationship between the two variables? Predict thenumber of male specialists when there are 2000 femalespecialists. (10–1)(10–2)

Specialty Female x Male y

Dermatology 3,482 6,506Emergency medicine 5,098 20,429Neurology 2,895 10,088Pediatric cardiology 459 1,241Radiology 1,218 7,574Forensic pathology 181 399Radiation oncology 968 3,215

Source: World Almanac.

8. For Exercise 4, find the standard error of the estimate.(10–3) 1.417* For calculation purposes only. No regressionshould be done.

9. For Exercise 5, find the standard error of the estimate. (10–3) 0.468* (TI value 0.513)

10. For Exercise 6, find the standard error of the estimate. (10–3) 2.89 (TI value 2.845)

11. For Exercise 5, find the 90% prediction interval fortime when the speed is 72 words per minute. (10–3)3.34 � y � 5.10*

12. For Exercise 6, find the 95% prediction intervalfor pressure when the number of grams is 8. (10–3)79 � y � 93

13. (Opt.) A study found a significant relationship among aperson’s years of experience on a particular job x1, thenumber of workdays missed per month x2, and theperson’s age y. The regression equation is y� � 12.8 �2.09x1 � 0.423x2. Predict a person’s age if he or she hasbeen employed for 4 years and has missed 2 workdays amonth. (10–4) 22.01*

14. (Opt.) Find R when ryx1� 0.681 and ryx2

� 0.872 andrx1x2

� 0.746. (10–4) R � 0.873

15. (Opt.) Find R2adj when R � 0.873, n � 10, and k � 3.

(10–4) R2adj � 0.643*

*Answers may vary due to rounding.

lu38582_ch10_533-590.qxd 9/13/10 2:18 PM Page 586

速度 x

時間 y

如果有顯著的關係,請預測打字速度每分鐘

72 個字的祕書需要多少時間學習新的文件處

理程式?(10-1)(10-2)

4. 醫藥專家與性別 雖然每一年的女醫師愈來

愈多,但是在許多科別中,男醫師的數目還

是高出很多。以下顯示隨機挑選的科別以及

從業的男醫師和女醫師人數。可以認為這兩

個數字之間有某種顯著的關係嗎?當女醫師

複 習 題

迴歸線方程式:

Review Exercises 585

10–53

adjusted R2 579

coefficient of determination 569

correlation 534

correlation coefficient 539

dependent variable 535

extrapolation 556

independent variable 535

influential point orobservation 557

least-squares line 567

lurking variable 547

marginal change 555

multiple correlationcoefficient 578

multiple regression 575

multiple relationship 535

negative relationship 535

Pearson product momentcorrelation coefficient 539

population correlationcoefficient 543

positive relationship 535

prediction interval 572

regression 534

regression line 551

residual 567

residual plot 568

scatter plot 536

simple relationship 535

standard error of theestimate 570

Formula for the correlation coefficient:

Formula for the t test for the correlation coefficient:

The regression line equation:

y� � a � bx

where

Formula for the standard error of the estimate:

or

sest � ��y2 � a �y � b �xyn � 2

sest � ��(y � y�)2

n � 2

b �n(�xy) � (�x)(�y)n(�x2) � (�x)2

a �(�y)(�x2� � (�x)(�xy)

n(�x2) � (�x)2

t � r� n � 21 � r 2 d.f. � n � 2

r �n(�xy) � (�x)(�y)

2[n(�x2) � (�x)2][n(�y2) � (�y)2]

Formula for the prediction interval for a value y�:

d.f. � n � 2

Formula for the multiple correlation coefficient:

Formula for the F test for the multiple correlationcoefficient:

with d.f.N � n � k and d.f.D � n � k � 1.

Formula for the adjusted R2:

R2adj � 1 �

(1 � R2)(n � 1)n � k � 1

F �R2/k

(1 � R2)/(n � k � 1)

R � �r 2yx1

� r 2yx2

� 2ryx1� ryx2

� rx1x2

1 � r 2x1x2

� y� � tA/2sest �1 �1n

�n(x �X)2

n �x2 � (�x)2

y� � tA /2sest�1 �1n

�n(x �X)2

n �x2 � (�x)2 � y

Important Terms

Important Formulas

For Exercises 1 through 7, do a complete regressionanalysis by performing the following steps.

a. Draw the scatter plot.b. Compute the value of the correlation coefficient.c. Test the significance of the correlation coefficient ata � 0.01, using Table I.

d. Determine the regression line equation.e. Plot the regression line on the scatter plot.f. Predict y� for a specific value of x.

1. Passengers and Airline Fares The U.S. Departmentof Transportation Office of Aviation Analysisprovides the weekly average number of passengersper flight and the average one-way fare in dollarsfor common commercial routes. Randomly selectedflights are listed below with the reported data. Isthere evidence of a relationship between these twovariables? (10–1)(10–2)

Review Exercises

lu38582_ch10_533-590.qxd 9/13/10 2:18 PM Page 585

其中

Review Exercises 585

10–53

adjusted R2 579

coefficient of determination 569

correlation 534

correlation coefficient 539

dependent variable 535

extrapolation 556

independent variable 535

influential point orobservation 557

least-squares line 567

lurking variable 547

marginal change 555

multiple correlationcoefficient 578

multiple regression 575

multiple relationship 535

negative relationship 535

Pearson product momentcorrelation coefficient 539

population correlationcoefficient 543

positive relationship 535

prediction interval 572

regression 534

regression line 551

residual 567

residual plot 568

scatter plot 536

simple relationship 535

standard error of theestimate 570

Formula for the correlation coefficient:

Formula for the t test for the correlation coefficient:

The regression line equation:

y� � a � bx

where

Formula for the standard error of the estimate:

or

sest � ��y2 � a �y � b �xyn � 2

sest � ��(y � y�)2

n � 2

b �n(�xy) � (�x)(�y)n(�x2) � (�x)2

a �(�y)(�x2� � (�x)(�xy)

n(�x2) � (�x)2

t � r� n � 21 � r 2 d.f. � n � 2

r �n(�xy) � (�x)(�y)

2[n(�x2) � (�x)2][n(�y2) � (�y)2]

Formula for the prediction interval for a value y�:

d.f. � n � 2

Formula for the multiple correlation coefficient:

Formula for the F test for the multiple correlationcoefficient:

with d.f.N � n � k and d.f.D � n � k � 1.

Formula for the adjusted R2:

R2adj � 1 �

(1 � R2)(n � 1)n � k � 1

F �R2/k

(1 � R2)/(n � k � 1)

R � �r 2yx1

� r 2yx2

� 2ryx1� ryx2

� rx1x2

1 � r 2x1x2

� y� � tA/2sest �1 �1n

�n(x �X)2

n �x2 � (�x)2

y� � tA /2sest�1 �1n

�n(x �X)2

n �x2 � (�x)2 � y

Important Terms

Important Formulas

For Exercises 1 through 7, do a complete regressionanalysis by performing the following steps.

a. Draw the scatter plot.b. Compute the value of the correlation coefficient.c. Test the significance of the correlation coefficient ata � 0.01, using Table I.

d. Determine the regression line equation.e. Plot the regression line on the scatter plot.f. Predict y� for a specific value of x.

1. Passengers and Airline Fares The U.S. Departmentof Transportation Office of Aviation Analysisprovides the weekly average number of passengersper flight and the average one-way fare in dollarsfor common commercial routes. Randomly selectedflights are listed below with the reported data. Isthere evidence of a relationship between these twovariables? (10–1)(10–2)

Review Exercises

lu38582_ch10_533-590.qxd 9/13/10 2:18 PM Page 585

估計的標準誤公式:

Review Exercises 585

10–53

adjusted R2 579

coefficient of determination 569

correlation 534

correlation coefficient 539

dependent variable 535

extrapolation 556

independent variable 535

influential point orobservation 557

least-squares line 567

lurking variable 547

marginal change 555

multiple correlationcoefficient 578

multiple regression 575

multiple relationship 535

negative relationship 535

Pearson product momentcorrelation coefficient 539

population correlationcoefficient 543

positive relationship 535

prediction interval 572

regression 534

regression line 551

residual 567

residual plot 568

scatter plot 536

simple relationship 535

standard error of theestimate 570

Formula for the correlation coefficient:

Formula for the t test for the correlation coefficient:

The regression line equation:

y� � a � bx

where

Formula for the standard error of the estimate:

or

sest � ��y2 � a �y � b �xyn � 2

sest � ��(y � y�)2

n � 2

b �n(�xy) � (�x)(�y)n(�x2) � (�x)2

a �(�y)(�x2� � (�x)(�xy)

n(�x2) � (�x)2

t � r� n � 21 � r 2 d.f. � n � 2

r �n(�xy) � (�x)(�y)

2[n(�x2) � (�x)2][n(�y2) � (�y)2]

Formula for the prediction interval for a value y�:

d.f. � n � 2

Formula for the multiple correlation coefficient:

Formula for the F test for the multiple correlationcoefficient:

with d.f.N � n � k and d.f.D � n � k � 1.

Formula for the adjusted R2:

R2adj � 1 �

(1 � R2)(n � 1)n � k � 1

F �R2/k

(1 � R2)/(n � k � 1)

R � �r 2yx1

� r 2yx2

� 2ryx1� ryx2

� rx1x2

1 � r 2x1x2

� y� � tA/2sest �1 �1n

�n(x �X)2

n �x2 � (�x)2

y� � tA /2sest�1 �1n

�n(x �X)2

n �x2 � (�x)2 � y

Important Terms

Important Formulas

For Exercises 1 through 7, do a complete regressionanalysis by performing the following steps.

a. Draw the scatter plot.b. Compute the value of the correlation coefficient.c. Test the significance of the correlation coefficient ata � 0.01, using Table I.

d. Determine the regression line equation.e. Plot the regression line on the scatter plot.f. Predict y� for a specific value of x.

1. Passengers and Airline Fares The U.S. Departmentof Transportation Office of Aviation Analysisprovides the weekly average number of passengersper flight and the average one-way fare in dollarsfor common commercial routes. Randomly selectedflights are listed below with the reported data. Isthere evidence of a relationship between these twovariables? (10–1)(10–2)

Review Exercises

lu38582_ch10_533-590.qxd 9/13/10 2:18 PM Page 585

Review Exercises 585

10–53

adjusted R2 579

coefficient of determination 569

correlation 534

correlation coefficient 539

dependent variable 535

extrapolation 556

independent variable 535

influential point orobservation 557

least-squares line 567

lurking variable 547

marginal change 555

multiple correlationcoefficient 578

multiple regression 575

multiple relationship 535

negative relationship 535

Pearson product momentcorrelation coefficient 539

population correlationcoefficient 543

positive relationship 535

prediction interval 572

regression 534

regression line 551

residual 567

residual plot 568

scatter plot 536

simple relationship 535

standard error of theestimate 570

Formula for the correlation coefficient:

Formula for the t test for the correlation coefficient:

The regression line equation:

y� � a � bx

where

Formula for the standard error of the estimate:

or

sest � ��y2 � a �y � b �xyn � 2

sest � ��(y � y�)2

n � 2

b �n(�xy) � (�x)(�y)n(�x2) � (�x)2

a �(�y)(�x2� � (�x)(�xy)

n(�x2) � (�x)2

t � r� n � 21 � r 2 d.f. � n � 2

r �n(�xy) � (�x)(�y)

2[n(�x2) � (�x)2][n(�y2) � (�y)2]

Formula for the prediction interval for a value y�:

d.f. � n � 2

Formula for the multiple correlation coefficient:

Formula for the F test for the multiple correlationcoefficient:

with d.f.N � n � k and d.f.D � n � k � 1.

Formula for the adjusted R2:

R2adj � 1 �

(1 � R2)(n � 1)n � k � 1

F �R2/k

(1 � R2)/(n � k � 1)

R � �r 2yx1

� r 2yx2

� 2ryx1� ryx2

� rx1x2

1 � r 2x1x2

� y� � tA/2sest �1 �1n

�n(x �X)2

n �x2 � (�x)2

y� � tA /2sest�1 �1n

�n(x �X)2

n �x2 � (�x)2 � y

Important Terms

Important Formulas

For Exercises 1 through 7, do a complete regressionanalysis by performing the following steps.

a. Draw the scatter plot.b. Compute the value of the correlation coefficient.c. Test the significance of the correlation coefficient ata � 0.01, using Table I.

d. Determine the regression line equation.e. Plot the regression line on the scatter plot.f. Predict y� for a specific value of x.

1. Passengers and Airline Fares The U.S. Departmentof Transportation Office of Aviation Analysisprovides the weekly average number of passengersper flight and the average one-way fare in dollarsfor common commercial routes. Randomly selectedflights are listed below with the reported data. Isthere evidence of a relationship between these twovariables? (10–1)(10–2)

Review Exercises

lu38582_ch10_533-590.qxd 9/13/10 2:18 PM Page 585

預測區間的公式:

Review Exercises 585

10–53

Important Terms

adjusted R²
coefficient of determination
correlation
correlation coefficient
dependent variable
extrapolation
independent variable
influential point or observation
least-squares line
lurking variable
marginal change
multiple correlation coefficient
multiple regression
multiple relationship
negative relationship
Pearson product moment correlation coefficient
population correlation coefficient
positive relationship
prediction interval
regression
regression line
residual
residual plot
scatter plot
simple relationship
standard error of the estimate

Important Formulas

Formula for the correlation coefficient:

$$r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n(\sum x^2) - (\sum x)^2][n(\sum y^2) - (\sum y)^2]}}$$

Formula for the t test for the correlation coefficient:

$$t = r\sqrt{\frac{n - 2}{1 - r^2}} \qquad \text{d.f.} = n - 2$$

The regression line equation:

$$y' = a + bx$$

where

$$a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2} \qquad b = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}$$

Formula for the standard error of the estimate:

$$s_{est} = \sqrt{\frac{\sum (y - y')^2}{n - 2}}$$

or

$$s_{est} = \sqrt{\frac{\sum y^2 - a\sum y - b\sum xy}{n - 2}}$$

Formula for the prediction interval for a value y′:

$$y' - t_{\alpha/2}\, s_{est}\sqrt{1 + \frac{1}{n} + \frac{n(x - \bar{X})^2}{n\sum x^2 - (\sum x)^2}} < y < y' + t_{\alpha/2}\, s_{est}\sqrt{1 + \frac{1}{n} + \frac{n(x - \bar{X})^2}{n\sum x^2 - (\sum x)^2}}$$

$$\text{d.f.} = n - 2$$

Formula for the multiple correlation coefficient:

$$R = \sqrt{\frac{r_{yx_1}^2 + r_{yx_2}^2 - 2\, r_{yx_1} r_{yx_2} r_{x_1 x_2}}{1 - r_{x_1 x_2}^2}}$$

Formula for the F test for the multiple correlation coefficient:

$$F = \frac{R^2 / k}{(1 - R^2)/(n - k - 1)}$$

with d.f.N = n − k and d.f.D = n − k − 1.

Formula for the adjusted R²:

$$R^2_{adj} = 1 - \frac{(1 - R^2)(n - 1)}{n - k - 1}$$
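The summation formulas for r, t, a, b, the standard error of the estimate, and the prediction interval translate almost line for line into code. The following is a minimal sketch, not a reference implementation: the function names are illustrative, the critical t value is passed in by hand (2.132 is the table value for a 90% interval with 4 degrees of freedom), and the data are the Age and Cavities values from the chapter quiz.

```python
import math

def simple_regression(x, y):
    """Return (r, a, b, s_est) for paired data using the summation formulas."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)

    spread_x = n * sum_x2 - sum_x ** 2   # n(sum x^2) - (sum x)^2
    spread_y = n * sum_y2 - sum_y ** 2   # n(sum y^2) - (sum y)^2

    r = (n * sum_xy - sum_x * sum_y) / math.sqrt(spread_x * spread_y)
    a = (sum_y * sum_x2 - sum_x * sum_xy) / spread_x   # y intercept
    b = (n * sum_xy - sum_x * sum_y) / spread_x        # slope
    s_est = math.sqrt((sum_y2 - a * sum_y - b * sum_xy) / (n - 2))
    return r, a, b, s_est

def t_statistic(r, n):
    """Test value for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

def prediction_interval(x, y, x0, t_crit):
    """Prediction interval for y at x0; t_crit is looked up for d.f. = n - 2."""
    n = len(x)
    _, a, b, s_est = simple_regression(x, y)
    x_bar = sum(x) / n
    spread_x = n * sum(xi * xi for xi in x) - sum(x) ** 2
    margin = t_crit * s_est * math.sqrt(1 + 1 / n + n * (x0 - x_bar) ** 2 / spread_x)
    y_hat = a + b * x0
    return y_hat - margin, y_hat + margin

# Age and Cavities data from the chapter quiz.
ages = [6, 8, 9, 10, 12, 14]
cavities = [2, 1, 3, 4, 6, 5]

r, a, b, s_est = simple_regression(ages, cavities)
t_value = t_statistic(r, len(ages))
low, high = prediction_interval(ages, cavities, x0=7, t_crit=2.132)

print(f"r = {r:.3f}, t = {t_value:.3f}")
print(f"y' = {a:.3f} + {b:.3f}x, s_est = {s_est:.3f}")
print(f"90% prediction interval at x = 7: {low:.2f} < y < {high:.2f}")
```

Passing the critical value as an argument keeps the sketch free of third-party dependencies; a statistics library could supply it instead.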

Review Exercises

For Exercises 1 through 7, do a complete regression analysis by performing the following steps.

a. Draw the scatter plot.
b. Compute the value of the correlation coefficient.
c. Test the significance of the correlation coefficient at α = 0.01, using Table I.
d. Determine the regression line equation.
e. Plot the regression line on the scatter plot.
f. Predict y′ for a specific value of x.

1. Passengers and Airline Fares The U.S. Department of Transportation Office of Aviation Analysis provides the weekly average number of passengers per flight and the average one-way fare in dollars for common commercial routes. Randomly selected flights are listed below with the reported data. Is there evidence of a relationship between these two variables? (10–1)(10–2)


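Chapter Quiz

Determine whether each statement is true or false. If the statement is false, explain why.

1. A negative relationship between two variables means that, for the most part, as the x variable increases, the y variable increases.
2. Even if the correlation coefficient is high (near +1) or low (near −1), it may not be significant.
3. A significant correlation coefficient cannot be due purely to chance.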

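Select the best answer.

4. Which number indicates the strength of the linear relationship between two quantitative variables?
a. r
b. a
c. x
d. s_est

5. The degrees of freedom for the significance test of the correlation coefficient r are
a. 1
b. n
c. n − 1
d. n − 2

6. The symbol for the coefficient of determination is
a. r
b. r²
c. a
d. b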

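Complete the following statements with the best answer.

7. The x variable is called the ____ variable.
8. The sign of the correlation coefficient r is always the same as the sign of ____.
9. If all the data points fall on a straight line, the correlation coefficient is either ____ or ____.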

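For Exercises 10 and 11, do a complete regression analysis by performing each of the following steps.

a. Draw a scatter plot.
b. Compute the correlation coefficient.
c. Test the significance of the correlation coefficient at α = 0.05, using Table I.
d. Determine the regression line equation.
e. Plot the regression line on the scatter plot.
f. Predict y′ for a specific value of x.

10. Age and Driving Accidents A study is conducted to determine the relationship between a driver's age and the number of accidents in the past year. The data are shown here. If there is a significant relationship, predict the number of accidents of a driver who is 64.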


15. The sign of r and ____ will always be the same. Answer: b (slope)

16. The regression line is called the ____. Answer: Line of best fit

17. If all the points fall on a straight line, the value of r will be ____ or ____. Answer: +1, −1

For Exercises 18 through 21, do a complete regression analysis.

a. Draw the scatter plot.
b. Compute the value of the correlation coefficient.
c. Test the significance of the correlation coefficient at α = 0.05.
d. Determine the regression line equation.
e. Plot the regression line on the scatter plot.
f. Predict y′ for a specific value of x.

18. Prescription Drug Prices A medical researcher wants to determine the relationship between the price per dose of prescription drugs in the United States and the price of the same dose in Australia. The data are shown. Describe the relationship.

U.S. price x         3.31  3.16  2.27  3.13  2.54  1.98  2.22
Australian price y   1.29  1.75  0.82  0.83  1.32  0.84  0.82

19. Age and Driving Accidents A study is conducted to determine the relationship between a driver’s age and the number of accidents he or she has over a 1-year period. The data are shown here. If there is a significant relationship, predict the number of accidents of a driver who is 64.

Driver’s age x       63  65  60  62  66  67  59
No. of accidents y    2   3   1   0   3   1   4

20. Age and Cavities A researcher desires to know if the age of a child is related to the number of cavities he or she has. The data are shown here. If there is a significant relationship, predict the number of cavities for a child of 11.

Age of child x       6  8  9  10  12  14
No. of cavities y    2  1  3   4   6   5

21. Fat and Cholesterol A study is conducted with a group of dieters to see if the number of grams of fat each consumes per day is related to cholesterol level. The data are shown here. If there is a significant relationship, predict the cholesterol level of a dieter who consumes 8.5 grams of fat per day.

Fat grams x           6.8  5.5  8.2  10   8.6  9.1  8.6  10.4
Cholesterol level y   183  201  193  283  222  250  190  218

22. For Exercise 20, find the standard error of the estimate. Answer: 1.129*

23. For Exercise 21, find the standard error of the estimate. Answer: 29.5* (For calculation purposes only. No regression should be done.)

24. For Exercise 20, find the 90% prediction interval of the number of cavities for a 7-year-old. Answer: 0 < y < 5*

25. For Exercise 21, find the 95% prediction interval of the cholesterol level of a person who consumes 10 grams of fat. Answer: 217.5 (average of y values is used since there is no significant relationship)

26. (Opt.) A study was conducted, and a significant relationship was found among the number of hours a teenager watches television per day x1, the number of hours the teenager talks on the telephone per day x2, and the teenager’s weight y. The regression equation is y′ = 98.7 + 3.82x1 + 6.51x2. Predict a teenager’s weight if she averages 3 hours of TV and 1.5 hours on the phone per day. Answer: 119.9*

27. (Opt.) Find R when $r_{yx_1} = 0.561$ and $r_{yx_2} = 0.714$ and $r_{x_1 x_2} = 0.625$. Answer: R = 0.729*

28. (Opt.) Find $R^2_{adj}$ when R = 0.774, n = 8, and k = 2. Answer: $R^2_{adj} = 0.439$*

*These answers may vary due to the method of calculation or rounding.
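The optional multiple regression exercises (26 through 28) reduce to direct substitution into the multiple correlation and adjusted R² formulas. The sketch below checks that arithmetic; the function names are illustrative and not taken from the text.

```python
import math

def multiple_r(r_yx1, r_yx2, r_x1x2):
    """Multiple correlation coefficient R for two independent variables."""
    numerator = r_yx1 ** 2 + r_yx2 ** 2 - 2 * r_yx1 * r_yx2 * r_x1x2
    return math.sqrt(numerator / (1 - r_x1x2 ** 2))

def adjusted_r_squared(R, n, k):
    """Adjusted R^2 for n data groups and k independent variables."""
    return 1 - (1 - R ** 2) * (n - 1) / (n - k - 1)

def f_test_value(R, n, k):
    """F test value for the significance of R."""
    return (R ** 2 / k) / ((1 - R ** 2) / (n - k - 1))

# Exercise 26: y' = 98.7 + 3.82*x1 + 6.51*x2 with 3 hours of TV and 1.5 hours on the phone.
predicted_weight = 98.7 + 3.82 * 3 + 6.51 * 1.5

# Exercise 27: R from the three pairwise correlation coefficients.
R = multiple_r(0.561, 0.714, 0.625)

# Exercise 28: adjusted R^2 from R = 0.774, n = 8, k = 2.
R2_adj = adjusted_r_squared(0.774, n=8, k=2)

print(f"Predicted weight: {predicted_weight:.1f}")
print(f"R = {R:.3f}")
print(f"Adjusted R^2 = {R2_adj:.3f}")
```

The F test function is included for completeness; none of the three exercises asks for it.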

Critical Thinking Challenges

Product Sales When the points in a scatter plot show a curvilinear trend rather than a linear trend, statisticians have methods of fitting curves rather than straight lines to the data, thus obtaining a better fit and a better prediction model. One type of curve that can be used is the logarithmic regression curve. The data shown are the number of items of a new product sold over a period of 15 months at a certain store. Notice that sales rise during the beginning months and then level off later on.

Month x               1   3   6   8   10  12  15
No. of items sold y   10  12  15  19  20  21  21

1. Draw the scatter plot for the data.
2. Find the equation of the regression line.
3. Describe how the line fits the data.
4. Using the log key on your calculator, transform the x values into log x values.
5. Using the log x values instead of the x values, find the values of a and b for the regression line.
6. Next, plot the curve y = a + b log x on the graph.
7. Compare the line y = a + bx with the curve y = a + b log x and decide which one fits the data better.
8. Compute r, using the x and y values; then compute r, using the log x and y values. Which is higher?
9. In your opinion, which (the line or the logarithmic curve) would be a better predictor for the data? Why?
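Steps 2, 4, 5, and 8 of the Product Sales challenge can be carried out with the same least-squares formulas, applied once to x and once to log x. The following is a sketch under two assumptions: the calculator's log key is taken to be the base-10 logarithm, and the helper names are illustrative.

```python
import math

def fit_line(x, y):
    """Least-squares intercept a and slope b for y' = a + b*x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    spread_x = n * sum_x2 - sum_x ** 2
    a = (sum_y * sum_x2 - sum_x * sum_xy) / spread_x
    b = (n * sum_xy - sum_x * sum_y) / spread_x
    return a, b

def pearson_r(x, y):
    """Pearson product moment correlation coefficient."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    spread_x = n * sum(xi * xi for xi in x) - sum_x ** 2
    spread_y = n * sum(yi * yi for yi in y) - sum_y ** 2
    return (n * sum_xy - sum_x * sum_y) / math.sqrt(spread_x * spread_y)

months = [1, 3, 6, 8, 10, 12, 15]
items_sold = [10, 12, 15, 19, 20, 21, 21]

# Step 4: transform x into log x (base 10 assumed).
log_months = [math.log10(m) for m in months]

# Steps 2 and 5: fit y = a + bx and y = a + b log x.
a_lin, b_lin = fit_line(months, items_sold)
a_log, b_log = fit_line(log_months, items_sold)

# Step 8: correlation for each choice of predictor.
r_lin = pearson_r(months, items_sold)
r_log = pearson_r(log_months, items_sold)

print(f"Line:  y' = {a_lin:.3f} + {b_lin:.3f} x      (r = {r_lin:.3f})")
print(f"Curve: y' = {a_log:.3f} + {b_log:.3f} log x  (r = {r_log:.3f})")
```

Because the logarithmic model is still linear in log x, no new fitting method is needed; only the predictor column changes.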

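Driver's age x       63  65  60  62  66  67  59
No. of accidents y    2   3   1   0   3   1   4

11. Fat and Cholesterol A study is conducted with a group of dieters to see whether the number of grams of fat each consumes per day is related to cholesterol level.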

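...is 2,000, predict the number of male physicians in practice. (10-1)(10-2)

Source: World Almanac.

Specialty              Female physicians x   Male physicians y
Dermatology            3,482                 6,506
Emergency medicine     5,098                 20,429
Neurology              2,895                 10,088
Pediatric cardiology   459                   1,241
Radiology              1,218                 7,574
Pathology              181                   399
Radiation oncology     968                   3,215

5. For Exercise 3, find the standard error of the estimate.

6. For Exercise 3, find the 90% prediction interval of the time required when the typing speed is 72 words per minute.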


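Answers to Applying the Concepts

Applying the Concepts 10-1: Braking Distance

1. The independent variable is miles per hour (mph).
2. The dependent variable is braking distance (feet).
3. The independent variable, miles per hour, is a continuous quantitative variable.
4. The dependent variable, braking distance, is a continuous quantitative variable.
5. The scatter plot is shown below.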

[Figure: Scatter plot of braking distance (feet) versus speed (mph)]

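6. There might be a linear relationship between the two variables, but the data appear to follow a curvilinear relationship.
7. Changing the distances between the values of the independent variable changes the appearance of the relationship.
8. The relationship between the two variables is positive: the higher the speed, the longer the braking distance.
9. The strong relationship between the two variables suggests that braking distance can be accurately predicted from mph. We should still be concerned about the curvilinear relationship the data display.
10. Answers will vary. Variables that might affect braking distance include road conditions, driver reaction time, and the condition of the brakes.
11. The correlation coefficient is r = 0.966.
12. The correlation coefficient r = 0.966 is significant at α = 0.05. This is consistent with the strong positive relationship between the two variables.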

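Applying the Concepts 10-2: Braking Distance Revisited

1. The regression line equation is y′ = −151.90 + 6.4514x.
2. The slope of the regression line says that for each additional 1 mph, braking distance increases by 6.45 feet on average. The y intercept is the braking distance when the speed is 0 mph. This has no meaning in context, but the y intercept is still an important part of the model.
3. y′ = −151.90 + 6.4514(45) = 138.4. When the speed is 45 mph, the braking distance is about 138 feet.
4. y′ = −151.90 + 6.4514(100) = 493.2. When the speed is 100 mph, the braking distance is about 493 feet.
5. It is not appropriate to predict braking distance outside the range of the mph data (for example, above 100 mph), because we do not know the relationship between the two variables outside that range.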
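The two predictions and the extrapolation warning can be sketched in a few lines. The regression coefficients come from answer 1; the speed range passed to the range check is a hypothetical placeholder, since the actual minimum and maximum of the mph data are not given here.

```python
def predict_braking_distance(mph):
    """Predicted braking distance in feet from y' = -151.90 + 6.4514x."""
    return -151.90 + 6.4514 * mph

def is_extrapolation(mph, observed_range):
    """True when mph lies outside the (min, max) range of the observed speeds."""
    low, high = observed_range
    return mph < low or mph > high

# Hypothetical bounds for illustration only; substitute the real data range.
assumed_range = (20, 100)

for speed in (45, 100, 120):
    distance = predict_braking_distance(speed)
    note = " (extrapolation, not reliable)" if is_extrapolation(speed, assumed_range) else ""
    print(f"{speed} mph -> about {distance:.1f} feet{note}")
```

Keeping the range check separate from the prediction makes the point of answer 5 explicit: the equation will return a number for any speed, but only speeds inside the observed range are supported by the data.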

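Applying the Concepts 10-3: Interpreting Simple Linear Regression

1. Yes. The two variables change in the same direction. In other words, they are positively associated.
2. The unexplained variation of 3026.49 measures the distances between the prediction line and the actual values.
3. The slope of the regression line is 0.725983.
4. The y intercept of the regression line is 16.5523.
5. The critical value 0.378419 can be found in a table.
6. The allowable risk of making a type I error is 0.10, the level of significance.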

11. (continued) If there is a significant relationship, predict the cholesterol level of a person who eats 8.5 grams of fat per day.


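Fat grams x           6.8  5.5  8.2  10   8.6  9.1  8.6  10.4
Cholesterol level y   183  201  193  283  222  250  190  218

12. For Exercise 11, find the standard error of the estimate.

13. For Exercise 11, find the 95% prediction interval of the cholesterol level of a person who consumes 10 grams of fat.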


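7. The variation explained by the regression is 0.631319, or about 63.1%.
8. The scatter of the data points about the regression line is 12.9668, the standard error of the estimate.
9. The null hypothesis is that there is no correlation, H0: ρ = 0.
10. We compare the test value 0.794556 with the critical value to decide whether to reject the null hypothesis.
11. Since 0.794556 > 0.378419, we reject the null hypothesis and find that there is enough evidence to support the claim that the correlation coefficient is not equal to 0.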
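Answers 7, 10, and 11 are tied together by two small computations: squaring r gives the explained variation, and comparing |r| with the critical value gives the decision. A minimal sketch using only the numbers quoted in the answers (the variable names are illustrative):

```python
r = 0.794556               # test value (sample correlation coefficient)
critical_value = 0.378419  # table critical value at alpha = 0.10

# Coefficient of determination: proportion of variation explained by the regression.
r_squared = r ** 2

# Decision rule: reject H0 (rho = 0) when |r| exceeds the critical value.
reject_null = abs(r) > critical_value

print(f"r^2 = {r_squared:.6f} (about {r_squared:.1%} of the variation explained)")
print("Reject H0: rho = 0" if reject_null else "Do not reject H0: rho = 0")
```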