Chapter 6 Introduction to Inferential Statistics Sampling and Sampling Designs

Chapter 6 Introduction to Inferential Statistics

Sampling and Sampling Designs

What are samples?

σ2

Population

母體Sample

樣本

Ѕ2

x

Parameter

參數Statistic

統計量

Sampling

抽樣

Generalization

推論

誤差 Differences between parameters and

statistics=error• sampling error 抽樣誤差• non-sampling error 非抽樣誤差 (also called

measurement error)

Sampling error the degree to which a given sample differs

from the population sampling error tends to be high with small

sample sizes and will decrease as sample size increases

Target Population

group to which you wish to generalize the results of the study

should be defined as specifically as possible

populationsamplingframe

sample

Sampling Techniques

Nonprobability Sampling (nonrandom sampling) 非隨機抽樣

Probability Sampling (random sampling) 隨機抽樣

Nonprobability sampling

Convenience sampling 方便抽樣• getting people who are most conveniently

available• fast & low cost

Volunteers 自願樣本• units are self-selected

Characteristics of nonprobability samples members of the population DO NOT have

an equal chance of being selected

results cannot be generalized beyond the group being tested

Probability Sampling

sample should represent the population

using random selection methods

Types of Probability Sampling

Simple random sampling 簡單隨機抽樣

Systematic sampling 系統式抽樣

Stratified sampling 分層隨機抽樣

Cluster sampling 部落抽樣

Simple Random Sampling

every unit in the population has an equal and known probability of being selected as part of the sample ( 抽籤 )

e.g. in obtaining a sample of 10 subjects from a population of 1,000 people, everyone in the population would have a 1/100 chance of being selected (or p of .01)

亂數表1 2 3 4 5 6 7 8 9 10

1 49486 93775 88744 80091 92732 38532 41506 54131 44804 436372 94860 36746 04571 13150 65383 44616 97170 25057 02212 419303 10169 95685 47585 53247 60900 20097 97962 04267 29283 075504 12018 45351 15671 23026 55344 54654 73717 97666 00730 890835 45611 71585 61487 87434 07498 60596 36255 82880 84381 304336 89137 30984 18842 69619 53872 95200 76474 67528 14870 596287 94541 12057 30771 19598 96069 10399 50649 41909 09994 753228 89920 28843 87599 30181 26839 02162 56676 39342 95045 601469 32472 32796 15255 39636 90819 54150 24064 50514 15194 4145010 63958 47944 82888 66709 66525 67616 75709 56879 29649 07325

Characteristics of simple random sampling Unbiased: 母體內每一個體被抽到的機會

均等

Independence : 母體內某一個個體被抽到不會影響其他個體被抽到的機會

Limitations of simple random samples not practical for large populations

Simple random sampling becomes difficult when we dont have a list of the population

Systematic Sampling 系統性抽樣 a type of probability sampling in which

every kth member of the population is selected

k=N/n

N = size of the population

n = sample size

For example:

You want to obtain a sample of 200 from apopulation of 10,000. You would select every50th (or kth) person from the list.

k = 10000/200=50

Advantages/disadvantages of systematic sampling Assuming availability of a list of population

members

Randomness of the sample depends on randomness of the list • periodicity bias: 當母體個體排序出現某一週

期性或規則時 , systematic sampling 會有週期性誤差 (periodicity bias)

Stratified Random Sample 分層隨機抽樣 Prior to random sampling, the population is

divided into subgroups, called strata, e.g., gender, ethnic groups, professions, etc. 依母體特性將個體分層 (Strata) & 每一個體只屬一層

Subjects are then randomly selected from each strata 再從每一層中隨機抽取樣本(using simple random sampling)

第一層

第二層

第三層.....

第 K 層

Sample

Should select variables that are related to the dependent variable

Homogeneity is very high within the strata.

Heterogeneity is very high between the stratas

Why use stratified samples?

permits examination of subgroups by ensuring sufficient numbers of subjects within subgroups 確保樣本包含母體中各種不同特性的個體，增加樣本的代表性

generally more convenient than a simple random sample

Potential disadvantages

Sometimes the exact composition of the population is often unknown

with multiple stratifying variables, sampling designs can become quite complex

Types of Stratified Sampling

Proportionate Stratified Random Sampling 比例分層隨機抽樣

Disproportionate Stratified Random Sampling 非比例分層隨機抽樣

Proportionate Sampling

strata sample sizes are proportional to population subgroup sizes 按母體比例抽取樣本

• e.g., if a group represents 25% of the population, the stratum representing that group will comprise 25% of the sample

Disproportionate Sampling

strata sample sizes are not proportional to population subgroup sizes 每層抽出之樣本數不能與母體之特徵比例相呼應

may be used to achieve equal sample sizes across strata

For example:

Suppose a researcher plans to conduct a surveyregarding various attitudes of Agricultural College Students at Tunghai U. He wishes to compare perceptionsacross 4 major groups but finds some of the groups are quite small relative to the overall student population. As a result, he decides to over-sample minority students.For example, although Hospitality students only represent 10% of the Agricultural student population, he uses a disproportional stratified sample so that Hospitality students will comprise 25% of his sample.

Cluster Sampling 部落抽樣 used when subjects are randomly sampled

from within a "cluster" or unit (e.g., classroom, school, country, etc)

將母體分為若干部落 (cluster) ，在自所有部落中隨機抽取若干部落樣本並對這些抽取的部落作抽查

Cluster 1

Cluster 4

Cluster k

Cluster 3

Cluster 2

Cluster 5

Cluster 1

Cluster 3

Population Sample

Example

台中市民眾對薛凱莉事件看法將台中市依“里”為部落分成許多里隨機抽取 3 個里然後對此 3 個里的居民

作全面性的訪問 Compare using cluster sampling technique

and simple sampling technique

Why use cluster samples?

They're easier to obtain than a simple random or systematic sample of the same size

Disadvantages of Cluster Sampling Less accurate than other sampling

techniques (selection stages, accuracy)

Generally leads to violation of an assumption that subjects are independent

Sampling Distribution

抽樣分配

For the most part in social science, we want to know about the population. In reality, the parameters are often unknown.

The best thing we can do is to “guess” what our population should be like based on the info we get from a sample

results of a sample=the results of a population???

Sampling Distributions 抽樣分配 The “bridge” b/w information from the

sample to the population

a theoretical, probabilistic distribution of all possible samples of a given size,

在母體中重複抽取固定大小的隨機樣本，所有隨機樣本的統計值的機率分配稱為抽樣分配

Population

Sampling distribution

Sample

The relationship b/w population, sampling distribution, and sample.

= 100

etc. forall possiblesamplesof a givenN from thepopulation

98X

108X

92X

90X

102X

Sampling Distribution 定理當母體為 normal distribution, 我們重複抽

取固定大小的隨機樣本時 , 則此一抽樣分配會趨近 normal distribution 並且有一平均值及標準差

以五名學生的考試成績 (91, 92, 93, 94,95)為母體 , 母體的 mean 為 93 。試比較從5 名學生 ( 母體 ) 中隨機抽取 2 位學生作為樣本 (n=2) 和隨機抽取 3 位學生作為樣本之抽樣分配

When n=2

sample Sample mean sample Sample mean

91,92 91.5 92,94 93

91,93 92 92,95 93.5

91,94 92.5 93,94 93.5

91,95 93 93,95 94

92,93 92.5 94,95 94.5

When n=3

sample Sample mean sample Sample mean

91,92,93 92 91,94,95 93.33

91,92,94 92.33 92,93,94 93

91,92,95 92.67 92,93,95 93.33

91,93,94 92.67 92,94,95 93.67

91,93,95 93 93,94,95 94

Sampling distribution of sample mean• Mean of the sampling distribution = • St.D. of the sampling distribution (Standard

Error ) = σ2/N • Standard error ( 樣本平均數的標準誤 ) 告訴我們樣本平

均數對母體平均數的估計有多準確 N, Standard Error

Central Limit Theorem 中央極限定理

無論母體分配是否為 normal distribution, 當我們重複抽取固定大小的隨機樣本時 ,只要樣本的 N 夠大 (N100) ，則此一抽樣分配也會趨近 normal distribution

If n is sufficiently largeX ~N(, 2/n)

Summary of Sampling Distribution

若母體的分配式常態分配，則樣本平均的抽樣分配亦為常態分配

若母體的分配不是常態，則樣本平均的抽樣分配再樣本夠大時會近似常態分配

樣本平均值的平均會等於母體平均值樣本標準差的平均會比母體標準差小

Exercise

假設王品牛排每位顧客等待主菜的時間呈常態分配，平均等待時間為 10 分鐘，標準差為 2 分鐘。某餐旅研究生作服務品質調查，隨機抽選 16 名顧各瞭解其等待時間，試問該 16 名顧客平均等待時間超過11 分鐘的機率為何 ?

Sampling distribution of sample proportion( )• Mean of the sampling distribution of

= P

• Standard error of the sampling distribution of

=

p̂

p̂

p̂

p̂

p̂ n

pp )1(

Documents

Chapter 6 Introduction to Inferential Statistics Sampling and Sampling Designs