
Chapter 9: Deciding on the Sampling Strategy


Page 1: Chapter 9: Deciding on the Sampling Strategy

[Diagram: Sampling (introduction, concepts, types, confidence/precision, sample size) shown alongside the other planning elements: intervention or policy, evaluation questions, design, methods, data collection]

Page 2: Introduction

• Introduction to Sampling
• Types of Samples: Random and Non-Random
• How Confident and Precise Do You Need to Be?
• How Large a Sample Do You Need?
• Where to Find a Sampling Statistician?

Page 3: Sampling

• Is it possible to collect data from the entire population? (census)
– If so, we can talk about what is true for the entire population
– Often we cannot (time/cost)
– If not, we can use a smaller subset: a SAMPLE

Page 4: Concepts

• Population
– the total set of units
• Sample
– a subset of the population
• Sampling Frame
– the list from which you select your sample

Page 5: More Sampling Concepts

• Sample Design
– the method of sampling (probability or non-probability)
• Parameter
– a characteristic of the population
• Statistic
– a characteristic of a sample

Page 6: Random Sample

• A random sample allows us to make estimates about the larger population based on what we learn from the subset
• Works like a lottery: every unit has an equal chance of being selected
• Advantages:
– eliminates selection bias
– able to generalize to the population
– cost-effective

Page 7: Types of Random Samples

• Simple random sample
• Random interval sample
• Stratified random sample
• Random cluster sample
• Multi-stage random sample
• Combination random sample

Page 8: Simple Random Sample

• The simplest type
• Establish a sample size, then randomly select units until the sample size is reached
• Uses a random number table to select units
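
The selection step can be sketched in Python, with a pseudo-random number generator standing in for the random number table. This is only an illustration: the function name, the `household-…` frame, and the seed are hypothetical and not part of the slides.

```python
import random

def simple_random_sample(frame, sample_size, seed=None):
    """Draw sample_size distinct units from the frame, each equally likely."""
    rng = random.Random(seed)  # seeded generator replaces the random number table
    return rng.sample(list(frame), sample_size)

# Hypothetical sampling frame of 200 numbered households.
frame = [f"household-{i:03d}" for i in range(1, 201)]
sample = simple_random_sample(frame, 20, seed=42)
print(len(sample), "units selected")
```

Fixing the seed makes the draw repeatable, which helps when the selection has to be documented.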

Page 9: Random Interval Sample

• Used when there is a sequential population that is not already enumerated and would be difficult or time consuming to enumerate
• Uses a random number table to select intervals
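
One way to read "select intervals" is to draw a new random gap after each selected unit while walking through the sequence. The sketch below follows that reading. The `max_interval` parameter and the visitor stream are assumptions made for illustration, not details given in the slides.

```python
import random

def random_interval_sample(units, max_interval, seed=None):
    """Walk through units in order, passing over a random number of them between picks."""
    rng = random.Random(seed)
    selected = []
    countdown = rng.randint(1, max_interval)  # position of the first pick
    for unit in units:
        countdown -= 1
        if countdown == 0:
            selected.append(unit)
            countdown = rng.randint(1, max_interval)  # draw the next interval
    return selected

# Hypothetical stream of 100 visitors that is never listed in advance.
visitors = (f"visitor-{i}" for i in range(1, 101))
picked = random_interval_sample(visitors, max_interval=10, seed=1)
print(len(picked), "visitors selected")
```

Because the function only iterates, it never needs a complete numbered list of the population.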

Page 10: Stratified Random Sample

• Use when specific groups must be included that might otherwise be missed by a simple random sample
– these groups usually make up a small proportion of the population

Page 11: Stratified Random Sample

[Diagram: the total population is divided into three sub-populations, and a simple random sample is drawn from each sub-population]
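
The idea can be sketched as: group the units by sub-population, then draw a simple random sample inside each group. Taking the same number of units from every stratum is an assumption made here to guarantee that small groups appear; proportional allocation is an equally valid choice. All names and data are hypothetical.

```python
import random
from collections import defaultdict

def stratified_random_sample(units, stratum_of, per_stratum, seed=None):
    """Group units into strata, then draw a simple random sample within each one."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    return {
        name: rng.sample(members, min(per_stratum, len(members)))
        for name, members in strata.items()
    }

# Hypothetical population in which rural units are only 5% of the total.
population = [("urban", i) for i in range(950)] + [("rural", i) for i in range(50)]
sample = stratified_random_sample(population, lambda unit: unit[0], per_stratum=30, seed=7)
print({name: len(members) for name, members in sample.items()})
```

A simple random sample of 60 from this population would be expected to contain only about 3 rural units, which is why the strata are sampled separately.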

Page 12: Random Cluster Sample

• Another form of random sampling
• A cluster is any naturally occurring aggregate of the units to be sampled. Cluster sampling is used when:
– you do not have a complete list of everyone in the population of interest but have a list of the clusters in which they occur, or
– you have a complete list of everyone, but they are so widely distributed that it would be too time consuming and expensive to send data collectors out to a simple random sample
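
A minimal sketch of the first case: only the list of clusters is sampled, and every unit found in a chosen cluster is included. The village data and function name are hypothetical.

```python
import random

def random_cluster_sample(clusters, n_clusters, seed=None):
    """Select whole clusters at random and keep every unit inside each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # sample from the list of clusters
    return {name: clusters[name] for name in chosen}

# Hypothetical frame: 40 villages with 25 households each.
villages = {
    f"village-{v}": [f"v{v}-household-{h}" for h in range(1, 26)]
    for v in range(1, 41)
}
chosen = random_cluster_sample(villages, 5, seed=3)
print(sorted(chosen))
```

Data collectors then travel to 5 villages instead of to households scattered across all 40.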

Page 13: Multi-stage Random Sample

• Combines two or more forms of random sampling
• Most commonly, it begins with random cluster sampling and then applies simple random sampling or stratified random sampling
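
The most common two-stage pattern described above might look like the following sketch: clusters are drawn first, then a simple random sample is taken inside each selected cluster. The school data, sizes, and function name are hypothetical.

```python
import random

def multi_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: random clusters. Stage 2: simple random sample within each cluster."""
    rng = random.Random(seed)
    stage_one = rng.sample(sorted(clusters), n_clusters)
    return {
        name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
        for name in stage_one
    }

# Hypothetical frame: 20 schools with 300 students each.
schools = {
    f"school-{s}": [f"s{s}-student-{i}" for i in range(1, 301)]
    for s in range(1, 21)
}
sample = multi_stage_sample(schools, n_clusters=4, n_per_cluster=25, seed=11)
print({name: len(students) for name, students in sample.items()})
```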

Page 14: Combination Random Samples

• More than one random sampling technique is used

Page 15: Drawback of Random Cluster and Multi-stage Random Sampling

• May not yield an accurate representation of the population

Page 16: Summary of Random Sampling Process

Step  Process
1     Obtain a complete listing of the entire population
2     Assign each case a number
3     Randomly select the sample using a random numbers table
4     When no numbered listing exists or is not practical to create:
      • take a random start
      • select every nth case
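
Step 4 can be sketched as follows, assuming the cases can at least be visited in a fixed order. The case file names and the choice of n = 20 are hypothetical.

```python
import random

def systematic_sample(cases, n, seed=None):
    """Take a random start within the first n cases, then select every nth case."""
    rng = random.Random(seed)
    start = rng.randrange(n)  # random start: an index from 0 to n - 1
    return cases[start::n]

# Hypothetical stack of 1,000 case files kept in filing order.
files = [f"case-{i:04d}" for i in range(1, 1001)]
sample = systematic_sample(files, n=20, seed=5)
print(len(sample), "cases selected")
```

If the filing order follows a repeating pattern whose period matches n, the selection can be biased, so the ordering is worth checking first.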

Page 17: Non-Random Samples

• Can be more focused
• Can make sure a small sample is representative
• Cannot make inferences to a larger population

Page 18: Types of Non-random Samples

• Convenience: whoever is easiest to contact or whatever is easiest to observe
• Purposeful (judgment): set criteria to achieve a specific mix of participants
• Snowball: ask people who else you should interview

Page 19: Forms of Purposeful Samples

• Typical cases (median)
• Maximum variation (heterogeneity)
• Quota
• Extreme case
• Confirming and disconfirming cases

Page 20: Bias and Non-random Sampling

• Were people selected in a biased way?
• Are they substantially different from the rest of the population?
• Collect some data to show that the people selected are fairly similar to the larger population (e.g. demographics)

Page 21: Combinations: Random and Non-Random

• Example:
– Non-randomly select two schools from the poorest communities and two from the wealthiest communities
– Select a random sample of students from these four schools

Page 22: Possibility of Error

• Is the sample different from the population?
• Statistics: data derived from random samples

Page 23: How Confident Do You Wish to Be?

– Confidence level
• E.g., 90% (you are 90% certain that your sample results are an accurate estimate of the population as a whole)
– The higher the confidence level, the larger the sample needed

Page 24: Confidence Standard

• Standard is 95%
– 19 of 20 samples drawn the same way would give similar results
– we are 95% certain that the population parameter is somewhere between the lower and upper limits of the confidence interval calculated from the sample

Page 25: Confidence Interval

• The range around the sample result; its half-width is sometimes called sampling error, margin of error, or precision
• Example:
– in polls: 48% for, 52% against, with a margin of +/- 3%
– actually means 45% to 51% for and 49% to 55% against
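
The arithmetic behind the poll example can be sketched with the usual normal-approximation formula for a proportion, margin = z * sqrt(p(1 - p) / n). The slides do not state this formula or the poll size, so the formula choice, z = 1.96 (95% confidence), and n = 1,067 respondents are all assumptions.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Normal-approximation margin of error for a sample proportion p from n respondents."""
    return z * math.sqrt(p * (1 - p) / n)

def confidence_interval(p, margin):
    """Lower and upper limits around the sample proportion."""
    return (p - margin, p + margin)

# Hypothetical poll: 48% in favour among 1,067 respondents.
moe = margin_of_error(p=0.48, n=1067)
low, high = confidence_interval(0.48, 0.03)
print(f"margin of error: {moe:.3f}")
print(f"share in favour: {low:.0%} to {high:.0%}")
```

Since the range for "for" (45% to 51%) overlaps the range for "against" (49% to 55%), a poll like this cannot tell which side is ahead.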

Page 26: Sample Size

• By increasing sample size, you increase accuracy and decrease margin of error
• The larger the margin of error, the less precise your results will be
• The smaller the population, the smaller the needed sample size for a given confidence level and margin of error, but the larger the needed ratio of the sample size to the population size
• The standard to aim for is a 95% confidence level and a margin of error of +/- 5%
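
The third point can be illustrated with the standard finite population correction, n = n0 / (1 + (n0 - 1) / N), where n0 is the sample size needed for a very large population. The slides do not give this formula, so treat it as an assumed textbook approach. The values z = 1.96, p = 0.5, the choice to round up, and the population sizes are also assumptions.

```python
import math

def sample_size(population, margin=0.05, z=1.96, p=0.5):
    """Sample size for a finite population at a given margin of error and z value."""
    n0 = z**2 * p * (1 - p) / margin**2   # size needed for a very large population
    n = n0 / (1 + (n0 - 1) / population)  # finite population correction
    return math.ceil(n)                   # round up so the target margin is still met

# Hypothetical population sizes.
for size in (100, 500, 5000, 100000):
    n = sample_size(size)
    print(f"population {size}: sample {n} ({n / size:.0%} of the population)")
```

The required sample grows much more slowly than the population, so the sampled share shrinks as the population gets larger.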

Page 27: Sample Sizes for Large Populations

                 Confidence Level
Precision      90%      95%      99%
+/- 5%         271      384      666
+/- 3%         752    1,067    1,848
+/- 2%       1,691    2,401    4,144
+/- 1%       6,765    9,604   16,576
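
The 90% and 95% figures match the usual large-population formula n = z^2 * p(1 - p) / e^2 with p = 0.5, rounded to the nearest whole number. The slides do not say how the table was built, so the formula and the z values below are assumptions. The 99% figures are left out because they depend on how the z value is rounded.

```python
# Conventional rounded z values (assumed, not taken from the slides).
Z_SCORES = {0.90: 1.645, 0.95: 1.96}

def large_population_sample_size(confidence, margin, p=0.5):
    """Sample size for a very large population, rounded to the nearest whole unit."""
    z = Z_SCORES[confidence]
    return round(z**2 * p * (1 - p) / margin**2)

for margin in (0.05, 0.03, 0.02, 0.01):
    sizes = [large_population_sample_size(level, margin) for level in Z_SCORES]
    print(f"+/- {margin:.0%}: {sizes}")
```

Halving the margin of error roughly quadruples the required sample, because the margin enters the formula squared.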

Page 28: Summary of Sampling Size

• Accuracy and precision can be improved by increasing the sample size
• The standard to aim for is a 95% confidence level and a margin of error of +/- 5%
• The larger the margin of error, the less precise the results will be
• The smaller the population, the larger the needed ratio of the sample size to the population size

Page 29: Where to Find a Sampling Statistician

• American Statistical Association (ASA) directory of statistical consultants
– http://www.amstat.org/consultantdirectory/index.cfm
• Alliance of Statistics Consultants
– http://www.statisticstutors.com/#statistical-analysis
• HyperStat Online
– http://davidmlane.com/hyperstat/consultants.html