Statistical Inference


Statistics Support Team, Samsung Biomedical Research Institute

김선우


Statistical Inference

• Inferring characteristics of a population from data (a sample) using statistical methods

• Estimation
• Testing


Population vs Sample

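• Population
- The entire set of subjects about which we want information
- Because the population is defined by the information of interest, it is important to state clearly what we want to know
- Specify the time period, region, etc.

• Sample
- A subset of the population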


Target (Study) population

• The group that we wish to study
• The sample is selected from the study population


Parameter vs Statistic

• Parameter
- A quantitative characteristic of the population
- Population mean, population standard deviation, population proportion, …

• Statistic
- A quantitative characteristic computed from the sample in order to estimate a parameter
- Sample mean, sample standard deviation, sample proportion, …


• Example 6.8
- We wish to characterize the distribution of birthweights of all live-born infants born in the US in 1988.
- Parameters of interest: mean and SD of birthweight
• Sample
- Statistics: mean and SD of birthweight in the sample

Random sample

• A selection of some members of the population such that each member is independently chosen and has a known non-zero probability of being selected.

• A simple random sample
- A random sample in which each member of the group has the same probability of being selected.
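As a minimal sketch of drawing a simple random sample, Python's standard-library `random.sample` selects k distinct members uniformly at random without replacement, so each member has the same chance of being chosen. The sampling frame of 1,000 patient IDs below is hypothetical.

```python
import random

# Hypothetical sampling frame: 1,000 patient IDs
population = [f"P{i:04d}" for i in range(1, 1001)]

rng = random.Random(2024)               # fixed seed so the draw is reproducible
sample = rng.sample(population, k=50)   # simple random sample, without replacement

print(len(sample), sample[:5])
```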


Randomized Clinical Trial (RCT)

• A type of research design for comparing different treatments, in which treatments are assigned to patients by some random mechanism (randomization).

• Randomization
- The treatment assigned to one individual is independent of the treatments assigned to other individuals.

- The types of patients assigned to different treatment modalities will be similar if the sample sizes are large. If sample sizes are small, then patient characteristics of treatment groups may not be comparable. Thus, it is customary to present a table of characteristics of different treatment groups in RCTs, to check that the randomization process is working well.


Design features of randomized clinical trials

• Block randomization
- For comparing two treatments (A, B), a block size of 2n is determined in advance: for every 2n patients entering the study, n patients are randomly assigned to treatment A and the remaining n patients to treatment B. A similar approach can be used in clinical trials with more than two treatment groups.
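A permuted-block schedule along these lines can be sketched as below, assuming two arms A and B and n = 2 (block size 2n = 4). The function `block_randomize` and its parameters are hypothetical illustrations, not a standard API.

```python
import random

def block_randomize(n_patients, n_per_arm=2, arms=("A", "B"), seed=None):
    """Permuted-block randomization: every block of size len(arms) * n_per_arm
    contains exactly n_per_arm assignments to each arm, in random order."""
    rng = random.Random(seed)
    schedule = []
    while len(schedule) < n_patients:
        block = [arm for arm in arms for _ in range(n_per_arm)]
        rng.shuffle(block)              # random order within the block
        schedule.extend(block)
    return schedule[:n_patients]        # the final block may be cut short

assignments = block_randomize(12, n_per_arm=2, seed=1)
print(assignments)
```

Because each complete block is balanced, the two arms can never differ by more than n patients at any point during enrollment.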


• Stratification
- Patients are subdivided into subgroups, or strata, according to characteristics thought to be important for patient outcome. Separate randomization lists are maintained for each stratum to ensure comparable patient populations within each stratum. Either simple random assignment or block randomization may be used within each stratum.
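The same idea extended to stratification keeps a separate permuted-block list for each stratum. In this sketch the strata (disease stage × sex), the block size of 4, and the helper name `stratified_schedules` are all hypothetical.

```python
import random

def stratified_schedules(strata, n_per_stratum, block_size=4, seed=None):
    """One permuted-block list (arms A/B) per stratum; block_size must be even."""
    rng = random.Random(seed)
    schedules = {}
    for stratum in strata:
        schedule = []
        while len(schedule) < n_per_stratum:
            block = ["A", "B"] * (block_size // 2)   # equal numbers of A and B
            rng.shuffle(block)
            schedule.extend(block)
        schedules[stratum] = schedule[:n_per_stratum]
    return schedules

# Hypothetical strata: disease stage x sex
strata = [(stage, sex) for stage in ("early", "late") for sex in ("F", "M")]
lists = stratified_schedules(strata, n_per_stratum=8, seed=7)

# A new patient receives the next unused assignment from his or her own stratum's list
for stratum, schedule in lists.items():
    print(stratum, schedule)
```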


• Blinding
- Double blind: neither the physician nor the patient knows which treatment the patient is receiving.
- Single blind: the patient is blinded to the treatment assignment but the physician is not (or vice versa).

- Blinding is always preferable, to prevent biased reporting of outcomes by the patient and/or the physician. However, it is not feasible in every research setting.

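- Double dummy: one group receives the target drug + a placebo of the standard drug, and the other receives the standard drug + a placebo of the target drug, so that both groups take identical-looking regimens.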

Estimation

• Point estimation
• Interval estimation


Point estimation

• A natural estimator to use for estimating the population mean is the sample mean.

• This estimator has desirable properties (unbiased, minimum variance).

• Sampling distribution of the sample mean
- The distribution of the sample mean over all possible samples of the same size that could have been selected from the study population (Figure 6.1).


• Standard Error of the Mean (SEM)
- The standard deviation of the sample means
- A quantitative measure of the variability of sample means obtained from repeated random samples of size n drawn from the same population
- SEM = σ/√n, where σ is the standard deviation of the population
- The larger the sample size, the more precise the sample mean is as an estimator of the population mean.

• Example 6.23
- The SEM is estimated by the sample SD (s) divided by √n: s/√n = 22.44/√10 ≈ 7.1
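The SEM formula can be checked numerically. The sketch below computes s/√n for Example 6.23 and then simulates repeated samples from a hypothetical normal population (μ = 120, σ = 25, n = 10, chosen only for illustration) to show that the SD of the simulated sample means is close to σ/√n. The helper name `sem` is hypothetical.

```python
import math
import random
import statistics

def sem(sd, n):
    """Standard error of the mean: SD / sqrt(n)."""
    return sd / math.sqrt(n)

# Example 6.23: sample SD s = 22.44, n = 10
print(round(sem(22.44, 10), 2))

# Simulation with a hypothetical normal population (mu = 120, sigma = 25)
rng = random.Random(0)
mu, sigma, n = 120, 25, 10
means = [statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)]) for _ in range(5000)]
empirical_se = statistics.stdev(means)    # SD of the 5,000 simulated sample means
theoretical_se = sem(sigma, n)            # sigma / sqrt(n)
print(round(empirical_se, 2), round(theoretical_se, 2))
```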


• Each estimator (e.g., sample variance, sample proportion) has its own standard error.

Ex) SE of the sample proportion (p) = √[p(1 − p)/n]

• The larger the sample size, the more accurate an estimator of corresponding parameter will be.
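A short sketch of the SE of a sample proportion, √[p(1 − p)/n], using p = 0.04 (borrowed from Example 6.45) with hypothetical sample sizes, shows that quadrupling n halves the SE. The helper name `se_proportion` is hypothetical.

```python
import math

def se_proportion(p, n):
    """Standard error of a sample proportion: sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

for n in (100, 400, 1600):
    print(n, round(se_proportion(0.04, n), 4))
```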


• SD vs SEM
- SD: variability of the raw data (individual observations)
- SEM: variability of the sample means

Interval estimation

• A point estimate is computed from the particular sample at hand; a different sample could give a different point estimate, so the estimate itself has variability (e.g., the SEM).

• Ex) Sample mean of cholesterol in SMC = 192
  Sample mean of cholesterol in SNU = 181
  Sample mean of cholesterol in CMC = 185
  …

• Therefore, we need to estimate an interval that is likely to contain the parameter of interest.


• 95% confidence interval for μ (population mean)
- Over the collection of all 95% confidence intervals that could be constructed from repeated random samples of size n, 95% will contain the parameter μ (Figure 6.6).

• Factors affecting the length of a CI
- As the sample size increases, the length of the CI decreases.
- As the SD, which reflects the variability of individual observations, increases, the length of the CI increases.
- As the desired confidence level (e.g., 95%) increases, the length of the CI increases.
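The repeated-sampling interpretation and the three factors above can be illustrated with a small simulation. This sketch assumes a hypothetical normal population (μ = 190, σ = 40) with σ treated as known, so it uses the normal-theory interval x̄ ± 1.96σ/√n; the observed coverage should be close to, but not exactly, 95%. The helper names `z_ci` and `ci_length` are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean

Z975 = NormalDist().inv_cdf(0.975)   # about 1.96

def z_ci(xbar, sigma, n, z=Z975):
    """Normal-theory CI for a mean with sigma treated as known."""
    half = z * sigma / math.sqrt(n)
    return xbar - half, xbar + half

# 1) Coverage over repeated samples of size n = 100
rng = random.Random(42)
mu, sigma, n, reps = 190, 40, 100, 2000
hits = 0
for _ in range(reps):
    lower, upper = z_ci(fmean([rng.gauss(mu, sigma) for _ in range(n)]), sigma, n)
    if lower <= mu <= upper:
        hits += 1
coverage = hits / reps
print("coverage:", coverage)

# 2) CI length = 2 * z * sigma / sqrt(n)
def ci_length(sigma, n, confidence=0.95):
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * sigma / math.sqrt(n)

base = ci_length(10, 25)
print(ci_length(10, 100) < base)                   # larger n -> shorter
print(ci_length(20, 25) > base)                    # larger SD -> longer
print(ci_length(10, 25, confidence=0.99) > base)   # higher confidence -> longer
```

Each of the last three lines prints True.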


• Examples 6.30, 6.32, 6.33
• Body temperature

• 95% CI for the mean = (sample mean − 1.96 × SE, sample mean + 1.96 × SE), where SE = standard error of the mean = SD/√n

• Sample mean = 97.2, SD = 0.2, n = 10
• 95% CI for the mean = (97.08, 97.32)
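The body-temperature interval can be reproduced with the standard library's `statistics.NormalDist`, following the slide's normal (1.96) multiplier. When σ is estimated from a small sample such as n = 10, a t multiplier with n − 1 degrees of freedom is often used instead, which gives a slightly wider interval. The helper name `z_interval` is hypothetical.

```python
import math
from statistics import NormalDist

def z_interval(mean, sd, n, confidence=0.95):
    """Normal-theory CI: mean +/- z * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96 for 95%
    se = sd / math.sqrt(n)
    return mean - z * se, mean + z * se

lo, hi = z_interval(97.2, 0.2, 10)
print(round(lo, 2), round(hi, 2))   # → 97.08 97.32
```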




• Confidence interval can be estimated using either asymptotic method or exact method.

• For a binary outcome, the asymptotic 95% CI for the proportion is (sample proportion − 1.96 × SE, sample proportion + 1.96 × SE), where SE = standard error of the proportion = √[p(1 − p)/n].

Example 6.45: p = 0.04, n = 10,000; 95% CI for the proportion = (0.036, 0.044)
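A sketch of the asymptotic interval above (often called the Wald interval) for Example 6.45; `wald_ci` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def wald_ci(p_hat, n, confidence=0.95):
    """Asymptotic CI for a proportion: p_hat +/- z * sqrt(p_hat(1 - p_hat)/n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

lo, hi = wald_ci(0.04, 10_000)
print(round(lo, 3), round(hi, 3))   # → 0.036 0.044
```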


When np(1 − p) < 5, an exact interval should be computed from the binomial distribution instead.

• Example 6.47: n = 20, p = 0.1; np(1 − p) = 1.8 < 5
  Exact 95% CI for p = (0.01, 0.32)
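The exact binomial interval is commonly computed as the Clopper-Pearson interval, obtained by inverting binomial tail probabilities. The sketch below assumes the observed count is x = 2 (p = 0.1 with n = 20) and finds each limit by bisection using only the standard library; the helper names are hypothetical, and in practice a statistics package would normally be used.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def bisect_decreasing(f, lo=0.0, hi=1.0, tol=1e-10):
    """Root of a function f that is decreasing in p on [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid      # root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2

def exact_ci(x, n, alpha=0.05):
    """Clopper-Pearson CI for a binomial proportion."""
    # Lower limit solves P(X >= x | p) = alpha/2 (it is 0 when x == 0)
    lower = 0.0 if x == 0 else bisect_decreasing(
        lambda p: alpha / 2 - (1 - binom_cdf(x - 1, n, p)))
    # Upper limit solves P(X <= x | p) = alpha/2 (it is 1 when x == n)
    upper = 1.0 if x == n else bisect_decreasing(
        lambda p: binom_cdf(x, n, p) - alpha / 2)
    return lower, upper

lo, hi = exact_ci(2, 20)
print(round(lo, 2), round(hi, 2))   # → 0.01 0.32
```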

Testing

• Research objective
• Research question
• Research hypothesis


Hypothesis

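• Whereas the research objective is stated abstractly, a hypothesis is stated concretely and precisely enough that the study can actually be carried out (from design to reporting).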

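• Consistent with the research objective
• Specifies the study population
• Clearly identifies the groups being compared
• States the comparison using variables that are actually measured
• Reflects the expected result
• Written so that it can be tested directly with a statistical test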


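• Research objective: compare the efficacy of a new anticancer drug with that of an existing anticancer drug
• Hypothesis?
• The new anticancer drug and the existing anticancer drug differ in effect. Wrong!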

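• In patients with stage 4 breast cancer, the 3-month response rate differs between the group receiving the new anticancer drug and the group receiving the existing anticancer drug.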

Null hypothesis vs Alternative hypothesis

• Alternative hypothesis (H1)
- A statement of what the investigator wants to demonstrate

• Null hypothesis (H0)
- The hypothesis opposite to the alternative hypothesis


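• Alternative hypothesis: In patients with stage 4 breast cancer, the 3-month response rate differs between the new-drug group and the existing-drug group. (Non-equality test; this is what we want to demonstrate.)

• Null hypothesis: In patients with stage 4 breast cancer, the 3-month response rate does not differ between the new-drug group and the existing-drug group.

• Alternative hypothesis: In patients undergoing prostate surgery, the proportion with abnormal PSA levels at 3 months does not differ between open surgery and robotic surgery. (Equivalence test; this is what we want to demonstrate.)

• Null hypothesis: In patients undergoing prostate surgery, the proportion with abnormal PSA levels at 3 months differs between open surgery and robotic surgery.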

Statistical testing

• Using statistical methods and data to decide whether or not to reject the null hypothesis

• The test is performed under the assumption that the null hypothesis is true, and the null hypothesis is rejected only when there is sufficient evidence against it.

• If the null hypothesis is rejected, there is sufficient evidence against it, and we conclude in favor of the alternative hypothesis.

• If the null hypothesis is not rejected, either the null hypothesis is true, or the alternative hypothesis is true but the evidence is not strong enough to reject the null hypothesis. Thus, failing to reject the null hypothesis does not mean that the null hypothesis is true.


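• Alternative hypothesis: In patients with stage 4 breast cancer, the 3-month response rate differs between the new-drug group and the existing-drug group. (Non-equality test; this is what we want to demonstrate.)

• Null hypothesis: In patients with stage 4 breast cancer, the 3-month response rate does not differ between the new-drug group and the existing-drug group.

• If the null hypothesis is rejected: the 3-month response rates differ between the two groups.
• If the null hypothesis is not rejected: we cannot say that the 3-month response rates differ between the two groups.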


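• Alternative hypothesis: In patients undergoing prostate surgery, the proportion with abnormal PSA levels at 3 months does not differ between open surgery and robotic surgery. (Equivalence test; this is what we want to demonstrate.)

• Null hypothesis: In patients undergoing prostate surgery, the proportion with abnormal PSA levels at 3 months differs between open surgery and robotic surgery.

• If the null hypothesis is rejected: the proportion with abnormal PSA levels at 3 months does not differ between the two groups.
• If the null hypothesis is not rejected: we cannot say that the proportions with abnormal PSA levels at 3 months are the same in the two groups.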


• Four possible outcomes in hypothesis testing

Decision from testing | Truth: H0 true                            | Truth: H1 true
Do not reject H0      | Correct                                   | Incorrect (Type II error; false negative error)
Reject H0             | Incorrect (Type I error; false positive error) | Correct


• The probability of a type I error is usually denoted by α and is commonly referred to as the significance level of a test (the maximum false-positive error rate we are willing to tolerate).

• The probability of a type II error is usually denoted by β.

• The power of a test is defined as 1-β.

• The general aim in hypothesis testing is to use statistical tests that make α and β as small as possible. This requires a compromise: making α small means rejecting the null hypothesis less often, whereas making β small means failing to reject the null hypothesis less often. For a fixed sample size the two goals conflict; as α increases, β decreases, and vice versa. The general strategy is to fix α at a specific level (e.g., 0.1, 0.05, 0.01) and to use the test that minimizes β (i.e., maximizes the power).
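The α/β trade-off can be seen numerically for the one-sided test in Example 7.2 (H0: μ = 120 vs H1: μ < 120, σ = 25, n = 100). The true mean μ1 = 115 used to evaluate β is a hypothetical choice, as is the helper name `beta_lower_tailed`; the sketch shows β falling (power rising) as α is increased with n held fixed.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal

def beta_lower_tailed(alpha, mu0, mu1, sigma, n):
    """Type II error of the one-sided z-test H0: mu = mu0 vs H1: mu < mu0
    (sigma known), evaluated at a specific true mean mu1 < mu0."""
    se = sigma / math.sqrt(n)
    c = mu0 + Z.inv_cdf(alpha) * se      # reject H0 if sample mean < c
    return 1 - Z.cdf((c - mu1) / se)     # P(sample mean >= c | mu = mu1)

betas = {a: beta_lower_tailed(a, mu0=120, mu1=115, sigma=25, n=100)
         for a in (0.01, 0.05, 0.10)}
for a, b in betas.items():
    print(f"alpha={a:.2f}  beta={b:.3f}  power={1 - b:.3f}")
```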

One-sided test vs Two-sided test

• A one-sided test is a test in which the values of the parameter being studied (ex. Population mean, μ) under the alternative hypothesis are allowed to be either greater than or less than the values of the parameter under the null hypothesis but not both.

• Examples 7.2, 7.10 (SD = 25)
- H0: μ = 120, H1: μ < 120
- Sample mean = 115
- How small must the sample mean be for H0 to be rejected? We need the rejection region (the range of values of the sample mean for which H0 is rejected).


• Use the probability of a type I error (α).
• α = P(reject H0 | H0 is true) = P(sample mean < C | μ = 120)
  (Standardize: Z = (sample mean − μ)/(σ/√n) ~ N(0, 1), i.e., Z follows the standard normal distribution with mean 0 and variance 1.)
• With SD = 25, n = 100, α = 0.05:
  0.05 = P(Z < (C − 120)/(25/√100))
  (C − 120)/(25/√100) = z0.05 = −1.645
  C = 115.89
• Reject H0 if the sample mean < 115.89 at α = 0.05.

• From the sample data, sample mean = 115 < 115.89, so H0 is rejected at α = 0.05.

With this approach, whether the null hypothesis is rejected depends on the chosen size of the type I error (α).
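The rejection-region calculation above can be sketched with the standard library's `statistics.NormalDist`, whose `inv_cdf(0.05)` gives z0.05 ≈ −1.645.

```python
import math
from statistics import NormalDist

mu0, sigma, n, alpha = 120, 25, 100, 0.05
se = sigma / math.sqrt(n)                    # 25 / 10 = 2.5
c = mu0 + NormalDist().inv_cdf(alpha) * se   # critical value C
xbar = 115
reject = xbar < c                            # lower-tailed rejection region
print(round(c, 2), reject)   # → 115.89 True
```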


• Significance tests can effectively be performed at all α levels by obtaining the p-value for the test.

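• P-value
- A probability in the interval (0, 1)
- The degree to which the data support the null hypothesis
- Assuming the null hypothesis is true, the probability of obtaining a result as extreme as, or more extreme than, the statistic computed from the data (e.g., the sample mean), i.e., a result at least as favorable to the alternative hypothesis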


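• P-value = P(sample mean < 115 | μ = 120) = P(Z < (115 − 120)/(25/√100)) = P(Z < −2.0) = 0.02275

• By assuming the null hypothesis is true, we take the null value (μ = 120) as the reference point and ask how far the observed statistic (e.g., 115) lies from it.

• If it lies far away, the p-value is small and we reject the null hypothesis; if it lies close, the p-value is large and we do not reject the null hypothesis.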
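The same p-value can be computed with `statistics.NormalDist().cdf`; a minimal sketch for the lower-tailed alternative H1: μ < 120.

```python
import math
from statistics import NormalDist

mu0, sigma, n, xbar = 120, 25, 100, 115
z = (xbar - mu0) / (sigma / math.sqrt(n))   # standardized sample mean
p_value = NormalDist().cdf(z)                # lower-tail probability P(Z < z)
print(z, round(p_value, 5))   # → -2.0 0.02275
```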


• Significance level (α): a probability chosen in advance
• P-value: a probability calculated after a given study

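• With a large sample, the p-value can be small, and hence statistically significant, even for a difference that is clinically or practically meaningless; statistical significance therefore does not guarantee clinical or practical significance.

• It is therefore preferable to report the p-value together with a confidence interval.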
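To illustrate the point about large samples, the sketch below holds a hypothetical, clinically negligible difference of 0.5 units (σ = 25) fixed and computes a two-sided z-test p-value for increasing n. Only the sample size changes, yet the p-value drops below 0.05. The numbers and the helper name `two_sided_p` are illustrative assumptions.

```python
import math
from statistics import NormalDist

def two_sided_p(diff, sigma, n):
    """Two-sided z-test p-value for an observed mean difference `diff` (sigma known)."""
    z = diff / (sigma / math.sqrt(n))
    return 2 * NormalDist().cdf(-abs(z))

sizes = (100, 10_000, 1_000_000)
p_values = [two_sided_p(0.5, 25, n) for n in sizes]   # same tiny difference each time
for n, p in zip(sizes, p_values):
    print(f"n={n:>9,}  p={p:.3g}")
```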


• A two-sided test is a test in which the values of the parameter being studied under the alternative hypothesis are allowed to be either greater than or less than the values of the parameter under the null hypothesis.

• Example 7.19
- H0: μ = 190, H1: μ ≠ 190

• A reasonable decision rule for detecting alternatives on either side of the null mean is to reject H0 if the sample mean is either too small or too large.


• α = P(reject H0 | H0 is true) = P(sample mean < C1 or sample mean > C2 | H0 is true) = P(sample mean < C1 | H0 is true) + P(sample mean > C2 | H0 is true)

• For a two-sided test, half of the type I error is conventionally assigned to each of these two probabilities.

• P(sample mean < C1 | H0 is true) = P(sample mean > C2 | H0 is true) = α/2


• Sample mean from the data = 181.52
• P-value = 2 × P(sample mean < 181.52 | μ = 190) = 2 × P(Z < (181.52 − 190)/(40/√100)) = 2 × P(Z < −2.12) = 2 × 0.017 = 0.034
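A sketch of the two-sided p-value in Example 7.19, doubling the tail probability as above.

```python
import math
from statistics import NormalDist

mu0, sigma, n, xbar = 190, 40, 100, 181.52
z = (xbar - mu0) / (sigma / math.sqrt(n))   # (181.52 - 190) / 4 = -2.12
p_value = 2 * NormalDist().cdf(-abs(z))      # double the tail beyond |z|
print(round(z, 2), round(p_value, 3))   # → -2.12 0.034
```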

Relationship between hypothesis testing and confidence intervals

• For two-sided cases, H0 (μ = μ0) is rejected by a two-sided level-α test if and only if the two-sided 100(1 − α)% confidence interval for the parameter does not contain μ0.

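• Example 7.40: H0: μ = 190
- 95% CI for the mean cholesterol = (sample mean − 1.96σ/√n, sample mean + 1.96σ/√n) = (181.52 − 1.96 × 40/√100, 181.52 + 1.96 × 40/√100) = (173.68, 189.36)
- The p-value was 0.034.
- Since 190 lies outside the 95% CI and p = 0.034 < 0.05, both approaches reject H0 at α = 0.05.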
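A short sketch checking the duality for Example 7.40: the 95% CI excludes 190 exactly when the two-sided p-value is below 0.05.

```python
import math
from statistics import NormalDist

mu0, sigma, n, xbar, alpha = 190, 40, 100, 181.52, 0.05
se = sigma / math.sqrt(n)
z_crit = NormalDist().inv_cdf(1 - alpha / 2)       # about 1.96
ci = (xbar - z_crit * se, xbar + z_crit * se)

z = (xbar - mu0) / se
p_value = 2 * NormalDist().cdf(-abs(z))

# Duality: the level-alpha two-sided test rejects H0 exactly when mu0 is outside the CI
print([round(v, 2) for v in ci], p_value < alpha, not (ci[0] <= mu0 <= ci[1]))
# → [173.68, 189.36] True True
```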
