8/18/2019 SticiGui [Statistics 21]
http://slidepdf.com/reader/full/sticigui-statistics-21 1/45
SticiGui
Statistics 21
University of California at Berkeley
©1997–2013. P.B. Stark. All rights reserved.
Preface
Content, Philosophy and Goals
Overview of the Online Textbook
Prerequisites
Technical Design Criteria and Implementation
Advantages of JavaScript over Standard Statistical Packages
Suggestions for Evaluating the Materials
About the Author
Acknowledgments
Content, Philosophy and Goals
This text was written for a terminal introductory class in Statistics suitable for students in Business, Communications, Economics, Psychology, Social Science, or liberal arts; that is, this is the first and last class in Statistics for most students who take it. It also covers logic and reasoning at a level suitable for a general education course. Accordingly, the text is not geared toward theory, numerical analysis, or sophisticated formulae; neither does it contain a bestiary of techniques or named probability distributions. Rather, I hope to help students to think logically about quantitative evidence and to translate real-world situations into mathematical questions; and to expose students to a few important statistical and probabilistic concepts and to some of the difficulties, subjective decisions, and pitfalls in analyzing data and making inferences from numbers. The text develops probability, estimation, and inference using counting arguments: there is no calculus involved.
I hope that students who study from these materials will:
• Read the newspaper with new eyes: become skilled, circumspect consumers of qualitative and quantitative information.
• Know that probability in particular, and numbers in general, can be used to model some features of the physical world and human behavior.
• Improve their skills in critical thinking and logical reasoning.
• Appreciate the role Statistics plays in many fields, from business to economics, law, politics, science and medicine.
• Know that data can be manipulated to tell many inconsistent stories, that data analysis is not clear cut, and that many subjective judgments are involved in analyzing real data.
• Know important questions to ask when faced with a quantitative argument; be able to analyze arguments and find their strengths and weaknesses.
• Understand that untutored intuition tends to produce faulty probability judgments and know how to reason about probability.
• Appreciate some of the philosophical difficulties in ascribing meaning to probability and in inferring causal relationships from data.
• Be prepared for more advanced courses in Statistics, even though they might not take any.
The text starts with reasoning and fallacies, which is perhaps a bit unusual for a Statistics textbook, but logical reasoning is key to both theoretical and empirical work. The text goes further with counting arguments and combinatorics than most elementary textbooks do; it also goes further with logic and with data analysis. The tools incorporated into the materials enable students to analyze real datasets (the largest has 913 observations of 5 variables) without the pedagogical overhead of teaching students to use a proprietary statistics package. Students also reproduce numerical experiments that demonstrate key concepts, such as sampling distributions, confidence intervals, and the Law of Large Numbers. Using JavaScript-based tools also eliminates the need to teach students to read arcane tables associated with different distributions; instead, students type the relevant parameters into textboxes, highlight a range of values, and read off the probability.
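As a rough illustration of reading a probability off a range of values instead of a printed table, here is a minimal sketch. It uses Python's standard library rather than the JavaScript tools described in the text, and the function name and the example parameters are assumptions made only for this illustration.

```python
from statistics import NormalDist


def normal_range_probability(mean, sd, low, high):
    """Probability that a normal variable with this mean and SD lies in [low, high]."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(high) - dist.cdf(low)


# Hypothetical inputs standing in for the typed parameters and the highlighted range.
p = normal_range_probability(mean=0.0, sd=1.0, low=-1.0, high=1.0)
print(f"P(-1 <= Z <= 1) is about {p:.4f}")
```

The point of the sketch is only that the parameters and the range are the inputs and the probability is computed on demand, so no table lookup or interpolation is involved.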
I have tried to emphasize topics that can be taught most effectively with this sort of interactive online tool. I have sought to provide enough variety in the material that instructors can pick and choose from among the chapters to find material appropriate to the level at which they desire to teach. The most technical material is in footnotes and sidebars, so that it does not interrupt the flow. Many of the examples and datasets for exercises are real: they arose in my consulting work, in experiments I am familiar with, or they are in the public domain (for example, data on GMAT scores, undergraduate GPA, and MBA GPA).
Many of the inference problems are real, too. For example, the Kassel Dowsing Experiment is a real test of the ability of dowsers to determine whether water is running in a buried pipe; the derivation of Fisher's exact test is in the context of determining whether targeted Web advertising works, a problem I have studied for a consulting client; the case studies about employment discrimination and theft of trade secrets derive from my work as an expert witness.
I have tried to motivate many of the computations by inference problems. Probability, hypothesis testing, randomization, and sampling error are woven into the discussion of experiments and sample surveys. For some introductory courses, the probability in those sections will suffice. For instructors who desire a more quantitative text, there are additional chapters on probability distributions, discrete random variables, and expectation.
The book does not discuss continuous distributions: The normal curve, Student's t-curve, and the chi-square curve appear as approximations to the probability
• It is easy for students to use: The text is accessible through standard web browsers. There is none of the start-up cost associated with learning to use a proprietary statistical software package.
• It is easy for instructors to use: Assignment due dates and enrollment lists are controlled over the Web using a browser. Instructors do not have to write quizzes, collect quizzes, record quiz grades, or return quizzes to students. Moreover, instructors do not have to teach students to use specialized software.
The software empowers students to reproduce numerical experiments themselves, without having to learn a statistical language (using instead a standard Web browser), which encourages exploration and inquiry-based learning. The text uses the power of the Internet in many ways, including the following:
• Links to a glossary of terms.
• Dynamic examples and self-test exercises that change every time a student visits a chapter. Some self-test exercises parse student input to determine whether student formulae are correct, a much more sophisticated notion of correctness than multiple choice or numerical responses.
• Online, machine-graded assignments, constructed so that each student gets a different version of the assignment. Grades are posted automatically to the class website, and solutions are available online after the due date.
• Reference and practice materials to prepare for exams.
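One way to give each student a different but reproducible version of an assignment is to seed a random number generator with the student and assignment identifiers. The sketch below illustrates that idea in Python; it is not the mechanism the materials actually use, and the function names, the question template, and the identifiers are all hypothetical.

```python
import random


def make_question(student_id, assignment):
    """Build one percentage question whose numbers depend on student and assignment."""
    # Seeding with a string makes the version reproducible for the same student.
    rng = random.Random(f"{assignment}:{student_id}")
    total = rng.randrange(50, 501, 10)  # hypothetical population size
    part = rng.randrange(5, total, 5)   # hypothetical subgroup size
    answer = 100.0 * part / total
    text = f"What percentage of {total} is {part}?"
    return text, answer


def grade(student_id, assignment, response, tolerance=0.01):
    """Regenerate the student's version and compare the response with its answer."""
    _, answer = make_question(student_id, assignment)
    return abs(response - answer) <= tolerance


question, _ = make_question("student-001", "assignment-0")
print(question)
```

Because the version is regenerated from the identifiers, the grader does not need to store each student's questions; it only needs the same seed.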
Prerequisites
These materials do not assume that the reader has any previous knowledge of statistics or probability. However, the reader needs to be comfortable with percentages, exponentiation and square roots, and scientific notation (numbers times powers of ten). Assignment 0 is a review and quiz covering the prerequisite material. The ultimate calculations are all simple, but the logical reasoning needed to reduce the problems to those simple calculations is sometimes subtle.
Some of the footnotes and sidebars rely on elementary calculus to find stationary points of convex, continuously differentiable functions. For example, the mean is characterized as the number from which the rms of the residuals is smallest, and the regression line is characterized as the least-squares line. Those derivations can be skipped with impunity.
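For readers who want to see the kind of argument involved, the characterization of the mean can be sketched as follows. The notation (data x_1, ..., x_n and a candidate value c) is chosen here for illustration and is not taken from the text.

```latex
% rms of the residuals from a candidate value c, and its square times n
\[
\operatorname{rms}(c) = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - c)^2},
\qquad
f(c) = \sum_{i=1}^{n} (x_i - c)^2 = n \, \operatorname{rms}(c)^2 .
\]
% The square root is increasing, so rms(c) and f(c) are minimized by the same c.
% f is convex and continuously differentiable; its only stationary point is the mean.
\[
f'(c) = -2\sum_{i=1}^{n} (x_i - c) = 0
\iff
c = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x},
\qquad
f''(c) = 2n > 0 .
\]
```

Since the second derivative is positive everywhere, the stationary point is the unique minimizer, so the rms of the residuals is smallest when c is the mean.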
Technical Design Criteria and Implementation
These materials are comprised of XHTML, CSS, and JavaScript. As of 19 June 2011, they consisted of 217 XHTML files containing over 136,000 lines of XHTML and JavaScript, 63 Java classes containing about 16,000 lines of code, 27 JavaScript libraries containing about 14,900 lines of code, 34 data files containing about 5,600 records, four cascading style sheets with about 2,200 lines, and a handful of .jpg and .gif files. The choice to use XHTML, CSS, Java and JavaScript was motivated by these design criteria:
1. Maximize accessibility and portability. Recent browsers allow this material to be accessed from almost anywhere in the world, without adding plug-ins to the browser, and without buying any proprietary software. If a plug-in were required, downloading the plug-in would itself present a considerable barrier to some students. The software runs under every major operating system (unix, linux, Windows 9x and NT, Mac OS), because Mozilla, Opera and/or Microsoft have versions of their browsers that run on those operating systems. Browsers come installed on all new personal computers, so this material is immediately accessible to new computer owners. I have made a considerable effort to make the materials function well with screen reader software for visually impaired students, but there is more work to be done. Converting the mathematics to MathML using MathJax is underway.
2. Maximize interactivity and minimize technological barriers. Students should be able to explore data and to ask and answer what-if questions, without needing to learn how to use a conventional statistical software package. Tools should have a point-and-click interface whose use is fairly obvious: no hidden menus, consistent GUI, etc.
3. Minimize bandwidth and maximize speed. Using JavaScript allows the figures and plots to be generated on the client side. The code and data download to the client, then the client computes and creates the figures. This is by far the most efficient way to get dynamic interaction with the data. Otherwise, every time the user changed a parameter value, the client would need to send a message to the server, and the server would have to compute the new figure, and send the resulting figure over the Internet to the client. Interactive real-time data exploration would not be possible. There are a small number of figures that are stored as GIF or JPG files; almost all the figures are computed by the client. Sending just the data and the rules (programs) for generating figures from the data substantially reduces the time it takes pages to load. Many of the figures are pure CSS, which not only is very lightweight, but allows the figures to re-size elegantly if the user changes the dimensions of the page, and displays well even on mobile devices.
4. Make it easy to use the materials in lectures. Because the software is free-standing (it does not need a server for computations), it is easy to display the content in the classroom without an Internet connection. That allows the instructor to demonstrate concepts and the use of the materials in class.
5. Make it easy for instructors to set due dates for assignments and manage a course, and for students to track their own progress. Due dates are controlled by the instructor over the Internet; similarly, the instructor can modify grades, compute course scores, enter extra credit assignments, etc., using a browser. Perl-cgi routines update the database when a student submits an assignment, and allow students and instructors to query the database for grades over the Internet. As an alternative to perl cgi, assignment due dates can be controlled by modifying a simple ascii text file that is queried by an AJAX call, and assignment grades can be redirected to an email address instead of a database.
6. Maximize portability on the server side. Rather than use proprietary solutions, such as Microsoft active server pages (.asp pages) or server-side JavaScript, the back end is a collection of perl cgi routines that access a standard database file. The server only needs to support perl scripts and serve static XHTML, JavaScript, text, and CSS. This makes the package platform-independent on the server side as well as the client side: there is public-domain software allowing perl cgi to run with every common web server, and there are public-domain perl implementations for all popular operating systems. Installing the materials on a new server is simple.
Using XHTML with JavaScript and CSS allowed me to make the content dynamic: many of the examples and exercises in the text change whenever the page is reloaded, so students can get unlimited practice at certain kinds of problems.
Similarly, each student gets a different version of each assignment and exam, but can see the solutions to his/her version after the due date.
Advantages of JavaScript over standard Statistical Packages
There are a number of advantages to using JavaScript rather than an integrated statistical package:
1. The material can be accessed from any computer with an Internet connection and a web browser. The computer does not need to have any proprietary software installed. Students therefore can access the material from university and public libraries, Internet cafes, home, etc. Students have even submitted homework using WebTV.
2. Some of the demonstrations would be extremely difficult, if not impossible, to code in a standard statistical package. For example, see the Venn Diagram tool.
3. If a standard package were used, the figures/demonstrations/calculations could not be embedded in the text and the assignments. The student would have to navigate among programs to see demonstrations or solve problems.
4. The intellectual start-up cost to the student is lower than it would be for a general-purpose package. Each tool illustrates a single concept, all the controls are visible, and the interface is as intuitive as I have been able to make it. The student does not need to learn much to get started.
5. I find it preferable pedagogically to use tools with a single function, with all the controls visible.
6. The monetary cost to the student is minimized.
Suggestions for Evaluating the Materials
I would recommend that instructors who wish to evaluate these materials for possible adoption look first at CHAPTER 5, MULTIVARIATE DATA AND SCATTERPLOTS, CHAPTER 7, CORRELATION AND ASSOCIATION, and CHAPTER 9, REGRESSION. Those chapters illustrate several aspects of the text: dynamic exercises, the use of real data in examples and exercises, the histogram and scatterplot tools, and the gradual introduction of new functionality (buttons and displayed statistics) into the tools as students learn new concepts. For example, when the scatterplot tool arrives in CHAPTER 5, MULTIVARIATE DATA AND SCATTERPLOTS, its only controls change the variables plotted, list the data, show univariate statistics of the variables in the dataset (summary statistics covered in the first two chapters), and display the coordinates of the cursor. (Selecting a row in the data listing highlights the corresponding point in the scatterplot.) In CHAPTER 7, CORRELATION AND ASSOCIATION, the scatterplot tool acquires the correlation coefficient, and a button to show graphically the standard deviations of the two variables plotted; it is also invoked to display randomly generated data that attain a given value of the correlation coefficient. It also starts to allow students to add points by clicking on the plot, to see the effect of additional data on the correlation coefficient. In CHAPTER 9, REGRESSION, the same tool gains buttons to show the graph of averages, the SD line, and the regression line.
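To make the quantities named above concrete, the following sketch computes the correlation coefficient, the SD line, and the regression line for a small made-up dataset. It is written in Python rather than the JavaScript of the actual tools. It assumes the convention that the SD uses divisor n and that the SD line passes through the point of averages with slope equal to the ratio of the SDs, signed like the correlation. The function names and the data are hypothetical.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def sd(values):
    """Standard deviation with divisor n (an assumed convention)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / len(values))


def correlation(x, y):
    """Average of the products of the paired values in standard units."""
    mx, my, sx, sy = mean(x), mean(y), sd(x), sd(y)
    return sum(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)) / len(x)


def sd_line(x, y):
    """Slope and intercept of the SD line through the point of averages."""
    sign = 1 if correlation(x, y) >= 0 else -1
    slope = sign * sd(y) / sd(x)
    return slope, mean(y) - slope * mean(x)


def regression_line(x, y):
    """Slope and intercept of the least-squares line for predicting y from x."""
    slope = correlation(x, y) * sd(y) / sd(x)
    return slope, mean(y) - slope * mean(x)


# Made-up data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print("r =", round(correlation(x, y), 4))
print("SD line (slope, intercept):", sd_line(x, y))
print("regression line (slope, intercept):", regression_line(x, y))
```

The regression slope is the SD-line slope multiplied by the magnitude of the correlation, so the regression line is flatter than the SD line unless the correlation is exactly 1 or -1.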
After those chapters, I would recommend looking at the collection of interactive tools to see how various concepts are presented graphically; in particular, be sure to see the tools for Venn diagrams, sampling distributions, confidence intervals, and the Law of Large Numbers. To see how tables of probabilities are eliminated, see the tools for the Normal Distribution, Student's t-Distribution, and the Chi-square Distribution. I would recommend then looking at CHAPTER 18, THE "LET'S MAKE A DEAL" (MONTY HALL) PROBLEM, CHAPTER 19, PROBABILITY MEETS DATA, CHAPTER 28, DOES TREATMENT HAVE AN EFFECT?, and CHAPTER 29, TESTING EQUALITY OF TWO PERCENTAGES. Instructors with an interest in logic or who teach general education courses might enjoy [...]. The first of those has exercises that parse logical expressions the students type in.
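A numerical experiment of the kind the Law of Large Numbers tool supports can be sketched in a few lines. The version below is a Python illustration, not the tool itself. The function name, the seed, and the choice of a fair coin are assumptions.

```python
import random


def running_fraction_of_heads(num_tosses, p=0.5, seed=0):
    """Fraction of heads after each toss of a coin with chance p of heads."""
    rng = random.Random(seed)  # fixed seed so the experiment can be repeated
    heads = 0
    fractions = []
    for n in range(1, num_tosses + 1):
        if rng.random() < p:
            heads += 1
        fractions.append(heads / n)
    return fractions


fractions = running_fraction_of_heads(10_000)
for n in (10, 100, 1_000, 10_000):
    print(n, round(fractions[n - 1], 3))
```

Running it with different seeds shows the typical pattern: the fraction of heads wanders for small numbers of tosses and settles near p as the number of tosses grows.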
About the Author
Philip B. Stark is Professor and Chair of Statistics at the University of California, Berkeley, where he has been on the faculty since 1988. He received his bachelor's degree in Philosophy from Princeton University in 1980, and his PhD in Earth Science from the Scripps Institution of Oceanography in 1986. He received a National Science Foundation Postdoctoral Fellowship in Mathematical Sciences in 1987 and the Presidential Young Investigator Award in 1989. He was elected a Fellow of the Institute of Physics in 1999. Philip dropped out of high school and law school. He has served on the editorial boards of journals in applied mathematics, geophysics, and statistics, and has given over 130 invited lectures at conferences and universities in 17 countries. He is the author or co-author of over 100 publications. Philip has done research in astrophysics, microwave cosmology, earthquake prediction, geomagnetism, geochemistry, seismic tomography, signal recovery, constrained confidence estimation, probability density estimation, spectrum estimation, information retrieval, inverse problems, election auditing, adjusting the U.S. Census, causal inference, and human hearing.
He specializes in problems with very large datasets; software written by him and his students performs part of the routine data reduction for a geomagnetic satellite and a network of solar telescopes. Philip has consulted in IC mask manufacturing, oil exploration, water treatment, predicting e-mail spool fill, electrical activity of the brain, and targeted Internet advertising. He has served as an expert witness in litigation and legislation on topics ranging from natural resources to agricultural import restrictions, disaster relief, fairness in lending, the U.S. Census, the Child Online Protection Act (sampling the Internet and testing content filters, which involved the controversial subpoena of search records and indexed webpages from Google, Yahoo! and MSN), consumer protection, contested elections, employment discrimination, insurance, product liability, property tax assessment, truth in advertising, marketing, equal protection, trade secrets, intellectual property, risk assessment, wage and hour disputes, and anti-trust.
He has testified to the U.S. House of Representatives Subcommittee on the Census, to the California State Senate, the California State Assembly, and the California Department of Fish and Game. He has consulted for the U.S. Department of Justice, the U.S. Department of Agriculture, the U.S. Census Bureau, the U.S. Department of Housing and Urban Development, the U.S. Attorney's Office of the Northern District of California, the U.S. Department of Veterans Affairs, the Federal Trade Commission, the Los Angeles County Superior Court, the National Solar Observatory, the California Secretary of State, the Colorado Secretary of State, public utilities, major corporations, and numerous law firms, including about half of the 25 largest. He won the Chancellor's Award for Public Service for Research in the Public Interest in 2011 for his work on election auditing, and he is currently working with the Secretary of State of California and the Secretary of State of Colorado to implement risk-limiting election audits.
Philip was the Faculty Assistant for Educational Technology at The University of California, Berkeley, from 2001–2003 and chaired the U.C. Berkeley Educational Technology Committee from 2001–2005. He taught UC Berkeley's first official online course, in 2007, developed one of the first courses to be offered through UC Online Education, and co-developed (with Ani Adhikari) the first introductory statistics course offered through edX.
Philip does not like to be called Phil. He likes open-source software, dislikes email attachments, runs 100 mile endurance trail races, roasts his own coffee, and thinks this book is proof that obsessive-compulsive disorder is a job qualification. Philip lives in Berkeley, California, with a laptop, an iPhone, a Bacchi Espresso, Porlex and Zassenhaus hand mills, muddy homemade huaraches, and an embarrassing number of LED flashlights.
Acknowledgments
This project would not have been possible without Ofer Licht, who gave competent, intelligent, and congenial answers to my then-noobie questions about Java and JavaScript, pointed me to lots of useful material, and who wrote the original server-side Perl cgi scripts for grading homework and querying the grade database. Duncan Temple Lang was also a helpful and sympathetic resource regarding the intricacies and nuisances of Java 1.0; in addition, he is primarily responsible for the multi-threaded data server used to load large data sets (home-grown technology that anticipates Ajax). I am grateful to Rudy Guerra, who co-wrote an earlier version of the chapter on counting and the assignment on experiments. My friend and mentor David A. Freedman, and the excellent dead-tree book Statistics by Freedman, Pisani, and Purves, were inspirational. Aviva Shimelman worked all the exercises and was instrumental in streamlining the prose and visual style. Deirdre Lynch made several valuable suggestions regarding the user interface, and Sydney Jones was extremely helpful in identifying problems with flow, organization, consistency, and prose. Faculty who taught from these materials and made valuable suggestions include Ani Adhikari, [?]. Jay Citron, Yehuda Klein, Mark Lindeman, Aviva Shimelman and Margaret Smith. I am grateful for their help and their courage to experiment with a new medium and new mode of teaching. Many students have found typos over the years: Thank you!
This book is dedicated to Alessandra and Naomi and to David Freedman.
Introduction
How to use these materials
You are probably using either Firefox, Google Chrome, Internet Explorer, Safari or Opera to view these materials. Those are the most popular web browsers. To use all the features of SticiGui©, you need an up-to-date browser that supports frames, cascading style sheets (CSS), HTML 4.01, and JavaScript 1.8. JavaScript must be enabled in the browser; for assignments to work, the browser must also accept cookies from the originating server. For a variety of reasons, I strongly recommend that you use Firefox and that you not use Internet Explorer. Recent versions of Chrome, Safari and Opera also work, mostly. The materials have been tested most thoroughly with Firefox.
A link is a piece of text you can click to see another document. You place the cursor over the link, then push the mouse button (left mouse button on a PC compatible computer) to follow the link. Links in these materials are generally in blue type.
Clicking a link can either replace the document in one of the frames you are viewing (not always the frame that has the link), or open a new window that displays the new document. If you have not changed the default settings in your browser, links to the glossary will be in green type; those take you to the right place in the glossary in the bottom frame, so that you don't lose your place in the book when you look up a term. Links to other materials are in blue type; those typically replace the contents of the frame you are reading.
You should familiarize yourself with how your browser works to learn to navigate among windows. On the right side of the browser window, you should see a scroll bar. If there is a scroll bar, that means there is more to see: drag the slider down to see more of the text. On some computers, there is a page down button you can click to see the next screenful of text.
In the assignments, the navigation buttons are suppressed to leave more room on the screen for text and graphics. You can still get a pop-up menu that will allow you to go back to the previous document. How you get the menu depends on the browser and the operating system. In Microsoft Windows, you get the pop-up menu by right-clicking in the open browser window. Most browsers have the ability to search within a document to find a word or phrase within a page. You might find that feature useful to search for a word in the glossary or a chapter of the text.
There are graphical data analysis and visualization tools throughout the text and assignments. Every chapter has exercises to check your understanding. I strongly recommend that you do all of them. Most of the exercises call for an answer in a box. After you type in your answer, strike the return or enter key. The symbol next to the question will change from a question mark either to a green check (if your answer is right) or to a red X (if your answer is wrong). Multiple-choice questions automatically show the green check or red X when you select an answer. Multiple-multiple-choice questions (select all that apply) are followed by a button you can click to check your answer. Clicking the mark after the question (the question mark, check, or X) will expand a box containing the correct answer, if you have already attempted the problem. You can answer each question as many times as you like, or see the correct answer. More detailed solutions to some of the exercises are available too; there is a link to the detailed answers after those questions. Your answers to the exercises in the text are not recorded, and the exercises do not contribute to your grade. Many of the problems are generated randomly: reloading or revisiting the page will give you a new set of problems, so you can get unlimited practice.
Most of the chapters have a corresponding assignment covering the material in the chapter. The assignments are graded by a computer, and may contribute to your grade (your instructor will tell you). Be sure to click the button labeled Submit for Grading after you answer the questions in each assignment. After the due date of the problem set, you can see your score by filling out a form. You can also see the solutions after the due date by returning to the assignment. For detailed instructions about the assignments, including browser-related issues, see the homework homepage.
Last generated 2/26/2014 6:28:57 PM. Content last modified 22 June 2013 15:37 PDT.
Chapter 3
Statistics
Statistics is the science of drawing conclusions from data. This chapter introduces a rough taxonomy of data, as well as tools for presenting, summarizing, and displaying data: tables, frequency tables, histograms, and percentiles. The tools are illustrated using datasets from trade secret litigation and geophysics.
Data
In its broadest sense, Statistics is the science of drawing conclusions about the world from data. Data are observations (measurements) of some quantity or quality of something in the world. "Data" is a plural noun; the singular form is "datum." Our lives are filled with data: the weather, weights, prices, our state of health, exam grades, bank balances, election results, and so on. Data come in many forms, most of which are numbers, or can be translated into numbers for analysis. In this chapter, we will see several types of data, and tools for summarizing data.
There are several important questions to keep in mind when you evaluate quantitative evidence:
• Are the data relevant to the question asked?
• Was the data collection fair, or might there have been some conscious or unconscious BIAS that influenced the results or made some cases less likely to be observed?
• Do the data make sense?
The answers to these questions are crucial to drawing conclusions from data.
Example 3-1: Data of little relevance.
Trident® sugarless gum used to advertise that "4 out of 5 dentists
surveyed recommend Trident® sugarless gum for their patients who
chew gum."
Such a survey says little about whether Trident® gum is better for
your teeth than other gum, with or without sugar.
It would be more relevant to study the effect on teeth of chewing
different kinds of gum, not the opinions of dentists who might not have
conducted (or even read) any empirical research on the effects of
different kinds of gum.
Example 3-2: Data with inadvertent bias.
Course evaluation forms often ask students questions about the
effectiveness of the instructor. At UC Berkeley, many students are
absent from class when evaluation forms are passed out and collected.
If students who do not find lectures helpful are more likely to skip
class, the evaluation form data will tend to be biased: on average, the
forms will tend to report that the instructor is more effective than
students really think he is.
For more on these topics, see Hooke (1983), Huff (1993) and Taleb
(2007).
Variables
A variable is a value or characteristic that can differ from individual
to individual. Data are generally recorded values of variables.
Quantitative variables take numerical values whose "size" is
meaningful. Quantitative variables answer questions such as "how
many?" or "how much?" For example, it makes sense to add, to
subtract, and to compare two persons' weights, or two families'
incomes: These are quantitative variables. Quantitative variables
typically have measurement units, such as pounds, dollars, years,
volts, gallons, megabytes, inches, degrees, miles per hour, pounds
per square inch, BTUs, and so on.
Some variables, such as social security numbers and zip codes,
take numerical values, but are not quantitative: They are qualitative
or categorical variables. The sum of two zip codes or social
security numbers is not meaningful. The average of a list of zip codes
is not meaningful. Qualitative and categorical variables typically
do not have units. Qualitative or categorical variables (such as
gender, hair color, or ethnicity) group individuals. Qualitative and
categorical variables have neither a "size" nor, typically, a natural
ordering to their values. They answer questions such as "which kind?"
The values categorical and qualitative variables take are typically
adjectives (for example, green, female, or tall). Arithmetic with
qualitative variables usually does not make sense, even if the
variables take numerical values. Categorical variables divide
individuals into categories, such as gender, ethnicity, age group, or
whether or not the individual finished high school.
Examples of qualitative, quantitative, and categorical variables

Qualitative
• Hot/Warm/Cold
• Population density: low/medium/high
• Height: short/medium/tall
• Under 5', 5'–6', Over 6'
• Slender/Average/Overweight
• Young/Middle-aged/Old
• Social class: lower/middle/upper
• Family size: fewer than 3, 3–5, 5 or more

Categorical
• Temperature: pleasant/unpleasant
• Rural/Urban area
• Endomorph/Mesomorph/Ectomorph
• Type of climate
• Gender
• Ethnicity
• Zip code
• Hair color
• Country of origin

Quantitative
• Temperature in °C
• Population density: people per square mile
• Height in inches
• Height in centimeters
• Body mass index (BMI)
• Age in seconds
• Income in dollars
• Family size (#people)
The distinction between these types of variables is somewhat blurry.
For example, we might group ages into categories such as "under 5
years old," "between 5 and 15," "between 15 and 25," "between 25 and 40,"
and "over 40." Similarly, whether gender or climate types are qualitative
or categorical variables is not clear-cut. Generally, if there is an implicit
ordering of the values the variable can take (hot is warmer than warm,
which is warmer than cold), there is a tendency to call a variable
qualitative rather than categorical; some people call such variables
ordinal. It is common to code categorical and qualitative variables
using numbers, for example, 1 for male and 0 for female. The fact
that a category is labeled with a number does not make the
variable quantitative! The real issue is whether arithmetic with the
values makes sense.
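The point about numeric codes can be sketched in a few lines of Python. This is only an illustration: the coding scheme `HAIR_CODES` and the eight coded observations are made up for the example, not data from the text. Counting individuals per category is meaningful; averaging the codes runs without error but produces a number that describes nothing.

```python
from collections import Counter

# Hypothetical coding scheme: the numbers are labels, not quantities.
HAIR_CODES = {1: "black", 2: "brown", 3: "blond", 4: "red"}

# Hypothetical coded observations for eight individuals.
coded_hair = [2, 1, 2, 3, 2, 4, 1, 2]


def category_counts(codes, labels):
    """Count how many individuals fall in each labeled category."""
    return Counter(labels[c] for c in codes)


counts = category_counts(coded_hair, HAIR_CODES)
print(counts)

# Arithmetic on the codes is possible but meaningless:
# no hair color corresponds to the "average code".
meaningless_mean = sum(coded_hair) / len(coded_hair)
print(meaningless_mean)
```

Relabeling the categories (say, swapping the codes for "black" and "red") would change the "average" without changing anything about the individuals, which is one way to see that the variable is not quantitative.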
Individuals need not be people; for example, we might be comparing
microclimates in the San Francisco Bay Area, using variables such as
• annual rainfall in inches (quantitative)
• annual number of sunny days (quantitative, discrete)
• a classification into "very foggy," "somewhat foggy," and "sunny"
(qualitative, ordinal)
• annual average temperature in degrees Fahrenheit (quantitative)
Similarly, the "individuals" could be a single individual at different
times: A variable might be the price of a share of Microsoft stock at
different times.
It is sometimes useful to divide quantitative variables further into
discrete and continuous variables. (This division is sometimes
rather artificial.) The set of possible values of a discrete variable is
countable.
Examples of discrete variables include ages measured to the nearest
year, the number of people in a family, and stock prices on the New
York Stock Exchange. In the first two of these examples, the variable
can take only some positive integers as values. In all three examples,
there is a minimum spacing between the possible values. Most
discrete variables are like this: they are "chunky." Variables that count
things are always discrete.
Examples of continuous variables include things like the exact ages or
heights of individuals, the exact temperature of something, etc. There
is no minimum spacing between the possible values of a continuous
variable. The possible values of discrete variables don't necessarily
have a minimum spacing. (For example, the set of fractions, that is,
rational numbers, is countable, but there is no minimum spacing between
fractions.) One reason the distinction between discrete and
continuous variables is somewhat vague is that in practice there is
always a limit to the precision with which we can measure any
variable. The limit depends on the instrument we use to make the
measurement, how much time we take to make the measurement,
and so on. For most purposes, the distinction between continuous and
discrete variables is not important.
The following exercise checks your understanding of the differences
among types of variables. The exercise will tell you immediately
whether you are right or wrong: Each question is followed by an
image. Initially, the image is a question mark. If you answer the
question correctly, the question mark is replaced by a check mark. If
you answer the question incorrectly, the question mark is replaced by
an X. Once you attempt the exercise, you can see the correct answer
by clicking the image. Clicking the image again will hide the answer.
Clicking the [Solution] link (when there is one) reveals a more
detailed answer.
Exercise 3-1. Identify the types of these variables:
Sample Data Sets
Throughout this book, as we learn new techniques we shall apply
them to real-world data from business, demography, education, law,
medicine, and physics. Applying the techniques to data will help us to
understand the techniques and to identify circumstances in which the
techniques are appropriate. The following sections introduce data we
shall use to illustrate and to practice using tables, frequency tables,
histograms, and percentiles.
Trade Secret Data
The first data set is the Trade Secret Data, which arose from a lawsuit
alleging the theft of a customer list. The names of the people and
firms have been changed, but otherwise, the facts are stated as I
understand them.
On 1 May 1995, two former employees of WeeBee Hardware (WBH),
a firm that sells computer components to computer assemblers and
retailers, opened the doors of a new company, Weasel Drives (WD).
One of the former employees had worked at WBH up to the day
before WD opened its doors; the other had stopped working for WBH
about 18 months previously. Both firms are in the greater San
Francisco Bay Area.
From the time WD started business, it sold essentially the same kinds
of computer components that WBH did, mostly to former customers of
one of the former employees, at essentially the same prices and with
essentially the same credit terms. Indeed, in the first two days WD
was in business, one of the former employees had called the top
dozen of her WBH accounts. In its first month of business, WD sold
about $1 million of equipment to former customers of WBH; that
amount increased to about $2 million per month in the course of a few
months.
The principals of WBH sought an injunction against WD to prevent it
from selling to customers of WBH, alleging that their customer list was
a trade secret and had been misappropriated by its former employees.
It is well established that a customer list can qualify as a trade secret:
It has economic value, and derives its value from not being generally
known. Customer lists can be the product of years of soliciting new
business by advertising and cold-calling tens of thousands of
potential customers and winnowing that list down to a few hundred or
a few thousand who actually do buy the kind of equipment the firm
sells, who will buy it from that firm, and who pay promptly. With
knowledge of a firm's list of customers, a competitor could avoid the
time and expense of some advertising, cold-calling, checking credit
references, bad debt, and so on.
In response to WBH's request for an injunction, WD asserted:
• They found the names of the customers in public sources, such
as CD-ROMs that contain lists of businesses, and from computer
magazines in which those customers advertise, not from their knowledge
of WBH's customers.
• Such a large overlap with WBH's customer list was inevitable,
because WBH had so many customers.
A California Court of Appeals decision (ABBA Rubber Co. v. Seaquist,
286 Cal. Rptr. at 528) establishes that a "readily ascertainable by
proper means" affirmative defense to a claim of misappropriation is
appropriate under certain circumstances:
[I]f the defendants can convince the
finder of fact … (1) that it is a virtual
certainty that anyone who
manufactures certain types of
products uses rubber rollers, (2) that
the manufacturers of those products
are easily identifiable, and (3) that the
defendants' knowledge of the plaintiff's
customers resulted from that
identification process and not from
the plaintiff's records, then the
defendants may establish a defense to
the misappropriation claim.
ABBA Rubber Co., 286 Cal. Rptr. at 529, ftnt. 9.
WD would thus be in the clear if they could show that they identified
the customers they called from the CD-ROMs and/or magazines
without using their knowledge of WBH's customer list. I was retained
as an expert witness to calculate the probability that certain subsets of
WD customers would overlap with analogous subsets of the active
WBH customer list to the extent that they do, and that WD would
place as large a number of calls to WBH customers as they did, under
various assumptions. The plaintiff's law firm matched the defendants'
customer list against the plaintiff's, and against advertisements in the
magazines from which WD claimed they obtained most of their
customers. The plaintiff agreed (stipulated) that essentially all the
names in question were in the CD-ROMs. The plaintiff's law firm also
went through the defendants' telephone records and identified calls to
WBH customers and others. Only local toll calls and long distance
calls result in telephone records, so calls to WBH customers who are
close to WD could not be identified.
WBH had 3310 active customers at the time in question; WD had 132.
They had 93 customers in common. WD claimed to have found the
names of 27 of their customers in local trade magazine
advertisements, and to have found the names of 31 of their customers
in the CD-ROMs. A total of 469 potential buyers of the kind of
equipment WD sells advertised in the magazines in question; 152 of
them were WBH customers. Of the 27 customers WD claimed to have
found in the magazines, 26 were customers of WBH. Of the 31
customers WD claimed to have found in the CD-ROMs, 22 were
customers of WBH. Of the 3310 WBH customers, 1769 were outside
the San Francisco Bay Area. Of the 132 WD customers, 8 were
outside the San Francisco Bay Area. All of the WD customers
outside the Bay Area were also customers of WBH. Other experts
estimated that there were more than 90,000 potential buyers of the
kinds of equipment WBH and WD sell in the U.S. as a whole, and
more than 60,000 outside the San Francisco Bay Area (including
Silicon Valley). There were 2906 WBH customers to whom calls by
WD would have resulted in phone records, and 68 WD customers for
whom there were phone records, of whom 53 were customers of
WBH. In the month of May, 1995, WD placed a total of 1050 calls that
produced phone records, and 1006 of them were to the 53 customers
of WBH.
Data presented in a narrative are extremely hard to follow. It is
much easier to understand the data using a table:
Table 3-1: Sizes of groups of purchasers and
potential purchasers of computer equipment of
the type sold by WBH and WD.
Table 3-2: Phone calls in WD phone records, May
1995.
These data are quantitative and discrete (they count various things).
In Chapter 20, Random Variables and Discrete Distributions, we
shall use these data to test WD's claim that the large overlap of the
customer lists was inevitable given the number of customers WBH
had.
Reading tables is an extremely important skill. The following exercises
may give you valuable practice. (If you need a calculator, click the
Calculator link in the drop-down menu at the top left of the screen.)
Exercise 3-2. What fraction of WD customers are also customers of
WBH?
Exercise 3-3. The decimal fraction of WD customers outside the Bay
Area is
Exercise 3-4. What fraction of WD customers outside the Bay Area
are customers of WBH?
Exercise 3-5. Fill in the missing numbers in the following table:
Gravity Data
The second set of data is a collection of measurements of g, the
acceleration due to gravity, made at Piñon Flat Observatory in 1989
(day 229), between 5:29:52pm and 5:48:08pm. You might remember
from a physics class that if you drop an object, it falls faster and faster
(it accelerates), until it hits the ground. The rate at which it would
accelerate, in the absence of air resistance, is g. At Earth's surface, g
is about 9.8 meters per second per second (m/s²). That is, each
second an object falls, it gains about 9.8 meters per second of speed.
A meter per second (m/s) is about 2.24 miles per hour (mph), so the
acceleration due to Earth's gravity is about
(9.8 m/s²) × (2.24 mph/(m/s)) = 22 miles per hour per second.
If you go bungee jumping from high enough that you fall for 2 seconds
before the bungee starts to stretch, you will be going about
(22 miles per hour per second) × (2 seconds) = 44 miles per hour.
This calculation neglects air resistance, which would slow you down a
bit.
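The unit conversion above can be checked with a short Python sketch. The constant and function names are made up for this illustration; the numerical values (9.8 m/s² and 2.24 mph per m/s) are the rounded values quoted in the text, and air resistance is ignored, as in the text.

```python
G_M_PER_S2 = 9.8         # approximate acceleration due to gravity, in m/s^2
MPH_PER_M_PER_S = 2.24   # one meter per second is about 2.24 miles per hour


def speed_after_fall_mph(seconds):
    """Speed in mph after falling from rest for `seconds`, ignoring air resistance."""
    return G_M_PER_S2 * MPH_PER_M_PER_S * seconds


# Speed gained per second of free fall, in miles per hour per second.
g_mph_per_s = G_M_PER_S2 * MPH_PER_M_PER_S
print(round(g_mph_per_s))              # rounds to 22
print(round(speed_after_fall_mph(2)))  # rounds to 44
```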
The table lists the deviations of the 100 measurements from a base
value of 9.792838 m/s², times 10^8. That is, each entry in the table is
100000000 × (measured value of g in m/s² − 9.792838).
(Note that 10^8 = 10×10×10×10×10×10×10×10 = 100,000,000. If you
need to review exponential notation, see Assignment 1.)
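As a rough sketch of this rescaling, the Python functions below convert between a measured value of g and a table entry. The function names are hypothetical; the base value and the factor of 10^8 come from the text. The example value 9.79283648 m/s² is simply what a table entry of −152 (the smallest entry quoted later in the chapter) corresponds to under this formula.

```python
BASE_G = 9.792838   # reference value of g in m/s^2, from the text
SCALE = 10**8       # table entries are deviations multiplied by 10^8


def to_table_entry(measured_g):
    """Deviation of a measurement from BASE_G, times 10^8, rounded to an integer."""
    return round(SCALE * (measured_g - BASE_G))


def from_table_entry(entry):
    """Recover the measured value of g (in m/s^2) from a table entry."""
    return BASE_G + entry / SCALE


print(to_table_entry(9.79283648))
print(from_table_entry(-152))
```

Working with the scaled deviations rather than the raw values keeps the numbers short: all the interesting variation is in the last few decimal places of g.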
The experimental apparatus used to collect these data is pretty slick:
It uses a laser and an accurate time reference to determine the
distance a mirrored corner of a cube falls in a vacuum chamber as a
function of time. The cube is dropped in a vacuum to avoid air
resistance, which would make the measurements systematically too
small. These measurements were made at Piñon Flat Observatory by
Glen Sasagawa and Mark Zumberge of the Scripps Institution of
Oceanography in La Jolla, California. Tiny fluctuations in gravity, like
those this instrument can measure, allow geophysicists to learn about
the distribution of mass within the Earth, about movements of the
Earth associated with the tides, and about stresses that lead to
earthquakes.
Table 3-3: One hundred measurements of g, the
acceleration due to gravity, at Piñon Flat
Observatory. The entries are 10^8 times the
deviations of g from a reference value of 9.792838
m/s².
Here, the tabular representation does not mean anything special; it is
just a way of writing a list.
A reasonable mathematical model for the observations is that
(observed value of g) = (true value of g) + error
where the error tends to be different for each measurement.
Why make so many measurements?
• To increase accuracy: Errors in different measurements tend to
average out to some extent, so one can estimate g better using an
average of a large number of measurements than one can using a single
measurement.
• To assess uncertainty: The variability of the repeated
measurements gives an estimate of the uncertainty of a single
measurement, or of the average of the measurements.
• To monitor the experiment: If the fluctuations get larger, or a
measurement is unusual, something might be going wrong with the
experiment.
In later chapters, we will illustrate the first and second points using
these data.
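The first point can be illustrated with a small simulation, sketched below in Python. Everything in it is an assumption made for illustration: the errors are taken to be independent and normally distributed with a hypothetical standard deviation, and the "true" value of g is just a stand-in. The simulation is not based on the Piñon Flat measurements.

```python
import random
import statistics


def simulate(n_measurements=100, n_repeats=1000, true_g=9.8, error_sd=1e-6, seed=0):
    """Compare the spread of single measurements with the spread of averages.

    Each simulated measurement is true_g plus an independent random error
    (assumed normal with standard deviation error_sd; a hypothetical model).
    Returns (spread of single measurements, spread of averages of n_measurements).
    """
    rng = random.Random(seed)
    singles, averages = [], []
    for _ in range(n_repeats):
        sample = [true_g + rng.gauss(0, error_sd) for _ in range(n_measurements)]
        singles.append(sample[0])                   # one measurement on its own
        averages.append(sum(sample) / len(sample))  # average of all of them
    return statistics.pstdev(singles), statistics.pstdev(averages)


sd_single, sd_average = simulate()
print(sd_single, sd_average)
```

Under these assumptions the averages of 100 measurements vary far less from run to run than single measurements do, which is the sense in which errors "average out."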
Frequency Tables
It is hard to learn much by looking at this list; it would be helpful to
summarize the values in a more transparent way. We shall begin by
constructing a frequency table. A frequency table lists the
frequency (number) or relative frequency (fraction) of observations
that fall in various ranges, called class intervals. We also need
an endpoint convention to be able to construct a frequency
table: If an observation falls on the boundary between two class
intervals, in which class interval do we count the observation? The
two standard choices are always to include the left boundary and
exclude the right, except for the rightmost class interval, or always to
include the right boundary and exclude the left, except for the leftmost
class interval.
Let us construct a relative frequency table for the gravity data. There
are no hard-and-fast rules for determining appropriate class
intervals, and the impression one gets of how the data are
distributed depends on the number and location of the intervals (more
on this later in this chapter). We shall use the following nine class
intervals:
• −160 (inclusive) to −110 (inclusive)
• −110 (exclusive) to −90 (inclusive)
• −90 (exclusive) to −70 (inclusive)
• −70 (exclusive) to −40 (inclusive)
• −40 (exclusive) to −10 (inclusive)
• −10 (exclusive) to 20 (inclusive)
• 20 (exclusive) to 50 (inclusive)
• 50 (exclusive) to 80 (inclusive)
• 80 (exclusive) to 160 (inclusive)
Note that the endpoint convention here is always to include the right
boundary and exclude the left, except for the leftmost class interval.
To construct the frequency table, the next step is to count the number
of data that fall in each class interval. Counting is much easier if
we sort the data. Table 3-4 lists the gravity data sorted into increasing
order:
Table 3-4: Sorted gravity data.
The first class interval contains the 9 observations
{−152, −132, −132, −128, −122, −121, −120, −113,
−112}.
Nine is 9% of 100, so the relative frequency of observations in the first
class interval is 9%. The second class interval contains the 10
observations
{−108, −107, −107, −106, −106, −106, −105, −101, −101,
−99}.
Ten is 10% of 100, so the relative frequency of observations in the
second class interval is 10%. The last class interval contains the two
observations {150, 155}. Two is 2% of 100, so the relative frequency
of observations in the last class interval is 2%.
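The counting step, including the endpoint convention, can be sketched in Python as follows. The function names are made up for this illustration. The class boundaries are the nine intervals listed above; the data are only the 19 smallest observations quoted in the text, since the full list of 100 values is not reproduced here, so only the first two counts are meaningful.

```python
# Boundaries of the nine class intervals used in the text.
EDGES = [-160, -110, -90, -70, -40, -10, 20, 50, 80, 160]


def class_index(x, edges):
    """Index of the class interval containing x.

    Convention: include the right boundary and exclude the left,
    except that the leftmost interval also includes its left boundary.
    """
    if x < edges[0] or x > edges[-1]:
        raise ValueError("observation outside the range of the class intervals")
    if x == edges[0]:
        return 0
    for i in range(1, len(edges)):
        if edges[i - 1] < x <= edges[i]:
            return i - 1


def frequency_table(data, edges):
    """Number of observations in each class interval."""
    counts = [0] * (len(edges) - 1)
    for x in data:
        counts[class_index(x, edges)] += 1
    return counts


# The 19 smallest gravity deviations quoted in the text.
smallest = [-152, -132, -132, -128, -122, -121, -120, -113, -112,
            -108, -107, -107, -106, -106, -106, -105, -101, -101, -99]

counts = frequency_table(smallest, EDGES)
print(counts[:2])  # counts in the first two class intervals
```

With all 100 observations, dividing each count by 100 would give the relative frequencies; choosing the other endpoint convention would only change which interval a boundary value such as −110 is counted in.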
The following exercise checks your understanding of frequency tables.
Exercise 3-6. Fill in the missing numbers in the following (relative)
frequency table for the gravity data:
Histograms
The frequency table is easier to interpret than the raw data, but it is
still hard to get an overall impression of the data from it. The
histogram is an excellent tool for studying the distribution of a list of
quantitative measurements. A histogram is a way of visualizing a
frequency table graphically, of making a picture from a frequency
table. The fraction of data in each class interval is represented by a
rectangle (bin) whose base is the class interval and whose area is the
fraction of data (relative frequency of data) that fall in the class
interval:
(area of bin) = (#observations in the class interval) / (total number of observations)
so that
(height of bin) = (relative frequency) / (width of the class interval).
Figure 3-1 is a cartoon of a histogram:
Figure 3-1: Elements of a histogram.
The key to a histogram is that it is the area of the bin, not the height
of the bin, that represents the relative frequency of data in the bin.
The area of the bin is proportional to the relative frequency of
observations in the class interval. The horizontal axis of a
histogram needs a scale with units. The vertical axis of a histogram
always has units of percent per unit of the horizontal axis, so that the
areas of bins have units of
(horizontal units) × (percent per horizontal unit) = percent.
The scale of the vertical axis is automatically imposed by the fact that
the total area of the histogram must be 100% (100% of the data fall
somewhere on the plot). The vertical scale is called a density scale.
The height of a bin is the density of observations in the bin: the
percentage of observations in the bin per unit of the horizontal axis.
Typically it is not the percentage of observations in the bin.
A histogram is not the same as a bar chart: In a bar chart, the height
of a rectangle (bar), rather than the area of the bar, indicates the
relative frequency of observations. The width of the bar does not
matter; it does not even need to have units. This makes bar charts
especially useful for displaying categorical and qualitative data,
where the horizontal axis does not have a scale; it is just a way to
separate groups. Histograms are more appropriate for
quantitative data.
For the gravity data, the first class interval is from −160 to
−110, and has 9% of the data. The height of the
corresponding bin is thus
9% / (−110 − (−160)) = 9%/50 = 0.18% per unit.
(The unit is 10^−8 m/s².)
The second class interval has width (−90 − (−110)) = 20
units, and has 10% of the observations, so the height of the
corresponding bin is
10%/20 = 0.5% per unit.
The last class interval has width 160 − 80 = 80 units, and has
2% of the observations, so the height of the last bin is
2%/80 = 0.025% per unit.
The height of the last bin is one twentieth (.025/.5) that of the second
bin.
The relative frequency of observations in the second class
interval is five times that of the last class interval (10%
versus 2%), so the area of the second bin is five times that of
the last bin. The width of the second class interval is 1/4 the
width of the last class interval (−90 − (−110) = 20, versus
160 − 80 = 80; 20 is 1/4 of 80). Thus the second bin is
5 × 4 = 20 times taller than the last bin.
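The height calculations for the three bins discussed above can be reproduced with a few lines of Python. The function name `bin_height` is made up for this sketch; the percentages and class boundaries are the ones given in the text.

```python
def bin_height(percent, left, right):
    """Height of a histogram bin on the density scale: percent per horizontal unit."""
    return percent / (right - left)


first = bin_height(9, -160, -110)    # 9% spread over 50 units
second = bin_height(10, -110, -90)   # 10% spread over 20 units
last = bin_height(2, 80, 160)        # 2% spread over 80 units

print(first, second, last)
print(second / last)   # how many times taller the second bin is than the last
```

Multiplying each height by the width of its bin recovers the percentage of data in the bin, which is the sense in which area, not height, represents relative frequency.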
Figure 3-2 is a histogram of the g deviations corresponding to these
class intervals (multiplied by 10^8 as before):
Figure 3-2: Histogram of deviations of
g measured at Piñon Flat
Observatory.
[Interactive histogram applet; horizontal axis from −150 to 150. Controls: "Area from," "to," and "Bins" (initially 9); readout: "Selected area."]
Figure 3-2 is the first applet in this book; there are many more to
come. This applet is a program with controls you can manipulate. For
example, try moving the scroll bars near the bottom of the plot, or
typing other numbers into the boxes next to the scroll bars and then
pressing the "Enter" or "Return" key. If you set the "Area from" text box
lower than the "to" text box, part of the histogram will change color from
blue to yellow, and the area of the yellow part will be displayed under
the histogram, as "Selected area."
Skewness and Modes
The word distribution refers to how numerical data are
distributed on the real line. We can discover qualitative features of
the distribution of the data from the histogram. The center of
the data is around −50 to −40. Most of the observations are
between −110 and 20. The observations are not distributed
symmetrically around the center: They continue farther to the right
of the center than to the left of the center. The distribution is said to be
skewed to the right, right-skewed, or to have a long right tail.
Conversely, when the data are more spread out to the left of the
center than to the right, the distribution is said to be skewed to the
left, left-skewed, or to have a long left tail.
Distributions of prices and incomes tend to be skewed to the right. For
example, consider house prices. Most homes cost under $100,000 to
$200,000 (depending on the locality), but a relatively small number of
homes sell for tens of millions of dollars. Similarly, most family annual
incomes are under $60,000, but a small number of people have
annual incomes exceeding tens of millions of dollars. Age distributions
also tend to be skewed to the right; for example, there is unlikely to be
anyone in this class younger than 14 years old, and most are between
17 and 22, but a few returning students are likely to be in their 30s,
40s or older.
This histogram of the gravity data consists of only one "bump": it is
said to be unimodal. In general, a histogram is said to be
multimodal if it has more than one bump, and in particular bimodal if
it has two bumps.
The following exercises check your ability to use the histogram applet
in Figure 3-2.
Exercise 3-7. The area under the histogram between −120 and 10
is
Exercise 3-8. The area under the histogram between −160 and
−4 is
Percentiles and Quartiles
Another way to characterize a list of numbers is using percentiles.
The pth percentile of a list is the smallest number that is at least as
large as p% of the numbers in the list. For example, 10% of
the gravity data are less than or equal to −108, so −108 is
the 10th percentile of the gravity data. The smallest number
that is at least as large as 13% of the data is −106, so −106
is the 13th percentile of the data, even though in fact 15% of
the observations are less than or equal to −106. The 13th
through 15th percentiles of these data are all −106. It is much
easier to find percentiles from the sorted list than from the original
list. Some percentiles have special names, as shown in Table 3-5.
Table 3-5: Common named percentiles.
The lower quartile is the 25th percentile: the smallest number
that is at least as large as 25% of the data. The median is the
50th percentile: the smallest number that is at least as large
as half the data. We just saw that the median of the gravity
data is −47. The upper quartile is the 75th percentile: the
smallest number that is at least as large as 75% of the data.
Approximately half the observations are between the lower
quartile and the upper quartile.
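The definition of a percentile used in this chapter translates directly into code. The Python sketch below is one way to implement it, not the method used by the book's applets; the function name is made up. The data are the 25 practice values that appear in Exercise 3-10, which the text notes change every time the page is reloaded.

```python
def percentile(data, p):
    """The smallest number in data that is at least as large as p% of the data."""
    ordered = sorted(data)
    n = len(ordered)
    for i, x in enumerate(ordered, start=1):
        # The first i values of the sorted list are all <= x,
        # so x is at least as large as (100 * i / n)% of the data.
        if i * 100 >= p * n:
            return x
    raise ValueError("need a non-empty list and 0 <= p <= 100")


practice = [44, 34, 14, 37, 26, 0, -39, -29, -30, -41, -1, 21, 31,
            -24, -19, -15, -14, 32, 35, -26, -13, 41, -5, 35, -14]

lower_quartile = percentile(practice, 25)
median = percentile(practice, 50)
upper_quartile = percentile(practice, 75)
print(lower_quartile, median, upper_quartile)
```

Note that other textbooks and software packages define percentiles slightly differently (for instance, by interpolating between neighboring values), so their answers can differ a little from those given by this definition.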
The following exercises verify that you can calculate percentiles.
Exercise 3-9. Fill in the missing
percentiles for the gravity data:
Exercise 3-10. Here is a list of data to practice with. Every time you
re-visit or re-load the page, the data will be different.
Practice data:
44 34 14 37 26 0 −39 −29 −30 −41 −1 21 31 −24 −19 −15 −14 32 35 −26 −13 41 −5 35 −14
The table sorted into increasing order is
Practice data:
−41 −39 −30 −29 −26 −24 −19 −15 −14 −14 −13 −5 −1 0 14 21 26 31 32 34 35 35 37 41 44
Fill in the following table of percentiles:
Estimating Percentiles from Histograms
To find a percentile of a set of measurements exactly, one needs
the original data. In plotting a histogram, the data are grouped into
class intervals, which typically makes it impossible to find exact
percentiles from a histogram. A histogram tells you the percentage
of data each class interval contains, but not where in the class interval
each datum is. However, one can find approximate percentiles from
a histogram: The pth percentile is approximately the point on the
horizontal axis such that the area under the histogram to the left of
the point is p%. Figure 3-3 is another histogram of the Piñon Flat g
data, with equal-width class intervals:
Figure 3-3: Histogram of deviations of g using equal-width bins.
[Interactive histogram applet; variable: deviation of g; horizontal axis from −150 to 150. Controls: "Area from," "to," "Bins," and "List Data"; readouts: "Selected area," n = 100, mean = −41.670, and SD.]
This histogram has equal-width class intervals. You can change
the number of bins by typing a different value into the box labeled
"Bins" and pressing the "Return" or "Enter" key, but don't do that yet. If
you click the "List Data" button, a new window will pop up with a listing
of the 100 numbers in the gravity data set. This applet also displays
two numbers that are defined in Chapter 4, Measures of Location
and Spread: the mean (average) and the SD (standard deviation).
The range −152 to −44.2 is highlighted when you first open
this page, and the figure shows that the area under the
histogram in that range is 50%. Our estimate of the median
from the histogram thus would be −44.2. We saw earlier in
this chapter that the median of the data is −47. The estimate
of the median from the histogram is off by a bit because the
data have been grouped into class intervals in the histogram.
Type −47 into the "to" box, and press "Return" or "Enter." The selected
area under the histogram should show 48%. The difference between
48% and 50% is also caused by the grouping of data into class
intervals in the histogram.
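The idea of reading an approximate percentile off a histogram can be sketched in Python by accumulating bin areas from the left and interpolating within the bin where the running total reaches p%. This sketch assumes the data are spread evenly within each bin, which is exactly the information a histogram cannot confirm. The function name and the small three-bin histogram are hypothetical, chosen only to keep the arithmetic easy to follow; they are not the gravity data.

```python
def percentile_from_histogram(edges, percents, p):
    """Approximate pth percentile from a histogram.

    edges:    bin boundaries, in increasing order
    percents: percentage of the data in each bin (the bin areas)
    Returns the point where the area to its left equals p%, assuming the
    data are spread evenly within each bin.
    """
    cumulative = 0.0
    for left, right, pct in zip(edges, edges[1:], percents):
        if pct > 0 and cumulative + pct >= p:
            fraction_of_bin = (p - cumulative) / pct
            return left + fraction_of_bin * (right - left)
        cumulative += pct
    raise ValueError("p exceeds the total area of the histogram")


# Hypothetical histogram: 20% of data in [0, 10], 50% in (10, 20], 30% in (20, 40].
edges = [0, 10, 20, 40]
percents = [20, 50, 30]

approx_median = percentile_from_histogram(edges, percents, 50)
print(approx_median)
```

Regrouping the same data into different class intervals would generally change the answer, which is why the estimate of the median from the histogram in the text differs from the exact value.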
The following exercise lets you practice estimating percentiles from
histograms.
Exercise 3-11. Estimate the following percentiles of the gravity data
from the histogram:
Now change the number of bins from 9 to 30 by typing 30 into the
"Bins" box and pressing "Return" or "Enter." The histogram is now
"rougher": it has more bumps or modes. The appearance of a histogram
depends crucially on how the class intervals are chosen. If you
estimate percentiles from the histogram with 30 bins and with 9 bins,
you will get different answers.
Summary
This chapter introduced variables, and distinctions among variables,
according to the kinds of values the variables can take: quantitative,
qualitative, and categorical. Quantitative variables are classified further
as either discrete or continuous. Data (observed values of variables)
can be presented in many ways. Tables often are easier to
understand than words. When the number of data is large, looking at
the data provides little insight, but summaries of the data can help.
Quantitative data can be summarized using frequency tables.
Constructing a frequency table requires specifying class intervals and
an endpoint convention. Frequency tables can be presented
graphically as histograms, which give an impression of the distribution
of the data. In a histogram, relative frequency is represented by area.
Characteristics of the distribution that can be gleaned from a
histogram include symmetry, skewness, and the number and location
of modes. However, the appearance of those characteristics in a
histogram depends on the number and location of the class intervals.
Percentiles are another way to summarize the distribution of a list.
Calculating percentiles exactly requires the original data, but
percentiles can be estimated approximately from histograms.
Key Terms
• applet
• bias
• bimodal
• bin
• categorical variable
• class interval
• continuous
• countable
• density
• density scale
• deviation
• distribution
• discrete
• endpoint convention
• frequency table
• histogram
• lower quartile
• median
• multimodal
• ordinal variable
• percentile
• qualitative variable
• quantitative variable
• quartile
• skewed
• symmetrically
• unimodal
• upper quartile
• variable
©1997–2013. P.B. Stark. All rights reserved.
Last generated 2/26/2014 6:30:35 PM. Content last modified 11 July 2013 07:47
PDT.