© Tefko Saracevic 11 BIBLIOMETRICS Tefko Saracevic Rutgers University http://www.scils.rutgers.edu/~tefko


BIBLIOMETRICS

Tefko Saracevic
Rutgers University
http://www.scils.rutgers.edu/~tefko


What is?

“… all studies which seek to quantify processes of written communication.”

Pritchard

“… the quantitative treatment of the properties of recorded discourse and behavior pertaining to it.”

Fairthorne

Recorded communication - ‘literature’ -> quantitative methods


Alan Pritchard 1969

Coined the term "bibliometrics":
"the application of mathematics and statistical methods to books and other media of communication"

Journal of Documentation (1969) 25(4):348-349


and other related metrics …

Also used to study objects broader than books, articles …

Scientometrics
  covering science in general, not just publications

Infometrics
  all information objects

Webmetrics or cybermetrics
  web connections, manifestations
  using bibliometric techniques to study the relationship or properties of different sites on the web


Concepts

Basic (primitive) concepts:
  1. Subject
  2. Recorded communication -> document, information object
  3. Subject literature

Bibliometrics related to:
  science of science
  sociology of science - numerical methods


Literature studies

Qualitative
  often in humanities, librarianship

Quantitative
  bibliometrics

Mixed


Reasons for quantitative studies of literature

Analysis of structure and dynamics
  search for regularities - predictions possible

Understanding of patterns
  “order out of documentary chaos”
  verification of models, assumptions

Rationale for policies & design


Why quantitative studies?

Qualitative methods often depend on assertions, ‘authoritative’ statements, anecdotal evidence

Science searches for regularities
Success of statistical methods in social sciences
Need for justification & basis for decisions
Something can be counted - irresistible


Application in ...

History of science
Sociology of science
Science policy; resource allocation
Library selection, weeding, policies
Information organization
Information management
  utilization


Historical note

Bibliometrics long precedes information science

But found intellectual home in information science
  study of a basic phenomenon - literature

It is not ‘hot’ lately, but still produces very interesting results

Branched out into web studies (web is a “literature” as well)


What studied?

Governed by data available in documents or information resources in general - that which can be counted

author(s)
origin
  organization, country, language
source
  journal, publisher, patent …


what … more

contents
  text, parts of text, subject, classes
representation
citations
  to a document, in a document, co-citation
utilization
  circulation, various uses
links
any other quantifiable attribute


Tools

Science Citation Index
Compilation of variables from journals in a subject
Use data
Publication counts from indexes, or other data bases
Web structures, links


Variable: authors

number in a subject, field, institution, country
growth
  correlation with indicators like GNP, energy etc.
productivity e.g. Lotka’s law
collaboration - co-authorship, associated networks
dynamics - productive life, transience, epidemics
papers/author in a subject
mapping


Variable: origin

Rates of production, size, growth by
  country, institution, language, subject

Comparison between these
Correlation with economic & other indicators


Variable: sources

Concentration most often on journals
Growth, dynamics, numbers
  information explosion - exponential laws
  time movements, life cycles

Scatter - quantity/yield distribution
  Bradford’s law

Various distributions by subject, language, country


Variable: contents

Analysis of texts
  distribution of words – Zipf’s law
  words, phrases in various parts
  subject analysis, classification
  co-word analysis


Variable: representation

frequency of use of index terms, classes
distribution laws - key terms where?
thesaurus structure


Variable: citations

Studied a lot; many pragmatic results
  base for citation indexes, web of science, impact factors, co-citation studies etc.

Derived:
  number of references in articles
  number of citations to articles
  research front; citation classics
  bibliographic coupling


citations … more

co-citations
  author connections, subject structure, networks, maps
  centrality of authors, papers

validation with qualitative methods
impact


Variable: utilization

frequency
distribution of requests for sources, titles
  e.g. 20/80 law

relevance judgement distributions
circulation patterns
use patterns


Variable: links

Development of link-based metrics
  in-links, out-links

Web structure
Web page depth; update
PageRank vs quality


Examples from classic studies

Comparative publications over centuries
Number of journals founded over time
Number of abstracts published over time
National share of abstracts in chemistry
National scientific size vs. economy size
Bibliographic coupling and co-citation
Web structures, links


Examples of laws & methods

Lotka’s law
Bradford’s law
Zipf’s law
Impact factor
Citation structures
Co-citation structures


Alfred J. Lotka 1926

Statistics—the frequency distribution of scientific productivity

Purpose: to "determine, if possible, the part which men of different calibre contribute to the progress of science"

Looked at Chemical Abstracts Index, then Geschichtstafeln der Physik

J. Washington Acad. Sci. 16:317-325


Lotka’s law: $x^n \cdot y = C$

The total number of authors y in a given subject, each producing x publications, is inversely proportional to x raised to some power n.

Where:
  x = number of publications
  y = no. of authors credited with x publications
  n = constant (equals 2 for scientific subjects)
  C = constant

inverse square law of scientific productivity
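As a rough illustration of the formula, the sketch below solves the law for y. It assumes that C can be read off as the number of authors with exactly one publication (since x = 1 gives y = C); the function name and the figure of 100 single-paper authors are hypothetical.

```python
def lotka_expected_authors(single_paper_authors: float, x: int, n: float = 2.0) -> float:
    """Expected number of authors with exactly x publications, from x**n * y = C."""
    if x < 1:
        raise ValueError("x must be a positive integer")
    # With x = 1 the law gives y = C, so C is taken as the count of one-paper authors.
    return single_paper_authors / (x ** n)


# Hypothetical field with 100 authors who each published a single paper.
for x in range(1, 5):
    print(x, round(lotka_expected_authors(100, x), 1))
```

With n = 2 each doubling of x cuts the expected number of authors to a quarter, which is why the law is called an inverse square law.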


[Figure: Lotka's Law - scientific publications, $x^n \cdot y = C$; x-axis: 1 publ., 2 publ., 3 publ., 4 publ.; y-axis: No. of authors]


Samuel Clement Bradford 1934, 1948

Distribution of quantity vs yield of sources of information on specific subjects
  he studied journals as sources, but applicable to other sources
  what journals produce how many articles in a subject and how are they distributed? or
  how are articles in a subject scattered across journals?

Purpose: to develop a method for identification of the most productive journals in a subject & deal with what he called “documentary chaos”

First published in: Engineering (1934) 137:85-86, then in his book Documentation, (1948)


Bradford’s law

"If scientific journals are arranged in order of decreasing productivity of articles on a given subject, they may be divided into a nucleus of periodicals more particularly devoted to the subject and several groups or zones containing the same number of articles as the nucleus, when the numbers of periodicals in the nucleus and succeeding zones will be as $a : n : n^2 : n^3 \ldots$"


Bradford's Law of Scattering – an idealized example

No. of source journals   No. of articles per source   Total no. of articles   Zone totals
1                        60                           60
2                        35                           70                      3 journals, 130 articles
1                        30                           30
2                        25                           50
2                        9                            18
4                        8                            32                      9 journals, 130 articles
10                       6                            60
7                        5                            35
5                        4                            20
5                        3                            15                      27 journals, 130 articles
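The zone division in this idealized example can be sketched in a few lines. The code below is only one simple way to cut the ranked journal list into zones of roughly equal article yield, not Bradford's own procedure; the function name `bradford_zones` and the input layout are assumptions made for illustration, and the data are the figures from the example.

```python
from typing import List, Tuple


def bradford_zones(groups: List[Tuple[int, int]], n_zones: int = 3) -> List[Tuple[int, int]]:
    """Split journals, ranked by productivity, into zones of roughly equal article yield.

    groups: (number of journals, articles per journal) pairs.
    Returns one (journals in zone, articles in zone) pair per zone.
    """
    # One entry per journal, most productive first.
    per_journal = sorted(
        (articles for count, articles in groups for _ in range(count)),
        reverse=True,
    )
    target = sum(per_journal) / n_zones  # articles each zone should hold
    zones = []
    journals_in_zone = 0
    articles_in_zone = 0
    for articles in per_journal:
        journals_in_zone += 1
        articles_in_zone += articles
        # Close a zone once it reaches its share; the last zone takes the remainder.
        if articles_in_zone >= target and len(zones) < n_zones - 1:
            zones.append((journals_in_zone, articles_in_zone))
            journals_in_zone = 0
            articles_in_zone = 0
    zones.append((journals_in_zone, articles_in_zone))
    return zones


# (no. of source journals, no. of articles per source) from the idealized example
example = [(1, 60), (2, 35), (1, 30), (2, 25), (2, 9),
           (4, 8), (10, 6), (7, 5), (5, 4), (5, 3)]
zones = bradford_zones(example)
print(zones)  # [(3, 130), (9, 130), (27, 130)]
```

The journal counts 3 : 9 : 27 grow by a constant multiplier of 3 while each zone yields the same 130 articles, which is the pattern the law describes. Real data rarely divide this cleanly.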


Bradford's Law of Scattering – zones

[Figure: zones]
nucleus: 3 sources, 130 articles
next zone: 9 sources, 130 articles
next zone: 27 sources, 130 articles
Garfield hypothesis


George Kingsley Zipf 1935, 1949

The psycho-biology of language: an introduction to dynamic philology (1935)

Human behavior and the principle of least effort: An introduction to human ecology (1949)

Looked, among others, at frequency distributions of words in given texts
  counted distribution in James Joyce’s Ulysses

Provided an explanation as to why the found distributions happen:

Principle of least effort


Zipf’s law: r • f = c

Where:
  r = rank (in terms of frequency)
  f = frequency (no. of times the given word is used in the text)
  c = constant for the given text

For a given text the rank of a word multiplied by the frequency is a constant

Works well for high frequency words, not so well for low – thus a number of modifications
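A minimal sketch of how the rank-frequency table behind the law might be built for any text. The tokenizing rule (lowercase runs of letters and apostrophes) and the tiny sample sentence are assumptions for illustration; a text of realistic length is needed before r • f starts to look roughly constant.

```python
import re
from collections import Counter


def rank_frequency(text: str):
    """Return (rank, word, frequency, rank * frequency) rows, most frequent word first."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    rows = []
    # Words with equal frequency still get consecutive ranks here (a simplification).
    for rank, (word, freq) in enumerate(counts.most_common(), start=1):
        rows.append((rank, word, freq, rank * freq))
    return rows


sample = "the cat and the dog and the bird"
rows = rank_frequency(sample)
for row in rows:
    print(row)
```

Running the same function over a long text and comparing the last column across ranks is a quick way to see where the law holds and where it breaks down.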


Charles F. Gosnell 1944 Obsolescence

He studied obsolescence of books in academic libraries via their use

• College Res. Libr. (1944) 5:115-125

But this was extended to study of articles via citations, and other sources

Age of citations in articles in a subject:
  half-life – half of the citations are x years old etc.

different subjects have very different half-lives
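One way to make the half-life idea concrete is to take the ages of all references cited in a subject and find the age that covers half of them. The sketch below assumes this median-style reading (the smallest age x such that at least half of the citations are x years old or younger); the function name and the sample ages are hypothetical.

```python
import math
from typing import Iterable


def citation_half_life(citation_ages: Iterable[int]) -> int:
    """Smallest age x (in years) such that at least half of the citations are x years old or younger."""
    ages = sorted(citation_ages)
    if not ages:
        raise ValueError("need at least one citation age")
    # Position of the citation that completes the younger half of the sorted list.
    index = math.ceil(len(ages) / 2) - 1
    return ages[index]


# Hypothetical ages (years between publication and citation) for ten references.
ages = [1, 1, 2, 2, 3, 4, 6, 9, 12, 20]
print(citation_half_life(ages))  # 3
```

Comparing this figure across subjects is what shows the differences in half-lives mentioned above.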


Curve of obsolescence

[Figure: curve of obsolescence; y-axis: Number of users; x-axis: Age at time of use]


Eugene Garfield 1955

Focused on scientific & scholarly communication based on citations

• Science (1955) 122:108-111

Founded Institute for Scientific Information (ISI)
  major product now ISI Web of Knowledge

Impact factor for journals, based on how much a journal is cited

Mapping of a literature in a subject
Citation indexes/web of knowledge
  MAJOR resources in bibliometric studies


Citation matrix

[Diagram: one article, three cited articles, and seven citing articles]


Science Citation Index

Association-of-ideas index

[Diagram: one article, three cited articles, and seven citing articles]


Co-citation analysis

Articles that cite the same article are likely to both be of interest to the reader of the cited article

[Diagram: one article and two citing articles]

These two articles are likely to be related
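Both kinds of relatedness can be counted from a small citation table. The sketch below is illustrative only: the article labels and reference sets are hypothetical. In common bibliometric usage, counting the references shared by two citing articles is called bibliographic coupling, while co-citation counts how often two articles are cited together by the same later article, so the code computes both.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Set, Tuple

# Hypothetical data: each citing article mapped to the set of articles it cites.
references: Dict[str, Set[str]] = {
    "A": {"X", "Y"},
    "B": {"X", "Y", "Z"},
    "C": {"Z"},
}


def shared_references(refs: Dict[str, Set[str]]) -> Dict[Tuple[str, str], int]:
    """For each pair of citing articles, count the references they have in common."""
    counts = {}
    for a, b in combinations(sorted(refs), 2):
        common = len(refs[a] & refs[b])
        if common:
            counts[(a, b)] = common
    return counts


def co_citations(refs: Dict[str, Set[str]]) -> Dict[Tuple[str, str], int]:
    """For each pair of cited articles, count the citing articles that cite both."""
    counts: Counter = Counter()
    for cited in refs.values():
        for pair in combinations(sorted(cited), 2):
            counts[pair] += 1
    return dict(counts)


print(shared_references(references))  # {('A', 'B'): 2, ('B', 'C'): 1}
print(co_citations(references))       # {('X', 'Y'): 2, ('X', 'Z'): 1, ('Y', 'Z'): 1}
```

Higher counts in either table suggest a closer relationship between the two articles, which is the basis for the maps and networks mentioned earlier.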


Impact factor (IF)

number of citations received in current year by papers published in the journal in the previous two years
divided by
number of papers published in the journal in the previous two years

IF has become over time a crucial indicator of journal quality and given ISI a monopoly position in the evaluation of journal quality

Reported in Journal Citation Reports (1976-)
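The definition is a single division, shown here as a small worked example. The journal figures (300 citations, 120 papers) are hypothetical and the function name is chosen only for illustration.

```python
def impact_factor(citations_this_year: int, papers_previous_two_years: int) -> float:
    """Citations received this year to papers from the previous two years, per such paper."""
    if papers_previous_two_years <= 0:
        raise ValueError("the journal must have published papers in the previous two years")
    return citations_this_year / papers_previous_two_years


# Hypothetical journal: 120 papers over the previous two years drew 300 citations this year.
print(impact_factor(300, 120))  # 2.5
```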


Garfield’s HistCite

“Bibliographic Analysis and Visualization Software”

Provides citation statistics & graphs for people, journals, institutions …
  various citation scores, no. of cited references in articles …
  various graphs with connections

Example: articles and authors for JASIST (and predecessor names) for 1956-2004
  includes citations to authors


Conclusion

Bibliometrics, & related scientometrics, infometrics, webmetrics provide insight into a number of properties of information objects
  some general, predictive “laws” formulated
  structures have been exposed, graphed
  myriad data collected & analyzed

A good area for research!


Sources used in making this presentation – among others

Ruth Palmquist: Bibliometrics
Donna Bair-Mundy: Boolean, bibliometrics, and beyond
J. Downie: Short set of bibliometric exercises
  http://people.lis.uiuc.edu/~jdownie/biblio/