Citation Graph Analysis to Identify Memes in Scientific Literature Tobias Kuhn and Matjaz Perc and Dirk Helbing http://www.tkuhn.ch @txkuhn ETH Zurich Quid Inc. 11 June 2014

Page 1: Citation Graph Analysis to Identify Memes in Scientific Literature

Citation Graph Analysis to Identify Memes in Scientific Literature

Tobias Kuhn and Matjaz Perc and Dirk Helbing

http://www.tkuhn.ch

@txkuhn

ETH Zurich

Quid Inc.

11 June 2014

Page 2: Citation Graph Analysis to Identify Memes in Scientific Literature

Citation Graph of Scientific Publications

Nodes: publications

Edges: citations (in gray)

Tobias Kuhn, ETH Zurich Citation Graph Analysis to Identify Memes in Scientific Literature 2 / 21

Page 3: Citation Graph Analysis to Identify Memes in Scientific Literature

Citation Graph of Scientific Publications

Nodes: publications

Edges: citations (in gray)

Legend:

• Natural/Agricultural Sciences (except Physical Sciences)

• Physical Sciences

• Engineering and Technology

• Medical and Health Sciences

• Social Sciences / Humanities

Page 5: Citation Graph Analysis to Identify Memes in Scientific Literature

Citation Graph of Scientific Publications

Entire giant component (33 million nodes) of the citation graph of Thomson Reuters' Web of Science dataset.

Legend:

• Natural/Agricultural Sciences (except Physical Sciences)

• Physical Sciences

• Engineering and Technology

• Medical and Health Sciences

• Social Sciences / Humanities

Page 6: Citation Graph Analysis to Identify Memes in Scientific Literature

Citation Graph: American Physical Society

Citation graph of the Physical Review journals (463k nodes).

Legend:

• A: Atomic, molecular, optical phys.

• B: Condensed matter, materials phys.

• C: Nuclear phys.

• D: Particles, fields, gravitation, cosmology

• E: Statistical, nonlinear, soft matter phys.

• other journals

Page 7: Citation Graph Analysis to Identify Memes in Scientific Literature

Citation Graph: Memes

Specific phrases or “memes” localize to specific regions in the citation graph.

Legend:

• quantum

• fission

• graphene

• self-organized criticality

• traffic flow

Page 8: Citation Graph Analysis to Identify Memes in Scientific Literature

Scientific Memes

“Meme” was coined by Richard Dawkins:

“Just as genes propagate themselves in the gene pool by leaping from body to body via sperm or eggs, so memes propagate themselves in the meme pool by leaping from brain to brain via a process which, in the broad sense, can be called imitation.” [Dawkins, The Selfish Gene]

Examples of memes:

• Melodies

• Recipes

• Cultural habits

• Scientific concepts

Page 9: Citation Graph Analysis to Identify Memes in Scientific Literature

Genes/Memes as Network Patterns!

Dawkins’ Definition of “Gene”:

“I am using the word gene to mean a genetic unit that is small enough to last for a number of generations and to be distributed around in many copies.” [Dawkins, The Selfish Gene]

Our Working Definition of “Scientific Meme”:

A scientific meme is a short unit of text in a publication that is replicated in citing publications and thereby distributed around in many copies.

Page 10: Citation Graph Analysis to Identify Memes in Scientific Literature

Propagation Score

Propagation score P quantifies the degree to which a meme’s occurrence aligns with the citation graph:

\[
P_m = \frac{\text{sticking factor}}{\text{sparking factor}}
    = \frac{d_{m \to m}}{d_{\to m}} \bigg/ \frac{d_{m \to \not{m}}}{d_{\to \not{m}}}
\]

To keep infrequent phrases from getting a high propagation score by chance, we add a small amount of controlled noise δ (we use δ = 3):

\[
P_m = \frac{d_{m \to m}}{d_{\to m} + \delta} \bigg/ \frac{d_{m \to \not{m}} + \delta}{d_{\to \not{m}} + \delta}
\]
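The propagation score with controlled noise δ can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation. It assumes the subscripts are read as follows: d(m→m) counts publications that carry the meme and cite at least one publication carrying it, d(→m) counts all publications citing at least one meme-carrying publication, and the two "not m" counts are the same quantities for publications that cite no meme-carrying publication. The function name, the `cites` dictionary and the toy graph are hypothetical.

```python
def propagation_score(cites, carriers, delta=3.0):
    """Sketch of P_m on a toy citation graph.

    cites    -- dict: publication id -> list of cited publication ids (hypothetical layout)
    carriers -- set of publication ids whose text contains the meme m
    delta    -- controlled noise term (the slides use delta = 3)
    """
    d_m_to_m = d_to_m = d_m_to_notm = d_to_notm = 0
    for pub, refs in cites.items():
        has_meme = pub in carriers
        cites_meme = any(ref in carriers for ref in refs)
        if cites_meme:
            d_to_m += 1              # cites at least one meme-carrying publication
            d_m_to_m += has_meme     # ... and carries the meme itself (sticking)
        else:
            d_to_notm += 1           # cites no meme-carrying publication
            d_m_to_notm += has_meme  # ... but carries the meme anyway (sparking)
    sticking = d_m_to_m / (d_to_m + delta)
    sparking = (d_m_to_notm + delta) / (d_to_notm + delta)
    return sticking / sparking


# Toy graph with invented data: a, b, c carry the meme; d and e do not.
toy_cites = {"a": [], "b": ["a"], "c": ["a", "b"], "d": ["b"], "e": []}
toy_carriers = {"a", "b", "c"}
score = propagation_score(toy_cites, toy_carriers, delta=3.0)
```

With δ = 0 the ratio can divide by zero whenever a count is empty, which is one practical reason, besides the noise argument, to keep δ positive in a sketch like this.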

Page 11: Citation Graph Analysis to Identify Memes in Scientific Literature

Frequency/Propagation Score for APS Data

[Figure: density of n-grams (color scale 10^0 to 10^5) plotted by relative frequency (10^-6 to 10^0) against propagation score (10^-2 to 10^6) for the APS dataset, n = 1,372,365. Labeled memes: quantum, fission, graphene, self-organized criticality, traffic flow.]

Page 12: Citation Graph Analysis to Identify Memes in Scientific Literature

Randomized Network

[Figure: density of n-grams (color scale 10^0 to 10^5) plotted by relative frequency (10^-6 to 10^0) against propagation score (10^-2 to 10^6) for the randomized (time-preserving) APS network, n = 89,356.]

Page 13: Citation Graph Analysis to Identify Memes in Scientific Literature

Meme Score

Meme score M as the product of relative frequency f and propagation score P:

\[
M_m = f_m P_m
\]

Top 20 Memes:

1. loop quantum cosmology +*
2. unparticle +*
3. sonoluminescence +*
4. MgB2 +
5. stochastic resonance +*
6. carbon nanotubes +*
7. NbSe3 +
8. black hole +*
9. nanotubes +
10. lattice Boltzmann +*
11. dark energy +*
12. Rashba
13. CuGeO3 +
14. strange nonchaotic
15. in NbSe3
16. spin Hall +
17. elliptic flow +*
18. quantum Hall +*
19. CeCoIn5 +
20. inflation +

+ annotators agreed that this is an interesting and important physics concept

* also found on the list of terms extracted from Wikipedia
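As a rough illustration of how the product of relative frequency and propagation score ranks phrases, the sketch below compares a very frequent phrase that does not propagate with a rarer one that does. It assumes f is the share of publications containing the phrase. All counts and propagation scores are invented for the example and are not taken from the APS data.

```python
def meme_score(n_carriers, n_total, propagation):
    """M_m = f_m * P_m, with f_m taken as the share of publications carrying m."""
    return (n_carriers / n_total) * propagation


# Hypothetical numbers, chosen only to illustrate the ranking effect.
candidates = {
    "of the": meme_score(200_000, 400_000, 1.0),     # frequent, no inheritance signal
    "graphene": meme_score(4_000, 400_000, 80.0),    # rare, strongly inherited
}
ranked = sorted(candidates, key=candidates.get, reverse=True)
```

In this toy setting the rarer phrase ranks first, because its propagation score outweighs its low frequency.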

Page 14: Citation Graph Analysis to Identify Memes in Scientific Literature

Properties of the Meme Score

The meme score has a number of nice properties:

• Can be calculated efficiently and exhaustively even on very large datasets

• No upper limit on the length of n-grams

• No dependence on external linguistic or ontological knowledge

• No stop-word lists or other kinds of arbitrary filters or thresholds

Page 15: Citation Graph Analysis to Identify Memes in Scientific Literature

Manual Annotation

• Two annotators (A1, A2): PhD students with a physics degree

• Annotation with respect to (1) physics concept or not and (2) linguistic category

• Randomly extracted phrases for comparison

[Figure: annotation results from A1 and A2 for terms selected by meme score, at random, and by weighted random selection (axis: terms, 30 to 150). Categories: physics concept / not a physics concept; noun phrase, verb, adjective or adverb, other.]

Page 16: Citation Graph Analysis to Identify Memes in Scientific Literature

Comparison to Alternative Metrics

[Figure: area under curve A (axis 0 to 0.5) for the meme score compared with frequency, max. absolute change over time, max. relative change over time, max. absolute difference across journals, and max. relative difference across journals.]

[Figure: percentage of Wikipedia terms (0 to 100) among the top x terms by meme score (x from 10^1 to 10^3).]

40% of top 50 terms are found on Wikipedia list

Page 17: Citation Graph Analysis to Identify Memes in Scientific Literature

Evolution over Time: Exemplary Memes

[Figure: meme score (δ = 1, axis 0 to 14) against publication count (0.5 to 4.5 × 10^5, with the corresponding years 1940 to 2008 marked) for the memes quantum, fission, graphene, self-organized criticality, and traffic flow.]

Page 18: Citation Graph Analysis to Identify Memes in Scientific Literature

Evolution over Time

[Figure: meme score (axis 0 to 12) against publication count (0.5 to 4.5 × 10^5, with the corresponding years 1940 to 2008 marked). Labeled memes: graphene, entanglement, MgB2, nanotubes, carbon nanotubes, quark, neutrino, Bose-Einstein, quantum Hall, black, C60, Hubbard model, quantum wells, graphite, reactions, photoemission, black hole, tricritical, Kondo, superconducting, fission, MeV, diffuse scattering.]

Page 19: Citation Graph Analysis to Identify Memes in Scientific Literature

Meme Score Calculation

1 Collect all phrases that stick at least once (not counting “free-riding” on larger memes)

2 Calculate sticking and sparking factors for all collected phrases, with \(M_m = f_m P_m\) and

\[
P_m = \frac{\text{sticking factor}}{\text{sparking factor}}
    = \frac{d_{m \to m}}{d_{\to m} + \delta} \bigg/ \frac{d_{m \to \not{m}} + \delta}{d_{\to \not{m}} + \delta}
\]

Example

Citing title: covariant effective action for loop quantum cosmology from order reduction

Cited titles:
– quantum nature of the big bang
– absence of a singularity in loop quantum cosmology
– large scale effective theory for cosmological bounces

Sticking phrases: loop quantum cosmology, quantum, effective, for

Sparking phrases: covariant, covariant effective action, order reduction, ...
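The example above can be reproduced with a short Python sketch. It rests on one assumption about what “free-riding” means: for each cited title, only the longest word sequences shared with the citing title count as sticking, so that "loop" or "quantum cosmology" do not stick through the title that contains "loop quantum cosmology". A phrase sparks if it occurs in the citing title and in none of the cited titles. The function names are hypothetical, and the actual procedure may differ in detail.

```python
def ngram_spans(words):
    """Map every (start, end) span of a title to its word n-gram."""
    return {(i, j): tuple(words[i:j])
            for i in range(len(words))
            for j in range(i + 1, len(words) + 1)}


def sticking_and_sparking(citing_title, cited_titles):
    citing = citing_title.split()
    citing_spans = ngram_spans(citing)
    cited_sets = [set(ngram_spans(t.split()).values()) for t in cited_titles]

    sticking = set()
    for cited in cited_sets:
        # Spans of the citing title that also occur in this cited title.
        common = [span for span, gram in citing_spans.items() if gram in cited]
        for (i, j) in common:
            # Assumption: a shared span nested inside a longer shared span
            # is "free-riding" and is skipped.
            nested = any(a <= i and j <= b and (a, b) != (i, j)
                         for (a, b) in common)
            if not nested:
                sticking.add(" ".join(citing[i:j]))

    # Sparking: n-grams of the citing title found in none of the cited titles.
    all_cited = set().union(*cited_sets)
    sparking = {" ".join(gram) for gram in citing_spans.values()
                if gram not in all_cited}
    return sticking, sparking


sticking, sparking = sticking_and_sparking(
    "covariant effective action for loop quantum cosmology from order reduction",
    ["quantum nature of the big bang",
     "absence of a singularity in loop quantum cosmology",
     "large scale effective theory for cosmological bounces"],
)
```

Here "quantum" still sticks on its own because it is the longest overlap with the first cited title, which matches the list of sticking phrases on the slide.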

Page 20: Citation Graph Analysis to Identify Memes in Scientific Literature

Conclusions

Inheritance patterns of memes in the scientific citation graph reveal a simple mathematical regularity.

This regularity can be formalized by the meme score.

The meme score allows memes to be studied exhaustively.

Page 21: Citation Graph Analysis to Identify Memes in Scientific Literature

Thank you for your Attention!

Twitter: @txkuhn

Pre-print article: http://arxiv.org/abs/1404.3757
