Déjà vu – A study of Plagiarism and Duplication in Medline

McDermott Center for Human Growth and Development

Division of Translational Research

UT Southwestern

The Déjà vu team (Mounir Errami, Tara Long, Angela George, Johnny Sun, Justin Hicks, David Trusty, Jonathan Wren and Skip Garner), the lab and all those that have contributed

eTBLAST – electronic Text Basic Local Alignment and Similarity Tool

eTBLAST is an alternative to PubMed and other text databases, using full text similarity search rather than keyword searches to get better results.

What is eTBLAST?

• eTBLAST is a new search engine available on the web for free. The major difference between other keyword-based search engines and eTBLAST is that eTBLAST takes text (paragraph, etc.) as input without the user having to select keywords for their search, and then it returns the records in the target database that are most similar, not just the ones that contain the keywords.

• eTBLAST will compare your input text to literature databases such as Medline, NIH CRISP, Institute of Physics, NASA, or custom databases (on request) to provide a list of the most similar literature.

• The input for eTBLAST is a set of text (a paper title/abstract, a grant title/abstract, a text description of a study section, an invention disclosure, a description of a legal case), and the output is a user-friendly set of web pages.

• eTBLAST can help you:

– Find references, collaborators and competitors

– Evaluate the novelty and popularity (timeliness) of a paper, proposal, or invention disclosure

– Find experts as potential reviewers for papers/grant applications or legal cases

– Find the best Journals in which to publish your findings or the best NIH institutes in which to submit your grant applications

– Find plagiarism and duplicate articles (self-plagiarism)

• We also can do custom comparisons on proprietary data to compile collaborative networks (help people find people), help organize and balance grant portfolios or scientific meetings
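
The idea described on this slide, free text in and the most similar records out, can be sketched in a few lines. This is an illustrative bag-of-words cosine similarity, not eTBLAST's actual algorithm, which these slides do not describe. The function names, the tokenizer, and the tiny in-memory "database" are assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query, records, top_n=3):
    """Rank records (id -> text) by similarity to the free-text query."""
    query_vec = Counter(tokenize(query))
    scored = [
        (cosine_similarity(query_vec, Counter(tokenize(text))), record_id)
        for record_id, text in records.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [(record_id, score) for score, record_id in scored[:top_n]]


# Hypothetical toy database; a real target would be Medline abstracts.
records = {
    "A": "Telomere dysfunction and genomic instability in breast tumors",
    "B": "Keyword search strategies for literature databases",
    "C": "Telomere fusions detected in breast tumor tissue",
}
ranking = most_similar("telomere fusions in breast tumor tissue", records)
```

Unlike a keyword search, every record receives a score, so records that share only part of the query's vocabulary still appear in the ranking, just lower down.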

eTBLAST has a Google-like interface.

Paste your text in here

Select database to search

And search

Where is eTBLAST?

• It is easy to find: just Google “eTBLAST”

• It is also linked from Wikipedia

• Or just go to http://invention.swmed.edu/etblast/etblast.shtml

An example!

• Here we present an example of what you can do on the web. Here we use an abstract for a grant supported by the Komen Foundation (2005-2006, David Gilley, PI, Indiana University).

• We then find the most similar related papers in Medline, and then continue on with the analysis to find recommended reviewers, journals and publication history for the topic.

Detection and Analysis of Telomere Dysfunction in Breast Tumor Tissue. Loss of telomere function is known to result in telomere fusions causing genomic instability via breakage-fusion-bridge cycles during subsequent cell cycles. The goal of this proposal is to establish the extent of telomere dysfunction during breast tumorigenesis. Several recent reports support this hypothesis, yet the extent of telomere dysfunction in human breast cancer has not been directly determined. The research proposed here is innovative because we have developed a method to directly detect and analyze telomere dysfunction from tumor tissue. Using this innovation, we have generated critical preliminary evidence that: 1) breast tumors, but not normal tissue, contain telomere fusions; and 2) telomere fusion junctions contain relatively short fragments of non-telomeric, previously identified fragile DNA regions. Objective/Hypothesis: The objective of this application is to directly determine the extent of telomere dysfunction during breast tumorigenesis. The central hypothesis of this proposal is that telomere dysfunction is one of the key driving forces behind the genomic instability observed in early breast lesions. We propose that telomere capping is disrupted in a small subset of normal breast epithelial cells. This loss of telomere function then results in telomere fusions via recombinational mechanisms, causing genomic instability via breakage-fusion-bridge cycles during subsequent cell cycles. Specific Aim: Here we directly test the hypothesis that telomere dysfunction is an important cause of genomic instability in breast cancer by the following specific aim: 1) determine the extent of telomere fusions in breast tumor tissue. Study Design: We will use our recently developed method to detect and analyze telomere fusions from isolated genomic DNA. 
Our preliminary results indicate that telomere fusion junctions contain relatively short fragments of non-telomeric, previously identified fragile DNA regions within the telomere-to-telomere fusion junctions. This finding provides us with critical clues regarding possible mechanisms responsible for the formation of these fusion junctions, which are likely formed via recombinational modes. Additionally, using our telomere fusion detection method, we have found three specific classes of fusion junctions that occur both in a human mammary epithelial model system and breast tumor tissue. We will test the stage during tumorigenesis in which fusions occur and the prevalence of telomere dysfunction in multiple breast tumor samples to determine the prognostic significance of these findings. In addition, we will determine whether only specific or any chromosome can form telomere fusions in tumor tissue samples. Potential Outcomes and Benefits of the Research: These studies will be critical for further understanding of the cause and extent of telomere dysfunction in breast cancer. Notably, from our preliminary results, we expect the occurrence of telomere dysfunction to be an early event in the development of breast cancer. Therefore, translational applications for early detection and treatment are expected to be highly relevant.

eTBLAST results are linked to the full abstract and other tools

Raw self-similarity score of query

Most similar record

Raw similarity score between query and this record

Z-Score: probability of this occurring by chance

Abstracts are presented with common keywords highlighted.

Those that publish the most in the area are experts, potential reviewers, collaborators, competitors

Note that David Gilley, the PI on the grant used as the query, was identified as one of the experts (number 22), having published 2 papers in the area, the last in 2005.
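
The expert ranking on this slide amounts to counting how often each author appears among the most similar records. A minimal sketch, assuming each hit is a dictionary with hypothetical `authors` and `year` fields (the real eTBLAST output format is not shown in these slides):

```python
from collections import defaultdict


def rank_experts(hits):
    """Return (author, paper_count, last_year) rows, most prolific first.

    Ties are broken by most recent publication year, then by name.
    """
    counts = defaultdict(int)
    last_year = defaultdict(int)
    for hit in hits:
        for author in hit["authors"]:
            counts[author] += 1
            last_year[author] = max(last_year[author], hit["year"])
    return sorted(
        ((author, counts[author], last_year[author]) for author in counts),
        key=lambda row: (-row[1], -row[2], row[0]),
    )


# Hypothetical hits; the author names are placeholders, not real results.
hits = [
    {"authors": ["Author A", "Author B"], "year": 2003},
    {"authors": ["Author A"], "year": 2005},
    {"authors": ["Author C"], "year": 2004},
]
experts = rank_experts(hits)
```

The same counting, applied to a journal field instead of authors, would give the list of candidate journals for submission.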

Journals where many similar manuscripts have been published are targets for submission

A historical analysis will tell you if your work is timely

eTBLAST has more

• There is other functionality, please explore.

• Also, there are new features always being worked on. If you have suggestions, we will try to get them in.

Once we have this text similarity tool working well, there are other things that it can allow us to do:

Become an ethics detective!

Motives for Scientific Misconduct can drive inappropriate behavior.

• Funding and career pressures of the contemporary research environment.

• Inadequate institutional oversight.

• Inappropriate forms of collaborative arrangements between academic scientists and commercial firms.

• Inadequate training in the methods and traditions of science.

• The increasing scale and complexity of the research environment, leading to the erosion of peer review, mentorship, and educational processes in science.

• The possibility that misconduct in science is an expression of a broader social pattern of deviation from traditional norms.

National Academy of Sciences 1992

Violations of Accepted Practices have been estimated by surveys.

• Faking research data- 0.3%

• Plagiarism- 1.4%

• Multiple publications of the same data- 4.7%

• Removing data- 6%

• Inappropriate inclusion of authors- 10%

• Changed a study design – 15%

• Inadequate record keeping- 27.5%

Anonymous questionnaire, sent to 8,000 with 3,234 respondents (Martinson et al. 2005)

But eTBLAST can directly identify putative plagiarism and duplicate publications (self-plagiarism) by measuring the similarity between a query and the records in a target database.

Known duplicate articles are tagged in Medline, and can be used as a training set

[Figure: two histograms of counts versus score, each showing a Rank 1 Score series and a Rank 2 Score series. Panel titles: "Rank 1 & Rank 2 Scores on 409 PubMed Duplicates" and "Rank 1 & Rank 2 Scores on 5200 PubMed Abstracts".]

Duplicate article scores and self-similar article scores are very similar, much more so than what is observed for the average distribution of randomly selected Medline articles

Duplicate publications are enriched in the raw score and ratio space

Using the synthetic random abstract distribution, we compute a z-score. First, we choose a cutoff at z=3, which should capture 95% of the duplicates. Second, we established a Rank 2/Rank 1 ratio cutoff at 0.56. Third, using the known Medline duplicates and these constraints, we can estimate the specificity and sensitivity.

[Figure: Rank 2 score (y-axis) versus Rank 2/Rank 1 ratio (x-axis, 0 to 1). Series: randomly selected abstracts, false positives, and confirmed duplicates from the random abstracts set. The Z=3 cutoff and the ratio threshold are marked.]
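
The two cutoffs on this slide can be combined into a simple flagging rule. The sketch below assumes the z-score is computed for the Rank 2 score against the mean and standard deviation of scores drawn from the random-abstract distribution. That detail, the function names, and the sample numbers are assumptions; the slide gives only the thresholds (z = 3 and a Rank 2/Rank 1 ratio of 0.56).

```python
from statistics import mean, stdev

Z_CUTOFF = 3.0       # z-score cutoff quoted on the slide
RATIO_CUTOFF = 0.56  # Rank 2 / Rank 1 ratio cutoff quoted on the slide


def z_score(score, background_scores):
    """Standard score of `score` relative to a background sample."""
    return (score - mean(background_scores)) / stdev(background_scores)


def is_putative_duplicate(rank1, rank2, background_scores):
    """Flag a query when both the z-score and the ratio exceed the cutoffs."""
    if rank1 <= 0:
        return False
    above_z = z_score(rank2, background_scores) > Z_CUTOFF
    above_ratio = rank2 / rank1 > RATIO_CUTOFF
    return above_z and above_ratio


def sensitivity_specificity(results):
    """Estimate both rates from (flagged, truly_duplicate) boolean pairs."""
    results = list(results)
    tp = sum(1 for flagged, truth in results if flagged and truth)
    fn = sum(1 for flagged, truth in results if not flagged and truth)
    tn = sum(1 for flagged, truth in results if not flagged and not truth)
    fp = sum(1 for flagged, truth in results if flagged and not truth)
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity


# Hypothetical background scores standing in for the random-abstract set.
background = [10, 12, 9, 11, 13, 10, 12, 11]
flag_close_match = is_putative_duplicate(100, 90, background)
flag_weak_match = is_putative_duplicate(100, 12, background)
flag_low_ratio = is_putative_duplicate(100, 40, background)
```

Requiring both conditions matters: a record can score far above the random background and still fall well short of the query's self-similarity score, as in the last example.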

And is the problem getting worse or better?

• To determine whether the problem of duplicate publications is getting better or worse, and whether it correlates with any other factor, such as the NIH budget, we conducted a study that spans the last 12 years.

• 5,000 Medline abstracts and titles were selected in each of the last 12 years and eTBLASTed against all of Medline.

• All putative duplicate pairs were verified by eye.

• The overall (inappropriate) duplicate rate appears to be constant and about 0.6% (58,300 in Medline). Plagiarism is about 0.04% (3,500 in Medline). Other forms of duplicate publications are also about 0.68%

[Figure: rate (%) by year, 1995 to 2007. Series: Duplicate/SA, Duplicate/DA, Duplicate/Update/(SJ+DJ), Duplicate/Other.]

The time lag between the appearance of the original and duplicate citation is informative

[Figure: frequency (count) versus time between publications (years, 0 to 12). Series: DUPLICATE/SA, DUPLICATE/DA, DUPLICATE/UPDATE/DJ+SJ, DUPLICATE/OTHER, TOTAL. Annotation: "Co-submission?"]

Similarity is obvious. If you are going to cheat, don’t be so lazy: change the title and abstract more

Are duplicate abstracts representative of duplicate articles?

Yellow represents exact text matches between this paper (G Schuller-Levis and E Park, Taurine and its chloramine: modulators of immunity, a mini-review, PMID 14992270) and a paper accepted 5 days after this one was submitted (PMID 14553911)

Another example, a paper by Shashank R. Joshi

• 95% of text is highly similar to 3 other articles

• Does not cite any of the original articles

• Editor-in-chief of journal in which article was published!

How many times can you publish the same thing?

There are multiple offenders

A Case of Multiple Duplicate Publication – M Shahrudin

[Figure: Shahrudin Article Similarity Analysis. Diagram linking Shahrudin articles to the articles they are highly similar to. Legend: Duplicate/DA, Duplicate/SA.]

Articles shown in the diagram:

Ann Surg 223(3), 273-9 (1996), PMID 8604907 *

Am Surg 59(11), 736-9 (1993), PMID 8239196 **

Hepatogastro 44, 559-63 (1997), PMID 9164537 **

J Hep Bil Surg 4, 205-8 (1997), not in Medline **

Int Surg 82(3), 269-74 (1997), PMID 9372373 *

Am Surg 61(2), 165-8 (1995), PMID 7856979 **

Hepatogastro 44, 519-21 (1997), PMID 9164529 **

J Hep Bil Surg 4, 209-11 (1997), not in Medline **

Hepatogastro 44, 441-4 (1997), PMID 9164516

Hepatogastro 44, 284-7 (1997), PMID 9058160 **

Ann Saudi Med 17(4), 460-1 (1997), PMID 17353603 - no abstract **

Med J Malaysia 49(2), 172-3 (1994), PMID 8090098

Int Surg 77(3), 219-23 (1992), PMID 1399374 **

Med J Malaysia 48(4), 449-52 (1993), PMID 8183172 **

Eur J Surg 158(4), 249-50 (1992), PMID 1352142 **

J Hep Bil Surg 3(3), 317-8 (1996), not in Medline **

Med J Malaysia 51(1), 159 (1996), PMID 10968002 - no abstract **

* duplicates discovered with eTBLAST search

** duplicates discovered with manual search

An exhaustive search is impractical

• Infeasible to compare all citations to all citations:

– Over 16 million citations: about 10^14 comparisons

• We noticed that, for 75% of our “duplicates”, the duplicate was the first article in PubMed’s related articles list

• However, binary comparison of all Medline related articles is doable:

– ~1.6 x 10^6 comparisons

– 2 weeks on our 240-processor machine
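
The scale argument on this slide is simple arithmetic. The sketch below counts the unordered pairs among 16 million citations and compares that with the restricted comparison count quoted on the slide. The variable names are illustrative, and "16 million" is used as a round figure for "over 16 million".

```python
def all_pairs(n):
    """Number of unordered pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2


citations = 16_000_000   # "over 16 million citations" (round figure)
restricted = 1_600_000   # ~1.6 x 10^6 comparisons, as quoted on the slide

exhaustive = all_pairs(citations)
reduction = exhaustive / restricted

print(f"exhaustive: {exhaustive:.2e} comparisons")
print(f"restricted: {restricted:.2e} comparisons")
print(f"reduction factor: {reduction:.1e}")
```

The exhaustive count is on the order of 10^14, which is why restricting the search to related-article pairs makes the problem tractable.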

Database vs.

Our binary comparison of related articles has been completed

• 990,909,351 - Total pairs of related articles

• 7,357,473 - Pairs of the top related articles

• 7,064,721 – Pairs calculated successfully

• 257,808 – Unsuccessful pairs (no abstracts)

• Over 70,000 - Non-redundant Pairs loaded to Déjà vu

Access the Déjà vu database at:

spore.swmed.edu/dejavu

We are continuously updating the content via additional computer similarity searches, and, through manual inspection of the full text of highly similar citations, we are moving entries from “unverified” to other categories.

Now, on a much larger set, we can start to correlate this with other factors – NIH budget?

[Figure: "All duplicates". Count by year. Series: earlier papers, later papers.]

[Figure: "Cost of Scientific Productivity". Inflation-adjusted NIH budget (x $100,000) and PMIDs/year, by year, 1938 to 2004.]

And duplicates with the same authors

[Figure: "Duplicates with shared authors". Count by year. Series: earlier papers, later papers.]

Duplicates with differing authors

[Figure: "Duplicate with different authors". Count by year. Series: earlier papers, later papers.]

From Medline tags we can look at the languages involved in the duplications

[Figure: "Duplicates - Languages". Counts by language (por, dut, cze, hun, pol, spa, dan, ita, rus, chi, fre, ger, jpn, eng). Series: earlier paper, later paper.]

… and we can look at the countries involved in the duplications

[Figure: "Duplicates - Published Country". Count by country. Series: earlier paper, later paper.]

[Figure: "Duplicates - Same language". Count: same language 60,695; different language 10,661.]

With this dataset, a quantitative study of publication ethics can be done

• Via a survey of Déjà vu users

• Via in-depth analysis of special categories

– Duplicate/SA: Duplication from one’s own previous work (1216 entries)

– Duplicate/DA: Duplication of another author’s work (130 entries where full text has been manually analyzed)

– Questionnaire sent out to all stakeholders (authors and journal editors) for all verified Duplicate/DAs

• Via in-depth analysis of specific biomedical disciplines

• Collaboration with Dr. Loadsman, Senior Staff Specialist Anesthetist at Royal Prince Alfred Hospital in Camperdown, Australia

Responses to our on-line survey for Déjà vu users

• 203 responses as of April

• 3.4% (7/203) of respondents admitted to publishing Duplicate/SA

– In agreement with the 2005 study (Martinson et al.): 4.7%

• 9% were unaware that many journals have copyright limitations on the reuse of text and figures

• “What percent of text among two papers do you think would need to be highly similar in order to qualify as duplication?”

– Average response: 33%, standard deviation: 21%, range of 1% to 95%

– Average text similarity among Déjà vu Duplicate/DAs: 85%

The Questionnaire

• Sent out via email to all authors and editors of both publications, asking questions like:

– Were you aware of the later article? Is there an explanation for the similarity? Was the earlier article copyrighted? Were permissions requested/given to re-use the material?

– What do you think of the frequency of duplicate publication/plagiarism? - “I think our awareness of such (and the ability to detect occurrences) has increased, while our tolerance for such has decreased.”

• We have had a remarkable response rate to the questionnaire: in 83% of cases, at least one person has responded. It is usually the authors of the earlier article who respond with a mix of surprise and astonishment, and who generally feel offended and violated. 36% of the respondents were editors of the journal in which the duplicate appeared.

• The questionnaire has so far resulted in 9 retractions (16%) and 19 investigations are pending (48%). (We have also had 5 Duplicate/SA retractions)

• In 2007, Medline retracted 112 articles (953 total)

– 25 - duplicate publication (SA)

– 16 - plagiarism (DA)

– Many more attributed to falsification of data

• In Déjà vu, we have identified by computer 8,439 highly similar citations with different authors, of which we have manually verified 131 by inspecting full text. The questionnaire has been sent to only about a third of the 131 so far. We anticipate sending out thousands.

Altogether, our data have led us to ponder such questions as:

1) Are there too many journals?,

2) Are there too many journals indexed in Medline (and therefore receive equal access in response to searches)?,

3) What are the criteria for being indexed in Medline, or better yet, what would it take to be removed?,

4) Are there too many review articles?,

5) What is the value of authors assigning copyright to journals if journals do not enforce them?,

6) Is the pressure to publish really distorting the real purpose of publication?,

7) How has open access affected these behaviors?,

8) Should a new class of publication be created called an ‘update’ where additional material can be contributed by an author to a previous publication, while still getting credit for the advance without having to restate and republish a large fraction of something previous?,

9) What is the general response to all involved, and what actions are likely, justified, fair and relevant?,

10) Are publication behaviors linked to other ethically questionable behaviors?,

11) What constitutes a retraction?, and most important

12) How often does a clinician unknowingly base a patient diagnosis or therapy upon a plagiarized or otherwise questionable paper, and how does this affect patient care?

Sampling of responses from authors of early paper

• “[My] major concern is that false data will lead to changes in surgical practice regarding procedures.” (#3271)

• “Imitation is the sincerest form of flattery?” (#3179)

• “We would have been happy to share our knowledge and experience on the field with the [duplicate] authors if they would have contacted us in advance of their publication.” (#28461)

• “I have been a research scientist for more than 50 years, and this is the first time I’ve ever experienced such a blatant case of plagiarism. It sure was an eye opener!” (#5296)

• “Haven't really ever seen something like this, including the fact that they seemingly worked with HLA-A2 (!!) expressing monkeys. Must make a hell of a tool for future studies...” (#7719)

• “We were very sorry and somewhat surprised when we found their article. I don't want to accept them as scientists.” (#47257)

• “I have no statement. I cannot prove that this is plagiarism. Even if it is, what can be done?” (#505)

• “Plagiarism is not to be tolerated. It is especially reprehensible when committed by a physician, to whom society has placed great trust as a steward of medical knowledge and patient care.” (#39716)

Sampling of responses from authors of the duplicate paper

• “There are probably only "x" amount of word combinations that could lead to "y" amount of statements . … I have no idea why the pieces are similar, except that I am sure I don not have a good enough memory and  it is certainly not photographic, to have allowed me to have "copied" his piece. … I did in fact review it [the original article] for whatever journal it was published in. …” (#46778)

• “Our main goal was to spread the knowledge into the local investigation community, so it was published in a local journal as a review article.” (#3179)

• “It was a joke, a bad game, an unconscious bet between friends, ten years ago that such things could happened. I deeply regret.” (#53207, author has 6 Duplicate/DAs in Déjà Vu, Vice President of the National Ethics Committee of Gabon, Still publishing- last article came out in Mar 2008)

• “… I had not intention to duplicate ideas, but only to divide previous experience of masters in nephrostomy…Some phrases were similar to that of the previous article written 4 years before by Chiou, Chang and Horan about 37 patients, in fact that article inspired our work. And of that previous work we divided the same ideas and conclusions. I seize the opportunity to congratulate to Chiou, Chang and Horan for their previous and fundamental paper….” (#4671)

• “I would like to offer my apology to the authors of the original paper for not seeking the permission for using some part of their paper. I was not aware of the fact I am required to take such permission.” (#73845)

• “To underline my good faith and my professionalism, I would like to stress that my contribution to the article was limited to the collection of clinical data: [the senior author] alone was responsible for the use of the data provided.” (#25648)

• “I know my careless mistake resulted in a severe ethical issue. I am really disappointed with myself as a researcher.” (#19592)

Sampling of responses from editors of journal that published earlier paper

• “It's my understanding that copying someone else's description virtually word-for-word, as these authors have done, is considered a compliment to the person whose words were copied.” (#23513)

• “This is not very important… Like many such, it is simply a "me-too" of little importance and no novelty.” (#30467)

• “The article by WT Huang and the article by HS Wu and colleagues are the same patients, the figures are the same, and the writing is blatant plagiarism. One of these papers is a false publication. We cannot let this one go unaddressed.” (#66981)

Sampling of responses from editors of journal that published the later paper

• “Looks like Mavoungou did it again in 2001. This example is a bit more embarrassing because the author of the original paper is Jorg Eichberg, editor of the journal where Mavoungou published the copied work. Looks like we will have to publish 2 retractions.” (#7719)

• “I really appreciate your work and your e-mail has promoted us to exercise more strict control over duplicate publication.” (#508)

• “Believe me, the data in any paper is the responsibility of the authors and not the journal.” (#5296)

Future Plans for eTBLAST and Déjà vu

• eTBLAST is now running on 2 redundant systems. V2.0 coming soon – Faster, more post processors

• 2 staff members, another faculty and myself are inspecting and classifying all duplicate pairs into various categories….

• Search all of Medline, IOP and NASA for duplicate publications to identify all instances of plagiarism/duplicate publication. Also search NIH CRISP, Komen, DoD grant database for duplicate funding.

• Transmit findings to Medline so they can tag new duplicates.

• Establish a more secure service for journal editors, to identify duplicates when they are submitted. eTBLAST already provides a FLAG on the output if the query is similar to a target document such that the Rank 2 score and Rank 2/Rank 1 ratio are above our established thresholds.

• We know that eTBLAST is used by numerous editors, but we hope that the availability of the service and the exhaustive identification of all plagiarized and duplicate articles would act as a deterrent, ultimately improving the integrity of scientific literature.

Acknowledgements

I wish to gratefully acknowledge:

The P. O’B. Montgomery Distinguished Chair, the Hudson Foundation and the NIH for support.

National Library of Medicine and the Office for Research Integrity

All of the members of the lab, past and present.

And our many collaborators and users.

The End. Thanks for your attention.

Or visit Garner Labs’ home page: http://innovation.swmed.edu