COA: Finding Novel Patents through Text Analysis
Mohammad Hasan, Scott Spangler, Tom Griffin, Alfredo Alba
Scott Spangler, IBM Almaden Services Research
© 2009 IBM Corporation

The BlackBerry Patents

Five patents on the subject of RF communication with mobile processors

A judge threatened an injunction that would have forced RIM (BlackBerry) to shut down service

On the surface they appear to read very directly on RIM’s business

But are these patents really what they appear to be?

Problem Addressed

How do you automatically evaluate the value of patent claims?

Most existing approaches use field of invention + citation analysis to derive an approximation

Our approach uses analysis of the claim text itself to discover indicators of patent worth.

Intuition

The most valuable patents are those that are among the first to claim an important technology.

Challenge: How do we discover which parts of a patent's claims are most “original”?

Method

Focus on the patent claims section

Find all terms occurring in the claims section

For the technical area of the patent (patent class), discover when each of these terms first occurred in patent claims

Term originality is then defined as a small difference between the patent date and the term's first-use date

Create a score that ranks highly those patents with “original” terms in their claims
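The method steps above can be sketched roughly as follows. This is a minimal illustration, not the COA implementation: the function names, the `(issue_date, claim_terms)` input shape, and the toy corpus are all assumptions made for the example, and the corpus is taken to contain patents from a single patent class.

```python
from datetime import date

def build_first_use_index(patents):
    """Map each claim term to the earliest patent date on which it appears.

    `patents` is an iterable of (issue_date, claim_terms) pairs, all assumed
    to come from the same patent class (hypothetical input shape).
    """
    first_use = {}
    for issue_date, terms in patents:
        for term in set(terms):
            if term not in first_use or issue_date < first_use[term]:
                first_use[term] = issue_date
    return first_use

def term_age_days(patent_date, term, first_use):
    """Days between the term's first use in the class and this patent's date.

    A small value means the patent was among the first to claim the term.
    """
    return (patent_date - first_use[term]).days

# Toy corpus with made-up dates and terms, for illustration only.
corpus = [
    (date(1990, 1, 1), ["network", "switch"]),
    (date(1995, 6, 1), ["network", "gateway switch"]),
    (date(1997, 4, 29), ["gateway switch", "interface stores"]),
]
first_use = build_first_use_index(corpus)

patent_date, patent_terms = corpus[2]
ages = {t: term_age_days(patent_date, t, first_use) for t in patent_terms}
print(ages)
```

A term that first appears in the patent being evaluated gets an age of 0 days, which is the most "original" case.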

Description

Build an index of patent claim words associated with time of first occurrence in patent claims

For each patent evaluated:

– Analyze each 1-, 2-, and 3-gram in the patent claims to see if it is an original usage or an “early” usage of those words in the patent claims section in that technology “area”

– Look for subsequent usage of that term in more recent patents to calculate “support”

The value of a patent is based on the number of early* words with significant** support. Scored one of two ways:

– Sum of support (# of patents) divided by age (# of days)

– Count of # of terms with support > 2 and Age < 7 years

* early = within 7 years of first occurrence

** significant = at least 3 patents use the term
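The n-gram extraction and the two scoring options described above might look something like the sketch below. Several details are assumptions rather than anything stated on the slide: a year is treated as 365 days, the "early" and "significant" filters are applied to the sum-based score as well as the count-based one, and an age of 0 days is clamped to 1 to avoid dividing by zero. The `(age_days, support)` pairs in the usage lines are made-up values.

```python
def ngrams(tokens, max_n=3):
    """Return all contiguous 1- to max_n-grams as space-joined strings."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]

EARLY_DAYS = 7 * 365  # "early": within 7 years (365-day years are an assumption)
MIN_SUPPORT = 3       # "significant": at least 3 patents use the term

def is_early_and_supported(age_days, support):
    """True when a term is both early and significantly supported."""
    return age_days < EARLY_DAYS and support >= MIN_SUPPORT

def score_sum(terms):
    """Option 1: sum of support (# of patents) divided by age (# of days).

    Assumption: age is clamped to at least 1 day so that terms first used
    in this very patent (age 0) do not cause a division by zero.
    """
    return sum(
        support / max(age_days, 1)
        for age_days, support in terms
        if is_early_and_supported(age_days, support)
    )

def score_count(terms):
    """Option 2: count of terms with support > 2 and age < 7 years."""
    return sum(1 for age_days, support in terms
               if is_early_and_supported(age_days, support))

# Usage with hypothetical (age_days, support) pairs.
grams = ngrams("interface stores information".split())
terms = [(644, 14), (0, 3), (2600, 50), (100, 2)]
print(grams)
print(score_count(terms))
print(round(score_sum(terms), 4))
```

Under option 1, a brand-new term with modest support can outweigh an older term with heavy support, so the choice of clamp for zero-age terms matters and would need to be checked against the real system.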

How we validated this approach

Three easily identifiable metrics that should correlate to patent value

– Citations

– Lapsed Fees

– Internal IBM Attorney Rating

None of these is perfect, but all three should roughly correlate with the intrinsic value of the patents

Results

Citations are roughly correlated with COA scores

Lapsed patents have lower COA scores on average than do other patents

Patents rated 1 (by IBM attorneys) have on average significantly better COA scores than those rated 3.

Claims Originality of the BlackBerry Patents

All five patents have very lengthy, extensive claim language around electronic mail devices

Very little text in these claims is original.

Taking context into consideration, the technical merit of these patents is questionable.

Is $120M per licensed patent an appropriate valuation?

Term                          First Occurred   Difference in Days   Support
application programs stored   7/25/1995        644                  14
information added             8/20/1991        2079                 50
interface stores              4/29/1997        0                    30
network storing               10/15/1991       2023                 167
information network           12/25/1990       2317                 86
network information           12/25/1990       2317                 135
mail systems                  5/23/1995        707                  66
destination transmits         7/25/1995        644                  20
processors occurs             4/29/1997        0                    3
information accessible        11/6/1990        2366                 79
electronic mail               7/11/1995        658                  133
interface switch              7/25/1995        644                  18
network switch                7/30/1991        2100                 91
gateway switch                7/25/1995        644                  27
transmitting originated       1/1/1991         2310                 39
stored originated             7/25/1995        644                  19
interface receiving           9/28/1993        1309                 243
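As a quick check on how the "Difference in Days" column can be read, the sketch below recomputes it for a few rows as the patent date minus the term's first-occurrence date. The patent date of 4/29/1997 is an inference from the rows whose difference is 0; it is not stated on the slide.

```python
from datetime import date

# Assumption: the patent date is the first-occurrence date of the rows
# whose reported difference is 0 days.
patent_date = date(1997, 4, 29)

# (term, first occurrence, reported difference in days), copied from the table.
rows = [
    ("application programs stored", date(1995, 7, 25), 644),
    ("electronic mail", date(1995, 7, 11), 658),
    ("mail systems", date(1995, 5, 23), 707),
    ("information network", date(1990, 12, 25), 2317),
    ("interface stores", date(1997, 4, 29), 0),
]

computed = [(patent_date - first).days for _, first, _ in rows]
for (term, _, reported), days in zip(rows, computed):
    print(f"{term}: computed {days}, reported {reported}")
```

Note that a term such as "electronic mail" is 658 days old with 133 supporting patents, so under the definitions given earlier it counts as "early" but not original to this patent.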

SIMPLE Implementation

Usage: 572 Invocations of COA as of 6/15/2009

Success stories from SIMPLE to date:

VOIP analysis:

– Went from 13 original patents to more than 20 eventually licensed.

– This drove nearly $8M in licensing revenue.

Videoconferencing analysis:

– Found 2 additional patents, each of which was sold.

– This drove upwards of $5M in licensing revenue.

SIMPLE has over 280 active users (both internal and external).

We continue to develop and grow the capabilities.

If you want to try this out yourself

Go to: https://chemsearch.almaden.ibm.com/simple

Username: sb_test8

Password: hello2You

Click Analyze / Claims Originality

Enter one or more patent numbers

Click Analyze button

Tell us what you think! (email: [email protected])

Potential Future Application: Tracing the Source in Web Content

Credibility Scoring (“net cred”)

Conclusions and Future Work

We have demonstrated how text analysis in the patent space can help provide context far more effectively than manual methods

We feel these methods generalize to other types of unstructured information

The ability to provide better information context and validation will be important to individuals and organizations in a world where a smaller and smaller percentage of information comes from “authoritative” sources.

http://www.miningthetalk.com