Legal Informatics Research Today: Implications for Legal Prediction, 3D Printing, & eDiscovery

Presentation at CICL 2013: Conference on Innovation and Communications Law, 16 May 2013, Glen Arbor, Michigan.

Robert Richards

Penn State University

CICL 2013: Conference on Innovation and Communications Law

Agenda

Legal Informatics: Overview

eDiscovery: Methods, Recent Research

3D Printing: How legal tech could apply

Legal Prediction: Methods, Recent Research

Legal Informatics: Definition

Legal informatics is:

(1) the study of legal information / communication systems

(2) the application of ICT (information / communication technology) to legal information

ICT

Legal Information

What is legal information?

Structured data that express:

1. Legal Rules

2. Information about Legal Rules (1st, 2nd, 3rd, etc. order legal metadata)

3. Evidence

Non-legal data used to support an assertion about a legal rule

What is a legal information / communication system?

A set of interrelated entities that receive, process, or output legal information

Examples:

A law office time/billing system

A database of court decisions

A statistical model predicting a legal outcome

Legal Informatics Viewpoint: 4 Levels

In a domain

Addressing an application area

From one or more sub-disciplines, by

Employing one or more methodologies

Legal Informatics: Domains

Law Practice

Courts

Legislature

Regulatory

Politics / Civic Computing

Legal Education

Business

Consumers

Legal Informatics: Application Areas

Litigation

Compliance

Planning

Interviewing / Counseling

Negotiation

Education

Governance / Policy making

Legal Informatics: Sub-Disciplines

Artificial Intelligence

Information Retrieval

Text Processing / NLP

Metadata / Knowledge Representation

Databases / Storage

Linguistics / Communication

Human-Computer Interaction / Information Behavior

Management / Sociology of Info

Legal Informatics: Methodologies

Prototyping

Statistics / Probability

Experimentation

Network Analysis

Survey Research

Case Study

Cost-Benefit Analysis

Ethnography

Interviewing

Doctrinal Analysis

Example

Much eDiscovery research involves…

Law Practice (Domain)

Litigation / Evidence (Application Area)

Information retrieval + text analysis + knowledge representation / metadata + management (Sub-Disciplines)

Prototyping + experimentation + statistical analysis + cost-benefit analysis (Methodologies)

4-Level Approach Reveals Relationships Between (Apparently) Dissimilar Research Activities

Scherer, S., Wimmer, M. A., & Markisic, S. (2013). Bridging narrative scenario texts and formal policy modeling through conceptual policy modeling. Artificial Intelligence and Law. doi:10.1007/s10506-013-9142-2

Scherer et al. (2013)

[Diagram labels: ICT; Citizen’s Narrative; Legal Doctrine/Rule]

Scherer et al.: Public Policy Domain

Methodologies: Prototyping + Case study

Sub-Disciplines: Artificial intelligence + Linguistics + Text Analysis + Knowledge Representation

Application area: Translating non-legal language to legal concepts

Domain: Public policy (e-Participation)

Scherer et al.: Law Practice Domain

Methodologies: Prototyping + Case study

Sub-Disciplines: Artificial intelligence + Linguistics + Text Analysis + Knowledge Representation

Application area: Translating non-legal language to legal concepts

Domain: Law practice (Counseling, Interviewing)

Functions of Legal Informatics Approach

Analyze: Processes

Define: Problems

Explain: Causation

Predict: Outcomes

Functions of Legal Informatics Approach (cont’d)

Evaluate: Processes, Outcomes

Apply: Diverse approaches and methods

eDiscovery

Definition

Goals and Motivation

Models

Research Results

Predictive Coding

Future Areas of Research

eDiscovery: definition

In litigation, the request for and production of electronically stored information relevant to a claim or count

eDiscovery: Goals

Increase effectiveness of methods

Lower costs

Cost Motivation

Big Data leads to prohibitive costs of traditional relevance and privilege review

With data sets of > 10^6 objects, linear manual review and privilege review become unsustainably expensive

EDRM Model

New Models Emerging: Informatics-Based, Elaborating EDRM

[Figure: EDRM compared with the Oard & Webber (2013) model. Labels: Production request; Collection; Responsive ESI; Production; Insight; Formulation; Acquisition; Review for Relevance; Review for Privilege; Sense-making. © Copyright 2013 Douglas W. Oard and William Webber]

TREC & EDI: Key Findings

Initial Search & Second-Step Relevance Feedback: Automated relevance ranking > Boolean query in re: recall

Interactive Evaluation: Technology-Assisted Review > Manual Review in re: overall results + precision

High Precision + High Recall are possible with certain topics

TREC Key Findings (cont’d)

Predictive coding produced high recall

But most machine learning systems could not choose the correct sample size to maximize precision and recall

Machine learning systems that yielded highly relevant results also yielded highly material docs

Privilege Review Remains a Key Cost Driver & Is Under-Automated (Pace & Zakaras, 2012)

Automated privilege review yielded high recall in one study (but method was not disclosed)
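Precision and recall, the two measures behind the TREC and EDI findings above, can be computed as in the following minimal sketch. The document IDs and the function name `precision_recall` are illustrative assumptions, not taken from any of the cited studies.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved documents.

    precision = relevant documents retrieved / all documents retrieved
    recall    = relevant documents retrieved / all relevant documents
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical review: 4 documents produced, 5 documents truly responsive.
produced = ["d1", "d2", "d3", "d7"]
responsive = ["d1", "d2", "d3", "d4", "d5"]
p, r = precision_recall(produced, responsive)
print(f"precision={p:.2f} recall={r:.2f}")
```

In this toy case three of the four produced documents are responsive, but two responsive documents were missed, so precision is higher than recall.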

eDiscovery: Measurement Error

Low rates of inter-assessor agreement

Found in TREC & EDI studies

Cooperation between parties on evaluation in tech-assisted review likely to lower measurement error

This is an emerging best practice (see, e.g., Da Silva Moore)
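Inter-assessor agreement can be quantified in several ways. The slides do not name a measure, so the sketch below assumes Cohen's kappa, a standard chance-corrected agreement statistic. The judgments and the function name `cohen_kappa` are hypothetical.

```python
def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two assessors labelling the same documents."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Share of documents on which the two assessors agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each assessor's label frequencies.
    categories = set(labels_a) | set(labels_b)
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )
    if expected == 1.0:
        return 1.0  # both assessors used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical relevance judgments (R = responsive, N = non-responsive).
assessor_1 = ["R", "R", "N", "N", "R", "N", "N", "N"]
assessor_2 = ["R", "N", "N", "N", "R", "R", "N", "N"]
kappa = cohen_kappa(assessor_1, assessor_2)
print(f"kappa = {kappa:.3f}")
```

Here the assessors agree on 6 of 8 documents, yet kappa is well below 0.75 because much of that agreement would be expected by chance.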

eDiscovery: Recent Emphases (Baron, 2011)

Process Quality Standards & Best Practices: Metrics & certification (DESI IV, 2011)

Cooperation between Parties: Sedona Conference (2009)

Improved Search, including Predictive Coding: DESI V, 2013; Results of TREC & EDI research

Courts are implementing all of these recommendations

eDiscovery: Recent Emphases: Sub-Disciplines

Process Quality Standards & Best Practices: Management

Cooperation between Parties: Management, Information Retrieval, Knowledge Representation

Improved Search, including Predictive Coding: Information Retrieval, Text Analysis, Knowledge Representation, Information Behavior, Management

Predictive Coding: Definition

Machine learning applied to classification of information, e.g., as responsive / non-responsive
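As an illustration of this definition, the following minimal sketch trains a multinomial naive Bayes classifier (one of the method families listed for predictive coding in this deck) on a tiny reviewer-coded seed set and labels new documents as responsive or non-responsive. The seed documents, labels, and function names are hypothetical, and real systems use far larger training sets and richer features.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labelled_docs):
    """Fit a multinomial naive Bayes model from (text, label) pairs."""
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labelled_docs:
        doc_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocabulary.add(word)
    return doc_counts, word_counts, vocabulary


def classify(text, model):
    """Return the label with the highest posterior log-probability."""
    doc_counts, word_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # Laplace (add-one) smoothing avoids zero probabilities.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical seed set coded by a human reviewer.
seed_set = [
    ("merger pricing agreement draft", "responsive"),
    ("pricing schedule for merger talks", "responsive"),
    ("office party lunch menu", "non-responsive"),
    ("lunch order and parking notice", "non-responsive"),
]
model = train_naive_bayes(seed_set)
print(classify("revised merger pricing memo", model))
print(classify("parking and lunch reminder", model))
```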

Predictive Coding: Diverse Methods

Support Vector Machines

Latent Semantic Analysis

Naïve Bayesian Classifiers

Decision Trees

Neural Networks

Association Rule Learning

Rule Induction

Genetic Algorithms

Predictive Coding: Courts Reading, Citing, & Applying Legal Informatics Research

Da Silva Moore v. Publicis Groupe

EORHB v. HOA Holdings

Global Aerospace Inc. v. Landow Aviation

Kleen Products v. Packaging Corp. of America

eDiscovery: Future Research Directions

Evaluation Standards & Certification

Threshold point estimates

Relevance threshold

Sample size threshold

Confidence level, confidence intervals

Typology of Production Requests

Electronic Discovery Institute plans 2nd study on real e-discovery materials, testing TREC conclusions with higher ecological validity
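To make the link between sample size, confidence level, and confidence intervals concrete, the sketch below computes a Wilson score interval for a recall estimate drawn from a random validation sample. The choice of the Wilson interval, the sample figures, and the function name `wilson_interval` are assumptions made for illustration; the slides do not prescribe a particular interval.

```python
import math
from statistics import NormalDist


def wilson_interval(successes, sample_size, confidence=0.95):
    """Wilson score interval for a proportion estimated from a sample."""
    if sample_size <= 0 or not 0 <= successes <= sample_size:
        raise ValueError("need 0 <= successes <= sample_size, sample_size > 0")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided critical value
    p_hat = successes / sample_size
    denom = 1 + z**2 / sample_size
    centre = (p_hat + z**2 / (2 * sample_size)) / denom
    half_width = (
        z
        * math.sqrt(
            p_hat * (1 - p_hat) / sample_size + z**2 / (4 * sample_size**2)
        )
        / denom
    )
    return centre - half_width, centre + half_width


# Hypothetical validation sample: 400 responsive documents drawn at random,
# 320 of which the review process had retrieved (recall point estimate 0.80).
low, high = wilson_interval(320, 400, confidence=0.95)
print(f"recall 95% CI: [{low:.3f}, {high:.3f}]")

# A smaller sample with the same point estimate gives a wider interval.
low_small, high_small = wilson_interval(80, 100, confidence=0.95)
print(f"recall 95% CI (n=100): [{low_small:.3f}, {high_small:.3f}]")
```

The comparison shows why a sample size threshold matters: the same point estimate supports a much weaker claim when it rests on fewer sampled documents.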

eDiscovery: Future Research Directions (cont’d)

Measurement Error: Modeling it & correcting for it

Designing re-usable test collections

Automated privilege review

Identifying effective methods

Designing test collections to evaluate those methods

eDiscovery: Future Research Directions (cont’d)

Evaluating de-duplication methods

Improved privacy measures to enable experiments on real-life data sets

Apply other sub-disciplines, including Information behavior

Diversify methods, including social network analysis

More research on Early Case Assessment

3D Printing

Definition

Expected Effects

Lawyers’ Value-Add

Short-Term Application of Legal Technology

Long-Term Application of Legal Technology

3D Printing: Definition

The generation of physical objects from computer models, by a layering process

Also called Additive Manufacturing (Gibson, Rosen, & Stucker, 2010)

3D Printing: Some Expected Effects

Democratizing manufacturing

More inventors

More innovation

More infringement

More demand for legal compliance services

More demand for patent legal services

Patent Lawyers’ Value-Add for Entrepreneurs / New Inventors

Patent Search

Claim Interpretation

Currency of Information

Customization of Information to Client’s Circumstances

Strategic Advice (Law + Business)

How Might Legal Informatics Affect 3D Printing?

Legal Informatics is likely to interact with 3D Printing in two ways:

Short-Term: Unbundling of patent legal services (Mosten, 1994)

Long-Term: Automated patent search & Modeling of claim interpretations incorporated into CAD software

Unbundling of Patent Legal Services

Selling (outdated) patent search results

Selling (outdated) memoranda containing claim interpretations

Offering (remotely) updated & customized search results and counseling for an extra fee

Patent Legal Services Unbundling: 4-Levels

Domain: Business

Application Areas: Compliance, Counseling

Sub-Disciplines: Management, Information Retrieval, Knowledge Representation

Methodologies: Prototyping, Case Studies, Doctrinal Analysis, Cost-Benefit Analysis

Automated patent search & modeling of claim interpretations (Hulicki, 2013; Mulligan & Lee, 2012)

User inputs simulation/design/image of invention

CAD software analyzes input, determines domain & patent search parameters

CAD Software executes patent search, retrieves relevant patents in force

CAD software analyzes claims of retrieved patents

Automated patent search & modeling of claim interpretations (cont’d)

CAD Software translates claims into simulation parameters

For each simulation model, CAD software calculates probability of liability for patent infringement & possible exposure

Output displays liability probability + potential exposure

Lawyer offers (remote) legal counseling for an extra fee

Automated Patent Search & Modeling of Claim Interpretations: 4-Levels

Domain: Business

Application Areas: Compliance, Counseling

Sub-Disciplines: Artificial Intelligence, Information Retrieval, Knowledge Representation, Human-Computer Interaction

Methodologies: Prototyping, Statistical Modeling, Case Studies, Experimentation, Ethnography, Interviewing

Implications of Both Scenarios

More small-scale inventors/entrepreneurs will have access to legal compliance information at an affordable price

Clients can choose to pay more for higher levels of service

Reform of legal ethics rules may be required to implement either scenario

Legal Prediction

Definition

4-Level View

Temporal Dimensions

Research Results

Possible Effects

Future Research Directions

Legal Prediction: Definitions

(1) Methods for calculating the probability of the occurrence or non-occurrence of law-related events or circumstances at a point in time, on the basis of data acquired at an earlier point in time

(2) Methods for inferring law-related attributes of a population from a sample

Legal Prediction: Application Areas

Case Outcome / Litigation Management (Blackman et al., 2012; Ruger et al., 2004; Ribstein, 2012)

Imputing Default Terms in Contracts & Wills (Porat & Strahilevitz, 2013)

Legislative Bill Passage (Tauberer, 2012; Yano et al., 2012)

Legal Prediction: Application Areas (cont’d)

Document Relevance (eDiscovery, Legal research) (Katz, 2013)

Legal Spend (In-House Counsel) (Katz, 2013)

Lawyer Hiring (Law Firms) (Katz, 2013)

Legal Compliance (Clients, In-House Counsel) (Ribstein, 2012)

Legal Prediction: Sub-Disciplines

Artificial Intelligence

Information Retrieval

Metadata / Knowledge Representation

Text Processing

Legal Prediction: Diverse Methods

Bayesian Inference (McShane et al., 2012; Guimerà & Sales-Pardo, 2011)

Stochastic Block Modeling (Guimerà & Sales-Pardo, 2011)

Classification/Decision Trees (Ribstein, 2012; Ruger et al., 2004)

Crowdsourced Prediction Markets (Blackman et al., 2012; Ribstein, 2012)

Legal Prediction: Diverse Methods (cont’d)

Machine Learning (Katz, 2013)

Case-Based Reasoning (Ribstein, 2012)

Surveys (Dimmock & Gerken, 2012; Porat & Strahilevitz, 2013)

Regression, Maximum Likelihood (Dimmock & Gerken, 2012)

Legal Prediction: Model vs. Crowdsourcing

Blackman’s FantasySCOTUS vs. Martin, Ruger et al.

Complementary approaches

Legal Prediction: Three Temporal Dimensions

Synchronic: Inference from sample to parameters of a static population

Predictive coding, machine learning

Used to collect data set for model

Diachronic Future: Inference from sample at t to observations at t + 1, where t + 1 is later than today

Forward prediction (Katz)

Often performed on the data set gathered using Synchronic prediction

Diachronic Past: Retrospective prediction

Inference from sample at t to observations at t + 1, where t + 1 is earlier than today

Legal Prediction: Some Research Results

Decision Tree > Domain Experts (Ruger et al.)

Crowdsourcing > Domain Experts (Blackman et al.)

Crowdsourcing = Decision Tree (Blackman et al.)

Stochastic Block Models > case-content based algorithms (Guimerà & Sales-Pardo)

Stochastic Block Models > Domain Experts (Guimerà & Sales-Pardo)

Legal Prediction: Possible Effects

Lawyer disintermediation (Katz, 2013; Ribstein, 2012)

Client empowerment (Ribstein, 2012)

Reduction in legal costs (Katz, 2013; Ribstein, 2012)

Within businesses, distribution of legal tasks to non-legal personnel (Ribstein, 2012)

Legal Prediction: Future Research Directions

Analogical reasoning: development of improved models (Katz)

Crowdsourced prediction markets for lower-level courts (Blackman et al.)

Automated prediction engines for lower-level courts (Blackman et al.)

References

Agrawal, R., Mannila, H., Srikant, R., Toivonen, H., & Verkamo, A. I. (1996). Fast discovery of association rules. Advances in Knowledge Discovery and Data Mining, 12, 307–328.

Ashley, K. D., & Brüninghaus, S. (2009). Automatically classifying case texts and predicting outcomes. Artificial Intelligence and Law, 17, 125-165. doi:10.1007/s10506-009-9077-9

Ashley, K. D., & Bridewell, W. (2010). Emerging AI & Law approaches to automating analysis and retrieval of electronically stored information in discovery proceedings. Artificial Intelligence and Law, 18, 311-320. doi:10.1007/s10506-010-9098-4

Barnett, T., Godjevac, S., Renders, J.-M., Privault, C., Schneider, J., & Wickstrom, R. (2009, June). Machine learning classification for document review. Paper presented at the DESI III Global E-Discovery/E-Disclosure Workshop: A Pre-Conference Workshop at the twelfth International Conference on Artificial Intelligence and Law, ICAIL 2009, Barcelona, Spain.

Baron, J. (2011). Law in the age of exabytes: Some further thoughts on ‘information inflation’ and current issues in e-discovery search. Richmond Journal of Law and Technology, 17(3), Article 9. Retrieved from http://jolt.richmond.edu/v17i3/article9.pdf

Blackman, J., Aft, A., & Carpenter, C. (2012). FantasySCOTUS: Crowdsourcing a prediction market for the Supreme Court. Northwestern Journal of Technology and Intellectual Property, 10(3), Article 3. Retrieved from http://scholarlycommons.law.northwestern.edu/njtip/vol10/iss3/3

Cohen, W. W. (1995). Fast effective rule induction. In Machine learning: Proceedings of the twelfth international conference, ML95.

References (cont’d)

Conrad, J. (2010). E-discovery revisited: the need for artificial intelligence beyond information retrieval. Artificial Intelligence and Law, 18, 321-345. doi:10.1007/s10506-010-9096-6

Cormack, G. V., Grossman, M. R., Hedin, B., & Oard, D. W. (2011). Overview of the TREC 2010 legal track. In The Nineteenth Text Retrieval Conference (TREC 2010) Proceedings. N.p.: NIST.

Da Silva Moore v. Publicis Groupe, 287 F.R.D. 182 (S.D.N.Y. 2012).

DESI IV (2011). [Call for papers:] ICAIL 2011 workshop on setting standards for searching electronically stored information in discovery proceedings (DESI IV Workshop), June 6, 2011, University of Pittsburgh, Pittsburgh, PA.

DESI V (2013). [Call for papers:] ICAIL 2013 workshop on standards for using predictive coding, machine learning, and other advanced search and review methods in e-discovery (DESI V workshop), June 14, 2013, Consiglio Nazionale delle Ricerche, Rome, Italy.

Dimmock, S. G., & Gerken, W. C. (2012). Predicting fraud by investment managers. Journal of Financial Economics, 105, 153-173. doi:10.1016/j.jfineco.2012.01.002

EORHB, Inc. v. HOA Holdings LLC, Civ. Ac. No. 7409-VCL (Del. Ch. Oct. 15, 2012).

Friedman, N., Geiger, D., & Goldszmidt, M. (1997). Bayesian network classifiers. Machine Learning, 29, 131-163.

Gibson, I., Rosen, D. W., & Stucker, B. (2010). Additive manufacturing technologies: Rapid prototyping to direct digital manufacturing. New York: Springer.

Global Aerospace, Inc., v. Landow Aviation, L.P., No. CL 61040 (Va. Cir., Apr. 23, 2012).

Grossman, M. R., & Cormack, G. V. (2011). Technology-assisted review in e-discovery can be more effective and more efficient than exhaustive manual review. Richmond Journal of Law and Technology, 17(3), Article 11. Retrieved from http://jolt.richmond.edu/v17i3/article11.pdf

Grossman, M. R., Cormack, G. V., Hedin, B., & Oard, D. W. (2011). Overview of the TREC 2011 legal track. In The Twentieth Text Retrieval Conference (TREC 2011) Proceedings. N.p.: NIST.

References (cont’d)

Guimerà, R., & Sales-Pardo, M. (2011). Justice blocks and predictability of U.S. Supreme Court votes. PLOS ONE, 6(11), e27188. doi:10.1371/journal.pone.0027188

Hulicki, M. (2013, May). Recent judgments of the highest court as a step towards objectification of patentability. Paper presented at CICL 2013: Conference on Innovation and Communications Law, Glen Arbor, MI.

In re Actos (Pioglitazone) Products, No. 6:11-md-2299 (W.D. La., July 27, 2012).

Joachims, T. (1998). Text categorization with support vector machines: Learning with many relevant features. In C. Nédellec & C. Rouveirol (Eds.), Proceedings of the 10th European Conference on Machine Learning (pp. 137–142).

Katz, D. M. (2013). Quantitative legal prediction—Or—How I learned to stop worrying and start preparing for the data-driven future of the legal service industry. Emory Law Journal, 62, 101-158.

Kleen Prods. LLC v. Packaging Corp. of Am., No. 10 C 5711 (N.D. Ill., Sept. 28, 2012).

LexMachina. (n.d.). About, technology. Retrieved from https://lexmachina.com/about/

Martin, A. D., & Quinn, K. M. (2002). Dynamic ideal point estimation via Markov chain Monte Carlo for the U.S. Supreme Court, 1953–1999. Political Analysis, 10, 134-153. doi:10.1093/pan/10.2.134

McShane, B. B., Watson, O. P., Baker, T., & Griffith, S. J. (2012). Predicting securities fraud settlements and amounts: A hierarchical Bayesian model of federal securities class action lawsuits. Journal of Empirical Legal Studies, 9, 482-510. doi:10.1111/j.1740-1461.2012.01260.x

Mosten, F. S. (1994). Unbundling of legal services and the family lawyer. Family Law Quarterly, 28, 421-449.

Mulligan, C., & Lee, T. B. (forthcoming). Scaling the patent system. N.Y.U. Annual Survey of American Law. Retrieved from http://www.ssrn.com/abstract=2016968

Oard, D. W., Baron, J. R., Hedin, B., Lewis, D. D., & Tomlinson, S. (2010). Evaluation of information retrieval for e-discovery. Artificial Intelligence and Law, 18, 347-386. doi:10.1007/s10506-010-9093-9

References (cont’d)

Oard, D. W., & Webber, W. (2013). Information retrieval for e-discovery. Foundations and Trends in Information Retrieval, 7, 1-141. Retrieved from http://ediscovery.umiacs.umd.edu/pub/ow12fntir.pdf

Pace, N. M., & Zakaras, L. (2012). Where the money goes: Understanding litigant expenditures for producing electronic discovery. Santa Monica, CA: Rand Institute for Civil Justice.

Porat, A., & Strahilevitz, L. J. (2013). Personalizing default rules and disclosure with big data (University of Chicago Coase-Sandor Institute for Law and Economics working paper no. 634, 2nd series). Retrieved from http://www.law.uchicago.edu/Lawecon/index.html

Quinlan, J. R. (1986). Induction of decision trees. Machine Learning, 1, 81-106.

Ribstein, L. (2012). Delawyering the corporation. Wisconsin Law Review, 2012, 305-332.

Richards, R. (2009, June). What is legal information? Paper presented at the Conference on Legal Information: Scholarship and Teaching, at the University of Colorado School of Law, Boulder, CO. Retrieved from http://legalinformatics.wordpress.com/2009/05/31/what-is-legal-information-conference-paper/

References (cont’d)

Roitblat, H. L., Kershaw, A., & Oot, P. (2010). Document categorization in legal electronic discovery: Computer classification vs. manual review. Journal of the American Society for Information Science and Technology, 61, 70-80. doi:10.1002/asi.21233

Ruger, T. W., Kim, P. T., Martin, A. D., & Quinn, K. M. (2004). The Supreme Court forecasting project: Legal and political science approaches to predicting Supreme Court decisionmaking. Columbia Law Review, 104, 1150-1210.

Scherer, S., Wimmer, M. A., & Markisic, S. (2013). Bridging narrative scenario texts and formal policy modeling through conceptual policy modeling. Artificial Intelligence and Law. doi:10.1007/s10506-013-9142-2

The Sedona Conference. (2009). Commentary on achieving quality in e-discovery. N. p.: The Sedona Conference.

Tauberer, J. (2012, December 7). Bill prognosis gets a few improvements. GovTrack Blog [web log post]. Retrieved from http://www.govtrack.us/blog/2012/12/007/bill-prognosis-gets-a-few-improvements

Webber, W. (2011, July). Re-examining the effectiveness of manual review. Paper presented at SIGIR 2011 Information Retrieval for E-Discovery (SIRE) Workshop, Beijing, China.

Yano, T., Smith, N. A., & Wilkerson, J. D. (2012, October). Textual predictors of bill survival in congressional committees. Paper presented at New Directions in Analyzing Text as Data 2012, Harvard University, Cambridge, MA. Retrieved from http://projects.iq.harvard.edu/ptr/files/yanosmithwilkersonbillsurvival.pdf