Value-Based Program Characterization and Its Application to Software Plagiarism Detection (ICSE 2011). Yoon-Chan Jhi, Xinran Wang, Sencun Zhu, Peng Liu, Dinghao Wu, Penn State University; Xiaoqi Jia, State Key Laboratory of Information Security, Institute of Software, Chinese Academy of Sciences. Embedded Lab., Park Yeongseong.

Page 1

Value-Based Program Characterization and Its Application to Software Plagiarism Detection

Embedded Lab., Park Yeongseong

ICSE 2011

Yoon-Chan Jhi, Xinran Wang, Sencun Zhu, Peng Liu, Dinghao Wu, Penn State University

Xiaoqi Jia, State Key Laboratory of Information Security, Institute of Software, Chinese Academy of Sciences

Page 2: Contents

◦ Introduction
◦ State of the art
◦ Core values
◦ Design
◦ Experiment
◦ Discussion
◦ Conclusion
◦ Q&A

Page 3
Page 4: Introduction

Identifying identical or similar code is very important.

Previous work
◦ Static source code comparison – C1
◦ Static executable code comparison – C2
◦ Dynamic control-flow-based methods – C3
◦ Dynamic API-based methods – C4

Page 5: Introduction

Three highly desired requirements
◦ R1 – Resiliency
◦ R2 – Ability to work directly on binary executables
◦ R3 – Platform independence

But each existing category fails some of these requirements
◦ Static source code comparison – C1: fails R1, R2
◦ Static executable code comparison – C2: fails R1
◦ Dynamic control-flow-based methods – C3: fails R1, R3
◦ Dynamic API-based methods – C4: fails R3

Page 6: Introduction

Introduces a new approach
◦ Core values

Evaluated with
◦ 5 optimization options (-O0 to -O3, -Os)
◦ 3 compilers (GCC, TCC, WCC)
◦ Obfuscators: KlassMaster, Thicket, Loco/Diablo

Page 7: State of the art

Code obfuscation techniques
◦ Data obfuscation, control obfuscation, layout obfuscation, and preventive transformations
◦ Indirect branches, control-flow flattening, function-pointer aliasing

Static-analysis-based plagiarism detection
◦ String-based
◦ AST-based
◦ Token-based
◦ PDG-based
◦ Birthmark-based

Page 8: State of the art

Dynamic-analysis-based plagiarism detection
◦ Whole program path (WPP) based
◦ Birthmark based on the sequence of API function calls (EXESEQ)
◦ Birthmark based on the frequency of API function calls (EXEFREQ)
◦ System-call-based birthmark

Page 9: Core values

Runtime values
◦ The output operands of the executed machine instructions

Core values
◦ Constructed from runtime values

Eliminating non-core values
◦ If … is not derived from …, … is not a core value of …
◦ If … is not in the set of runtime values of …, … is not a core value of …
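The two elimination rules on this slide (whose symbols did not survive extraction) can be sketched as a filter over a recorded trace. This is a minimal sketch, not the authors' implementation. It assumes that each runtime value carries a hypothetical `from_input` flag (as a taint tracker might provide), that "derived from" refers to the program input, and that the second rule compares against the runtime values of a transformed build of the same program. `RuntimeValue` and `core_value_candidates` are illustrative names.

```python
from dataclasses import dataclass
from typing import List, Sequence, Set


@dataclass(frozen=True)
class RuntimeValue:
    value: int        # output operand of one executed instruction
    from_input: bool  # hypothetical taint flag: True if the value depends on program input


def core_value_candidates(
    trace: Sequence[RuntimeValue], other_version_values: Set[int]
) -> List[int]:
    """Apply the two (assumed) elimination rules, preserving execution order."""
    kept: List[int] = []
    for rv in trace:
        # Rule 1 (assumed): a value not derived from the input is not a core value.
        if not rv.from_input:
            continue
        # Rule 2 (assumed): a value missing from a transformed build's runtime
        # values is not a core value.
        if rv.value not in other_version_values:
            continue
        kept.append(rv.value)
    return kept


trace = [
    RuntimeValue(7, True),
    RuntimeValue(0x8048000, False),  # e.g. a constant address, not input-derived
    RuntimeValue(42, True),
    RuntimeValue(13, True),          # input-derived but absent from the other build
]
print(core_value_candidates(trace, other_version_values={7, 42, 99}))  # [7, 42]
```

Order is preserved on purpose, since the later design slides operate on value sequences rather than sets.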

Page 10: Core values

Page 11: Design - Value Sequence Extraction

Not all values associated with the execution of a program are core values
◦ Value-updating instructions
◦ Related to the program's semantics
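As a minimal sketch of keeping only the values of value-updating instructions, assume (this is not confirmed by the slide) that an instruction counts as value-updating when it writes a value different from what its destination held before, and that the trace is a hypothetical list of `(destination, new_value)` pairs. `extract_value_sequence` is an illustrative name.

```python
from typing import Dict, Iterable, List, Tuple


def extract_value_sequence(trace: Iterable[Tuple[str, int]]) -> List[int]:
    """Keep the output values of instructions that actually change their destination.

    `trace` is a hypothetical list of (destination, new_value) pairs, one per
    executed instruction, where destination names a register or memory cell.
    """
    state: Dict[str, int] = {}
    sequence: List[int] = []
    for dest, new_value in trace:
        # Assumed definition: an instruction is value-updating if the destination
        # did not already hold this exact value (the first write always counts).
        if dest not in state or state[dest] != new_value:
            sequence.append(new_value)
        state[dest] = new_value
    return sequence


trace = [("eax", 5), ("ebx", 5), ("eax", 5), ("eax", 6), ("mem:0x10", 6)]
print(extract_value_sequence(trace))  # [5, 5, 6, 6]
```

The third record is dropped because `eax` already held 5; a different reading of "value-updating" (for example, excluding pure data-movement instructions) would need a different filter.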

Page 12: Design - Value Sequence Refinement and Similarity Metric

Refining value sequences
◦ Sequential refinement – reduction rate of 16% to 34%
◦ Optimization-based refinement – 5 optimization options
◦ Address removal – exclude pointer values
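A rough sketch of two of these refinements and of a sequence similarity score follows, under stated assumptions. Address removal drops values inside an assumed address range `[lo, hi)`; optimization-based refinement keeps only values that also occur in every other build of the program (an assumed reading of the bullet); and the score is the longest-common-subsequence length normalized by the first sequence's length, which is an assumption, not necessarily the metric used by the authors. Sequential refinement is not sketched because the slide does not define it. All function names are hypothetical.

```python
from typing import List, Sequence, Set


def remove_addresses(seq: Sequence[int], lo: int, hi: int) -> List[int]:
    """Drop values that fall in an assumed address range [lo, hi)."""
    return [v for v in seq if not (lo <= v < hi)]


def optimization_refine(seq: Sequence[int], other_builds: Sequence[Sequence[int]]) -> List[int]:
    """Keep only values that also occur in every other build (assumed rule)."""
    common: Set[int] = set(seq)
    for build in other_builds:
        common &= set(build)
    return [v for v in seq if v in common]


def lcs_length(a: Sequence[int], b: Sequence[int]) -> int:
    """Longest common subsequence length via the O(len(a) * len(b)) DP, one row at a time."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            cur[j] = prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1])
        prev = cur
    return prev[-1]


def similarity(a: Sequence[int], b: Sequence[int]) -> float:
    """Assumed metric: |LCS(a, b)| / |a|, in [0, 1]; 0.0 when a is empty."""
    return lcs_length(a, b) / len(a) if a else 0.0


seq = remove_addresses([1, 2, 0x8048010, 3, 4], lo=0x8048000, hi=0x8050000)
print(seq)                                                  # [1, 2, 3, 4]
print(optimization_refine(seq, [[2, 3, 4, 9], [4, 3, 2]]))  # [2, 3, 4]
print(similarity([1, 2, 3, 4], [2, 4, 3, 4]))               # 0.75
```

A subsequence-based score tolerates values inserted between matching ones (for example by obfuscation) while still rewarding matches that appear in the same order.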

Page 13: Design - Overview

Page 14: Experiment

Setup: Linux machine with an Intel quad-core 2.00 GHz CPU and 4 GB RAM; QEMU 0.9.1

Questions
1. Resiliency: is the method resilient to obfuscation?
2. False accusation: does it avoid accusing similar but independently developed programs?
3. Credibility: does it give low similarity scores to different programs?

Page 15: Experiment - Obfuscation tools (resiliency)

Obfuscation tools
◦ SandMark, KlassMaster: Java bytecode obfuscators

Test application: JLex
◦ Lexical analyzer generator

Page 16: Experiment - Similar programs (false accusation)

Test applications
◦ 5 independently developed XML parsers: expat, libxml2, Parsifal, rxp, xercesc

Page 17: Experiment - Different programs (credibility)

Test applications
◦ bzip2, gzip, oggenc, 9 of 11 programs

Results
◦ Similarity scores between 0 and 0.27
◦ zip and gzip: similarity score of 1.0 (same compression algorithm: deflate)
◦ zip and bzip2: similarity scores of 0.01 to 0.03 (different compression algorithm: bzip2 uses block sorting)

Page 18: Conclusion

Introduces a novel approach to the dynamic characterization of executable programs.

The value-based method successfully discriminates 34 plagiarisms obfuscated by SandMark, KlassMaster, and Thicket.

Page 19: Q&A