Keywords for Bytecodes
Dr. Duboue
Introduction
  The Speaker
  Bytecodes as Semantics
  Reverse Engineering
Details
  Corpus Assembly
  Main Pipeline
  Results
  Applications
Other Topics
  GRIUM/RALI
  Other Academic
  Focus on Technology
Summary
Predicting English Keywords from Java Bytecodes
Pablo Ariel Duboue, PhD
Les Laboratoires Foulab, Montreal, Quebec
Séminaires RALI-OLST, Université de Montréal
Outline
- Introduction
  - About the Speaker
  - Bytecodes as Weak Semantics
  - Reverse Engineering
- Details
  - Corpus Assembly
  - Main Pipeline
  - Results
  - Applications
- Other Topics
  - GRIUM/RALI
  - Other Academic
  - Focus on Technology
Before Montreal
- Columbia University
  - WSD in biology texts (GENIES)
  - Natural Language Generation in medical and intelligence domains (MAGIC, AQUAINT)
  - Thesis: “Indirect Supervised Learning of Strategic Generation Logic”, defended Jan. 2005.
    - Advisor: Kathy McKeown
    - Committee: Hirschberg/Jurafsky/Rambow/Jebara
- IBM Research Watson
  - AQUAINT: Question Answering (PIQuAnT)
  - Enterprise Search - Expert Search (TREC)
  - Connections between events (GALE)
  - Deep QA - Watson
In Montreal
I am passionate about improving society through language technology and split my time between teaching, doing research and contributing to free software projects.
- Working with Prof. Nie at GRIUM
- Taught a graduate class in NLG in Argentina
- Contributed to Free Software projects, including some of my own
- Doing some consulting focusing on startups
Semantics, Java Bytecodes, Javadocs
- Motivation: Machine Learning for Natural Language Generation
- Finding good semantic representations “in the wild” is very rare
- Level of detail of semantic representations vs. natural language
- Similarities with binary code and code comments
- Reverse Engineering practitioners could tolerate noisy text
- As discussed in the INLG panel last summer
Java Bytecodes
- JVM is a stack machine
- The set of opcodes (~200) is small to simplify porting to new architectures.
- The opcodes fall into seven categories:
  - Load/store (e.g. aaload, bastore)
  - Arithmetic/logic (e.g. iadd, fcmpg)
  - Type conversion (e.g. i2b, f2d)
  - Object construction and manipulation (e.g. new, putfield)
  - Operand stack manipulation (e.g. swap, dup2_x1)
  - Control flow (e.g. if_icmpgt, goto)
  - Method invocation and return (e.g. invokedynamic, lreturn)
LDC and CALL
- While bytecodes represent a reduced vocabulary, they can incorporate names of classes or methods and string constants
  ldc            pushes a constant onto the operand stack (number or string)
  getfield       instance and field name
  getstatic      classname and field name
  invokedynamic  invokes a dynamic method
Javadocs
- Javadocs are standardized Java comments
- Include special mark-up in the form of ’@’ constructions
  - @param, @throws, @return among others
- In my work, I focus on the comments associated with each method
- Example:
  - Creates a CacheRandom instance with a given cache capacity. @param capacity The capacity of the cache.
  - Adjusts the relative offset where the match begins to an absolute value. Only used by AwkMatcher to adjust the offset for stream matches. @return The length of the match.
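A minimal sketch of how a method comment like the examples above could be split into its free-text description and its '@' tags. The helper name `split_javadoc` and the regular expression are illustrative assumptions, not the tooling used in this work; inline tags such as `{@link ...}` are deliberately left inside the surrounding text.

```python
import re

# Hypothetical helper, for illustration only.
# A block tag is assumed to be '@' + word characters, preceded by
# whitespace or the start of the comment.
TAG_RE = re.compile(r"(?:^|\s)@(\w+)")

def split_javadoc(comment):
    """Return (description, [(tag, text), ...]) for a flattened Javadoc comment."""
    matches = list(TAG_RE.finditer(comment))
    if not matches:
        return comment.strip(), []
    description = comment[:matches[0].start()].strip()
    tags = []
    for i, m in enumerate(matches):
        # The text of a tag runs until the next tag or the end of the comment.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(comment)
        tags.append((m.group(1), comment[m.end():end].strip()))
    return description, tags

desc, tags = split_javadoc(
    "Creates a CacheRandom instance with a given cache capacity. "
    "@param capacity The capacity of the cache."
)
print(desc)
print(tags)
```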
What is Reverse Engineering
- From Wikipedia:
  Reverse engineering is the process of discovering the technological principles of a device, object, or system through analysis of its structure, function, and operation. (...) The same techniques are subsequently being researched for application to legacy software systems (...) to replace incorrect, incomplete, or otherwise unavailable documentation.
- REcon: the premier reverse engineering conference, held yearly in Montreal
Reverse Engineering Example
    private final int c(int) {
       0 aload_0
       1 getfield org.jpc.emulator.f.v
       4 invokeinterface org.jpc.support.j.e()
       9 aload_0
      10 getfield org.jpc.emulator.f.i
      13 invokevirtual org.jpc.emulator.motherboard.q.e()
      16 aload_0
      17 getfield org.jpc.emulator.f.j
      20 invokevirtual org.jpc.emulator.motherboard.q.e()
      23 iconst_0
      24 istore_2
      25 iload_1
      26 ifle 128
      29 aload_0
      30 getfield org.jpc.emulator.f.b
      33 invokevirtual org.jpc.emulator.processor.t.w()
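A listing like the one above can be turned into (offset, opcode, operand) triples before any learning step. The sketch below assumes the line layout shown on the slide (decimal offset, mnemonic, optional operand); the helper name `parse_instruction` is hypothetical and real disassembler output may be laid out differently.

```python
def parse_instruction(line):
    """Parse one disassembly line into (offset, opcode, operand or None)."""
    # Split into at most three fields so operands containing spaces stay whole.
    parts = line.split(None, 2)
    offset = int(parts[0])
    opcode = parts[1]
    operand = parts[2].strip() if len(parts) > 2 else None
    return offset, opcode, operand

# A few lines copied from the example listing.
listing = """\
0 aload_0
1 getfield org.jpc.emulator.f.v
4 invokeinterface org.jpc.support.j.e()
26 ifle 128
"""

instructions = [parse_instruction(l) for l in listing.splitlines() if l.strip()]
for ins in instructions:
    print(ins)
```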
Debian
- Using the Debian archive
  - apt-file search --package-only .jar
    - 1,400+ packages
  - dpkg-query -p package name
    - Look for Source field
  - dpkg-source -x source.dsc
    - Search for Java source files.
  - dpkg -x binary.deb
    - Search for jars, disassemble the methods.
- Assembling the Bytecodes / Javadoc Corpus
  - Disassemble using jclassinfo --disasm
  - Dump Javadoc comments using qdox.
    - A lightweight Java source parsing library.
  - Heuristically match source methods to compiled methods.
    - Normalize source code signatures to binary signatures.
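The signature normalization step can be illustrated with JVM type descriptors, the notation class files use for method signatures. The sketch below converts source-level parameter and return types into a descriptor. It assumes class names are already fully qualified; import resolution, generics erasure, varargs and nested classes are not handled, and the names `type_descriptor` and `to_descriptor` are hypothetical rather than taken from the actual corpus scripts.

```python
# JVM descriptor letters for primitive types and void.
PRIMITIVES = {
    "byte": "B", "char": "C", "double": "D", "float": "F",
    "int": "I", "long": "J", "short": "S", "boolean": "Z", "void": "V",
}

def type_descriptor(java_type):
    """Descriptor for a single type, e.g. 'int[]' becomes '[I'."""
    java_type = java_type.strip()
    dims = 0
    # Each trailing '[]' adds one array dimension.
    while java_type.endswith("[]"):
        java_type = java_type[:-2].strip()
        dims += 1
    if java_type in PRIMITIVES:
        base = PRIMITIVES[java_type]
    else:
        # Assumes a fully qualified class name such as java.lang.String.
        base = "L" + java_type.replace(".", "/") + ";"
    return "[" * dims + base

def to_descriptor(param_types, return_type):
    """Method descriptor from source-level parameter and return types."""
    params = "".join(type_descriptor(t) for t in param_types)
    return "(" + params + ")" + type_descriptor(return_type)

print(to_descriptor(["int"], "int"))                        # (I)I
print(to_descriptor(["java.lang.String", "int[]"], "void"))  # (Ljava/lang/String;[I)V
```

Comparing descriptors rather than raw source text makes the source-to-binary match independent of parameter names and whitespace.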
Numbers
- Final corpus:
  - 1M methods.
  - 35M words.
  - 24M JVM instructions.
- This corpus is 3x bigger than the one discussed in the REcon talk
Pipeline
1. HTML detagging
2. PTB tokenizer
3. Morfessor
4. cclparser
5. Naive Bayes
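As an illustration of the first step only, here is a minimal detagging sketch built on Python's standard `html.parser`. The slides do not say which detagger was actually used, so the class `Detagger`, the function `detag` and the list of block-level tags are assumptions made for this example.

```python
from html.parser import HTMLParser

# Tags assumed to separate words; inline tags such as <code> or <b> do not.
BLOCK_TAGS = {"p", "br", "li", "ul", "ol", "div", "pre"}

class Detagger(HTMLParser):
    """Collects text content and drops the mark-up."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.chunks.append(" ")

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.chunks.append(" ")

    def handle_data(self, data):
        self.chunks.append(data)

def detag(text):
    parser = Detagger()
    parser.feed(text)
    parser.close()  # flush any text still buffered by the parser
    # Collapse runs of whitespace left behind by removed tags.
    return " ".join("".join(parser.chunks).split())

print(detag("Returns the <code>length</code> of the match.<p>Only used by <b>AwkMatcher</b>."))
```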
Morfessor
- Unsupervised morpheme detection
  - http://www.cis.hut.fi/projects/morpho/
- CacheRandom → Cache + Random
- GenericCache.DEFAULT_CAPACITY → Generic + Cache. + DEFAULT_CAPACITY
- someFileName → some + FileName
- PatternStreamInput → Pattern + Stream + Input
CCL Parser
- CCL Parser is an unsupervised parser that does not require POS tags
  - Unsupervised POS induction, incremental (can deal with long sentences)
  - Yoav Seginer (2007), Fast Unsupervised Incremental Parsing. ACL.
  - http://www.seggu.net/ccl/
  - GPLv2 – but current codebase does not save trained models
  - ( ( ( ( ( ( ( ( ( ( (creates a) cache random) instance (with a)) given) cache) capacity. (@param)) capacity the) capacity) of the) cache))
- As chunker
  - [creates a] [cache random] [instance] [with a] [given] [cache] [capacity.] [@ param] [capacity the] [capacity] [of the] [cache] [same] [as cache random] [generic] [cache.] [default_capacity]
  - [default] [constructor] [given] [default] [access] [to prevent instantiation] [outside the] [package]
Naive Bayes
- P(term|bytecodes)
- In case of complex opcodes (e.g., ldc “This is a very long string”), the count for the opcode is split between:
  - 0.5 for the full opcode, as a whole
  - 0.5 / #parts for each subpart ({ldc, This, is, a, very, long, string})
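The count-splitting rule above can be sketched as follows. The function name `opcode_features` is hypothetical, and giving a simple opcode (one with no operand parts) a full count of 1.0 is an assumption of this sketch, as the slide only describes the complex case.

```python
from collections import Counter

def opcode_features(opcode, operand_parts):
    """Weighted feature counts for one instruction.

    A complex opcode gives 0.5 to the instruction as a whole and shares the
    other 0.5 equally among its subparts; the mnemonic itself counts as one
    of the subparts, as in the slide's ldc example.
    """
    if not operand_parts:
        # Assumption: a simple opcode keeps its whole count.
        return Counter({opcode: 1.0})
    parts = [opcode] + list(operand_parts)
    full = opcode + " " + " ".join(operand_parts)
    features = Counter({full: 0.5})
    share = 0.5 / len(parts)
    for part in parts:
        features[part] += share
    return features

feats = opcode_features("ldc", ["This", "is", "a", "very", "long", "string"])
for name, weight in feats.items():
    print(f"{weight:.4f}  {name}")
```

With seven subparts, each one receives 0.5 / 7 and the weights of a single instruction still sum to 1, so long string constants do not dominate the counts.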
Overall Results
- Top scoring terms, using one count per opcode

  Term            P     R     F
  @ param         0.73  0.64  0.685
                  0.73  0.63  0.679
  object          0.97  0.06  0.114
  @ throws        0.72  0.05  0.099
  text            0.64  0.02  0.038
  property        0.69  0.01  0.031
  description     0.72  0.01  0.029
  @ return the    0.78  0.01  0.028
                  0.80  0.01  0.026
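The slides do not define the F column; it is presumably the balanced F-measure, the harmonic mean of precision and recall. Under that assumption, a minimal sketch (the function name `f_measure` is illustrative):

```python
def f_measure(precision, recall):
    """Balanced F-measure (harmonic mean of precision and recall)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Recomputed from the rounded P and R of the "@ param" row.
print(round(f_measure(0.73, 0.64), 3))  # 0.682
```

Recomputing from the two-decimal P and R values gives numbers close to the F column but not identical to it (0.682 against 0.685 here), which is consistent with F having been computed from unrounded values.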
Without Per-opcode Normalization

  Term            P     R     F
  @ generated     0.76  0.80  0.783
  replaced        0.93  0.60  0.734
  @ param         0.64  0.74  0.690
  icu             0.75  0.49  0.600
  o the           0.47  0.75  0.582
  @ stable        0.72  0.45  0.561
  @ inheritdoc    0.42  0.60  0.495
  @ return the    0.41  0.52  0.463
  receiver        0.72  0.31  0.440
Where to go from here
- The meaning in the bytecodes is not in the presence of individual opcodes but in their sequencing
  - MOTIF analysis in bioinformatics
- Comparable SMT
  - Most systems (e.g., Munteanu and Marcu (2006)) use either an aligned corpus or a bilingual dictionary
  - I can try to obtain that by asking developers to write descriptions for segments of the code
  - Alternatively, I can try to adapt TextTiling to bytecodes
    - Suggested by another Foulaber (Danukeru)
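One simple way to expose sequencing to a bag-of-features classifier is to count opcode n-grams instead of single opcodes. This only illustrates the idea; the talk does not report having tried it, and `opcode_ngrams` is a hypothetical helper.

```python
from collections import Counter

def opcode_ngrams(opcodes, n=2):
    """Count contiguous opcode n-grams in one method body."""
    return Counter(
        tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)
    )

# The repeated load / field access / call pattern from the example listing.
seq = ["aload_0", "getfield", "invokevirtual",
       "aload_0", "getfield", "invokevirtual"]
bigrams = opcode_ngrams(seq, 2)
for gram, count in bigrams.most_common():
    print(count, " ".join(gram))
```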
Applications in Reverse Engineering
- Hinting Subroutines
  - The motivating example at the beginning.
  - “Beacon identification” in Software Engineering.
- Custom (malware) VMs
  - Identifying which methods correspond to different VM operations (addition, jump, etc).
- Dalvik Word Clouds.
  - Use dex2jar, obtain word clouds for the whole executable.
  - Maybe the user can tell if anything looks fishy there?
- Flagging Suspicious Methods.
  - Finding methods that can be described with keywords very different from the rest of the existing methods.
  - Can be done with dynamically generated bytecodes.
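The last application could be sketched as an outlier test on predicted keywords. The example below compares each method's keyword set with the pooled keywords of all other methods using Jaccard similarity. The similarity measure, the threshold and the sample data are all assumptions made for illustration; the slides do not specify how "very different" would be measured.

```python
def jaccard(a, b):
    """Jaccard similarity between two keyword collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def flag_suspicious(method_keywords, threshold=0.1):
    """Names of methods whose keywords barely overlap with everyone else's."""
    flagged = []
    for name, keywords in method_keywords.items():
        # Pool the keywords predicted for every other method.
        rest = set()
        for other, other_keywords in method_keywords.items():
            if other != name:
                rest.update(other_keywords)
        if jaccard(keywords, rest) < threshold:
            flagged.append(name)
    return flagged

# Made-up keyword predictions for four methods.
methods = {
    "a": ["cache", "capacity", "size"],
    "b": ["cache", "size", "entry"],
    "c": ["cache", "capacity", "entry"],
    "d": ["socket", "send", "password"],
}
print(flag_suspicious(methods))  # ['d']
```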
Applications Outside Reverse Engineering
- Semantic Search
  - Searching for methods related to certain English terms
  - Query expansion using bytecodes
- Software Engineering Documentation
  - Generating documentation from bytecodes
  - Long term goal
Snippets and Sentence Compression
- Improving Information Retrieval user experience and engine performance by having better snippets
  - Working closely with Dr. Jing He.
- Summarization snippets seem better than regular snippets but are much longer ⇒ sentence compression
  - Query: wine rome
  - Page: http://penelope.uchicago.edu/%7Egrout/encyclopaedia_romana/wine/wine.html
  - Bing snippet: Return to Notae. Wine and Rome. Now nearly extinct in the wild, grapes (vitis vinifera) grew throughout the ancient Mediterranean, the juice readily fermenting as the enzymes ...
  - Summarization: Wine almost always was mixed with water for drinking; undiluted wine merum was considered the habit of provincials and barbarians. The earliest work on wine and agriculture was written in Punic. Indeed, by 154 BC, says Pliny, wine production in Italy was unsurpassed.
Taught Graduate Class in Argentina
- My alma mater
  - Universidad Nacional de Cordoba
- Natural Language Generation
  - http://wiki.duboue.net/index.php/2011_FaMAF_Intro_to_NLG
  - Touched NLG from DBs, Summarization and decoding in SMT
  - 12 students, about a fourth of the total PhD students in the dept
- Large NLP Group
  - http://pln.famaf.unc.edu.ar/
  - Possibilities for visiting people from Montreal
Student Projects
- Natural Language Generation for Software Patches
  - http://nlg4patch.com.ar/
- Natural Language Generation for UML diagrams
  - ongoing
- Referring Expression Evaluation using DBpedia
  - HLT-NAACL 2012 Short Paper “On The Feasibility of Open Domain Referring Expression Generation Using Large Scale Folksonomies”
- Surface Realization of Spanish using the SemPar Corpus
Free Software
- Debian science
  - apertium, transfer-based machine translation for related language-pairs
- NLTK
- Personal Projects
  - Farmer text support
  - php-nlgen
  - NLG in Puredata
- http://www.ohloh.net/accounts/DrDub
Tech Scene Montreal
- Foulab
  - Montreal's oldest and most prestigious hackerspace
  - http://foulab.org
  - Hackerspaces are community-operated physical places, where people can meet and work on their projects.
  - http://hackerspaces.org for the full list
  - Open House every Tuesday night, everybody is welcome
- Hack-a-thons
  - Upcoming: http://quebecouvert.org/events/hackonslacorruption/
- Notman house
  - The “House of the Web” in Montreal
  - http://notman.org/
Consulting
- R&D for start-ups
  - Focusing on companies with positive contributions
  - Quick turnaround from ideas to users
  - http://honeypot.matchfwd.com
- Own ventures
  - 4opiniones.com
Summary
- I have presented a work-in-progress targeting automated documentation generation from compiled code
- Most recent progress is in unsupervised terminology identification
- Currently working on improved ML
Acknowledgements
- GRIUM
  - Prof. Nie and Dr. Jing He
- Foulab
  - Danukeru
- REcon organizers
  - Subgraph.
- Annie Ying
Contacting the Speaker
- Email: [email protected]
- Website: http://duboue.net
- Twitter: @pabloduboue
- LinkedIn: http://linkedin.com/in/pabloduboue
- IRC: DrDub
http://keywords4bytecodes.org
- Always looking for new collaboration opportunities
- Very interested in teaching a class either in Montreal or on-line