7/31/2019 I3.2.pptx
1/33
Urbana-Champaign, May 12, 2012
INARC I3.2 Mid-Year Report
I3.2: Modeling and Mining of Text-Rich Information Networks
Dan Roth (Task Co-Lead) Jiawei Han (Task Co-Lead)
Heng Ji (CUNY) Xifeng Yan (UCSB)
University of Illinois at Urbana-Champaign
NS-CTA: INARC
I3.2: Modeling and Mining of Text-Rich Information Networks
Key Objectives:
Structurally model a text-rich info. network and investigate methods for mining knowledge from such networks
Enhance keyword search and knowledge discovery capability by the text-rich info. network model
Deliverables:
Q1: Methodologies for modeling and construction of multi-dimensional, relatively structured info. networks by progressive info. network analysis
Q2: Models for enhanced text data analysis using relatively structured, heterogeneous info. networks
Q3: Methods for multi-facet search in text-rich info. networks
Q4: System prototype demo of the approaches
Impact:
Modeling, principles, and methodologies developed for text-rich info. networks will lead to more relevant query results
Key Technical Innovations:
Exploitation of mostly unstructured data from reports along with some relatively structured metadata and the links (e.g., hyperlinks) between reports to discover key entities associated with a given query. The exploitation builds upon semantic processing (e.g., topic modeling), network analysis (e.g., iTopics) and data mining (e.g., topic/text cubes) technologies
Efficient algorithms to enrich text mining techniques with the info. network topology
Information trustworthiness analysis in text-rich info. networks and other text-rich networks
Role Researchers
Lead D. Roth, UIUC (INARC)
Lead J. Han, UIUC (INARC)
Primary H. Ji, CUNY (INARC)
Primary X. Yan, UCSB (INARC)
Collaborators N. Chawla, Notre Dame (SCNARC) (linked with E2.3)
J. J. Garcia-Luna-Aceves, UCSC (CNARC)
M. Magdon-Ismail, RPI (SCNARC) (linked with S2.1)
Z. Wen, IBM (SCNARC)
Total $322K
Advancing the State-of-the-Art of Network Science
Text-Rich Information Networks: Combining contents & network
Focused on large heterogeneous information networks
Collections of news articles from diverse resources, blogs and forums
Wikipedia, an information network consisting of structured and unstructured data
Developed state-of-the-art algorithmic tools
Supporting knowledge acquisition, information extraction, text modeling and integrated information structure discovery
Utilizing deep text analysis & large-scale statistical models over the content and the structure of the network
Making use of both explicit network structure and hidden ontological structure (e.g., category structure)
Advanced our understanding of how to:
Acquire and extract information from heterogeneous information networks when data is noisy, volatile, uncertain, and incomplete
Overall Task Organization
Subtask 1: Text-Rich InfoNet Construction. Modeling and construction of multi-dimensional, relatively structured information networks by integrated text and information analysis
Subtask 2: Topic Modeling and Discovery with InfoNet. Enhanced text data analysis using relatively structured, heterogeneous information networks
Subtask 3: Multi-Facet Search. Multi-facet search in text-rich information networks
Novelty Claims
Subtask 1: Modeling and construction of multi-dimensional, relatively structured information networks by integrated text and information analysis
Explicitly capture the interplay between textual topics and network structure
Subtask 2: Enhanced text data analysis using relatively structured, heterogeneous information networks
Novel theories and methods to make text data and information networks mutually enhance each other in text understanding and information analysis
Subtask 3: Multi-facet search in text-rich information networks
Exploring effective methods for search and mining in text-rich information networks
Subtask 1: Text-Rich Network Modeling and Construction
Modeling and construction of multi-dimensional, relatively structured information networks by integrated text and information analysis
Data Fusion and Information Network Fusion: Web structure mining for integration of web data with info. networks [WWW11, SIGMOD11 demo]
Wikification (integration of Wikipedia for entity/concept resolution) [ACL11]
Enrichment & disambiguation of information network
Dynamic Acquisition of Taxonomic Relations Network [EMNLP10]
Leverage Semantic Information Network to Enhance Entity Co-reference Resolution and Entity Identification [ACL-HLT11]
Micro and Macro Collaborative Networks Ranking for Entity and Event Coreference Resolution [EMNLP2011 SUB]
Markov Logic Networks and Learning-to-Rank to Enhance Open Domain Role Discovery [TAC10]
Growing Parallel Paths for Web Structure Mining
[Figure: DOM trees (HTML, DIV, UL, LI, TABLE, TR, TD, P, A elements) for Pages A to F, with steps numbered 1 to 6 and pages X, Y, Z, W, U, V; labels: Path, Result]
Tim Weninger, Fabio Fumarola, Cindy Xide Lin, Rick Barber, Jiawei Han, and Donato Malerba, Growing Parallel Paths for Entity-Page Discovery, WWW'11, Mar. 2011
http://www.cs.uiuc.edu/homes/hanj/pdf/www11_twininger.pdf
WinaCS: Web Information Network Analysis for Computer Science
Web structure-guided information extraction and integration
Integration of DBLP information networks
Integration of mined web structures with DBLP networks for knowledge-base construction
Supports intelligent querying & mining
Tim Weninger, Marina Danilevsky, et al., "WinaCS: Construction and Analysis of Web-Based Computer Science Information Networks", ACM SIGMOD'11 (system demo), Athens, Greece, June 2011.
http://www.cs.uiuc.edu/homes/hanj/pdf/sigmod11_tweninger.pdf
Wikification: Example: Entity Resolution & Tracking
Regi Blinker played three matches for Oranje. The left winger started his pro career at Feyenoord and played 400 official matches for Feyenoord, Celtic and Sparta. He retired from football in 2003.
Where is he now?
Wikification [ACL2011]
Given:
An information network consisting of news articles and blogs
Wikipedia: Text, Structured Information, Network (hyperlink) Structure and Ontological (Category) Structure
Goal:
Identify all entities and concepts mentioned in articles and blogs
Disambiguate & map each entity and concept to its appropriate Wikipedia page: Entity (and Concept) Resolution
Associate with each concept a collection of semantic attributes
Progressively enrich the information network and enable better access to it
Approach:
A global optimization problem that accounts for:
Local, node-specific information
Global, node and network structure information
Ontological network structure
Machine Learning algorithms determine candidates and rank nodes
Lev Ratinov, Doug Downey, Mike Anderson, Dan Roth, Local and Global Algorithms for Disambiguation to Wikipedia, ACL11
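The combination of local, mention-specific evidence with global, network-level coherence can be sketched roughly as below. This is a minimal illustration and not the ACL11 system: the function `rank_candidates`, the mixing weight `alpha`, the candidate titles, and all scores are hypothetical values invented for the example.

```python
# Hypothetical sketch: rank candidate Wikipedia titles for one mention by
# mixing a local (mention-specific) score with a global coherence score.

def rank_candidates(local_scores, coherence, context_titles, alpha=0.5):
    """Return candidate titles sorted from best to worst.

    local_scores: dict title -> local score for this mention (assumed given)
    coherence: dict (title_a, title_b) -> relatedness in [0, 1] (assumed given)
    context_titles: titles already chosen for other mentions in the document
    alpha: hypothetical weight placed on the local score
    """
    def global_score(title):
        # Average relatedness to the titles chosen for the other mentions.
        if not context_titles:
            return 0.0
        total = sum(coherence.get((title, other), 0.0) for other in context_titles)
        return total / len(context_titles)

    scored = {
        title: alpha * local + (1.0 - alpha) * global_score(title)
        for title, local in local_scores.items()
    }
    return sorted(scored, key=scored.get, reverse=True)


# Toy data, invented for illustration: candidates for the mention "Oranje".
local = {"Orange (colour)": 0.6, "Netherlands national football team": 0.4}
related = {
    ("Netherlands national football team", "Feyenoord"): 0.9,
    ("Orange (colour)", "Feyenoord"): 0.1,
}
ranking = rank_candidates(local, related, ["Feyenoord"])
print(ranking[0])
```

With no context titles the local score alone decides the ranking; once a related title such as "Feyenoord" is present, the coherence term can overturn the local preference.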
Identification of Taxonomic Relations [EMNLP2010]
The use of information networks to acquire Taxonomic Relations.
Given:
An information network consisting of news articles and blogs
Pairs of Concepts or Entities
Make use of: Wikipedia: Text, Structured Information, Network (hyperlink) Structure and Ontological (Category) Structure
Goal:
Developing large ontologies is essential to progressively enrich the information network and enable better access to it.
A huge amount of work has been done on developing stationary networks
These suffer from low coverage, noise, and brittleness
A Machine Learning & Optimization based approach
Exploits the fact that data in heterogeneous information networks is noisy, uncertain, and incomplete.
Considers multiple relations, makes use of a global constraint optimization process to leverage both Wikipedia and the web.
Significantly outperforms existing well-known taxonomical networks.
Examples:
(Honda, Toyota) are Siblings
M1A2 is-a Tank is-a Vehicle
AK-47 is-a Gun
Quang Do and Dan Roth, Constraints based Taxonomic Relation Classification, EMNLP10
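To make the two relation types in the examples concrete, the sketch below derives ancestor (is-a) chains and sibling pairs from a handful of direct is-a edges. It only illustrates what the relations mean; it is not the learned, constraint-based classifier of the EMNLP10 paper. The helper names, the single-parent assumption, and the "Automaker" parent are hypothetical.

```python
# Hypothetical sketch: derive ancestor (is-a) and sibling relations from a
# small table of direct is-a edges, assuming one parent per concept.

def ancestors(concept, parent_of):
    """Follow is-a edges upward and return all ancestors, nearest first."""
    found = []
    current = parent_of.get(concept)
    while current is not None and current not in found:
        found.append(current)
        current = parent_of.get(current)
    return found


def are_siblings(a, b, parent_of):
    """Two distinct concepts are siblings if they share a direct parent."""
    parent_a = parent_of.get(a)
    return a != b and parent_a is not None and parent_a == parent_of.get(b)


# Direct is-a edges. "Automaker" is an invented parent used for illustration.
parent_of = {
    "M1A2": "Tank",
    "Tank": "Vehicle",
    "AK-47": "Gun",
    "Honda": "Automaker",
    "Toyota": "Automaker",
}

print(ancestors("M1A2", parent_of))
print(are_siblings("Honda", "Toyota", parent_of))
```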
Leverage Semantic Information Network to Enhance Entity Coreference Resolution / Entity Identification
Disambiguation
Name Variant Clustering
9.4% absolute improvement in micro-averaged accuracy
(CUNY) Heng Ji and Ralph Grishman. "Knowledge Base Population: Successful Approaches and Challenges". ACL-HLT2011.
https://agora.cs.illinois.edu/download/attachments/30425499/5.pdf?version=1&modificationDate=1300050634857
Micro and Macro Collaborative Networks Ranking for Entity and Event Coreference Resolution
[Figure: a query q with collaborators cq1 to cq7 and candidate objects o_A and o_B, with scores 0.7, 0.4, 0.3, 0.6 and the correct rank]
Previous methods only focused on the target node and one learning theory itself
Propose a new collaborative network ranking theory which imitates human collaborative learning
Leverage inter-connections among collaborative entities in information networks
Automatic profiling for each node
Construct a collaborative network for each entity based on graph-based clustering
Rank multiple decisions from collaborative entities (micro) and algorithms (macro) based on global prediction
7% absolute improvement in micro-averaged accuracy
On-going CUNY+UIUC work: using topic modeling for entity clustering
(CUNY) Zheng Chen and Heng Ji. 2011. Collaborative Ranking: A Case Study in Entity Linking. Proc. EMNLP2011 [SUB]
Markov Logic Networks and Learning-to-Rank to Enhance Open Domain Role Discovery
[Figure: 9/11 Suspect Terrorist Network and Terrorist Information Network. Nodes V3 to V16 include Wail Al-Shehri, Waleed Al-Shehri, Abdul Aziz Al-Omari, Abdul Rahman Al-Omari, Mohamed Atta, Khamis Mushait, Al-Qaeda, Boston and Saudi Arabian Airlines; edge labels include origin, member, sibling, residence and pilot; sources include news pages, web blogs, Twitter and forums]
Discovered 26 roles for persons, 16 roles for organizations and 13 roles for locations
Markov Logic Networks for cross-slot and cross-query reasoning based on InfoNet and textual linkages to resolve conflicts and predict missing links
Weight=15: $\forall X, Y, Z:\ \mathrm{Ambiguous}(X, Y) \wedge \mathrm{TextualLinkage}(Y, Z) \wedge \mathrm{Pilot}(X) \wedge \mathrm{Pilot}(Z) \Rightarrow \mathrm{Remove}(X)$
Weight=100: $\forall X, Y, Z:\ \mathrm{Sibling}(X, Y) \wedge \mathrm{Origin}(Y, Z) \Rightarrow \mathrm{Origin}(X, Z)$
Maximum Entropy based Learning-to-rank model to re-rank candidate answers
13%-22% absolute F-measure improvement
(CUNY) Chen et al. "CUNY-BLENDER TAC-KBP2010 Entity Linking and Slot Filling System Description". Proc. TAC2010 and Lecture Notes in Computer Science, 2010
https://agora.cs.illinois.edu/download/attachments/30425499/hji_app_16.pdf?version=1&modificationDate=1300051108556
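A rule of the form Sibling(X, Y) and Origin(Y, Z) implies Origin(X, Z) can be used to predict missing links. The sketch below applies it as a hard rule by forward chaining, which is a deliberate simplification: a Markov Logic Network attaches a weight to each rule and reasons probabilistically rather than treating the rule as always true. The function name and the placeholder entities `person_a`, `person_b` and `city_c` are hypothetical.

```python
# Hypothetical sketch: apply the rule
#   Sibling(X, Y) and Origin(Y, Z)  =>  Origin(X, Z)
# as a hard rule until no new facts appear. A Markov Logic Network would
# instead weight the rule and reason probabilistically.

def infer_origins(siblings, origins):
    """Return the origin facts closed under the sibling rule.

    siblings: set of (x, y) pairs, treated as symmetric
    origins: set of (person, place) pairs
    """
    symmetric = siblings | {(y, x) for x, y in siblings}
    known = set(origins)
    changed = True
    while changed:
        changed = False
        for x, y in symmetric:
            # Iterate over a snapshot so that `known` can grow safely.
            for person, place in list(known):
                if person == y and (x, place) not in known:
                    known.add((x, place))
                    changed = True
    return known


# Placeholder entities, invented for illustration.
facts = infer_origins({("person_a", "person_b")}, {("person_b", "city_c")})
print(sorted(facts))
```

The missing link Origin(person_a, city_c) is added because person_a is a sibling of person_b, whose origin is known.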
Subtask 2: Network-Enhanced Text Analysis
Enhanced text data analysis using relatively structured, heterogeneous information networks
Progressive Dynamic Information Network Analysis [EMNLP11, ACL-HLT2011 (sub)]
Integration of Heterogeneous Info. Network and Topic Modeling (Biased Propagation) [KDD11 (sub)]
Topic Modeling for Active Learning and Inference in Event Network Construction [ACL-HLT2011 (sub)]
Geographical Topic Discovery & Comparison [WWW11]
Latent Association Analysis of Document Pairs [KDD11 (sub)]
Progressive Dynamic Information Network Analysis
[Figure: entity network around Ali Larijani, with nodes Iran, Supreme National Security Council, Islamic Republic of Iran Broadcasting, Farideh Motahari, Tehran University and Hassan Rowhani]
Motivations
Most information obtained on text-rich InfoNet construction so far is viewed as static, ignoring the temporal dimension of many types of attributes
Approaches
Temporal Role Representation: [T1 T2 T3 T4]
New Evaluation Metric
Local temporal role discovery using new kernel methods based on dependency paths
Global inference and aggregation to resolve conflicts using Integer Linear Programming (ongoing collaboration with Dan Roth at UIUC)
Results
State-of-the-art temporal role classification accuracy and lowest vagueness/over-constraining
[Figure: Baseline vs. Our Approach on Information Networks, with aggregation over 2 tuples and aggregation over 10 tuples]
(CUNY) Javier Artiles, Qi Li, Enrique Amigo and Heng Ji. 2011. Leveraging Cross-document Redundancy for Temporal Information Extraction. EMNLP2011, ACL-HLT2011 [SUB]
Probabilistic Topic Models with Biased Propagation on Heterogeneous Information Networks [KDD11 (sub)]
Problem and Motivation:
Discover latent topics & identify clusters of multi-typed objects simultaneously
Treat multi-typed objects differently (e.g., D with rich text & U without explicit text)
Solution and Contribution:
Basic idea: biased topic propagation
Propose a novel TMBP algorithm to directly incorporate heterogeneous InfoNet instead of homogeneous InfoNet with topic modeling (improves 20%-40% over PLSA)
[Figure: topic modeling with heterogeneous InfoNet, combining a topic model with biased propagation]
(UIUC) Hongbo Deng, Jiawei Han, Bo Zhao, Yintao Yu, and Cindy Xide Lin, "Probabilistic Topic Models with Biased Propagation on Heterogeneous Information Networks", KDD'11 (sub)
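The idea of treating objects with text (D) and without text (U) differently can be sketched as one round of propagation: an object without text inherits the average topic distribution of its linked documents, while each document keeps most of its own text-based distribution and is only partly pulled toward its neighbors. This is one plausible reading of "biased propagation", not the TMBP update from the paper; the function `propagate`, the bias parameter `xi`, and the toy distributions are assumptions.

```python
# Hypothetical sketch of one round of biased topic propagation.
# Not the TMBP algorithm itself: the update form and `xi` are assumptions.

def propagate(doc_topics, links, xi=0.3):
    """Propagate topic distributions between documents and text-less objects.

    doc_topics: dict doc -> list of topic probabilities (e.g. from PLSA)
    links: dict obj -> non-empty list of documents linked to that object
    xi: assumed share of a document's distribution taken from its neighbors
    Returns (obj_topics, new_doc_topics).
    """
    # Objects without text: average the distributions of linked documents.
    obj_topics = {}
    for obj, docs in links.items():
        k = len(doc_topics[docs[0]])
        obj_topics[obj] = [
            sum(doc_topics[d][z] for d in docs) / len(docs) for z in range(k)
        ]

    # Invert the links so that each document knows its neighboring objects.
    doc_objs = {}
    for obj, docs in links.items():
        for d in docs:
            doc_objs.setdefault(d, []).append(obj)

    # Documents: biased mix of their own distribution and their neighbors'.
    new_doc_topics = {}
    for d, own in doc_topics.items():
        objs = doc_objs.get(d, [])
        if not objs:
            new_doc_topics[d] = list(own)
            continue
        new_doc_topics[d] = [
            (1 - xi) * own[z] + xi * sum(obj_topics[o][z] for o in objs) / len(objs)
            for z in range(len(own))
        ]
    return obj_topics, new_doc_topics


# Toy input, invented for illustration: three documents, one text-less object.
docs = {"d1": [1.0, 0.0], "d2": [0.0, 1.0], "d3": [0.5, 0.5]}
objs, updated = propagate(docs, {"u1": ["d1", "d2"]})
print(objs["u1"], updated["d1"])
```

Because both updates are convex combinations of probability vectors, each resulting distribution still sums to one.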
Topic Modeling for Active Learning and Inference in Event Network Construction
Topic modeling can enhance information network construction by grouping similar event types together and converging information distributions
Using topic modeling, with only 1/4 of the training data we can achieve performance comparable to passive learning
Cross-document inference within topic clusters provided 10% improvement over state-of-the-art event extraction, with significant gains over IR-based clustering
Ongoing work: apply new entity-driven and biased propagation based topic modeling methods
[Figure: documents (Doc 1 to Doc N) grouped into topic clusters of words (e.g., Putin, weapons, nuclear, talks, forces, troops, army, military; Pyongyang, China, officials, Washington, north, south, Korea, program, United States; Saddam, control, fighting, city, Baghdad, Iraqi, regime, Kurdish; million, government, dollars, billion, company; convicted, court, sentence, case, media) and linked to the event types below]
Event Type: "Contact". Trigger: talk, meet, etc. Arguments: "Entity", "Instrument", "Place", "Time-Within"
Event Type: "Business". Trigger: form, dissolve. Arguments: "Org", "Place", "Time-Within", "Agent"
Event Type: "Attack". Trigger: blew, attack. Arguments: "Attacker", "Target", "Place", "Time-Within"
Event Type: "Transaction". Trigger: Borrow, Launch. Arguments: "Giver", "Recipient", "Money", "Seller", "Artifact", "Buyer"
Event Type: "Justice". Trigger: Arrest, Jail. Arguments: "Defendant", "Time-Within", "Adjudicator", "Place"
(CUNY + UIUC) Hao Li, Heng Ji, Hongbo Deng and Jiawei Han. 2011. Topically Related Data is Better Data: Topic Modeling for Event Extraction. ACL-HLT2011 [sub]
Geographical Topic Discovery & Comparison
Motivation: Analyze GPS-associated documents, e.g., geo-tagged photos and tweets sent from iPhones
Problem: Given a collection of GPS-associated documents and # of topics K, discover K geo-topics along with the topic distribution in different geo. locations
Latent Geographical Topic Analysis
Combine text and GPS location info
Words that are close to each other are more likely to be in the same region. Words that are in the same region are more likely to be in the same topic
Regions are not known beforehand. Our framework adapts the region discovery process according to the dataset
Zhijun Yin, Liangliang Cao, Jiawei Han, Chengxiang Zhai, and Thomas Huang, Geographical Topic Discovery and Comparison, WWW'11, Mar. 2011
http://www.cs.uiuc.edu/homes/hanj/pdf/www11_zyin.pdf
Latent Association Analysis of Document Pairs
Latent Association Analysis (LAA) mines the topics of two document sets simultaneously, taking the bipartite network between two document sets into consideration
One of the first attempts to analyze the topic structures of two connected document sets, aiming to infer their mapping network model
LAA significantly outperforms existing algorithms with 70% accuracy improvement
[Figure: topic simplex for Corpus 1 and topic simplex for Corpus 2, linked through a correlation factor over document pairs]
Gengxin Miao, et al., Latent Association Analysis of Document Pairs, KDD11 (sub)
Subtask 3: Multi-Facet Search and Mining
Information Network-Based Trustworthiness Analysis [COLING10, Army Sci10 (Best Paper Award), WWW11]
Progressive Network Analysis for Expert Search (Diffusion through Co-occurrence Relationships for Expert Search on the Web) [SIGIR11 sub]
Modeling and Exploiting Heterogeneous Sources for Expertise Ranking [SIGIR11 sub]
Personalized Recommendation on Information Networks [SIGIR11 sub]
Multi-facet Search in Self-Boosting Information Networks (Demo: Terrorism Network Search and Browsing) [SIGIR11 sub]
Information Network-Based Trustworthiness Analysis
Given:
Multiple information networks: websites, blogs, forums, sensor networks
Some claims, e.g., [Person A travelled to France], [There is a fire in downtown Chicago]
Prior beliefs and background knowledge
Our goal is to:
Score trustworthiness of claims based on:
support across multiple (trusted) sources in the network
source characteristics: reputation, interest-group, verifiability of information, etc.
prior beliefs and background knowledge
Rate databases/sources as more/less trustworthy
Track how the trustworthiness of a fact / database varies with time as the text corpus grows over time
New framework for incorporating prior knowledge into any fact-finding algorithm
Done via a Linear Programming approach
Highly expressive declarative constraints
Tractable (polynomial time)
Prior knowledge improves results
Absolutely essential when the user's judgment varies from the norm
Dan Roth et al., COLING10, Army Sci10 (Best Paper Award), WWW11
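A generic iterative fact-finder, the kind of algorithm into which prior knowledge is to be incorporated, can be sketched as below: claim belief is the summed trust of the sources asserting it, and source trust is the summed belief of its claims, with both rescaled each round. This is a plain hubs-and-authorities style illustration; it does not include the Linear Programming step or any prior knowledge, and the function name, source names and claim names are hypothetical.

```python
# Hypothetical sketch of a basic iterative fact-finder (hubs-and-authorities
# style). It omits the prior-knowledge / Linear Programming component.

def fact_finder(claims_by_source, iterations=20):
    """Return (trust, belief) after alternating updates.

    claims_by_source: dict source -> non-empty set of claims it asserts
    """
    trust = {source: 1.0 for source in claims_by_source}
    belief = {}
    for _ in range(iterations):
        # A claim is believed in proportion to the trust of its sources.
        belief = {}
        for source, claims in claims_by_source.items():
            for claim in claims:
                belief[claim] = belief.get(claim, 0.0) + trust[source]
        top_belief = max(belief.values())
        belief = {claim: b / top_belief for claim, b in belief.items()}

        # A source is trusted in proportion to the belief in its claims.
        trust = {
            source: sum(belief[claim] for claim in claims)
            for source, claims in claims_by_source.items()
        }
        top_trust = max(trust.values())
        trust = {source: t / top_trust for source, t in trust.items()}
    return trust, belief


# Toy input, invented for illustration.
sources = {"A": {"c1", "c2"}, "B": {"c1"}, "C": {"c3"}}
trust, belief = fact_finder(sources)
print(sorted(belief, key=belief.get, reverse=True))
```

Claim c1, supported by two sources, ends with the highest belief, and the isolated source C ends with the lowest trust.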
Progressive Network Analysis for Expert Search
Goal: find and rank people who have the expertise described by a user query
Web pages are noisier and contain more spam than an enterprise corpus. Both relevance and reputation should be considered
Use a heterogeneous hypergraph to model the co-occurrence relationships among people and words, and devise a heat diffusion model on the hypergraph
Applied to 0.5B web pages
Accuracy: 50%-200% improvement over the leading language model methods. Significantly overcomes noise in the Web.
Ziyu Guan, et al., Diffusion through Co-occurrence Relationships for Expert Search on the Web, SIGIR11 (sub)
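The intuition behind heat diffusion for expert search is that query words act as heat sources and people who co-occur with them warm up. The sketch below runs a simple diffusion on an ordinary graph, which is a simplification: the model described above works on a heterogeneous hypergraph, and its exact update rule is not given here. The function `diffuse`, the rate `alpha`, and the toy nodes are assumptions.

```python
# Hypothetical sketch: heat diffusion on an ordinary co-occurrence graph.
# The model above uses a heterogeneous hypergraph; this is a simplification.

def diffuse(neighbors, heat, alpha=0.5, steps=10):
    """Diffuse heat from query words to co-occurring people.

    neighbors: dict node -> list of neighboring nodes (edges listed both ways)
    heat: dict node -> initial heat (missing nodes start at 0.0)
    alpha: assumed diffusion rate per step
    """
    current = {node: heat.get(node, 0.0) for node in neighbors}
    for _ in range(steps):
        updated = {}
        for node, nbrs in neighbors.items():
            if nbrs:
                # Move each node toward the average heat of its neighbors.
                average = sum(current[m] for m in nbrs) / len(nbrs)
                updated[node] = current[node] + alpha * (average - current[node])
            else:
                updated[node] = current[node]
        current = updated
    return current


# Toy graph, invented for illustration: p1 co-occurs with the query word.
graph = {
    "retrieval": ["p1"],
    "p1": ["retrieval"],
    "databases": ["p2"],
    "p2": ["databases"],
}
scores = diffuse(graph, {"retrieval": 1.0})
print(scores["p1"], scores["p2"])
```

People would then be ranked by their final heat, so p1 ranks above p2 for the query word "retrieval".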
Modeling and Exploiting Heterogeneous Sources for Expertise Ranking
Problem: How to leverage both heterogeneous network and documents to identify the relevant experts for a given query?
Baseline: The expertise of a person could be characterized based on his/her associated documents (doc-based method)
Intuitions:
Citation graph: Similar documents are likely to have similar relevance to a given query
Coauthor graph: Two authors are most likely to share similar expertise if they coauthor many papers.
Document-author bipartite graph: mutual reinforcement between documents (x) and authors (y)
Solution: We formulate a joint regularization framework to incorporate several hypotheses to capture the information of different graphs together with textual documents
Result: Using DBLP with 2M nodes and 10M edges. Significant improvements over the baseline.
[Figure: coauthor graph, citation graph, and top-10 experts for the query "Information retrieval"]
Hongbo Deng, et al., Modeling and Exploiting Heterogeneous Sources for Expertise Ranking, SIGIR11 (sub)
Multi-Facet Search in Self-Boosting Information Networks (Example: Terrorism Network Search and Browsing)
Demo: http://blender2.cs.qc.cuny.edu/BlenderGraph/
Video: http://nlp.cs.qc.cuny.edu/terrorism.m4v
Facilitate a military analyst in expert finding and terrorist information search, gathering, control and analysis for any given query
Entity-topic analyzer for self-expansion and self-boosting: terrorism organization members, status of members (die, arrest, ...) and information networks associated with each member
(CUNY + UIUC) Sam Anzaroot, Javier Artiles, Heng Ji, Hongbo Deng and Jiawei Han. 2011. Search and Browsing Self-Boosting Information Networks. SIGIR2011 [SUB]
Military Relevance
Subtask 1: Text-Rich Network Modeling and Construction
Object search enhanced by entity disambiguation and role discovery can provide methods for finding groups of soldiers and identifying terrorists with certain expertise
Subtask 2: Network-Enhanced Text Analysis
Asymmetric wars and counter-terrorism need to understand text-rich networks
Text mining for monitoring potential threats and detecting terrorism with entity-topic modeling and event detection and tracking
Subtask 3: Multi-Facet Search and Mining in Text-Rich Networks
Most military applications need multi-facet search on text and unstructured data, including emails, reports, telecommunication messages, military-related news and blogs
Our multi-facet, multi-dimensional information network search and browsing tool has rich functions and provides intelligent network expansions
I3.2's Collaboration Network
I3.2: Han, Ji, Roth, Yan (weekly telecons, frequent emails, 5 joint papers)
[Figure: collaboration links from I3.2 to other tasks and partners: I1.1 (Roth, Huang); I1.2 (Tarek, Charu); I3.1 (Han, Yan); E2.3 (Han); T1.1 (Adali); T1.4 (Parsons); T1.5 (Lin, Wen); S1.1 (Lin); ARL (Cole, Winkler); IRC (Leung). Link labels: Logic Reasoning for Information Validation; Data & Experiments; Military Data for Topic Analysis]
Next Six Months and Path Ahead to 2012
Continue research on mining text-intensive information networks
Research in three frontiers: (1) integrated classification and clustering in network mining, (2) build up a theory on link/relationship analysis in heterogeneous networks, and (3) explore military applications
Collaborations with researchers in other networks
Work with Nitesh Chawla, who has done much work on link prediction, on evaluation of mining methods for clustering and classification of heterogeneous networks
Work with SCNARC (Boleslaw Szymanski et al.) on using the method developed here to mine social and cognitive networks
Next year research planned if funded
Effective theory and methods for mining heterogeneous networks involving social and communication networks
Network classification and clustering modeling in heterogeneous information, social, and communication networks
Application of role discovery, network classification, and anomaly detection methods in military applications
Research Papers (Accepted/Published, 2011)
1. Tim Weninger, Marina Danilevsky, Fabio Fumarola, Joshua Hailpern, Jiawei Han, et al., "WinaCS: Construction and Analysis of Web-Based Computer Science Information Networks", Proc. of 2011 ACM SIGMOD Int. Conf. on Management of Data (SIGMOD'11), (system demo paper), Athens, Greece, June 2011. http://www.cs.uiuc.edu/homes/hanj/pdf/sigmod11_tweninger.pdf
2. Zhijun Yin, Liangliang Cao, Jiawei Han, Chengxiang Zhai, and Thomas Huang, "Geographical Topic Discovery and Comparison", Proc. of 2011 Int. World Wide Web Conf. (WWW'11), Hyderabad, India, Mar. 2011 (Full paper). http://www.cs.uiuc.edu/homes/hanj/pdf/www11_zyin.pdf
3. Tim Weninger, Fabio Fumarola, Cindy Xide Lin, Rick Barber, Jiawei Han, and Donato Malerba, "Growing Parallel Paths for Entity-Page Discovery", Proc. of 2011 Int. World Wide Web Conf. (WWW'11), Hyderabad, India, Mar. 2011 (Poster paper). http://www.cs.uiuc.edu/homes/hanj/pdf/www11_twininger.pdf
4. Heng Ji and Ralph Grishman. "Knowledge Base Population: Successful Approaches and Challenges". Accepted by Proc. the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT2011), 2011. https://agora.cs.illinois.edu/download/attachments/30425499/5.pdf?version=1&modificationDate=1300050634857
5. Heng Ji, Adam Lee and Wen-Pin Lin. "Information Network Construction and Alignment from Automatically Acquired Comparable Corpora". Invited book chapter for Building and Using Comparable Corpora. Springer, 2011. https://agora.cs.illinois.edu/download/attachments/30425499/hji_app_6.pdf?version=1&modificationDate=1300050897721
6. Heng Ji, Benoit Favre, Wen-Pin Lin, Dan Gillick, Dilek Hakkani-Tur and Ralph Grishman. 2011. "Open-domain Multi-document Summarization via Information Extraction: Challenges and Prospects". Invited book chapter for Multi-source, Multilingual Information Extraction and Summarisation. Springer. https://agora.cs.illinois.edu/download/attachments/30425499/hji_app_7.pdf?version=1&modificationDate=1300051015435
7. Lev Ratinov, Doug Downey, Mike Anderson, Dan Roth, "Local and Global Algorithms for Disambiguation to Wikipedia", ACL11
8. Y. Chan and D. Roth, "Exploiting Syntactico-Semantic Structures for Relation Extraction", ACL11
Research Papers (Published, Sept.-Dec. 2010)
1. Manish Gupta, Rui Li, Zhijun Yin, and Jiawei Han, "Survey on Social Tagging Techniques", SIGKDD Explorations, 12(1):58-72, 2010.
2. Lu Liu, Jie Tang, Jiawei Han, Meng Jiang, Shiqiang Yang, "Mining Topic-Level Influence in Heterogeneous Networks", Proc. 2010 ACM Int. Conf. on Information and Knowledge Management (CIKM'10), Toronto, Canada, Oct. 2010. http://www.cs.uiuc.edu/homes/hanj/pdf/cikm10_lliu.pdf
3. Tim Weninger, Fabio Fumarola, Jiawei Han, Donato Malerba, "Mapping Web Pages to Database Records via Link Paths", Proc. 2010 ACM Int. Conf. on Information and Knowledge Management (CIKM'10), Toronto, Canada, Oct. 2010. http://www.cs.uiuc.edu/homes/hanj/pdf/cikm10_tweninger.pdf
4. Xin Jin, Andrew Gallagher, Liangliang Cao, Jiebo Luo, and Jiawei Han, "The Wisdom of Social Multimedia: Using Flickr for Prediction and Forecast", Proc. 2010 ACM Multimedia Int. Conf. (ACM-Multimedia'10), Florence, Italy, Oct. 2010. http://www.cs.uiuc.edu/homes/hanj/pdf/mm10_xjin.pdf
5. Zheng Chen, Suzanne Tamang, Adam Lee, Xiang Li, Wen-Pin Lin, Javier Artiles, Matthew Snover, Marissa Passantino and Heng Ji. "CUNY-BLENDER TAC-KBP2010 Entity Linking and Slot Filling System Description". Proc. Text Analytics Conference (TAC2010), 2010. https://agora.cs.illinois.edu/download/attachments/30425499/hji_app_16.pdf?version=1&modificationDate=1300051108556
6. Hao Li, Xiang Li, Heng Ji and Yuval Marton. 2010. "Domain-Independent Novel Event Discovery and Semi-Automatic Event Annotation". Proc. the 23rd Pacific Asia Conference on Language, Information and Computation (PACLIC 2010). https://agora.cs.illinois.edu/download/attachments/30425499/hji_app_17.pdf?version=1&modificationDate=1300051134188
7. J. Pasternack and Dan Roth, "Comprehensive Trust Metrics for Information Networks", Army Science Conf.'10 (Best Paper Award), Dec. 2010.
8. Q. Do and D. Roth, "Constraints based Taxonomic Relation Classification", EMNLP10, Oct. 2010
Research Papers (Submitted, 2011)
1. (UIUC + U. Michigan) Cindy Xide Lin, Qiaozhu Mei, Yunliang Jiang, and Jiawei Han, "Inferring the Diffusion and Evolution of
Topics in Social Communities", KDD'11 (sub)
2. (UIUC) Hongbo Deng, Jiawei Han, Bo Zhao, Yintao Yu, Cindy Xide Lin, "Probabilistic Topic Models with Biased Propagation on
Heterogeneous Information Networks", KDD'11 (sub)
3. (UIUC) Zhijun Yin (UIUC), Liangliang Cao (UIUC), Jiawei Han (UIUC), Chengxiang Zhai (UIUC), Thomas Huang (UIUC), "LPTA: A
Probabilistic Model for Latent Periodic Topic Analysis", KDD'11 (sub)
4. (CUNY + UIUC) Heng Ji and Jiawei Han. 2011. Web-Scale Knowledge Discovery and Information Extraction. Invited Paper for
IEEE Special Issue on Web-Scale Multimedia Processing and Applications. In Preparation.
5. (CUNY + UIUC) Hao Li, Heng Ji, Hongbo Deng and Jiawei Han. 2011. Topically Related Data is Better Data: Topic Modeling for
Event Extraction. Submitted to the 49th Annual Meeting of the Association for Computational Linguistics: Human Language
Technologies (ACL-HLT2011)
6. (CUNY + UIUC) Sam Anzaroot, Javier Artiles, Heng Ji, Hongbo Deng and Jiawei Han. 2011. Search and Browsing Self-Boosting
Information Networks. Submitted to the 34th Annual International ACM SIGIR Conference (SIGIR2011)
7. (CUNY) Javier Artiles, Qi Li, Enrique Amigo and Heng Ji. 2011. Leveraging Cross-document Redundancy for Temporal
Information Extraction. Submitted to Empirical Methods in Natural Language Processing (EMNLP2011)
8. (CUNY) Javier Artiles, Enrique Amigo, Qi Li and Heng Ji. 2011. Evaluating Temporal Information Extraction. Submitted to ACL-
HLT2011
9. (CUNY) Zheng Chen and Heng Ji. 2011. Collaborative Ranking: A Case Study in Entity Linking. Submitted to Conference on
Empirical Methods in Natural Language Processing (EMNLP2011)
10. (CUNY) Qi Li, Javier Artiles and Heng Ji. 2011. Dependency Paths Kernel for Temporal Relation Classification. Submitted to the 49th
Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT2011).
11. (CUNY) Suzanne Tamang and Heng Ji. 2011. Learning-to-Rank for Slot Filling System Combination and Assessment. Submitted
to Conference on Empirical Methods in Natural Language Processing (EMNLP2011)
12. (CUNY) Zheng Chen, Suzanne Tamang, Adam Lee and Heng Ji. 2011. A Toolkit for Knowledge Base Population. Submitted to
the 34th Annual International ACM SIGIR Conference (SIGIR2011)
13. (CUNY) Xiang Li and Heng Ji. 2011. Comment-guided Reinforcement Learning for Slot Filling. Submitted to Conference on
Empirical Methods in Natural Language Processing (EMNLP2011)
Other Technical Contributions
(Book: UIC + UIUC + CMU) Philip S. Yu, Jiawei Han, and Christos Faloutsos (Editors), LINK MINING: MODELS,
ALGORITHMS AND APPLICATIONS, Springer, 2010.
(UIUC) Jiawei Han has received the Daniel C. Drucker Eminent Faculty Award at UIUC.
(UCSB) Ms. Gengxin Miao, who was supported by the INARC program, has received an IBM Ph.D. Fellowship for
2011-2012. Gengxin Miao is co-supervised by Xifeng Yan at INARC.
(CUNY) Heng Ji. CUNY Chancellor's "Salute to Scholar" Award, November 2010.
(CUNY) Heng Ji. National Science Foundation Research Experiences for Undergraduates, March 2011
Jiawei Han, Towards Integrated Mining of Multiple Social and Information Networks (keynote speech) The
2011 Int. Conf. on Advances in Social Network Analysis and Mining (ASONAM11), July 2011.
Jiawei Han, Exploring the Power of Heterogeneous Information Networks in Data Mining (keynote speech)
The 2011 Int. SIAM Data Mining Conf. (SDM11), April 2011.
Jiawei Han, Construction and Analysis of Web-Based Computer Science Information Networks (keynote
speech) The 2011 Int. Conf. on Rough Sets, Fuzzy Sets, Data Mining and Granular Computing, June 2011.
Latifur Khan, Wei Fan, Jiawei Han, Jing Gao, Mohammad Mehedy Masud, Data Stream Mining: Challenges
and Techniques, (tutorial), The 15th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD
2011), May 2011
Jiawei Han, Web Structure Mining and Information Network Analysis: An Integrated Approach, invited speech
at the Third International Workshop on Network Theory: Web Science Meets Network Science, March 2011.
Heng Ji, Web-Scale Knowledge Discovery and Population from Unstructured Data, Keynote Speech ACLCLP
2010 Information Retrieval Conference, December 2010.
Heng Ji. Overview of the TAC2010 Knowledge Base Population Track, Keynote Speech at Web People Search
(WePS-3) Conference, September 2010.
Personalized Recommendation on Information Networks
Concept extraction (Text, Concept)
Combine text & links in heterogeneous networks
Find good conceptual associations of user interests; distinguish clean sources and noisy sources
(UIUC) Chi Wang, et al., Learning Relevance in a Heterogeneous Social Network and Its