21
Mining Technology Landscape from Stack Overflow Chunyang Chen, Zhenchang Xing School of Computer Science and Engineering, Nanyang Technological University, Singapore

School of Computer Science and Engineering, Nanyang ...School of Computer Science and Engineering, Nanyang Technological University, Singapore • Chen, Chunyang, and Zhenchang Xing

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: School of Computer Science and Engineering, Nanyang ...School of Computer Science and Engineering, Nanyang Technological University, Singapore • Chen, Chunyang, and Zhenchang Xing

Mining Technology Landscape from Stack Overflow

Chunyang Chen, Zhenchang XingSchool of Computer Science and Engineering, Nanyang Technological University, Singapore

Page 2: School of Computer Science and Engineering, Nanyang ...School of Computer Science and Engineering, Nanyang Technological University, Singapore • Chen, Chunyang, and Zhenchang Xing

Given a task, most developers need an overview of related

technology to understand:

The scope and challenge of this task;

The existing method or algorithm for such task;

The framework, library or related code to help solve the task

……

Motivation

Page 3: School of Computer Science and Engineering, Nanyang ...School of Computer Science and Engineering, Nanyang Technological University, Singapore • Chen, Chunyang, and Zhenchang Xing

How to find technology landscape?

1. Search Google for technology overview

Motivation

Easy to be lost, biasdand out of date

Page 4: School of Computer Science and Engineering, Nanyang ...School of Computer Science and Engineering, Nanyang Technological University, Singapore • Chen, Chunyang, and Zhenchang Xing

2. They ask in Q&A site such as Stack Overflow

Motivation

Not allowed

Page 5: School of Computer Science and Engineering, Nanyang ...School of Computer Science and Engineering, Nanyang Technological University, Singapore • Chen, Chunyang, and Zhenchang Xing

How can we help such information needs automatically

and objectively?

a graph for technology landscape!

How to solve it

Page 6: School of Computer Science and Engineering, Nanyang ...School of Computer Science and Engineering, Nanyang Technological University, Singapore • Chen, Chunyang, and Zhenchang Xing

Stack Overflow covers most programming technologies

12m questions, 20m answers, 6m users

Mine a landscape of technologies automatically from SO.

Observation

Page 7: School of Computer Science and Engineering, Nanyang ...School of Computer Science and Engineering, Nanyang Technological University, Singapore • Chen, Chunyang, and Zhenchang Xing

Tags in Stack Overflow

A tag is a word or phrase that describes the technology or concept of the question.

Frequent co-occurring tags indicate potential relations between technologies.

There are millions of questions attached with tags.

Hence, we adopt tags in SO to mine technology landscape.

Dataset

Tags

Page 8: School of Computer Science and Engineering, Nanyang ...School of Computer Science and Engineering, Nanyang Technological University, Singapore • Chen, Chunyang, and Zhenchang Xing

Association rules mining

Minimum support & confidence value

Frequent pairs of tags e.g., (java -- spring, c -- pointer, html -- css)

Build a graph

Link tag pairs into a graph, we call it technology associative

network (TAN)

Community detection

Cluster nodes in the graph

Construct Technology Associative Network

Page 9: School of Computer Science and Engineering, Nanyang ...School of Computer Science and Engineering, Nanyang Technological University, Singapore • Chen, Chunyang, and Zhenchang Xing

The overview of important technologies related to

programming in Stack Overflow:

Technology Associative Network

Page 10: School of Computer Science and Engineering, Nanyang ...School of Computer Science and Engineering, Nanyang Technological University, Singapore • Chen, Chunyang, and Zhenchang Xing

Tags in Stack Overflow belong to different categories:

• Java programming language;

• Eclipse IDE;

• Binary-search algorithm;

• Caching mechanism;

• D3.js library.

Tag Category

Page 11: School of Computer Science and Engineering, Nanyang ...School of Computer Science and Engineering, Nanyang Technological University, Singapore • Chen, Chunyang, and Zhenchang Xing

Community TagWiki info

• First sentence

Get the category of the tag

• Part-of-Speech tagging

and phrase chunking

• First noun phrase after

“be” is category label

Identify tag category

Page 12: School of Computer Science and Engineering, Nanyang ...School of Computer Science and Engineering, Nanyang Technological University, Singapore • Chen, Chunyang, and Zhenchang Xing

Combine both the graph and category information to

complete the technology landscape:

Technology Landscape

Page 13: School of Computer Science and Engineering, Nanyang ...School of Computer Science and Engineering, Nanyang Technological University, Singapore • Chen, Chunyang, and Zhenchang Xing

Website (TechLand)

https://graphofknowledge.appspot.com

Application

4

5

6

1

2

3

Page 14: School of Computer Science and Engineering, Nanyang ...School of Computer Science and Engineering, Nanyang Technological University, Singapore • Chen, Chunyang, and Zhenchang Xing

RQ1: Can the mined TAN capture the important

technologies from a majority of Stack Overflow

questions?

RQ2: How do different mining thresholds affect the size

and modularity of the mined TAN?

RQ3: Are the mined technology associations semantically

related?

RQ4:What are structural properties of the mined TAN?

RQ5: How do the technology landscape evolve over time?

Empirical Study

Page 15: School of Computer Science and Engineering, Nanyang ...School of Computer Science and Engineering, Nanyang Technological University, Singapore • Chen, Chunyang, and Zhenchang Xing

If one tag of a question is covered, we count this question

as covered

The higher minimum support value, the lower coverage

RQ1: Coverage of Tags and Questions

Page 16: School of Computer Science and Engineering, Nanyang ...School of Computer Science and Engineering, Nanyang Technological University, Singapore • Chen, Chunyang, and Zhenchang Xing

size and modularity of technology

community

Higher confidence results in sparse

graph while lower confidence

results in dense graph

Semantic distance of technology

associations by “Google

Distance”

As Figure (a) shows, most edges are

covered in Google Trends

RQ2 & RQ3

Page 17: School of Computer Science and Engineering, Nanyang ...School of Computer Science and Engineering, Nanyang Technological University, Singapore • Chen, Chunyang, and Zhenchang Xing

General & Specific TAN

General Technological associative network can capture high-

level knowledge.

After zooming in, more detailed knowledge emerges

RQ4: network structure

Page 18: School of Computer Science and Engineering, Nanyang ...School of Computer Science and Engineering, Nanyang Technological University, Singapore • Chen, Chunyang, and Zhenchang Xing

For each month, draw TAN

Their difference is smaller

and smaller

Different communities begin

to emerge and stabilize

RQ5: Network Evolution

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

2008-0

9

2008-1

2

2009-0

3

2009-0

6

2009-0

9

2009-1

2

2010-0

3

2010-0

6

2010-0

9

2010-1

2

2011-0

3

2011-0

6

2011-0

9

2011-1

2

2012-0

3

2012-0

6

2012-0

9

2012-1

2

2013-0

3

2013-0

6

2013-0

9

2013-1

2

2014-0

3

2014-0

6

2014-0

9

2014-1

2

Jacc

ard c

oeffic

ient

Month

technology

technology-association

Page 19: School of Computer Science and Engineering, Nanyang ...School of Computer Science and Engineering, Nanyang Technological University, Singapore • Chen, Chunyang, and Zhenchang Xing

1. Answer questions in Stack Overflow

Usefulness of our TechLand

• 3 types, 9 questions from Stack Overflow;

• 7 PhD students

• Compare original answers and that from our TechLand

• Mark accuracy, coverage, satisfaction by 5-point likert scale

Page 20: School of Computer Science and Engineering, Nanyang ...School of Computer Science and Engineering, Nanyang Technological University, Singapore • Chen, Chunyang, and Zhenchang Xing

2. Web usage

Usefulness of our TechLand

https://graphofknowledge.appspot.com/

Page 21: School of Computer Science and Engineering, Nanyang ...School of Computer Science and Engineering, Nanyang Technological University, Singapore • Chen, Chunyang, and Zhenchang Xing

Thanks for listening

Chunyang Chen, Zhenchang XingSchool of Computer Science and Engineering, Nanyang Technological University, Singapore

• Chen, Chunyang, and Zhenchang Xing. "Mining technology landscape from stack overflow."

In Proceedings of the 10th ACM/IEEE International Symposium on Empirical Software Engineering and

Measurement, p. 14. ACM, 2016.

• Chen, Chunyang, Zhenchang Xing, and Lei Han. "Techland: Assisting technology landscape inquiries

with insights from stack overflow." In 2016 IEEE International Conference on Software Maintenance

and Evolution (ICSME), pp. 356-366. IEEE, 2016.