Understanding Android Fragmentation with Topic Analysis of Vendor-Specific Bugs
Dan Han, Chenlei Zhang, Xiaochao Fan, Abram Hindle, Kenny Wong and Eleni Stroulia
Department of Computing Science
University of Alberta
Outline
• Introduction
• Previous Work
• Methodology
• Comparing Topic Models
• Fragmentation Topic Analysis
• Fragmentation Discussion
• Conclusion
Introduction
Hardware-Based Fragmentation
http://www.android.com/devices/?country=all
Software-Based Fragmentation
http://www.blackeco.com/petites-speculations-autour-de-la-prochaine-version-dandroid/
Why do we care?
More than 20 Android device manufacturers
Multiple Android versions
Hundreds of different Android devices
Developers
Users
Stakeholders
What do we do in this study?
Goal: search for evidence of Android fragmentation within the Android ecosystem, based on Android bug reports
Approach: apply topic models and topic analysis
Previous Work
Topic Model and Topic Analysis
Topic Model: a statistical model for discovering abstract topics that occur in a collection of documents, e.g. Latent Dirichlet Allocation (LDA)
Topic Analysis: extract and evaluate the topics from a corpus of text documents through topic models
• Traceability recovery: Asuncion et al., Lukins et al., Hindle et al.
• Feature location: Marcus et al., Poshyvanyk et al., Grant et al.
• Software evolution and trend analysis: Thomas et al., Martie et al.
Differences between previous work and our work
Previous work applied unsupervised topic models, e.g. LDA
We applied Labeled-LDA, a supervised topic model, to analyze topic evolution
We compared the performance of LDA and Labeled-LDA on our dataset
LDA and Labeled-LDA
Labeled-LDA
• A novel method in software engineering so far
• Requires manual labeling
• Supervised topic modeling algorithm
• Labeled-LDA only predicts the relevance between each document and its labels
LDA
• Well studied in software engineering
• Unsupervised topic modeling algorithm
• Needs documents and the number of topics N as input
• LDA predicts the relevance between each document and all N topics
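The last point of each list can be made concrete by looking at one document's row in the document-topic matrix. The sketch below is illustrative only: the topic names, the weights, and the helper `restrict_to_labels` are hypothetical, and masking plus renormalizing merely mimics the shape of a Labeled-LDA row. It is not the Labeled-LDA inference procedure.

```python
def restrict_to_labels(weights, labels):
    """Keep only the weights of topics named in `labels`, then renormalize.

    Hypothetical helper: it reproduces the *shape* of a Labeled-LDA row
    (relevance only for the document's own labels), not how the model is fitted.
    """
    kept = {topic: w for topic, w in weights.items() if topic in labels}
    total = sum(kept.values())
    if total == 0:
        return {}
    return {topic: w / total for topic, w in kept.items()}


# LDA-style row: one weight for each of the N topics (made-up numbers, N = 4).
lda_row = {"bluetooth": 0.5, "display": 0.25, "keyboard": 0.125, "gps": 0.125}

# Labeled-LDA-style row: only the labels attached to this bug report appear.
labeled_row = restrict_to_labels(lda_row, labels={"bluetooth", "display"})
print(labeled_row)
```

With LDA every bug report is scored against every topic, whereas with Labeled-LDA a report can only be related to the labels it was given by hand.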
Difference between a topic and a label
Topic: a word distribution extracted from bug reports by topic models
Label: the annotation of a document
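As a small illustration of the distinction, a topic can be held as a mapping from words to probabilities, while a label is simply a tag attached to a bug report. The values, the sample report, and the helper `top_words` below are made up for this sketch. Ranking a topic's words by probability is how top-word lists are usually read off a topic.

```python
def top_words(topic, k):
    """Return the k most probable words of a topic, most probable first."""
    ranked = sorted(topic.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]


# A topic: a probability distribution over words (made-up values).
bluetooth_topic = {"bluetooth": 0.4, "headset": 0.3, "car": 0.2, "screen": 0.1}

# A label: an annotation attached to a document by a human (made-up report).
bug_report = {
    "text": "Headset disconnects in car after update",
    "labels": ["bluetooth"],
}

print(top_words(bluetooth_topic, 3))  # ['bluetooth', 'headset', 'car']
```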
Methodology
Case Study
More than 20,000 Android bug reports, 2008-2011
Vendor-specific bug reports: HTC (1,503) vs. Motorola (1,058)
http://www.puremobile.co.uk/insiderblog/wp-content/uploads/2011/08/Motorola-Mobile_logo.jpg
http://www.finestdaily.com/news/htc-jetstream-to-be-launched-on-september-6t.html/attachment/htc_cmyk_white_strapline
Create labels for Android bug reports
Feature-oriented labels for Android bug reports
Android labels
• Features in Android versions, e.g. Language, Bluetooth
• Popular applications, e.g. Google Maps, Gmail
• Hardware of Android devices, e.g. Keyboard, GPS
Label Android bug reports
60 person-hours of manual labeling effort
Labeled bug reports are public now
HTC: 72 labels in total
Motorola: 58 labels in total
Apply Labeled-LDA
Apply LDA
Try a range of N to find the most distinct topics
Label each topic using our manual labels for the bug reports of HTC and Motorola
2 hours of labeling effort
Comparing LDA and Labeled-LDA
Each topic model generates the document-topic matrix
Determine if LDA generates similar results to Labeled-LDA
Compute and compare the Jaccard similarity of documents related to each topic generated by LDA and Labeled-LDA
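The comparison step can be sketched as follows. This is a minimal illustration, not the study's implementation: the toy document-topic matrices, the relevance threshold of 0.2 used to decide which documents count as "related" to a topic, and all function names are assumptions made for the example.

```python
def related_docs(doc_topic, threshold):
    """Map each topic to the set of document ids whose relevance exceeds threshold."""
    docs_by_topic = {}
    for doc_id, row in doc_topic.items():
        for topic, weight in row.items():
            docs_by_topic.setdefault(topic, set())
            if weight > threshold:
                docs_by_topic[topic].add(doc_id)
    return docs_by_topic


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b|; defined here as 0.0 for two empty sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def pairwise_jaccard(lda_docs, llda_docs):
    """Similarity for every (LDA topic, Labeled-LDA topic) pair."""
    return {
        (t1, t2): jaccard(d1, d2)
        for t1, d1 in lda_docs.items()
        for t2, d2 in llda_docs.items()
    }


# Toy document-topic matrices: {doc id: {topic label: relevance}}.
lda = {
    1: {"bluetooth": 0.7, "display": 0.1},
    2: {"bluetooth": 0.6, "display": 0.3},
    3: {"bluetooth": 0.1, "display": 0.8},
}
llda = {1: {"bluetooth": 1.0}, 2: {"display": 1.0}, 3: {"display": 1.0}}

sim = pairwise_jaccard(related_docs(lda, 0.2), related_docs(llda, 0.2))
print(sim[("bluetooth", "bluetooth")])  # 0.5
print(sim[("display", "display")])      # 1.0
```

A value near 1 means both models relate almost the same bug reports to a topic. A value near 0 means the two sets barely overlap.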
Comparing Topic Models
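Comparing Topic Models in HTC
Pairwise Jaccard similarity between each topic in LDA and Labeled-LDA
[Figure: similarity matrix; axes: Labeled-LDA topics, LDA topics]
Diagonal Entries in HTC
[Figure: diagonal entries of the similarity matrix; axes: Labeled-LDA topics, LDA topics]
Comparing Topic Models in Motorola
Pairwise Jaccard similarity between each topic in LDA and Labeled-LDA
[Figure: similarity matrix; axes: LDA topics, Labeled-LDA topics]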
Conclusion of comparing LDA and Labeled-LDA
Mean Jaccard similarities of the diagonal entries are 0.2 for HTC and 0.08 for Motorola
The number of bug reports related to the same labels differs between LDA and Labeled-LDA (χ² tests: p<0.01) for both HTC and Motorola
Labeled-LDA produced more feature-relevant topics than LDA
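For reference, the mean over the diagonal entries can be computed as in the sketch below, where a "diagonal" entry is read as a pair in which the LDA topic and the Labeled-LDA topic carry the same label. The similarity values are made up and the function name is hypothetical. The 0.2 and 0.08 figures above come from the study, not from this code.

```python
def mean_diagonal(similarity):
    """Average the similarities of pairs whose two topics share the same label."""
    diagonal = [
        value
        for (lda_label, llda_label), value in similarity.items()
        if lda_label == llda_label
    ]
    if not diagonal:
        raise ValueError("no topic pairs share a label")
    return sum(diagonal) / len(diagonal)


# Made-up pairwise similarities keyed by (LDA topic label, Labeled-LDA label).
similarity = {
    ("bluetooth", "bluetooth"): 0.5,
    ("bluetooth", "display"): 0.25,
    ("display", "bluetooth"): 0.0,
    ("display", "display"): 1.0,
}

print(mean_diagonal(similarity))  # 0.75
```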
Topic Analysis
Categorized Topics
[Venn diagram: topics unique to HTC, common topics, topics unique to Motorola]
Common Topics
Both vendors share many identical topic words
Label: bluetooth
HTC: bluetooth, headset, car, connect, device, connection, version, data, app, desire, 2.2, work, connects, behavior, 2.1
Motorola: bluetooth, headset, droid, device, connected, connect, devices, calls, car, issue, connection, 2.2, car, pair, time
[Figure: relevance of common topic “bluetooth” in HTC and Motorola, annotated with Android 2.1 and Android 2.2]
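One plausible way to obtain a relevance-over-time curve like the one in this figure is to average a topic's relevance over the bug reports opened in each month. The sketch below assumes each report carries an ISO-formatted opening date and a per-topic relevance mapping. The field names, the sample data, and the monthly granularity are assumptions for illustration, not details taken from the study.

```python
from collections import defaultdict


def monthly_relevance(reports, topic):
    """Average relevance of `topic` over the reports opened in each month."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for report in reports:
        month = report["opened"][:7]  # "YYYY-MM" prefix of an ISO date
        totals[month] += report["relevance"].get(topic, 0.0)
        counts[month] += 1
    return {month: totals[month] / counts[month] for month in sorted(totals)}


# Made-up bug reports with hypothetical field names.
reports = [
    {"opened": "2010-05-03", "relevance": {"bluetooth": 0.5}},
    {"opened": "2010-05-20", "relevance": {"display": 1.0}},
    {"opened": "2010-06-02", "relevance": {"bluetooth": 0.75}},
]

print(monthly_relevance(reports, "bluetooth"))
# {'2010-05': 0.25, '2010-06': 0.75}
```

Plotting such a series for each vendor and marking the Android release dates makes it possible to see whether a topic rises after a new version ships.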
Common Topics
Topics of each vendor tend to have vendor-specific terms
Label: display
HTC: screen, version, desire, behavior, app, home, number, code, final, press, sure, user, black, new, power
Motorola: droid, screen, button, correct, home, display, behavior, landscape, 2.1, menu, bar, xoom, device, user, status
http://www.motorola.com/xoom
http://en.wikipedia.org/wiki/Motorola_droid
http://en.wikipedia.org/wiki/HTC_Desire
Unique Topics in HTC
Label: keyboard
HTC: keyboard, input, text, key, version, number, typing, on-screen, mode, field, landscape, virtual, keys, type, message
Motorola: keyboard, droid, keys, text, press, space, box, open, device, key, app, software, 2.0.1, landscape
[Figure: relevance of unique topic “keyboard” in HTC, annotated with Android 2.1 and Android 1.5]
Unique Topics in Motorola
Label: GPS
HTC: gps, data, position, location, maps, google, time, lock, wrong, icon, turn, home, latitude, unit, tag, available
Motorola: maps, gps, google, app, droid, location, application, navigation, map, device, traffic, time, upgrade, turn, route
[Figure: relevance of unique topic “GPS” in Motorola, annotated with Android 2.2]
Discussion
Fragmentation Discussion
Software-Based Fragmentation
• New features and changes contribute to the bug reports
• Difficult to test across all vendors and product lines
[Figure: relevance of common topic “bluetooth” in HTC and Motorola, annotated with Android 2.1 and Android 2.2]
Fragmentation Discussion
Hardware-Based Fragmentation
• Different product lines were associated with different topics
• Evidenced by differing bug topics and product-specific issues
Label: display
HTC: screen, version, desire, behavior, app, home, number, code, final, press, sure, user, black, new, power
Motorola: droid, screen, button, correct, home, display, behavior, landscape, 2.1, menu, bar, xoom, device, user, status
Conclusion
Found how fragmentation is manifested within Android between HTC and Motorola
• Incompatibility issues
• Portability issues
Compared the performance of Labeled-LDA and LDA
• Labeled-LDA produced more feature-relevant topics than LDA
• Labeled-LDA needs more manual effort
http://softwareprocess.es/static/Fragmentation.html (http://goo.gl/SwGDT)
Could be useful for project dashboards, process mining and software process recovery