STATISTICAL NATURAL LANGUAGE PROCESSING METHODS FOR INTELLIGENT PROCESS AUTOMATION

Inaugural-Dissertation zur Erlangung des Doktorgrades der Philosophie an der Ludwig-Maximilians-Universität München

vorgelegt von Alena Moiseeva

aus Moskau, Russische Föderation

München 2019


Erstgutachter: Prof. Dr. Hinrich Schütze

1. Korreferent: Prof. Dr. Alexander M. Fraser
2. Korreferent: Prof. Dr. Rudolf Seiler

Tag der mündlichen Prüfung: 19.05.2020


ABSTRACT

Nowadays, digitization is transforming the way businesses work. Recently, Artificial Intelligence (AI) techniques have become an essential part of the automation of business processes. In addition to cost advantages, these techniques offer fast processing times and higher customer satisfaction rates, thus ultimately increasing sales. One of the intelligent approaches for accelerating digital transformation in companies is Robotic Process Automation (RPA). An RPA system is a software tool that robotizes routine and time-consuming responsibilities such as email assessment, various calculations, or the creation of documents and reports (Mohanty and Vyas, 2018). Its main objective is to organize a smart workflow and thereby assist employees by offering them more scope for cognitively demanding and engaging work.

Intelligent Process Automation (IPA) offers all these advantages as well; however, it goes beyond RPA by adding AI components, such as machine and deep learning techniques, to conventional automation solutions. Previously, IPA approaches were primarily employed within the computer vision domain. In recent times, however, Natural Language Processing (NLP) has become one of the potential applications for IPA as well, due to its ability to understand and interpret human language. Usually, NLP methods are used to analyze large amounts of unstructured textual data and to respond to various inquiries. One of the central applications of NLP within the IPA domain are conversational interfaces (e.g., chatbots, virtual agents) that are used to enable human-to-machine communication. Nowadays, conversational agents are in enormous demand due to their ability to support a large number of users simultaneously while communicating in a natural language. The implementation of a conversational agent comprises multiple stages and involves diverse types of NLP sub-tasks, starting with natural language understanding (e.g., intent recognition, named entity extraction) and going towards dialogue management (i.e., determining the next possible bot action) and response generation. A typical dialogue system for IPA purposes undertakes straightforward customer support requests (e.g., FAQs), allowing human workers to focus on more complicated inquiries.

In this thesis, we address two potential Intelligent Process Automation (IPA) applications and employ statistical Natural Language Processing (NLP) methods for their implementation.

The first block of this thesis (Chapter 2 – Chapter 4) deals with the development of a conversational agent for IPA purposes within the e-learning domain. As already mentioned, chatbots are one of the central applications for the IPA domain, since they can effectively perform time-consuming tasks while communicating in a natural language. Within this thesis, we realized an IPA conversational bot that takes care of routine and time-consuming tasks regularly performed by human tutors of an online mathematical course. This bot is deployed in a real-world setting within the OMB+ mathematical platform. Conducting experiments for this part, we observed two possibilities to build a conversational agent in an industrial setting: first, with purely rule-based methods, considering the missing training data and the individual aspects of the target domain (i.e., e-learning); second, by re-implementing two of the main system components (i.e., the Natural Language Understanding (NLU) and Dialogue Manager (DM) units) using a current state-of-the-art deep learning architecture (i.e., Bidirectional Encoder Representations from Transformers (BERT)) and investigating their performance and potential use as part of a hybrid model (i.e., one containing both rule-based and machine learning methods).

The second part of the thesis (Chapter 5 – Chapter 6) considers an IPA subproblem within the predictive analytics domain and addresses the task of scientific trend forecasting. Predictive analytics forecasts future outcomes based on historical and current data. Therefore, using the benefits of advanced analytics models, an organization can, for instance, reliably determine trends and emerging topics and then act on this information while making significant business decisions (e.g., investments). In this work, we dealt with the trend detection task; specifically, we addressed the lack of publicly available benchmarks for evaluating trend detection algorithms. We assembled a benchmark for the detection of both scientific trends and downtrends (i.e., topics that become less frequent over time). To the best of our knowledge, the task of downtrend detection has not been addressed before. The resulting benchmark is based on a collection of more than one million documents, which is among the largest collections that have been used for trend detection, and therefore offers a realistic setting for the development of trend detection algorithms.


ZUSAMMENFASSUNG

Robotic Process Automation (RPA) is a type of software bot that imitates manual human activities, such as entering data into a system, logging into user accounts, or executing simple but repetitive workflows (Mohanty and Vyas, 2018). However, one of the main advantages and, at the same time, drawbacks of RPA bots is their ability to fulfill the assigned task with pinpoint precision. On the one hand, such a system is able to carry out the task accurately, diligently, and quickly. On the other hand, it is highly susceptible to changes in the defined scenarios. Since an RPA bot is designed for a specific task, it is often impossible to adapt it to other domains or even to simple changes in a workflow (Mohanty and Vyas, 2018). This inability to adapt to changing conditions gave rise to a further area of improvement for RPA bots: Intelligent Process Automation (IPA) systems.

IPA bots combine RPA with Artificial Intelligence (AI) and can therefore perform complex and more cognitively demanding tasks that require, among other things, reasoning and natural language understanding. These systems take over time-consuming and routine tasks, thereby enabling a smart workflow and freeing up skilled workers to carry out more complicated tasks. So far, IPA techniques have mainly been employed in the field of computer vision. Recently, however, Natural Language Processing (NLP) has also become one of the potential applications for IPA, due to its ability to interpret human language. NLP methods are used to analyze large amounts of textual data and to respond to various inquiries. If the available data is unstructured or has no predefined format (e.g., emails), or if it is available in a variable format (e.g., invoices, legal documents), NLP techniques are likewise applied to extract the relevant information, which can then be used to solve various problems.

However, NLP in the context of IPA is not limited to the extraction of relevant data from text documents. One of the central applications of IPA are conversational agents, which are used for human-to-machine interaction. Conversational agents are in enormous demand because they are able to support a large number of users simultaneously while communicating in a natural language. The implementation of a chat system comprises various types of NLP sub-tasks, starting with natural language understanding (e.g., intent recognition, entity extraction), continuing with dialogue management (e.g., determining the next possible bot action), and ending with response generation. A typical dialogue system for IPA purposes usually handles uncomplicated customer service requests (e.g., answering FAQs), so that employees can concentrate on more complex inquiries.

This dissertation covers two areas that are united by a broader topic, namely Intelligent Process Automation (IPA) using statistical Natural Language Processing (NLP) methods.

The first block of this work (Chapter 2 – Chapter 4) deals with the implementation of a conversational agent for IPA purposes within the e-learning domain. As already mentioned, chatbots are one of the central applications for the IPA domain, since they can effectively perform time-consuming tasks while communicating in a natural language. The IPA conversational bot realized in this work likewise takes care of routine and time-consuming tasks that are otherwise performed by tutors in a German-language online mathematics course. This bot is in daily use within the mathematical platform OMB+. While conducting experiments, we observed two possibilities for developing the conversational agent in an industrial setting: first, with purely rule-based methods, under the conditions of missing training data and the particular aspects of the target domain (i.e., e-learning); second, we re-implemented two of the main system components (the natural language understanding module and the dialogue manager) with a current state-of-the-art deep learning algorithm and investigated the performance of these components.

The second part of the dissertation (Chapter 5 – Chapter 6) considers an IPA problem within the predictive analytics domain. Predictive analytics aims to make forecasts about future outcomes on the basis of historical and current data. With the help of such forecasting systems, a company can, for example, reliably determine trends or emerging topics and then use this information when making important business decisions (e.g., investments). In this part of the work, we deal with the subproblem of trend forecasting, in particular with the lack of publicly available benchmarks for the evaluation of trend detection algorithms. We assembled and published a benchmark for the detection of both trends and downtrends. To the best of our knowledge, the task of downtrend detection has not been addressed before. The resulting benchmark is based on a collection of more than one million documents, which is among the largest that have been used for trend detection so far, and thus offers a realistic setting for the development of trend detection algorithms.


ACKNOWLEDGEMENTS

This thesis was possible due to several people who contributed significantly to my work. I owe my gratitude to each of them for guidance, encouragement, and inspiration during my doctoral study.

First of all, I would like to thank my advisor, Prof. Dr. Hinrich Schütze, for his mentorship, support, and thoughtful guidance. My special gratitude goes to Prof. Dr. Ruedi Seiler and his fantastic team at Integral-Learning GmbH; I am thankful for your valuable input, your interest in our collaborative work, and your continuous support. I would also like to show gratitude to Prof. Dr. Alexander Fraser for co-advising, support, and helpful feedback.

Finally, none of this would be possible without my family's support; there are no words to express the depth of gratitude I feel for them. My earnest appreciation goes to my husband Dietrich: thank you for all the time you were by my side and for the encouragement you provided me through these years. I am very grateful to my parents for their understanding and backing at any moment. I appreciate the love of my grandparents; it is hard to realize how significant their care is for me. Besides that, I am also grateful to my sister Alexandra, my elder brother Denis, and my parents-in-law for simply always being there for me.


CONTENTS

List of Figures
List of Tables
Acronyms

1 Introduction
  1.1 Outline

I  E-Learning Conversational Assistant

2 Foundations
  2.1 Introduction
  2.2 Conversational Assistants
      2.2.1 Taxonomy
      2.2.2 Conventional Architecture
      2.2.3 Existing Approaches
  2.3 Target Domain & Task Definition
  2.4 Summary

3 Multipurpose IPA via Conversational Assistant
  3.1 Introduction
      3.1.1 Outline and Contributions
  3.2 Model
      3.2.1 OMB+ Design
      3.2.2 Preprocessing
      3.2.3 Natural Language Understanding
      3.2.4 Dialogue Manager
      3.2.5 Meta Policy
      3.2.6 Response Generation
  3.3 Evaluation
      3.3.1 Automated Evaluation
      3.3.2 Human Evaluation & Error Analysis
  3.4 Structured Dialogue Acquisition
  3.5 Summary
  3.6 Future Work

4 Deep Learning Re-Implementation of Units
  4.1 Introduction
      4.1.1 Outline and Contributions
  4.2 Related Work
  4.3 BERT: Bidirectional Encoder Representations from Transformers
  4.4 Data & Descriptive Statistics
  4.5 Named Entity Recognition
      4.5.1 Model Settings
  4.6 Next Action Prediction
      4.6.1 Model Settings
  4.7 Evaluation and Results
  4.8 Error Analysis
  4.9 Summary
  4.10 Future Work

II  Clustering-Based Trend Analyzer

5 Foundations
  5.1 Motivation and Challenges
  5.2 Detection of Emerging Research Trends
      5.2.1 Methods for Topic Modeling
      5.2.2 Topic Evolution of Scientific Publications
  5.3 Summary

6 Benchmark for Scientific Trend Detection
  6.1 Motivation
      6.1.1 Outline and Contributions
  6.2 Corpus Creation
      6.2.1 Underlying Data
      6.2.2 Stratification
      6.2.3 Document Representations & Clustering
      6.2.4 Trend & Downtrend Estimation
      6.2.5 (Down)trend Candidates Validation
  6.3 Benchmark Annotation
      6.3.1 Inter-annotator Agreement
      6.3.2 Gold Trends
  6.4 Evaluation Measure for Trend Detection
  6.5 Trend Detection Baselines
      6.5.1 Configurations
      6.5.2 Experimental Setup and Results
  6.6 Benchmark Description
  6.7 Summary
  6.8 Future Work

7 Discussion and Future Work

Bibliography


LIST OF FIGURES

Figure 1: Taxonomy of dialogue systems
Figure 2: Example of a conventional dialogue architecture
Figure 3: OMB+ online learning platform
Figure 4: Rule-Based System: Dialogue flow
Figure 5: Macro F1, evaluation, NAP – default model. Comparison between German and multilingual BERT
Figure 6: Macro F1, test, NAP – default model. Comparison between German and multilingual BERT
Figure 7: Macro F1, evaluation, NAP – extended model
Figure 8: Macro F1, test, NAP – extended model
Figure 9: Macro F1, evaluation, NER – Comparison between German and multilingual BERT
Figure 10: Macro F1, test, NER – Comparison between German and multilingual BERT
Figure 11: (Down)trend detection: Overall distribution of papers in the entire dataset
Figure 12: A trend candidate (positive slope). Topic: Sentiment and Emotion Analysis (using Social Media Channels)
Figure 13: A downtrend candidate (negative slope). Topic: XML


LIST OF TABLES

Table 1: Natural language representation of a sentence
Table 2: Rule 1: Admissible configurations
Table 3: Rule 2: Admissible configurations
Table 4: Rule 3: Admissible configurations
Table 5: Rule 4: Admissible configurations
Table 6: Rule 5: Admissible configurations
Table 7: Case 1: ID completeness
Table 8: Case 2: ID completeness
Table 9: Showcase 1: Short flow
Table 10: Showcase 2: Organisational question
Table 11: Showcase 3: Fallback and human-request policies
Table 12: Showcase 4: Contextual question
Table 13: Showcase 5: Long flow
Table 14: General statistics for the conversational dataset
Table 15: Detailed statistics on possible system actions in the conversational dataset
Table 16: Detailed statistics on possible named entities in the conversational dataset
Table 17: F1 results for the Named Entity Recognition task
Table 18: F1 results for the Next Action Prediction task
Table 19: Average dialogue accuracy computed for the Next Action Prediction task
Table 20: Trend detection: List of gold trends
Table 21: Trend detection: Example of MAP evaluation
Table 22: Trend detection: Ablation results


ACRONYMS

AI Artificial Intelligence

API Application Programming Interface

BERT Bidirectional Encoder Representations from Transformers

BiLSTM Bidirectional Long Short Term Memory

CRF Conditional Random Fields

DM Dialogue Manager

DST Dialogue State Tracker

DSTC Dialogue State Tracking Challenge

EC Equivalence Classes

ES ElasticSearch

GM Gaussian Models

IAA Inter Annotator Agreement

ID Informational Dictionary

IPA Intelligent Process Automation

ITS Intelligent Tutoring System

LDA Latent Dirichlet Allocation

LSA Latent Semantic Analysis

LSTM Long Short Term Memory

MAP Mean Average Precision

MER Mutually Exclusive Rules

NAP Next Action Prediction

NER Named Entity Recognition

NLG Natural Language Generation

NLP Natural Language Processing

NLU Natural Language Understanding

OMB+ Online Mathematik Brückenkurs

PL Policy Learning

pLSA probabilistic Latent Semantic Analysis

REST Representational State Transfer

RNN Recurrent Neural Network

RPA Robotic Process Automation

RegEx Regular Expressions

TL Transfer Learning

ToDS Task-oriented Dialogue System


1 INTRODUCTION

Robotic Process Automation (RPA) is a type of software bot that imitates manual human activities, like entering data into a system, logging into accounts, or executing simple but repetitive workflows (Mohanty and Vyas, 2018). In other words, RPA systems allow business applications to work without human involvement by replicating human worker activities. A further benefit of RPA bots is that they are much cheaper than employing a human worker, and they are also able to work around the clock. Overall, the advantages of software bots are hugely appealing and include cost savings, accuracy and compliance while executing processes, improved responsiveness (i.e., bots are faster than human workers), and, finally, such bots are usually agile and multi-skilled (Ivancic, Vugec, and Vukšic, 2019).

However, one of the main benefits and, at the same time, drawbacks of RPA systems is their ability to precisely fulfill the assigned task. On the one hand, the bot can carry out the task accurately and diligently. On the other hand, it is highly susceptible to changes in the defined scenarios. Being designed for a particular task, an RPA bot is often not adaptable to other domains or even to simple changes in a workflow (Mohanty and Vyas, 2018). This inability to adapt to changing conditions gave rise to a further area of improvement for RPA bots: Intelligent Process Automation (IPA) systems.

IPA bots combine Robotic Process Automation (RPA) with Artificial Intelligence (AI) and thus can perform complex and more cognitively demanding tasks that require reasoning and language understanding. Hence, IPA bots go beyond automating simple "click tasks" (as in RPA) and can accomplish jobs more intelligently by employing machine learning algorithms and advanced analytics. Such IPA systems undertake time-consuming and routine tasks and thus enable smart workflows and free up skilled workers to accomplish higher-value activities.

Previously, IPA techniques were primarily employed within the computer vision domain, starting with the automation of simple display click-tasks and going towards sophisticated self-driving cars with their ability to interpret a stream of situational data obtained from sensors and cameras. However, in recent times, Natural Language Processing (NLP) has become one of the potential applications for IPA as well, due to its ability to understand and interpret human language. NLP methods are used to analyze large amounts of textual data and to respond to various inquiries. Furthermore, if the available data is unstructured and does not have any predefined format (e.g., customer emails), or if it is available in a variable format (e.g., invoices, legal documents), then NLP techniques are applied to extract the relevant information. This data can then be utilized to solve distinct problems (e.g., natural language understanding, predictive analytics) (Mohanty and Vyas, 2018). For instance, in the case of predictive analytics, such a bot would make forecasts about the future using current and historical data and thereby assist with sales or investments.

However, NLP in the context of Intelligent Process Automation (IPA) is not limited to the extraction of relevant data and insights from textual documents. One of the central applications of NLP within the IPA domain are conversational interfaces (e.g., chatbots) that are used to enable human-to-machine interaction. This type of software is commonly used for customer support or within recommendation and booking systems. Conversational agents are in enormous demand due to their ability to simultaneously support a large number of users while communicating in a natural language. In a conventional chatbot system, a user provides input in a natural language; the bot then processes this query to extract potentially useful information and responds with a relevant reply. This process comprises multiple stages and involves different types of NLP subtasks, starting with Natural Language Understanding (NLU) (e.g., intent recognition, named entity extraction) and going towards dialogue management (i.e., determining the next possible bot action considering the dialogue history) and response generation (e.g., converting the semantic representation of the next bot action into a natural language utterance). A typical dialogue system for Intelligent Process Automation (IPA) purposes usually undertakes shallow customer support requests (e.g., answering FAQs), allowing human workers to focus on more sophisticated inquiries.

Natural Language Processing (NLP) techniques may also be used indirectly for IPA purposes. There are attempts to automatically identify and classify tasks extracted from textual process descriptions as manual, user, or automated. The goal of such an NLP application is to reduce the effort required to identify suitable candidates for further robotic process automation (Friedrich, Mendling, and Puhlmann, 2011; Leopold, Aa, and Reijers, 2018).

Intelligent Process Automation (IPA) is an emerging technology that evolves mainly due to the recognized potential benefit of combining Robotic Process Automation (RPA) and Artificial Intelligence (AI) techniques to solve industrial problems (Ivancic, Vugec, and Vukšic, 2019; Mohanty and Vyas, 2018). Industries are typically interested in being responsive to customers and markets (Mohanty and Vyas, 2018). Thus, most customer support applications need to be available in real time and highly robust to achieve higher client satisfaction rates. The manual accomplishment of such tasks is often time-consuming, expensive, and error-prone. Furthermore, due to the massive amounts of textual data available for analysis, it is not feasible to process this data entirely by human means.


1.1 outline

This thesis covers two topics united by a broader theme, which is Intelligent Process Automation (IPA) utilizing statistical Natural Language Processing (NLP) methods.

the first block of the thesis (Chapter 2 – Chapter 4) addresses the challenge of implementing a conversational agent for Intelligent Process Automation (IPA) purposes within the e-learning field. As mentioned above, chatbots are one of the central applications of the IPA domain, since they can effectively perform time-consuming tasks requiring natural language understanding, allowing human workers to concentrate on more valuable tasks. The IPA conversational bot that was realized within this thesis takes care of routine and time-consuming tasks performed by human tutors within an online mathematical course. This bot is deployed in a real-world setting and interacts with students within the OMB+ mathematical platform. Conducting experiments, we observed two possibilities to build the conversational agent in industrial settings: first, with purely rule-based methods, considering the missing training data and the individual aspects of the target domain (i.e., e-learning); second, by re-implementing two of the main system components (i.e., the Natural Language Understanding (NLU) and Dialogue Manager (DM) units) using the current state-of-the-art deep learning algorithm (i.e., Bidirectional Encoder Representations from Transformers (BERT)) and investigating their performance and potential use as a part of a hybrid model (i.e., containing both rule-based and machine learning methods).

the second block of the thesis (Chapter 5 – Chapter 6) considers an Intelligent Process Automation (IPA) problem within the predictive analytics domain and addresses the task of scientific trend detection. Predictive analytics makes forecasts about future outcomes based on historical and current data. Hence, organizations can benefit from advanced analytics models by detecting trends and their behavior and then employing this information while making crucial business decisions (e.g., investments). In this work, we deal with a subproblem of the trend detection task; specifically, we address the lack of publicly available benchmarks for the evaluation of trend detection algorithms. Therefore, we assembled a benchmark for the detection of both trends and downtrends (i.e., topics that become less frequent over time). To the best of our knowledge, the task of downtrend detection has not been well addressed before. Furthermore, the resulting benchmark is based on a collection of more than one million documents, which is among the largest that have been used for trend detection, and therefore offers a realistic setting for developing trend detection algorithms.


All main chapters contain their own specific introduction, contribution list, related work section, and summary.


Part I

E-LEARNING CONVERSATIONAL ASSISTANT


2 FOUNDATIONS

In this block, we address the challenge of designing and implementing a conversational assistant for Intelligent Process Automation (IPA) purposes within the e-learning domain. The system aims to support human tutors of the Online Mathematik Brückenkurs (OMB+) platform during the tutoring round by undertaking routine and time-consuming responsibilities. Specifically, the system intelligently automates the process of information accumulation and validation that precedes every conversation. Thereby, the IPA bot frees up tutors and allows them to concentrate on more complicated and cognitively demanding tasks.

In this chapter, we introduce the fundamental concepts required for the topic of the first part of the thesis. We begin with the foundations of conversational assistants in Section 2.2, where we present a taxonomy of existing dialogue systems in Section 2.2.1, give an overview of the conventional dialogue architecture with its most essential components in Section 2.2.2, and describe the existing approaches for building a conversational agent in Section 2.2.3. We finally introduce the target domain for our experiments in Section 2.3.

2.1 introduction

Conversational assistants, a type of software that interacts with people via written or spoken natural language, have become widely popular in recent times. Today's conversational assistants support a broad range of everyday skills, including but not limited to creating reminders, answering factoid questions, and controlling smart home devices.

Initially, the notion of an artificial companion capable of conversing in a natural language was anticipated by Alan Turing in 1949 (Turing, 1949). Still, the first significant step in this direction was taken in 1966 with the appearance of Weizenbaum's system ELIZA (Weizenbaum, 1966). Its successor was ALICE (Artificial Linguistic Internet Computer Entity), a natural language processing dialogue system that conversed with a human by applying heuristic pattern-matching rules. Despite being remarkably popular and technologically advanced for their day, both systems were unable to pass the Turing test (Alan, 1950). Following a long silent period, the next wave of growth of intelligent agents was caused by advances in Natural Language Processing (NLP). This was enabled by access to low-cost memory, higher processing speed, vast amounts of available data, and sophisticated machine learning algorithms. Hence, one of the most notable leaps in the history of conversational agents began in the year 2011 with intelligent personal assistants. At that time, Apple's Siri, followed by Microsoft's Cortana and Amazon's Alexa in 2014, and then Google Assistant in 2016, came onto the scene. The main focus of these agents was not to mimic humans, as in ELIZA or ALICE, but to assist a human by answering simple factoid questions and setting notification tasks across a broad range of content. Another type of conversational agent, called the Task-oriented Dialogue System (ToDS), became a focus of industries in the year 2016. For the most part, these bots aimed to support customer service (Cui et al., 2017) and respond to Frequently Asked Questions (Liao et al., 2018). Their conversational style provided more depth than that of personal assistants, but a narrower range, since they were patterned for task solving and not for open-domain discussions.

One of the central advantages of conversational agents, and the reason why they are attracting considerable interest, is their ability to give attention to several users simultaneously while supporting natural language communication. Due to this, conversational assistants gain momentum in numerous kinds of practical support applications (e.g., customer support) where a high number of users are involved at the same time. The classical engagement of human assistants is costly, and such support is usually not accessible full-time. Therefore, automating human assistance is desirable, but it is also an ambitious task. Due to the lack of solid Natural Language Understanding (NLU), advancing beyond primary tasks is still challenging (Luger and Sellen, 2016), and even the most successful dialogue systems often fail to meet users' expectations (Jain et al., 2018). Significant challenges involved in the design and implementation of a conversational agent range from domain engineering to Natural Language Understanding (NLU) and the design of an adaptive, domain-specific conversational flow. The quintessential problems and how-to questions when building conversational assistants include:

• How to deal with the variability and flexibility of language?

• How to get high-quality in-domain data?

• How to build meaningful representations?

• How to integrate commonsense and domain knowledge?

• How to build a robust system?

Besides that, current Natural Language Processing (NLP) and Artificial Intelligence (AI) techniques have weaknesses when applied to domain-specific, task-oriented problems, especially if the underlying data comes from an industrial source. As machine learning techniques heavily rely on structured and cleaned training data to deliver consistent performance, typically noisy real-world data without access to additional knowledge databases negatively impacts classification accuracy.


Despite that, conversational agents can be successfully employed to solve less cognitive but still highly relevant tasks. In this case, a dialogue agent can be operated as a part of an Intelligent Process Automation (IPA) system, where it takes care of straightforward but routine and time-consuming functions performed in a natural language, while allowing a human worker to focus on more cognitively demanding processes.

2.2 conversational assistants

The development and design of a particular dialogue system usually vary with its objectives; among the most fundamental are:

• Purpose: Should it be an open-ended dialogue system (e.g., chit-chat), a task-oriented system (e.g., customer support), or a personal assistant (e.g., Google Assistant)?

• Skills: What kind of behavior should the system demonstrate?

• Data: Which data is required to develop (resp. train) specific blocks, and how can it be assembled and curated?

• Deployment: Will the final system be integrated into a particular software platform? What are the challenges in choosing suitable tools? How will the deployment happen?

• Implementation: Should it be a rule-based, information-retrieval, machine-learning, or hybrid system?

In the following sections, we take a look at the most common taxonomies of dialogue systems and at different design and implementation strategies.

2.2.1 Taxonomy

According to their goals, conversational agents can be classified into two main categories: task-oriented systems (e.g., for customer support) and social bots (e.g., for chit-chat purposes). Furthermore, systems also vary regarding their implementation characteristics. Below, we provide a detailed overview of the most common variations.

task-oriented dialogue systems are famous for their ability to lead a dialogue with the ultimate goal of solving a user's problem (see Figure 1). A customer support agent or a flight/restaurant booking system exemplifies this class of conversational assistants. Such systems are more meaningful and useful than those with an entirely social goal; however, they usually cause more complications during the implementation stage. Since Task-oriented Dialogue Systems (ToDS) require a precise understanding of the user's intent (i.e., goal), the Natural Language Understanding (NLU) segment must perform flawlessly. Besides that, if applied in an industrial setting, ToDS are usually built in a modular fashion, which means they mostly rely on handcrafted rules and predefined templates and are restricted in their abilities. Personal assistants (e.g., Google Assistant) are members of the task-oriented family as well, though, compared to standard agents, they involve more social conversation and can be used for entertainment purposes (e.g., "Ok Google, tell me a joke"). The latter is typically not available in conventional task-oriented systems.

social bots and chit-chat systems, in contrast, regularly do not hold any specific goal and are designed for small-talk and entertainment conversations (see Figure 1). Those agents are usually end-to-end trainable and highly data-driven, which means they require large amounts of training data. However, due to the entirely learned training, such systems often produce inappropriate responses, making them unreliable for practical applications. Like chit-chat systems, social bots are rich in social conversation; however, they possess the ability to solve uncomplicated user requests and conduct a more meaningful dialogue (Fang et al., 2018). Those requests are not the same as in task-oriented systems and are mostly aimed at answering simple factoid questions.

Figure 1: Taxonomy of dialogue systems according to the measure of their task accomplishment and social contribution. Figure originates from Fang et al., 2018.


The aforementioned system types can be decomposed further according to their implementation characteristics:

open domain & closed domain: A Task-oriented Dialogue System (ToDS) is typically built within a closed-domain setting. The space of potential inputs and outputs is restricted, since the system attempts to achieve a specific goal. Customer support or shopping assistants are cases of closed-domain problems (Wen et al., 2016). These systems do not need to be able to talk about off-topic subjects, but they have to fulfill their particular tasks as efficiently as possible.

Chit-chat systems, in contrast, do not assume a well-defined goal or intention and thus can be built within an open domain. This means the conversation can progress in any direction and with minimal or no functional purpose (Serban et al., 2016). However, the infinite number of possible topics and the fact that a certain amount of commonsense is required to create reasonable responses make an open-domain setting a hard problem.

content-driven & user-centric: Social bots and chit-chat systems typically utilize a content-driven dialogue manner. That means that their responses are based on large and dynamic content derived from daily web mining and knowledge graphs. This approach builds a dialogue based on trendy content from diverse sources, and thus it is an ideal way for social bots to catch users' attention and bring a user into the dialogue.

User-centric approaches, in turn, assume precise language understanding to detect the sentiment of users' statements. Based on the sentiment, the dialogue manager learns users' personalities, handles rapid topic changes, and tracks engagement. In this case, the dialogue builds on specific user interests.

long & short conversations: Models with a short conversation style attempt to create a single response to a single input (e.g., a weather forecast). In the case of a long conversation, a system has to go through multiple turns and keep track of what has been said before (i.e., the dialogue history). The longer the conversation, the more challenging it is to train a system. Customer support conversations are typically long conversational threads with multiple questions.

response generation: One of the essential elements of a dialogue system is response generation. This can be done either in a retrieval-based manner or by using generative models (i.e., Natural Language Generation (NLG)).

The former usually utilizes predefined responses and heuristics to select a relevant response based on the input and context. The heuristic can be a rule-based expression match (i.e., Regular Expressions (RegEx)) or an ensemble of machine learning classifiers (e.g., entity extraction, prediction of the next action). Such systems do not generate any original text; instead, they select a response from a fixed set of predefined candidates. These models are usually used in an industrial setting, where the system must show high robustness, performance, and interpretability of the outcomes.

The latter, in contrast, do not rely on predefined replies. Since such models are typically based on (deep) machine learning techniques, they attempt to generate new responses from scratch. This is, however, a complicated task and is mostly used for vanilla chit-chat systems or in a research setting where structured training data is available.
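To make the retrieval-based style concrete, the following is a minimal sketch assuming a small, hypothetical set of (reference question, canned response) pairs: the candidate whose reference question shares the most tokens with the user input is returned, with a fallback otherwise. Production systems would replace the simple overlap score with trained classifiers or semantic similarity.

```python
import re

# Hypothetical (reference question, canned response) pairs; purely illustrative.
FAQ_PAIRS = [
    ("what are your opening hours",
     "Our support is available from 9 am to 5 pm on weekdays."),
    ("how can i reset my password",
     "You can reset your password via the account settings page."),
]
FALLBACK = "I am not sure I understood you. Could you rephrase your question?"

def select_response(user_utterance: str) -> str:
    """Return the canned response of the best-matching reference question."""
    query = set(re.findall(r"\w+", user_utterance.lower()))
    best_score, best_response = 0, FALLBACK
    for question, response in FAQ_PAIRS:
        score = len(query & set(question.split()))
        if score > best_score:
            best_score, best_response = score, response
    return best_response

print(select_response("How do I reset my password?"))
```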

2.2.2 Conventional Architecture

In this and the following subsection, we review the pipeline and methods for Task-oriented Dialogue Systems (ToDS). Such agents are typically composed of several components (Figure 2) addressing the diverse tasks required to facilitate a natural language dialogue (Williams, Raux, and Henderson, 2016).

Figure 2: Example of a conventional dialogue architecture with its main components in the bounding box: Language Understanding (NLU), Dialogue Manager (DM), Response Generation (NLG). Figure originates from the NAACL 2018 Tutorial, Su et al., 2018.


A conventional dialogue architecture implements this with the following components:

natural language understanding unit has the following objectives: it parses the user input, classifies the domain (e.g., customer support, restaurant booking; not required for single-domain systems) and the intent (e.g., find-flight, book-taxi), and extracts entities (e.g., Italian food, Berlin).

Generally, the NLU unit maps every user utterance into semantic frames that are predefined according to different scenarios. Table 1 illustrates an example of a sentence representation where the departure location (Berlin) and the arrival location (Munich) are specified as slot values, and the domain (airline travel) and intent (find-flight) are specified as well. Typically, there are two kinds of possible representations. The first one is the utterance category, represented in the form of the user's intent. The second is word-level information extraction in the form of Named Entity Recognition (NER) or Slot Filling tasks.

Sentence         Find   flight   from   Berlin   to   Munich   today
Slots/Concepts   O      O        O      B-dept   O    B-arr    B-date
Named Entity     O      O        O      B-city   O    B-city   O
Intent           Find_Flight
Domain           Airline Travel

Table 1: An illustrative example of a sentence representation.

A dialogue act classification sub-module, also known as intent classification, is dedicated to detecting the primary user goal at every dialogue state. It labels each user utterance with one of the predefined intents. Deep neural networks can be directly applied to this conventional classification problem (Khanpour, Guntakandla, and Nielsen, 2016; Lee and Dernoncourt, 2016).

Slot filling is another sub-module for language understanding. Unlike intent detection (i.e., a classification problem), slot filling is defined as a sequence labeling problem, where the words in the sequence are labeled with semantic tags. The input is a sequence of words, and the output is a sequence of slots, one for every word. Variations of Recurrent Neural Network (RNN) architectures (LSTM, BiLSTM) (Deoras et al., 2015) and (attention-based) encoder-decoder architectures (Kurata et al., 2016) have been employed for this task.
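To make the frame representation from Table 1 concrete, the sketch below shows, assuming a tiny illustrative slot inventory, how a single utterance can be stored as an intent label plus a per-token BIO slot sequence and converted into slot-value pairs; it illustrates the data structures only, not the tagger architectures cited above.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class SemanticFrame:
    """Frame produced by an NLU unit for one utterance (cf. Table 1)."""
    domain: str
    intent: str
    tokens: List[str]
    slot_tags: List[str]                      # one BIO tag per token
    slots: Dict[str, str] = field(default_factory=dict)

    def collect_slots(self) -> Dict[str, str]:
        """Turn the BIO tag sequence into slot-value pairs."""
        for token, tag in zip(self.tokens, self.slot_tags):
            if tag.startswith("B-"):
                self.slots[tag[2:]] = token
        return self.slots

frame = SemanticFrame(
    domain="airline_travel",
    intent="find_flight",
    tokens=["Find", "flight", "from", "Berlin", "to", "Munich", "today"],
    slot_tags=["O", "O", "O", "B-dept", "O", "B-arr", "B-date"],
)
print(frame.collect_slots())  # {'dept': 'Berlin', 'arr': 'Munich', 'date': 'today'}
```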


dialogue manager unit is subdivided into two components: the first is a Dialogue State Tracker (DST) that holds the conversational state, and the second is a Policy Learning (PL) unit that defines the system's next action.

The Dialogue State Tracker (DST) is the core component for guaranteeing a robust dialogue flow, since it manages the user's intent at every turn. The conventional methods, which are broadly applied in commercial implementations, often adopt handcrafted rules to pick the most probable candidate among the predefined ones (Goddeau et al., 1996; Wang and Lemon, 2013). However, a variety of statistical approaches have emerged since the Dialogue State Tracking Challenge (DSTC, currently the Dialog System Technology Challenge) was arranged by Jason D. Williams in 2013. Among the deep-learning-based approaches is the Belief Tracker proposed by Henderson, Thomson, and Young (2013). It employs a sliding window to produce a sequence of probability distributions over an arbitrary number of possible values (Chen et al., 2017). In the work of Mrkšić et al. (2016), the authors introduced a Neural Belief Tracker to identify slot-value pairs. The system used as input the preceding user intents, the user utterance itself, and a candidate slot-value pair that it needs to decide about. The system then iterated over all candidate slot-value pairs to determine which ones had just been expressed by the user (Chen et al., 2017). Finally, Vodolán, Kadlec, and Kleindienst proposed a Hybrid Dialog State Tracker that achieved state-of-the-art performance on the DSTC2 shared task. Their model used a separately learned decoder coupled with a rule-based system.

Based on the state representation from the Dialogue State Tracker (DST), the PL unit generates the next system action, which is then decoded by the Natural Language Generation (NLG) module. Usually, a rule-based agent is used to initialize the system (Yan et al., 2017), and supervised learning is then applied to the actions generated by these rules. Alternatively, the dialogue policy can be trained end-to-end with reinforcement learning, optimizing the system's action policy toward the final performance (Cuayáhuitl, Keizer, and Lemon, 2015).
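A minimal sketch of a handcrafted Dialogue Manager in the sense described above, reusing the slot names from Table 1; the action names are made up for illustration. The tracker keeps a slot-value dictionary, and a rule-based policy requests the first missing slot before allowing the final action.

```python
from typing import Dict

# Slots that must be filled before the (hypothetical) "book_flight" action fires.
REQUIRED_SLOTS = ("dept", "arr", "date")

class RuleBasedDST:
    """Keeps the conversational state as a simple slot-value dictionary."""
    def __init__(self) -> None:
        self.state: Dict[str, str] = {}

    def update(self, extracted_slots: Dict[str, str]) -> Dict[str, str]:
        self.state.update(extracted_slots)   # the latest value wins
        return self.state

def next_action(state: Dict[str, str]) -> str:
    """Handcrafted policy: ask for the first missing slot, else proceed."""
    for slot in REQUIRED_SLOTS:
        if slot not in state:
            return f"request_{slot}"
    return "book_flight"

tracker = RuleBasedDST()
tracker.update({"dept": "Berlin", "arr": "Munich"})
print(next_action(tracker.state))   # -> "request_date"
```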

natural language generation unit transforms the next action into a natural language response. Conventional approaches to Natural Language Generation (NLG) typically adopt a template-based style, where a set of rules is defined to map every next action to a predefined natural language response. Such approaches are simple, generally error-free, and easy to control; however, their implementation is time-consuming, and the style is repetitive and hardly scalable.
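Continuing the same illustrative example, a template-based NLG unit can be sketched as a lookup from the made-up action names of the previous sketch to fixed surface templates that are filled with the tracked slot values.

```python
# Illustrative action-to-template mapping; the actions and wordings are assumptions,
# not the templates of any particular system.
TEMPLATES = {
    "request_dept": "From which city would you like to depart?",
    "request_arr": "What is your destination?",
    "request_date": "On which date would you like to travel?",
    "book_flight": "Booking a flight from {dept} to {arr} on {date}.",
}

def generate_response(action: str, state: dict) -> str:
    """Map the predicted action to its template and fill in tracked slot values."""
    template = TEMPLATES.get(action, "Sorry, I did not understand that.")
    return template.format(**state) if "{" in template else template

print(generate_response("book_flight",
                        {"dept": "Berlin", "arr": "Munich", "date": "today"}))
```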

Among the deep learning techniques is the work by Wen et al. (2015), which introduced neural network-based approaches to NLG with an LSTM-based structure. Later, Zhou and Huang (2016) adopted this model along with the question information, semantic slot values, and dialogue act types to generate more precise answers. Finally, Wu, Socher, and Xiong (2019) proposed Global-to-Local Memory Pointer Networks (GLMP), an architecture that incorporates knowledge bases directly into a learning framework (due to the Memory Networks component) and is able to re-use information from the user input (due to the Pointer Networks component).

In particular cases, the dialogue flow can include speech recognition and speech synthesis units (if managing a spoken natural language system) and connections to third-party APIs to request information stored in external databases (e.g., a weather forecast), as depicted in Figure 2.

2.2.3 Existing Approaches

As stated above, a conventional task-oriented dialogue system is implemented in a pipeline manner. That means that once the system receives a user query, it interprets it and acts according to the dialogue state and the corresponding policy. Additionally, based on its understanding, it may access the knowledge base to find the demanded information there. Finally, the system transforms its next action (and, if given, the extracted information) into its surface form as a natural language response (Monostori et al., 1996).

The individual components (i.e., Natural Language Understanding (NLU), Dialogue Manager (DM), Natural Language Generation (NLG)) of a particular conversational system can be implemented using different approaches, starting with entirely rule- and template-based methods and going towards hybrid approaches (using learnable components along with handcrafted units) and end-to-end trainable machine learning methods.

rule-based approaches: Though many of the latest research approaches handle the Natural Language Understanding (NLU) and Natural Language Generation (NLG) units by using statistical Natural Language Processing (NLP) models (Bocklisch et al., 2017; Burtsev et al., 2018; Honnibal and Montani, 2017), most of the industrially deployed dialogue systems still utilise handcrafted features and rules for the state tracking, action prediction, intent recognition, and slot filling tasks (Chen et al., 2017; Ultes et al., 2017). For instance, most modules of the PyDial framework (http://www.camdial.org/pydial) offer rule-based implementations using Regular Expressions (RegEx), rule-based dialogue trackers (Henderson, Thomson, and Williams, 2014), and handcrafted policies. Also, in order to map the next action to text, Ultes et al. (2017) suggest rules along with template-based response generation.


The rule-based approach ensures the robustness and stable performance that are crucial for industrial systems communicating with a large number of users simultaneously. However, it is highly expensive and time-consuming to deploy a real dialogue system built in that manner. The major disadvantage is that the usage of handcrafted systems is restricted to a specific domain, and possible adaptation requires extensive manual engineering.
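As an illustration of such handcrafted NLU rules, a RegEx-based intent recognizer with named groups for entity extraction might look as follows; the patterns and intent names are made up and are not taken from PyDial or from the system built in this thesis.

```python
import re

# Each (intent, pattern) pair is a handcrafted rule; entities are captured
# by named groups. These rules are illustrative assumptions only.
INTENT_RULES = [
    ("find_flight",
     re.compile(r"\bflight\b.*\bfrom\s+(?P<dept>\w+)\s+to\s+(?P<arr>\w+)", re.I)),
    ("greeting",
     re.compile(r"^\s*(hi|hello|hey)\b", re.I)),
]

def rule_based_nlu(utterance: str):
    """Return (intent, entities) for the first matching handcrafted rule."""
    for intent, pattern in INTENT_RULES:
        match = pattern.search(utterance)
        if match:
            return intent, match.groupdict()
    return "fallback", {}

print(rule_based_nlu("Find a flight from Berlin to Munich today"))
# -> ('find_flight', {'dept': 'Berlin', 'arr': 'Munich'})
```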

end-to-end learning approaches: Due to the recent advances in end-to-end neural generative models (Collobert et al., 2011), many efforts have been made to build an end-to-end trainable architecture for conversational agents. Rather than using the traditional pipeline (see Figure 2), the end-to-end model is conceived as a single module (Chen et al., 2017).

Wen et al. (2016), followed by Bordes, Boureau, and Weston (2016), were among the pioneers who adopted an end-to-end trainable approach for conversational systems. Their approach was to treat dialogue learning as the problem of learning a mapping from dialogue histories to system responses and to apply an encoder-decoder model (Sutskever, Vinyals, and Le, 2014) to train the entire system. The proposed model still had significant limitations (Glasmachers, 2017), such as missing policies to learn a dialogue flow and the inability to incorporate additional knowledge, which is especially crucial for training a task-oriented agent. To overcome the first problem, Dhingra et al. (2016) introduced an end-to-end reinforcement learning approach that jointly trains the dialogue state tracker and policy learning units. The second weakness was addressed by Eric and Manning (2017), who augmented existing recurrent network architectures with a differentiable, attention-based key-value retrieval mechanism over the entries of a knowledge base. This approach was inspired by Key-Value Memory Networks (Miller et al., 2016). Later on, Wu, Socher, and Xiong (2019) proposed Global-to-Local Memory Pointer Networks (GLMP), where the authors incorporated knowledge bases directly into a learning framework. In their model, a global memory encoder and a local memory decoder are employed to share external knowledge. The encoder encodes the dialogue history, modifies the global contextual representation, and generates a global memory pointer. The decoder first produces a sketch response with empty slots. Next, it uses the global memory pointer to filter the external knowledge for relevant information and then instantiates the slots via the local memory pointers.

Despite having better adaptability compared to any rule-based system and being simpler to train, end-to-end approaches remain unattainable for commercial conversational agents operating on real-world data. A well and carefully constructed task-oriented dialogue system in an observed domain, using handcrafted rules (in the NLU and DST units) and predefined responses for NLG, still outperforms end-to-end systems due to its robustness (Glasmachers, 2017; Wu, Socher, and Xiong, 2019).

hybrid approaches: Though end-to-end learning is an attractive solution for conversational systems, current techniques are data-intensive and require large amounts of dialogues to learn simple behaviors. To overcome this barrier, Williams, Asadi, and Zweig (2017) introduced Hybrid Code Networks (HCNs), which are an ensemble of retrieval and trainable units. The system utilizes a Recurrent Neural Network (RNN) for the Natural Language Understanding (NLU) block, domain-specific knowledge that is encoded as software (i.e., API calls), and custom action templates used for the Natural Language Generation (NLG) unit. Compared to existing end-to-end methods, the authors report that their approach considerably reduces the amount of data required for training (Williams, Asadi, and Zweig, 2017). According to the authors, HCNs achieve state-of-the-art performance on the bAbI dialog dataset (Bordes, Boureau, and Weston, 2016) and outperform two commercial customer dialogue systems (internal Microsoft Research datasets in customer support and troubleshooting domains were employed for training and evaluation).

Along with this work, Wen et al. (2016) propose a neural network-based model that is end-to-end trainable but still modularly connected. In their work, the authors treat dialogue as a sequence-to-sequence mapping problem (Sutskever, Vinyals, and Le, 2014), augmented with the dialogue history and the current database search outcome (Wen et al., 2016). The authors explain the learning process as follows: at every turn, the system takes a user input represented as a sequence of tokens and converts it into two internal representations, a distributed representation of the intent and a probability distribution over slot-value pairs called the belief state (Young et al., 2013). The database operator then picks the most probable values in the belief state to form the search result. At the same time, the intent representation and the belief state are transformed and combined by a Policy Learning (PL) unit to form a single vector representing the next system action. This system action is then used to generate a sketch response token by token. The final system response is composed by substituting the actual values of the database entries into the sketch sentence structure (Wen et al., 2016).

Hybrid models appear set to replace the established rule- and template-based approaches currently utilized in industrial settings. The performance of most Natural Language Understanding (NLU) components (i.e., Named Entity Recognition (NER), domain and intent classification) is reasonably high to apply them for the development of commercial conversational agents, whereas the Natural Language Generation (NLG) unit can still be realized as a template-based solution, employing predefined response candidates while predicting the next system action with deep learning systems.


2.3 target domain & task definition

MUMIE (https://www.mumie.net) is an open-source e-learning platform for studying and teaching mathematics and computer science. The Online Mathematik Brückenkurs (OMB+, https://www.ombplus.de) is one of the projects powered by MUMIE. Specifically, OMB+ is a German online learning platform that assists students who are preparing for an engineering or computer science study at a university.

The course's central purpose is to assist students in reviving their mathematical skills so that they can follow the upcoming university courses. This online course is thematically segmented into 13 parts and includes free mathematical classes with theoretical and practical content. Every chapter begins with an overview of its sections and consists of explanatory texts with built-in videos, examples, and quick checks of understanding. Furthermore, various examination modes to practice the content (i.e., training exercises) and to check the understanding of the theory (i.e., quizzes) are available. Besides that, OMB+ provides the possibility to get assistance from a human tutor. Usually, the students and tutors interact via a chat interface in written form. The language of communication is German.

The current dilemma of the OMB+ platform is that the number of students grows significantly every year, but it is challenging to find qualified tutors, and it is expensive to hire more human tutors. This results in a longer waiting period for students until their problems can be considered and processed. In general, all student questions can be grouped into three main categories: organizational questions (e.g., course certificate), contextual questions (e.g., content, theorem), and mathematical questions (e.g., exercises, solutions). To assist a student with a mathematical question, a tutor has to know the following regular information: what kind of topic (or subtopic) the student has a problem with; at which examination mode (quiz, chapter-level training or exercise, section-level training or exercise, or final examination) the student is working right now; and, finally, the exact question number and the exact problem formulation. This means that a tutor has to ask the same onboarding questions every time a new conversation starts. This is, however, very time-consuming and could be successfully automated within an Intelligent Process Automation (IPA) system.
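As a rough sketch of how this onboarding step could be automated, the bot can ask for each piece of regular information until all slots are filled and then hand the conversation over to a tutor; the slot names and prompts below are illustrative and do not reproduce the system described in Chapter 3.

```python
# Hypothetical onboarding slots and German prompts for the OMB+ scenario.
ONBOARDING_SLOTS = [
    ("topic", "Zu welchem Thema (oder Unterthema) hast du eine Frage?"),
    ("examination_mode", "Arbeitest du gerade an einem Quiz, Training, einer Übung oder der Abschlussprüfung?"),
    ("question_number", "Wie lautet die genaue Aufgabennummer?"),
]

def next_onboarding_prompt(collected: dict):
    """Return the prompt for the first missing slot, or None once onboarding is done."""
    for slot, prompt in ONBOARDING_SLOTS:
        if slot not in collected:
            return prompt
    return None  # all regular information gathered; hand the chat over to a tutor

print(next_onboarding_prompt({"topic": "Lineare Gleichungen"}))
# -> "Arbeitest du gerade an einem Quiz, Training, einer Übung oder der Abschlussprüfung?"
```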

The work presented in Chapter 3 and Chapter 4 was enabled by the collaboration with the MUMIE and Online Mathematik Brückenkurs (OMB+) team and addresses the abovementioned problem. Within Chapter 3, we implemented an Intelligent Process Automation (IPA) dialogue system with rule- and retrieval-based algorithms that interacts with students and performs the onboarding. This system is multipurpose: first, it eases the assistance process for human tutors by reducing repetitive and time-consuming activities and therefore allows workers to focus on more challenging tasks.

5 Mumie: https://www.mumie.net
6 https://www.ombplus.de


Second, by interacting with students, it augments the resources with structured and labeled training data. The latter, in turn, allowed us to re-train specific system components utilizing machine and deep learning methods. We describe this procedure in Chapter 4.

We report that the earlier collected human-to-human data from the OMB+ platform could not be used for training a machine learning system, since it is highly unstructured and contains no direct question-answer matching, which is crucial for trainable conversational algorithms. We analyzed these conversations to get insights that we then used to define the dialogue flow.

We also do not consider the automatization of mathematical questions in this work, since this task would require not only precise language understanding and extensive commonsense knowledge but also the accurate perception of mathematical notation. The tutoring process at OMB+ differs considerably from standard tutoring methods: here, instructors attempt to navigate students by giving hints and tips instead of providing an out-of-the-box solution. Existing human-to-human dialogues revealed that such a task can be challenging even for human tutors, since it is not always directly perceptible what precisely the student does not understand or has difficulties with. Furthermore, dialogues containing mathematical explanations can last a long time, and the topic (i.e., intent and final goal) can shift multiple times during the conversation.

To our knowledge, this task would not be feasible to solve sufficiently with current machine learning algorithms, especially considering the missing training data. A rule-based system would not be the right solution for this purpose either, due to the large number of complex rules that would need to be defined. Thus, in this work, we focus on the automatization of less cognitively demanding tasks, which are still relevant and must be solved in the first instance to ease the tutoring process for instructors.

2.4 summary

Conversational agents are a solution in high demand for industries due to their potential ability to support a large number of users simultaneously while performing a flawless natural language conversation. Many messenger platforms are created, and custom chatbots in various constellations emerge daily. Unfortunately, most of the current conversational agents are often disappointing to use due to the lack of substantial natural language understanding and reasoning. Furthermore, Natural Language Processing (NLP) and Artificial Intelligence (AI) techniques have limitations when applied to domain-specific and task-oriented problems, especially if no structured training data is available.

Despite that, conversational agents could be successfully applied to solve less cognitively demanding but still highly relevant tasks, namely within the


Intelligent Process Automation (IPA) domain. Such IPA-bots would undertake tedious and time-consuming responsibilities to free up the knowledge worker for more complex and intricate tasks. IPA-bots can be seen as members of the goal-oriented dialogue family and are, for the most part, realized either in a rule-based manner or through hybrid models when it comes to a real-world setting.


3 multipurpose ipa via conversational assistant

This chapter partially covers work already published at international peer-reviewed conferences. The relevant publication is Moiseeva et al., 2020. The research described in this chapter was carried out in its entirety by the author of this thesis. The other authors of the publication acted as advisers. At the time of writing this thesis, the system described in this work was deployed at the Online Mathematik Brückenkurs (OMB+) platform.

As we outlined in the introduction, Intelligent Process Automation (IPA) is an emerging technology with the primary goal of assisting the knowledge worker by taking care of repetitive, routine, and low-cognitive tasks. Conversational assistants that interact with users in a natural language are a potential application for the IPA domain. Such smart virtual agents can support the user by responding to various inquiries and performing routine tasks carried out in a natural language (i.e., customer support).

In this work, we tackle the challenge of implementing an IPA conversational agent in a real-world industrial context and under conditions of missing training data. Our system has two meaningful benefits: first, it decreases monotonous and time-consuming tasks and hence lets workers concentrate on more intellectual processes. Second, by interacting with users, it augments the resources with structured and labeled training data. The latter, in turn, can be used to retrain a system utilizing machine and deep learning methods.

3.1 introduction

Conversational dialogue systems can give attention to several users simultaneously while supporting natural communication. They are thus exceptionally needed for practical assistance applications where a high number of users are involved at the same time. Typically, dialogue agents require a lot of natural language understanding and commonsense knowledge to hold a conversation reasonably and intelligently. Therefore, it is nowadays still challenging to substitute human assistance with an intelligent agent completely. However, a dialogue system could be successfully employed as a part of an Intelligent Process Automation (IPA) application. In this case, it would take care of simple but routine and time-consuming tasks performed


in a natural language while allowing a human worker to focus on more cognitively demanding processes.

Recent research in the dialogue generation domain is conducted by employing Artificial Intelligence (AI) techniques like machine and deep learning (Lowe et al., 2017; Wen et al., 2016). However, those methods require high-quality data for training, which is often not directly available for domain-specific and industrial problems. Even though companies gather and store vast volumes of data, it is often of low quality with regard to machine learning approaches (Daniel, 2015), especially if it concerns dialogue data, which has to be properly structured as well as annotated. If trained on artificially created datasets (which is often the case in the research setting), the systems can solve only specific problems and are hardly adaptable to other domains. Hence, despite the popularity of deep learning end-to-end models, one still needs to rely on traditional pipelines in practical dialogue engineering, mainly while introducing a new domain.

3.1.1 Outline and Contributions

This work addresses the challenge of implementing a dialogue system for IPA purposes within the practical e-learning domain under conditions of missing training data. The chapter is structured as follows: in Section 3.2, we introduce our method and describe the design of the rule-based dialogue architecture with its main components, such as the Natural Language Understanding (NLU), Dialogue Manager (DM), and Natural Language Generation (NLG) units. In Section 3.3, we present the results of our dual-focused evaluation, and Section 3.4 describes the format of the collected structured data. Finally, we conclude in Section 3.5.

Our contributions within this work are as follows

• We implemented a robust conversational system for IPA purposes within a practical e-learning domain and under the conditions of missing training (i.e., dialogue) data.

• The system has two main objectives:

– First, it reduces repetitive and time-consuming activities and allows workers of the e-learning platform to focus solely on mathematical inquiries.

– Second, by interacting with users, it augments the resources with structured and partially labeled training data for further implementation of trainable dialogue components.


3.2 model

The central purpose of the introduced system is to interact with students at the beginning of every chat conversation and gather information on the topic (and sub-topic), examination mode and level, question number, and exact problem formulation. Such an onboarding process saves time for human tutors and allows them to handle mathematical questions solely. Furthermore, the system is realized in a way such that it accumulates labeled and structured dialogues in the background.

Figure 4 displays the entire conversation flow. After the system accepts user input, it analyzes it at different levels and extracts information if such is provided. If some of the information required for tutoring is missing, the system asks the student to provide it. When all the information points are collected, they are automatically validated and forwarded to a human tutor. The latter can then directly proceed with the assistance process. In the following, we explain the key components of the system.

3.2.1 OMB+ Design

Figure 3 illustrates the internal structure and design of the OMB+ platform. It has topics and sub-topics, as well as four various examination modes. Each topic (Figure 3, tag 1) corresponds to a chapter level and always has sub-topics (Figure 3, tag 2) that correspond to a section level. The examination modes training and exercise are ambiguous, because they correspond to either a chapter (Figure 3, tag 3) or a section (Figure 3, tag 5) level, and it is important to differentiate between them since they contain different types of content. The mode final examination (Figure 3, tag 4) always corresponds to a chapter level, whereas quiz (Figure 3, tag 5) can belong only to a section level. According to the design of the OMB+ platform, there are several ways in which a possible dialogue flow can proceed.

3.2.2 Preprocessing

In a natural language conversation, one may respond in various forms, making the extraction of data from user-generated text challenging due to potential misspellings or confusable spellings (i.e., Aufgabe 1a, Aufgabe 1 (a)). Hence, to facilitate a substantial recognition of intents and extraction of entities, we normalize (e.g., correct misspellings, detect synonyms) and preprocess every user input before moving forward to the Natural Language Understanding (NLU) module.


Figure 3: OMB+ Online Learning Platform, where 1 is the Topic (corresponds to a chapter level), 2 is a Sub-Topic (corresponds to a section level), 3 is the chapter level examination mode, 4 is the Final Examination Mode (available only for chapter level), 5 are the Examination Modes Exercise, Training (available at section levels) and Quiz (available only at section level), and 6 is the OMB+ Chat.

We perform both linguistic and domain-specific preprocessing, which includes the following steps (a minimal sketch follows the list):

• lowercasing and stemming of words in the input query;

• removal of stop words and punctuation;

• all mentions of x in mathematical formulations are removed to avoid confusion with the Roman numeral 10 ("X");

• in a combination of the type word "Kapitel/Aufgabe" + digit written as a word (i.e., "eins", "zweites", etc.), the word is replaced with a digit ("Kapitel eins" → "Kapitel 1", "im fünften Kapitel" → "im Kapitel 5"); Roman numerals are replaced with a digit as well ("Kapitel IV" → "Kapitel 4");

• detected ambiguities are normalized (e.g., "Trainingsaufgabe" → "Training");

• recognized misspellings resp. type errors are corrected (e.g., "Difeernzialrechnung" → "Differentialrechnung");

• permalinks are parsed and analyzed; from each permalink, it is possible to extract the topic, examination mode, and question number.
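To make these steps concrete, the following minimal Python sketch illustrates such a normalization routine; the lookup tables, the example entries, and the function name are illustrative assumptions rather than the exact rules of the deployed system (stemming and stop-word removal are omitted for brevity).

import re

# Hypothetical lookup tables for illustration; the deployed system keeps its
# synonym, misspelling, and ambiguity lists in its ElasticSearch index and RegEx rules.
NUMBER_WORDS = {"eins": "1", "zwei": "2", "zweites": "2", "drei": "3", "fünften": "5"}
ROMAN_NUMERALS = {"i": "1", "ii": "2", "iii": "3", "iv": "4"}
MISSPELLINGS = {"difeernzialrechnung": "differentialrechnung"}
AMBIGUITIES = {"trainingsaufgabe": "training"}


def preprocess(query: str) -> str:
    """Normalize a raw user utterance before it reaches the NLU unit."""
    text = query.lower()
    # Drop punctuation (stemming and stop-word removal are omitted here).
    text = re.sub(r"[^\wäöüß ]", " ", text)
    # Remove isolated 'x' from formulas to avoid confusion with the Roman numeral 10.
    text = re.sub(r"\bx\b", " ", text)
    tokens = []
    for token in text.split():
        token = MISSPELLINGS.get(token, token)
        token = AMBIGUITIES.get(token, token)
        token = NUMBER_WORDS.get(token, ROMAN_NUMERALS.get(token, token))
        tokens.append(token)
    return " ".join(tokens)


# Reordering of the type "im fünften Kapitel" -> "im Kapitel 5" is left out for brevity.
print(preprocess("Im fünften Kapitel, Difeernzialrechnung, Trainingsaufgabe zweites (a)"))
# -> "im 5 kapitel differentialrechnung training 2 a"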


3.2.3 Natural Language Understanding

We implement the Natural Language Understanding (NLU) unit utilizing handcrafted rules, Regular Expressions (RegEx), and the Elasticsearch¹ API.

intent classification According to the design of the OMB+ platform, all student questions can be classified into three categories: organizational questions (i.e., course, certificate), contextual questions (i.e., content, theorem), and mathematical questions (exercises, solutions). To classify the input query by its intent (i.e., category), we employ weighted keyword information in the form of handcrafted rules. We assume that specific words are explicitly associated with a corresponding intent (e.g., theorem or root denote a mathematical question; feedback or certificate denote an organizational inquiry). If no intent could be assigned, then it is assumed that the NLU unit was not capable of understanding, and the intent is unknown. In this case, the virtual agent requests the user to provide an intent manually by picking one from the mentioned three options. The queries from the organizational and theoretical categories are directly handed over to a human tutor, while mathematical questions are analyzed by the automated agent for further information extraction.

entity extraction In the next step, the system retrieves the entities from a user message. We specify the five following prerequisite entities, which have to be collected before the dialogue can be handed over to a human tutor: topic, sub-topic, examination mode and level, and question number. This part is implemented using ElasticSearch (ES) and RegEx. To facilitate the use of ES, we indexed the OMB+ web page to an internal database. Besides indexing the titles of topics and sub-topics, we also provided supplementary information on possible synonyms and writing styles. We additionally filed OMB+ permalinks, which direct to the site pages. To query the resulting database, we employ the internal Elasticsearch multi_match function and set the minimum_should_match parameter to 20%. This parameter specifies the number of terms that must match for a document to be considered relevant. Besides that, we adopted fuzziness with the maximum edit distance set to 2 characters. The fuzzy query uses similarity based on Levenshtein edit distance (Levenshtein, 1966). Finally, the system produces a ranked list of potential matching entries found in the database within the predefined relevance threshold (we set it to θ = 1.5); a minimal query sketch is given at the end of this section. We then pick the most probable entry as the right one and select the corresponding entity from the user input.

Overall, the Natural Language Understanding (NLU) module receives the user input as preprocessed text and examines it across all

1 https://www.elastic.co/products/elasticsearch


predefined RegEx statements and for a match in the ElasticSearch (ES) database. We use ES to retrieve entities for the topic, sub-topic, and examination mode, whereas RegEx is used to extract the information on a question number. Every time an entity is extracted, it is filled into the Informational Dictionary (ID). The ID has the following six slots to be filled in: topic, sub-topic, examination level, examination mode, question number, and exact problem formulation (this information is requested in further steps).
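As an illustration of the retrieval described above, the following sketch queries a hypothetical index with the Python Elasticsearch client using multi_match, a fuzziness of 2, and a 20% minimum_should_match; the index name, field names, and threshold handling are assumptions, not the exact configuration of the deployed system.

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")


def lookup_entity(user_text, relevance_threshold=1.5):
    """Query the indexed OMB+ titles and return the best match above the threshold."""
    body = {
        "query": {
            "multi_match": {
                "query": user_text,
                "fields": ["title", "synonyms"],       # hypothetical field names
                "fuzziness": 2,                         # Levenshtein edit distance
                "minimum_should_match": "20%",          # share of terms that must match
            }
        }
    }
    hits = es.search(index="omb_entities", body=body)["hits"]["hits"]
    ranked = [hit for hit in hits if hit["_score"] >= relevance_threshold]
    return ranked[0]["_source"] if ranked else None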

3.2.4 Dialogue Manager

A conventional Dialogue Manager (DM) consists of two interacting components: the first component is the Dialogue State Tracker (DST), which maintains a representation of the current conversational state, and the second is the Policy Learning (PL), which defines the next system action (i.e., response).

In our model, every agent's next action is determined by the state of the previously obtained information accumulated in the Informational Dictionary (ID). For instance, if the system recognizes that the student works on the final examination, it also knows (defined by hand-coded logic in the predefined rules) that there is no necessity to ask for a sub-topic, because the final examination always corresponds to a chapter level (due to the layout of the OMB+ platform). If the system recognizes that the user struggles in solving a quiz, it has to ask for both the corresponding topic and sub-topic, because the quiz always relates to a section level.

To discover all of the potential conversational flows, we implement Mutually Exclusive Rules (MER), which indicate that two events $e_1$ and $e_2$ are mutually exclusive or disjoint, since they cannot both occur at the same time (i.e., the intersection of these events is empty, $P(e_1 \cap e_2) = 0$). Additionally, we defined transition and mapping rules. Hence, only particular entities can co-occur, while others must be excluded from the specific rule combination. Assume a topic $t = \text{Geometry}$. According to MER, this topic can co-occur with one of the pertaining sub-topics defined at the OMB+ (i.e., $s = \text{Angle}$). Thus, the level of the examination would be $l = \text{Section}$ (but $l \neq \text{Chapter}$), because the student already reported a sub-section he or she is working on. In this specific case, the examination mode could be $e \in \{\text{Training}, \text{Exercise}, \text{Quiz}\}$, but not $e = \text{Final\_Examination}$. This is because $e = \text{Final\_Examination}$ can co-occur only with a topic (chapter level) but not with a sub-topic (section level). The question number $q$ could be arbitrary and has to co-occur with every other combination.


This allows a formal solution to be found. Assume a list of all theoretically possible dialogue states S = [topic, sub-topic, training, exercise, chapter level, section level, quiz, final examination, question number], where for each element $s_n$ in S it holds that:

$s_n = \begin{cases} 1 & \text{if information is given} \\ 0 & \text{otherwise} \end{cases}$   (1)

This gives all general (resp. possible) dialogue states without reference to the design of the OMB+ platform. However, to make the dialogue states fully suitable for the OMB+, from the general states we select only those which are valid. To define the validity of a state, we specify the following five MERs:

R1 – Topic

¬T
 T

Table 2: Rule 1 – Admissible topic configurations.

Rule (R1) in Table 2 denotes admissible configurations for the topic and means that the topic could be either given (T) or not (¬T).

R2 – Examination Mode

¬TR  ¬E  ¬Q  ¬FE
 TR  ¬E  ¬Q  ¬FE
¬TR   E  ¬Q  ¬FE
¬TR  ¬E   Q  ¬FE
¬TR  ¬E  ¬Q   FE

Table 3: Rule 2 – Admissible examination mode configurations.

Rule (R2) in Table 3 denotes that either no information on the examination mode is given, or the examination mode is Training (TR), Exercise (E), Quiz (Q), or Final Examination (FE), but not more than one mode at the same time.

R3 – Level

¬CL  ¬SL
 CL  ¬SL
¬CL   SL

Table 4: Rule 3 – Admissible level configurations.

Rule (R3) in Table 4 indicates that either no level information is provided, or the level corresponds to the chapter level (CL) or to the section level (SL), but not to both at the same time.


R4 – Examination & Level

¬TR  ¬E  ¬SL  ¬CL
 TR  ¬E   SL  ¬CL
 TR  ¬E  ¬SL   CL
 TR  ¬E  ¬SL  ¬CL
¬TR   E   SL  ¬CL
¬TR   E  ¬SL   CL
¬TR   E  ¬SL  ¬CL

Table 5: Rule 4 – Admissible examination mode (only for Training and Exercise) and corresponding level configurations.

Rule (R4) in Table 5 means that the Training (TR) and Exercise (E) examination modes can belong either to the chapter level (CL) or to the section level (SL), but not to both at the same time.

R5 – Topic & Sub-Topic

¬T  ¬ST
 T  ¬ST
¬T   ST
 T   ST

Table 6: Rule 5 – Admissible topic and corresponding sub-topic configurations.

Rule (R5) in Table 6 symbolizes that we could be given either only a topic (T), or the combination of a topic and a sub-topic (ST) at the same time, or only a sub-topic, or no information on this point at all.

We then define a valid dialogue state as a dialogue state that meets all requirements of the abovementioned rules:

$\forall s \, (\text{State}(s) \land R_{1-5}(s)) \rightarrow \text{Valid}(s)$   (2)

Once we have the valid states for dialogues, we perform a mapping from each valid dialogue state to the next possible system action. For that, we first define five transition rules. The order of the transition rules is important:

$T_1 = \neg T$   (3)

means that no topic (T) is found in the ID (i.e., it could not be extracted from the user input).

$T_2 = \neg EM$   (4)

indicates that no examination mode (EM) is found in the ID


$T_3 = EM, \text{ where } EM \in \{TR, E\}$   (5)

denotes that the extracted examination mode (EM) is either Training (TR) or Exercise (E).

$T_4 = \begin{cases} T, \neg ST, TR, SL \\ T, \neg ST, E, SL \\ T, \neg ST, Q \end{cases}$   (6)

means that no sub-topic (ST) is provided by a user, but the ID either already contains the combination of topic (T), training (TR), and section level (SL), or the combination of topic, exercise (E), and section level, or the combination of topic and quiz (Q).

$T_5 = \neg QNR$   (7)

indicates that no question number (QNR) was provided by a student (or it could not be successfully extracted).

Finally, we assembled the list of possible next actions for the system:

$A = \{\text{Ask for Topic}, \text{Ask for Examination Mode}, \text{Ask for Level}, \text{Ask for Sub-Topic}, \text{Ask for Question Nr}\}$   (8)

Following the transition rules, we map each valid dialogue state to the possible next action $a_m \in A$:

$\exists a \, \forall s \, (\text{Valid}(s) \land T_1(s)) \rightarrow a(s, T)$   (9)

in the case where we do not have any topic provided, the next action is to ask for the topic (T);

$\exists a \, \forall s \, (\text{Valid}(s) \land T_2(s)) \rightarrow a(s, EM)$   (10)

if no examination mode is provided by a user (or it could not be successfully extracted from the user query), the next action is defined as ask for examination mode (EM);

$\exists a \, \forall s \, (\text{Valid}(s) \land T_3(s)) \rightarrow a(s, L)$   (11)

in the case where we know the examination mode $EM \in \{\text{Training}, \text{Exercise}\}$, we have to ask about the level (i.e., training at chapter level or training at section level); thus, the next action is ask for level (L);


$\exists a \, \forall s \, (\text{Valid}(s) \land T_4(s)) \rightarrow a(s, ST)$   (12)

if no sub-topic is provided, but the examination mode $EM \in \{\text{Training}, \text{Exercise}\}$ is at section level, the next action is defined as ask for sub-topic (ST);

$\exists a \, \forall s \, (\text{Valid}(s) \land T_5(s)) \rightarrow a(s, QNR)$   (13)

if no question number is provided by a user, then the next action is ask for question number (QNR).

Following this scheme, we generate 56 state transitions, which specify the next system actions. Being in a new conversation state, we compare the extracted (or updated) data in the Informational Dictionary (ID) with the valid dialogue states and select the mapped action as the next system action.
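A simplified sketch of this mapping is given below; it condenses the generated transitions into the ordered checks T1 through T5 over a plain dictionary standing in for the Informational Dictionary, so the slot names and the fallback to the "Final Request" action are illustrative assumptions rather than the deployed rule set.

def next_action(id_slots):
    """Map the current Informational Dictionary content to the next system action,
    following the ordered transition rules T1-T5 (a simplified sketch)."""
    topic = id_slots.get("topic")
    sub_topic = id_slots.get("sub_topic")
    mode = id_slots.get("examination_mode")     # training, exercise, quiz, final examination
    level = id_slots.get("examination_level")   # "chapter" or "section"
    question = id_slots.get("question_number")

    if topic is None:                                            # T1
        return "Ask for Topic"
    if mode is None:                                             # T2
        return "Ask for Examination Mode"
    if mode in ("training", "exercise") and level is None:       # T3
        return "Ask for Level"
    needs_subtopic = mode == "quiz" or (mode in ("training", "exercise") and level == "section")
    if sub_topic is None and needs_subtopic:                     # T4
        return "Ask for Sub-Topic"
    if question is None:                                         # T5
        return "Ask for Question Nr"
    return "Final Request"                                       # ID is complete


print(next_action({"topic": "Geometrie", "examination_mode": "quiz"}))  # -> Ask for Sub-Topic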

flexibility of dst The DST transitions outlined above are meant to maintain the current configuration of the OMB+ learning platform. Still, further MERs could be effortlessly generated to create new transitions. Examining this, we performed tests with the previous design of the platform, where all the topics except for the first one had sub-topics, whereas the first topic contained both sub-topics and sub-sub-topics. We could produce the missing transitions with the described approach; the number of permissible transitions in this case increased from 56 to 117.

3.2.5 Meta Policy

Since the proposed system is expected to operate in a real-world setting, we had to develop additional policies that manage the dialogue flow and verify the system's accuracy. We explain these policies below.

completeness of informational dictionary The model automatically verifies the completeness of the Informational Dictionary (ID), which is determined by the number of essential slots filled in the informational dictionary. There are 6 discrete cases when the ID is deemed to be complete. For example, if a user works on the final examination mode, the agent does not have to request a sub-topic or examination level. Hence, the ID has to be filled only with data for a topic, examination type, and question number. Whereas if the user works on training mode, the system has to collect information about the topic, sub-topic, examination level, examination mode, and the question number. Once the ID is intact, it is provided to the verification step. Otherwise, the agent proceeds according to the next action. The system extracts the information in each dialogue state, and therefore, if the user gives updated information on any subject later in the dialogue history, the corresponding slot will be updated in the ID.

Below are examples of two of the final cases (out of six) where the Informational Dictionary (ID) is considered to be complete; a minimal completeness check for these two cases is sketched at the end of this list of policies.

$Case_1 = \begin{cases} \text{intent} = \text{Math} \\ \text{topic} \neq \text{None} \\ \text{exam mode} = \text{Final Examination} \\ \text{level} = \text{None} \\ \text{question nr} \neq \text{None} \end{cases}$   (14)

Table 7: Case 1 – Any topic, examination mode is final examination, examination level does not matter, any question number.

$Case_2 = \begin{cases} \text{intent} = \text{Math} \\ \text{topic} \neq \text{None} \\ \text{sub-topic} \neq \text{None} \\ \text{exam mode} \in \{\text{Training}, \text{Exercise}\} \\ \text{level} = \text{Section} \\ \text{question nr} \neq \text{None} \end{cases}$   (15)

Table 8: Case 2 – Any topic, any related sub-topic, examination mode is either training or exercise, examination level is section, any question number.

verification step After the system has collected all the essential data (i.e., the ID is complete), it continues to the final verification step. Here, the obtained data is presented to the current student in the dialogue session. The student is asked to review the exactness of the data and, if any entries are faulty, to correct them. The Informational Dictionary (ID) is, where necessary, updated with the user-provided data. This procedure repeats until the user confirms the correctness of the compiled data.

fallback policy In the cases where the system fails to retrieve information from a student's query, it re-asks the user and attempts to retrieve information from a follow-up response. The maximum number of re-ask trials is a parameter and is set to three (r = 3). If the system is still incapable of extracting information, the user input is considered as the ground truth and filled into the appropriate slot in the ID. An exception to this rule applies where the user has to specify the intent manually: after three unclassified trials, a dialogue session is directly handed over to a human tutor.


human request In every dialogue state, a user can shift to a human tutor. For this, a user can enter the "human" (or "Mensch" in the German language) keyword. Therefore, every user message is investigated for the presence of this keyword.
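As referenced above, the following is a minimal sketch of the completeness check restricted to the two cases shown in Tables 7 and 8; the slot names are illustrative assumptions, and the remaining four cases are omitted.

def id_complete(id_slots):
    """Check two of the six completeness cases (Cases 1 and 2 above) for the ID."""
    filled = lambda key: id_slots.get(key) is not None
    # Case 1: final examination -- level and sub-topic are not required.
    case_1 = (filled("topic")
              and id_slots.get("examination_mode") == "final examination"
              and filled("question_number"))
    # Case 2: training or exercise at section level -- sub-topic is required.
    case_2 = (filled("topic") and filled("sub_topic")
              and id_slots.get("examination_mode") in ("training", "exercise")
              and id_slots.get("examination_level") == "section"
              and filled("question_number"))
    return case_1 or case_2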

3.2.6 Response Generation

The semantic representation of the system's next action is transformed into a natural language utterance. Hence, each possible action is mapped to precisely one response that is predefined in the template. Some of the responses are fixed (i.e., "Welches Kapitel bearbeitest du gerade?").² Others have placeholders for custom values. In the latter case, the utterance can be formulated depending on the Informational Dictionary (ID).

Assume the predefined response "Deine Angaben waren wie folgt: a) Kapitel K, b) Abschnitt A, c) Aufgabentyp M, d) Aufgabe Q, e) Ebene E. Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein."³ The slots [K, A, M, Q, E] are the placeholders for the information stored in the ID. During the conversation, the system selects the template according to the next action and fills the slots with extracted values if required. After the extracted information is filled into the corresponding placeholders in the utterance, the final response could be as follows: "Deine Angaben waren wie folgt: a) Kapitel: I Elementares Rechnen, b) Abschnitt: Rechenregel für Potenzen, c) Aufgabentyp: Quiz, d) Aufgabe: 1(a), e) Ebene: Abschnitt. Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein."⁴
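A minimal sketch of this template-filling step is shown below; the template store, placeholder names, and function are illustrative assumptions rather than the system's actual implementation.

# Hypothetical template store; the deployed system maps every next action to
# exactly one predefined (German) response template.
TEMPLATES = {
    "Ask for Topic": "Welches Kapitel bearbeitest du gerade?",
    "Final Request": ("Deine Angaben waren wie folgt: a) Kapitel {K}, b) Abschnitt {A}, "
                      "c) Aufgabentyp {M}, d) Aufgabe {Q}, e) Ebene {E}. "
                      "Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein."),
}


def render_response(action, id_slots):
    """Fill the placeholders of the selected template with values from the ID."""
    return TEMPLATES[action].format(
        K=id_slots.get("topic", ""), A=id_slots.get("sub_topic", ""),
        M=id_slots.get("examination_mode", ""), Q=id_slots.get("question_number", ""),
        E=id_slots.get("examination_level", ""),
    )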

Below, we demonstrate five sample dialogues with variable flows in the original German language:

• a short dialogue where certain information is already provided in the student's first input;

• an organizational question case;

• a contextual question case;

• an example with manual intent classification;

• an example with fallback and human policies;

• a long flow with entry correction at the final verification phase.

2 Translates to "Which chapter are you currently working on?"
3 Translates to "Your details were as follows: a) Chapter, b) Section, c) Examination Mode, d) Question Number, e) Level. Did I understand you correctly? Please answer yes or no."
4 Translates to "Your details were as follows: a) Elementary Calculus, b) Power Laws, c) Quiz, d) Question 1 (a), e) Section Level. Did I understand you correctly? Please answer yes or no."


Student: Hallo, ich habe Schwierigkeiten beim Kapitel I Elementares Rechnen mit der Übungsaufgabe 1 a.

Next-Action: [Ask for Level: Chapter or Section]
Bot: Geht es um eine Aufgabe auf Kapitel-Ebene (z.B. um eine Trainingsaufgabe im Kapitel "Geometrie") oder um eine Aufgabe auf der Ebene eines Abschnitts in einem Kapitel (z.B. um eine Aufgabe zum Abschnitt "Winkel" im Kapitel "Geometrie")? Antworte bitte mit KAP für Kapitel und SEK für Abschnitt.

Student: Kapitel

Next-Action: [Final-Statement]
Bot: Deine Angaben waren wie folgt:
a) Kapitel: I Elementares Rechnen
b) Aufgabentyp: Exercise
c) Aufgabe: 1a
d) Ebene: Kapitel
Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein.

Student: yep

Next-Action: [Human Hand-Over]
Bot: Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir!

Table 9: Showcase 1 – Short flow; most of the information was extracted from the initial user input. In bold are the extracted entities.

Student: Hallo, ich habe eine Frage bzgl. eines Zertifikates, und zwar wie kann ich es austellen lassen?

Next-Action: [Organisational Question]
Bot: Es scheint sich wohl um eine organisatorische Frage zu handeln. Bitte fasse deine Frage in einem kurzen Text zusammen, damit ich sie an meinen menschlichen Kollegen weiterleiten kann.

Student: Wie bekomme ich ein Zertifikat?

Next-Action: [Human Hand-Over]
Bot: Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir!

Table 10: Showcase 2 – Organisational question. Underlined are the keywords which point to the organisational intent.


Student: hallo

Next-Action: [UNK]
Bot: Hast du eine Frage zu einer Aufgabe (MATH), zu einem Text im Kurs (TEXT) oder eine organisatorische Frage (ORG)?

Student: ma

Next-Action: [UNK]
Bot: Hast du eine Frage zu einer Aufgabe (MATH), zu einem Text im Kurs (TEXT) oder eine organisatorische Frage (ORG)?

Student: math

Next-Action: [Ask for Topic]
Bot: Welches Kapitel bearbeitest du gerade?

Student: Lineare Gleichungssysteme

Next-Action: [Ask for Examination Mode]
Bot: Bearbeitest du eine Übungsaufgabe, eine Trainingsaufgabe, einen Quiz oder eine Schlussprüfung?

Student: human

Next-Action: [Human Hand-Over]
Bot: Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir!

Table 11: Showcase 3 – Manual intent classification, Fallback and Human Request policies (in bold type).

Student: Halli-Hallo, könnte mir bitte jemand die Regel 7.2 (vertikale streckung und stauchung des graphen) erklären? Danke im Voraus!

Next-Action: [Contextual Question]
Bot: Es scheint sich wohl um eine inhaltliche Frage zu handeln. Bitte fasse deine Frage in einem kurzen Text zusammen, damit ich sie an meinen menschlichen Kollegen weiterleiten kann.

Student: Ich habe Schwierigkeiten beim Verstehen der Regel 7.2. Was ist genau die vertikale Streckung des Graphen?

Next-Action: [Human Hand-Over]
Bot: Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir!

Table 12: Showcase 4 – Contextual question. Underlined are the keywords which point to the contextual intent.


Student: Hallo, ich kann beim Kapitel 1 nicht weiterkommen, Trainingsaufgabe 1 (a).

Next-Action: [Ask for Level]
Bot: Geht es um eine Aufgabe auf Kapitel-Ebene (z.B. um eine Trainingsaufgabe im Kapitel Geometrie) oder um eine Aufgabe auf der Ebene eines Abschnitts in einem Kapitel (z.B. um eine Aufgabe zum Abschnitt Winkel im Kapitel Geometrie)? Antworte bitte mit KAP für Kapitel und SEK für Abschnitt.

Student: ich glaube es ist eine Sektion

Next-Action: [Ask for Sub-Topic]
Bot: Wie heißt der Abschnitt, in dem sich die Aufgabe befindet, auf die sich deine Frage bezieht?

Student: ich bearbeite rechenregeln und potenzen

Next-Action: [Final Statement]
Bot: Deine Angaben waren wie folgt:
a) Kapitel: I Elementares Rechnen
b) Abschnitt: Rechenregel für Potenzen
c) Aufgabentyp: Quiz
d) Aufgabe: 1 (a)
e) Ebene: Abschnitt
Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein.

Student: nope

Next-Action: [Ask_For]
Bot: Welche der folgenden Punkte habe ich falsch erkannt? Schreibe bitte, welche der folgenden Punkte es sind: a, b, c, d, e.

Student: c

State-Transition: [Correct-Fields]
Bot: Gib bitte die korrekte Information für c) ein.

Student: Training

State-Transition: [Final Statement]
Bot: Deine Angaben waren wie folgt:
a) Kapitel: I Elementares Rechnen
b) Abschnitt: Rechenregel für Potenzen
c) Aufgabentyp: Training
d) Aufgabe: 1 (a)
e) Ebene: Abschnitt
Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein.

Student: yes

State-Transition: [Human Hand-Over]
Bot: Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir!

Table 13: Showcase 5 – Long flow, correction of entries. In bold type are the extracted entities.


Figure 4: Dialogue Flow. Abbreviations: UNK – unknown, ID – informational dictionary, RegEx – regular expressions.


3.3 evaluation

In order to get feedback on the quality, functionality, and usefulness of the proposed model, we evaluated it in two steps: first, with an automated method (Section 3.3.1), utilizing 130 manually annotated conversations to verify the robustness of the system; second, with the help of human tutors (Section 3.3.2) from OMB+, to examine the user experience. We describe the details as well as the most common errors below.

3.3.1 Automated Evaluation

To conduct an automated evaluation, we manually assembled a dataset with 130 conversational flows. We predefined viable initial user questions and user answers, as well as gold system responses. These flows cover the most frequent dialogues that we previously observed in human-to-human dialogue data. We examined our system by implementing a self-chatting evaluation bot. The evaluation cycle is as follows (a minimal sketch of this loop is given after the list):

• The system accepts the first question from a predefined dialogue via an API request, preprocesses and analyzes it to determine the intent and extract entities.

• Then it estimates the appropriate next action and responds accordingly.

• This response is then compared to the system's gold answer. If the predicted result is correct, the system receives the next predefined user input from the templated dialogue and responds again as defined above. This procedure proceeds until the dialogue terminates (i.e., the ID is complete). Otherwise, the system reports an unsuccessful case number.

The system successfully passed this evaluation for all 130 cases.
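The sketch below outlines this self-chatting evaluation loop; the helper bot_respond stands in for the system's API call and is a hypothetical name, as is the flow format of (user turn, gold response) pairs.

# Hypothetical helpers: `bot_respond(utterance, session)` calls the deployed system's
# API and returns its reply; each flow is a list of (user_turn, gold_response) pairs.
def evaluate(flows, bot_respond):
    """Replay predefined dialogue flows and report the cases the system fails."""
    failed = []
    for case_number, flow in enumerate(flows, start=1):
        session = {}
        for user_turn, gold_response in flow:
            predicted = bot_respond(user_turn, session)
            if predicted != gold_response:
                failed.append(case_number)
                break
    print(f"passed {len(flows) - len(failed)} of {len(flows)} dialogues, failed: {failed}")
    return failed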

3.3.2 Human Evaluation & Error Analysis

To evaluate the system on irregular cases, we carried out experiments with human tutors from the OMB+ platform. The tutors are experienced regarding rare or complex issues, ambiguous responses, misspellings, and other infrequent but still relevant problems which occur during a natural language dialogue. In the following, we review some common errors and describe additional observations.

misspellings and confusable spellings frequently occur in user-generated text. Since we attempt to let the conversation remain natural to provide a better user experience, we do not instruct students to write formally. Therefore, we have to account for various writing issues. One of the prevalent problems


is misspellings. German words are commonly long and can be complex, and because users often type quickly, this can lead to the wrong order of characters within a given word. To tackle this challenge, we utilized fuzzy matching within ES. However, the maximum allowed edit distance in ElasticSearch (ES) is set to 2 characters, which means that all misspellings beyond this threshold could not be correctly identified by ES (e.g., Differenzialrechnung vs. Differnetialrechnung). Another representative example would be the formulation of the section or question number: the equivalent information can be communicated in several discrete ways, which has to be considered by the RegEx unit (e.g., Aufgabe 5 a, Aufgabe V a, Aufgabe 5 (a)). A similar problem occurs with confusable spellings (i.e., Differenzialrechnung vs. Differenzialgleichung).

We analyzed the cases stated above and improved our model by adding some of the most common misspellings to the ElasticSearch (ES) database or handling them with RegEx during the preprocessing step.

elasticsearch threshold We observed both cases where the system failed to extract information although the user provided it, and examples where ES extracted information not mentioned in a user query. According to our understanding, these errors occur due to the relevancy scoring algorithm of ElasticSearch (ES), where a document's score is a combination of textual similarity and other metadata-based scores. Our examination revealed that ES mostly fails to extract the information if the user message is rather short (e.g., about 5 tokens). To overcome this difficulty, we coupled the current input $u_t$ with the dialogue history ($h = u_{t,\dots,t-2}$); a minimal sketch of this coupling is given below. That eliminated the problem and improved the retrieval quality.

Solving the opposite problem, where ES extracts false information (or information that was not mentioned in a query), was more challenging. We learned that the problem comes from short words or sub-words (i.e., suffixes, prefixes) which ES locates in the database and considers credible enough. The ES documentation suggests getting rid of stop words to eliminate this behavior; however, this did not improve the search. Also, fine-tuning of ES parameters such as the relevance threshold, prefix length⁵, and the minimum_should_match⁶ parameter did not result in notable improvements. To cope with this problem, we implemented a verification step where a user can edit the erroneously retrieved data.

5 The amount of fuzzified characters. This parameter helps to decrease the number of terms which must be reviewed.
6 Indicates the number of terms that must match for a document to be considered relevant.
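A minimal sketch of the history coupling mentioned above could look as follows, assuming an es_lookup function such as the query sketch in Section 3.2.3; the window of exactly three turns mirrors the description, while the helper names are hypothetical.

def query_with_history(es_lookup, turns):
    """Concatenate the current input u_t with the two preceding turns before querying ES."""
    window = " ".join(turns[-3:])   # u_t together with u_(t-1) and u_(t-2), if available
    return es_lookup(window)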


3.4 structured dialogue acquisition

As we stated, the implemented system not only supports the human tutor by assisting students but also collects structured and labeled training data in the background. In a trial run of the rule-based system, we accumulated a toy dataset with dialogues. The assembled dataset includes the following (illustrative record sketches follow the list):

• Plain dialogues with unique dialogue indexes.

• Plain ID information collected for the whole conversation.

• Pairs of questions (i.e., user requests) and responses (i.e., bot replies) with unique dialogue and turn indexes.

• Triples in the form of (Question, Next Action, Response). Information on the next system action could be employed to train a DM unit with (deep) machine learning algorithms.

• For every conversational state, we keep the entities that the system was able to extract, along with their position in the utterance. This information could be used to train a custom domain-specific NER model.
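The snippets below give illustrative shapes for such records; the field names, the example utterance, and the entity label are assumptions for illustration, not the exact schema of the collected dataset.

# Illustrative record shapes only; the actual field names may differ.
qa_pair = {
    "dialogue_id": 17, "turn_id": 3,
    "question": "ich bearbeite rechenregeln und potenzen",
    "response": "Welche Aufgabe bearbeitest du gerade?",
}

triple = {
    "question": "ich bearbeite rechenregeln und potenzen",
    "next_action": "Ask for Question Nr",
    "response": "Welche Aufgabe bearbeitest du gerade?",
}

ner_annotation = {
    "utterance": "ich bearbeite rechenregeln und potenzen",
    "entities": [{"label": "Subtopic", "start": 14, "end": 39}],  # character offsets
}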

3.5 summary

In this work, we realized a conversational agent for IPA purposes that addresses two problems. First, it decreases repetitive and time-consuming activities, which allows workers of the e-learning platform to focus solely on mathematical and hence cognitively demanding questions. Second, by interacting with users, it augments the resources with structured and labeled training data. The latter can be utilized for further implementation of learnable dialogue components.

The realization of such a system was connected with multiple challenges. Among others were missing structured conversational data, ambiguous or erroneous user-generated text, and the necessity to deal with existing corporate tools and their design.

The proposed conversational agent enabled us to accumulate structured, labeled data without any special efforts from the human (i.e., tutors') side (e.g., manual annotation and post-processing of existing data, change of the conversational structure within the tutoring process). Once we collected structured dialogues, we were able to re-train particular segments of the system with deep learning methods and achieved consistent performance for both of the proposed tasks. We report the re-implemented elements in Chapter 4.

At the time of writing this thesis, the system described in this work was deployed at the OMB+ platform.


3.6 future work

Possible extensions of our work are:

• In this work, we implemented a rule-based dialogue model from scratch and adjusted it for the specific domain (i.e., e-learning). The core of the model is a dialogue manager that determines the current state of the conversation and the possible next action. Rule-based systems are generally considered to be hardly adaptable to new domains; however, our dialogue manager proved to be flexible to slight modifications in a workflow. One of the possible directions of future work would thus be the investigation of the general adaptability of the dialogue manager core to other scenarios and domains (e.g., a different course).

• A further possible extension would be the introduction of different languages. The current model is implemented for the German language, but the OMB+ platform provides this course also in English and Chinese. Therefore, it would be interesting to investigate whether it is possible to adjust the existing system to other languages without drastic changes in the model.


4 deep learning re-implementation of units

This chapter partially covers work already published at international peer-reviewed conferences. The relevant publication is Moiseeva et al., 2020. The research described in this chapter was carried out in its entirety by the author of this thesis. The other authors of the publication acted as advisors.

In this chapter, we address the challenge of re-implementing two central components of our IPA dialogue system described in Chapter 3 by employing deep learning techniques. Since the data for training was collected in a (short) trial run of the rule-based system, it is not sufficient to train a neural network from scratch. Therefore, we utilize Transfer Learning (TL) methods and employ the previously pre-trained Bidirectional Encoder Representations from Transformers (BERT) model, which we fine-tune for two of our custom tasks.

4.1 introduction

Data mining and machine learning technologies have gained notable advancement in many research fields, including but not limited to computer vision, robotics, advanced analytics, and computational linguistics (Pan and Yang, 2009; Trautmann et al., 2019; Werner et al., 2015). Besides that, in recent years, deep neural networks, with their ability to accurately map from inputs to outputs (i.e., labels, images, sentences) while analyzing patterns in data, became a potential solution for many industrial automatization tasks (e.g., fake news detection, sentiment analysis).

Nevertheless, conventional supervised models still have limitations when applied to real-world data and industrial tasks. The first challenge here is the training phase, since a robust model requires an immense amount of structured and labeled data. Still, there are just a few cases where such data is publicly available (e.g., research datasets), but for most tasks, domains, and languages, the data has to be gathered over a long time. The second hurdle is that, if trained on existing data from a distinct domain, a supervised model cannot generalize to conditions that are different from those faced in the training dataset. Most machine learning methods perform correctly only under a general assumption: the training and test data are mapped to the same feature space and the same distribution (Pan and Yang, 2009). When the distribution changes, most statistical models need to be retrained from scratch using newly obtained training


samples. This is especially crucial when applying such approaches to real-world data, which is usually very noisy and contains a large number of scenarios. In this case, a model would be ill-trained to make decisions about novel patterns. Furthermore, for most real-world applications, it is expensive or even not feasible at all to recollect the required training data to retrain the model. Hence, the possibility to transfer the knowledge obtained during training on existing data to a new, unseen domain is one of the possible solutions for industrial problems. Such a technique is called Transfer Learning (TL), and it allows dealing with the abovementioned scenarios by leveraging existing labeled data of some related tasks or domains.

4.1.1 Outline and Contributions

This work addresses the challenge of re-implementing two out of three central components in our rule-based IPA conversational assistant with deep learning methods. As we discussed in Chapter 2, hybrid dialogue models (i.e., consisting of both rule-based and trainable components) appear to replace the established rule- and template-based methods, which are for the most part employed in the industrial setting. To investigate this assumption and examine current state-of-the-art deep learning techniques, we conducted experiments on our custom domain-specific data. Since the available data were collected in a trial run, and the obtained dataset was relatively small to train a conventional machine learning model from scratch, we utilized the Transfer Learning (TL) approach and fine-tuned the existing pre-trained model for our target domain.

For the experiments we defined two tasks

• First, we considered the NER problem in a custom domain setting. We defined a sequence labeling task and employed Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al., 2018) as one of the transfer learning methods widely utilized in the NLP area. We applied the model to our dataset and fine-tuned it for seven (7) domain-specific (i.e., e-learning) entities.

• Second, we examined the effectiveness of transfer learning for the dialogue manager core. The DM is one of the fundamental parts of the conversational agent and requires compelling learning techniques to hold the conversation intelligently. For that experiment, we defined a classification task and applied the BERT model to predict the system's next action for every given user input in a conversation. We then computed the macro F1-score for 14 possible classes (i.e., actions) and an average dialogue accuracy.


The main characteristics of both settings are:

• the set of named entities and labels (i.e., actions) between the source and target domain is different;

• for both settings, we utilized the BERT model in German and multilingual versions with additional parameters;

• the dataset used to train the target model is small and consists of highly domain-specific labels. The latter often appears in industrial scenarios.

We finally verified that the selected method performed reasonably well on both tasks. We reached a performance of 0.93 macro F1 points for Named Entity Recognition (NER) and 0.75 macro F1 points for the Next Action Prediction (NAP) task. Therefore, we conclude that both NER and NAP components could be employed to substitute or extend the existing rule-based modules.

4.2 related work

As discussed in the previous section, the main benefit of Transfer Learning (TL) is that it allows the domains and tasks used during the training and testing phases to be different. Initial steps in the formulation of transfer learning were made after the NeurIPS-95¹ workshop that was focused on the demand for lifelong machine-learning methods that preserve and reuse previously learned knowledge (Chen et al., 2018). Research on transfer learning has attracted more and more attention since then, especially in the computer vision domain and in particular for image recognition tasks (Donahue et al., 2014; Sharif Razavian et al., 2014).

Recent studies have shown that TL can also be successfully applied within the Natural Language Processing (NLP) domain, especially on semantically equivalent tasks (Dai et al., 2019; Devlin et al., 2018; Mou et al., 2016). Several research works were carried out specifically on the Named Entity Recognition (NER) task and explored the capabilities of transfer learning when applied to different named entity categories (i.e., different output spaces). For instance, Qu et al. (2016) pre-trained a linear-chain Conditional Random Field (CRF) on a large amount of labeled data in the source domain. Then they introduced a two-linear-layer neural network that learns the difference between the source and target label distributions. Finally, the authors initialized another CRF with parameters learned in the linear layers to predict the labels of the target domain. Kim et al. (2015) conducted experiments with transferring features and model parameters between related domains where the label classes are different but may have a semantic similarity. Their primary approach was to construct label embeddings to automatically map the source and target label classes to improve the transfer (Chen et al., 2018).

1 Conference on Neural Information Processing Systems. Formerly NIPS.


However, there are also models trained in a manner such that they could be applied to multiple NLP tasks simultaneously. Among such architectures are ULMFit (Howard and Ruder, 2018), BERT (Devlin et al., 2018), GPT-2 (Radford et al., 2019), and Transformer-XL (Dai et al., 2019): powerful models that were recently proposed for transfer learning within the NLP domain. These architectures are called language models, since they attempt to learn language structure from large textual collections in an unsupervised manner and then predict the next word in a sequence based on previously observed context words. The Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al., 2018) model is one of the most popular architectures among the abovementioned, and it is also one of the first that showed close to human-level performance on the General Language Understanding Evaluation (GLUE) benchmark² (Wang et al., 2018), which includes tasks on sentiment analysis, question answering, semantic similarity, and language inference. The architecture of BERT consists of stacked transformer blocks that are pre-trained on a large corpus consisting of 800M words from modern English books (i.e., BooksCorpus by Zhu et al.) and 2,500M words of text from pre-processed (i.e., without markup) English Wikipedia articles. Overall, BERT advanced the state-of-the-art performance for eleven (11) NLP tasks. We describe this approach in detail in the subsequent section.

4.3 bert: bidirectional encoder representations from transformers

The Bidirectional Encoder Representations from Transformers (BERT) architecture is divided into two steps: pre-training and fine-tuning. The pre-training step was enabled through the two following tasks:

• Masked Language Model – the idea behind this approach is to train a deep bidirectional representation, which is different from conventional language models that are either trained left-to-right or right-to-left. To train a bidirectional model, the authors mask out a certain number of tokens in a sequence. The model is then asked to predict the original words for the masked tokens. It is essential that, in contrast to denoising auto-encoders (Vincent et al., 2008), the model does not need to reconstruct the entire input; instead, it attempts to predict the masked words. Since the model does not know which words exactly it will be asked about, it learns a representation for every token in a sequence (Devlin et al., 2018).

• Next Sentence Prediction – aims to capture relationships between two sentences A and B. In order to train the model, the authors pre-trained a binarized next sentence prediction task, where pairs of sequences were labeled according to the criteria that A and B either follow each other directly in the corpus (label

2 https://gluebenchmark.com


IsNext) or are both taken from random places (label NotNext). The model is then asked to predict whether the former or the latter is the case. The authors demonstrated that pre-training towards this task is extremely useful for both question answering and natural language inference tasks (Devlin et al., 2018).

Furthermore, BERT uses WordPiece tokenization for embeddings (Wu et al., 2016), which segments words into subword-level tokens. This approach allows the model to make additional inferences based on word structure (i.e., an -ing ending denotes similar grammatical categories).

The fine-tuning step follows the pre-training process. For that, the BERT model is initialized with the parameters obtained during the pre-training step, and a task-specific fully-connected layer is used after the transformer blocks, which maps the general representations to custom labels (employing a softmax operation).
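One possible way to realize this fine-tuning setup with the Hugging Face transformers library is sketched below; the chosen German checkpoint, the label counts, and the example sentence are assumptions for illustration, not necessarily the exact configuration used in this thesis.

from transformers import (BertTokenizerFast, BertForSequenceClassification,
                          BertForTokenClassification)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-german-cased")

# Next Action Prediction: a single softmax layer over 14 action labels.
nap_model = BertForSequenceClassification.from_pretrained("bert-base-german-cased",
                                                          num_labels=14)

# NER: a per-token softmax layer over the 7 domain-specific entity labels.
ner_model = BertForTokenClassification.from_pretrained("bert-base-german-cased",
                                                       num_labels=7)

inputs = tokenizer("Ich bearbeite die Trainingsaufgabe 2 im Kapitel Geometrie",
                   return_tensors="pt")
action_logits = nap_model(**inputs).logits   # shape: (1, 14)
entity_logits = ner_model(**inputs).logits   # shape: (1, sequence_length, 7)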

self-attention The key operation of the BERT architecture is self-attention, which is a sequence-to-sequence operation with a conventional encoder-decoder structure (Vaswani et al., 2017). The encoder maps an input sequence $(x_1, x_2, \dots, x_n)$ to a sequence of continuous representations $z = (z_1, z_2, \dots, z_n)$. Given $z$, the decoder then generates an output sequence $(y_1, y_2, \dots, y_m)$ of symbols, one element at a time (Vaswani et al., 2017). To produce an output vector $y_i$, the self-attention operation takes a weighted average over all input vectors:

$y_i = \sum_j w_{ij} x_j$   (16)

where $j$ is an index over the entire sequence and the weights sum to one (1) over all $j$. The weight $w_{ij}$ is derived from a dot product function over $x_i$ and $x_j$ (Bloem, 2019; Vaswani et al., 2017):

$w'_{ij} = x_i^T x_j$   (17)

The dot product returns a value between negative and positive infinity, and a softmax maps these values to $[0, 1]$ to ensure that they sum to one (1) over the entire sequence (Bloem, 2019; Vaswani et al., 2017):

$w_{ij} = \dfrac{\exp w'_{ij}}{\sum_j \exp w'_{ij}}$   (18)

Self-attention is not the only operation in transformers, but it is the essential one, since it propagates information between vectors. The other operations in the transformer are applied to every vector in the input sequence without interactions between vectors (Bloem, 2019; Vaswani et al., 2017).
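The following NumPy sketch implements exactly the basic, unscaled, and unparameterized self-attention of Equations (16)-(18); the full transformer additionally uses learned query/key/value projections and scaling, which are omitted here.

import numpy as np

def self_attention(x):
    """Basic self-attention: raw weights are dot products (Eq. 17),
    normalized with a softmax over the sequence (Eq. 18), then averaged (Eq. 16)."""
    raw = x @ x.T                                   # w'_ij = x_i^T x_j
    raw -= raw.max(axis=1, keepdims=True)           # numerical stability
    weights = np.exp(raw)
    weights /= weights.sum(axis=1, keepdims=True)   # w_ij, each row sums to 1
    return weights @ x                              # y_i = sum_j w_ij x_j


sequence = np.random.randn(5, 8)    # 5 input vectors of dimension 8
outputs = self_attention(sequence)  # 5 output vectors of dimension 8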


4.4 data & descriptive statistics

The benchmark that we accumulated through the trial run includes 300 structured and partially labeled dialogues, with the average length of a dialogue being six (6) utterances. All the conversations were held in the German language. Detailed general statistics can be found in Table 14.

                                       Value
Max. Len. Dialogue (in utterances)     15
Avg. Len. Dialogue (in utterances)     6
Max. Len. Utterance (in tokens)        100
Avg. Len. Utterance (in tokens)        9
Overall Unique Action Labels           13
Overall Unique Entity Labels           7
Train – # Dialogues (# Utterances)     200 (1161)
Eval – # Dialogues (# Utterances)      50 (279)
Test – # Dialogues (# Utterances)      50 (300)

Table 14: General statistics for the conversational dataset.

The dataset covers 14 unique next actions and seven (7) unique named entities. Detailed information on next actions and entities can be found in Table 15 and Table 16. Below, we list the possible actions with the corresponding explanation and predefined mapped statements.

unk ndash requests to define the intent of the question mathematicalorganizational or contextual (ie fallback policy in case if theautomated intent recognition unit fails) rarr ldquoHast du eine Fragezu einer Aufgabe (MATH) zu einem Text im Kurs (TEXT) oder eineorganisatorische Frage (ORG)rdquo

org – hands the discussion over to a human tutor in case of an organizational question → “Es scheint sich wohl um eine organisatorische Frage zu handeln. Bitte fasse deine Frage in einem kurzen Text zusammen, damit ich sie an meinen menschlichen Kollegen weiterleiten kann.”

text – hands the discussion over to a human tutor in case of a contextual question → “Es scheint sich wohl um eine inhaltliche Frage zu handeln. Bitte fasse deine Frage in einem kurzen Text zusammen, damit ich sie an meinen menschlichen Kollegen weiterleiten kann.”


topic – requests information on the topic → “Welches Kapitel bearbeitest du gerade? Antworte bitte mit der Kapitelnummer, z.B. I.A, I.B, II, oder mit dem Kapitelnamen, z.B. Geometrie.”

examination – requests information about the examination mode → “Bearbeitest du eine Übungsaufgabe, eine Trainingsaufgabe, einen Quiz oder eine Schlussprüfung?”

subtopic – requests information about the subtopic in case of quiz or section-level training/exercise modes → “Wie heißt der Abschnitt, in dem sich die Aufgabe befindet, auf die sich deine Frage bezieht?”

level – requests information about the level of the examination mode: chapter or section → “Geht es um eine Aufgabe auf Kapitel-Ebene (z.B. um eine Trainingsaufgabe im Kapitel Geometrie) oder um eine Aufgabe auf der Ebene eines Abschnitts in einem Kapitel (z.B. um eine Aufgabe zum Abschnitt Winkel im Kapitel Geometrie)? Antworte bitte mit KAP für Kapitel und SEK für Abschnitt.”

question number – requests information about the question number → “Welche Aufgabe bearbeitest du gerade? Wenn du z.B. bei Teilaufgabe c in der zweiten Trainingsaufgabe (oder der zweiten Übungsaufgabe) bist, antworte mit 2c. Bearbeitest du gerade einen Quiz oder eine Schlussprüfung, gib bitte nur die Teilaufgabe an.”

final request – considers the Informational Dictionary (ID) to be complete and initiates the verification step → “Deine Angaben waren wie folgt: Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein.”

verify request – requests the student to verify the assembled information → “Welche der folgenden Punkte habe ich falsch erkannt? Schreibe bitte, welche der folgenden Punkte es sind: [a, b, c, ...]3”

correct request – requests the student to provide the correct information if there is an error in the assembled data → “Gib bitte die korrekte Information für [a]4 ein.”

exact question – requests the student to provide a detailed explanation of the problem → “Bitte fasse deine Frage in einem kurzen Text zusammen, damit ich sie an meinen menschlichen Kollegen weiterleiten kann.”

human handover – finalizes the conversation → “Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir.”

3 Variables that depend on the state of the Informational Dictionary (ID).
4 Variable that was selected as erroneous and needs to be corrected.


Action             Counts    Action             Counts
Final Request        321     Unk                   80
Human Handover       300     Subtopic              55
Exact Question       286     Correct Request       40
Question Number      176     Verify Request        34
Examination          175     Org                   17
Topic                137     Text                  13
Level                130

Table 15: Detailed statistics on possible system actions. The Counts columns denote the number of occurrences of each action in the entire dataset.

Entity          Counts
Question Nr       317
Chapter           311
Examination       303
Subtopic          198
Level              80
Intent             70

Table 16: Detailed statistics on possible named entities. The Counts column denotes the number of occurrences of each entity in the entire dataset.

4.5 named entity recognition

We defined a sequence labeling task to extract custom entities from user input. We considered seven (7) viable entities (see Table 16) to be recognized by the model: topic, subtopic, examination mode, level, question number, intent, as well as the entity other for the remaining words in the utterance. Since the data collected from the rule-based system already includes information on the entities, we were able to train a domain-specific NER unit. However, since the original user input was informal, the same information could be provided in different writing styles. That means that a single entity could have different surface forms (e.g., synonyms, writing styles). Therefore, entities that we extracted from the rule-based system were transformed into a universal standard (e.g., official chapter names). To consider all of the variable entity forms while post-labeling the original dataset, we determined generic entity names (e.g., chapter, question nr) and mapped variations of entities from the user input (e.g., Chapter = [Elementary Calculus, Chapter I, ...]) to them. The overall dataset consists of 300 labeled dialogues, where 200 (with 1161 utterances) were employed for training and 100 for the evaluation and test sets (50 dialogues with ca. 300 utterances for each set, respectively).


4.5.1 Model Settings

We conducted experiments with German and multilingual BERT implementations5. The capitalization of words is significant for the German language; therefore, we ran the experiments on the capitalized data while preserving the original punctuation. We employed the available base model for both the multilingual and the German BERT implementation in the cased version. We initialized the learning rate for both models to 1e-4, and the maximum length of the tokenized input was set to 128 tokens. We conducted the experiments multiple times with different seeds for a maximum of 50 epochs, with the training batch size set to 32. We utilized AdamW as the optimizer and applied early stopping if the performance did not improve significantly after 5 epochs.
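For illustration only, a condensed sketch of how such a token-classification fine-tuning setup could look with the Transformers library is given below; the checkpoint name, the dummy labels, and the single training step are simplifying assumptions rather than the exact experimental configuration.

    import torch
    from transformers import AutoTokenizer, AutoModelForTokenClassification

    MODEL_NAME = "bert-base-german-cased"   # assumed checkpoint; multilingual: "bert-base-multilingual-cased"
    NUM_LABELS = 7                          # topic, subtopic, examination mode, level, question number, intent, other

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForTokenClassification.from_pretrained(MODEL_NAME, num_labels=NUM_LABELS)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    # One illustrative training step on a single toy example with dummy labels.
    encoding = tokenizer("Ich bearbeite gerade Aufgabe 2c im Kapitel Geometrie",
                         truncation=True, max_length=128, return_tensors="pt")
    labels = torch.zeros(encoding["input_ids"].shape, dtype=torch.long)  # placeholder entity labels
    outputs = model(**encoding, labels=labels)
    outputs.loss.backward()
    optimizer.step()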

4.6 next action prediction

We defined a classification problem where we aim to predict the system's next action given the user input. We assumed 14 custom actions (see Table 15) that we considered to be our classes. While collecting the toy dataset, every input was automatically labeled by the rule-based system with the corresponding next action and the dialogue ID. Thus, no additional post-labeling was required in this step.

We investigated two settings for this task:

• Default Setting: Utilizing a user input and the corresponding class (i.e., next action) without any supplementary context. By default, we conduct all of our experiments in this setting.

• Extended Setting: Employing a user input, the corresponding next action, and the previous system action as a source of supplementary context. For this setting, we experimented with the best performing model from the default setting.

The overall dataset consists of 300 labeled dialogues, where 200 (with 1161 utterances) were employed for training and 100 for the evaluation and test sets (50 dialogues with ca. 300 utterances for each set, respectively).
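One straightforward way to realize the extended setting described above is to feed the previous system action as a second text segment next to the user utterance. The thesis does not prescribe a specific input format, so the following snippet is only an assumed illustration using the Transformers tokenizer.

    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")  # assumed checkpoint

    prev_action = "examination"                            # previous system action as supplementary context
    user_utterance = "Ich bearbeite eine Trainingsaufgabe"

    # Default setting: the utterance alone. Extended setting: previous action as a second segment.
    default_input = tokenizer(user_utterance, truncation=True, max_length=128)
    extended_input = tokenizer(prev_action, user_utterance, truncation=True, max_length=128)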

4.6.1 Model Settings

For the NAP task, we carried out experiments with German and multilingual BERT implementations as well. We examined the performance of both capitalized and lower-cased inputs, as well as plain and preprocessed data. For the multilingual BERT, we used the base model in both cased and uncased modifications.

5 https://github.com/huggingface/transformers


For the German BERT, we utilized the base model in the cased modification only6. For both models, we initialized the learning rate to 4e-5, and the maximum length of the tokenized input was set to 128 tokens. We conducted the experiments multiple times with different seeds for a maximum of 300 epochs, with the training batch size set to 32. We utilized AdamW as the optimizer and applied early stopping if the performance did not improve significantly after 15 epochs.

4.7 evaluation and results

To evaluate the models, we computed the word-level macro F1 score for the Named Entity Recognition (NER) task and the utterance-level macro F1 score for the Next Action Prediction (NAP) task. The word-level F1 is estimated as the average of the F1 scores per class, each computed from all words in the evaluation and test sets. The outcomes for the NER task are presented in Table 17. For the utterance-level F1, a single class label (i.e., next action) is taken for the whole utterance. The results for the NAP task are shown in Table 18.

We additionally computed the average dialogue accuracy for the best performing NAP models. This score indicates how well the predicted next actions compose the conversational flow. The average dialogue accuracy was estimated for the 50 conversations in the evaluation and test sets, respectively. The results are displayed in Table 19.
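The thesis does not spell out the exact formula for the average dialogue accuracy, but a plausible reading is: for every dialogue, compute the fraction of utterances whose next action is predicted correctly, then average over dialogues. A minimal sketch under this assumption, with hypothetical data structures:

    def average_dialogue_accuracy(dialogues):
        """dialogues: list of dialogues, each a list of (gold_action, predicted_action) pairs."""
        per_dialogue = []
        for turns in dialogues:
            correct = sum(1 for gold, pred in turns if gold == pred)
            per_dialogue.append(correct / len(turns))
        return sum(per_dialogue) / len(per_dialogue)

    # Example: two short toy dialogues
    dialogues = [
        [("topic", "topic"), ("examination", "examination"), ("level", "subtopic")],
        [("unk", "unk"), ("org", "org")],
    ]
    print(average_dialogue_accuracy(dialogues))   # (2/3 + 2/2) / 2 = 0.833...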

The obtained results for the NER task revealed that German BERT performed significantly better than the multilingual BERT model. The performance of the custom NER unit is at 0.93 macro F1 points for all possible named entities (see Table 17). In contrast, for the NAP task, the multilingual BERT model obtained better performance than the German BERT model. The best performing method in the default setting gained a macro F1 of 0.677 points for 14 possible classes, whereas the model in the extended setting performed better – its best macro F1 score is 0.752 for the equal number of classes (see Table 18).

Regarding the dialogue accuracy, the extended system fine-tuned with multilingual BERT obtained better outcomes than the default one (see Table 19). The overall conclusion for the NAP is that the capitalized setting increased the performance of the model, whereas the inclusion of punctuation did not influence the results.

6 An uncased pre-trained modification of the model was not available.


Task   Model   Cased   Punct   F1 Eval   F1 Test
NER    GER       ✓       ✓      0.971     0.930
NER    Mult      ✓       ✓      0.926     0.905

Table 17: Word-level F1 for the Named Entity Recognition (NER) task. In bold: best performance for evaluation and test sets.

Task   Model   Cased   Punct   Extended   F1 Eval   F1 Test
NAP    GER       ✓       ✗        ✗        0.711     0.673
NAP    GER       ✓       ✓        ✗        0.701     0.606
NAP    Mult      ✓       ✓        ✗        0.688     0.625
NAP    Mult      ✓       ✗        ✗        0.769     0.677
NAP    Mult      ✓       ✗        ✓        0.810     0.752
NAP    Mult      ✗       ✓        ✗        0.664     0.596
NAP    Mult      ✗       ✗        ✗        0.742     0.502

Table 18: Utterance-level F1 for the Next Action Prediction (NAP) task. Underlined: best performance for evaluation and test sets in the default setting (without previous action context). In bold: best performance for evaluation and test sets in the extended setting (with previous action context).

Model          Accuracy Eval   Accuracy Test
NAP default        0.765           0.724
NAP extended       0.813           0.801

Table 19: Average dialogue accuracy computed for the Next Action Prediction (NAP) task for the best performing models. In bold: best performance for evaluation and test sets.


Figure 5: Macro F1 evaluation, NAP – default model. Comparison between German and multilingual BERT.


Figure 6: Macro F1 test, NAP – default model. Comparison between German and multilingual BERT.


Figure 7: Macro F1 evaluation, NAP – extended model.


Figure 8: Macro F1 test, NAP – extended model.


Figure 9: Macro F1 evaluation, NER – Comparison between German and multilingual BERT.


Figure 10: Macro F1 test, NER – Comparison between German and multilingual BERT.


4.8 error analysis

We then investigated the cases where the models failed to predict the correct action or labeled the named entity span erroneously. Below we outline the most common errors.

next action prediction   One of the prevalent flaws in the default model was the mismatch between two consecutive actions – the actions question number and subtopic. That is due to the position of these actions in the conversational flow: the occurrence of both actions in the dialogue is not strict and largely depends on the previous action. To explain this, we illustrate the following possible conversational scenarios:

• The system can directly proceed with the request for the question number if the examination mode is the final examination.

• If the examination mode is training or exercise, the system has to ask about the level first (i.e., chapter or section).

• If it is chapter-level, the system can again directly proceed with the request for the question number.

• In the case of section-level, the system has to ask about the subsection first (if this information was not provided before).

The examination of the extended model showed that the induction of supplementary context (i.e., the previous action) improved the performance of the system by about 60%.

named entity recognition   The cases where the system failed cover mismatches between the labels chapter and other, and the labels question number and other. This failure arose due to the imperfectly labeled span of a multi-word named entity (e.g., “Elementary Calculus, Proportionality, Percentage”). The first or last words in the named entity were excluded from the span and erroneously labeled with the other label.

4.9 summary

In this chapter, we re-implemented two of the three fundamental units of our IPA conversational agent – Named Entity Recognition (NER) and Dialogue Manager (DM) – using the methods of Transfer Learning (TL), specifically the Bidirectional Encoder Representations from Transformers (BERT) model.

We believe that the obtained results are reasonably high considering the comparatively small amount of training data we employed to fine-tune the models. Therefore, we conclude that both NAP and NER segments could be used to replace or extend the existing rule-based modules. Rule-based components, as well as ElasticSearch (ES), are limited in their capacities and not flexible to novel patterns, whereas the trainable units generalize better and could possibly decrease the number of inaccurate predictions in case of accidental dialogue behavior. Furthermore, to increase the overall robustness, rule-based and trainable components could be used synchronously: when one system fails (e.g., the ElasticSearch (ES) module cannot extract an entity), the conversation continues on the prediction obtained from the other model.

4.10 future work

There are several possible directions for future work:

• At the time of writing of this thesis, both units were trained and investigated on their own – that means without being incorporated into the initial IPA conversational system. Thus, one of the potential directions of future work would be the embodiment of these units into the original system and the examination of possible failure cases. Furthermore, the resulting hybrid system could then imply the constraint of additional meta policies to prevent unexpected system behavior.

• Further investigation could be towards multi-language modality for the re-implemented units. Since the Online Mathematik Brückenkurs (OMB+) platform also supports English and Chinese, it would be interesting to examine whether a simple translation from the target language (i.e., English, Chinese) to the source language (i.e., German) would be sufficient to employ the already-assembled dataset and pre-trained units. Presumably, it should be achievable in the case of the Next Action Prediction (NAP) unit, but it could lead to a noticeable performance drop for the Named Entity Recognition (NER) task, since there could be a considerable gap between translated and original entity spans.

• Last but not least, one of the further extensions for the IPA would be the eventual applicability to more cognitively demanding tasks. For instance, it is of interest to extend the model to more complex domains like the handling of mathematical questions. Would it be possible to automate human assistance at least for less complicated mathematical inquiries, or is it still unachievable with currently existing NLP techniques and machine learning methods?


Part II

CLUSTERING-BASED TREND ANALYZER


5 FOUNDATIONS

As discussed in Chapter 1, predictive analytics is one of the potential Intelligent Process Automation (IPA) tasks due to its capability to forecast future events by using current and historical data and therethrough assist knowledge workers with, for instance, sales or investments. Modern machine learning algorithms allow an intelligent and fast analysis of large amounts of data. However, the evaluation of such algorithms and models remains one of the grave problems in this domain. This is mostly due to the lack of publicly available benchmarks for the evaluation of topic modeling and trend detection algorithms. Hence, in this part of the thesis, we address this challenge and assemble a benchmark for detecting both trends and downtrends (i.e., topics that become less frequent over time). To the best of our knowledge, the task of downtrend detection has not been well addressed before.

In this chapter, we introduce the concepts required by the second part of the thesis. We begin with an introduction to the topic modeling task and define the emerging trend in Section 5.1. Next, we examine the related work, existing approaches to topic modeling and trend detection, and their limitations in Section 5.2. In Chapter 6, we introduce our TRENDNERT benchmark for trend and downtrend detection and describe the method we employed for its assembly.

5.1 motivation and challenges

Science changes and evolves rapidly: novel research areas emerge while others fade away. Keeping pace with these changes is challenging. Therefore, recognizing and forecasting emerging research trends is of significant importance for researchers and academic publishers, as well as for funding agencies, stakeholders, and innovation companies.

Previously, the task of trend detection was mostly solved by domain experts who used specialized and cumbersome tools to investigate the data and get useful insights from it (e.g., through data visualization or statistics). However, manual analysis of large amounts of data can be time-consuming, and hiring domain experts is very costly. Furthermore, the overall increase of research data (i.e., Google Scholar, PubMed, DBLP) in the past decade makes the approach based on human domain experts less and less scalable. In contrast, automated approaches become a more desirable solution for the task of emerging trend identification (Kontostathis et al 2004; Salatino 2015). Due to the large number of electronic documents appearing on the web daily, several machine learning techniques were developed to find patterns of word occurrences in those documents using hierarchical probabilistic models. These approaches are called topic models, and they allow the analysis of extensive textual collections and the extraction of valuable insights from them. The main advantage of topic models is their ability to discover hidden patterns of word use and to connect documents containing similar patterns. In other words, the idea of a topic model is that documents are a mixture of topics, where a topic is defined as a probability distribution over words (Alghamdi and Alfalqi 2015).

Topics have the property of being able to evolve; thus, modeling topics without considering time may confound precise topic identification. Modeling topics while considering time is called topic evolution modeling, and it can reveal important hidden information in the document collection, allowing topics to be recognized together with their appearance in time. This approach also provides the ability to check the evolution of topics during a given time window and examine whether a given topic is gaining interest and popularity over time (i.e., becoming an emerging trend) or whether its development is stable. Considering the latter approach, we turn to the task of emerging trend identification.

How is an emerging trend defined? An emerging trend is a topic that is gaining interest and utility over time, and it has two attributes: novelty and growth (Small, Boyack and Klavans 2014; Tu and Seng 2012). Rotolo, Hicks and Martin (2015) explain the correlation of these two attributes in the following way: up to the point of emergence, a research topic is characterized by a high level of novelty. At that time, it does not attract any considerable attention from the scientific community, and due to its limited impact, its growth is nearly flat. After some turning points (i.e., the appearance of significant scientific publications), the research topic starts to grow faster and may become a trend if the growth is fast; however, the level of novelty decreases continuously once the emergence becomes clear. Then, in the final phase, the topic becomes well-established while its novelty levels out (He and Chen 2018).

To detect trends in textual data, one employs computational analysis techniques and Natural Language Processing (NLP). In the subsequent section, we give an overview of the existing approaches for topic modeling and trend detection.

5.2 detection of emerging research trends

The task of trend detection is closely associated with the topic modeling task, since in the first instance it detects the latent topics in a substantial collection of data. It secondarily employs techniques to investigate the evolution of a topic and whether it has the potential to become a trend.


Topic modeling has an extensive history of applications in the scientific domain, including but not limited to studies of temporal trends (Griffiths and Steyvers 2004; Wang and McCallum 2006) and investigation of impact prediction (Yogatama et al 2011). Below we give a general overview of the most significant methods used in this domain, explain their importance, and discuss, where applicable, their potential limitations.

5.2.1 Methods for Topic Modeling

In order to extract a topic from textual documents (i.e., research papers, blog posts), the precise definition of the model for topic representation is crucial.

latent semantic analysis, formerly called Latent Semantic Indexing, is one of the fundamental methods in the topic modeling domain, proposed by Deerwester et al. in 1990. Its primary goal is to generate vector-based representations for documents and terms and then employ these representations to compute the similarity between documents (resp. terms) in order to select the most related ones. In this regard, Latent Semantic Analysis (LSA) takes a matrix of documents and terms and then decomposes it into a separate document-topic matrix and a topic-term matrix. Before turning to the central task of finding latent topics that capture the relationships among the words and documents, it also performs dimensionality reduction utilizing truncated Singular Value Decomposition. This technique is usually applied to reduce the sparseness, noisiness, and redundancy across dimensions in the document-topic matrix. Finally, using the resulting document and term vectors, one applies measures such as cosine similarity to evaluate the similarity of different documents, words, or queries.

One of the extensions to traditional LSA is probabilistic Latent Semantic Analysis (pLSA), which was introduced by Hofmann in 1999 to fix several shortcomings. The foremost advantage of the follow-up method was that it could successfully automate document indexing, which in turn improved the LSA in a probabilistic sense by using a generative model (Alghamdi and Alfalqi 2015). Furthermore, pLSA can differentiate between diverse contexts of word usage without utilizing dictionaries or a thesaurus. That leads to better polysemy disambiguation and discloses typical similarities by grouping words that share a similar context. Among the works that employ pLSA is the approach proposed by Gohr et al. (2009), where the authors apply pLSA in a window that slides across the stream of documents to analyze the evolution of topics. Also, Mei et al. (2008) use pLSA in order to create a network of topics.


latent dirichlet allocation was developed by Blei, Ng and Jordan (2003) to overcome the limitations of both the LSA and pLSA models. Nowadays, Latent Dirichlet Allocation (LDA) is considered to be one of the most utilized and robust probabilistic topic models. The fundamental concept of LDA is the assumption that every document is represented as a mixture of topics, where each topic is a discrete probability distribution that defines the probability of each word to appear in a given topic. These topic probabilities provide a compressed representation of a document. In other words, LDA models each of the d documents in the collection as a mixture over n latent topics, where each of the topics describes a multinomial distribution over the w words in the vocabulary (Alghamdi and Alfalqi 2015).

LDA fixes weaknesses of pLSA such as the incapacity to assign a probability to previously unseen documents (because pLSA learns the topic mixtures only for documents seen in the training phase) and the overfitting problem (i.e., the number of parameters in pLSA increases linearly with the size of the corpus) (Salatino 2015).

An extension of the conventional LDA is the hierarchical LDA (Griffiths et al 2004), where topics are grouped in hierarchies. Each node in the hierarchy is associated with a topic, and a topic is a distribution across words. Another comparable approach is the Relational Topic Model (Chang and Blei 2010), which is a mixture of a topic model and a network model for groups of linked documents. The attribute of every document is its text (i.e., discrete observations taken from a fixed vocabulary), and the links between documents are hyperlinks or citations.

LDA is a prevalent method for discovering topics (Blei and Lafferty 2006; Bolelli, Ertekin and Giles 2009; Bolelli et al 2009; Hall, Jurafsky and Manning 2008; Wang and Blei 2011; Wang and McCallum 2006). However, it has been shown that training and fine-tuning a stable LDA model can be very challenging and time-consuming (Agrawal, Fu and Menzies 2018).

correlated topic model   One of the main drawbacks of LDA is its incapability of capturing dependencies among the topics (Alghamdi and Alfalqi 2015; He et al 2017). Therefore, Blei and Lafferty (2007) proposed the Correlated Topic Model as an extension of the LDA approach. In this model, the authors replaced the Dirichlet distribution with a logistic-normal distribution that models pairwise topic correlations with the Gaussian covariance matrix (He et al 2017).

However, one of the shortcomings of this model is its computational cost. The number of parameters in the covariance matrix increases quadratically with the number of topics, and parameter estimation for the full-rank matrix can be inaccurate in high-dimensional space (He et al 2017). Therefore, He et al. (2017) proposed their own method to overcome this problem. The authors introduced a model which learns dense topic embeddings and captures topic correlations through the similarity between the topic vectors. According to He et al., their method allows efficient inference in the low-dimensional embedding space and significantly reduces the previous cubic or quadratic time complexity to linear with respect to the topic number.

5.2.2 Topic Evolution of Scientific Publications

Although the topic modeling task is essential, it has a significant offspring task, namely the modeling of topic evolution. Methods able to identify topics within the context of time are highly applicable in the scientific domain. They allow understanding of topic lineage and reveal how research on one topic influences another. Generally, these methods can be classified according to the primary source of information they employ: text, citations, or keyphrases.

text-based   Extensive textual collections have dynamic co-occurrences of word patterns that change over time. Thus, models that track topic evolution over time should consider both word co-occurrence patterns and the timestamps (Blei and Lafferty 2006).

In 2006, Wang and McCallum proposed a Non-Markov Continuous Time model for the detection of topical trends in the collection of NeurIPS publications (an overall timestamp of 17 years was considered). The authors made the following assumption: if the pattern of word co-occurrence exists for a short time, then the system creates a narrow-time-distribution topic, whereas for long-term patterns the system generates a broad-time-distribution topic. The principal property of this approach is that it models topic evolution without discretizing time, i.e., the state at time t + t1 is independent of the state at time t (Wang and McCallum 2006).

Another work, by Blei and Lafferty (2006), proposed the Dynamic Topic Models. The authors assumed that the collection of documents is organized based on certain time spans and that the documents belonging to every time span are represented with a K-component model. Thereby, the topics are associated with time span t and emerge from topics corresponding to time span t-1. One of the main differences of this model compared to other existing approaches is that it utilizes a Gaussian distribution for the topic parameters instead of the conventional Dirichlet distribution and can therefore capture the evolution over various time spans.


citation-based   analysis is the second popular direction that is considered to be effective for trend identification (He et al 2009; Le, Ho and Nakamori 2005; Shibata, Kajikawa and Takeda 2009; Shibata et al 2008; Small 2006). This method assumes that citations in their various forms (i.e., bibliographic citations, co-citation networks, citation graphs) indicate meaningful relationships between topics and uses citations to model topic evolution in the scientific domain. Nie and Sun (2017) utilize this approach along with Latent Dirichlet Allocation (LDA) and k-means clustering to identify research trends. The authors first use LDA to extract features and determine the optimal number of topics. Then they use k-means to obtain thematic clusters, and finally they compute citation functions to identify the changes of clusters over time.

However, the citation-based approach's main drawback is that a consistent collection of publications associated with a specific research topic is required for these techniques to detect the cluster and novelty of the particular topic (He and Chen 2018). Furthermore, there are also many trend detection scenarios (e.g., research news articles or research blog posts) in which citations are not readily available. That makes this approach not applicable to this kind of data.

keyphrase-based   Another standard solution is based on the use of keyphrase information extracted from research papers. In this case, every keyword is representative of a single research topic. Systems like Saffron (Monaghan et al 2010), as well as Saffron-based models, use this approach. Saffron is a research toolkit that provides algorithms for keyphrase extraction, entity linking, and taxonomy extraction. Asooja et al. (2016) utilize Saffron to extract the keywords from LREC proceedings and then propose a trend forecast based on regression models as an extension to Saffron.

However, this method raises many questions, including the following: how to deal with the noisiness of keywords (i.e., non-relevant keywords), and how to handle keywords that have their own hierarchies (i.e., sub-areas). Other drawbacks of this approach include the ambiguity of keywords (i.e., “Java” as an island and “Java” as a programming language) or the necessity to deal with synonyms, which could be treated as different topics.

5.3 summary

An emerging trend detection algorithm aims to recognize topics that were earlier inappreciable but are now gaining importance and interest within a specific domain. Knowledge of emerging trends is especially essential for stakeholders, researchers, academic publishers, and funding organizations. Assume a business analyst in a biotech company who is interested in the analysis of technical articles for recent trends that could potentially impact the company's investments. A manual review of all the available information in a specific domain would be extremely time-consuming or even not feasible. However, this process could be solved through Intelligent Process Automation (IPA) algorithms. Such software would assist human experts who are tasked with identifying emerging trends through automatic analysis of extensive textual collections and visualization of occurrences and tendencies of a given topic.


6 BENCHMARK FOR SCIENTIFIC TREND DETECTION

This chapter covers work already published at international peer-reviewed conferences. The relevant publication is Moiseeva and Schütze 2020. The research described in this chapter was carried out in its entirety by the author of the thesis. The other author of the publication acted as an advisor.

Computational analysis and modeling of the evolution of trends is a significant area of research because of its socio-economic impact. However, no large publicly available benchmark for trend detection currently exists, making a comparative evaluation of methods impossible. We remedy this situation by publishing the benchmark TRENDNERT, consisting of a set of gold trends (resp. downtrends) and document labels that is available as an unlimited download, and a large underlying document collection that can also be obtained for free. We propose Mean Average Precision (MAP) as an evaluation measure for trend detection and apply this measure in an investigation of several baselines.

6.1 motivation

What is an emerging research topic, and how is it defined in the literature? At the time of emergence, a research topic often does not attract much recognition from the scientific society and is exemplified by just a few publications. He and Chen (2018) associate the appearance of a trend with some triggering event, e.g., the publication of an article in a high-impact journal. Later, the research topic starts to grow faster and becomes a trend (He and Chen 2018).

We adopt this as our key representation of trend in this work: a trend is a research topic with a strongly increasing number of publications in a particular time interval. In contrast to trends, there are also downtrends, which refer to topics whose demand or growth decreases as they evolve. Therefore, we define a downtrend as the converse of a trend: a downtrend is a research topic that has a strongly decreasing number of publications in a particular time interval. Downtrends can be as important to detect as trends, e.g., for a funding agency planning its budget.

To detect both types of topics (i.e., trends and downtrends) in textual data, one employs computational analysis techniques and NLP. These methods enable the analysis of extensive textual collections and provide valuable insights into topical trendiness and evolution over time. Despite the overall importance of trend analysis, there is, to our knowledge, no large publicly available benchmark for trend detection. This makes a comparative evaluation of methods and algorithms impossible. Most of the previous work has used proprietary or moderately sized datasets, or evaluation measures that were not directly bound to the objectives of trend detection (Gollapalli and Li 2015).

We remedy this situation by publishing the benchmark TRENDNERT1. The benchmark consists of the following:

1. A set of gold trends compiled based on an extensive analysis of the meta-literature.

2. A set of labeled documents (based on more than one million scientific publications), where each document is assigned to a trend, a downtrend, or the class of flat topics, and supported by the topic title.

3. An evaluation script to compute Mean Average Precision (MAP) as the measure for the accuracy of trend detection.

We make the benchmark available as unlimited downloads2. The underlying research corpus can be obtained for free from Semantic Scholar3 (Ammar et al 2018).

6.1.1 Outline and Contributions

In this work, we address the challenge of benchmark creation for (down)trend detection. This chapter is structured as follows: Section 6.2 introduces our model for corpus creation and its fundamental components. In Section 6.3, we present the crowdsourcing procedure for the benchmark labeling. Section 6.5 outlines the proposed baseline for trend detection, our experimental setup, and the outcomes. Finally, we illustrate the details of the distributed benchmark in Section 6.6 and conclude in Section 6.7.

Our contributions within this work are as follows:

• TRENDNERT is the first publicly available benchmark for (down)trend detection.

• TRENDNERT is based on a collection of more than one million documents. It is among the largest that has been used for trend detection and therefore offers a realistic setting for developing trend detection algorithms.

• TRENDNERT addresses the task of detecting both trends and downtrends. To the best of our knowledge, the task of downtrend detection has not been addressed before.

1 The name is a concatenation of trend and its ananym dnert, because it supports the evaluation of both trends and downtrends.

2 https://doi.org/10.17605/OSF.IO/DHZCT
3 https://api.semanticscholar.org/corpus


6.2 corpus creation

In this section, we explain the methodology we used to create the TRENDNERT benchmark.

6.2.1 Underlying Data

We employ a corpus provided by Semantic Scholar4. It includes more than one million papers, published mostly between 2000 and 2015 in about 5000 computer science journals and conference proceedings. Every record in the collection consists of a title, key phrases, abstract, full text, and metadata.

6.2.2 Stratification

The distribution of documents over time in the entire original dataset is skewed (see Figure 11). We discovered that clustering the entire collection, as well as a random sample, produces corrupt results because more weight is given to later years than to earlier years. Therefore, we first generated a stratified sample of the original document collection. To this end, we randomly pick 10,000 documents for each year between 2000 and 2015, for an overall sample size of 160,000.
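Such a stratified sample can be drawn in a few lines of pandas; the file name and column name below are assumptions, not the actual schema of the Semantic Scholar dump.

    import pandas as pd

    # papers: DataFrame with (at least) a hypothetical "year" column
    papers = pd.read_json("semantic_scholar_subset.jsonl", lines=True)   # assumed file name

    stratified = (
        papers[papers["year"].between(2000, 2015)]
        .groupby("year", group_keys=False)
        .apply(lambda g: g.sample(n=10_000, random_state=0))   # 10k documents per year
    )
    print(len(stratified))   # 16 years x 10,000 documents = 160,000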

Figure 11: Overall distribution of papers in the entire dataset. Years 1975 to 2016.

4 https://www.semanticscholar.org


6.2.3 Document Representations & Clustering

As we mentioned, this work's primary focus was the creation of a benchmark, not the development of new algorithms or models for trend detection. Therefore, for our experiments, we selected an algorithm based on k-means clustering as a simple baseline method for trend detection.

In conventional document clustering, documents are usually represented as bag-of-words (BOW) feature vectors (Manning, Raghavan and Schütze 2008). Those, however, have a major weakness: they neglect the semantics of words (Le and Mikolov 2014). Recent work in the representation learning domain, and particularly doc2vec – the method proposed by Le and Mikolov (2014) – can provide representations for document clustering that overcome this weakness.

The doc2vec5 algorithm is inspired by techniques for learning word vectors (Mikolov et al 2013) and is capable of capturing semantic regularities in textual collections. This is an unsupervised approach for learning continuous distributed vector representations for text (i.e., paragraphs, documents). This approach maps documents into a vector space such that semantically related documents are assigned similar vector representations (e.g., an article about “genomics” is closer to an article about “gene expression” than to an article about “fuzzy sets”). Formally, each paragraph is mapped to an individual vector, represented by a column in matrix D, and every word is also mapped to an individual vector, represented by a column in matrix W. The paragraph vector and word vectors are concatenated to predict the next word in a context (Le and Mikolov 2014). Other work has already successfully employed this type of document representation for topic modeling, combining it with both Latent Dirichlet Allocation (LDA) and clustering approaches (Curiskis et al 2019; Dieng, Ruiz and Blei 2019; Moody 2016; Xie and Xing 2013).

In our work, we run doc2vec on the stratified sample of 160,000 papers and represent each document as a length-normalized vector (i.e., a document embedding). These vectors are then clustered into k = 1000 clusters using the scikit-learn6 implementation of k-means (MacQueen 1967) with default parameters. The combination of document representations with a clustering algorithm is conceptually simple and interpretable. Also, the comprehensive comparative evaluation of topic modeling methods utilizing document embeddings performed by Curiskis et al. (2019) showed that doc2vec feature representations with k-means clustering outperform several other methods7 on three evaluation measures (Normalized Mutual Information, Adjusted Mutual Information, and Adjusted Rand Index).
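A compressed sketch of this pipeline with Gensim's Doc2Vec and scikit-learn's KMeans is shown below; the toy abstracts and the hyperparameters (vector size, epochs, number of clusters) are illustrative assumptions, not the settings used to build the benchmark.

    import numpy as np
    from gensim.models.doc2vec import Doc2Vec, TaggedDocument
    from sklearn.cluster import KMeans

    abstracts = [
        "deep learning methods for sentiment analysis in social media",
        "query optimization for xml databases",
        "convolutional neural networks for image classification",
    ]  # toy stand-in for the 160,000 sampled abstracts

    corpus = [TaggedDocument(words=text.split(), tags=[i]) for i, text in enumerate(abstracts)]
    model = Doc2Vec(corpus, vector_size=100, epochs=20, min_count=1)   # assumed hyperparameters

    # Length-normalized document embeddings
    vectors = np.array([model.dv[i] for i in range(len(abstracts))])
    vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)

    clusters = KMeans(n_clusters=2).fit_predict(vectors)   # k = 1000 in the actual setup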

5 We use the implementation provided by Gensim: https://radimrehurek.com/gensim/models/doc2vec.html
6 https://scikit-learn.org/stable
7 hierarchical clustering, k-medoids, NMF, and LDA


We run 10 trials of stratification and clustering, resulting in 10 different clusterings. We do this to protect against the variability of clustering and because we do not want to rely on a single clustering for proposing (down)trends for the benchmark.

6.2.4 Trend & Downtrend Estimation

Recall our definitions of trend and downtrend: a trend (resp. downtrend) is defined as a research topic that has a strongly increasing (resp. decreasing) number of publications in a particular time interval.

Linear trend estimation is a widely used technique in predictive (time-series) analysis that supports statements about tendencies in the data by correlating the measurements to the times at which they occurred (Hess, Iyer and Malm 2001). Considering the definition of trend (resp. downtrend) and the main concept of linear trend estimation models, we utilize linear regression to identify trend and downtrend candidates in the resulting clustering sets.

Specifically, we count the number of documents per year for a cluster and then estimate the parameters of the best fitting line (i.e., the slope). Clusters are then ranked according to the resulting slope, and the n = 150 clusters with the largest (resp. smallest) slope are selected as trend (resp. downtrend) candidates. Thus, our definition of a single-clustering (down)trend candidate is a cluster with an extreme ascending or descending slope. Figure 12 and Figure 13 give two examples of (down)trend candidates.
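The per-cluster trendiness score described here reduces to an ordinary least-squares line fit of yearly document counts; the sketch below uses NumPy's polyfit and hypothetical count data.

    import numpy as np

    years = np.arange(2000, 2016)

    def cluster_slope(doc_counts_per_year):
        """Fit a line to (year, count) pairs and return its slope."""
        return np.polyfit(years, doc_counts_per_year, deg=1)[0]

    # counts_by_cluster: hypothetical mapping cluster_id -> array of 16 yearly document counts
    counts_by_cluster = {0: np.linspace(5, 120, 16), 1: np.linspace(90, 10, 16)}
    slopes = {c: cluster_slope(n) for c, n in counts_by_cluster.items()}

    ranked = sorted(slopes, key=slopes.get, reverse=True)
    trend_candidates = ranked[:150]        # clusters with the largest slopes
    downtrend_candidates = ranked[-150:]   # clusters with the smallest slopes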

6.2.5 (Down)trend Candidates Validation

As detailed in Section 6.2.3, we ran ten series of clustering, resulting in ten discrete clustering sets. We then estimated the (down)trend candidates according to the procedure described in Section 6.2.4. Consequently, for each of the ten clustering sets, we obtained 150 trend and 150 downtrend candidates.

Nonetheless, for the benchmark, among all the (down)trend candidates across the ten clustering sets, we want to sort out only those that are consistent over all sets. To obtain consistent (down)trend candidates, we apply the Jaccard coefficient, a metric that is used for measuring the similarity and diversity of sample sets. We define an equivalence relation ~R: two candidates (i.e., clusters) are equivalent if their Jaccard coefficient is > τsim, where τsim = 0.5. Following this notion, we compute the equivalence of (down)trend candidates across the ten sets.
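The Jaccard-based equivalence test between two candidate clusters (represented as sets of document IDs) takes only a few lines; τsim = 0.5 as stated above, and the example sets are invented.

    def jaccard(a, b):
        """Jaccard coefficient of two sets of document IDs."""
        return len(a & b) / len(a | b)

    def equivalent(candidate_a, candidate_b, tau_sim=0.5):
        return jaccard(candidate_a, candidate_b) > tau_sim

    print(equivalent({1, 2, 3, 4}, {2, 3, 4, 5}))   # Jaccard = 3/5 = 0.6 > 0.5 -> True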

Finally, we define an equivalence class as an equivalence class trend candidate (resp. equivalence class downtrend candidate) if it contains trend (resp. downtrend) candidates in at least half of the clusterings.


Figure 12: A trend candidate (positive slope). Topic: Sentiment and Emotion Analysis (using Social Media Channels).

Figure 13: A downtrend candidate (negative slope). Topic: XML.


This procedure gave us in total 110 equivalence class trend candidates and 107 equivalence class downtrend candidates that we then annotate for our benchmark8.

6.3 benchmark annotation

To annotate the equivalence class (down)trend candidates obtained from our model, we used the Figure-Eight9 platform. It provides an integrated quality-control mechanism and a training phase before the actual annotation (Kolhatkar, Zinsmeister and Hirst 2013), which minimizes the rate of poorly qualified annotators.

We design our annotation task as follows: a worker is shown a description of a particular (down)trend candidate along with a list of possible (down)trend tags. Descriptions are mixed and presented in random order. The temporal presentation order of candidates is also arbitrary. The worker is asked to pick from the list of (down)trend tags the item that matches the cluster (resp. candidate) best. Even if there are several items that are a possible match, the worker must choose only one. If there is no matching item, the worker can select the option other. Three different workers evaluated each cluster, and the final label is the majority label. If there is no majority label, we present the cluster repeatedly to the workers until there is a majority.

The descriptors of candidates are collected through the following procedure:

• First, we review the documents assigned to a candidate cluster.

• Then, we extract keywords and titles of the papers attributed to the cluster. Semantic Scholar provides keywords and titles as part of the metadata.

• Finally, we compute the top 15 most frequent keywords and randomly select 15 titles. These keywords and titles are the descriptor of the cluster candidate presented to crowd-workers.

We create the (down)trend tags based on keywords, titles, and metadata such as the publication venue. The cluster candidates were manually labeled by graduate employees (i.e., domain experts in the computer science domain) considering the abovementioned items.

6.3.1 Inter-annotator Agreement

To ensure the quality of the obtained annotations, we computed the Inter-Annotator Agreement (IAA) – a measure of how well two (or more) annotators can make the same decision for a particular category.

8 Note that the overall number of 217 refers to the number of (down)trend candidates and not to the number of documents. Each candidate contains hundreds of documents.

9 Formerly called CrowdFlower.


To do that, we used the following metrics:

krippendorff's α is a reliability coefficient developed to measure the agreement among observers or measuring instruments that draw distinctions among typically unstructured phenomena or assign computable values to them (Krippendorff 2011). We choose Krippendorff's α as the inter-annotator agreement measure because it – unlike other specialized coefficients – is a generalization of several reliability criteria and applies to situations like ours, where we have more than two annotators and a given annotator only labels a subset of the data (i.e., we have to deal with missing values).

Krippendorff (2011) has proposed several metric variations, each of which is associated with a specific set of weights. We use Krippendorff's α considering any number of observers and missing data. In this case, Krippendorff's α for the overall number of 217 annotated documents is α = 0.798. According to Landis and Koch (1977), this value corresponds to a substantial level of agreement.
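For reference, Krippendorff's α with several annotators and missing values can be computed, for instance, with NLTK's agreement module; the judgment triples below are fabricated toy data, not the actual annotations.

    from nltk.metrics.agreement import AnnotationTask

    # (coder, item, label) triples; worker "w3" did not label item "c2" (missing value)
    judgments = [
        ("w1", "c1", "trend"), ("w2", "c1", "trend"), ("w3", "c1", "trend"),
        ("w1", "c2", "downtrend"), ("w2", "c2", "flat"),
    ]
    task = AnnotationTask(data=judgments)
    print(task.alpha())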

figure-eight agreement rate   Figure-Eight10 provides an agreement score c for each annotated unit u, which is based on the majority vote of the trusted workers. The score has been proven to perform well compared to the classic metrics (Kolhatkar, Zinsmeister and Hirst 2013). We computed the average C for the 217 annotated documents; C is 0.799.

Since both Krippendorff's α and the internal Figure-Eight IAA are rather high, we consider the obtained annotations for the benchmark to be of good quality.

6.3.2 Gold Trends

We compiled a list of computer science trends for the year 2016 based on the analysis of twelve survey publications (Ankerholz 2016; Augustin 2016; Brooks 2016; Frot 2016; Harriet 2015; IEEE Computer Society 2016; Markov 2015; Meyerson and Mariette 2016; Nessma 2015; Reese 2015; Rivera 2016; Zaino 2016). For this year, we identified a total of 31 trends; see Table 20.

Since there is variation as to the name of a specific trend, we created a unique name for each of them. Out of the 31 trends, 28 (90.3%) are instantiated as trends by our model, which we confirmed in the crowdsourcing procedure11. We searched for the three missing trends using a variety of techniques (i.e., inspection of non-trends and downtrends, random browsing of documents not assigned to trend candidates, and keyword searches).

10 Former CrowdFlower platform.
11 The author verified this based on her judgment of the equivalence of one of the 31 gold trend names in the literature with one of the trend tags.


Two of the missing trends do not seem to occur in the collection at all: “Human Augmentation” and “Food and Water Technology”. The third missing trend, “Cryptocurrency and Cryptography”, does occur in the collection, but the occurrence rate is very small (ca. 0.01). We do not consider these three trends in the rest of this chapter.

Computer Science Trend                         Computer Science Trend
Autonomous Agents and Systems                  Natural Language Processing
Autonomous Vehicles                            Open Source
Big Data Analytics                             Privacy
Bioinformatics and (Bio) Neuroscience          Quantum Computing
Biometrics and Personal Identification         Recommender Systems and Social Networks
Cloud Computing and Software as a Service      Reinforcement Learning
Cryptocurrency and Cryptography*               Renewable Energy
Cyber Security                                 Robotics
E-business                                     Semantic Web
Food and Water Technology*                     Sentiment and Emotion Analysis
Game-based Learning                            Smart Cities
Games and (Virtual) Augmented Reality          Supply Chains and RFIDs
Human Augmentation*                            Technology for Climate Change
Machine/Deep Learning                          Transportation and Energy
Medical Advances and DNA Computing             Wearables
Mobile Computing

Table 20: 31 areas identified as computer science trends for the year 2016 in the media. 28 of these trends are instantiated by our model as well and thus covered in our benchmark. Topics marked with an asterisk (*) were not found in our underlying data collection.

In summary, based on our analysis of the meta-literature, we identified 28 gold trends that also occur in our collection. We will use them as the gold standard that trend detection algorithms should aim to discover.

6.4 evaluation measure for trend detection

Whether something is a trend is a graded concept. For this reason, we adopt an evaluation measure based on a ranking of candidates, specifically Mean Average Precision (MAP). The score denotes the average of the precision values at the ranks of correctly identified and non-redundant trends.

We approach trend detection as computing a ranked list of sets of documents, where each set is a trend candidate. We refer to documents as trendy, downtrendy, and flat, depending on which class they are assigned to in our benchmark.


We consider a trend candidate c as a correct recognition of gold trend t if it satisfies the following requirements:

• c is trendy.

• t is the largest trend in c.

• |t ∩ c| / |c| > ρ, where ρ is a coverage parameter.

• t was not recognized earlier in the list.

These criteria give us a true positive (i.e., recognition of a gold trend) or false positive (otherwise) for every position of the ranked list and a precision score for each trend. The precision for a trend that was not observed is 0. We then compute MAP; see Table 21 for an example.

Gold Trends            Gold Downtrends
Cloud Computing        Compilers
Bioinformatics         Petri Nets
Sentiment Analysis     Fuzzy Sets
Privacy                Routing Protocols

Trend Candidate            |t ∩ c| / |c|   tp/fp     P
c1   Cloud Computing           0.60          tp     1.00
c2   Privacy                   0.20          fp     0.50
c3   Cloud Computing           0.70          fp     0.33
c4   Bioinformatics            0.55          tp     0.50
c5   Sentiment Analysis        0.95          tp     0.60
c6   Sentiment Analysis        0.59          fp     0.50
c7   Bioinformatics            0.41          fp     0.43
c8   Compilers                 0.60          fp     0.38
c9   Petri Nets                0.80          fp     0.33
c10  Privacy                   0.33          fp     0.30

Table 21: Example of the proposed MAP evaluation (with ρ = 0.5). MAP is the average of the precision values 1.0 (Cloud Computing), 0.5 (Bioinformatics), 0.6 (Sentiment Analysis), and 0.0 (Privacy), i.e., MAP = 2.1/4 = 0.525. True positive: tp. False positive: fp. Precision: P.
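A direct implementation of this scoring scheme, following the four criteria listed above, might look as follows; the data structures are assumptions for illustration, and the toy input is a truncated version of the Table 21 example, which still yields a MAP of 0.525.

    def mean_average_precision(ranked_candidates, gold_trends, rho=0.5):
        """ranked_candidates: list of (largest_gold_trend, coverage, is_trendy) tuples,
        highest-ranked first; coverage is |t ∩ c| / |c| for the largest gold trend t in c."""
        precisions = {t: 0.0 for t in gold_trends}
        recognized, true_positives = set(), 0
        for rank, (trend, coverage, is_trendy) in enumerate(ranked_candidates, start=1):
            correct = (is_trendy and trend in gold_trends
                       and coverage > rho and trend not in recognized)
            if correct:
                true_positives += 1
                recognized.add(trend)
                precisions[trend] = true_positives / rank
        return sum(precisions.values()) / len(gold_trends)

    gold = ["Cloud Computing", "Bioinformatics", "Sentiment Analysis", "Privacy"]
    candidates = [("Cloud Computing", 0.60, True), ("Privacy", 0.20, True),
                  ("Cloud Computing", 0.70, True), ("Bioinformatics", 0.55, True),
                  ("Sentiment Analysis", 0.95, True)]
    print(mean_average_precision(candidates, gold))   # 0.525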

6.5 trend detection baselines

In this section, we investigate the impact of four different configuration choices on the performance of trend detection by way of ablation.


6.5.1 Configurations

abstracts vs. full texts   The underlying data collection contains both abstracts and full texts12. Our initial expectation was that the full text of a document comprises more information than the abstract and should therefore be a more reliable basis for trend detection. This is one of the configurations we test in the ablation. We employ our proposed method on the full-text collection (solely document texts without abstracts) and the abstract collection (only abstract texts) separately and observe the results.

stratification   Due to the uneven distribution of documents over the years, the last years (with an increased volume of publications) may get too much impact on the results compared to earlier years. To investigate this, we conduct an ablation experiment where we compare (i) 160k documents randomly sampled from the entire collection and (ii) stratified sampling. In stratified sampling, we select 10k documents for each of the years 2000–2015, resulting in a sampled collection of 160k.

length Lt of interval   Clusters are ranked by trendiness, and the resulting ranked list is evaluated. To measure the trendiness or growth of topics over time, we fit a line to an interval (i, ni | i0 ≤ i < i0 + Lt) by linear regression, where Lt is the length of the interval, i is one of the 16 years (2000, ..., 2015), and ni is the number of documents that were assigned to cluster c in that year. As a simple default baseline, we apply the regression to one half of the entire interval, i.e., Lt = 8. There are nine such intervals in our 16-year period. As the final measure of growth for the cluster, we take the maximum of the nine individual slopes. To determine how much this configuration choice affects our results, we also test a linear regression over the entire time span, i.e., Lt = 16. In this case, there is a single interval.
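With Lt = 8, the growth measure for a cluster is the maximum slope over the nine sliding 8-year windows; a small sketch with hypothetical yearly counts follows.

    import numpy as np

    def max_window_slope(counts, years, L_t=8):
        """Maximum regression slope over all length-L_t windows of yearly document counts."""
        slopes = []
        for start in range(len(years) - L_t + 1):
            window_years = years[start:start + L_t]
            window_counts = counts[start:start + L_t]
            slopes.append(np.polyfit(window_years, window_counts, deg=1)[0])
        return max(slopes)

    years = np.arange(2000, 2016)   # 16 years -> nine 8-year windows
    counts = np.array([3, 4, 4, 5, 7, 9, 14, 20, 28, 35, 50, 61, 70, 82, 95, 110])
    print(max_window_slope(counts, years))   # growth measure for one hypothetical cluster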

clustering method   We consider Gaussian Mixture Models (GMM) and k-means. We use GMM with the spherical type of covariance, where each component has the same variance. For both, we proceed as described in Section 6.2.3.

6.5.2 Experimental Setup and Results

We conducted experiments on the Semantic Scholar corpus and evaluated a ranked list of trend candidates against the benchmark, considering the configurations mentioned above. We adopted Mean Average Precision (MAP) as the principal evaluation measure. As a secondary evaluation measure, we computed recall at 50 (R50), which estimates the percentage of gold trends found in the list of the 50 highest-ranked trend candidates.

12 Semantic Scholar provided the underlying dataset at the beginning of our work in 2016.


We found that configuration (0) in Table 22 works best. We then conducted ablation experiments to discover the importance of the configuration choices.

                    (0)        (1)        (2)        (3)        (4)
document part        A          F          A          A          A
stratification      yes        yes         no        yes        yes
Lt                    8          8          8         16          8
clustering         GMMsph     GMMsph     GMMsph     GMMsph       kμ
MAP               36 (03)    07 (19)    30 (03)    32 (04)    34 (21)
R50 avg           50 (012)   25 (018)   50 (016)   46 (018)   53 (014)
R50 max              61         37         53         61         61

Table 22: Ablation results. Standard deviations are in parentheses. GMM clustering is performed with the spherical (sph) type of covariance. K-means clustering is denoted as kμ in the ablation table.

comparing (0) and (1):13 We observed that abstracts (A) are a more reliable representation for our trend detection baseline than full texts (F). A possible explanation for this observation is that an abstract is a summary of a scientific paper that covers only the central points and is semantically very concise and rich. In contrast, the full text contains numerous parts that are secondary (i.e., future work, related work) and thus may skew the document's overall meaning.

comparing (0) and (2): We recognized that stratification (yes) improves results compared to the setting with non-stratified (no), randomly sampled data. We would expect the effect of stratification to be even more substantial for collections in which the distribution of documents over time is more skewed than in ours.

comparing (0) and (3): The length of the interval is a vital configuration choice. As we assumed, the 16-year interval is rather long, and an 8-year interval could be a better choice, primarily if one aims to find short-term trends.

comparing (0) and (4): We recognized that the topics we obtain from k-means have a similar nature to those from GMM. However, GMMs, which take variance and covariance into account and estimate a soft assignment, still perform slightly better.

6.6 benchmark description

Below we report the contents of the TRENDNERT benchmark.14

1. Two records containing information on documents: one file with the documents assigned to (down)trends and the other file with documents assigned to flat topics.

13 Numbers (0), (1), (2), (3), (4) are the configuration choices in the ablation table.
14 https://doi.org/10.17605/OSF.IO/DHZCT


2. A script to produce document hash codes.

3. A file that maps every hash code to the internal IDs in the benchmark.

4. A script to compute MAP (default setting is ρ = 0.25).

5. README.md, a file with guidance notes.

Every document in the benchmark contains the following information:

• Paper ID: the internal ID assigned to each document in the original Semantic Scholar collection.

• Cluster ID: the unique ID of the meta-cluster in which the document was observed.

• Label/Tag: the name of the gold (down)trend assigned by crowdworkers (e.g., Cyber Security, Fuzzy Sets and Systems).

• Type of (down)trend candidate: trend (T), downtrend (D) or flat topic (F).

• Hash ID15 of each paper from the original Semantic Scholar collection.
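The hash IDs (cf. footnote 15) can be reproduced along the following lines; the exact string normalization and separator used by the released script are not specified here, so the plain concatenation below is an assumption.

    import hashlib

    def paper_hash(first_author: str, title: str, year: int) -> str:
        # MD5 over "First Author + Title + Year"; separator and casing are
        # illustrative assumptions, not the released script's behaviour.
        key = f"{first_author}{title}{year}"
        return hashlib.md5(key.encode("utf-8")).hexdigest()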

6.7 summary

Emerging trend detection is a promising task for Intelligent Process Automation (IPA) that allows companies and funding agencies to automatically analyze extensive textual collections in order to detect emerging fields and forecast the future by employing machine learning algorithms. Thus, the availability of public benchmarks for trend detection is of significant importance, as only thereby can one evaluate the performance and robustness of the techniques employed.

Within this work, we release TRENDNERT – the first publicly available benchmark for (down)trend detection, which offers a realistic setting for developing trend detection algorithms. TRENDNERT also supports downtrend detection – a significant problem that was not addressed before. We also present several experimental findings on trend detection. First, stratification improves trend detection if the distribution of documents is skewed over time. Second, abstract-based trend detection performs better than full-text-based detection if straightforward models are used.

6.8 future work

There are several possible directions for future work:

• The presented approach to estimating the trendiness of a topic by means of linear regression is straightforward and can be replaced by more sophisticated alternatives such as Kendall's τ correlation coefficient or the locally weighted regression smoother (LOESS) (Cleveland, 1979); see the sketch after this list.

15 MD5-hash of First Author + Title + Year.

• It would also be a worthwhile challenge to replace the k-means clustering algorithm that we adopted within this work with the LDA model while retaining the document representations, as it was done in Dieng, Ruiz, and Blei (2019). The question to investigate would then be whether topic modeling outperforms the simple clustering algorithm in this particular setting, and whether the resulting system would benefit from the topic modeling algorithm.

• Furthermore, it would be of interest to conduct more research on the following question: What kind of document representation can make effective use of the information in full texts? Recently proposed contextualised word embeddings could be a possible direction for these experiments.
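As an illustration of the first direction above, Kendall's τ over the yearly counts can be computed directly with SciPy; this is a sketch of the alternative trendiness measure, not part of the released benchmark code.

    import numpy as np
    from scipy.stats import kendalltau

    def kendall_trendiness(counts, start_year=2000):
        # Rank correlation between time and yearly document counts: values
        # near +1 indicate a monotone upward trend, near -1 a downtrend.
        years = np.arange(start_year, start_year + len(counts))
        tau, _p_value = kendalltau(years, counts)
        return tau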


7 D I S C U S S I O N A N D F U T U R E W O R K

As we observed in the two parts of this thesis, Intelligent Process Automation (IPA) is a challenging research area due to its multifacetedness and its application in real-world scenarios. Its main objective is to automate time-consuming and routine tasks and to free up knowledge workers for more cognitively demanding work. However, this task poses many difficulties, such as a lack of resources for both training and evaluation of algorithms, as well as insufficient performance of machine learning techniques when applied to real-world data and practical problems. In this thesis, we investigated two IPA applications within the Natural Language Processing (NLP) domain – conversational agents and predictive analytics – and addressed some of the most common issues.

• In the first part of the thesis, we addressed the challenge of implementing a robust conversational system for IPA purposes within the practical e-learning domain, under the condition of missing training data. The primary purpose of the resulting system is to perform the onboarding process between tutors and students. This onboarding reduces repetitive and time-consuming activities while allowing tutors to focus solely on mathematical questions, which is the core task with the most impact on students. Besides that, by interacting with users, the system augments the resources with structured and labeled training data that we used to implement learnable dialogue components.

The realization of such a system was associated with several challenges. Among others were missing structured data and ambiguous user-generated content that requires precise language understanding. Also, a system that simultaneously communicates with a large number of users has to maintain additional meta policies, such as fallback and check-up policies, to hold the conversation intelligently.

• Moreover, we have shown the rule-based dialogue system's potential extension using transfer learning. We utilized a small dataset of structured dialogues obtained in a trial run of the rule-based system and applied the current state-of-the-art Bidirectional Encoder Representations from Transformers (BERT) architecture to solve the domain-specific tasks. Specifically, we retrained two of the three components in the conversational system – Named Entity Recognition (NER) and the Dialogue Manager (DM) (i.e., the Next Action Prediction (NAP) task) – and confirmed the applicability of such algorithms in a real-world, low-resource scenario by showing the reasonably high performance of the resulting models.

• Last but not least, in the second part of the thesis, we addressed the issue of missing publicly available benchmarks for the evaluation of machine learning algorithms for the trend detection task. To the best of our knowledge, we released the first open benchmark for this purpose, which offers a realistic setting for developing trend detection algorithms. Besides that, this benchmark supports downtrend detection – a significant problem that was not sufficiently addressed before. We additionally proposed an evaluation measure and presented several experimental findings on trend detection.

The ultimate objective of Intelligent Process Automation (IPA) should be the creation of robust algorithms under low-resource and domain-specific conditions. This is still a challenging task; however, some of the experiments presented in this thesis revealed that this goal could potentially be achieved by means of Transfer Learning (TL) techniques.


B I B L I O G R A P H Y

Agrawal, Amritanshu, Wei Fu, and Tim Menzies (2018). "What is wrong with topic modeling? And how to fix it using search-based software engineering". In: Information and Software Technology 98, pp. 74–88 (cit. on p. 66).
Turing, Alan M. (1950). "Computing machinery and intelligence". In: Mind 59.236, pp. 433–460 (cit. on p. 7).
Alghamdi, Rubayyi, and Khalid Alfalqi (2015). "A survey of topic modeling in text mining". In: Int. J. Adv. Comput. Sci. Appl. (IJACSA) 6.1 (cit. on pp. 64–66).
Ammar, Waleed, Dirk Groeneveld, Chandra Bhagavatula, Iz Beltagy, Miles Crawford, Doug Downey, Jason Dunkelberger, Ahmed Elgohary, Sergey Feldman, Vu Ha, et al. (2018). "Construction of the literature graph in semantic scholar". In: arXiv preprint arXiv:1805.02262 (cit. on p. 72).
Ankerholz, Amber (2016). 2016 Future of Open Source Survey Says Open Source Is The Modern Architecture. https://www.linux.com/news/2016-future-open-source-survey-says-open-source-modern-architecture (cit. on p. 78).
Asooja, Kartik, Georgeta Bordea, Gabriela Vulcu, and Paul Buitelaar (2016). "Forecasting emerging trends from scientific literature". In: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), pp. 417–420 (cit. on p. 68).
Augustin, Jason (2016). Emerging Science and Technology Trends: A Synthesis of Leading Forecasts. http://www.defenseinnovationmarketplace.mil/resources/2016_SciTechReport_16June2016.pdf (cit. on p. 78).
Blei, David M., and John D. Lafferty (2006). "Dynamic topic models". In: Proceedings of the 23rd International Conference on Machine Learning. ACM, pp. 113–120 (cit. on pp. 66, 67).
Blei, David M., John D. Lafferty, et al. (2007). "A correlated topic model of science". In: The Annals of Applied Statistics 1.1, pp. 17–35 (cit. on p. 66).
Blei, David M., Andrew Y. Ng, and Michael I. Jordan (2003). "Latent dirichlet allocation". In: Journal of Machine Learning Research 3.Jan, pp. 993–1022 (cit. on p. 66).
Bloem, Peter (2019). Transformers from Scratch. http://www.peterbloem.nl/blog/transformers (cit. on p. 45).
Bocklisch, Tom, Joey Faulkner, Nick Pawlowski, and Alan Nichol (2017). "Rasa: Open source language understanding and dialogue management". In: arXiv preprint arXiv:1712.05181 (cit. on p. 15).
Bolelli, Levent, Seyda Ertekin, and C. Giles (2009). "Topic and trend detection in text collections using latent dirichlet allocation". In: Advances in Information Retrieval, pp. 776–780 (cit. on p. 66).


Bolelli, Levent, Seyda Ertekin, Ding Zhou, and C. Lee Giles (2009). "Finding topic trends in digital libraries". In: Proceedings of the 9th ACM/IEEE-CS Joint Conference on Digital Libraries. ACM, pp. 69–72 (cit. on p. 66).
Bordes, Antoine, Y-Lan Boureau, and Jason Weston (2016). "Learning end-to-end goal-oriented dialog". In: arXiv preprint arXiv:1605.07683 (cit. on pp. 16, 17).
Brooks, Chuck (2016). 7 Top Tech Trends Impacting Innovators in 2016. http://innovationexcellence.com/blog/2015/12/26/7-top-tech-trends-impacting-innovators-in-2016 (cit. on p. 78).
Burtsev, Mikhail, Alexander Seliverstov, Rafael Airapetyan, Mikhail Arkhipov, Dilyara Baymurzina, Eugene Botvinovsky, Nickolay Bushkov, Olga Gureenkova, Andrey Kamenev, Vasily Konovalov, et al. (2018). "DeepPavlov: An Open Source Library for Conversational AI". In: (cit. on p. 15).
Chang, Jonathan, David M. Blei, et al. (2010). "Hierarchical relational models for document networks". In: The Annals of Applied Statistics 4.1, pp. 124–150 (cit. on p. 66).
Chen, Hongshen, Xiaorui Liu, Dawei Yin, and Jiliang Tang (2017). "A survey on dialogue systems: Recent advances and new frontiers". In: ACM SIGKDD Explorations Newsletter 19.2, pp. 25–35 (cit. on pp. 14–16).
Chen, Lingzhen, Alessandro Moschitti, Giuseppe Castellucci, Andrea Favalli, and Raniero Romagnoli (2018). "Transfer Learning for Industrial Applications of Named Entity Recognition". In: NL4AI@AI*IA, pp. 129–140 (cit. on p. 43).
Cleveland, William S. (1979). "Robust locally weighted regression and smoothing scatterplots". In: Journal of the American Statistical Association 74.368, pp. 829–836 (cit. on p. 84).
Collobert, Ronan, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa (2011). "Natural language processing (almost) from scratch". In: Journal of Machine Learning Research 12.Aug, pp. 2493–2537 (cit. on p. 16).
Cuayáhuitl, Heriberto, Simon Keizer, and Oliver Lemon (2015). "Strategic dialogue management via deep reinforcement learning". In: arXiv preprint arXiv:1511.08099 (cit. on p. 14).
Cui, Lei, Shaohan Huang, Furu Wei, Chuanqi Tan, Chaoqun Duan, and Ming Zhou (2017). "SuperAgent: A customer service chatbot for e-commerce websites". In: Proceedings of ACL 2017, System Demonstrations, pp. 97–102 (cit. on p. 8).
Curiskis, Stephan A., Barry Drake, Thomas R. Osborn, and Paul J. Kennedy (2019). "An evaluation of document clustering and topic modelling in two online social networks: Twitter and Reddit". In: Information Processing & Management (cit. on p. 74).
Dai, Zihang, Zhilin Yang, Yiming Yang, William W. Cohen, Jaime Carbonell, Quoc V. Le, and Ruslan Salakhutdinov (2019). "Transformer-XL: Attentive language models beyond a fixed-length context". In: arXiv preprint arXiv:1901.02860 (cit. on pp. 43, 44).


Daniel, Ben (2015). "Big Data and analytics in higher education: Opportunities and challenges". In: British Journal of Educational Technology 46.5, pp. 904–920 (cit. on p. 22).
Deerwester, Scott, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, and Richard Harshman (1990). "Indexing by latent semantic analysis". In: Journal of the American Society for Information Science 41.6, pp. 391–407 (cit. on p. 65).
Deoras, Anoop, Kaisheng Yao, Xiaodong He, Li Deng, Geoffrey Gerson Zweig, Ruhi Sarikaya, Dong Yu, Mei-Yuh Hwang, and Gregoire Mesnil (2015). Assignment of semantic labels to a sequence of words using neural network architectures. US Patent App. 14/016,186 (cit. on p. 13).
Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova (2018). "BERT: Pre-training of deep bidirectional transformers for language understanding". In: arXiv preprint arXiv:1810.04805 (cit. on pp. 42–45).
Dhingra, Bhuwan, Lihong Li, Xiujun Li, Jianfeng Gao, Yun-Nung Chen, Faisal Ahmed, and Li Deng (2016). "Towards end-to-end reinforcement learning of dialogue agents for information access". In: arXiv preprint arXiv:1609.00777 (cit. on p. 16).
Dieng, Adji B., Francisco J. R. Ruiz, and David M. Blei (2019). "Topic Modeling in Embedding Spaces". In: arXiv preprint arXiv:1907.04907 (cit. on pp. 74, 84).
Donahue, Jeff, Yangqing Jia, Oriol Vinyals, Judy Hoffman, Ning Zhang, Eric Tzeng, and Trevor Darrell (2014). "DeCAF: A deep convolutional activation feature for generic visual recognition". In: International Conference on Machine Learning, pp. 647–655 (cit. on p. 43).
Eric, Mihail, and Christopher D. Manning (2017). "Key-value retrieval networks for task-oriented dialogue". In: arXiv preprint arXiv:1705.05414 (cit. on p. 16).
Fang, Hao, Hao Cheng, Maarten Sap, Elizabeth Clark, Ari Holtzman, Yejin Choi, Noah A. Smith, and Mari Ostendorf (2018). "Sounding Board: A user-centric and content-driven social chatbot". In: arXiv preprint arXiv:1804.10202 (cit. on p. 10).
Friedrich, Fabian, Jan Mendling, and Frank Puhlmann (2011). "Process model generation from natural language text". In: International Conference on Advanced Information Systems Engineering, pp. 482–496 (cit. on p. 2).
Frot, Mathilde (2016). 5 Trends in Computer Science Research. https://www.topuniversities.com/courses/computer-science-information-systems/5-trends-computer-science-research (cit. on p. 78).
Glasmachers, Tobias (2017). "Limits of end-to-end learning". In: arXiv preprint arXiv:1704.08305 (cit. on pp. 16, 17).
Goddeau, David, Helen Meng, Joseph Polifroni, Stephanie Seneff, and Senis Busayapongchai (1996). "A form-based dialogue manager for spoken language applications". In: Proceeding of Fourth International Conference on Spoken Language Processing, ICSLP '96. Vol. 2. IEEE, pp. 701–704 (cit. on p. 14).


Gohr, Andre, Alexander Hinneburg, Rene Schult, and Myra Spiliopoulou (2009). "Topic evolution in a stream of documents". In: Proceedings of the 2009 SIAM International Conference on Data Mining. SIAM, pp. 859–870 (cit. on p. 65).
Gollapalli, Sujatha Das, and Xiaoli Li (2015). "EMNLP versus ACL: Analyzing NLP research over time". In: EMNLP, pp. 2002–2006 (cit. on p. 72).
Griffiths, Thomas L., and Mark Steyvers (2004). "Finding scientific topics". In: Proceedings of the National Academy of Sciences 101.suppl 1, pp. 5228–5235 (cit. on p. 65).
Griffiths, Thomas L., Michael I. Jordan, Joshua B. Tenenbaum, and David M. Blei (2004). "Hierarchical topic models and the nested Chinese restaurant process". In: Advances in Neural Information Processing Systems, pp. 17–24 (cit. on p. 66).
Hall, David, Daniel Jurafsky, and Christopher D. Manning (2008). "Studying the history of ideas using topic models". In: Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, pp. 363–371 (cit. on p. 66).
Harriet, Taylor (2015). Privacy will hit tipping point in 2016. http://www.cnbc.com/2015/11/09/privacy-will-hit-tipping-point-in-2016.html (cit. on p. 78).
He, Jiangen, and Chaomei Chen (2018). "Predictive Effects of Novelty Measured by Temporal Embeddings on the Growth of Scientific Literature". In: Frontiers in Research Metrics and Analytics 3, p. 9 (cit. on pp. 64, 68, 71).
He, Junxian, Zhiting Hu, Taylor Berg-Kirkpatrick, Ying Huang, and Eric P. Xing (2017). "Efficient correlated topic modeling with topic embedding". In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, pp. 225–233 (cit. on pp. 66, 67).
He, Qi, Bi Chen, Jian Pei, Baojun Qiu, Prasenjit Mitra, and Lee Giles (2009). "Detecting topic evolution in scientific literature: How can citations help?" In: Proceedings of the 18th ACM Conference on Information and Knowledge Management. ACM, pp. 957–966 (cit. on p. 68).
Henderson, Matthew, Blaise Thomson, and Jason D. Williams (2014). "The second dialog state tracking challenge". In: Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL), pp. 263–272 (cit. on p. 15).
Henderson, Matthew, Blaise Thomson, and Steve Young (2013). "Deep neural network approach for the dialog state tracking challenge". In: Proceedings of the SIGDIAL 2013 Conference, pp. 467–471 (cit. on p. 14).
Hess, Ann, Hari Iyer, and William Malm (2001). "Linear trend analysis: a comparison of methods". In: Atmospheric Environment 35.30: Visibility, Aerosol and Atmospheric Optics, pp. 5211–5222. issn: 1352-2310. doi: 10.1016/S1352-2310(01)00342-9. url: http://www.sciencedirect.com/science/article/pii/S1352231001003429 (cit. on p. 75).


Hofmann, Thomas (1999). "Probabilistic latent semantic analysis". In: Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann Publishers Inc., pp. 289–296 (cit. on p. 65).
Honnibal, Matthew, and Ines Montani (2017). "spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing". In: To appear 7 (cit. on p. 15).
Howard, Jeremy, and Sebastian Ruder (2018). "Universal language model fine-tuning for text classification". In: arXiv preprint arXiv:1801.06146 (cit. on p. 44).
IEEE Computer Society (2016). Top 9 Computing Technology Trends for 2016. https://www.scientificcomputing.com/news/2016/01/top-9-computing-technology-trends-2016 (cit. on p. 78).
Ivancic, Lucija, Dalia Suša Vugec, and Vesna Bosilj Vukšic (2019). "Robotic Process Automation: Systematic Literature Review". In: International Conference on Business Process Management. Springer, pp. 280–295 (cit. on pp. 1, 2).
Jain, Mohit, Pratyush Kumar, Ramachandra Kota, and Shwetak N. Patel (2018). "Evaluating and informing the design of chatbots". In: Proceedings of the 2018 Designing Interactive Systems Conference. ACM, pp. 895–906 (cit. on p. 8).
Khanpour, Hamed, Nishitha Guntakandla, and Rodney Nielsen (2016). "Dialogue act classification in domain-independent conversations using a deep recurrent neural network". In: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pp. 2012–2021 (cit. on p. 13).
Kim, Young-Bum, Karl Stratos, Ruhi Sarikaya, and Minwoo Jeong (2015). "New transfer learning techniques for disparate label sets". In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp. 473–482 (cit. on p. 43).
Kolhatkar, Varada, Heike Zinsmeister, and Graeme Hirst (2013). "Annotating anaphoric shell nouns with their antecedents". In: Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse, pp. 112–121 (cit. on pp. 77, 78).
Kontostathis, April, Leon M. Galitsky, William M. Pottenger, Soma Roy, and Daniel J. Phelps (2004). "A survey of emerging trend detection in textual data mining". In: Survey of Text Mining. Springer, pp. 185–224 (cit. on p. 63).
Krippendorff, Klaus (2011). "Computing Krippendorff's alpha-reliability". In: Annenberg School for Communication (ASC) Departmental Papers 43 (cit. on p. 78).
Kurata, Gakuto, Bing Xiang, Bowen Zhou, and Mo Yu (2016). "Leveraging sentence-level information with encoder LSTM for semantic slot filling". In: arXiv preprint arXiv:1601.01530 (cit. on p. 13).
Landis, J. Richard, and Gary G. Koch (1977). "The measurement of observer agreement for categorical data". In: Biometrics, pp. 159–174 (cit. on p. 78).


Le, Minh-Hoang, Tu-Bao Ho, and Yoshiteru Nakamori (2005). "Detecting emerging trends from scientific corpora". In: International Journal of Knowledge and Systems Sciences 2.2, pp. 53–59 (cit. on p. 68).
Le, Quoc V., and Tomas Mikolov (2014). "Distributed Representations of Sentences and Documents". In: ICML. Vol. 14, pp. 1188–1196 (cit. on p. 74).
Lee, Ji Young, and Franck Dernoncourt (2016). "Sequential short-text classification with recurrent and convolutional neural networks". In: arXiv preprint arXiv:1603.03827 (cit. on p. 13).
Leopold, Henrik, Han van der Aa, and Hajo A. Reijers (2018). "Identifying candidate tasks for robotic process automation in textual process descriptions". In: Enterprise, Business-Process and Information Systems Modeling. Springer, pp. 67–81 (cit. on p. 2).
Levenshtein, Vladimir I. (1966). "Binary codes capable of correcting deletions, insertions and reversals". In: Soviet Physics Doklady 8, pp. 707–710 (cit. on p. 25).
Liao, Vera, Masud Hussain, Praveen Chandar, Matthew Davis, Yasaman Khazaeni, Marco Patricio Crasso, Dakuo Wang, Michael Muller, N. Sadat Shami, Werner Geyer, et al. (2018). "All Work and No Play". In: Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. ACM, p. 3 (cit. on p. 8).
Lowe, Ryan Thomas, Nissan Pow, Iulian Vlad Serban, Laurent Charlin, Chia-Wei Liu, and Joelle Pineau (2017). "Training end-to-end dialogue systems with the Ubuntu dialogue corpus". In: Dialogue & Discourse 8.1, pp. 31–65 (cit. on p. 22).
Luger, Ewa, and Abigail Sellen (2016). "Like having a really bad PA: the gulf between user expectation and experience of conversational agents". In: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems. ACM, pp. 5286–5297 (cit. on p. 8).
MacQueen, James (1967). "Some methods for classification and analysis of multivariate observations". In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability. Vol. 1. Oakland, CA, USA, pp. 281–297 (cit. on p. 74).
Manning, Christopher D., Prabhakar Raghavan, and Hinrich Schütze (2008). Introduction to Information Retrieval. Web publication at informationretrieval.org. Cambridge University Press (cit. on p. 74).
Markov, Igor (2015). 13 Of 2015's Hottest Topics In Computer Science Research. https://www.forbes.com/sites/quora/2015/04/22/13-of-2015s-hottest-topics-in-computer-science-research/#fb0d46b1e88d (cit. on p. 78).
Martin, James H., and Daniel Jurafsky (2009). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Pearson/Prentice Hall, Upper Saddle River.
Mei, Qiaozhu, Deng Cai, Duo Zhang, and ChengXiang Zhai (2008). "Topic modeling with network regularization". In: Proceedings of the 17th International Conference on World Wide Web. ACM, pp. 101–110 (cit. on p. 65).


Meyerson, Bernard, and Mariette DiChristina (2016). These are the top 10 emerging technologies of 2016. http://www3.weforum.org/docs/GAC16_Top10_Emerging_Technologies_2016_report.pdf (cit. on p. 78).
Mikolov, Tomas, Kai Chen, Greg Corrado, and Jeffrey Dean (2013). "Efficient estimation of word representations in vector space". In: arXiv preprint arXiv:1301.3781 (cit. on p. 74).
Miller, Alexander, Adam Fisch, Jesse Dodge, Amir-Hossein Karimi, Antoine Bordes, and Jason Weston (2016). "Key-value memory networks for directly reading documents". In: (cit. on p. 16).
Mohanty, Soumendra, and Sachin Vyas (2018). "How to Compete in the Age of Artificial Intelligence". In: (cit. on pp. III, V, 1, 2).
Moiseeva, Alena, and Hinrich Schütze (2020). "TRENDNERT: A Benchmark for Trend and Downtrend Detection in a Scientific Domain". In: Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (cit. on p. 71).
Moiseeva, Alena, Dietrich Trautmann, Michael Heimann, and Hinrich Schütze (2020). "Multipurpose Intelligent Process Automation via Conversational Assistant". In: arXiv preprint arXiv:2001.02284 (cit. on pp. 21, 41).
Monaghan, Fergal, Georgeta Bordea, Krystian Samp, and Paul Buitelaar (2010). "Exploring your research: Sprinkling some saffron on semantic web dog food". In: Semantic Web Challenge at the International Semantic Web Conference. Vol. 117. Citeseer, pp. 420–435 (cit. on p. 68).
Monostori, László, András Márkus, Hendrik Van Brussel, and E. Westkämpfer (1996). "Machine learning approaches to manufacturing". In: CIRP Annals 45.2, pp. 675–712 (cit. on p. 15).
Moody, Christopher E. (2016). "Mixing dirichlet topic models and word embeddings to make lda2vec". In: arXiv preprint arXiv:1605.02019 (cit. on p. 74).
Mou, Lili, Zhao Meng, Rui Yan, Ge Li, Yan Xu, Lu Zhang, and Zhi Jin (2016). "How transferable are neural networks in NLP applications?" In: arXiv preprint arXiv:1603.06111 (cit. on p. 43).
Mrkšic, Nikola, Diarmuid O Séaghdha, Tsung-Hsien Wen, Blaise Thomson, and Steve Young (2016). "Neural belief tracker: Data-driven dialogue state tracking". In: arXiv preprint arXiv:1606.03777 (cit. on p. 14).
Nessma, Joussef (2015). Top 10 Hottest Research Topics in Computer Science. http://www.pouted.com/top-10-hottest-research-topics-in-computer-science (cit. on p. 78).
Nie, Binling, and Shouqian Sun (2017). "Using text mining techniques to identify research trends: A case study of design research". In: Applied Sciences 7.4, p. 401 (cit. on p. 68).
Pan, Sinno Jialin, and Qiang Yang (2009). "A survey on transfer learning". In: IEEE Transactions on Knowledge and Data Engineering 22.10, pp. 1345–1359 (cit. on p. 41).
Qu, Lizhen, Gabriela Ferraro, Liyuan Zhou, Weiwei Hou, and Timothy Baldwin (2016). "Named entity recognition for novel types by transfer learning". In: arXiv preprint arXiv:1610.09914 (cit. on p. 43).


Radford, Alec, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever (2019). "Language models are unsupervised multitask learners". In: OpenAI Blog 1.8 (cit. on p. 44).
Reese, Hope (2015). 7 trends for artificial intelligence in 2016: 'Like 2015 on steroids'. http://www.techrepublic.com/article/7-trends-for-artificial-intelligence-in-2016-like-2015-on-steroids (cit. on p. 78).
Rivera, Maricel (2016). Is Digital Game-Based Learning The Future Of Learning? https://elearningindustry.com/digital-game-based-learning-future (cit. on p. 78).
Rotolo, Daniele, Diana Hicks, and Ben R. Martin (2015). "What is an emerging technology?" In: Research Policy 44.10, pp. 1827–1843 (cit. on p. 64).
Salatino, Angelo (2015). "Early detection and forecasting of research trends". In: (cit. on pp. 63, 66).
Serban, Iulian V., Alessandro Sordoni, Yoshua Bengio, Aaron Courville, and Joelle Pineau (2016). "Building end-to-end dialogue systems using generative hierarchical neural network models". In: Thirtieth AAAI Conference on Artificial Intelligence (cit. on p. 11).
Sharif Razavian, Ali, Hossein Azizpour, Josephine Sullivan, and Stefan Carlsson (2014). "CNN features off-the-shelf: an astounding baseline for recognition". In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 806–813 (cit. on p. 43).
Shibata, Naoki, Yuya Kajikawa, and Yoshiyuki Takeda (2009). "Comparative study on methods of detecting research fronts using different types of citation". In: Journal of the American Society for Information Science and Technology 60.3, pp. 571–580 (cit. on p. 68).
Shibata, Naoki, Yuya Kajikawa, Yoshiyuki Takeda, and Katsumori Matsushima (2008). "Detecting emerging research fronts based on topological measures in citation networks of scientific publications". In: Technovation 28.11, pp. 758–775 (cit. on p. 68).
Small, Henry (2006). "Tracking and predicting growth areas in science". In: Scientometrics 68.3, pp. 595–610 (cit. on p. 68).
Small, Henry, Kevin W. Boyack, and Richard Klavans (2014). "Identifying emerging topics in science and technology". In: Research Policy 43.8, pp. 1450–1467 (cit. on p. 64).
Su, Pei-Hao, Nikola Mrkšic, Iñigo Casanueva, and Ivan Vulic (2018). "Deep Learning for Conversational AI". In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Tutorial Abstracts, pp. 27–32 (cit. on p. 12).
Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le (2014). "Sequence to sequence learning with neural networks". In: Advances in Neural Information Processing Systems, pp. 3104–3112 (cit. on pp. 16, 17).
Trautmann, Dietrich, Johannes Daxenberger, Christian Stab, Hinrich Schütze, and Iryna Gurevych (2019). "Robust Argument Unit Recognition and Classification". In: arXiv preprint arXiv:1904.09688 (cit. on p. 41).


Tu, Yi-Ning, and Jia-Lang Seng (2012). "Indices of novelty for emerging topic detection". In: Information Processing & Management 48.2, pp. 303–325 (cit. on p. 64).
Turing, Alan (1949). London Times letter to the editor (cit. on p. 7).
Ultes, Stefan, Lina Barahona, Pei-Hao Su, David Vandyke, Dongho Kim, Inigo Casanueva, Paweł Budzianowski, Nikola Mrkšic, Tsung-Hsien Wen, et al. (2017). "PyDial: A multi-domain statistical dialogue system toolkit". In: Proceedings of ACL 2017, System Demonstrations, pp. 73–78 (cit. on p. 15).
Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin (2017). "Attention is all you need". In: Advances in Neural Information Processing Systems, pp. 5998–6008 (cit. on p. 45).
Vincent, Pascal, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol (2008). "Extracting and composing robust features with denoising autoencoders". In: Proceedings of the 25th International Conference on Machine Learning. ACM, pp. 1096–1103 (cit. on p. 44).
Vodolán, Miroslav, Rudolf Kadlec, and Jan Kleindienst (2015). "Hybrid dialog state tracker". In: arXiv preprint arXiv:1510.03710 (cit. on p. 14).
Wang, Alex, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman (2018). "GLUE: A multi-task benchmark and analysis platform for natural language understanding". In: arXiv preprint arXiv:1804.07461 (cit. on p. 44).
Wang, Chong, and David M. Blei (2011). "Collaborative topic modeling for recommending scientific articles". In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, pp. 448–456 (cit. on p. 66).
Wang, Xuerui, and Andrew McCallum (2006). "Topics over time: a non-Markov continuous-time model of topical trends". In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, pp. 424–433 (cit. on pp. 65–67).
Wang, Zhuoran, and Oliver Lemon (2013). "A simple and generic belief tracking mechanism for the dialog state tracking challenge: On the believability of observed information". In: Proceedings of the SIGDIAL 2013 Conference, pp. 423–432 (cit. on p. 14).
Weizenbaum, Joseph, et al. (1966). "ELIZA – a computer program for the study of natural language communication between man and machine". In: Communications of the ACM 9.1, pp. 36–45 (cit. on p. 7).
Wen, Tsung-Hsien, Milica Gasic, Nikola Mrksic, Pei-Hao Su, David Vandyke, and Steve Young (2015). "Semantically conditioned LSTM-based natural language generation for spoken dialogue systems". In: arXiv preprint arXiv:1508.01745 (cit. on p. 14).
Wen, Tsung-Hsien, David Vandyke, Nikola Mrksic, Milica Gasic, Lina M. Rojas-Barahona, Pei-Hao Su, Stefan Ultes, and Steve Young (2016). "A network-based end-to-end trainable task-oriented dialogue system". In: arXiv preprint arXiv:1604.04562 (cit. on pp. 11, 16, 17, 22).


Werner, Alexander, Dietrich Trautmann, Dongheui Lee, and Roberto Lampariello (2015). "Generalization of optimal motion trajectories for bipedal walking". In: pp. 1571–1577 (cit. on p. 41).
Williams, Jason D., Kavosh Asadi, and Geoffrey Zweig (2017). "Hybrid code networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning". In: arXiv preprint arXiv:1702.03274 (cit. on p. 17).
Williams, Jason, Antoine Raux, and Matthew Henderson (2016). "The dialog state tracking challenge series: A review". In: Dialogue & Discourse 7.3, pp. 4–33 (cit. on p. 12).
Wu, Chien-Sheng, Richard Socher, and Caiming Xiong (2019). "Global-to-local Memory Pointer Networks for Task-Oriented Dialogue". In: arXiv preprint arXiv:1901.04713 (cit. on pp. 15–17).
Wu, Yonghui, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, et al. (2016). "Google's neural machine translation system: Bridging the gap between human and machine translation". In: arXiv preprint arXiv:1609.08144 (cit. on p. 45).
Xie, Pengtao, and Eric P. Xing (2013). "Integrating Document Clustering and Topic Modeling". In: Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence (cit. on p. 74).
Yan, Zhao, Nan Duan, Peng Chen, Ming Zhou, Jianshe Zhou, and Zhoujun Li (2017). "Building task-oriented dialogue systems for online shopping". In: Thirty-First AAAI Conference on Artificial Intelligence (cit. on p. 14).
Yogatama, Dani, Michael Heilman, Brendan O'Connor, Chris Dyer, Bryan R. Routledge, and Noah A. Smith (2011). "Predicting a scientific community's response to an article". In: Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, pp. 594–604 (cit. on p. 65).
Young, Steve, Milica Gašic, Blaise Thomson, and Jason D. Williams (2013). "POMDP-based statistical spoken dialog systems: A review". In: Proceedings of the IEEE 101.5, pp. 1160–1179 (cit. on p. 17).
Zaino, Jennifer (2016). 2016 Trends for Semantic Web and Semantic Technologies. http://www.dataversity.net/2017-predictions-semantic-web-semantic-technologies (cit. on p. 78).
Zhou, Hao, Minlie Huang, et al. (2016). "Context-aware natural language generation for spoken dialogue systems". In: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pp. 2032–2041 (cit. on pp. 14, 15).
Zhu, Yukun, Ryan Kiros, Rich Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, and Sanja Fidler (2015). "Aligning books and movies: Towards story-like visual explanations by watching movies and reading books". In: Proceedings of the IEEE International Conference on Computer Vision, pp. 19–27 (cit. on p. 44).



The first block of this thesis (Chapter 2 ndash Chapter 4) deals withthe development of a conversational agent for IPA purposes within thee-learning domain As already mentioned chatbots are one of thecentral applications for the IPA domain since they can effectively per-form time-consuming tasks while communicating in a natural lan-

III

[ September 4 2020 at 1017 ndash classicthesis ]

guage Within this thesis we realized the IPA conversational botthat takes care of routine and time-consuming tasks regularly per-formed by human tutors of an online mathematical course Thisbot is deployed in a real-world setting within the OMB+ mathemat-ical platform Conducting experiments for this part we observedtwo possibilities to build the conversational agent in industrial set-tings ndash first with purely rule-based methods considering the miss-ing training data and individual aspects of the target domain (iee-learning) Second we re-implemented two of the main system com-ponents (ie Natural Language Understanding (NLU) and DialogueManager (DM) units) using the current state-of-the-art deep-learningarchitecture (ie Bidirectional Encoder Representations from Trans-formers (BERT)) and investigated their performance and potential useas a part of a hybrid model (ie containing both rule-based and ma-chine learning methods)

The second part of the thesis (Chapter 5 ndash Chapter 6) considers anIPA subproblem within the predictive analytics domain and addressesthe task of scientific trend forecasting Predictive analytics forecastsfuture outcomes based on historical and current data Therefore us-ing the benefits of advanced analytics models an organization canfor instance reliably determine trends and emerging topics and thenmanipulate it while making significant business decisions (ie invest-ments) In this work we dealt with the trend detection task ndash specif-ically we addressed the lack of publicly available benchmarks forevaluating trend detection algorithms We assembled the benchmarkfor the detection of both scientific trends and downtrends (ie topicsthat become less frequent overtime) To the best of our knowledgethe task of downtrend detection has not been addressed before Theresulting benchmark is based on a collection of more than one mil-lion documents which is among the largest that has been used fortrend detection before and therefore offers a realistic setting for thedevelopment of trend detection algorithms

IV

[ September 4 2020 at 1017 ndash classicthesis ]

Z U S A M M E N FA S S U N G

Robotergesteuerte Prozessautomatisierung (RPA) ist eine Art von Soft-ware-Bots die manuelle menschliche Taumltigkeiten wie die Eingabevon Daten in das System die Anmeldung in Benutzerkonten oderdie Ausfuumlhrung einfacher aber sich wiederholender Arbeitsablaumlufenachahmt (Mohanty and Vyas 2018) Einer der Hauptvorteile undgleichzeitig Nachteil der RPA-bots ist jedoch deren Faumlhigkeit die ge-stellte Aufgabe punktgenau zu erfuumlllen Einerseits ist ein solches Sys-tem in der Lage die Aufgabe akkurat sorgfaumlltig und schnell auszu-fuumlhren Andererseits ist es sehr anfaumlllig fuumlr Veraumlnderungen in definier-ten Szenarien Da der RPA-Bot fuumlr eine bestimmte Aufgabe konzipiertist ist es oft nicht moumlglich ihn an andere Domaumlnen oder sogar fuumlreinfache Aumlnderungen in einem Arbeitsablauf anzupassen (Mohantyand Vyas 2018) Diese Unfaumlhigkeit sich an veraumlnderte Bedingungenanzupassen fuumlhrte zu einem weiteren Verbesserungsbereich fuumlr RPA-bots ndash den Intelligenten Prozessautomatisierungssystemen (IPA)

IPA-Bots kombinieren RPA mit Kuumlnstlicher Intelligenz (AI) und koumln-nen komplexe und kognitiv anspruchsvollere Aufgaben erfuumlllen dieuA Schlussfolgerungen und natuumlrliches Sprachverstaumlndnis erfordernDiese Systeme uumlbernehmen zeitaufwaumlndige und routinemaumlszligige Auf-gaben ermoumlglichen somit einen intelligenten Arbeitsablauf und be-freien Fachkraumlfte fuumlr die Durchfuumlhrung komplizierterer AufgabenBisher wurden die IPA-Techniken hauptsaumlchlich im Bereich der Bild-verarbeitung eingesetzt In der letzten Zeit wurde die natuumlrlicheSprachverarbeitung (NLP) jedoch auch zu einem der potenziellen An-wendungen fuumlr IPA und zwar aufgrund von der Faumlhigkeit die mensch-liche Sprache zu interpretieren NLP-Methoden werden eingesetztum groszlige Mengen an Textdaten zu analysieren und auf verschiedeneAnfragen zu reagieren Auch wenn die verfuumlgbaren Daten unstruk-turiert sind oder kein vordefiniertes Format haben (zB E-Mails) oderwenn die in einem variablen Format vorliegen (zB Rechnungen ju-ristische Dokumente) dann werden ebenfalls die NLP Techniken an-gewendet um die relevanten Informationen zu extrahieren die dannzur Loumlsung verschiedener Probleme verwendet werden koumlnnen

NLP im Rahmen von IPA beschraumlnkt sich jedoch nicht auf die Extrak-tion relevanter Daten aus Textdokumenten Eine der zentralen An-wendungen von IPA sind Konversationsagenten die zur Interaktionzwischen Mensch und Maschine eingesetzt werden Konversations-agenten erfahren enorme Nachfrage da sie in der Lage sind einegroszlige Anzahl von Benutzern gleichzeitig zu unterstuumltzen und dabeiin einer natuumlrlichen Sprache kommunizieren Die Implementierungeines Chatsystems umfasst verschiedene Arten von NLP-Teilaufgabenbeginnend mit dem Verstaumlndnis der natuumlrlichen Sprache (zB Ab-sichtserkennung Extraktion von Entitaumlten) uumlber das Dialogmanage-

V

[ September 4 2020 at 1017 ndash classicthesis ]

ment (zB Festlegung der naumlchstmoumlglichen Bot-Aktion) bis hin zurResponse-Generierung Ein typisches Dialogsystem fuumlr IPA-Zweckeuumlbernimmt in der Regel unkomplizierte Kundendienstanfragen (zBBeantwortung von FAQs) so dass sich die Mitarbeiter auf komplexereAnfragen konzentrieren koumlnnen

Diese Dissertation umfasst zwei Bereiche die durch das breitereThema vereint sind naumlmlich die Intelligente Prozessautomatisierung(IPA) unter Verwendung statistischer Methoden der natuumlrlichen Sprach-verarbeitung (NLP)

Der erste Block dieser Arbeit (Kapitel 2 ndash Kapitel 4) befasst sichmit der Impementierung eines Konversationsagenten fuumlr IPA-Zweckeinnerhalb der E-Learning-Domaumlne Wie bereits erwaumlhnt sind Chat-bots eine der zentralen Anwendungen fuumlr die IPA-Domaumlne da siezeitaufwaumlndige Aufgaben in einer natuumlrlichen Sprache effektiv aus-fuumlhren koumlnnen Der IPA-Kommunikationsbot der in dieser Arbeitrealisiert wurde kuumlmmert sich ebenfalls um routinemaumlszligige und zeit-aufwaumlndige Aufgaben die sonst von Tutoren in einem Online-Mathe-matikkurs in deutscher Sprache durchgefuumlhrt werden Dieser Botist in der taumlglichen Anwendung innerhalb der mathematischen Platt-form OMB+ eingesetzt Bei der Durchfuumlhrung von Experimenten be-obachteten wir zwei Moumlglichkeiten den Konversationsagenten im in-dustriellen Umfeld zu entwickeln ndash zunaumlchst mit rein regelbasiertenMethoden unter Bedingungen der fehlenden Trainingsdaten und be-sonderer Aspekte der Zieldomaumlne (dh E-Learning) Zweitens habenwir zwei der Hauptsystemkomponenten (SprachverstaumlndnismodulDialog-Manager) mit dem derzeit fortschrittlichsten Deep LearningAlgorithmus reimplementiert und die Performanz dieser Komponen-ten untersucht

Der zweite Teil der Doktorarbeit (Kapitel 5 ndash Kapitel 6) betrachtetein IPA-Problem innerhalb des Vorhersageanalytik-Bereichs Vorher-sageanalytik zielt darauf ab Prognosen uumlber zukuumlnftige Ergebnisseauf der Grundlage von historischen und aktuellen Daten zu erstellenDaher kann ein Unternehmen mit Hilfe der Vorhersagesysteme zBdie Trends oder neu entstehende Themen zuverlaumlssig bestimmen unddiese Informationen dann bei wichtigen Geschaumlftsentscheidungen (zBInvestitionen) einsetzen In diesem Teil der Arbeit beschaumlftigen wiruns mit dem Teilproblem der Trendprognose ndash insbesondere mit demFehlen oumlffentlich zugaumlnglicher Benchmarks fuumlr die Evaluierung vonTrenderkennungsalgorithmen Wir haben den Benchmark zusammen-gestellt und veroumlffentlicht um sowohl Trends als auch Abwaumlrtstrendszu erkennen Nach unserem besten Wissen ist die Aufgabe der Ab-waumlrtstrenderkennung bisher nicht adressiert worden Der resultie-rende Benchmark basiert auf einer Sammlung von mehr als einerMillion Dokumente der zu den groumlszligten gehoumlrt die bisher fuumlr dieTrenderkennung verwendet wurden und somit einen realistischenRahmen fuumlr die Entwicklung von Trenddetektionsalgorithmen bietet

VI

[ September 4 2020 at 1017 ndash classicthesis ]

A C K N O W L E D G E M E N T S

This thesis was possible due to several people who contributed signif-icantly to my work I owe my gratitude to each of them for guidanceencouragement and inspiration during my doctoral study

First of all I would like to thank my advisor Prof Dr HinrichSchuumltze for his mentorship support and thoughtful guidance Myspecial gratitude goes for Prof Dr Ruedi Seiler and his fantastic teamat Integral-Learning GmbH I am thankful for your valuable inputinterest in our collaborative work and for your continuous supportI would also like to show gratitude to Prof Dr Alexander Fraser forco-advising support and helpful feedback

Finally none of this would be possible without my familyrsquos sup-port there are no words to express the depth of gratitude I feel forthem My earnest appreciation goes to my husband Dietrich thankyou for all the time you were by my side and for the encouragementyou provided me through these years I am very grateful to my par-ents for their understanding and backing at any moment I appreciatethe love of my grandparents it is hard to realize how significant theircare is for me Besides that I am also grateful to my sister Alexan-dra elder brother Denis and parents-in-law for merely being alwaysthere for me

VII

[ September 4 2020 at 1017 ndash classicthesis ]

[ September 4 2020 at 1017 ndash classicthesis ]

C O N T E N T S

List of Figures XI
List of Tables XII
Acronyms XIII
1 introduction 1
1.1 Outline 3
i e-learning conversational assistant 5
2 foundations 7
2.1 Introduction 7
2.2 Conversational Assistants 9
2.2.1 Taxonomy 9
2.2.2 Conventional Architecture 12
2.2.3 Existing Approaches 15
2.3 Target Domain & Task Definition 18
2.4 Summary 19
3 multipurpose ipa via conversational assistant 21
3.1 Introduction 21
3.1.1 Outline and Contributions 22
3.2 Model 23
3.2.1 OMB+ Design 23
3.2.2 Preprocessing 23
3.2.3 Natural Language Understanding 25
3.2.4 Dialogue Manager 26
3.2.5 Meta Policy 30
3.2.6 Response Generation 32
3.3 Evaluation 37
3.3.1 Automated Evaluation 37
3.3.2 Human Evaluation & Error Analysis 37
3.4 Structured Dialogue Acquisition 39
3.5 Summary 39
3.6 Future Work 40
4 deep learning re-implementation of units 41
4.1 Introduction 41
4.1.1 Outline and Contributions 42
4.2 Related Work 43
4.3 BERT: Bidirectional Encoder Representations from Transformers 44
4.4 Data & Descriptive Statistics 46
4.5 Named Entity Recognition 48
4.5.1 Model Settings 49
4.6 Next Action Prediction 49
4.6.1 Model Settings 49
4.7 Evaluation and Results 50
4.8 Error Analysis 58
4.9 Summary 58
4.10 Future Work 59
ii clustering-based trend analyzer 61
5 foundations 63
5.1 Motivation and Challenges 63
5.2 Detection of Emerging Research Trends 64
5.2.1 Methods for Topic Modeling 65
5.2.2 Topic Evolution of Scientific Publications 67
5.3 Summary 68
6 benchmark for scientific trend detection 71
6.1 Motivation 71
6.1.1 Outline and Contributions 72
6.2 Corpus Creation 73
6.2.1 Underlying Data 73
6.2.2 Stratification 73
6.2.3 Document Representations & Clustering 74
6.2.4 Trend & Downtrend Estimation 75
6.2.5 (Down)trend Candidates Validation 75
6.3 Benchmark Annotation 77
6.3.1 Inter-annotator Agreement 77
6.3.2 Gold Trends 78
6.4 Evaluation Measure for Trend Detection 79
6.5 Trend Detection Baselines 80
6.5.1 Configurations 81
6.5.2 Experimental Setup and Results 81
6.6 Benchmark Description 82
6.7 Summary 83
6.8 Future Work 83
7 discussion and future work 85
bibliography 87


L I S T O F F I G U R E S

Figure 1 Taxonomy of dialogue systems 10
Figure 2 Example of a conventional dialogue architecture 12
Figure 3 OMB+ online learning platform 24
Figure 4 Rule-Based System: Dialogue flow 36
Figure 5 Macro F1 evaluation, NAP – default model. Comparison between German and multilingual BERT 52
Figure 6 Macro F1 test, NAP – default model. Comparison between German and multilingual BERT 53
Figure 7 Macro F1 evaluation, NAP – extended model 54
Figure 8 Macro F1 test, NAP – extended model 55
Figure 9 Macro F1 evaluation, NER – Comparison between German and multilingual BERT 56
Figure 10 Macro F1 test, NER – Comparison between German and multilingual BERT 57
Figure 11 (Down)trend detection: Overall distribution of papers in the entire dataset 73
Figure 12 A trend candidate (positive slope). Topic: Sentiment and Emotion Analysis (using Social Media Channels) 76
Figure 13 A downtrend candidate (negative slope). Topic: XML 76


L I S T O F T A B L E S

Table 1 Natural language representation of a sentence 13

Table 2 Rule 1 Admissible configurations 27

Table 3 Rule 2 Admissible configurations 27

Table 4 Rule 3 Admissible configurations 27

Table 5 Rule 4 Admissible configurations 28

Table 6 Rule 5 Admissible configurations 28

Table 7 Case 1 ID completeness 31

Table 8 Case 2 ID completeness 31

Table 9 Showcase 1 Short flow 33

Table 10 Showcase 2 Organisational question 33

Table 11 Showcase 3 Fallback and human-request policies 34

Table 12 Showcase 4 Contextual question 34

Table 13 Showcase 5 Long flow 35

Table 14 General statistics for conversational dataset 46

Table 15 Detailed statistics on possible system actions in conversational dataset 48

Table 16 Detailed statistics on possible named entities in conversational dataset 48

Table 17 F1-results for the Named Entity Recognition task 51

Table 18 F1-results for the Next Action Prediction task 51

Table 19 Average dialogue accuracy computed for the Next Action Prediction task 51

Table 20 Trend detection List of gold trends 79

Table 21 Trend detection Example of MAP evaluation 80

Table 22 Trend detection Ablation results 82


A C R O N Y M S

AI Artificial Intelligence

API Application Programming Interface

BERT Bidirectional Encoder Representations from Transformers

BiLSTM Bidirectional Long Short Term Memory

CRF Conditional Random Fields

DM Dialogue Manager

DST Dialogue State Tracker

DSTC Dialogue State Tracking Challenge

EC Equivalence Classes

ES ElasticSearch

GM Gaussian Models

IAA Inter Annotator Agreement

ID Informational Dictionary

IPA Intelligent Process Automation

ITS Intelligent Tutoring System

LDA Latent Dirichlet Allocation

LSA Latent Semantic Analysis

LSTM Long Short Term Memory

MAP Mean Average Precision

MER Mutually Exclusive Rules

NAP Next Action Prediction

NER Named Entity Recognition

NLG Natural Language Generation

NLP Natural Language Processing

NLU Natural Language Understanding

OMB+ Online Mathematik Brückenkurs

PL Policy Learning

pLSA probabilistic Latent Semantic Analysis

REST Representational State Transfer

RNN Recurrent Neural Network

RPA Robotic Process Automation

RegEx Regular Expressions

TL Transfer Learning

ToDS Task-oriented Dialogue System


1 I N T R O D U C T I O N

Robotic Process Automation (RPA) is a type of software bot that imitates manual human activities such as entering data into a system, logging into accounts, or executing simple but repetitive workflows (Mohanty and Vyas, 2018). In other words, RPA-systems allow business applications to work without human involvement by replicating human worker activities. A further benefit of RPA-bots is that they are much cheaper than employing a human worker, and they are also able to work around the clock. Overall, the advantages of software bots are hugely appealing and include cost savings, accuracy and compliance while executing processes, improved responsiveness (i.e., bots are faster than human workers), and, finally, such bots are usually agile and multi-skilled (Ivancic, Vugec, and Vukšic, 2019).

However, one of the main benefits and, at the same time, drawbacks of RPA-systems is their ability to precisely fulfill the assigned task. On the one hand, the bot can carry out the task accurately and diligently. On the other hand, it is highly susceptible to changes in the defined scenarios. Being designed for a particular task, the RPA-bot is often not adaptable to other domains or even to simple changes in a workflow (Mohanty and Vyas, 2018). This inability to adapt to changing conditions gave rise to a further area of improvement for RPA-bots: Intelligent Process Automation (IPA) systems.

IPA-bots combine Robotic Process Automation (RPA) with Artificial Intelligence (AI) and thus can perform complex and more cognitively demanding tasks that require reasoning and language understanding. Hence, IPA-bots go beyond automating simple "click tasks" (as in RPA) and can accomplish jobs more intelligently, employing machine learning algorithms and advanced analytics. Such IPA-systems undertake time-consuming and routine tasks, thus enabling smart workflows and freeing up skilled workers to accomplish higher-value activities.

Previously, IPA techniques were primarily employed within the computer vision domain, beginning with the automation of simple display click-tasks and extending to sophisticated self-driving cars with their ability to interpret a stream of situational data obtained from sensors and cameras. In recent times, however, Natural Language Processing (NLP) has become a potential application area for IPA as well, due to its ability to understand and interpret human language. NLP methods are used to analyze large amounts of textual data and to respond to various inquiries. Furthermore, if the available data is unstructured and does not have any predefined format (i.e., customer emails), or if it is available in a variable format (i.e., invoices, legal documents), then NLP techniques are applied to extract the relevant information. This data can then be utilized to solve distinct problems (i.e., natural language understanding, predictive analytics) (Mohanty and Vyas, 2018). For instance, in the case of predictive analytics, such a bot would make forecasts about the future using current and historical data and thereby assist with sales or investments.

However, NLP in the context of Intelligent Process Automation (IPA) is not limited to the extraction of relevant data and insights from textual documents. One of the central applications of NLP within the IPA domain is conversational interfaces (e.g., chatbots) that are used to enable human-to-machine interaction. This type of software is commonly used for customer support or within recommendation and booking systems. Conversational agents are in enormous demand due to their ability to simultaneously support a large number of users while communicating in a natural language. In a conventional chatbot system, a user provides input in a natural language; the bot then processes this query to extract potentially useful information and responds with a relevant reply. This process comprises multiple stages and involves different types of NLP subtasks, starting with Natural Language Understanding (NLU) (e.g., intent recognition, named entity extraction) and going towards dialogue management (i.e., determining the next possible bot action considering the dialogue history) and response generation (e.g., converting the semantic representation of the next bot action into a natural language utterance). A typical dialogue system for Intelligent Process Automation (IPA) purposes usually handles shallow customer support requests (e.g., answering FAQs), allowing human workers to focus on more sophisticated inquiries.

Natural Language Processing (NLP) techniques may also be used indirectly for IPA purposes. There are attempts to automatically identify and classify tasks extracted from textual process descriptions as manual, user, or automated. The goal of such an NLP application is to reduce the effort required to identify suitable candidates for further robotic process automation (Friedrich, Mendling, and Puhlmann, 2011; Leopold, Aa, and Reijers, 2018).

Intelligent Process Automation (IPA) is an emerging technology and evolves mainly due to the recognized potential benefit of combining Robotic Process Automation (RPA) and Artificial Intelligence (AI) techniques to solve industrial problems (Ivancic, Vugec, and Vukšic, 2019; Mohanty and Vyas, 2018). Industries are typically interested in being responsive to customers and markets (Mohanty and Vyas, 2018). Thus, most customer support applications need to be real-time, active, and highly robust to achieve high client satisfaction rates. The manual accomplishment of such tasks is often time-consuming, expensive, and error-prone. Furthermore, due to the massive amounts of textual data available for analysis, it is not feasible to process this data entirely by human means.


1.1 outline

This thesis covers two topics united by a broader theme, which is Intelligent Process Automation (IPA) utilizing statistical Natural Language Processing (NLP) methods.

The first block of the thesis (Chapter 2 – Chapter 4) addresses the challenge of implementing a conversational agent for Intelligent Process Automation (IPA) purposes within the e-learning field. As mentioned above, chatbots are one of the central applications of the IPA domain, since they can effectively perform time-consuming tasks requiring natural language understanding, allowing human workers to concentrate on more valuable tasks. The IPA conversational bot that was realized within this thesis takes care of routine and time-consuming tasks performed by human tutors within an online mathematical course. This bot is deployed in a real-world setting and interacts with students within the OMB+ mathematical platform. Conducting experiments, we explored two possibilities to build the conversational agent in industrial settings: first, with purely rule-based methods, considering the missing training data and individual aspects of the target domain (i.e., e-learning); second, we re-implemented two of the main system components (i.e., the Natural Language Understanding (NLU) and Dialogue Manager (DM) units) using a current state-of-the-art deep learning algorithm (i.e., Bidirectional Encoder Representations from Transformers (BERT)) and investigated their performance and potential use as part of a hybrid model (i.e., containing both rule-based and machine learning methods).

The second block of the thesis (Chapter 5 – Chapter 6) considers an Intelligent Process Automation (IPA) problem within the predictive analytics domain and addresses the task of scientific trend detection. Predictive analytics makes forecasts about future outcomes based on historical and current data. Hence, organizations can benefit from advanced analytics models by detecting trends and their behaviors and then employing this information while making crucial business decisions (i.e., investments). In this work, we deal with a sub-problem of the trend detection task: specifically, we address the lack of publicly available benchmarks for the evaluation of trend detection algorithms. Therefore, we assembled a benchmark for the detection of both trends and downtrends (i.e., topics that become less frequent over time). To the best of our knowledge, the task of downtrend detection has not been well addressed before. Furthermore, the resulting benchmark is based on a collection of more than one million documents, which is among the largest that have been used for trend detection, and therefore offers a realistic setting for developing trend detection algorithms.


All main chapters contain their own introduction, contribution list, related work section, and summary.


Part I

E - L E A R N I N G C O N V E R S A T I O N A L A S S I S T A N T


2 F O U N D A T I O N S

In this block, we address the challenge of designing and implementing a conversational assistant for Intelligent Process Automation (IPA) purposes within the e-learning domain. The system aims to support human tutors of the Online Mathematik Brückenkurs (OMB+) platform during the tutoring round by undertaking routine and time-consuming responsibilities. Specifically, the system intelligently automates the process of information accumulation and validation that precedes every conversation. Thereby, the IPA-bot frees up tutors and allows them to concentrate on more complicated and cognitively demanding tasks.

In this chapter, we introduce the fundamental concepts required for the topic of the first part of the thesis. We begin with the foundations of conversational assistants in Section 2.2, where we present the taxonomy of existing dialogue systems in Section 2.2.1, give an overview of the conventional dialogue architecture with its most essential components in Section 2.2.2, and describe the existing approaches for building a conversational agent in Section 2.2.3. We finally introduce the target domain for our experiments in Section 2.3.

2.1 introduction

Conversational assistants, a type of software that interacts with people via written or spoken natural language, have become widely popular in recent times. Today's conversational assistants support a broad range of everyday skills, including but not limited to creating a reminder, answering factoid questions, and controlling smart home devices.

Initially, the notion of an artificial companion capable of conversing in a natural language was anticipated by Alan Turing in 1949 (Turing, 1949). Still, the first significant step in this direction was made in 1966 with the appearance of Weizenbaum's system ELIZA (Weizenbaum, 1966). Its successor was ALICE (Artificial Linguistic Internet Computer Entity), a natural language processing dialogue system that conversed with a human by applying heuristic pattern-matching rules. Despite being remarkably popular and technologically advanced for their days, both systems were unable to pass the Turing test (Alan, 1950). Following a long silent period, the next wave of growth of intelligent agents was caused by advances in Natural Language Processing (NLP). This was enabled by access to low-cost memory, higher processing speed, vast amounts of available data, and sophisticated machine learning algorithms. Hence, one of the most notable leaps in the history of conversational agents began in the year 2011 with intelligent personal assistants: at that time, Apple's Siri, followed by Microsoft's Cortana and Amazon's Alexa in 2014, and then Google Assistant in 2016, came onto the scene. The main focus of these agents was not to mimic humans, as in ELIZA or ALICE, but to assist a human by answering simple factoid questions and setting notification tasks across a broad range of content. Another type of conversational agent, called Task-oriented Dialogue System (ToDS), became a focus of industries in the year 2016. For the most part, these bots aimed to support customer service (Cui et al., 2017) and respond to Frequently Asked Questions (Liao et al., 2018). Their conversational style provided more depth than that of personal assistants, but a narrower range, since they were patterned for task solving and not for open-domain discussions.

One of the central advantages of conversational agents, and the reason why they are attracting considerable interest, is their ability to give attention to several users simultaneously while supporting natural language communication. Due to this, conversational assistants gain momentum in numerous kinds of practical support applications (i.e., customer support) where a high number of users are involved at the same time. The classical engagement of human assistants is costly, and such support is usually not accessible full-time. Therefore, automating human assistance is desirable but also an ambitious task. Due to the lack of solid Natural Language Understanding (NLU), advancing beyond primary tasks is still challenging (Luger and Sellen, 2016), and even the most prosperous dialogue systems often fail to meet users' expectations (Jain et al., 2018). Significant challenges are involved in the design and implementation of a conversational agent, varying from domain engineering to Natural Language Understanding (NLU) and the design of an adaptive domain-specific conversational flow. The quintessential problems and how-to questions while building conversational assistants include:

• How to deal with the variability and flexibility of language?

• How to get high-quality in-domain data?

• How to build meaningful representations?

• How to integrate commonsense and domain knowledge?

• How to build a robust system?

Besides that, current Natural Language Processing (NLP) and Artificial Intelligence (AI) techniques have weaknesses when applied to domain-specific, task-oriented problems, especially if the underlying data comes from an industrial source. As machine learning techniques heavily rely on structured and cleaned training data to deliver consistent performance, typically noisy real-world data without access to additional knowledge databases negatively impacts the classification accuracy.


Despite that, conversational agents can be successfully employed to solve less cognitive but still highly relevant tasks. In this case, a dialogue agent can be operated as part of an Intelligent Process Automation (IPA) system, where it takes care of straightforward but routine and time-consuming functions performed in a natural language, while allowing a human worker to focus on more cognitively demanding processes.

2.2 conversational assistants

The development and design of a particular dialogue system usually vary by its objectives, among the most fundamental of which are:

• Purpose: Should it be an open-ended dialogue system (e.g., chit-chat), a task-oriented system (e.g., customer support), or a personal assistant (e.g., Google Assistant)?

• Skills: What kind of behavior should a system demonstrate?

• Data: Which data is required to develop (resp. train) specific blocks, and how can it be assembled and curated?

• Deployment: Will the final system be integrated into a particular software platform? What are the challenges in choosing suitable tools? How will the deployment happen?

• Implementation: Should it be a rule-based, information-retrieval, machine-learning, or hybrid system?

In the following, we take a look at the most common taxonomies of dialogue systems and at different design and implementation strategies.

2.2.1 Taxonomy

According to their goals, conversational agents can be classified into two main categories: task-oriented systems (e.g., for customer support) and social bots (e.g., for chit-chat purposes). Furthermore, systems also vary regarding their implementation characteristics. Below, we provide a detailed overview of the most common variations.

task-oriented dialogue systems are famous for their ability to lead a dialogue with the ultimate goal of solving a user's problem (see Figure 1). A customer support agent or a flight/restaurant booking system exemplifies this class of conversational assistants. Such systems are more meaningful and useful than those with an entirely social goal; however, they usually entail more complications during the implementation stage. Since Task-oriented Dialogue Systems (ToDS) require a precise understanding of the user's intent (i.e., goal), the Natural Language Understanding (NLU) segment must perform flawlessly.


Besides that, if applied in an industrial setting, ToDS are usually built modularly, which means they mostly rely on handcrafted rules and predefined templates and are restricted in their abilities. Personal assistants (i.e., Google Assistant) are members of the task-oriented family as well. Though, compared to standard agents, they involve more social conversation and can be used for entertainment purposes (e.g., "Ok Google, tell me a joke"). The latter is typically not available in conventional task-oriented systems.

social bots and chit-chat systems, in contrast, regularly do not pursue any specific goal and are designed for small-talk and entertainment conversations (see Figure 1). Those agents are usually end-to-end trainable and highly data-driven, which means they require large amounts of training data. However, due to the entirely learned training, such systems often produce inappropriate responses, making them rather unreliable for practical applications. Like chit-chat systems, social bots are rich in social conversation; however, they possess the ability to solve uncomplicated user requests and conduct a more meaningful dialogue (Fang et al., 2018). Those requests are not the same as in task-oriented systems and are mostly aimed at answering simple factoid questions.

Figure 1: Taxonomy of dialogue systems according to the measure of their task accomplishment and social contribution. Figure originates from Fang et al., 2018.


The aforementioned system types can be decomposed further according to their implementation characteristics:

open domain & closed domain: A Task-oriented Dialogue System (ToDS) is typically built within a closed-domain setting. The space of potential inputs and outputs is restricted, since the system attempts to achieve a specific goal. Customer support or shopping assistants are cases of closed-domain problems (Wen et al., 2016). These systems do not need to be able to talk about off-site topics, but they have to fulfill their particular tasks as efficiently as possible.

Chit-chat systems, in contrast, do not assume a well-defined goal or intention and thus can be built within an open domain. This means the conversation can progress in any direction and without any, or with only a minimal, functional purpose (Serban et al., 2016). However, the infinite number of possible topics and the fact that a certain amount of commonsense is required to create reasonable responses make the open-domain setting a hard problem.

content-driven & user-centric: Social bots and chit-chat systems typically utilize a content-driven dialogue manner. That means that their responses are based on large and dynamic content derived from daily web mining and knowledge graphs. This approach builds a dialogue based on trendy content from diverse sources, and thus it is an ideal way for social bots to catch users' attention and draw a user into the dialogue.

User-centric approaches, in turn, assume precise language understanding to detect the sentiment of users' statements. According to the sentiment, the dialogue manager learns users' personalities, handles rapid topic changes, and tracks engagement. In this case, the dialogue builds on specific user interests.

long & short conversations: Models with a short conversation style attempt to create a single response to a single input (e.g., a weather forecast). In the case of a long conversation, a system has to go through multiple turns and to keep track of what has been said before (i.e., the dialogue history). The longer the conversation, the more challenging it is to train a system. Customer support conversations are typically long conversational threads with multiple questions.

response generation: One of the essential elements of a dialogue system is response generation. This can be done either in a retrieval-based manner or by using generative models (i.e., Natural Language Generation (NLG)).

The former usually utilizes predefined responses and heuristics to select a relevant response based on the input and context. The heuristic could be a rule-based expression match (i.e., Regular Expressions (RegEx)) or an ensemble of machine learning classifiers (e.g., entity extraction, prediction of the next action). Such systems do not generate any original text; instead, they select a response from a fixed set of predefined candidates. These models are usually used in an industrial setting, where the system must show high robustness, performance, and interpretability of the outcomes.

The latter, in contrast, do not rely on predefined replies. Since such models are typically based on (deep) machine learning techniques, they attempt to generate new responses from scratch. This is, however, a complicated task and is mostly used for vanilla chit-chat systems or in a research domain where structured training data is available.

2.2.2 Conventional Architecture

In this and the following subsection, we review the pipeline and methods for Task-oriented Dialogue Systems (ToDS). Such agents are typically composed of several components (Figure 2) addressing the diverse tasks required to facilitate a natural language dialogue (Williams, Raux, and Henderson, 2016).

Figure 2: Example of a conventional dialogue architecture with its main components in the bounding box: Language Understanding (NLU), Dialogue Manager (DM), Response Generation (NLG). Figure originates from the NAACL 2018 tutorial, Su et al., 2018.


A conventional dialogue architecture implements this with the following components:

natural language understanding unit has the following objectives: it parses the user input, classifies the domain (i.e., customer support, restaurant booking; not required for single-domain systems) and intent (i.e., find-flight, book-taxi), and extracts entities (i.e., Italian food, Berlin).

Generally, the NLU unit maps every user utterance into semantic frames that are predefined according to different scenarios. Table 1 illustrates an example of a sentence representation, where the departure location (Berlin) and the arrival location (Munich) are specified as slot values, and the domain (airline travel) and intent (find-flight) are specified as well. Typically, there are two kinds of possible representations: the first one is the utterance category, represented in the form of the user's intent; the second is word-level information extraction in the form of Named Entity Recognition (NER) or Slot Filling tasks.

Sentence        Find   flight   from   Berlin   to   Munich   today
Slots/Concepts  O      O        O      B-dept   O    B-arr    B-date
Named Entity    O      O        O      B-city   O    B-city   O
Intent          Find_Flight
Domain          Airline Travel

Table 1: An illustrative example of a sentence representation.

A dialogue act classification sub-module, also known as intent classification, is dedicated to detecting the primary user goal at every dialogue state. It labels each user utterance with one of the predefined intents. Deep neural networks can be directly applied to this conventional classification problem (Khanpour, Guntakandla, and Nielsen, 2016; Lee and Dernoncourt, 2016).

Slot filling is another sub-module for language understanding. Unlike intent detection (i.e., a classification problem), slot filling is defined as a sequence labeling problem, where words in the sequence are labeled with semantic tags. The input is a sequence of words, and the output is a sequence of slots, one for every word. Variations of Recurrent Neural Network (RNN) architectures (LSTM, BiLSTM) (Deoras et al., 2015) and (attention-based) encoder-decoder architectures (Kurata et al., 2016) have been employed for this task.
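To make the representation in Table 1 concrete, the following minimal Python sketch (purely illustrative; the field names and the helper function are not taken from any specific system) shows how an utterance, its BIO slot tags, and the intent/domain labels could be combined into a semantic frame, and how slot values can be read off the tags:

    # Hypothetical semantic frame for the utterance in Table 1.
    # Slot labels follow the BIO scheme: "B-" marks the beginning of a slot,
    # "O" marks tokens that do not belong to any slot.
    utterance = ["Find", "flight", "from", "Berlin", "to", "Munich", "today"]

    semantic_frame = {
        "domain": "airline_travel",
        "intent": "find_flight",
        "slots": ["O", "O", "O", "B-dept", "O", "B-arr", "B-date"],
    }

    def extract_slot_values(tokens, slot_tags):
        """Collect the surface value for every labeled slot (BIO scheme)."""
        values = {}
        for token, tag in zip(tokens, slot_tags):
            if tag.startswith("B-"):
                values[tag[2:]] = token
        return values

    print(extract_slot_values(utterance, semantic_frame["slots"]))
    # -> {'dept': 'Berlin', 'arr': 'Munich', 'date': 'today'}

In these terms, intent classification amounts to predicting the "intent" field, while slot filling predicts the per-token tag sequence.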


dialogue manager unit is subdivided into two components: the first is the Dialogue State Tracker (DST), which holds the conversational state, and the second is the Policy Learning (PL) unit, which defines the system's next action.

The Dialogue State Tracker (DST) is the core component for guaranteeing a robust dialogue flow, since it manages the user's intent at every turn. The conventional methods, which are broadly applied in commercial implementations, often adopt handcrafted rules to pick the most probable candidate among the predefined ones (Goddeau et al., 1996; Wang and Lemon, 2013). However, a variety of statistical approaches have emerged since the Dialogue State Tracking Challenge (DSTC; now the Dialog System Technology Challenge) was arranged by Jason D. Williams in the year 2013. Among the deep-learning-based approaches is the Belief Tracker proposed by Henderson, Thomson, and Young (2013); it employs a sliding window to produce a sequence of probability distributions over an arbitrary number of possible values (Chen et al., 2017). In the work of Mrkšić et al. (2016), the authors introduced a Neural Belief Tracker to identify slot-value pairs. The system used the user intents preceding the user input, the user utterance itself, and a candidate slot-value pair, which it needs to decide about, as the input. The system then iterated over all candidate slot-value pairs to determine which ones have just been expressed by the user (Chen et al., 2017). Finally, Vodolán, Kadlec, and Kleindienst proposed a Hybrid Dialog State Tracker that achieved state-of-the-art performance on the DSTC2 shared task; their model used a separately learned decoder coupled with a rule-based system.

Based on the state representation from the Dialogue State Tracker (DST), the PL unit generates the next system action, which is then decoded by the Natural Language Generation (NLG) module. Usually, a rule-based agent is used to initialize the system (Yan et al., 2017), and then supervised learning is applied to the actions generated by these rules. Alternatively, the dialogue policy can be trained end-to-end with reinforcement learning, optimizing the system's policy toward the final performance (Cuayáhuitl, Keizer, and Lemon, 2015).

natural language generation unit transforms the next action into a natural language response. Conventional approaches to Natural Language Generation (NLG) typically adopt a template-based style, where a set of rules is defined to map every next action to a predefined natural language response. Such approaches are simple, generally error-free, and easy to control; however, their implementation is time-consuming, and the style is repetitive and hardly scalable.
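As a minimal sketch of the template-based style just described (the action names and templates below are invented for illustration and do not come from a particular system), the mapping from a predicted next action to a surface response can be a simple lookup with slot substitution:

    # Hypothetical action-to-template mapping for a template-based NLG unit.
    templates = {
        "request_departure_city": "From which city would you like to depart?",
        "inform_flight_found": "I found a flight from {dept} to {arr} on {date}.",
        "confirm_booking": "Should I book the flight to {arr} for you?",
    }

    def generate_response(action: str, slots: dict) -> str:
        """Map the system's next action (plus extracted slots) to a response."""
        template = templates.get(action, "Sorry, I did not understand that.")
        return template.format(**slots)

    print(generate_response("inform_flight_found",
                            {"dept": "Berlin", "arr": "Munich", "date": "today"}))
    # -> "I found a flight from Berlin to Munich on today."

Such a lookup is exactly what makes template-based NLG easy to control, but also repetitive and hard to scale, as noted above.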

Among the deep learning techniques is the work by Wen et al. (2015), which introduced neural network-based approaches to NLG with an LSTM-based structure. Later, Zhou and Huang (2016) adopted this model along with the question information, semantic slot values, and dialogue act types to generate more precise answers. Finally, Wu, Socher, and Xiong (2019) proposed Global to Local Memory Pointer Networks (GLMP), an architecture that incorporates knowledge bases directly into a learning framework (due to the Memory Networks component) and is able to re-use information from the user input (due to the Pointer Networks component).

In particular cases, the dialogue flow can include speech recognition and speech synthesis units (if managing a spoken natural language system) and connections to third-party APIs to request information stored in external databases (e.g., a weather forecast), as depicted in Figure 2.

2.2.3 Existing Approaches

As stated above, a conventional task-oriented dialogue system is implemented in a pipeline manner. That means that once the system receives a user query, it interprets it and acts according to the dialogue state and the corresponding policy. Additionally, based on its understanding, it may access the knowledge base to find the demanded information there. Finally, the system transforms its next action (and, if given, the extracted information) into its surface form as a natural language response (Monostori et al., 1996).

The individual components (i.e., Natural Language Understanding (NLU), Dialogue Manager (DM), Natural Language Generation (NLG)) of a particular conversational system can be implemented using different approaches, starting with entirely rule- and template-based methods and going towards hybrid approaches (using learnable components along with handcrafted units) and end-to-end trainable machine learning methods.

rule-based approaches: Though many of the latest research approaches handle Natural Language Understanding (NLU) and Natural Language Generation (NLG) units by using statistical Natural Language Processing (NLP) models (Bocklisch et al., 2017; Burtsev et al., 2018; Honnibal and Montani, 2017), most industrially deployed dialogue systems still utilize handcrafted features and rules for the state tracking, action prediction, intent recognition, and slot filling tasks (Chen et al., 2017; Ultes et al., 2017). For instance, most modules of the PyDial framework (http://www.camdial.org/pydial) offer rule-based implementations using Regular Expressions (RegEx), rule-based dialogue trackers (Henderson, Thomson, and Williams, 2014), and handcrafted policies. Also, in order to map the next action to text, Ultes et al. (2017) suggest rules along with template-based response generation.


The rule-based approach ensures robustness and stable performance, which is crucial for industrial systems that communicate with a large number of users simultaneously. However, it is highly expensive and time-consuming to deploy a real dialogue system built in that manner. The major disadvantage is that the usage of handcrafted systems is restricted to a specific domain, and possible adaptation requires extensive manual engineering.

end-to-end learning approaches: Due to the recent advance of end-to-end neural generative models (Collobert et al., 2011), many efforts have been made to build an end-to-end trainable architecture for conversational agents. Rather than using the traditional pipeline (see Figure 2), the end-to-end model is conceived as a single module (Chen et al., 2017).

Wen et al. (2016), followed by Bordes, Boureau, and Weston (2016), were among the pioneers who adapted an end-to-end trainable approach for conversational systems. Their approach was to treat dialogue learning as the problem of learning a mapping from dialogue histories to system responses and to apply an encoder-decoder model (Sutskever, Vinyals, and Le, 2014) to train the entire system. The proposed model still had significant limitations (Glasmachers, 2017), such as missing policies to learn a dialogue flow and the inability to incorporate additional knowledge, which is especially crucial for training a task-oriented agent. To overcome the first problem, Dhingra et al. (2016) introduced an end-to-end reinforcement learning approach that jointly trains the dialogue state tracker and policy learning units. The second weakness was addressed by Eric and Manning (2017), who augmented existing recurrent network architectures with a differentiable attention-based key-value retrieval mechanism over the entries of a knowledge base. This approach was inspired by Key-Value Memory Networks (Miller et al., 2016). Later on, Wu, Socher, and Xiong (2019) proposed Global to Local Memory Pointer Networks (GLMP), where the authors incorporated knowledge bases directly into a learning framework. In their model, a global memory encoder and a local memory decoder are employed to share external knowledge. The encoder encodes the dialogue history, modifies the global contextual representation, and generates a global memory pointer. The decoder first produces a sketch response with empty slots; next, it uses the global memory pointer to filter the external knowledge for relevant information and then instantiates the slots via the local memory pointers.

Despite having better adaptability compared to any rule-based system and being simpler to train, end-to-end approaches remain unattainable for commercial conversational agents operating on real-world data. A carefully constructed task-oriented dialogue system in an observed domain, using handcrafted rules (in the NLU and DST units) and predefined responses for NLG, still outperforms end-to-end systems due to its robustness (Glasmachers, 2017; Wu, Socher, and Xiong, 2019).

hybrid approaches: Though end-to-end learning is an attractive solution for conversational systems, current techniques are data-intensive and require large amounts of dialogues to learn simple behaviors. To overcome this barrier, Williams, Asadi, and Zweig (2017) introduce Hybrid Code Networks (HCNs), which are an ensemble of retrieval and trainable units. The system utilizes a Recurrent Neural Network (RNN) for the Natural Language Understanding (NLU) block, domain-specific knowledge that is encoded as software (i.e., API calls), and custom action templates used for the Natural Language Generation (NLG) unit. Compared to existing end-to-end methods, the authors report that their approach considerably reduces the amount of data required for training (Williams, Asadi, and Zweig, 2017). According to the authors, HCNs achieve state-of-the-art performance on the bAbI dialog dataset (Bordes, Boureau, and Weston, 2016) and outperform two commercial customer dialogue systems (internal Microsoft Research datasets in the customer support and troubleshooting domains were employed for training and evaluation).

Along with this work, Wen et al. (2016) propose a neural network-based model that is end-to-end trainable but still modularly connected. In their work, the authors treat dialogue as a sequence-to-sequence mapping problem (Sutskever, Vinyals, and Le, 2014), augmented with the dialogue history and the current database search outcome (Wen et al., 2016). The authors explain the learning process as follows: at every turn, the system takes a user input represented as a sequence of tokens and converts it into two internal representations, a distributed representation of the intent and a probability distribution over slot-value pairs called the belief state (Young et al., 2013). The database operator then picks the most probable values in the belief state to form the search result. At the same time, the intent representation and the belief state are transformed and combined by a Policy Learning (PL) unit to form a single vector representing the next system action. This system action is then used to generate a sketch response token by token. The final system response is composed by substituting the actual values of the database entries into the sketch sentence structure (Wen et al., 2016).

Hybrid models appear set to replace the established rule- and template-based approaches currently utilized in industrial settings. The performance of most Natural Language Understanding (NLU) components (i.e., Named Entity Recognition (NER), domain and intent classification) is sufficiently high to apply them for the development of commercial conversational agents, whereas the Natural Language Generation (NLG) unit can still be realized as a template-based solution by employing predefined response candidates and predicting the next system action with deep learning methods.



2.3 target domain & task definition

MUMIE (https://www.mumie.net) is an open-source e-learning platform for studying and teaching mathematics and computer science. The Online Mathematik Brückenkurs (OMB+) is one of the projects powered by MUMIE. Specifically, the Online Mathematik Brückenkurs (OMB+, https://www.ombplus.de) is a German online learning platform that assists students who are preparing for an engineering or computer science study at a university.

The course's central purpose is to assist students in reviving their mathematical skills so that they can follow the upcoming university courses. This online course is thematically segmented into 13 parts and includes free mathematical classes with theoretical and practical content. Every chapter begins with an overview of its sections and consists of explanatory texts with built-in videos, examples, and quick checks of understanding. Furthermore, various examination modes to practice the content (i.e., training, exercises) and to check the understanding of the theory (i.e., quiz) are available. Besides that, OMB+ provides the possibility to get assistance from a human tutor. Usually, the students and tutors interact via a chat interface in written form. The language of communication is German.

The current dilemma of the OMB+ platform is that the number of students grows significantly every year, but it is challenging to find qualified tutors, and it is expensive to hire more human tutors. This results in a longer waiting period for students until their problems can be considered and processed. In general, all student questions can be grouped into three main categories: organizational questions (i.e., course certificate), contextual questions (i.e., content, theorem), and mathematical questions (exercises, solutions). To assist a student with a mathematical question, a tutor has to know the following regular information: the topic (or sub-topic) the student has a problem with; the examination mode (quiz, chapter-level training or exercise, section-level training or exercise, or final examination) the student is currently working in; and, finally, the exact question number and exact problem formulation. This means that a tutor has to ask the same onboarding questions every time a new conversation starts. This is, however, very time-consuming and can be successfully addressed within an Intelligent Process Automation (IPA) system.

The work presented in Chapter 3 and Chapter 4 was enabled by the collaboration with the MUMIE and Online Mathematik Brückenkurs (OMB+) team and addresses the abovementioned problem. Within Chapter 3, we implemented an Intelligent Process Automation (IPA) dialogue system with rule- and retrieval-based algorithms that interacts with students and performs the onboarding. This system is multipurpose: on the one hand, it eases the assistance process for human tutors by reducing repetitive and time-consuming activities and therefore allows workers to focus on more challenging tasks; on the other hand, by interacting with students, it augments the resources with structured and labeled training data. The latter, in turn, allowed us to re-train specific system components utilizing machine and deep learning methods. We describe this procedure in Chapter 4.

We report that the earlier collected human-to-human data from the OMB+ platform could not be used for training a machine learning system, since it is highly unstructured and contains no direct question-answer matching, which is crucial for trainable conversational algorithms. We analyzed these conversations to obtain insights from them, which we then used to define the dialogue flow.

We also do not consider the automation of mathematical questions in this work, since this task would require not only precise language understanding and extensive commonsense knowledge but also the accurate perception of mathematical notation. The tutoring process at OMB+ differs considerably from standard tutoring methods: here, instructors attempt to guide students by giving hints and tips instead of providing an out-of-the-box solution. Existing human-to-human dialogues revealed that this kind of task can be challenging even for human tutors, since it is not always directly perceptible what precisely the student does not understand or has difficulties with. Furthermore, dialogues containing mathematical explanations can last for a long time, and the topic (i.e., intent and final goal) can shift multiple times during the conversation.

To our knowledge, this task would not be feasible to solve sufficiently well with current machine learning algorithms, especially considering the missing training data. A rule-based system would not be the right solution for this purpose either, due to the large number of complex rules which would need to be defined. Thus, in this work, we focus on the automation of less cognitively demanding tasks, which are nevertheless relevant and must be solved in the first instance to ease the tutoring process for instructors.

2.4 summary

Conversational agents are a highly demanded solution for industries due to their potential ability to support a large number of users in parallel while performing a flawless natural language conversation. Many messenger platforms have been created, and custom chatbots in various constellations emerge daily. Unfortunately, most of the current conversational agents are often disappointing to use due to the lack of substantial natural language understanding and reasoning. Furthermore, Natural Language Processing (NLP) and Artificial Intelligence (AI) techniques have limitations when applied to domain-specific and task-oriented problems, especially if no structured training data is available.

Despite that, conversational agents can be successfully applied to solve less cognitive but still highly relevant tasks, that is, within the Intelligent Process Automation (IPA) domain. Such IPA-bots undertake tedious and time-consuming responsibilities to free up the knowledge worker for more complex and intricate tasks. IPA-bots can be seen as members of the goal-oriented dialogue family and are, for the most part, realized either in a rule-based manner or through hybrid models when it comes to a real-world setting.


3 M U L T I P U R P O S E I P A V I A C O N V E R S A T I O N A L A S S I S T A N T

This chapter partially covers work already published at international peer-reviewed conferences. The relevant publication is Moiseeva et al., 2020. The research described in this chapter was carried out in its entirety by the author of this thesis; the other authors of the publication acted as advisers. At the time of writing this thesis, the system described in this work was deployed at the Online Mathematik Brückenkurs (OMB+) platform.

As we outlined in the introduction, Intelligent Process Automation (IPA) is an emerging technology with the primary goal of assisting the knowledge worker by taking care of repetitive, routine, and low-cognitive tasks. Conversational assistants that interact with users in a natural language are a potential application for the IPA domain. Such smart virtual agents can support the user by responding to various inquiries and performing routine tasks carried out in a natural language (i.e., customer support).

In this work, we tackle the challenge of implementing an IPA conversational agent in a real-world industrial context and under conditions of missing training data. Our system has two meaningful benefits: first, it decreases monotonous and time-consuming tasks and hence lets workers concentrate on more intellectual processes; second, by interacting with users, it augments the resources with structured and labeled training data. The latter, in turn, can be used to retrain the system utilizing machine and deep learning methods.

3.1 introduction

Conversational dialogue systems can give attention to several users simultaneously while supporting natural communication. They are thus exceptionally needed for practical assistance applications where a high number of users are involved at the same time. Regularly, dialogue agents require a lot of natural language understanding and commonsense knowledge to hold a conversation reasonably and intelligently. Therefore, it is nowadays still challenging to completely substitute human assistance with an intelligent agent. However, a dialogue system can be successfully employed as part of an Intelligent Process Automation (IPA) application. In this case, it takes care of simple but routine and time-consuming tasks performed in a natural language, while allowing a human worker to focus on more cognitively demanding processes.

Recent research in the dialogue generation domain is conducted by employing Artificial Intelligence (AI) techniques like machine and deep learning (Lowe et al., 2017; Wen et al., 2016). However, those methods require high-quality data for training, which is often not directly available for domain-specific and industrial problems. Even though companies gather and store vast volumes of data, it is often of low quality with regard to machine learning approaches (Daniel, 2015), especially if it concerns dialogue data, which has to be properly structured as well as annotated. If trained on artificially created datasets (which is often the case in a research setting), the systems can solve only specific problems and are hardly adaptable to other domains. Hence, despite the popularity of deep learning end-to-end models, one still needs to rely on traditional pipelines in practical dialogue engineering, mainly while introducing a new domain.

3.1.1 Outline and Contributions

This work addresses the challenge of implementing a dialogue system for IPA purposes within the practical e-learning domain under conditions of missing training data. The chapter is structured as follows: in Section 3.2, we introduce our method and describe the design of the rule-based dialogue architecture with its main components, such as the Natural Language Understanding (NLU), Dialogue Manager (DM), and Natural Language Generation (NLG) units. In Section 3.3, we present the results of our dual-focused evaluation, and Section 3.4 describes the format of the collected structured data. Finally, we conclude in Section 3.5.

Our contributions within this work are as follows:

• We implemented a robust conversational system for IPA purposes within a practical e-learning domain and under the conditions of missing training (i.e., dialogue) data.

• The system has two main objectives:

  – First, it reduces repetitive and time-consuming activities and allows workers of the e-learning platform to focus solely on mathematical inquiries.

  – Second, by interacting with users, it augments the resources with structured and partially labeled training data for further implementation of trainable dialogue components.


3.2 model

The central purpose of the introduced system is to interact with students at the beginning of every chat conversation and to gather information on the topic (and sub-topic), examination mode and level, question number, and exact problem formulation. Such an onboarding process saves time for human tutors and allows them to handle mathematical questions exclusively. Furthermore, the system is realized in such a way that it accumulates labeled and structured dialogues in the background.

Figure 4 displays the complete conversation flow. After the system accepts user input, it analyzes it at different levels and extracts information if such is provided. If some of the information required for tutoring is missing, the system asks the student to provide it. When all the information points are collected, they are automatically validated and forwarded to a human tutor. The latter can then directly proceed with the assistance process. In the following, we explain the key components of the system.

3.2.1 OMB+ Design

Figure 3 illustrates the internal structure and design of the OMB+ platform. It has topics and sub-topics, as well as four different examination modes. Each topic (Figure 3, tag 1) corresponds to a chapter level and always has sub-topics (Figure 3, tag 2) that correspond to a section level. The examination modes training and exercise are ambiguous, because they correspond to either a chapter (Figure 3, tag 3) or a section (Figure 3, tag 5) level, and it is important to differentiate between them, since they contain different types of content. The mode final examination (Figure 3, tag 4) always corresponds to a chapter level, whereas quiz (Figure 3, tag 5) can belong only to a section level. According to the design of the OMB+ platform, there are several ways in which a possible dialogue flow can proceed.

3.2.2 Preprocessing

In a natural language conversation, one may respond in various forms, making the extraction of data from user-generated text challenging due to potential misspellings or confusable spellings (i.e., Aufgabe 1a, Aufgabe 1 (a)). Hence, to facilitate a substantial recognition of intents and extraction of entities, we normalize (e.g., correct misspellings, detect synonyms) and preprocess every user input before moving forward to the Natural Language Understanding (NLU) module.


Figure 3: OMB+ online learning platform, where 1 is the Topic (corresponds to a chapter level), 2 is a Sub-Topic (corresponds to a section level), 3 is the chapter-level examination mode, 4 is the Final Examination mode (available only at chapter level), 5 are the examination modes Exercise, Training (available at section level) and Quiz (available only at section level), and 6 is the OMB+ Chat.

We perform both linguistic and domain-specific preprocessing, which includes the following steps (a small illustrative sketch follows the list):

• lowercasing and stemming of words in the input query

• removal of stop words and punctuation

• all mentions of "x" in mathematical formulations are removed to avoid confusion with the Roman numeral 10 ("X")

• in a combination of the type word "Kapitel"/"Aufgabe" + digit written as a word (i.e., "eins", "zweites", etc.), the word is replaced with a digit ("Kapitel eins" → "Kapitel 1", "im fünften Kapitel" → "im Kapitel 5"); Roman numerals are replaced with a digit as well ("Kapitel IV" → "Kapitel 4")

• detected ambiguities are normalized (e.g., "Trainingsaufgabe" → "Training")

• recognized misspellings resp. typing errors are corrected (e.g., "Difeernzialrechnung" → "Differentialrechnung")

• permalinks are parsed and analyzed; from each permalink it is possible to extract the topic, examination mode, and question number
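The following minimal Python sketch illustrates how a few of these normalization steps could be chained before the NLU stage; the word lists and regular expressions are simplified stand-ins and do not reproduce the exact rules used in the deployed system:

    import re

    # Simplified stand-ins for the domain-specific normalization rules described above.
    NUMBER_WORDS = {"eins": "1", "zwei": "2", "drei": "3", "vier": "4", "fünf": "5"}
    ROMAN_NUMERALS = {"I": "1", "II": "2", "III": "3", "IV": "4", "V": "5"}
    SYNONYMS = {"trainingsaufgabe": "training"}
    MISSPELLINGS = {"difeernzialrechnung": "differentialrechnung"}

    def preprocess(query: str) -> str:
        text = query.lower()
        # Replace number words after "kapitel"/"aufgabe" with digits, e.g. "kapitel eins" -> "kapitel 1".
        for word, digit in NUMBER_WORDS.items():
            text = re.sub(rf"\b(kapitel|aufgabe)\s+{word}\b", rf"\g<1> {digit}", text)
        # Replace Roman numerals, e.g. "kapitel iv" -> "kapitel 4".
        for roman, digit in ROMAN_NUMERALS.items():
            text = re.sub(rf"\b(kapitel|aufgabe)\s+{roman.lower()}\b", rf"\g<1> {digit}", text)
        # Normalize known ambiguities and correct known misspellings.
        for source, target in {**SYNONYMS, **MISSPELLINGS}.items():
            text = text.replace(source, target)
        return text

    print(preprocess("Im Kapitel eins habe ich eine Frage zur Difeernzialrechnung"))
    # -> "im kapitel 1 habe ich eine frage zur differentialrechnung"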


3.2.3 Natural Language Understanding

We implement the Natural Language Understanding (NLU) unit utilizing handcrafted rules, Regular Expressions (RegEx), and the Elasticsearch API (https://www.elastic.co/products/elasticsearch).

intent classification: According to the design of the OMB+ platform, all student questions can be classified into three categories: organizational questions (i.e., course certificate), contextual questions (i.e., content, theorem), and mathematical questions (exercises, solutions). To classify the input query by its intent (i.e., category), we employ weighted keyword information in the form of handcrafted rules. We assume that specific words are explicitly associated with a corresponding intent (e.g., theorem or root denote a mathematical question; feedback or certificate denote an organizational inquiry). If no intent could be assigned, then it is assumed that the NLU unit was not capable of understanding, and the intent is unknown. In this case, the virtual agent requests the user to provide an intent manually by picking one from the mentioned three options. The queries from the organizational and contextual categories are directly handed over to a human tutor, while mathematical questions are analyzed by the automated agent for further information extraction.
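As a rough illustration of such weighted keyword rules (the keywords, weights, and threshold below are invented for this example and do not reproduce the deployed rule set), intent scoring can be realized as a weighted lookup with an unknown fallback:

    # Hypothetical weighted keyword rules for the three intent categories.
    KEYWORD_WEIGHTS = {
        "mathematical": {"theorem": 2.0, "wurzel": 2.0, "aufgabe": 1.0, "lösung": 1.0},
        "organizational": {"zertifikat": 2.0, "feedback": 1.5, "anmeldung": 1.0},
        "contextual": {"kapitel": 1.0, "inhalt": 1.0, "definition": 1.5},
    }

    def classify_intent(tokens, threshold=1.0):
        """Return the best-scoring intent, or 'unknown' if no rule fires strongly enough."""
        scores = {
            intent: sum(weight for token in tokens
                        for keyword, weight in keywords.items() if keyword in token)
            for intent, keywords in KEYWORD_WEIGHTS.items()
        }
        best_intent, best_score = max(scores.items(), key=lambda item: item[1])
        return best_intent if best_score >= threshold else "unknown"

    print(classify_intent(["wie", "löse", "ich", "aufgabe", "3"]))   # -> "mathematical"
    print(classify_intent(["hallo", "zusammen"]))                    # -> "unknown"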

entity extraction: In the next step, the system retrieves the entities from a user message. We specify the following five prerequisite entities, which have to be collected before the dialogue can be handed over to a human tutor: topic, sub-topic, examination mode and level, and question number. This part is implemented using ElasticSearch (ES) and RegEx. To facilitate the use of ES, we indexed the OMB+ web pages in an internal database. Besides indexing the titles of topics and sub-topics, we also provided supplementary information on possible synonyms and writing styles. We additionally filed OMB+ permalinks, which point to the site pages. To query the resulting database, we employ the internal Elasticsearch multi_match function and set the minimum_should_match parameter to 20%. This parameter specifies the number of terms that must match for a document to be considered relevant. Besides that, we adopted fuzziness with the maximum edit distance set to 2 characters; the fuzzy query uses similarity based on the Levenshtein edit distance (Levenshtein, 1966). Finally, the system produces a ranked list of potential matching entries found in the database within the predefined relevance threshold (we set it to θ = 1.5). We then pick the most probable entry as the right one and select the corresponding entity from the user input.
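The snippet below sketches what such a query could look like with the official Python Elasticsearch client; the index name, field names, and returned document structure are placeholders, and only the multi_match, minimum_should_match, and fuzziness settings mirror the configuration described above:

    from elasticsearch import Elasticsearch

    es = Elasticsearch()  # assumes a locally running Elasticsearch instance

    def lookup_entity(user_text, index="omb_topics", threshold=1.5):
        """Query the indexed OMB+ titles/synonyms and return the best hit above the threshold."""
        body = {
            "query": {
                "multi_match": {
                    "query": user_text,
                    "fields": ["title", "synonyms"],   # placeholder field names
                    "minimum_should_match": "20%",
                    "fuzziness": 2,                    # max. Levenshtein edit distance
                }
            }
        }
        response = es.search(index=index, body=body)
        hits = response["hits"]["hits"]
        if hits and hits[0]["_score"] >= threshold:
            return hits[0]["_source"]  # e.g. {"title": "Differentialrechnung", ...}
        return None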

Overall, the Natural Language Understanding (NLU) module receives the user input as preprocessed text and examines it against all predefined RegEx statements and for a match in the ElasticSearch (ES) database. We use ES to retrieve the entities for the topic, sub-topic, and examination mode, whereas RegEx is used to extract the information on the question number. Every time an entity is extracted, it is filled into the Informational Dictionary (ID). The ID has the following six slots to be filled in: topic, sub-topic, examination level, examination mode, question number, and exact problem formulation (this information is requested in further steps).

1 https://www.elastic.co/products/elasticsearch

3.2.4 Dialogue Manager

A conventional Dialogue Manager (DM) consists of two interacting components. The first component is the Dialogue State Tracker (DST), which maintains a representation of the current conversational state, and the second is Policy Learning (PL), which defines the next system action (i.e., response).

In our model, every next action of the agent is determined by the state of the previously obtained information accumulated in the Informational Dictionary (ID). For instance, if the system recognizes that the student works on the final examination, it also knows (defined by hand-coded logic in the predefined rules) that there is no necessity to ask for a sub-topic, because the final examination always corresponds to the chapter level (due to the layout of the OMB+ platform). If the system recognizes that the user struggles with solving a quiz, it has to ask for both the corresponding topic and sub-topic, because a quiz always relates to the section level.

To discover all of the potential conversational flows, we implement Mutually Exclusive Rules (MER), which indicate that two events e1 and e2 are mutually exclusive or disjoint since they cannot both occur at the same time (i.e., the intersection of these events is empty: P(e1 ∩ e2) = 0). Additionally, we defined transition and mapping rules. Hence, only particular entities can co-occur, while others must be excluded from the specific rule combination. Assume a topic t = Geometry. According to the MER, this topic can co-occur with one of the pertaining sub-topics defined in the OMB+ (i.e., s = Angle). Thus, the level of the examination would be l = Section (but l ≠ Chapter), because the student already reported a sub-section he or she is working on. In this specific case, the examination mode could be e ∈ [Training, Exercise, Quiz], but not e = Final_Examination. This is because e = Final_Examination can co-occur only with a topic (chapter level), but not with a sub-topic (section level). The question number q can be arbitrary and has to co-occur with every other combination.


This allows a formal solution to be found. Assume a list of all theoretically possible dialogue states S = [topic, sub-topic, training, exercise, chapter level, section level, quiz, final examination, question number], and for each element s_n in S it holds that

s_n = \begin{cases} 1 & \text{if information is given} \\ 0 & \text{otherwise} \end{cases} \qquad (1)

This gives all general (resp. possible) dialogue states without reference to the design of the OMB+ platform. However, to make the dialogue states fully suitable for the OMB+, from the general states we select only those which are valid. To define the validity of a state, we specify the following five MERs.

R1: Topic

¬T
 T

Table 2: Rule 1 – Admissible topic configurations

Rule (R1) in Table 2 denotes the admissible configurations for the topic and means that the topic could be either given (T) or not (¬T).

R2: Examination Mode

¬TR  ¬E  ¬Q  ¬FE
 TR  ¬E  ¬Q  ¬FE
¬TR   E  ¬Q  ¬FE
¬TR  ¬E   Q  ¬FE
¬TR  ¬E  ¬Q   FE

Table 3: Rule 2 – Admissible examination mode configurations

Rule (R2) in Table 3 denotes that either no information on the examination mode is given, or the examination mode is Training (TR), Exercise (E), Quiz (Q), or Final Examination (FE), but not more than one mode at the same time.

R3: Level

¬CL  ¬SL
 CL  ¬SL
¬CL   SL

Table 4: Rule 3 – Admissible level configurations

Rule (R3) in Table 4 indicates that either no level information is provided, or the level corresponds to the chapter level (CL) or to the section level (SL), but not to both at the same time.

[ September 4 2020 at 1017 ndash classicthesis ]

28 multipurpose ipa via conversational assistant

R4: Examination & Level

¬TR  ¬E  ¬SL  ¬CL
 TR  ¬E   SL  ¬CL
 TR  ¬E  ¬SL   CL
 TR  ¬E  ¬SL  ¬CL
¬TR   E   SL  ¬CL
¬TR   E  ¬SL   CL
¬TR   E  ¬SL  ¬CL

Table 5: Rule 4 – Admissible examination mode (only for Training and Exercise) and corresponding level configurations

Rule (R4) in Table 5 means that the Training (TR) and Exercise (E) examination modes can belong either to the chapter level (CL) or to the section level (SL), but not to both at the same time.

R5: Topic & Sub-Topic

¬T  ¬ST
 T  ¬ST
¬T   ST
 T   ST

Table 6: Rule 5 – Admissible topic and corresponding sub-topic configurations

Rule (R5) in Table 6 symbolizes that we could be given either only a topic (T), the combination of a topic and a sub-topic (ST) at the same time, only a sub-topic, or no information on this point at all.

We then define a valid dialogue state as a dialogue state that meets all requirements of the abovementioned rules:

\forall s\, (State(s) \land R_{1-5}(s)) \rightarrow Valid(s) \qquad (2)

Once we have the valid states for dialogues, we perform a mapping from each valid dialogue state to the next possible system action. For that, we first define five transition rules; the order of the transition rules is important.

T_1 = \neg T \qquad (3)

means that no topic (T) is found in the ID (i.e., it could not be extracted from the user input),

T_2 = \neg EM \qquad (4)

indicates that no examination mode (EM) is found in the ID,


T_3 = EM, \text{ where } EM \in [TR, E] \qquad (5)

denotes that the extracted examination mode (EM) is either Training (TR) or Exercise (E),

T_4 = \begin{cases} T, \neg ST, TR, SL \\ T, \neg ST, E, SL \\ T, \neg ST, Q \end{cases} \qquad (6)

means that no sub-topic (ST) is provided by the user, but the ID already contains either the combination of topic (T), training (TR), and section level (SL), or the combination of topic, exercise (E), and section level, or the combination of topic and quiz (Q),

T_5 = \neg QNR \qquad (7)

indicates that no question number (QNR) was provided by the student (or it could not be successfully extracted).

Finally, we assume the following list of possible next actions for the system:

A = \begin{cases} \text{Ask for Topic} \\ \text{Ask for Examination Mode} \\ \text{Ask for Level} \\ \text{Ask for Sub-Topic} \\ \text{Ask for Question Nr} \end{cases} \qquad (8)

Following the transition rules, we map each valid dialogue state to the possible next action a_m in A:

\exists a\, \forall s\, (Valid(s) \land T_1(s)) \rightarrow a(s, T) \qquad (9)

In the case where we do not have any topic provided, the next action is to ask for the topic (T).

\exists a\, \forall s\, (Valid(s) \land T_2(s)) \rightarrow a(s, EM) \qquad (10)

If no examination mode is provided by the user (or it could not be successfully extracted from the user query), the next action is defined as ask for examination mode (EM).

\exists a\, \forall s\, (Valid(s) \land T_3(s)) \rightarrow a(s, L) \qquad (11)

In the case where we know that the examination mode EM ∈ [Training, Exercise], we have to ask about the level (i.e., training at chapter level or training at section level); thus, the next action is ask for level (L).


\exists a\, \forall s\, (Valid(s) \land T_4(s)) \rightarrow a(s, ST) \qquad (12)

If no sub-topic is provided, but the examination mode EM ∈ [Training, Exercise] is at the section level, the next action is defined as ask for sub-topic (ST).

\exists a\, \forall s\, (Valid(s) \land T_5(s)) \rightarrow a(s, QNR) \qquad (13)

If no question number is provided by the user, then the next action is ask for question number (QNR).

Following this scheme, we generate 56 state transitions, which specify the next system actions. Upon entering a new conversation state, we compare the extracted (or updated) data in the Informational Dictionary (ID) with the valid dialogue states and select the mapped action as the next system action. A minimal sketch of this enumeration and mapping is given after this paragraph.
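The sketch below enumerates binary dialogue states and maps the valid ones to next actions. Only a subset of the rules R1–R5 and the ordered transition rules T1–T5 is encoded here, so it does not reproduce the exact count of 56 transitions; slot names are assumptions for illustration.

    from itertools import product

    SLOTS = ["topic", "sub_topic", "training", "exercise",
             "chapter_level", "section_level", "quiz", "final_exam", "question_nr"]

    def valid(state):
        # only R2 (at most one examination mode) and R3 (at most one level) are encoded here
        if state["training"] + state["exercise"] + state["quiz"] + state["final_exam"] > 1:
            return False
        if state["chapter_level"] + state["section_level"] > 1:
            return False
        return True

    def next_action(state):
        # transition rules T1-T5, evaluated in order
        if not state["topic"]:
            return "Ask for Topic"
        if not (state["training"] or state["exercise"] or state["quiz"] or state["final_exam"]):
            return "Ask for Examination Mode"
        if (state["training"] or state["exercise"]) and not (state["chapter_level"] or state["section_level"]):
            return "Ask for Level"
        if not state["sub_topic"] and (state["section_level"] or state["quiz"]):
            return "Ask for Sub-Topic"
        if not state["question_nr"]:
            return "Ask for Question Nr"
        return None  # ID complete, proceed to verification

    # enumerate all binary slot assignments, keep the valid ones, map them to actions
    transitions = {
        values: next_action(dict(zip(SLOTS, values)))
        for values in product([0, 1], repeat=len(SLOTS))
        if valid(dict(zip(SLOTS, values)))
    }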

flexibility of dst The DST transitions outlined above are meant to maintain the current configuration of the OMB+ learning platform. Still, further MERs could be effortlessly generated to create new transitions. To examine this, we performed tests with a previous design of the platform, where all the topics except for the first one had sub-topics, whereas the first topic contained both sub-topics and sub-sub-topics. We could produce the missing transitions with the described approach; the number of permissible transitions in this case increased from 56 to 117.

3.2.5 Meta Policy

Since the proposed system is expected to operate in a real-world setting, we had to develop additional policies that manage the dialogue flow and verify the system's accuracy. We explain these policies below.

completeness of informational dictionary The model automatically verifies the completeness of the Informational Dictionary (ID), which is determined by the number of essential slots filled in the informational dictionary. There are six discrete cases in which the ID is deemed to be complete. For example, if a user works in the final examination mode, the agent does not have to request a sub-topic or examination level; hence, the ID has to be filled only with data for the topic, examination type, and question number. Whereas, if the user works in the training mode, the system has to collect information about the topic, sub-topic, examination level, examination mode, and question number. Once the ID is complete, it is passed on to the verification step; otherwise, the agent proceeds according to the next action. The system extracts information in each dialogue state, and therefore, if the user gives updated information on any subject later in the dialogue history, the corresponding slot is updated in the ID.

Below are examples of two of the six cases in which the Informational Dictionary (ID) is considered to be complete; a minimal sketch of the corresponding completeness check follows Table 8.

Case_1 = \begin{cases} \text{intent} = \text{Math} \\ \text{topic} \neq \text{None} \\ \text{exam mode} = \text{Final Examination} \\ \text{level} = \text{None} \\ \text{question nr} \neq \text{None} \end{cases} \qquad (14)

Table 7: Case 1 – Any topic, examination mode is final examination, examination level does not matter, any question number

Case_2 = \begin{cases} \text{intent} = \text{Math} \\ \text{topic} \neq \text{None} \\ \text{sub-topic} \neq \text{None} \\ \text{exam mode} \in [\text{Training}, \text{Exercise}] \\ \text{level} = \text{Section} \\ \text{question nr} \neq \text{None} \end{cases} \qquad (15)

Table 8: Case 2 – Any topic, any related sub-topic, examination mode is either training or exercise, examination level is section, any question number
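A minimal sketch of such a completeness check is given below; only the two cases from Tables 7 and 8 are encoded, and the slot names are assumed.

    def id_complete(info):
        """info: the Informational Dictionary assembled by the NLU unit (assumed slot names)."""
        if info.get("intent") != "Math" or not info.get("topic") or not info.get("question_nr"):
            return False
        # Case 1: final examination needs neither sub-topic nor level
        if info.get("exam_mode") == "Final Examination":
            return True
        # Case 2: training or exercise on section level additionally needs a sub-topic
        if info.get("exam_mode") in ("Training", "Exercise") and info.get("level") == "Section":
            return bool(info.get("sub_topic"))
        # the remaining four completeness cases are omitted in this sketch
        return False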

verification step After the system has collected all the essential data (i.e., the ID is complete), it continues to the final verification step. Here, the obtained data is presented to the current student in the dialogue session. The student is asked to review the correctness of the data and, if any entries are faulty, to correct them. The Informational Dictionary (ID) is, where necessary, updated with the user-provided data. This procedure repeats until the user confirms the correctness of the compiled data.

fallback policy In the cases where the system fails to retrieve information from a student's query, it re-asks the user and attempts to retrieve the information from a follow-up response. The maximum number of re-ask trials is a parameter and is set to three (r = 3). If the system is still incapable of extracting the information, the user input is considered as the ground truth and filled into the appropriate slot of the ID. An exception to this rule applies where the user has to specify the intent manually: after three unclassified trials, the dialogue session is directly handed over to a human tutor.


human request In every dialogue state, a user can shift to a human tutor. For this, the user can enter the "human" (or "Mensch" in the German language) keyword. Therefore, every user message is investigated for the presence of this keyword.

3.2.6 Response Generation

The semantic representation of the system's next action is transformed into a natural language utterance. Hence, each possible action is mapped to precisely one response that is predefined in a template. Some of the responses are fixed (i.e., "Welches Kapitel bearbeitest du gerade?"),2 while others have placeholders for custom values. In the latter case, the utterance can be formulated depending on the Informational Dictionary (ID).

Assume the predefined response "Deine Angaben waren wie folgt: a) Kapitel K, b) Abschnitt A, c) Aufgabentyp M, d) Aufgabe Q, e) Ebene E. Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein."3 The slots [K, A, M, Q, E] are the placeholders for the information stored in the ID. During the conversation, the system selects the template according to the next action and fills the slots with extracted values if required. After the extracted information is filled into the corresponding placeholders, the final response could be as follows: "Deine Angaben waren wie folgt: a) Kapitel I, Elementares Rechnen, b) Abschnitt Rechenregel für Potenzen, c) Aufgabentyp Quiz, d) Aufgabe 1(a), e) Ebene Abschnitt. Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein."4 A minimal sketch of this template filling is shown below.
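The following sketch illustrates the template filling; the template store and slot names are invented for illustration and do not reproduce the actual template file.

    # illustrative template store; placeholders correspond to the letters used above
    TEMPLATES = {
        "Ask for Topic": "Welches Kapitel bearbeitest du gerade?",
        "Final-Statement": ("Deine Angaben waren wie folgt: a) Kapitel {K}, b) Abschnitt {A}, "
                            "c) Aufgabentyp {M}, d) Aufgabe {Q}, e) Ebene {E}. "
                            "Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein."),
    }

    def generate_response(next_action, info):
        # select the template for the next action and fill it from the Informational Dictionary
        return TEMPLATES[next_action].format(
            K=info.get("topic", ""), A=info.get("sub_topic", ""), M=info.get("exam_mode", ""),
            Q=info.get("question_nr", ""), E=info.get("level", ""),
        )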

Below we demonstrate five sample dialogues (in the original German) with variable flows:

• a short dialogue where certain information is already provided in the first student input;

• an organizational question case;

• a contextual question case;

• an example with manual intent classification;

• an example with fallback and human policies;

• a long flow with entry correction in the final verification phase.

2 Translates to: "Which chapter are you currently working on?"
3 Translates to: "Your details were as follows: a) Chapter, b) Section, c) Examination Mode, d) Question Number, e) Level. Did I understand you correctly? Please answer yes or no."
4 Translates to: "Your details were as follows: a) Elementary Calculus, b) Power Laws, c) Quiz, d) Question 1 (a), e) Section Level. Did I understand you correctly? Please answer yes or no."


Student: Hallo, ich habe Schwierigkeiten beim Kapitel I Elementares Rechnen mit der Übungsaufgabe 1 a.

Next-Action: [Ask for a Level: Chapter or Section]
Bot: Geht es um eine Aufgabe auf Kapitel-Ebene (z.B. um eine Trainingsaufgabe im Kapitel "Geometrie") oder um eine Aufgabe auf der Ebene eines Abschnitts in einem Kapitel (z.B. um eine Aufgabe zum Abschnitt "Winkel" im Kapitel "Geometrie")? Antworte bitte mit KAP für Kapitel und SEK für Abschnitt.

Student: Kapitel

Next-Action: [Final-Statement]
Bot: Deine Angaben waren wie folgt:
a) Kapitel: I Elementares Rechnen
b) Aufgabentyp: Exercise
c) Aufgabe: 1a
d) Ebene: Kapitel
Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein.

Student: yep

Next-Action: [Human Hand-Over]
Bot: Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir.

Table 9: Showcase 1 – Short flow; most of the information was extracted from the initial user input. In bold are the extracted entities.

Student: Hallo, ich habe eine Frage bzgl. eines Zertifikates, und zwar wie kann ich es austellen lassen?

Next-Action: [Organisational Question]
Bot: Es scheint sich wohl um eine organisatorische Frage zu handeln. Bitte fasse deine Frage in einem kurzen Text zusammen, damit ich sie an meinen menschlichen Kollegen weiterleiten kann.

Student: Wie bekomme ich ein Zertifikat?

Next-Action: [Human Hand-Over]
Bot: Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir.

Table 10: Showcase 2 – Organizational question. Underlined are the keywords which point to the organizational intent.


Student: hallo

Next-Action: [UNK]
Bot: Hast du eine Frage zu einer Aufgabe (MATH), zu einem Text im Kurs (TEXT) oder eine organisatorische Frage (ORG)?

Student: ma

Next-Action: [UNK]
Bot: Hast du eine Frage zu einer Aufgabe (MATH), zu einem Text im Kurs (TEXT) oder eine organisatorische Frage (ORG)?

Student: math

Next-Action: [Ask for Topic]
Bot: Welches Kapitel bearbeitest du gerade?

Student: Lineare Gleichungssysteme

Next-Action: [Ask for Examination Mode]
Bot: Bearbeitest du eine Übungsaufgabe, eine Trainingsaufgabe, einen Quiz oder eine Schlussprüfung?

Student: human

Next-Action: [Human Hand-Over]
Bot: Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir.

Table 11: Showcase 3 – Manual intent classification, fallback and human request policies (in bold type).

Student: Halli-Hallo, könnte mir bitte jemand die Regel 7.2 (vertikale streckung und stauchung des graphen) erklären? Danke im Voraus!

Next-Action: [Contextual Question]
Bot: Es scheint sich wohl um eine inhaltliche Frage zu handeln. Bitte fasse deine Frage in einem kurzen Text zusammen, damit ich sie an meinen menschlichen Kollegen weiterleiten kann.

Student: Ich habe Schwierigkeiten beim Verstehen der Regel 7.2. Was ist genau die vertikale Streckung des Graphen?

Next-Action: [Human Hand-Over]
Bot: Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir.

Table 12: Showcase 4 – Contextual question. Underlined are the keywords which point to the contextual intent.


Student: Hallo, ich kann beim Kapitel 1 nicht weiterkommen, Trainingsaufgabe 1 (a).

Next-Action: [Ask for Level]
Bot: Geht es um eine Aufgabe auf Kapitel-Ebene (z.B. um eine Trainingsaufgabe im Kapitel Geometrie) oder um eine Aufgabe auf der Ebene eines Abschnitts in einem Kapitel (z.B. um eine Aufgabe zum Abschnitt Winkel im Kapitel Geometrie)? Antworte bitte mit KAP für Kapitel und SEK für Abschnitt.

Student: ich glaube es ist eine Sektion

Next-Action: [Ask for Sub-Topic]
Bot: Wie heißt der Abschnitt, in dem sich die Aufgabe befindet, auf die sich deine Frage bezieht?

Student: ich bearbeite rechenregeln und potenzen

Next-Action: [Final Statement]
Bot: Deine Angaben waren wie folgt:
a) Kapitel: I Elementares Rechnen
b) Abschnitt: Rechenregel für Potenzen
c) Aufgabentyp: Quiz
d) Aufgabe: 1 (a)
e) Ebene: Abschnitt
Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein.

Student: nope

Next-Action: [Ask_For]
Bot: Welche der folgenden Punkte habe ich falsch erkannt? Schreibe bitte, welche der folgenden Punkte es sind: a, b, c, d, e.

Student: c

State-Transition: [Correct-Fields]
Bot: Gib bitte die korrekte Information für c) ein.

Student: Training

State-Transition: [Final Statement]
Bot: Deine Angaben waren wie folgt:
a) Kapitel: I Elementares Rechnen
b) Abschnitt: Rechenregel für Potenzen
c) Aufgabentyp: Training
d) Aufgabe: 1 (a)
e) Ebene: Abschnitt
Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein.

Student: yes

State-Transition: [Human Hand-Over]
Bot: Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir.

Table 13: Showcase 5 – Long flow, correction of entries. In bold type are the extracted entities.


Figure 4: Dialogue flow. Abbreviations: UNK – unknown, ID – informational dictionary, RegEx – regular expressions.


3.3 evaluation

In order to get feedback on the quality, functionality, and usefulness of the proposed model, we evaluated it in two steps: first, with an automated method (Section 3.3.1) utilizing 130 manually annotated conversations to verify the robustness of the system; second, with the help of human tutors (Section 3.3.2) from OMB+ to examine the user experience. We describe the details as well as the most common errors below.

3.3.1 Automated Evaluation

To conduct an automated evaluation, we manually assembled a dataset with 130 conversational flows. We predefined viable initial user questions and user answers, as well as gold system responses. These flows cover the most frequent dialogues that we previously observed in human-to-human dialogue data. We examined our system by implementing a self-chatting evaluation bot (a minimal sketch follows the list below). The evaluation cycle is as follows:

• The system accepts the first question from a predefined dialogue via an API request, then preprocesses and analyzes it to determine the intent and extract entities.

• Then it estimates the appropriate next action and responds accordingly.

• This response is then compared to the gold answer. If the predicted result is correct, the system receives the next predefined user input from the templated dialogue and responds again as defined above. This procedure proceeds until the dialogue terminates (i.e., the ID is complete). Otherwise, the system reports an unsuccessful case number.
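The sketch below outlines this self-chatting evaluation loop; the bot interface (reset/respond) and the data layout of the predefined flows are assumptions made for illustration.

    def run_automated_evaluation(dialogues, bot):
        """dialogues: list of flows, each a list of (user_input, gold_response) pairs."""
        failed_cases = []
        for case_nr, flow in enumerate(dialogues, start=1):
            bot.reset()                                  # start a fresh dialogue state (empty ID)
            for user_input, gold_response in flow:
                predicted = bot.respond(user_input)      # preprocessing, NLU, DM, response generation
                if predicted != gold_response:
                    failed_cases.append(case_nr)         # report the unsuccessful case number
                    break
        return failed_cases                              # an empty list means all flows passed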

The system successfully passed this evaluation for all 130 cases.

3.3.2 Human Evaluation & Error Analysis

To evaluate the system on irregular cases, we carried out experiments with human tutors from the OMB+ platform. The tutors are experienced regarding rare or complex issues, ambiguous responses, misspellings, and other infrequent but still relevant problems which occur during a natural language dialogue. In the following, we review some common errors and describe additional observations.

misspellings and confusable spellings frequently occur in user-generated text. Since we attempt to let the conversation remain natural to provide a better user experience, we do not instruct students to use formal writing. Therefore, we have to account for various writing issues. One of the prevalent problems is misspellings: German words are commonly long and can be complex, and because users often type quickly, this can lead to a wrong order of characters within a word. To tackle this challenge, we utilized fuzzy matching within ES. However, the maximum allowed edit distance in ElasticSearch (ES) is set to 2 characters, which means that all misspellings beyond this threshold could not be correctly identified by ES (e.g., Differenzialrechnung vs. Differnetialrechnung). Another representative example is the formulation of the section or question number: the same information can be communicated in several distinct ways, which has to be considered by the RegEx unit (e.g., Aufgabe 5 a, Aufgabe V a, Aufgabe 5 (a)). A similar problem occurs with confusable spellings (i.e., Differenzialrechnung vs. Differenzialgleichung).

We analyzed the cases stated above and improved our model by adding some of the most common misspellings to the ElasticSearch (ES) database or handling them with RegEx during the preprocessing step.

elasticsearch threshold We observed both cases where the system failed to extract information although the user provided it, and examples where ES extracted information that was not mentioned in a user query. According to our understanding, these errors occur due to the relevancy scoring algorithm of ElasticSearch (ES), where a document's score is a combination of textual similarity and other metadata-based scores. Our examination revealed that ES mostly fails to extract the information if the user message is rather short (e.g., about 5 tokens). To overcome this difficulty, we coupled the current input u_t with the dialogue history (h = u_t, ..., u_{t-2}). That eliminated the problem and improved the retrieval quality.

Solving the opposite problem, where ES extracts false information (or information that was not mentioned in a query), was more challenging. We learned that the problem comes from short words or sub-words (i.e., suffixes, prefixes) which ES locates in the database and considers credible enough. The ES documentation suggests getting rid of stop words to eliminate this behavior; however, this did not improve the search. Also, fine-tuning of ES parameters such as the relevance threshold, prefix length,5 and the minimum should match6 parameter did not result in notable improvements. To cope with this problem, we implemented a verification step where the user can edit the erroneously retrieved data.

5 The number of fuzzified characters. This parameter helps to decrease the number of terms which must be reviewed.
6 Indicates the number of terms that must match for a document to be considered relevant.


3.4 structured dialogue acquisition

As stated, the implemented system not only supports the human tutors by assisting students, but also collects structured and labeled training data in the background. In a trial run of the rule-based system, we accumulated a toy dataset with dialogues. The assembled dataset includes the following:

• Plain dialogues with unique dialogue indexes.

• Plain ID information collected for the whole conversation.

• Pairs of questions (i.e., user requests) and responses (i.e., bot replies) with unique dialogue and turn indexes.

• Triples in the form of (Question, Next Action, Response). Information on the next system action could be employed to train a DM unit with (deep) machine learning algorithms.

• For every conversational state, we keep the entities that the system was able to extract, along with their position in the utterance. This information could be used to train a custom domain-specific NER model.

3.5 summary

In this work, we realized a conversational agent for IPA purposes that addresses two problems. First, it decreases repetitive and time-consuming activities, which allows workers of the e-learning platform to focus solely on mathematical and hence cognitively demanding questions. Second, by interacting with users, it augments the resources with structured and labeled training data. The latter can be utilized for the further implementation of learnable dialogue components.

The realization of such a system was connected with multiple challenges. Among others were missing structured conversational data, ambiguous or erroneous user-generated text, and the necessity to deal with existing corporate tools and their design.

The proposed conversational agent enabled us to accumulate structured, labeled data without any special effort from the human (i.e., tutor) side (e.g., manual annotation and post-processing of existing data, or changes to the conversational structure within the tutoring process). Once we collected structured dialogues, we were able to re-train particular segments of the system with deep learning methods and achieved consistent performance for both of the proposed tasks. We report on the re-implemented elements in Chapter 4.

At the time of writing this thesis, the system described in this work was deployed on the OMB+ platform.


3.6 future work

Possible extensions of our work are:

• In this work, we implemented a rule-based dialogue model from scratch and adjusted it for the specific domain (i.e., e-learning). The core of the model is a dialogue manager that determines the current state of the conversation and the possible next action. Rule-based systems are generally considered to be hardly adaptable to new domains; however, our dialogue manager proved to be flexible with respect to slight modifications in a workflow. One of the possible directions of future work would thus be the investigation of the general adaptability of the dialogue manager core to other scenarios and domains (e.g., a different course).

• A further possible extension would be the introduction of different languages. The current model is implemented for the German language, but the OMB+ platform provides this course also in English and Chinese. Therefore, it would be interesting to investigate whether it is possible to adjust the existing system to other languages without drastic changes in the model.


4 DEEP LEARNING RE-IMPLEMENTATION OF UNITS

This chapter partially covers work already published at international peer-reviewed conferences. The relevant publication is Moiseeva et al., 2020. The research described in this chapter was carried out in its entirety by the author of this thesis. The other authors of the publication acted as advisors.

In this chapter, we address the challenge of re-implementing two central components of our IPA dialogue system described in Chapter 3 by employing deep learning techniques. Since the data for training was collected in a (short) trial run of the rule-based system, it is not sufficient to train a neural network from scratch. Therefore, we utilize Transfer Learning (TL) methods and employ the previously pre-trained Bidirectional Encoder Representations from Transformers (BERT) model, which we fine-tune for two of our custom tasks.

4.1 introduction

Data mining and machine learning technologies have achieved notable advancement in many research fields, including but not limited to computer vision, robotics, advanced analytics, and computational linguistics (Pan and Yang, 2009; Trautmann et al., 2019; Werner et al., 2015). Besides that, in recent years, deep neural networks, with their ability to accurately map from inputs to outputs (i.e., labels, images, sentences) while analyzing patterns in data, became a potential solution for many industrial automation tasks (e.g., fake news detection, sentiment analysis).

Nevertheless, conventional supervised models still have limitations when applied to real-world data and industrial tasks. The first challenge is the training phase, since a robust model requires an immense amount of structured and labeled data. Still, there are just a few cases where such data is publicly available (e.g., research datasets); for most tasks, domains, and languages, the data has to be gathered over a long time. The second hurdle is that, if trained on existing data from a distinct domain, a supervised model cannot generalize to conditions that differ from those faced in the training dataset. Most machine learning methods perform correctly only under a general assumption: the training and test data are mapped to the same feature space and the same distribution (Pan and Yang, 2009). When the distribution changes, most statistical models need to be retrained from scratch using newly obtained training samples. This is especially crucial when applying such approaches to real-world data, which is usually very noisy and contains a large number of scenarios; in this case, a model would be ill-trained to make decisions about novel patterns. Furthermore, for most real-world applications, it is expensive or even not feasible at all to recollect the required training data and retrain the model. Hence, the possibility to transfer the knowledge obtained during training on existing data to a new, unseen domain is one of the possible solutions for industrial problems. Such a technique is called Transfer Learning (TL), and it allows dealing with the abovementioned scenarios by leveraging existing labeled data from related tasks or domains.

4.1.1 Outline and Contributions

This work addresses the challenge of re-implementing two out of three central components of our rule-based IPA conversational assistant with deep learning methods. As discussed in Chapter 2, hybrid dialogue models (i.e., consisting of both rule-based and trainable components) appear to replace the established rule- and template-based methods, which are for the most part employed in industrial settings. To investigate this assumption and examine current state-of-the-art deep learning techniques, we conducted experiments on our custom domain-specific data. Since the available data were collected in a trial run and the obtained dataset was too small to train a conventional machine learning model from scratch, we utilized the Transfer Learning (TL) approach and fine-tuned an existing pre-trained model for our target domain.

For the experiments, we defined two tasks:

• First, we considered the NER problem in a custom domain setting. We defined a sequence labeling task and employed Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al., 2018) as one of the transfer learning methods widely utilized in the NLP area. We applied the model to our dataset and fine-tuned it for seven (7) domain-specific (i.e., e-learning) entities.

• Second, we examined the effectiveness of transfer learning for the dialogue manager core. The DM is one of the fundamental parts of a conversational agent and requires compelling learning techniques to hold the conversation intelligently. For that experiment, we defined a classification task and applied the BERT model to predict the system's next action for every given user input in a conversation. We then computed the macro F1 score for 14 possible classes (i.e., actions) and an average dialogue accuracy.


The main characteristics of both settings are:

• the set of named entities and labels (i.e., actions) differs between the source and the target domain;

• for both settings, we utilized the BERT model in German and multilingual versions with additional parameters;

• the dataset used to train the target model is small and consists of highly domain-specific labels; the latter often appears in industrial scenarios.

We finally verified that the selected method performed reasonably well on both tasks. We reached a performance of 0.93 macro F1 points for Named Entity Recognition (NER) and 0.75 macro F1 points for the Next Action Prediction (NAP) task. Therefore, we conclude that both NER and NAP components could be employed to substitute or extend the existing rule-based modules.

4.2 related work

As discussed in the previous section, the main benefit of Transfer Learning (TL) is that it allows the domains and tasks used during the training and testing phases to be different. Initial steps in the formulation of transfer learning were made after the NeurIPS-95 workshop,1 which focused on the demand for lifelong machine-learning methods that preserve and reuse previously learned knowledge (Chen et al., 2018). Research on transfer learning has attracted more and more attention since then, especially in the computer vision domain, and in particular for image recognition tasks (Donahue et al., 2014; Sharif Razavian et al., 2014).

Recent studies have shown that TL can also be successfully applied within the Natural Language Processing (NLP) domain, especially to semantically equivalent tasks (Dai et al., 2019; Devlin et al., 2018; Mou et al., 2016). Several research works were carried out specifically on the Named Entity Recognition (NER) task and explored the capabilities of transfer learning when applied to different named entity categories (i.e., different output spaces). For instance, Qu et al. (2016) pre-trained a linear-chain Conditional Random Field (CRF) on a large amount of labeled data in the source domain. Then they introduced a two-layer linear neural network that learns the difference between the source and target label distributions. Finally, the authors initialized another CRF with the parameters learned in the linear layers to predict the labels of the target domain. Kim et al. (2015) conducted experiments with transferring features and model parameters between related domains where the label classes are different but may have a semantic similarity. Their primary approach was to construct label embeddings to automatically map the source and target label classes in order to improve the transfer (Chen et al., 2018).

1 Conference on Neural Information Processing Systems; formerly NIPS.


However, there are also models trained in such a manner that they can be applied to multiple NLP tasks simultaneously. Among such architectures are ULMFiT (Howard and Ruder, 2018), BERT (Devlin et al., 2018), GPT-2 (Radford et al., 2019), and Transformer-XL (Dai et al., 2019) – powerful models that were recently proposed for transfer learning within the NLP domain. These architectures are called language models, since they attempt to learn language structure from large textual collections in an unsupervised manner and then predict the next word in a sequence based on previously observed context words. The Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al., 2018) model is one of the most popular architectures among the abovementioned, and it is also one of the first that showed close to human-level performance on the General Language Understanding Evaluation (GLUE) benchmark2 (Wang et al., 2018), which includes tasks on sentiment analysis, question answering, semantic similarity, and language inference. The architecture of BERT consists of stacked transformer blocks that are pre-trained on a large corpus consisting of 800M words from modern English books (i.e., the BooksCorpus by Zhu et al.) and 2,500M words of text from pre-processed (i.e., without markup) English Wikipedia articles. Overall, BERT advanced the state-of-the-art performance for eleven (11) NLP tasks. We describe this approach in detail in the subsequent section.

4.3 bert: bidirectional encoder representations from transformers

The Bidirectional Encoder Representations from Transformers (BERT) architecture is divided into two steps: pre-training and fine-tuning. The pre-training step was enabled through the following two tasks:

• Masked Language Model – the idea behind this approach is to train a deep bidirectional representation, which is different from conventional language models that are trained either left-to-right or right-to-left. To train a bidirectional model, the authors mask out a certain number of tokens in a sequence. The model is then asked to predict the original words for the masked tokens. It is essential that, in contrast to denoising auto-encoders (Vincent et al., 2008), the model does not need to reconstruct the entire input; instead, it attempts to predict the masked words. Since the model does not know which words exactly it will be asked about, it learns a representation for every token in the sequence (Devlin et al., 2018).

• Next Sentence Prediction – aims to capture the relationship between two sentences, A and B. In order to train the model, the authors pre-trained a binarized next sentence prediction task, where pairs of sequences were labeled according to the following criteria: A and B either follow each other directly in the corpus (label IsNext) or are both taken from random places (label NotNext). The model is then asked to predict whether the former or the latter is the case. The authors demonstrated that pre-training towards this task is extremely useful for both question answering and natural language inference tasks (Devlin et al., 2018).

2 https://gluebenchmark.com

Furthermore, BERT uses WordPiece tokenization for embeddings (Wu et al., 2016), which segments words into subword-level tokens. This approach allows the model to make additional inferences based on word structure (e.g., an -ing ending denotes similar grammatical categories).

The fine-tuning step follows the pre-training process. For that, the BERT model is initialized with the parameters obtained during the pre-training step, and a task-specific fully-connected layer is used after the transformer blocks, which maps the general representations to custom labels (employing a softmax operation).

self-attention The key operation of the BERT architecture is self-attention, which is a sequence-to-sequence operation with a conventional encoder-decoder structure (Vaswani et al., 2017). The encoder maps an input sequence (x_1, x_2, ..., x_n) to a sequence of continuous representations z = (z_1, z_2, ..., z_n). Given z, the decoder then generates an output sequence (y_1, y_2, ..., y_m) of symbols, one element at a time (Vaswani et al., 2017). To produce an output vector y_i, the self-attention operation takes a weighted average over all input vectors:

y_i = \sum_j w_{ij} x_j \qquad (16)

where j indexes over the entire sequence and the weights sum to one (1) over all j. The weight w_{ij} is derived from a dot product over x_i and x_j (Bloem, 2019; Vaswani et al., 2017):

w'_{ij} = x_i^T x_j \qquad (17)

The dot product returns a value between negative and positive infinity, and a softmax maps these values to [0, 1] to ensure that they sum to one (1) over the entire sequence (Bloem, 2019; Vaswani et al., 2017):

w_{ij} = \frac{\exp w'_{ij}}{\sum_j \exp w'_{ij}} \qquad (18)

Self-attention is not the only operation in transformers, but it is the essential one, since it propagates information between vectors. The other operations in the transformer are applied to every vector in the input sequence without interactions between vectors (Bloem, 2019; Vaswani et al., 2017). A minimal sketch of the basic operation is given below.
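A basic, unparameterised version of this operation, following Eqs. (16)–(18), can be written in a few lines of NumPy; the learned query/key/value projections and scaling of the full transformer are omitted here.

    import numpy as np

    def basic_self_attention(x):
        """x: array of shape (sequence_length, dimension); returns the output vectors y."""
        raw = x @ x.T                                         # w'_ij = x_i^T x_j, Eq. (17)
        # row-wise softmax so that the weights sum to one for every position i, Eq. (18)
        weights = np.exp(raw - raw.max(axis=1, keepdims=True))
        weights /= weights.sum(axis=1, keepdims=True)
        return weights @ x                                    # y_i = sum_j w_ij x_j, Eq. (16)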


4.4 data & descriptive statistics

The benchmark that we accumulated through the trial run includes 300 structured and partially labeled dialogues, with the average length of a dialogue being six (6) utterances. All the conversations were held in the German language. Detailed general statistics can be found in Table 14.

                                       Value
Max. Len. Dialogue (in utterances)       15
Avg. Len. Dialogue (in utterances)        6
Max. Len. Utterance (in tokens)         100
Avg. Len. Utterance (in tokens)           9
Overall Unique Action Labels             13
Overall Unique Entity Labels              7
Train – Dialogues (Utterances)          200 (1161)
Eval – Dialogues (Utterances)            50 (279)
Test – Dialogues (Utterances)            50 (300)

Table 14: General statistics for the conversational dataset

The dataset covers 14 unique next actions and seven (7) unique named entities. Detailed information on the next actions and entities can be found in Table 15 and Table 16. Below we list the possible actions with the corresponding explanation and predefined mapped statements.

unk – requests to define the intent of the question: mathematical, organizational, or contextual (i.e., the fallback policy in case the automated intent recognition unit fails) → "Hast du eine Frage zu einer Aufgabe (MATH), zu einem Text im Kurs (TEXT) oder eine organisatorische Frage (ORG)?"

org – hands the discussion over to a human tutor in case of an organizational question → "Es scheint sich wohl um eine organisatorische Frage zu handeln. Bitte fasse deine Frage in einem kurzen Text zusammen, damit ich sie an meinen menschlichen Kollegen weiterleiten kann."

text – hands the discussion over to a human tutor in case of a contextual question → "Es scheint sich wohl um eine inhaltliche Frage zu handeln. Bitte fasse deine Frage in einem kurzen Text zusammen, damit ich sie an meinen menschlichen Kollegen weiterleiten kann."


topic – requests information on the topic → "Welches Kapitel bearbeitest du gerade? Antworte bitte mit der Kapitelnummer, z.B. I A, I B, II, oder mit dem Kapitelnamen, z.B. Geometrie."

examination – requests information about the examination mode → "Bearbeitest du eine Übungsaufgabe, eine Trainingsaufgabe, einen Quiz oder eine Schlussprüfung?"

subtopic – requests information about the subtopic in case of quiz or section-level training/exercise modes → "Wie heißt der Abschnitt, in dem sich die Aufgabe befindet, auf die sich deine Frage bezieht?"

level – requests information about the level of the examination mode, chapter or section → "Geht es um eine Aufgabe auf Kapitel-Ebene (z.B. um eine Trainingsaufgabe im Kapitel Geometrie) oder um eine Aufgabe auf der Ebene eines Abschnitts in einem Kapitel (z.B. um eine Aufgabe zum Abschnitt Winkel im Kapitel Geometrie)? Antworte bitte mit KAP für Kapitel und SEK für Abschnitt."

question number – requests information about the question number → "Welche Aufgabe bearbeitest du gerade? Wenn du z.B. bei Teilaufgabe c in der zweiten Trainingsaufgabe (oder der zweiten Übungsaufgabe) bist, antworte mit 2c. Bearbeitest du gerade einen Quiz oder eine Schlussprüfung, gib bitte nur die Teilaufgabe an."

final request – considers the Informational Dictionary (ID) to be complete and initiates the verification step → "Deine Angaben waren wie folgt: ... Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein."

verify request – requests the student to verify the assembled information → "Welche der folgenden Punkte habe ich falsch erkannt? Schreibe bitte, welche der folgenden Punkte es sind: [a, b, c, ...]3"

correct request – requests the student to provide the correct information if there is an error in the assembled data → "Gib bitte die korrekte Information für [a]4 ein."

exact question – requests the student to provide a detailed explanation of the problem → "Bitte fasse deine Frage in einem kurzen Text zusammen, damit ich sie an meinen menschlichen Kollegen weiterleiten kann."

human handover – finalizes the conversation → "Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir."

3 Variables that depend on the state of the Informational Dictionary (ID).
4 Variable that was selected as erroneous and needs to be corrected.


Action             Counts     Action             Counts
Final Request         321     Unk                    80
Human Handover        300     Subtopic               55
Exact Question        286     Correct Request        40
Question Number       176     Verify Request         34
Examination           175     Org                    17
Topic                 137     Text                   13
Level                 130

Table 15: Detailed statistics on the possible system actions. The Counts columns denote the number of occurrences of each action in the entire dataset.

Entity          Counts
Question Nr        317
Chapter            311
Examination        303
Subtopic           198
Level               80
Intent              70

Table 16: Detailed statistics on the possible named entities. The Counts column denotes the number of occurrences of each entity in the entire dataset.

4.5 named entity recognition

We defined a sequence labeling task to extract custom entities from the user input. We considered seven (7) viable entities (see Table 16) to be recognized by the model: topic, subtopic, examination mode and level, question number, intent, as well as the entity other for the remaining words in the utterance. Since the data collected from the rule-based system already includes information on the entities, we were able to train a domain-specific NER unit. However, since the original user input was informal, the same information could be provided in different writing styles; that means that a single entity could have different surface forms (e.g., synonyms, writing styles). Therefore, entities that we extracted from the rule-based system were transformed into a universal standard (e.g., official chapter names). To consider all of the variable entity forms while post-labeling the original dataset, we determined generic entity names (e.g., chapter, question nr) and mapped variations of entities from the user input (e.g., Chapter = [Elementary Calculus, Chapter I, ...]) to them. The overall dataset consists of 300 labeled dialogues, where 200 (with 1161 utterances) were employed for training and 100 for the evaluation and test sets (50 dialogues with ca. 300 utterances for each set, respectively).


4.5.1 Model Settings

We conducted experiments with German and multilingual BERT implementations.5 The capitalization of words is significant for the German language; therefore, we ran the experiments on capitalized data while preserving the original punctuation. We employed the available base model for both the multilingual and the German BERT implementations in the cased version. We set the learning rate for both models to 1e−4, and the maximum length of the tokenized input was set to 128 tokens. We conducted the experiments multiple times with different seeds for a maximum of 50 epochs, with the training batch size set to 32. We utilized AdamW as the optimizer and applied early stopping if the performance did not improve significantly after 5 epochs. A minimal sketch of this setup is shown below.
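The sketch below shows how such a token-classification setup can be assembled with the Hugging Face transformers library; the model identifiers refer to the publicly available German and multilingual checkpoints, while the label mapping and the training loop are omitted or assumed.

    import torch
    from transformers import BertTokenizerFast, BertForTokenClassification

    MODEL_NAME = "bert-base-german-cased"    # or "bert-base-multilingual-cased"
    NUM_LABELS = 7                           # topic, subtopic, examination mode, level, question nr, intent, other

    tokenizer = BertTokenizerFast.from_pretrained(MODEL_NAME)
    model = BertForTokenClassification.from_pretrained(MODEL_NAME, num_labels=NUM_LABELS)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    def encode(utterance):
        # cased input with original punctuation, padded/truncated to 128 tokens
        return tokenizer(utterance, truncation=True, padding="max_length",
                         max_length=128, return_tensors="pt")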

4.6 next action prediction

We defined a classification problem where we aim to predict the system's next action for the given user input. We assumed 14 custom actions (see Table 15) that we considered to be our classes. While collecting the toy dataset, every input was automatically labeled by the rule-based system with the corresponding next action and the dialogue id. Thus, no additional post-labeling was required in this step.

We investigated two settings for this task:

• Default Setting: utilizing a user input and the corresponding class (i.e., next action) without any supplementary context. By default, we conduct all of our experiments in this setting.

• Extended Setting: employing a user input, the corresponding next action, and the previous system action as a source of supplementary context. For this setting, we experimented with the best performing model from the default setting.

The overall dataset consists of 300 labeled dialogues, where 200 (with 1161 utterances) were employed for training and 100 for the evaluation and test sets (50 dialogues with ca. 300 utterances for each set, respectively).

4.6.1 Model Settings

For the NAP task, we carried out experiments with German and multilingual BERT implementations as well. We examined the performance of both capitalized and lower-cased inputs, as well as plain and preprocessed data. For the multilingual BERT, we used the base model in both cased and uncased modifications. For the German BERT, we utilized the base model in the cased modification only.6 For both models, we set the learning rate to 4e−5, and the maximum length of the tokenized input was set to 128 tokens. We conducted the experiments multiple times with different seeds for a maximum of 300 epochs, with the training batch size set to 32. We utilized AdamW as the optimizer and applied early stopping if the performance did not improve significantly after 15 epochs.

5 https://github.com/huggingface/transformers

4.7 evaluation and results

To evaluate the models, we computed a word-level macro F1 score for the Named Entity Recognition (NER) task and an utterance-level macro F1 score for the Next Action Prediction (NAP) task. The word-level F1 is estimated as the average of the F1 scores per class, each computed from all words in the evaluation and test sets. The outcomes for the NER task are presented in Table 17. For the utterance-level F1, a single class label (i.e., next action) is taken for the whole utterance. The results for the NAP task are shown in Table 18.

We additionally computed an average dialogue accuracy for the best performing NAP models. This score indicates how well the predicted next actions compose the conversational flow. The average dialogue accuracy was estimated for the 50 conversations in the evaluation and test sets, respectively. The results are displayed in Table 19. A minimal sketch of both metrics is given below.
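Both scores can be computed in a few lines of Python. The sketch uses scikit-learn for the macro F1 and one plausible reading of the average dialogue accuracy (the fraction of correctly predicted next actions per dialogue, averaged over dialogues); the exact aggregation used in the thesis may differ.

    from sklearn.metrics import f1_score

    def word_level_macro_f1(gold_tag_seqs, pred_tag_seqs):
        # flatten the per-utterance tag sequences and average F1 over all classes
        gold = [tag for seq in gold_tag_seqs for tag in seq]
        pred = [tag for seq in pred_tag_seqs for tag in seq]
        return f1_score(gold, pred, average="macro")

    def average_dialogue_accuracy(dialogues):
        # dialogues: list of (predicted_actions, gold_actions) pairs, one per conversation
        per_dialogue = [sum(p == g for p, g in zip(pred, gold)) / len(gold)
                        for pred, gold in dialogues]
        return sum(per_dialogue) / len(per_dialogue)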

The obtained results for the NER task revealed that German BERT performed significantly better than the multilingual BERT model. The performance of the custom NER unit is at 0.93 macro F1 points for all possible named entities (see Table 17). In contrast, for the NAP task, the multilingual BERT model obtained a better performance than the German BERT model. The best performing method in the default setting gained a macro F1 of 0.677 points for 14 possible classes, whereas the model in the extended setting performed better – its best macro F1 score is 0.752 for the same number of classes (see Table 18).

Regarding the dialogue accuracy, the extended system fine-tuned with multilingual BERT obtained better outcomes than the default one (see Table 19). The overall conclusion for the NAP is that the capitalized setting increased the performance of the model, whereas the inclusion of punctuation did not influence the results.

6 An uncased pre-trained modification of the model was not available.


Task   Model   Cased   Punct   F1 (Eval)   F1 (Test)
NER    GER     ✓       ✓       0.971       0.930
NER    Mult    ✓       ✓       0.926       0.905

Table 17: Word-level F1 for the Named Entity Recognition (NER) task. In bold: best performance for the evaluation and test sets.

Task   Model   Cased   Punct   Extended   F1 (Eval)   F1 (Test)
NAP    GER     ✓       ✗       ✗          0.711       0.673
NAP    GER     ✓       ✓       ✗          0.701       0.606
NAP    Mult    ✓       ✓       ✗          0.688       0.625
NAP    Mult    ✓       ✗       ✗          0.769       0.677
NAP    Mult    ✓       ✗       ✓          0.810       0.752
NAP    Mult    ✗       ✓       ✗          0.664       0.596
NAP    Mult    ✗       ✗       ✗          0.742       0.502

Table 18: Utterance-level F1 for the Next Action Prediction (NAP) task. Underlined: best performance for the evaluation and test sets in the default setting (without previous-action context). In bold: best performance for the evaluation and test sets in the extended setting (with previous-action context).

Model           Accuracy (Eval)   Accuracy (Test)
NAP default     0.765             0.724
NAP extended    0.813             0.801

Table 19: Average dialogue accuracy computed for the Next Action Prediction (NAP) task for the best performing models. In bold: best performance for the evaluation and test sets.

Figure 5: Macro F1 evaluation, NAP – default model. Comparison between German and multilingual BERT.

Figure 6: Macro F1 test, NAP – default model. Comparison between German and multilingual BERT.

Figure 7: Macro F1 evaluation, NAP – extended model.

Figure 8: Macro F1 test, NAP – extended model.

Figure 9: Macro F1 evaluation, NER – comparison between German and multilingual BERT.

Figure 10: Macro F1 test, NER – comparison between German and multilingual BERT.

4.8 error analysis

We then investigated the cases where the models failed to predict the correct action or labeled the named entity span erroneously. Below we outline the most common errors.

next action prediction One of the prevalent flaws of the default model was the mismatch between two consecutive actions – the actions question number and subtopic. That is due to the position of these actions in the conversational flow: the occurrence of both actions in a dialogue is not strict and largely depends on the previous action. To explain this, we illustrate the following possible conversational scenarios:

• The system can directly proceed with the request for the question number if the examination mode is the final examination.

• If the examination mode is training or exercise, the system has to ask about the level first (i.e., chapter or section).

• If it is the chapter level, the system can again directly proceed with the request for the question number.

• In the case of the section level, the system has to ask about the subsection first (if this information was not provided before).

The examination of the extended model showed that the induction of supplementary context (i.e., the previous action) improved the performance of the system by about 6%.

named entity recognition The cases where the system failed cover mismatches between the labels chapter and other, and between the labels question number and other. This failure arose due to the imperfectly labeled span of a multi-word named entity (e.g., "Elementary Calculus: Proportionality, Percentage"), where the first or last words of the named entity were excluded from the span and erroneously labeled with the other label.

4.9 summary

In this chapter, we re-implemented two of the three fundamental units of our IPA conversational agent – Named Entity Recognition (NER) and the Dialogue Manager (DM) – using the methods of Transfer Learning (TL), specifically the Bidirectional Encoder Representations from Transformers (BERT) model.

We believe that the obtained results are reasonably high, considering the comparatively small amount of training data we employed to fine-tune the models. Therefore, we conclude that both the NAP and NER segments could be used to replace or extend the existing rule-based modules. Rule-based components, as well as ElasticSearch (ES), are limited in their capacities and are not flexible with respect to novel patterns, whereas the trainable units generalize better and could possibly decrease the number of inaccurate predictions in case of accidental dialogue behavior. Furthermore, to increase the overall robustness, rule-based and trainable components could be used synchronously: when one system fails (e.g., the ElasticSearch (ES) module cannot extract an entity), the conversation continues with the prediction obtained from the other model.

410 future work

There are possible directions for future work

bull At the time of writing of this thesis both units were trained andinvestigated on its own ndash that means without being incorpo-rated into the initial IPA conversational system Thus one of thepotential directions of future work would be the embodimentof these units into the original system and the examination ofpossible failure cases Furthermore the resulting hybrid sys-tem could then imply the constraint of additional meta policiesto prevent an unexpected systems behavior

• Further investigation could be towards multi-language modality for the re-implemented units. Since the Online Mathematik Brückenkurs (OMB+) platform also supports English and Chinese, it would be interesting to examine whether a simple translation from the target language (i.e., English, Chinese) to the source language (i.e., German) would be sufficient to employ the already-assembled dataset and pre-trained units. Presumably, it should be achievable in the case of the Next Action Prediction (NAP) unit but could lead to a noticeable performance drop for the Named Entity Recognition (NER) task, since there could be a considerable gap between translated and original entity spans.

• Last but not least, one of the further extensions for the IPA would be the eventual applicability to more cognitively demanding tasks. For instance, it is of interest to extend the model to more complex domains like the handling of mathematical questions. Would it be possible to automate the human assistance at least for less complicated mathematical inquiries, or is it still unachievable with currently existing NLP techniques and machine learning methods?


Part II

CLUSTERING-BASED TREND ANALYZER


5 FOUNDATIONS

As discussed in Chapter 1, predictive analytics is one of the potential Intelligent Process Automation (IPA) tasks due to its capability to forecast future events by using current and historical data and therethrough assist knowledge workers with, for instance, sales or investments. Modern machine learning algorithms allow an intelligent and fast analysis of large amounts of data. However, the evaluation of such algorithms and models remains one of the grave problems in this domain. This is mostly due to the lack of publicly available benchmarks for the evaluation of topic modeling and trend detection algorithms. Hence, in this part of the thesis we are addressing this challenge and assembling a benchmark for detecting both trends and downtrends (i.e., topics that become less frequent over time). To the best of our knowledge, the task of downtrend detection has not been well addressed before.

In this chapter we introduce the concepts required by the second part of the thesis. We begin with an introduction to the topic modeling task and define the notion of an emerging trend in Section 5.1. Next, we examine the related work, existing approaches to topic modeling and trend detection, and their limitations in Section 5.2. In Chapter 6 we introduce our TRENDNERT benchmark for trend and downtrend detection and describe the method we employed for its assembly.

5.1 motivation and challenges

Science changes and evolves rapidly: novel research areas emerge while others fade away. Keeping pace with these changes is challenging. Therefore, recognizing and forecasting emerging research trends is of significant importance for researchers and academic publishers, as well as for funding agencies, stakeholders, and innovation companies.

Previously, the task of trend detection was mostly solved by domain experts who used specialized and cumbersome tools to investigate the data to get useful insights from it (e.g., through data visualization or statistics). However, manual analysis of large amounts of data can be time-consuming, and hiring domain experts is very costly. Furthermore, the overall increase of research data (i.e., Google Scholar, PubMed, DBLP) in the past decade makes the approach based on human domain experts less and less scalable. In contrast, automated approaches become a more desirable solution for the task of emerging trend identification (Kontostathis et al., 2004; Salatino, 2015).

Due to the large number of electronic documents appearing on the web daily, several machine learning techniques were developed to find patterns of word occurrences in those documents using hierarchical probabilistic models. These approaches are called topic models, and they allow the analysis of extensive textual collections and the extraction of valuable insights from them. The main advantage of topic models is their ability to discover hidden patterns of word use and to connect documents containing similar patterns. In other words, the idea of a topic model is that documents are a mixture of topics, where a topic is defined as a probability distribution over words (Alghamdi and Alfalqi, 2015).

Topics have the property of being able to evolve; thus, modeling topics without considering time may confuse precise topic identification. Modeling topics while considering time is called topic evolution modeling, and it can reveal important hidden information in the document collection, allowing topics to be recognized as they appear over time. This approach also provides the ability to track the evolution of topics during a given time window and to examine whether a given topic is gaining interest and popularity over time (i.e., becoming an emerging trend) or whether its development is stable. Considering the latter approach, we turn to the task of emerging trend identification.

How is an emerging trend defined? An emerging trend is a topic that is gaining interest and utility over time, and it has two attributes: novelty and growth (Small, Boyack, and Klavans, 2014; Tu and Seng, 2012). Rotolo, Hicks, and Martin (2015) explain the correlation of these two attributes in the following way. Up to the point of emergence, a research topic is characterized by a high level of novelty. At that time it does not attract any considerable attention from the scientific community, and due to the limited impact its growth is nearly flat. After some turning points (i.e., the appearance of significant scientific publications), the research topic starts to grow faster and may become a trend if the growth is fast; however, the level of novelty decreases continuously once the emergence becomes clear. Then, at the final phase, the topic becomes well established while its novelty levels out (He and Chen, 2018).

To detect trends in textual data, one employs computational analysis techniques and Natural Language Processing (NLP). In the subsequent section we give an overview of the existing approaches to topic modeling and trend detection.

5.2 detection of emerging research trends

The task of trend detection is closely associated with the topic modeling task, since in the first instance it detects the latent topics in a substantial collection of data. It secondarily employs techniques to investigate the evolution of a topic and whether it has the potential to become a trend.


Topic modeling has an extensive history of applications in the scientific domain, including but not limited to studies of temporal trends (Griffiths and Steyvers, 2004; Wang and McCallum, 2006) and impact prediction (Yogatama et al., 2011). Below we give a general overview of the most significant methods used in this domain, explain their importance and, if any, their potential limitations.

5.2.1 Methods for Topic Modeling

In order to extract topics from textual documents (i.e., research papers, blog posts), a precise definition of the model for topic representation is crucial.

latent semantic analysis, formerly called Latent Semantic Indexing, is one of the fundamental methods in the topic modeling domain, proposed by Deerwester et al. in 1990. Its primary goal is to generate vector-based representations for documents and terms and then employ these representations to compute the similarity between documents (resp. terms) in order to select the most related ones. In this regard, Latent Semantic Analysis (LSA) takes a matrix of documents and terms and then decomposes it into a separate document-topic matrix and a topic-term matrix. Before turning to the central task of finding latent topics that capture the relationships among the words and documents, it also performs dimensionality reduction utilizing truncated Singular Value Decomposition. This technique is usually applied to reduce the sparseness, noisiness, and redundancy across dimensions in the document-topic matrix. Finally, using the resulting document and term vectors, one applies measures such as cosine similarity to evaluate the similarity of different documents, words, or queries.

One of the extensions to traditional LSA is probabilistic Latent Semantic Analysis (pLSA), which was introduced by Hofmann in 1999 to fix several shortcomings. The foremost advantage of the follow-up method was that it could successfully automate document indexing, which in turn improved LSA in a probabilistic sense by using a generative model (Alghamdi and Alfalqi, 2015). Furthermore, pLSA can differentiate between diverse contexts of word usage without utilizing dictionaries or a thesaurus. This results in better polysemy disambiguation and discloses typical similarities by grouping words that share a similar context. Among the works that employ pLSA is the approach proposed by Gohr et al. (2009), where the authors apply pLSA in a window that slides across the stream of documents to analyze the evolution of topics. Also, Mei et al. (2008) use pLSA in order to create a network of topics.
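The LSA pipeline sketched above (document-term matrix, truncated SVD, cosine similarity) can be illustrated in a few lines. This is a minimal sketch assuming scikit-learn and a toy corpus; it is not the setup used in the works cited here.

```python
# Minimal LSA sketch: TF-IDF document-term matrix, truncated SVD,
# cosine similarity between the resulting document-topic vectors.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "gene expression in cancer cells",
    "genomics and dna sequencing methods",
    "fuzzy sets for control systems",
]

X = TfidfVectorizer().fit_transform(docs)            # documents x terms
lsa = TruncatedSVD(n_components=2, random_state=0)   # dimensionality reduction
doc_topic = lsa.fit_transform(X)                     # documents x latent topics

print(cosine_similarity(doc_topic))                  # pairwise document similarity
```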


latent dirichlet allocation was developed by Blei, Ng, and Jordan (2003) to overcome the limitations of both the LSA and pLSA models. Nowadays, Latent Dirichlet Allocation (LDA) is considered to be one of the most utilized and robust probabilistic topic models. The fundamental concept of LDA is the assumption that every document is represented as a mixture of topics, where each topic is a discrete probability distribution that defines the probability of each word to appear in a given topic. These topic probabilities provide a compressed representation of a document. In other words, LDA models all of the d documents in the collection as a mixture over n latent topics, where each of the topics describes a multinomial distribution over the w words in the vocabulary (Alghamdi and Alfalqi, 2015).

LDA fixes such weaknesses of pLSA as the incapacity to assign a probability to previously unseen documents (because pLSA learns the topic mixtures only for documents seen in the training phase) and the overfitting problem (i.e., the number of parameters in pLSA increases linearly with the size of the corpus) (Salatino, 2015).

An extension of the conventional LDA is the hierarchical LDA (Griffiths et al., 2004), where topics are grouped in hierarchies: each node in the hierarchy is associated with a topic, and a topic is a distribution across words. Another comparable approach is the Relational Topic Model (Chang and Blei, 2010), which is a mixture of a topic model and a network model for groups of linked documents. The attributes of every document are its text (i.e., discrete observations taken from a fixed vocabulary) and the links between documents, which are hyperlinks or citations.

LDA is a prevalent method for discovering topics (Blei and Lafferty, 2006; Bolelli, Ertekin, and Giles, 2009; Bolelli et al., 2009; Hall, Jurafsky, and Manning, 2008; Wang and Blei, 2011; Wang and McCallum, 2006). However, it has been shown that training and fine-tuning a stable LDA model can be very challenging and time-consuming (Agrawal, Fu, and Menzies, 2018).
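As an illustration of the generative view described above (documents as mixtures of topics, topics as word distributions), the following is a minimal sketch using Gensim's LDA implementation on a toy corpus; the hyperparameters are illustrative only and are not taken from the cited studies.

```python
# Minimal LDA sketch with Gensim: build a dictionary and bag-of-words
# corpus, fit LDA, then inspect topic-word and document-topic distributions.
from gensim.corpora import Dictionary
from gensim.models import LdaModel

texts = [
    ["topic", "model", "word", "distribution"],
    ["gene", "expression", "dna", "sequence"],
    ["topic", "trend", "detection", "word"],
]
dictionary = Dictionary(texts)
corpus = [dictionary.doc2bow(t) for t in texts]

lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2,
               passes=10, random_state=0)

print(lda.show_topics(num_words=4))          # each topic: a distribution over words
print(lda.get_document_topics(corpus[0]))    # each document: a mixture of topics
```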

correlated topic model One of the main drawbacks of LDA is its incapability of capturing dependencies among the topics (Alghamdi and Alfalqi, 2015; He et al., 2017). Therefore, Blei and Lafferty (2007) proposed the Correlated Topic Model as an extension of the LDA approach. In this model, the authors replaced the Dirichlet distribution with a logistic-normal distribution that models pairwise topic correlations with the Gaussian covariance matrix (He et al., 2017).

However, one of the shortcomings of this model is the computational cost: the number of parameters in the covariance matrix increases quadratically with the number of topics, and parameter estimation for the full-rank matrix can be inaccurate in high-dimensional space (He et al., 2017). Therefore, He et al. (2017) proposed their own method to overcome this problem. The authors introduced a model that learns dense topic embeddings and captures topic correlations through the similarity between the topic vectors. According to He et al., their method allows efficient inference in the low-dimensional embedding space and significantly reduces the previous cubic or quadratic time complexity to linear with respect to the topic number.

5.2.2 Topic Evolution of Scientific Publications

Although the topic modeling task is essential, it has a significant offspring task, namely the modeling of topic evolution. Methods able to identify topics within the context of time are highly applicable in the scientific domain: they allow understanding of topic lineage and reveal how research on one topic influences another. Generally, these methods can be classified according to the primary source of information they employ: text, citations, or key phrases.

text-based Extensive textual collections have dynamic co-occurrences of word patterns that change over time. Thus, models that track topic evolution over time should consider both word co-occurrence patterns and the timestamps (Blei and Lafferty, 2006).

In 2006, Wang and McCallum proposed a Non-Markov Continuous Time model for the detection of topical trends in the collection of NeurIPS publications (an overall time span of 17 years was considered). The authors made the following assumption: if a pattern of word co-occurrence exists for a short time, then the system creates a narrow-time-distribution topic, whereas for long-term patterns the system generates a broad-time-distribution topic. The principal objective of this approach is that it models topic evolution without discretizing time, i.e., the state at time t + Δt is independent of the state at time t (Wang and McCallum, 2006).

Another work, done by Blei and Lafferty (2006), proposed Dynamic Topic Models. The authors assumed that the collection of documents is organized based on certain time spans and that the documents belonging to every time span are represented with a K-component model. Thereby, the topics are associated with time span t and emerge from the topics corresponding to time span t − 1. One of the main differences of this model compared to other existing approaches is that it utilizes a Gaussian distribution for the topic parameters instead of the conventional Dirichlet distribution and therefore can capture the evolution over various time spans.


citation-based analysis is the second popular direction that is considered to be effective for trend identification (He et al., 2009; Le, Ho, and Nakamori, 2005; Shibata, Kajikawa, and Takeda, 2009; Shibata et al., 2008; Small, 2006). This method assumes that citations in their various forms (i.e., bibliographic citations, co-citation networks, citation graphs) indicate a meaningful relationship between topics and uses citations to model topic evolution in the scientific domain. Nie and Sun (2017) utilize this approach along with Latent Dirichlet Allocation (LDA) and k-means clustering to identify research trends. The authors first use LDA to extract features and determine the optimal number of topics. Then they use k-means to obtain thematic clusters, and finally they compute citation functions to identify the changes of clusters over time.

The citation-based approach's main drawback, though, is that a consistent collection of publications associated with a specific research topic is required for these techniques to detect the cluster and novelty of the particular topic (He and Chen, 2018). Furthermore, there are also many trend detection scenarios (e.g., research news articles or research blog posts) in which citations are not readily available, which makes this approach not applicable to this kind of data.

keyphrase-based Another standard solution is based on the use of keyphrase information extracted from research papers. In this case, every keyword is representative of a single research topic. Systems like Saffron (Monaghan et al., 2010), as well as Saffron-based models, use this approach. Saffron is a research toolkit that provides algorithms for keyphrase extraction, entity linking, and taxonomy extraction. Asooja et al. (2016) utilize Saffron to extract the keywords from LREC proceedings and then propose a trend forecast based on regression models as an extension to Saffron.

However, this method raises many questions, including the following: how to deal with the noisiness of keywords (i.e., irrelevant keywords) and how to handle keywords that have their own hierarchies (i.e., sub-areas). Other drawbacks of this approach include the ambiguity of keywords (i.e., "Java" as an island and "Java" as a programming language) and the necessity to deal with synonyms, which could be treated as different topics.

5.3 summary

An emerging trend detection algorithm aims to recognize topics that were earlier inappreciable but are now gaining importance and interest within a specific domain. Knowledge of emerging trends is especially essential for stakeholders, researchers, academic publishers, and funding organizations. Assume a business analyst in a biotech company who is interested in the analysis of technical articles for recent trends that could potentially impact the company's investments. A manual review of all the available information in a specific domain would be extremely time-consuming or even not feasible. However, this process could be solved through Intelligent Process Automation (IPA) algorithms. Such software would assist human experts who are tasked with identifying emerging trends through automatic analysis of extensive textual collections and visualization of occurrences and tendencies of a given topic.


6 BENCHMARK FOR SCIENTIFIC TREND DETECTION

This chapter covers work already published at international peer-reviewed conferences. The relevant publication is Moiseeva and Schütze, 2020. The research described in this chapter was carried out in its entirety by the author of the thesis. The other author of the publication acted as an advisor.

Computational analysis and modeling of the evolution of trends is a significant area of research because of its socio-economic impact. However, no large publicly available benchmark for trend detection currently exists, making a comparative evaluation of methods impossible. We remedy this situation by publishing the benchmark TRENDNERT, consisting of a set of gold trends (resp. downtrends) and document labels that is available as an unlimited download, and a large underlying document collection that can also be obtained for free. We propose Mean Average Precision (MAP) as an evaluation measure for trend detection and apply this measure in an investigation of several baselines.

6.1 motivation

What is an emerging research topic, and how is it defined in the literature? At the time of emergence, a research topic often does not attract much recognition from the scientific community and is exemplified by just a few publications. He and Chen (2018) associate the appearance of a trend with some triggering event, e.g., the publication of an article in a high-impact journal. Later, the research topic starts to grow faster and becomes a trend (He and Chen, 2018).

We adopt this as our key representation of a trend in this paper: a trend is a research topic with a strongly increasing number of publications in a particular time interval. In contrast to trends, there are also downtrends, which refer to topics that move lower in their demand or growth as they evolve. Therefore, we define a downtrend as the converse of a trend: a downtrend is a research topic that has a strongly decreasing number of publications in a particular time interval. Downtrends can be as important to detect as trends, e.g., for a funding agency planning its budget.

To detect both types of topics (i.e., trends and downtrends) in textual data, one employs computational analysis techniques and NLP. These methods enable the analysis of extensive textual collections and provide valuable insights into topical trendiness and evolution over time. Despite the overall importance of trend analysis, there is, to our knowledge, no large publicly available benchmark for trend detection. This makes a comparative evaluation of methods and algorithms impossible. Most of the previous work has used proprietary or moderately sized datasets, or evaluation measures that were not directly bound to the objectives of trend detection (Gollapalli and Li, 2015).

We remedy this situation by publishing the benchmark TRENDNERT1. The benchmark consists of the following:

1. A set of gold trends compiled based on an extensive analysis of the meta-literature.

2. A set of labeled documents (based on more than one million scientific publications), where each document is assigned to a trend, a downtrend, or the class of flat topics and supported by the topic title.

3. An evaluation script to compute Mean Average Precision (MAP) as the measure for the accuracy of trend detection.

We make the benchmark available as unlimited downloads2. The underlying research corpus can be obtained for free from Semantic Scholar3 (Ammar et al., 2018).

6.1.1 Outline and Contributions

In this work we address the challenge of benchmark creation for (down)trend detection. This chapter is structured as follows. Section 6.2 introduces our model for corpus creation and its fundamental components. In Section 6.3 we present the crowdsourcing procedure for the benchmark labeling. Section 6.5 outlines the proposed baseline for trend detection, our experimental setup, and outcomes. Finally, we illustrate the details on the distributed benchmark in Section 6.6 and conclude in Section 6.7.

Our contributions within this work are as follows:

• TRENDNERT is the first publicly available benchmark for (down)trend detection.

• TRENDNERT is based on a collection of more than one million documents. It is among the largest that have been used for trend detection and therefore offers a realistic setting for developing trend detection algorithms.

• TRENDNERT addresses the task of detecting both trends and downtrends. To the best of our knowledge, the task of downtrend detection has not been addressed before.

1 The name is a concatenation of trend and its ananym dnert, because it supports the evaluation of both trends and downtrends.

2 https://doi.org/10.17605/OSF.IO/DHZCT
3 https://api.semanticscholar.org/corpus


6.2 corpus creation

In this section we explain the methodology we used to create the TRENDNERT benchmark.

6.2.1 Underlying Data

We employ a corpus provided by Semantic Scholar4. It includes more than one million papers, published mostly between 2000 and 2015 in about 5,000 computer science journals and conference proceedings. Every record in the collection consists of a title, key phrases, abstract, full text, and metadata.

6.2.2 Stratification

The distribution of documents over time in the entire original dataset is skewed (see Figure 11). We discovered that clustering the entire collection, as well as a random sample, produces corrupt results because more weight is given to later years than to earlier years. Therefore, we first generated a stratified sample of the original document collection. To this end, we randomly pick 10,000 documents for each year between 2000 and 2015, for an overall sample size of 160,000.
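A year-stratified sample of this kind can be drawn with a few lines of pandas; the following is a minimal sketch in which the file and column names are hypothetical.

```python
# Minimal sketch of year-stratified sampling: the same number of documents
# is drawn for every year so that later years do not dominate the sample.
import pandas as pd

papers = pd.read_json("papers.jsonl", lines=True)   # hypothetical input file

stratified = (
    papers[papers["year"].between(2000, 2015)]
          .groupby("year", group_keys=False)
          .apply(lambda g: g.sample(n=10_000, random_state=0))
)
print(stratified["year"].value_counts().sort_index())
```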

Figure 11: Overall distribution of papers in the entire dataset. Years 1975 to 2016.

4 https://www.semanticscholar.org


6.2.3 Document Representations & Clustering

As mentioned, this work's primary focus was the creation of a benchmark, not the development of new algorithms or models for trend detection. Therefore, for our experiments we selected an algorithm based on k-means clustering as a simple baseline method for trend detection.

In conventional document clustering, documents are usually represented as bag-of-words (BOW) feature vectors (Manning, Raghavan, and Schütze, 2008). Those, however, have a major weakness: they neglect the semantics of words (Le and Mikolov, 2014). Still, recent work in the representation learning domain, and particularly doc2vec – the method proposed by Le and Mikolov (2014) – can provide representations for document clustering that overcome this weakness.

The doc2vec5 algorithm is inspired by techniques for learning word vectors (Mikolov et al., 2013) and is capable of capturing semantic regularities in textual collections. It is an unsupervised approach for learning continuous distributed vector representations for text (i.e., paragraphs, documents). The approach maps documents into a vector space such that semantically related documents are assigned similar vector representations (e.g., an article about "genomics" is closer to an article about "gene expression" than to an article about "fuzzy sets"). Formally, each paragraph is mapped to an individual vector represented by a column in matrix D, and every word is also mapped to an individual vector represented by a column in matrix W. The paragraph vector and word vectors are concatenated to predict the next word in a context (Le and Mikolov, 2014). Other work has already successfully employed this type of document representation for topic modeling, combining it with both Latent Dirichlet Allocation (LDA) and clustering approaches (Curiskis et al., 2019; Dieng, Ruiz, and Blei, 2019; Moody, 2016; Xie and Xing, 2013).

In our work we run doc2vec on the stratified sample of 160,000 papers and represent each document as a length-normalized vector (i.e., a document embedding). These vectors are then clustered into k = 1000 clusters using the scikit-learn6 implementation of k-means (MacQueen, 1967) with default parameters. The combination of document representations with a clustering algorithm is conceptually simple and interpretable. Also, the comprehensive comparative evaluation of topic modeling methods utilizing document embeddings performed by Curiskis et al. (2019) showed that doc2vec feature representations with k-means clustering outperform several other methods7 on three evaluation measures (Normalized Mutual Information, Adjusted Mutual Information, and Adjusted Rand Index).
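A minimal sketch of this combination of doc2vec embeddings and k-means is given below, assuming Gensim 4.x, scikit-learn, and a toy list of abstracts; the hyperparameters are illustrative and not those used in our experiments.

```python
# Minimal doc2vec + k-means sketch: train paragraph vectors, length-normalize
# them, and cluster the resulting document embeddings.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize

abstracts = [
    "sentiment analysis of social media posts with neural networks",
    "query processing and schema validation for xml databases",
]

tagged = [TaggedDocument(words=a.split(), tags=[i]) for i, a in enumerate(abstracts)]
d2v = Doc2Vec(tagged, vector_size=100, min_count=1, epochs=20)

vectors = normalize(d2v.dv.vectors)                  # length-normalized embeddings
labels = KMeans(n_clusters=2, random_state=0).fit_predict(vectors)
print(labels)
```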

5 We use the implementation provided by Gensim: https://radimrehurek.com/gensim/models/doc2vec.html
6 https://scikit-learn.org/stable
7 Hierarchical clustering, k-medoids, NMF, and LDA.


We run 10 trials of stratification and clustering, resulting in 10 different clusterings. We do this to protect against the variability of clustering and because we do not want to rely on a single clustering for proposing (down)trends for the benchmark.

6.2.4 Trend & Downtrend Estimation

Recalling our definitions of trend and downtrend: a trend (resp. downtrend) is defined as a research topic that has a strongly increasing (resp. decreasing) number of publications in a particular time interval.

Linear trend estimation is a widely used technique in predictive (time-series) analysis that proves statements about tendencies in the data by correlating the measurements with the times at which they occurred (Hess, Iyer, and Malm, 2001). Considering the definition of a trend (resp. downtrend) and the main concept of linear trend estimation models, we utilize linear regression to identify trend and downtrend candidates in the resulting clustering sets.

Specifically, we count the number of documents per year for a cluster and then estimate the parameters of the best-fitting line (i.e., its slope). Clusters are then ranked according to the resulting slope, and the n = 150 clusters with the largest (resp. smallest) slope are selected as trend (resp. downtrend) candidates. Thus, our definition of a single-clustering (down)trend candidate is a cluster with an extreme ascending or descending slope. Figure 12 and Figure 13 give two examples of (down)trend candidates.
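A minimal sketch of this slope-based ranking is shown below; `counts` is a hypothetical mapping from a cluster label to its per-year document counts.

```python
# Minimal sketch of ranking clusters by the slope of the best-fitting line
# over their (year, number of documents) points.
import numpy as np

def cluster_slope(counts_per_year):
    years = np.array(sorted(counts_per_year))
    values = np.array([counts_per_year[y] for y in years])
    slope, _intercept = np.polyfit(years, values, deg=1)
    return slope

counts = {
    "sentiment analysis": {2000: 5, 2005: 40, 2010: 120, 2015: 300},
    "xml":                {2000: 250, 2005: 180, 2010: 90, 2015: 30},
}
ranked = sorted(counts, key=lambda c: cluster_slope(counts[c]), reverse=True)
print(ranked)   # trend candidates first, downtrend candidates last
```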

6.2.5 (Down)trend Candidates Validation

As detailed in Section 6.2.3, we run ten series of clustering, resulting in ten discrete clustering sets. We then estimate the (down)trend candidates according to the procedure described in Section 6.2.4. Consequently, for each of the ten clustering sets we obtain 150 trend and 150 downtrend candidates.

Nonetheless, for the benchmark we want to sort out, among all the (down)trend candidates across the ten clustering sets, only those that are consistent over all sets. To obtain consistent (down)trend candidates, we apply the Jaccard coefficient, a metric used for measuring the similarity and diversity of sample sets. We define an equivalence relation ∼R: two candidates (i.e., clusters) are equivalent if their Jaccard coefficient is > τsim, where τsim = 0.5. Following this notion, we compute the equivalence of (down)trend candidates across the ten sets.

Finally, we define an equivalence class as an equivalence class trend candidate (resp. equivalence class downtrend candidate) if it contains trend (resp. downtrend) candidates in at least half of the clusterings.
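The equivalence test between candidates can be written down directly from the definition above; the following is a minimal sketch in which each candidate is represented as a set of document IDs.

```python
# Minimal sketch of the Jaccard-based equivalence test between two
# candidate clusters (sets of document IDs).
def jaccard(a, b):
    return len(a & b) / len(a | b)

def equivalent(a, b, tau_sim=0.5):
    """Two candidates are equivalent if their Jaccard coefficient is > tau_sim."""
    return jaccard(a, b) > tau_sim

print(equivalent({1, 2, 3, 4}, {2, 3, 4, 5}))   # 3/5 = 0.6 > 0.5 -> True
```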


Figure 12: A trend candidate (positive slope). Topic: Sentiment and Emotion Analysis (using Social Media Channels).

Figure 13: A downtrend candidate (negative slope). Topic: XML.


This procedure gave us in total 110 equivalence class trend candidates and 107 equivalence class downtrend candidates, which we then annotate for our benchmark8.

6.3 benchmark annotation

To annotate the equivalence class (down)trend candidates obtained from our model, we used the Figure-Eight9 platform. It provides an integrated quality-control mechanism and a training phase before the actual annotation (Kolhatkar, Zinsmeister, and Hirst, 2013), which minimizes the rate of poorly qualified annotators.

We design our annotation task as follows. A worker is shown a description of a particular (down)trend candidate along with a list of possible (down)trend tags. Descriptions are mixed and presented in random order; the temporal presentation order of candidates is also arbitrary. The worker is asked to pick from the list of (down)trend tags the item that matches the cluster (resp. candidate) best. Even if there are several items that are a possible match, the worker must choose only one. If there is no matching item, the worker can select the option other. Three different workers evaluated each cluster, and the final label is the majority label. If there is no majority label, we present the cluster repeatedly to the workers until there is a majority.

The descriptors of candidates are collected through the following procedure:

• First, we review the documents assigned to a candidate cluster.

• Then, we extract keywords and titles of the papers attributed to the cluster. Semantic Scholar provides keywords and titles as a part of the metadata.

• Finally, we compute the top 15 most frequent keywords and randomly select 15 titles. These keywords and titles are the descriptor of the cluster candidate presented to crowd-workers.

We create the (down)trend tags based on keywords, titles, and metadata such as publication venue. The cluster candidates were manually labeled by graduate employees (i.e., domain experts in the computer science domain) considering the abovementioned items.

6.3.1 Inter-annotator Agreement

To ensure the quality of the obtained annotations, we computed the Inter-Annotator Agreement (IAA) – a measure of how well two (or more) annotators can make the same decision for a particular category. To do that, we used the following metrics.

8 Note that the overall number of 217 refers to the number of the (down)trend candidates and not to the number of documents. Each candidate contains hundreds of documents.
9 Formerly called CrowdFlower.

krippendorff's α is a reliability coefficient developed to measure the agreement among observers or measuring instruments drawing distinctions among typically unstructured phenomena or assigning computable values to them (Krippendorff, 2011). We choose Krippendorff's α as an inter-annotator agreement measure because it – unlike other specialized coefficients – is a generalization of several reliability criteria and applies to situations like ours, where we have more than two annotators and a given annotator only labels a subset of the data (i.e., we have to deal with missing values).

Krippendorff (2011) has proposed several metric variations, each of which is associated with a specific set of weights. We use Krippendorff's α considering any number of observers and missing data. In this case, Krippendorff's α for the overall number of 217 annotated documents is α = 0.798. According to Landis and Koch (1977), this value corresponds to a substantial level of agreement.

figure-eight agreement rate Figure-Eight10 provides an agreement score c for each annotated unit u, which is based on the majority vote of the trusted workers. The score has been shown to perform well compared to the classic metrics (Kolhatkar, Zinsmeister, and Hirst, 2013). We computed the average C for the 217 annotated documents; C is 0.799.

Since both Krippendorff's α and the internal Figure-Eight IAA are rather high, we consider the obtained annotations for the benchmark to be of good quality.
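For illustration, such an agreement score can be computed, for example, with the third-party krippendorff Python package; the snippet below is a minimal sketch on toy labels (not the data behind the numbers reported above), and the use of this particular package is our assumption, not a statement about the tooling originally used.

```python
# Minimal sketch of computing Krippendorff's alpha with the third-party
# `krippendorff` package; rows are annotators, columns are units,
# np.nan marks missing labels, and the categories are encoded as numbers.
import numpy as np
import krippendorff

reliability_data = np.array([
    [1,      2, 2, np.nan, 3],   # annotator 1
    [1,      2, 3, 2,      3],   # annotator 2
    [np.nan, 2, 2, 2,      3],   # annotator 3
])
alpha = krippendorff.alpha(reliability_data=reliability_data,
                           level_of_measurement="nominal")
print(round(alpha, 3))
```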

6.3.2 Gold Trends

We compiled a list of computer science trends for the year 2016 based on the analysis of twelve survey publications (Ankerholz, 2016; Augustin, 2016; Brooks, 2016; Frot, 2016; Harriet, 2015; IEEE Computer Society, 2016; Markov, 2015; Meyerson and Mariette, 2016; Nessma, 2015; Reese, 2015; Rivera, 2016; Zaino, 2016). For this year we identified a total of 31 trends, see Table 20.

Since there is variation as to the name of a specific trend, we created a unique name for each of them. Out of the 31 trends, 28 (90.3%) are instantiated as trends by our model, which we confirmed in the crowdsourcing procedure11. We searched for the three missing trends using a variety of techniques (i.e., inspection of non-trends and downtrends, random browsing of documents not assigned to trend candidates, and keyword searches). Two of the missing trends do not seem to occur in the collection at all: "Human Augmentation" and "Food and Water Technology". The third missing trend, "Cryptocurrency and Cryptography", does occur in the collection, but the occurrence rate is very small (ca. 0.01%). We do not consider these three trends in the rest of the paper.

10 Former CrowdFlower platform.
11 The author verified this based on her judgment of equivalence of one of the 31 gold trend names in the literature with one of the trend tags.

Computer Science Trend                        Computer Science Trend

Autonomous Agents and Systems                 Natural Language Processing
Autonomous Vehicles                           Open Source
Big Data Analytics                            Privacy
Bioinformatics and (Bio) Neuroscience         Quantum Computing
Biometrics and Personal Identification        Recommender Systems and Social Networks
Cloud Computing and Software as a Service     Reinforcement Learning
Cryptocurrency and Cryptography*              Renewable Energy
Cyber Security                                Robotics
E-business                                    Semantic Web
Food and Water Technology*                    Sentiment and Emotion Analysis
Game-based Learning                           Smart Cities
Games and (Virtual) Augmented Reality         Supply Chains and RFIDs
Human Augmentation*                           Technology for Climate Change
Machine/Deep Learning                         Transportation and Energy
Medical Advances and DNA Computing            Wearables
Mobile Computing

Table 20: 31 areas identified as computer science trends for the year 2016 in the media. 28 of these trends are instantiated by our model as well and thus covered in our benchmark. Topics marked with an asterisk (*) were not found in our underlying data collection.

In summary, based on our analysis of the meta-literature, we identified 28 gold trends that also occur in our collection. We will use them as the gold standard that trend detection algorithms should aim to discover.

6.4 evaluation measure for trend detection

Whether something is a trend is a graded concept. For this reason, we adopt an evaluation measure based on a ranking of candidates, specifically Mean Average Precision (MAP). The score denotes the average of the precision values at the ranks of correctly identified and non-redundant trends.

We approach trend detection as computing a ranked list of sets of documents, where each set is a trend candidate. We refer to documents as trendy, downtrendy, and flat, depending on which class they are assigned to in our benchmark. We consider a trend candidate c a correct recognition of gold trend t if it satisfies the following requirements:

• c is trendy;

• t is the largest trend in c;

• |t ∩ c| / |c| > ρ, where ρ is a coverage parameter;

• t was not earlier recognized in the list.

These criteria give us a true positive (i.e., recognition of a gold trend) or a false positive (otherwise) for every position of the ranked list, and a precision score for each trend. The precision for a trend that was not observed is 0. We then compute MAP; see Table 21 for an example.

Gold Trends            Gold Downtrends

Cloud Computing        Compilers
Bioinformatics         Petri Nets
Sentiment Analysis     Fuzzy Sets
Privacy                Routing Protocols

Trend Candidate            |t ∩ c| / |c|    tp/fp    P

c1   Cloud Computing       0.60             tp       1.00
c2   Privacy               0.20             fp       0.50
c3   Cloud Computing       0.70             fp       0.33
c4   Bioinformatics        0.55             tp       0.50
c5   Sentiment Analysis    0.95             tp       0.60
c6   Sentiment Analysis    0.59             fp       0.50
c7   Bioinformatics        0.41             fp       0.43
c8   Compilers             0.60             fp       0.38
c9   Petri Nets            0.80             fp       0.33
c10  Privacy               0.33             fp       0.30

Table 21: Example of the proposed MAP evaluation (with ρ = 0.5). MAP is the average of the precision values 1.0 (Cloud Computing), 0.5 (Bioinformatics), 0.6 (Sentiment Analysis), and 0.0 (Privacy), i.e., MAP = 2.1/4 = 0.525. True positive: tp; false positive: fp; precision: P.
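A minimal sketch of this evaluation is given below; it assumes that each trend candidate in the ranked list has already been reduced to (i) the gold trend that is largest in it and (ii) its coverage |t ∩ c| / |c|, and that only trendy candidates are passed in. It reproduces the value 0.525 from Table 21 but is not the evaluation script distributed with the benchmark.

```python
# Minimal sketch of the proposed MAP measure over a ranked candidate list.
def mean_average_precision(candidates, gold_trends, rho=0.5):
    seen, hits, precision_at = set(), 0, {}
    for rank, (label, coverage) in enumerate(candidates, start=1):
        if label in gold_trends and coverage > rho and label not in seen:
            seen.add(label)              # true positive: first recognition of label
            hits += 1
            precision_at[label] = hits / rank
    # gold trends that were never recognized contribute a precision of 0
    return sum(precision_at.get(t, 0.0) for t in gold_trends) / len(gold_trends)

gold = ["Cloud Computing", "Bioinformatics", "Sentiment Analysis", "Privacy"]
ranked = [("Cloud Computing", 0.60), ("Privacy", 0.20), ("Cloud Computing", 0.70),
          ("Bioinformatics", 0.55), ("Sentiment Analysis", 0.95),
          ("Sentiment Analysis", 0.59), ("Bioinformatics", 0.41),
          ("Compilers", 0.60), ("Petri Nets", 0.80), ("Privacy", 0.33)]
print(mean_average_precision(ranked, gold))   # 0.525
```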

6.5 trend detection baselines

In this section we investigate the impact of four different configuration choices on the performance of trend detection by way of ablation.


6.5.1 Configurations

abstracts vs. full texts The underlying data collection contains both abstracts and full texts12. Our initial expectation was that the full text of a document comprises more information than the abstract and should be a more reliable basis for trend detection. This is one of the configurations we test in the ablation: we apply our proposed method to the full-text collection (solely document texts without abstracts) and the abstract collection (only abstract texts) separately and observe the results.

stratification Due to the uneven distribution of documents over the years, the last years (with an increased volume of publications) may get too much impact on results compared to earlier years. To investigate this, we conduct an ablation experiment where we compare (i) 160k documents randomly sampled from the entire collection and (ii) stratified sampling. In stratified sampling we select 10k documents for each of the years 2000–2015, resulting in a sampled collection of 160k.

length Lt of the interval Clusters are ranked by trendiness, and the resulting ranked list is evaluated. To measure the trendiness or growth of topics over time, we fit a line to an interval (ni | i0 ≤ i < i0 + Lt) by linear regression, where Lt is the length of the interval, i is one of the 16 years (2000, ..., 2015), and ni is the number of documents that were assigned to cluster c in that year. As a simple default baseline, we apply the regression to half of the entire interval, i.e., Lt = 8. There are nine such intervals in our 16-year period. As the final measure of growth for the cluster, we take the maximum of the nine individual slopes (see the sketch after this list). To determine how much this configuration choice affects our results, we also test a linear regression over the entire time span, i.e., Lt = 16; in this case there is a single interval.

clustering method We consider Gaussian Mixture Models (GMM) and k-means. We use GMM with a spherical type of covariance, where each component has the same variance. For both, we proceed as described in Section 6.2.3.
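The interval configuration from the list above can be made concrete as follows; this is a minimal sketch assuming `yearly_counts` is the list of 16 yearly document counts of one cluster.

```python
# Minimal sketch of the Lt configuration: fit a line to every interval of
# length Lt and take the maximal slope as the cluster's growth measure.
import numpy as np

def max_interval_slope(n, length=8):
    slopes = []
    for start in range(len(n) - length + 1):         # nine intervals for Lt = 8
        years = np.arange(start, start + length)
        slope, _ = np.polyfit(years, n[start:start + length], deg=1)
        slopes.append(slope)
    return max(slopes)

yearly_counts = [3, 4, 6, 5, 8, 12, 15, 21, 30, 36, 50, 62, 80, 95, 120, 140]
print(max_interval_slope(yearly_counts, length=8))    # Lt = 8
print(max_interval_slope(yearly_counts, length=16))   # Lt = 16, single interval
```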

6.5.2 Experimental Setup and Results

We conducted experiments on the Semantic Scholar corpus and evaluated a ranked list of trend candidates against the benchmark, considering the configurations mentioned above. We adopted Mean Average Precision (MAP) as the principal evaluation measure. As a secondary evaluation measure, we computed recall at 50 (R50), which estimates the percentage of gold trends found in the list of the 50 highest-ranked trend candidates. We found that configuration (0) in Table 22 works best. We then conducted ablation experiments to discover the importance of the configuration choices.

12 Semantic Scholar provided the underlying dataset at the beginning of our work in 2016.

                   (0)        (1)        (2)        (3)        (4)

document part      A          F          A          A          A
stratification     yes        yes        no         yes        yes
Lt                 8          8          8          16         8
clustering         GMMsph     GMMsph     GMMsph     GMMsph     kµ
MAP                36 (03)    07 (19)    30 (03)    32 (04)    34 (21)
R50 avg            50 (012)   25 (018)   50 (016)   46 (018)   53 (014)
R50 max            61         37         53         61         61

Table 22: Ablation results. Standard deviations are in parentheses. GMM clustering is performed with the spherical (sph) type of covariance. K-means clustering is denoted as kµ in the ablation table.

comparing (0) and (1)13, we observed that abstracts (A) are a more reliable representation for our trend detection baseline than full texts (F). A possible explanation for this observation is that an abstract is a summary of a scientific paper that covers only the central points and is semantically very concise and rich. In contrast, the full text contains numerous parts that are secondary (i.e., future work, related work) and thus may skew the document's overall meaning.

comparing (0) and (2), we recognized that stratification (yes) improves results compared to the setting with non-stratified (no), randomly sampled data. We would expect the effect of stratification to be even more substantial for collections in which the distribution of documents over time is more skewed than in our collection.

comparing (0) and (3), the length of the interval is a vital configuration choice. As we assumed, the 16-year interval is rather long, and the 8-year interval could be a better choice, primarily if one aims to find short-term trends.

comparing (0) and (4), we recognized that the topics we obtain from k-means have a similar nature to those from GMM. However, GMM, which takes variance and covariance into account and estimates a soft assignment, still performs slightly better.

6.6 benchmark description

Below we report the contents of the TRENDNERT benchmark14:

1. Two records containing information on documents: one file with the documents assigned to (down)trends and the other file with documents assigned to flat topics.

13 Numbers (0), (1), (2), (3), (4) are the configuration choices in the ablation table.
14 https://doi.org/10.17605/OSF.IO/DHZCT


2. A script to produce document hash codes.

3. A file to map every hash code to internal IDs in the benchmark.

4. A script to compute MAP (the default setting is ρ = 0.25).

5. README.md, a file with guidance notes.

Every document in the benchmark contains the following information:

• Paper ID: internal ID assigned to each document in the original Semantic Scholar collection.

• Cluster ID: unique ID of the meta-cluster where the document was observed.

• Label/Tag: name of the gold (down)trend assigned by crowd-workers (e.g., Cyber Security, Fuzzy Sets and Systems).

• Type of (down)trend candidate: trend (T), downtrend (D), or flat topic (F).

• Hash ID15 of each paper from the original Semantic Scholar collection.
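As an illustration of the hash scheme described in footnote 15 (MD5 of first author, title, and year), a minimal sketch is given below; the exact concatenation and normalization are assumptions, and the script shipped with the benchmark remains the authoritative reference.

```python
# Minimal sketch of deriving an MD5-based document hash from
# first author + title + year (the field separator is an assumption).
import hashlib

def paper_hash(first_author, title, year):
    key = f"{first_author}{title}{year}"
    return hashlib.md5(key.encode("utf-8")).hexdigest()

print(paper_hash("Jane Doe", "A Study of Topic Trends", 2014))
```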

6.7 summary

Emerging trend detection is a promising task for Intelligent Process Automation (IPA) that allows companies and funding agencies to automatically analyze extensive textual collections in order to detect emerging fields and forecast the future by employing machine learning algorithms. Thus, the availability of publicly available benchmarks for trend detection is of significant importance, as only thereby can one evaluate the performance and robustness of the techniques employed.

Within this work we release TRENDNERT – the first publicly available benchmark for (down)trend detection, which offers a realistic setting for developing trend detection algorithms. TRENDNERT also supports downtrend detection – a significant problem that was not addressed before. We also present several experimental findings on trend detection. First, stratification improves trend detection if the distribution of documents is skewed over time. Second, abstract-based trend detection performs better than full-text-based detection if straightforward models are used.

6.8 future work

There are several possible directions for future work:

• The presented approach to estimate the trendiness of a topic by means of linear regression is straightforward and can be replaced by more sophisticated versions like Kendall's τ correlation coefficient or the locally weighted regression smoother (LOESS) (Cleveland, 1979).

15 MD5 hash of First Author + Title + Year.

• It would also be challenging to replace the k-means clustering algorithm that we adopted within this work with the LDA model while retaining the document representations, as was done in Dieng, Ruiz, and Blei (2019). The case to investigate would then be whether topic modeling outperforms the simple clustering algorithm in this particular setting, and whether the resulting system would benefit from the topic modeling algorithm.

• Furthermore, it would be of interest to conduct more research on the following question: what kind of document representation can make effective use of the information in full texts? Recently proposed contextualized word embeddings could be a possible direction for these experiments.


7 DISCUSSION AND FUTURE WORK

As we observed in the two parts of this thesis, Intelligent Process Automation (IPA) is a challenging research area due to its multifacetedness and application in real-world scenarios. Its main objective is to automate time-consuming and routine tasks and free up the knowledge worker for more cognitively demanding tasks. However, this task raises many difficulties, such as a lack of resources for both training and evaluation of algorithms, as well as an insufficient performance of machine learning techniques when applied to real-world data and practical problems. In this thesis we investigated two IPA applications within the Natural Language Processing (NLP) domain – conversational agents and predictive analytics – and addressed some of the most common issues.

• In the first part of the thesis, we addressed the challenge of implementing a robust conversational system for IPA purposes within the practical e-learning domain, considering the conditions of missing training data. The primary purpose of the resulting system is to perform the onboarding process between tutors and students. This onboarding reduces repetitive and time-consuming activities while allowing tutors to focus on solely mathematical questions, which is the core task with the most impact on students. Besides that, by interacting with users, the system augments the resources with structured and labeled training data that we used to implement learnable dialogue components.

The realization of such a system was associated with several challenges. Among others were missing structured data and ambiguous user-generated content that requires precise language understanding. Also, a system that simultaneously communicates with a large number of users has to maintain additional meta policies, such as fallback and check-up policies, to hold the conversation intelligently.

• Moreover, we have shown the rule-based dialogue system's potential extension using transfer learning. We utilized a small dataset of structured dialogues obtained in a trial run of the rule-based system and applied the current state-of-the-art Bidirectional Encoder Representations from Transformers (BERT) architecture to solve the domain-specific tasks. Specifically, we retrained two of the three components in the conversational system – Named Entity Recognition (NER) and Dialogue Manager (DM) (i.e., the NAP task) – and confirmed the applicability of such algorithms in a real-world low-resource scenario by showing the reasonably high performance of the resulting models.

• Last but not least, in the second part of the thesis we addressed the issue of missing publicly available benchmarks for the evaluation of machine learning algorithms for the trend detection task. To the best of our knowledge, we released the first open benchmark for this purpose, which offers a realistic setting for developing trend detection algorithms. Besides that, this benchmark supports downtrend detection – a significant problem that was not sufficiently addressed before. We additionally proposed an evaluation measure and presented several experimental findings on trend detection.

The ultimate objective of Intelligent Process Automation (IPA) should be the creation of robust algorithms under low-resource and domain-specific conditions. This is still a challenging task; however, some of the experiments presented in this thesis revealed that this goal could potentially be achieved by means of Transfer Learning (TL) techniques.


BIBLIOGRAPHY

Agrawal, Amritanshu, Wei Fu, and Tim Menzies (2018). "What is wrong with topic modeling? And how to fix it using search-based software engineering." In: Information and Software Technology 98, pp. 74–88 (cit. on p. 66).

Turing, Alan M. (1950). "Computing machinery and intelligence." In: Mind 59.236, pp. 433–460 (cit. on p. 7).

Alghamdi, Rubayyi and Khalid Alfalqi (2015). "A survey of topic modeling in text mining." In: Int. J. Adv. Comput. Sci. Appl. (IJACSA) 6.1 (cit. on pp. 64–66).

Ammar, Waleed, Dirk Groeneveld, Chandra Bhagavatula, Iz Beltagy, Miles Crawford, Doug Downey, Jason Dunkelberger, Ahmed Elgohary, Sergey Feldman, Vu Ha, et al. (2018). "Construction of the literature graph in Semantic Scholar." In: arXiv preprint arXiv:1805.02262 (cit. on p. 72).

Ankerholz, Amber (2016). 2016 Future of Open Source Survey Says Open Source Is The Modern Architecture. https://www.linux.com/news/2016-future-open-source-survey-says-open-source-modern-architecture (cit. on p. 78).

Asooja, Kartik, Georgeta Bordea, Gabriela Vulcu, and Paul Buitelaar (2016). "Forecasting emerging trends from scientific literature." In: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), pp. 417–420 (cit. on p. 68).

Augustin, Jason (2016). Emerging Science and Technology Trends: A Synthesis of Leading Forecasts. http://www.defenseinnovationmarketplace.mil/resources/2016_SciTechReport_16June2016.pdf (cit. on p. 78).

Blei, David M. and John D. Lafferty (2006). "Dynamic topic models." In: Proceedings of the 23rd International Conference on Machine Learning. ACM, pp. 113–120 (cit. on pp. 66, 67).

Blei, David M., John D. Lafferty, et al. (2007). "A correlated topic model of science." In: The Annals of Applied Statistics 1.1, pp. 17–35 (cit. on p. 66).

Blei, David M., Andrew Y. Ng, and Michael I. Jordan (2003). "Latent Dirichlet allocation." In: Journal of Machine Learning Research 3.Jan, pp. 993–1022 (cit. on p. 66).

Bloem, Peter (2019). Transformers from Scratch. http://www.peterbloem.nl/blog/transformers (cit. on p. 45).

Bocklisch, Tom, Joey Faulkner, Nick Pawlowski, and Alan Nichol (2017). "Rasa: Open source language understanding and dialogue management." In: arXiv preprint arXiv:1712.05181 (cit. on p. 15).

Bolelli, Levent, Seyda Ertekin, and C. Giles (2009). "Topic and trend detection in text collections using latent Dirichlet allocation." In: Advances in Information Retrieval, pp. 776–780 (cit. on p. 66).


Bolelli, Levent, Seyda Ertekin, Ding Zhou, and C. Lee Giles (2009). "Finding topic trends in digital libraries." In: Proceedings of the 9th ACM/IEEE-CS Joint Conference on Digital Libraries. ACM, pp. 69–72 (cit. on p. 66).

Bordes, Antoine, Y-Lan Boureau, and Jason Weston (2016). "Learning end-to-end goal-oriented dialog." In: arXiv preprint arXiv:1605.07683 (cit. on pp. 16, 17).

Brooks, Chuck (2016). 7 Top Tech Trends Impacting Innovators in 2016. http://innovationexcellence.com/blog/2015/12/26/7-top-tech-trends-impacting-innovators-in-2016 (cit. on p. 78).

Burtsev, Mikhail, Alexander Seliverstov, Rafael Airapetyan, Mikhail Arkhipov, Dilyara Baymurzina, Eugene Botvinovsky, Nickolay Bushkov, Olga Gureenkova, Andrey Kamenev, Vasily Konovalov, et al. (2018). "DeepPavlov: An Open Source Library for Conversational AI." In: (cit. on p. 15).

Chang, Jonathan, David M. Blei, et al. (2010). "Hierarchical relational models for document networks." In: The Annals of Applied Statistics 4.1, pp. 124–150 (cit. on p. 66).

Chen, Hongshen, Xiaorui Liu, Dawei Yin, and Jiliang Tang (2017). "A survey on dialogue systems: Recent advances and new frontiers." In: ACM SIGKDD Explorations Newsletter 19.2, pp. 25–35 (cit. on pp. 14–16).

Chen, Lingzhen, Alessandro Moschitti, Giuseppe Castellucci, Andrea Favalli, and Raniero Romagnoli (2018). "Transfer Learning for Industrial Applications of Named Entity Recognition." In: NL4AI@AI*IA, pp. 129–140 (cit. on p. 43).

Cleveland, William S. (1979). "Robust locally weighted regression and smoothing scatterplots." In: Journal of the American Statistical Association 74.368, pp. 829–836 (cit. on p. 84).

Collobert, Ronan, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa (2011). "Natural language processing (almost) from scratch." In: Journal of Machine Learning Research 12.Aug, pp. 2493–2537 (cit. on p. 16).

Cuayáhuitl, Heriberto, Simon Keizer, and Oliver Lemon (2015). "Strategic dialogue management via deep reinforcement learning." In: arXiv preprint arXiv:1511.08099 (cit. on p. 14).

Cui, Lei, Shaohan Huang, Furu Wei, Chuanqi Tan, Chaoqun Duan, and Ming Zhou (2017). "SuperAgent: A customer service chatbot for e-commerce websites." In: Proceedings of ACL 2017, System Demonstrations, pp. 97–102 (cit. on p. 8).

Curiskis, Stephan A., Barry Drake, Thomas R. Osborn, and Paul J. Kennedy (2019). "An evaluation of document clustering and topic modelling in two online social networks: Twitter and Reddit." In: Information Processing & Management (cit. on p. 74).

Dai, Zihang, Zhilin Yang, Yiming Yang, William W. Cohen, Jaime Carbonell, Quoc V. Le, and Ruslan Salakhutdinov (2019). "Transformer-XL: Attentive language models beyond a fixed-length context." In: arXiv preprint arXiv:1901.02860 (cit. on pp. 43, 44).


Daniel, Ben (2015). "Big Data and analytics in higher education: Opportunities and challenges." In: British Journal of Educational Technology 46.5, pp. 904–920 (cit. on p. 22).

Deerwester, Scott, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, and Richard Harshman (1990). "Indexing by latent semantic analysis." In: Journal of the American Society for Information Science 41.6, pp. 391–407 (cit. on p. 65).

Deoras, Anoop, Kaisheng Yao, Xiaodong He, Li Deng, Geoffrey Gerson Zweig, Ruhi Sarikaya, Dong Yu, Mei-Yuh Hwang, and Gregoire Mesnil (2015). Assignment of semantic labels to a sequence of words using neural network architectures. US Patent App. 14/016,186 (cit. on p. 13).

Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova (2018). "BERT: Pre-training of deep bidirectional transformers for language understanding." In: arXiv preprint arXiv:1810.04805 (cit. on pp. 42–45).

Dhingra, Bhuwan, Lihong Li, Xiujun Li, Jianfeng Gao, Yun-Nung Chen, Faisal Ahmed, and Li Deng (2016). "Towards end-to-end reinforcement learning of dialogue agents for information access." In: arXiv preprint arXiv:1609.00777 (cit. on p. 16).

Dieng, Adji B., Francisco J. R. Ruiz, and David M. Blei (2019). "Topic Modeling in Embedding Spaces." In: arXiv preprint arXiv:1907.04907 (cit. on pp. 74, 84).

Donahue, Jeff, Yangqing Jia, Oriol Vinyals, Judy Hoffman, Ning Zhang, Eric Tzeng, and Trevor Darrell (2014). "DeCAF: A deep convolutional activation feature for generic visual recognition." In: International Conference on Machine Learning, pp. 647–655 (cit. on p. 43).

Eric, Mihail and Christopher D. Manning (2017). "Key-value retrieval networks for task-oriented dialogue." In: arXiv preprint arXiv:1705.05414 (cit. on p. 16).

Fang, Hao, Hao Cheng, Maarten Sap, Elizabeth Clark, Ari Holtzman, Yejin Choi, Noah A. Smith, and Mari Ostendorf (2018). "Sounding Board: A user-centric and content-driven social chatbot." In: arXiv preprint arXiv:1804.10202 (cit. on p. 10).

Friedrich, Fabian, Jan Mendling, and Frank Puhlmann (2011). "Process model generation from natural language text." In: International Conference on Advanced Information Systems Engineering, pp. 482–496 (cit. on p. 2).

Frot, Mathilde (2016). 5 Trends in Computer Science Research. https://www.topuniversities.com/courses/computer-science-information-systems/5-trends-computer-science-research (cit. on p. 78).

Glasmachers, Tobias (2017). "Limits of end-to-end learning." In: arXiv preprint arXiv:1704.08305 (cit. on pp. 16, 17).

Goddeau, David, Helen Meng, Joseph Polifroni, Stephanie Seneff, and Senis Busayapongchai (1996). "A form-based dialogue manager for spoken language applications." In: Proceeding of Fourth International Conference on Spoken Language Processing, ICSLP '96. Vol. 2. IEEE, pp. 701–704 (cit. on p. 14).


Gohr, Andre, Alexander Hinneburg, Rene Schult, and Myra Spiliopoulou (2009). "Topic evolution in a stream of documents." In: Proceedings of the 2009 SIAM International Conference on Data Mining. SIAM, pp. 859–870 (cit. on p. 65).

Gollapalli, Sujatha Das and Xiaoli Li (2015). "EMNLP versus ACL: Analyzing NLP research over time." In: EMNLP, pp. 2002–2006 (cit. on p. 72).

Griffiths, Thomas L. and Mark Steyvers (2004). "Finding scientific topics." In: Proceedings of the National Academy of Sciences 101.suppl 1, pp. 5228–5235 (cit. on p. 65).

Griffiths, Thomas L., Michael I. Jordan, Joshua B. Tenenbaum, and David M. Blei (2004). "Hierarchical topic models and the nested Chinese restaurant process." In: Advances in Neural Information Processing Systems, pp. 17–24 (cit. on p. 66).

Hall, David, Daniel Jurafsky, and Christopher D. Manning (2008). "Studying the history of ideas using topic models." In: Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, pp. 363–371 (cit. on p. 66).

Harriet, Taylor (2015). Privacy will hit tipping point in 2016. http://www.cnbc.com/2015/11/09/privacy-will-hit-tipping-point-in-2016.html (cit. on p. 78).

He, Jiangen and Chaomei Chen (2018). "Predictive Effects of Novelty Measured by Temporal Embeddings on the Growth of Scientific Literature." In: Frontiers in Research Metrics and Analytics 3, p. 9 (cit. on pp. 64, 68, 71).

He, Junxian, Zhiting Hu, Taylor Berg-Kirkpatrick, Ying Huang, and Eric P. Xing (2017). "Efficient correlated topic modeling with topic embedding." In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, pp. 225–233 (cit. on pp. 66, 67).

He, Qi, Bi Chen, Jian Pei, Baojun Qiu, Prasenjit Mitra, and Lee Giles (2009). "Detecting topic evolution in scientific literature: How can citations help?" In: Proceedings of the 18th ACM Conference on Information and Knowledge Management. ACM, pp. 957–966 (cit. on p. 68).

Henderson, Matthew, Blaise Thomson, and Jason D. Williams (2014). "The second dialog state tracking challenge." In: Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL), pp. 263–272 (cit. on p. 15).

Henderson, Matthew, Blaise Thomson, and Steve Young (2013). "Deep neural network approach for the dialog state tracking challenge." In: Proceedings of the SIGDIAL 2013 Conference, pp. 467–471 (cit. on p. 14).

Hess, Ann, Hari Iyer, and William Malm (2001). "Linear trend analysis: a comparison of methods." In: Atmospheric Environment 35.30: Visibility, Aerosol and Atmospheric Optics, pp. 5211–5222. issn: 1352-2310. doi: https://doi.org/10.1016/S1352-2310(01)00342-9. url: http://www.sciencedirect.com/science/article/pii/S1352231001003429 (cit. on p. 75).


Hofmann, Thomas (1999). "Probabilistic latent semantic analysis." In: Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann Publishers Inc., pp. 289–296 (cit. on p. 65).

Honnibal, Matthew and Ines Montani (2017). "spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing." In: To appear 7 (cit. on p. 15).

Howard, Jeremy and Sebastian Ruder (2018). "Universal language model fine-tuning for text classification." In: arXiv preprint arXiv:1801.06146 (cit. on p. 44).

IEEE Computer Society (2016). Top 9 Computing Technology Trends for 2016. https://www.scientificcomputing.com/news/2016/01/top-9-computing-technology-trends-2016 (cit. on p. 78).

Ivancic, Lucija, Dalia Suša Vugec, and Vesna Bosilj Vukšic (2019). "Robotic Process Automation: Systematic Literature Review." In: International Conference on Business Process Management. Springer, pp. 280–295 (cit. on pp. 1, 2).

Jain, Mohit, Pratyush Kumar, Ramachandra Kota, and Shwetak N. Patel (2018). "Evaluating and informing the design of chatbots." In: Proceedings of the 2018 Designing Interactive Systems Conference. ACM, pp. 895–906 (cit. on p. 8).

Khanpour, Hamed, Nishitha Guntakandla, and Rodney Nielsen (2016). "Dialogue act classification in domain-independent conversations using a deep recurrent neural network." In: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pp. 2012–2021 (cit. on p. 13).

Kim, Young-Bum, Karl Stratos, Ruhi Sarikaya, and Minwoo Jeong (2015). "New transfer learning techniques for disparate label sets." In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp. 473–482 (cit. on p. 43).

Kolhatkar, Varada, Heike Zinsmeister, and Graeme Hirst (2013). "Annotating anaphoric shell nouns with their antecedents." In: Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse, pp. 112–121 (cit. on pp. 77, 78).

Kontostathis, April, Leon M. Galitsky, William M. Pottenger, Soma Roy, and Daniel J. Phelps (2004). "A survey of emerging trend detection in textual data mining." In: Survey of Text Mining. Springer, pp. 185–224 (cit. on p. 63).

Krippendorff, Klaus (2011). "Computing Krippendorff's alpha-reliability." In: Annenberg School for Communication (ASC) Departmental Papers 43 (cit. on p. 78).

Kurata, Gakuto, Bing Xiang, Bowen Zhou, and Mo Yu (2016). "Leveraging sentence-level information with encoder LSTM for semantic slot filling." In: arXiv preprint arXiv:1601.01530 (cit. on p. 13).

Landis, J. Richard and Gary G. Koch (1977). "The measurement of observer agreement for categorical data." In: Biometrics, pp. 159–174 (cit. on p. 78).


Le, Minh-Hoang, Tu-Bao Ho, and Yoshiteru Nakamori (2005). "Detecting emerging trends from scientific corpora." In: International Journal of Knowledge and Systems Sciences 2.2, pp. 53–59 (cit. on p. 68).

Le, Quoc V. and Tomas Mikolov (2014). "Distributed Representations of Sentences and Documents." In: ICML. Vol. 14, pp. 1188–1196 (cit. on p. 74).

Lee, Ji Young and Franck Dernoncourt (2016). "Sequential short-text classification with recurrent and convolutional neural networks." In: arXiv preprint arXiv:1603.03827 (cit. on p. 13).

Leopold, Henrik, Han van der Aa, and Hajo A. Reijers (2018). "Identifying candidate tasks for robotic process automation in textual process descriptions." In: Enterprise, Business-Process and Information Systems Modeling. Springer, pp. 67–81 (cit. on p. 2).

Levenshtein, Vladimir I. (1966). "Binary codes capable of correcting deletions, insertions and reversals." In: Soviet Physics Doklady 8, pp. 707–710 (cit. on p. 25).

Liao, Vera, Masud Hussain, Praveen Chandar, Matthew Davis, Yasaman Khazaeni, Marco Patricio Crasso, Dakuo Wang, Michael Muller, N. Sadat Shami, Werner Geyer, et al. (2018). "All Work and No Play?" In: Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. ACM, p. 3 (cit. on p. 8).

Lowe, Ryan Thomas, Nissan Pow, Iulian Vlad Serban, Laurent Charlin, Chia-Wei Liu, and Joelle Pineau (2017). "Training end-to-end dialogue systems with the Ubuntu Dialogue Corpus." In: Dialogue & Discourse 8.1, pp. 31–65 (cit. on p. 22).

Luger, Ewa and Abigail Sellen (2016). "Like having a really bad PA: the gulf between user expectation and experience of conversational agents." In: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems. ACM, pp. 5286–5297 (cit. on p. 8).

MacQueen, James (1967). "Some methods for classification and analysis of multivariate observations." In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability. Vol. 1. Oakland, CA, USA, pp. 281–297 (cit. on p. 74).

Manning, Christopher D., Prabhakar Raghavan, and Hinrich Schütze (2008). Introduction to Information Retrieval. Web publication at informationretrieval.org. Cambridge University Press (cit. on p. 74).

Markov, Igor (2015). 13 Of 2015's Hottest Topics In Computer Science Research. https://www.forbes.com/sites/quora/2015/04/22/13-of-2015s-hottest-topics-in-computer-science-research/#fb0d46b1e88d (cit. on p. 78).

Martin, James H. and Daniel Jurafsky (2009). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Pearson/Prentice Hall, Upper Saddle River.

Mei, Qiaozhu, Deng Cai, Duo Zhang, and ChengXiang Zhai (2008). "Topic modeling with network regularization." In: Proceedings of the 17th International Conference on World Wide Web. ACM, pp. 101–110 (cit. on p. 65).


Meyerson, Bernard and Mariette DiChristina (2016). These are the top 10 emerging technologies of 2016. http://www3.weforum.org/docs/GAC16_Top10_Emerging_Technologies_2016_report.pdf (cit. on p. 78).

Mikolov, Tomas, Kai Chen, Greg Corrado, and Jeffrey Dean (2013). "Efficient estimation of word representations in vector space." In: arXiv preprint arXiv:1301.3781 (cit. on p. 74).

Miller, Alexander, Adam Fisch, Jesse Dodge, Amir-Hossein Karimi, Antoine Bordes, and Jason Weston (2016). "Key-value memory networks for directly reading documents." In: (cit. on p. 16).

Mohanty, Soumendra and Sachin Vyas (2018). "How to Compete in the Age of Artificial Intelligence." In: (cit. on pp. III, V, 1, 2).

Moiseeva, Alena and Hinrich Schütze (2020). "TRENDNERT: A Benchmark for Trend and Downtrend Detection in a Scientific Domain." In: Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (cit. on p. 71).

Moiseeva, Alena, Dietrich Trautmann, Michael Heimann, and Hinrich Schütze (2020). "Multipurpose Intelligent Process Automation via Conversational Assistant." In: arXiv preprint arXiv:2001.02284 (cit. on pp. 21, 41).

Monaghan, Fergal, Georgeta Bordea, Krystian Samp, and Paul Buitelaar (2010). "Exploring your research: Sprinkling some saffron on semantic web dog food." In: Semantic Web Challenge at the International Semantic Web Conference. Vol. 117. Citeseer, pp. 420–435 (cit. on p. 68).

Monostori, László, András Márkus, Hendrik Van Brussel, and E. Westkämpfer (1996). "Machine learning approaches to manufacturing." In: CIRP Annals 45.2, pp. 675–712 (cit. on p. 15).

Moody, Christopher E. (2016). "Mixing Dirichlet topic models and word embeddings to make lda2vec." In: arXiv preprint arXiv:1605.02019 (cit. on p. 74).

Mou, Lili, Zhao Meng, Rui Yan, Ge Li, Yan Xu, Lu Zhang, and Zhi Jin (2016). "How transferable are neural networks in NLP applications?" In: arXiv preprint arXiv:1603.06111 (cit. on p. 43).

Mrkšic, Nikola, Diarmuid Ó Séaghdha, Tsung-Hsien Wen, Blaise Thomson, and Steve Young (2016). "Neural belief tracker: Data-driven dialogue state tracking." In: arXiv preprint arXiv:1606.03777 (cit. on p. 14).

Nessma, Joussef (2015). Top 10 Hottest Research Topics in Computer Science. http://www.pouted.com/top-10-hottest-research-topics-in-computer-science (cit. on p. 78).

Nie, Binling and Shouqian Sun (2017). "Using text mining techniques to identify research trends: A case study of design research." In: Applied Sciences 7.4, p. 401 (cit. on p. 68).

Pan, Sinno Jialin and Qiang Yang (2009). "A survey on transfer learning." In: IEEE Transactions on Knowledge and Data Engineering 22.10, pp. 1345–1359 (cit. on p. 41).

Qu, Lizhen, Gabriela Ferraro, Liyuan Zhou, Weiwei Hou, and Timothy Baldwin (2016). "Named entity recognition for novel types by transfer learning." In: arXiv preprint arXiv:1610.09914 (cit. on p. 43).

Radford, Alec, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever (2019). "Language models are unsupervised multitask learners." In: OpenAI Blog 1.8 (cit. on p. 44).

Reese, Hope (2015). 7 trends for artificial intelligence in 2016: 'Like 2015 on steroids'. http://www.techrepublic.com/article/7-trends-for-artificial-intelligence-in-2016-like-2015-on-steroids (cit. on p. 78).

Rivera, Maricel (2016). Is Digital Game-Based Learning The Future Of Learning? https://elearningindustry.com/digital-game-based-learning-future (cit. on p. 78).

Rotolo, Daniele, Diana Hicks, and Ben R. Martin (2015). "What is an emerging technology?" In: Research Policy 44.10, pp. 1827–1843 (cit. on p. 64).

Salatino, Angelo (2015). "Early detection and forecasting of research trends." In: (cit. on pp. 63, 66).

Serban, Iulian V., Alessandro Sordoni, Yoshua Bengio, Aaron Courville, and Joelle Pineau (2016). "Building end-to-end dialogue systems using generative hierarchical neural network models." In: Thirtieth AAAI Conference on Artificial Intelligence (cit. on p. 11).

Sharif Razavian, Ali, Hossein Azizpour, Josephine Sullivan, and Stefan Carlsson (2014). "CNN features off-the-shelf: an astounding baseline for recognition." In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 806–813 (cit. on p. 43).

Shibata, Naoki, Yuya Kajikawa, and Yoshiyuki Takeda (2009). "Comparative study on methods of detecting research fronts using different types of citation." In: Journal of the American Society for Information Science and Technology 60.3, pp. 571–580 (cit. on p. 68).

Shibata, Naoki, Yuya Kajikawa, Yoshiyuki Takeda, and Katsumori Matsushima (2008). "Detecting emerging research fronts based on topological measures in citation networks of scientific publications." In: Technovation 28.11, pp. 758–775 (cit. on p. 68).

Small, Henry (2006). "Tracking and predicting growth areas in science." In: Scientometrics 68.3, pp. 595–610 (cit. on p. 68).

Small, Henry, Kevin W. Boyack, and Richard Klavans (2014). "Identifying emerging topics in science and technology." In: Research Policy 43.8, pp. 1450–1467 (cit. on p. 64).

Su, Pei-Hao, Nikola Mrkšic, Iñigo Casanueva, and Ivan Vulic (2018). "Deep Learning for Conversational AI." In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Tutorial Abstracts, pp. 27–32 (cit. on p. 12).

Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le (2014). "Sequence to sequence learning with neural networks." In: Advances in Neural Information Processing Systems, pp. 3104–3112 (cit. on pp. 16, 17).

Trautmann, Dietrich, Johannes Daxenberger, Christian Stab, Hinrich Schütze, and Iryna Gurevych (2019). "Robust Argument Unit Recognition and Classification." In: arXiv preprint arXiv:1904.09688 (cit. on p. 41).


Tu, Yi-Ning and Jia-Lang Seng (2012). "Indices of novelty for emerging topic detection." In: Information Processing & Management 48.2, pp. 303–325 (cit. on p. 64).

Turing, Alan (1949). London Times letter to the editor (cit. on p. 7).

Ultes, Stefan, Lina Barahona, Pei-Hao Su, David Vandyke, Dongho Kim, Inigo Casanueva, Paweł Budzianowski, Nikola Mrkšic, Tsung-Hsien Wen, et al. (2017). "PyDial: A multi-domain statistical dialogue system toolkit." In: Proceedings of ACL 2017, System Demonstrations, pp. 73–78 (cit. on p. 15).

Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin (2017). "Attention is all you need." In: Advances in Neural Information Processing Systems, pp. 5998–6008 (cit. on p. 45).

Vincent, Pascal, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol (2008). "Extracting and composing robust features with denoising autoencoders." In: Proceedings of the 25th International Conference on Machine Learning. ACM, pp. 1096–1103 (cit. on p. 44).

Vodolán, Miroslav, Rudolf Kadlec, and Jan Kleindienst (2015). "Hybrid dialog state tracker." In: arXiv preprint arXiv:1510.03710 (cit. on p. 14).

Wang, Alex, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman (2018). "GLUE: A multi-task benchmark and analysis platform for natural language understanding." In: arXiv preprint arXiv:1804.07461 (cit. on p. 44).

Wang, Chong and David M. Blei (2011). "Collaborative topic modeling for recommending scientific articles." In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, pp. 448–456 (cit. on p. 66).

Wang, Xuerui and Andrew McCallum (2006). "Topics over time: a non-Markov continuous-time model of topical trends." In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, pp. 424–433 (cit. on pp. 65–67).

Wang, Zhuoran and Oliver Lemon (2013). "A simple and generic belief tracking mechanism for the dialog state tracking challenge: On the believability of observed information." In: Proceedings of the SIGDIAL 2013 Conference, pp. 423–432 (cit. on p. 14).

Weizenbaum, Joseph, et al. (1966). "ELIZA—a computer program for the study of natural language communication between man and machine." In: Communications of the ACM 9.1, pp. 36–45 (cit. on p. 7).

Wen, Tsung-Hsien, Milica Gasic, Nikola Mrksic, Pei-Hao Su, David Vandyke, and Steve Young (2015). "Semantically conditioned LSTM-based natural language generation for spoken dialogue systems." In: arXiv preprint arXiv:1508.01745 (cit. on p. 14).

Wen, Tsung-Hsien, David Vandyke, Nikola Mrksic, Milica Gasic, Lina M. Rojas-Barahona, Pei-Hao Su, Stefan Ultes, and Steve Young (2016). "A network-based end-to-end trainable task-oriented dialogue system." In: arXiv preprint arXiv:1604.04562 (cit. on pp. 11, 16, 17, 22).


Werner, Alexander, Dietrich Trautmann, Dongheui Lee, and Roberto Lampariello (2015). "Generalization of optimal motion trajectories for bipedal walking." In: pp. 1571–1577 (cit. on p. 41).

Williams, Jason D., Kavosh Asadi, and Geoffrey Zweig (2017). "Hybrid code networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning." In: arXiv preprint arXiv:1702.03274 (cit. on p. 17).

Williams, Jason, Antoine Raux, and Matthew Henderson (2016). "The dialog state tracking challenge series: A review." In: Dialogue & Discourse 7.3, pp. 4–33 (cit. on p. 12).

Wu, Chien-Sheng, Richard Socher, and Caiming Xiong (2019). "Global-to-local Memory Pointer Networks for Task-Oriented Dialogue." In: arXiv preprint arXiv:1901.04713 (cit. on pp. 15–17).

Wu, Yonghui, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, et al. (2016). "Google's neural machine translation system: Bridging the gap between human and machine translation." In: arXiv preprint arXiv:1609.08144 (cit. on p. 45).

Xie, Pengtao and Eric P. Xing (2013). "Integrating Document Clustering and Topic Modeling." In: Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence (cit. on p. 74).

Yan, Zhao, Nan Duan, Peng Chen, Ming Zhou, Jianshe Zhou, and Zhoujun Li (2017). "Building task-oriented dialogue systems for online shopping." In: Thirty-First AAAI Conference on Artificial Intelligence (cit. on p. 14).

Yogatama, Dani, Michael Heilman, Brendan O'Connor, Chris Dyer, Bryan R. Routledge, and Noah A. Smith (2011). "Predicting a scientific community's response to an article." In: Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, pp. 594–604 (cit. on p. 65).

Young, Steve, Milica Gašic, Blaise Thomson, and Jason D. Williams (2013). "POMDP-based statistical spoken dialog systems: A review." In: Proceedings of the IEEE 101.5, pp. 1160–1179 (cit. on p. 17).

Zaino, Jennifer (2016). 2016 Trends for Semantic Web and Semantic Technologies. http://www.dataversity.net/2017-predictions-semantic-web-semantic-technologies (cit. on p. 78).

Zhou, Hao, Minlie Huang, et al. (2016). "Context-aware natural language generation for spoken dialogue systems." In: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pp. 2032–2041 (cit. on pp. 14, 15).

Zhu, Yukun, Ryan Kiros, Rich Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, and Sanja Fidler (2015). "Aligning books and movies: Towards story-like visual explanations by watching movies and reading books." In: Proceedings of the IEEE International Conference on Computer Vision, pp. 19–27 (cit. on p. 44).


A B S T R A C T

Nowadays, digitization is transforming the way businesses work. Recently, Artificial Intelligence (AI) techniques became an essential part of the automation of business processes. In addition to cost advantages, these techniques offer fast processing times and higher customer satisfaction rates, thus ultimately increasing sales. One of the intelligent approaches for accelerating digital transformation in companies is Robotic Process Automation (RPA). An RPA-system is a software tool that robotizes routine and time-consuming responsibilities such as email assessment, various calculations, or the creation of documents and reports (Mohanty and Vyas 2018). Its main objective is to organize a smart workflow and therethrough to assist employees by offering them more scope for cognitively demanding and engaging work.

Intelligent Process Automation (IPA) offers all these advantages as well; however, it goes beyond RPA by adding AI components, such as Machine- and Deep Learning techniques, to conventional automation solutions. Previously, IPA approaches were primarily employed within the computer vision domain. However, in recent times, Natural Language Processing (NLP) became one of the potential applications for IPA as well, due to its ability to understand and interpret human language. Usually, NLP methods are used to analyze large amounts of unstructured textual data and to respond to various inquiries. However, one of the central applications of NLP within the IPA domain are conversational interfaces (e.g., chatbots, virtual agents) that are used to enable human-to-machine communication. Nowadays, conversational agents gain enormous demand due to their ability to support a large number of users simultaneously while communicating in a natural language. The implementation of a conversational agent comprises multiple stages and involves diverse types of NLP sub-tasks, starting with natural language understanding (e.g., intent recognition, named entity extraction) and going towards dialogue management (i.e., determining the next possible bot's action) and response generation. A typical dialogue system for IPA purposes undertakes straightforward customer support requests (e.g., FAQs), allowing human workers to focus on more complicated inquiries.

In this thesis, we are addressing two potential Intelligent Process Automation (IPA) applications and employing statistical Natural Language Processing (NLP) methods for their implementation.

The first block of this thesis (Chapter 2 – Chapter 4) deals with the development of a conversational agent for IPA purposes within the e-learning domain. As already mentioned, chatbots are one of the central applications for the IPA domain since they can effectively perform time-consuming tasks while communicating in a natural language. Within this thesis, we realized the IPA conversational bot that takes care of routine and time-consuming tasks regularly performed by human tutors of an online mathematical course. This bot is deployed in a real-world setting within the OMB+ mathematical platform. Conducting experiments for this part, we observed two possibilities to build the conversational agent in industrial settings – first, with purely rule-based methods, considering the missing training data and individual aspects of the target domain (i.e., e-learning). Second, we re-implemented two of the main system components (i.e., Natural Language Understanding (NLU) and Dialogue Manager (DM) units) using the current state-of-the-art deep-learning architecture (i.e., Bidirectional Encoder Representations from Transformers (BERT)) and investigated their performance and potential use as a part of a hybrid model (i.e., containing both rule-based and machine learning methods).

The second part of the thesis (Chapter 5 – Chapter 6) considers an IPA subproblem within the predictive analytics domain and addresses the task of scientific trend forecasting. Predictive analytics forecasts future outcomes based on historical and current data. Therefore, using the benefits of advanced analytics models, an organization can, for instance, reliably determine trends and emerging topics and then leverage this information while making significant business decisions (i.e., investments). In this work, we dealt with the trend detection task – specifically, we addressed the lack of publicly available benchmarks for evaluating trend detection algorithms. We assembled a benchmark for the detection of both scientific trends and downtrends (i.e., topics that become less frequent over time). To the best of our knowledge, the task of downtrend detection has not been addressed before. The resulting benchmark is based on a collection of more than one million documents, which is among the largest that have been used for trend detection before and therefore offers a realistic setting for the development of trend detection algorithms.


Z U S A M M E N FA S S U N G

Robotergesteuerte Prozessautomatisierung (RPA) ist eine Art von Software-Bots, die manuelle menschliche Tätigkeiten wie die Eingabe von Daten in das System, die Anmeldung in Benutzerkonten oder die Ausführung einfacher, aber sich wiederholender Arbeitsabläufe nachahmt (Mohanty and Vyas 2018). Einer der Hauptvorteile und gleichzeitig Nachteil der RPA-Bots ist jedoch deren Fähigkeit, die gestellte Aufgabe punktgenau zu erfüllen. Einerseits ist ein solches System in der Lage, die Aufgabe akkurat, sorgfältig und schnell auszuführen. Andererseits ist es sehr anfällig für Veränderungen in definierten Szenarien. Da der RPA-Bot für eine bestimmte Aufgabe konzipiert ist, ist es oft nicht möglich, ihn an andere Domänen oder sogar für einfache Änderungen in einem Arbeitsablauf anzupassen (Mohanty and Vyas 2018). Diese Unfähigkeit, sich an veränderte Bedingungen anzupassen, führte zu einem weiteren Verbesserungsbereich für RPA-Bots – den Intelligenten Prozessautomatisierungssystemen (IPA).

IPA-Bots kombinieren RPA mit Künstlicher Intelligenz (AI) und können komplexe und kognitiv anspruchsvollere Aufgaben erfüllen, die u. a. Schlussfolgerungen und natürliches Sprachverständnis erfordern. Diese Systeme übernehmen zeitaufwändige und routinemäßige Aufgaben, ermöglichen somit einen intelligenten Arbeitsablauf und befreien Fachkräfte für die Durchführung komplizierterer Aufgaben. Bisher wurden die IPA-Techniken hauptsächlich im Bereich der Bildverarbeitung eingesetzt. In der letzten Zeit wurde die natürliche Sprachverarbeitung (NLP) jedoch auch zu einer der potenziellen Anwendungen für IPA, und zwar aufgrund der Fähigkeit, die menschliche Sprache zu interpretieren. NLP-Methoden werden eingesetzt, um große Mengen an Textdaten zu analysieren und auf verschiedene Anfragen zu reagieren. Auch wenn die verfügbaren Daten unstrukturiert sind oder kein vordefiniertes Format haben (z. B. E-Mails) oder wenn sie in einem variablen Format vorliegen (z. B. Rechnungen, juristische Dokumente), werden ebenfalls NLP-Techniken angewendet, um die relevanten Informationen zu extrahieren, die dann zur Lösung verschiedener Probleme verwendet werden können.

NLP im Rahmen von IPA beschränkt sich jedoch nicht auf die Extraktion relevanter Daten aus Textdokumenten. Eine der zentralen Anwendungen von IPA sind Konversationsagenten, die zur Interaktion zwischen Mensch und Maschine eingesetzt werden. Konversationsagenten erfahren enorme Nachfrage, da sie in der Lage sind, eine große Anzahl von Benutzern gleichzeitig zu unterstützen und dabei in einer natürlichen Sprache zu kommunizieren. Die Implementierung eines Chatsystems umfasst verschiedene Arten von NLP-Teilaufgaben, beginnend mit dem Verständnis der natürlichen Sprache (z. B. Absichtserkennung, Extraktion von Entitäten), über das Dialogmanagement (z. B. Festlegung der nächstmöglichen Bot-Aktion) bis hin zur Response-Generierung. Ein typisches Dialogsystem für IPA-Zwecke übernimmt in der Regel unkomplizierte Kundendienstanfragen (z. B. Beantwortung von FAQs), so dass sich die Mitarbeiter auf komplexere Anfragen konzentrieren können.

Diese Dissertation umfasst zwei Bereiche, die durch das breitere Thema vereint sind, nämlich die Intelligente Prozessautomatisierung (IPA) unter Verwendung statistischer Methoden der natürlichen Sprachverarbeitung (NLP).

Der erste Block dieser Arbeit (Kapitel 2 – Kapitel 4) befasst sich mit der Implementierung eines Konversationsagenten für IPA-Zwecke innerhalb der E-Learning-Domäne. Wie bereits erwähnt, sind Chatbots eine der zentralen Anwendungen für die IPA-Domäne, da sie zeitaufwändige Aufgaben in einer natürlichen Sprache effektiv ausführen können. Der IPA-Kommunikationsbot, der in dieser Arbeit realisiert wurde, kümmert sich ebenfalls um routinemäßige und zeitaufwändige Aufgaben, die sonst von Tutoren in einem Online-Mathematikkurs in deutscher Sprache durchgeführt werden. Dieser Bot ist in der täglichen Anwendung innerhalb der mathematischen Plattform OMB+ eingesetzt. Bei der Durchführung von Experimenten beobachteten wir zwei Möglichkeiten, den Konversationsagenten im industriellen Umfeld zu entwickeln – zunächst mit rein regelbasierten Methoden unter Bedingungen der fehlenden Trainingsdaten und besonderer Aspekte der Zieldomäne (d. h. E-Learning). Zweitens haben wir zwei der Hauptsystemkomponenten (Sprachverständnismodul, Dialog-Manager) mit dem derzeit fortschrittlichsten Deep-Learning-Algorithmus reimplementiert und die Performanz dieser Komponenten untersucht.

Der zweite Teil der Doktorarbeit (Kapitel 5 – Kapitel 6) betrachtet ein IPA-Problem innerhalb des Vorhersageanalytik-Bereichs. Vorhersageanalytik zielt darauf ab, Prognosen über zukünftige Ergebnisse auf der Grundlage von historischen und aktuellen Daten zu erstellen. Daher kann ein Unternehmen mit Hilfe der Vorhersagesysteme z. B. die Trends oder neu entstehende Themen zuverlässig bestimmen und diese Informationen dann bei wichtigen Geschäftsentscheidungen (z. B. Investitionen) einsetzen. In diesem Teil der Arbeit beschäftigen wir uns mit dem Teilproblem der Trendprognose – insbesondere mit dem Fehlen öffentlich zugänglicher Benchmarks für die Evaluierung von Trenderkennungsalgorithmen. Wir haben den Benchmark zusammengestellt und veröffentlicht, um sowohl Trends als auch Abwärtstrends zu erkennen. Nach unserem besten Wissen ist die Aufgabe der Abwärtstrenderkennung bisher nicht adressiert worden. Der resultierende Benchmark basiert auf einer Sammlung von mehr als einer Million Dokumenten, die zu den größten gehört, die bisher für die Trenderkennung verwendet wurden, und bietet somit einen realistischen Rahmen für die Entwicklung von Trenddetektionsalgorithmen.


A C K N O W L E D G E M E N T S

This thesis was possible due to several people who contributed significantly to my work. I owe my gratitude to each of them for guidance, encouragement, and inspiration during my doctoral study.

First of all, I would like to thank my advisor Prof. Dr. Hinrich Schütze for his mentorship, support, and thoughtful guidance. My special gratitude goes to Prof. Dr. Ruedi Seiler and his fantastic team at Integral-Learning GmbH: I am thankful for your valuable input, your interest in our collaborative work, and your continuous support. I would also like to show gratitude to Prof. Dr. Alexander Fraser for co-advising, support, and helpful feedback.

Finally, none of this would be possible without my family's support; there are no words to express the depth of gratitude I feel for them. My earnest appreciation goes to my husband Dietrich: thank you for all the time you were by my side and for the encouragement you provided me through these years. I am very grateful to my parents for their understanding and backing at any moment. I appreciate the love of my grandparents; it is hard to realize how significant their care is for me. Besides that, I am also grateful to my sister Alexandra, elder brother Denis, and parents-in-law for simply always being there for me.


C O N T E N T S

List of Figures XI
List of Tables XII
Acronyms XIII
1 introduction 1
1.1 Outline 3
i e-learning conversational assistant 5
2 foundations 7
2.1 Introduction 7
2.2 Conversational Assistants 9
2.2.1 Taxonomy 9
2.2.2 Conventional Architecture 12
2.2.3 Existing Approaches 15
2.3 Target Domain & Task Definition 18
2.4 Summary 19
3 multipurpose ipa via conversational assistant 21
3.1 Introduction 21
3.1.1 Outline and Contributions 22
3.2 Model 23
3.2.1 OMB+ Design 23
3.2.2 Preprocessing 23
3.2.3 Natural Language Understanding 25
3.2.4 Dialogue Manager 26
3.2.5 Meta Policy 30
3.2.6 Response Generation 32
3.3 Evaluation 37
3.3.1 Automated Evaluation 37
3.3.2 Human Evaluation & Error Analysis 37
3.4 Structured Dialogue Acquisition 39
3.5 Summary 39
3.6 Future Work 40
4 deep learning re-implementation of units 41
4.1 Introduction 41
4.1.1 Outline and Contributions 42
4.2 Related Work 43
4.3 BERT: Bidirectional Encoder Representations from Transformers 44
4.4 Data & Descriptive Statistics 46
4.5 Named Entity Recognition 48
4.5.1 Model Settings 49
4.6 Next Action Prediction 49
4.6.1 Model Settings 49
4.7 Evaluation and Results 50
4.8 Error Analysis 58


4.9 Summary 58
4.10 Future Work 59
ii clustering-based trend analyzer 61
5 foundations 63
5.1 Motivation and Challenges 63
5.2 Detection of Emerging Research Trends 64
5.2.1 Methods for Topic Modeling 65
5.2.2 Topic Evolution of Scientific Publications 67
5.3 Summary 68
6 benchmark for scientific trend detection 71
6.1 Motivation 71
6.1.1 Outline and Contributions 72
6.2 Corpus Creation 73
6.2.1 Underlying Data 73
6.2.2 Stratification 73
6.2.3 Document representations & Clustering 74
6.2.4 Trend & Downtrend Estimation 75
6.2.5 (Down)trend Candidates Validation 75
6.3 Benchmark Annotation 77
6.3.1 Inter-annotator Agreement 77
6.3.2 Gold Trends 78
6.4 Evaluation Measure for Trend Detection 79
6.5 Trend Detection Baselines 80
6.5.1 Configurations 81
6.5.2 Experimental Setup and Results 81
6.6 Benchmark Description 82
6.7 Summary 83
6.8 Future Work 83
7 discussion and future work 85
bibliography 87


L I S T O F F I G U R E S

Figure 1 Taxonomy of dialogue systems 10

Figure 2 Example of a conventional dialogue architecture 12

Figure 3 OMB+ online learning platform 24

Figure 4 Rule-Based System: Dialogue flow 36

Figure 5 Macro F1 evaluation, NAP – default model. Comparison between German and multilingual BERT 52

Figure 6 Macro F1 test, NAP – default model. Comparison between German and multilingual BERT 53

Figure 7 Macro F1 evaluation, NAP – extended model 54

Figure 8 Macro F1 test, NAP – extended model 55

Figure 9 Macro F1 evaluation, NER – Comparison between German and multilingual BERT 56

Figure 10 Macro F1 test, NER – Comparison between German and multilingual BERT 57

Figure 11 (Down)trend detection: Overall distribution of papers in the entire dataset 73

Figure 12 A trend candidate (positive slope). Topic: Sentiment and Emotion Analysis (using Social Media Channels) 76

Figure 13 A downtrend candidate (negative slope). Topic: XML 76


L I S T O F TA B L E S

Table 1 Natural language representation of a sentence 13

Table 2 Rule 1 Admissible configurations 27

Table 3 Rule 2 Admissible configurations 27

Table 4 Rule 3 Admissible configurations 27

Table 5 Rule 4 Admissible configurations 28

Table 6 Rule 5 Admissible configurations 28

Table 7 Case 1 ID completeness 31

Table 8 Case 2 ID completeness 31

Table 9 Showcase 1 Short flow 33

Table 10 Showcase 2 Organisational question 33

Table 11 Showcase 3: Fallback and human-request policies 34

Table 12 Showcase 4 Contextual question 34

Table 13 Showcase 5 Long flow 35

Table 14 General statistics for conversational dataset 46

Table 15 Detailed statistics on possible system actions in the conversational dataset 48

Table 16 Detailed statistics on possible named entities in the conversational dataset 48

Table 17 F1-results for the Named Entity Recognition task 51

Table 18 F1-results for the Next Action Prediction task 51

Table 19 Average dialogue accuracy computed for the Next Action Prediction task 51

Table 20 Trend detection List of gold trends 79

Table 21 Trend detection Example of MAP evaluation 80

Table 22 Trend detection Ablation results 82


acronyms XIII

AI Artificial Intelligence

API Application Programming Interface

BERT Bidirectional Encoder Representations from Transformers

BiLSTM Bidirectional Long Short Term Memory

CRF Conditional Random Fields

DM Dialogue Manager

DST Dialogue State Tracker

DSTC Dialogue State Tracking Challenge

EC Equivalence Classes

ES ElasticSearch

GM Gaussian Models

IAA Inter Annotator Agreement

ID Informational Dictionary

IPA Intelligent Process Automation

ITS Intelligent Tutoring System

LDA Latent Dirichlet Allocation

LSA Latent Semantic Analysis

LSTM Long Short Term Memory

MAP Mean Average Precision

MER Mutually Exclusive Rules

NAP Next Action Prediction

NER Named Entity Recognition

NLG Natural Language Generation

NLP Natural Language Processing

NLU Natural Language Understanding

OMB+ Online Mathematik Brückenkurs

PL Policy Learning

pLSA probabilistic Latent Semantic Analysis

REST Representational State Transfer

RNN Recurrent Neural Network

RPA Robotic Process Automation

RegEx Regular Expressions

TL Transfer Learning

ToDS Task-oriented Dialogue System


1 INTRODUCTION

Robotic Process Automation (RPA) is a type of software bot that imitates manual human activities like entering data into a system, logging into accounts, or executing simple but repetitive workflows (Mohanty and Vyas 2018). In other words, RPA-systems allow business applications to work without human involvement by replicating human worker activities. A further benefit of RPA-bots is that they are much cheaper compared to the employment of a human worker, and they are also able to work around the clock. Overall, the advantages of software bots are hugely appealing and include cost savings, accuracy and compliance while executing processes, improved responsiveness (i.e., bots are faster than human workers), and, finally, such bots are usually agile and multi-skilled (Ivancic, Vugec, and Vukšic 2019).

However, one of the main benefits and, at the same time, drawbacks of RPA-systems is their ability to precisely fulfill the assigned task. On the one hand, the bot can carry out the task accurately and diligently. On the other hand, it is highly susceptible to changes in defined scenarios. Being designed for a particular task, the RPA-bot is often not adaptable to other domains or even simple changes in a workflow (Mohanty and Vyas 2018). This inability to adapt to changing conditions gave rise to a further area of improvement for RPA-bots – Intelligent Process Automation (IPA) systems.

IPA-bots combine Robotic Process Automation (RPA) with Artificial Intelligence (AI) and thus can perform complex and more cognitively demanding tasks that require reasoning and language understanding. Hence, IPA-bots go beyond automating simple "click tasks" (as in RPA) and can accomplish jobs more intelligently – employing machine learning algorithms and advanced analytics. Such IPA-systems undertake time-consuming and routine tasks, and thus enable smart workflows and free up skilled workers to accomplish higher-value activities.

Previously, IPA techniques were primarily employed within the computer vision domain, starting with the automation of simple display click-tasks and going towards sophisticated self-driving cars with their ability to interpret a stream of situational data obtained from sensors and cameras. However, in recent times, Natural Language Processing (NLP) became one of the potential applications for IPA as well, due to its ability to understand and interpret human language. NLP methods are used to analyze large amounts of textual data and to respond to various inquiries. Furthermore, if the available data is unstructured and does not have any predefined format (i.e., customer emails), or if it is available in a variable format (i.e., invoices, legal documents), then Natural Language Processing (NLP) techniques are applied to extract the relevant information. This data can then be utilized to solve distinct problems (i.e., natural language understanding, predictive analytics) (Mohanty and Vyas 2018). For instance, in the case of predictive analytics, such a bot would make forecasts about the future using current and historical data and therethrough assist with sales or investments.

However, NLP in the context of Intelligent Process Automation (IPA) is not limited to the extraction of relevant data and insights from textual documents. One of the central applications of NLP within the IPA domain are conversational interfaces (e.g., chatbots) that are used to enable human-to-machine interaction. This type of software is commonly used for customer support or within recommendation and booking systems. Conversational agents gain enormous demand due to their ability to simultaneously support a large number of users while communicating in a natural language. In a conventional chatbot system, a user provides input in a natural language; the bot then processes this query to extract potentially useful information and responds with a relevant reply. This process comprises multiple stages and involves different types of NLP subtasks, starting with Natural Language Understanding (NLU) (e.g., intent recognition, named entity extraction) and going towards dialogue management (i.e., determining the next possible bot's action considering the dialogue history) and response generation (e.g., converting the semantic representation of the next bot's action to a natural language utterance). A typical dialogue system for Intelligent Process Automation (IPA) purposes usually undertakes shallow customer support requests (e.g., answering of FAQs), allowing human workers to focus on more sophisticated inquiries.
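
To make these stages concrete, the following minimal Python sketch chains the three sub-tasks into a single conversational turn. It is an illustration only: the keyword-based understand function, the hard-coded next_action policy, and the e-learning-flavored templates are hypothetical placeholders, not the components of the system developed later in this thesis.

from dataclasses import dataclass, field

@dataclass
class DialogueState:
    """Minimal dialogue state: last recognized intent, extracted entities, and history."""
    intent: str = "unknown"
    entities: dict = field(default_factory=dict)
    history: list = field(default_factory=list)

def understand(utterance):
    """Toy NLU step: keyword-based intent recognition and entity extraction."""
    text = utterance.lower()
    if "certificate" in text:
        return "request_certificate", {}
    if "chapter" in text:
        numbers = [tok for tok in text.split() if tok.isdigit()]
        return "chapter_question", {"chapter": numbers[0] if numbers else None}
    return "out_of_scope", {}

def next_action(state):
    """Toy dialogue manager: maps the tracked state to the next system action."""
    if state.intent == "chapter_question" and not state.entities.get("chapter"):
        return "ask_for_chapter"
    if state.intent == "out_of_scope":
        return "hand_over_to_human"
    return "answer_" + state.intent

def generate(action):
    """Toy NLG step: template-based response generation."""
    templates = {
        "ask_for_chapter": "Which chapter are you currently working on?",
        "hand_over_to_human": "I will forward your request to a human tutor.",
        "answer_request_certificate": "You can download your certificate once all chapters are passed.",
    }
    return templates.get(action, "[no template for action: " + action + "]")

def respond(utterance, state):
    state.intent, state.entities = understand(utterance)  # natural language understanding
    state.history.append(state.intent)                    # dialogue state tracking
    action = next_action(state)                           # dialogue management
    return generate(action)                               # response generation

state = DialogueState()
print(respond("I have a question about one of the chapters", state))
# -> "Which chapter are you currently working on?" (the missing entity triggers a follow-up question)

Even in this toy form, the sketch makes the division of labor visible: the NLU output is the only interface between the user utterance and the policy, and the policy output is the only interface to the response generator, which is why the individual components can later be replaced independently (e.g., by a learned model).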

Natural Language Processing (NLP) techniques may also be used indirectly for IPA purposes. There are attempts to automatically identify and classify tasks extracted from textual process descriptions as manual, user, or automated. The goal of such an NLP application is to reduce the effort required to identify suitable candidates for further robotic process automation (Friedrich, Mendling, and Puhlmann 2011; Leopold, Aa, and Reijers 2018).

Intelligent Process Automation (IPA) is an emerging technology and evolves mainly due to the recognized potential benefit of combining Robotic Process Automation (RPA) and Artificial Intelligence (AI) techniques to solve industrial problems (Ivancic, Vugec, and Vukšic 2019; Mohanty and Vyas 2018). Industries are typically interested in being responsive to customers and markets (Mohanty and Vyas 2018). Thus, most customer support applications need to be real-time, active, and highly robust to achieve higher client satisfaction rates. The manual accomplishment of such tasks is often time-consuming, expensive, and error-prone. Furthermore, due to the massive amounts of textual data available for analysis, it seems not to be feasible to process this data entirely by human means.


1.1 outline

This thesis covers two topics united by the broader theme, which is Intelligent Process Automation (IPA) utilizing statistical Natural Language Processing (NLP) methods.

the first block of the thesis (Chapter 2 – Chapter 4) addresses the challenge of implementing a conversational agent for Intelligent Process Automation (IPA) purposes within the e-learning field. As mentioned above, chatbots are one of the central applications of the IPA domain since they can effectively perform time-consuming tasks requiring natural language understanding, allowing human workers to concentrate on more valuable tasks. The IPA conversational bot that was realized within this thesis takes care of routine and time-consuming tasks performed by human tutors within an online mathematical course. This bot is deployed in a real-world setting and interacts with students within the OMB+ mathematical platform. Conducting experiments, we observed two possibilities to build the conversational agent in industrial settings – first, with purely rule-based methods, considering the missing training data and individual aspects of the target domain (i.e., e-learning). Second, we re-implemented two of the main system components (i.e., Natural Language Understanding (NLU) and Dialogue Manager (DM) units) using the current state-of-the-art deep-learning algorithm (i.e., Bidirectional Encoder Representations from Transformers (BERT)) and investigated their performance and potential use as a part of a hybrid model (i.e., containing both rule-based and machine learning methods).

the second block of the thesis (Chapter 5 – Chapter 6) considers an Intelligent Process Automation (IPA) problem within the predictive analytics domain and addresses the task of scientific trend detection. Predictive analytics makes forecasts about future outcomes based on historical and current data. Hence, organizations can benefit from advanced analytics models by detecting trends and their behaviors and then employing this information while making crucial business decisions (i.e., investments). In this work, we deal with a subproblem of the trend detection task – specifically, we address the lack of publicly available benchmarks for the evaluation of trend detection algorithms. Therefore, we assembled a benchmark for the detection of both trends and downtrends (i.e., topics that become less frequent over time). To the best of our knowledge, the task of downtrend detection has not been well addressed before. Furthermore, the resulting benchmark is based on a collection of more than one million documents, which is among the largest that have been used for trend detection before and therefore offers a realistic setting for developing trend detection algorithms.


All main chapters contain their specific introduction, contribution list, related work section, and summary.


Part I

E-LEARNING CONVERSATIONAL ASSISTANT


2 FOUNDATIONS

In this block, we address the challenge of designing and implementing a conversational assistant for Intelligent Process Automation (IPA) purposes within the e-learning domain. The system aims to support human tutors of the Online Mathematik Brückenkurs (OMB+) platform during the tutoring round by undertaking routine and time-consuming responsibilities. Specifically, the system intelligently automates the process of information accumulation and validation that precedes every conversation. Therethrough, the IPA-bot frees up tutors and allows them to concentrate on more complicated and cognitively demanding tasks.

In this chapter, we introduce the fundamental concepts required by the topic of the first part of the thesis. We begin with the foundations of conversational assistants in Section 2.2, where we present the taxonomy of existing dialogue systems in Section 2.2.1, give an overview of the conventional dialogue architecture in Section 2.2.2 with its most essential components, and describe the existing approaches for building a conversational agent in Section 2.2.3. We finally introduce the target domain for our experiments in Section 2.3.

2.1 introduction

Conversational assistants – a type of software that interacts with people via written or spoken natural language – have become widely popularized in recent times. Today's conversational assistants support a broad range of everyday skills, including but not limited to creating a reminder, answering factoid questions, and controlling smart home devices.

Initially, the notion of an artificial companion capable of conversing in a natural language was anticipated by Alan Turing in 1949 (Turing 1949). Still, the first significant step in this direction was made in 1966 with the appearance of Weizenbaum's system ELIZA (Weizenbaum 1966). Its successor was ALICE (Artificial Linguistic Internet Computer Entity) – a natural language processing dialogue system that conversed with a human by applying heuristic pattern matching rules. Despite being remarkably popular and technologically advanced for their time, both systems were unable to pass the Turing test (Alan 1950). Following a long silent period, the next wave of growth of intelligent agents was caused by the advances in Natural Language Processing (NLP). This was enabled by access to low-cost memory, higher processing speed, vast amounts of available data, and sophisticated machine learning algorithms. Hence, one of the most notable leaps in the history of conversational agents began in the year 2011 with intelligent personal assistants. At that time, Apple's Siri, followed by Microsoft's Cortana and Amazon's Alexa in 2014, and then Google Assistant in 2016, came to the scene. The main focus of these agents was not to mimic humans, as in ELIZA or ALICE, but to assist a human by answering simple factoid questions and setting notification tasks across a broad range of content. Another type of conversational agent, called Task-oriented Dialogue System (ToDS), became a focus of industries in the year 2016. For the most part, these bots aimed to support customer service (Cui et al. 2017) and respond to Frequently Asked Questions (Liao et al. 2018). Their conversational style provided more depth than that of personal assistants, but a narrower range, since they were patterned for task solving and not for open-domain discussions.

One of the central advantages of conversational agents and the rea-son why they are attracting considerable interest is their ability togive attention to several users simultaneously while supporting nat-ural language communication Due to this conversational assistantsgain momentum in numerous kinds of practical support applications(ie customer support) where a high number of users are involved atthe same time Classical engagement of human assistants is very cost-efficient and such support is usually not full-time accessible There-fore to automate human assistance is desirable but also an ambi-tious task Due to the lack of solid Natural Language Understand-ing (NLU) advancing beyond primary tasks is still challenging (Lugerand Sellen 2016) and even the most prosperous dialogue systemsoften fail to meet userrsquos expectations (Jain et al 2018) Significantchallenges involved in the design and implementation of a conversa-tional agent varying from domain engineering to Natural LanguageUnderstanding (NLU) and designing of an adaptive domain-specificconversational flow The quintessential problems and how-to ques-tions while building conversational assistants include

• How to deal with the variability and flexibility of language?

• How to get high-quality in-domain data?

• How to build meaningful representations?

• How to integrate commonsense and domain knowledge?

• How to build a robust system?

Besides that, current Natural Language Processing (NLP) and Artificial Intelligence (AI) techniques have weaknesses when applied to domain-specific, task-oriented problems, especially if the underlying data comes from an industrial source. As machine learning techniques heavily rely on structured and cleaned training data to deliver consistent performance, typically noisy real-world data without access to additional knowledge databases negatively impacts the classification accuracy.


Despite that, conversational agents can be successfully employed to solve less cognitive but still highly relevant tasks. In this case, a dialogue agent can be operated as part of an Intelligent Process Automation (IPA) system, where it takes care of straightforward but routine and time-consuming functions performed in a natural language, while allowing a human worker to focus on more cognitively demanding processes.

2.2 conversational assistants

The development and design of a particular dialogue system usually vary with its objectives; among the most fundamental are:

• Purpose: Should it be an open-ended dialogue system (e.g., chit-chat), a task-oriented system (e.g., customer support), or a personal assistant (e.g., Google Assistant)?

• Skills: What kind of behavior should the system demonstrate?

• Data: Which data is required to develop (resp. train) specific blocks, and how can it be assembled and curated?

• Deployment: Will the final system be integrated into a particular software platform? What are the challenges in choosing suitable tools? How will the deployment happen?

• Implementation: Should it be a rule-based, information-retrieval, machine-learning, or hybrid system?

In the following subsections, we take a look at the most common taxonomies of dialogue systems and at different design and implementation strategies.

2.2.1 Taxonomy

According to their goals, conversational agents can be classified into two main categories: task-oriented systems (e.g., for customer support) and social bots (e.g., for chit-chat purposes). Furthermore, systems also vary regarding their implementation characteristics. Below, we provide a detailed overview of the most common variations.

task-oriented dialogue systems are known for their ability to lead a dialogue with the ultimate goal of solving a user's problem (see Figure 1). A customer support agent or a flight/restaurant booking system exemplifies this class of conversational assistants. Such systems are more meaningful and useful than those with an entirely social goal; however, they usually cause more complications during the implementation stage. Since Task-oriented Dialogue Systems (ToDS) require a precise understanding of the user's intent (i.e., goal), the Natural Language Understanding (NLU) component must perform flawlessly.


Besides that, if applied in an industrial setting, ToDS are usually built in a modular fashion, which means they mostly rely on handcrafted rules and predefined templates and are restricted in their abilities. Personal assistants (e.g., Google Assistant) are members of the task-oriented family as well. Though, compared to standard agents, they involve more social conversation and can be used for entertainment purposes (e.g., "Ok Google, tell me a joke"). The latter is typically not available in conventional task-oriented systems.

social bots and chit-chat systems, in contrast, regularly do not pursue any specific goal and are designed for small-talk and entertainment conversations (see Figure 1). Those agents are usually end-to-end trainable and highly data-driven, which means they require large amounts of training data. However, due to the entirely learnable training, such systems often produce inappropriate responses, making them rather unreliable for practical applications. Like chit-chat systems, social bots are rich in social conversation; however, they possess the ability to solve uncomplicated user requests and conduct a more meaningful dialogue (Fang et al., 2018). Those requests are not the same as in task-oriented systems and are mostly aimed at answering simple factoid questions.

Figure 1: Taxonomy of dialogue systems according to the measure of their task accomplishment and social contribution. Figure originates from Fang et al., 2018.


The aforementioned system types can be decomposed further according to their implementation characteristics:

open domain & closed domain: A Task-oriented Dialogue System (ToDS) is typically built within a closed-domain setting. The space of potential inputs and outputs is restricted, since the system attempts to achieve a specific goal. Customer support or shopping assistants are cases of closed-domain problems (Wen et al., 2016). These systems do not need to be able to talk about off-topic subjects, but they have to fulfill their particular tasks as efficiently as possible.

Chit-chat systems, in contrast, do not assume a well-defined goal or intention and thus can be built within an open domain. This means the conversation can progress in any direction and with minimal or no functional purpose (Serban et al., 2016). However, the infinite number of possible topics and the fact that a certain amount of commonsense is required to create reasonable responses make an open-domain setting a hard problem.

content-driven & user-centric: Social bots and chit-chat systems typically utilize a content-driven dialogue manner. That means that their responses are based on large and dynamic content derived from daily web mining and knowledge graphs. This approach builds a dialogue based on trendy content from diverse sources, and thus it is an ideal way for social bots to catch users' attention and draw a user into the dialogue.

User-centric approaches, in turn, assume precise language understanding to detect the sentiment of users' statements. According to the sentiment, the dialogue manager learns users' personalities, handles rapid topic changes, and tracks engagement. In this case, the dialogue builds on specific user interests.

long & short conversations: Models with a short conversation style attempt to create a single response to a single input (e.g., a weather forecast). In the case of a long conversation, a system has to go through multiple turns and keep track of what has been said before (i.e., the dialogue history). The longer the conversation, the more challenging it is to train a system. Customer support conversations are typically long conversational threads with multiple questions.

response generation: One of the essential elements of a dialogue system is response generation. This can be done either in a retrieval-based manner or by using generative models (i.e., Natural Language Generation (NLG)).

The former usually utilizes predefined responses and heuristics to select a relevant response based on the input and context. The heuristic could be a rule-based expression match (i.e., Regular Expressions (RegEx)) or an ensemble of machine learning classifiers (e.g., entity extraction, prediction of the next action). Such systems do not generate any original text; instead, they select a response from a fixed set of predefined candidates. These models are usually used in an industrial setting, where the system must show high robustness, performance, and interpretability of the outcomes.

The latter, in contrast, does not rely on predefined replies. Since such models are typically based on (deep) machine learning techniques, they attempt to generate new responses from scratch. This is, however, a complicated task and is mostly used for vanilla chit-chat systems or in a research setting where structured training data is available.
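To illustrate the retrieval-based manner described above, the following minimal Python sketch selects a reply from a fixed candidate set via TF-IDF cosine similarity (using scikit-learn); the candidate responses and the similarity threshold are invented for illustration and are not part of any system discussed in this thesis.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    # Fixed set of predefined candidate responses (illustrative examples only).
    candidates = [
        "You can reset your password in the account settings.",
        "Our support team is available from 9 am to 5 pm.",
        "Please provide your order number so we can check the status.",
    ]

    vectorizer = TfidfVectorizer()
    candidate_vectors = vectorizer.fit_transform(candidates)

    def select_response(user_utterance: str, threshold: float = 0.2) -> str:
        """Return the most similar predefined response, or a fallback."""
        query_vector = vectorizer.transform([user_utterance])
        scores = cosine_similarity(query_vector, candidate_vectors)[0]
        best = scores.argmax()
        if scores[best] < threshold:
            return "Sorry, could you rephrase your question?"
        return candidates[best]

    print(select_response("Where can I change my password?"))

No original text is generated here: the system only ranks the fixed candidates, which is exactly what makes such models robust and interpretable in industrial settings.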

2.2.2 Conventional Architecture

In this and the following subsection, we review the pipeline and methods for Task-oriented Dialogue Systems (ToDS). Such agents are typically composed of several components (Figure 2) addressing the diverse tasks required to facilitate a natural language dialogue (Williams, Raux, and Henderson, 2016).

Figure 2: Example of a conventional dialogue architecture with its main components in the bounding box: Language Understanding (NLU), Dialogue Manager (DM), Response Generation (NLG). Figure originates from the NAACL 2018 Tutorial, Su et al., 2018.


A conventional dialogue architecture implements this with the following components:

natural language understanding unit has the following objectives: it parses the user input, classifies the domain (e.g., customer support, restaurant booking; not required for single-domain systems) and the intent (e.g., find-flight, book-taxi), and extracts entities (e.g., Italian food, Berlin).

Generally, the NLU unit maps every user utterance into semantic frames that are predefined according to different scenarios. Table 1 illustrates an example of a sentence representation, where the departure location (Berlin) and the arrival location (Munich) are specified as slot values, and the domain (airline travel) and intent (find-flight) are specified as well. Typically, there are two kinds of possible representations. The first one is the utterance category, represented in the form of the user's intent. The second is word-level information extraction in the form of Named Entity Recognition (NER) or Slot Filling tasks.

Sentence        Find   flight   from   Berlin   to   Munich   today
Slots/Concepts  O      O        O      B-dept   O    B-arr    B-date
Named Entity    O      O        O      B-city   O    B-city   O
Intent          Find_Flight
Domain          Airline Travel

Table 1: An illustrative example of a sentence representation.1

A dialogue act classification sub-module, also known as intent classification, is dedicated to detecting the primary user goal at every dialogue state. It labels each user utterance with one of the predefined intents. Deep neural networks can be directly applied to this conventional classification problem (Khanpour, Guntakandla, and Nielsen, 2016; Lee and Dernoncourt, 2016).

Slot filling is another sub-module for language understanding. Unlike intent detection (i.e., a classification problem), slot filling is defined as a sequence labeling problem, where the words in the sequence are labeled with semantic tags. The input is a sequence of words, and the output is a sequence of slots, one for every word. Variations of Recurrent Neural Network (RNN) architectures (LSTM, BiLSTM) (Deoras et al., 2015) and (attention-based) encoder-decoder architectures (Kurata et al., 2016) have been employed for this task.
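As an illustration of the sequence labeling formulation, the following is a minimal PyTorch sketch of a BiLSTM slot tagger; the vocabulary size, tag inventory, and dimensions are placeholder values rather than settings from the cited works.

    import torch
    import torch.nn as nn

    class BiLSTMSlotTagger(nn.Module):
        """Tags every token of an utterance with a slot label (sequence labeling)."""

        def __init__(self, vocab_size: int, num_tags: int,
                     embed_dim: int = 64, hidden_dim: int = 64):
            super().__init__()
            self.embedding = nn.Embedding(vocab_size, embed_dim)
            self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                                bidirectional=True)
            self.classifier = nn.Linear(2 * hidden_dim, num_tags)

        def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
            # token_ids: (batch, seq_len) -> logits: (batch, seq_len, num_tags)
            embedded = self.embedding(token_ids)
            hidden, _ = self.lstm(embedded)
            return self.classifier(hidden)

    # Toy usage: 7 tokens (as in Table 1), a placeholder vocabulary of 1000 words,
    # and 5 slot tags (e.g., O, B-dept, B-arr, B-date, ...).
    model = BiLSTMSlotTagger(vocab_size=1000, num_tags=5)
    tokens = torch.randint(0, 1000, (1, 7))
    logits = model(tokens)
    predicted_tags = logits.argmax(dim=-1)  # one slot label per token

The same encoder can be shared with an utterance-level intent classifier by pooling the hidden states, which is a common way to train both NLU sub-tasks jointly.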


dialogue manager unit is subdivided into two components: the first is a Dialogue State Tracker (DST) that holds the conversational state, and the second is a Policy Learning (PL) unit that defines the system's next action.

The Dialogue State Tracker (DST) is the core component to guarantee a robust dialogue flow, since it manages the user's intent at every turn. The conventional methods, which are broadly applied in commercial implementations, often adopt handcrafted rules to pick the most probable candidate among the predefined ones (Goddeau et al., 1996; Wang and Lemon, 2013). Though, a variety of statistical approaches have emerged since the Dialogue State Tracking Challenge (DSTC)2 was arranged by Jason D. Williams in the year 2013. Among the deep-learning based approaches is the Belief Tracker proposed by Henderson, Thomson, and Young (2013). It employs a sliding window to produce a sequence of probability distributions over an arbitrary number of possible values (Chen et al., 2017). In the work of Mrkšić et al. (2016), the authors introduced a Neural Belief Tracker to identify the slot-value pairs. The system used the user intents, the preceding user input, the user utterance itself, and a candidate slot-value pair which it needs to decide about as the input. Then the system iterated over all candidate slot-value pairs to determine which ones have just been expressed by the user (Chen et al., 2017). Finally, Vodolán, Kadlec, and Kleindienst proposed a Hybrid Dialogue State Tracker that achieved state-of-the-art performance on the DSTC2 shared task. Their model used a separately learned decoder coupled with a rule-based system.

Based on the state representation from the Dialogue State Tracker (DST), the PL unit generates the next system action, which is then decoded by the Natural Language Generation (NLG) module. Usually, a rule-based agent is used to initialize the system (Yan et al., 2017), and then supervised learning is applied to the actions generated by these rules. Alternatively, the dialogue policy can be trained end-to-end with reinforcement learning to optimize the system's policy-making toward the final performance (Cuayáhuitl, Keizer, and Lemon, 2015).

natural language generation unit transforms the next action into a natural language response. Conventional approaches to Natural Language Generation (NLG) typically adopt a template-based style, where a set of rules is defined to map every next action to a predefined natural language response. Such approaches are simple, generally error-free, and easy to control; however, their implementation is time-consuming, and the resulting style is repetitive and hardly scalable.

Among the deep learning techniques is the work by Wen et al. (2015), which introduced neural network-based approaches to NLG with an LSTM-based structure. Later, Zhou and Huang (2016) adopted this model along with the question information, semantic slot values, and dialogue act types to generate more precise answers. Finally, Wu, Socher, and Xiong (2019) proposed Global-to-Local Memory Pointer Networks (GLMP) – an architecture that incorporates knowledge bases directly into the learning framework (due to the Memory Networks component) and is able to re-use information from the user input (due to the Pointer Networks component).

2 Currently: Dialog System Technology Challenge.

In particular cases, the dialogue flow can include speech recognition and speech synthesis units (if managing a spoken natural language system) and connections to third-party APIs to request information stored in external databases (e.g., a weather forecast), as depicted in Figure 2.

2.2.3 Existing Approaches

As stated above, a conventional task-oriented dialogue system is implemented in a pipeline manner. That means that once the system receives a user query, it interprets it and acts according to the dialogue state and the corresponding policy. Additionally, based on its understanding, it may access the knowledge base to find the demanded information there. Finally, the system transforms the system's next action (and, if given, the extracted information) into its surface form as a natural language response (Monostori et al., 1996).

Individual components (i.e., Natural Language Understanding (NLU), Dialogue Manager (DM), Natural Language Generation (NLG)) of a particular conversational system can be implemented using different approaches, starting with entirely rule- and template-based methods and going towards hybrid approaches (using learnable components along with handcrafted units) and end-to-end trainable machine learning methods.

rule-based approaches: Though many of the latest research approaches handle the Natural Language Understanding (NLU) and Natural Language Generation (NLG) units by using statistical Natural Language Processing (NLP) models (Bocklisch et al., 2017; Burtsev et al., 2018; Honnibal and Montani, 2017), most of the industrially deployed dialogue systems still utilize handcrafted features and rules for the state tracking, action prediction, intent recognition, and slot filling tasks (Chen et al., 2017; Ultes et al., 2017). For instance, most of the PyDial3 framework modules offer rule-based implementations using Regular Expressions (RegEx), rule-based dialogue trackers (Henderson, Thomson, and Williams, 2014), and handcrafted policies. Also, in order to map the next action to text, Ultes et al. (2017) suggest rules along with template-based response generation.

3 PyDial Framework: http://www.camdial.org/pydial


The rule-based approach ensures robustness and stable performance, which is crucial for industrial systems that communicate with a large number of users simultaneously. However, it is highly expensive and time-consuming to deploy a real dialogue system built in that manner. The major disadvantage is that the usage of handcrafted systems is restricted to a specific domain, and possible adaptation requires extensive manual engineering.

end-to-end learning approaches: Due to the recent advance of end-to-end neural generative models (Collobert et al., 2011), many efforts have been made to build an end-to-end trainable architecture for conversational agents. Rather than using the traditional pipeline (see Figure 2), the end-to-end model is conceived as a single module (Chen et al., 2017).

Wen et al. (2016), followed by Bordes, Boureau, and Weston (2016), were among the pioneers who adopted an end-to-end trainable approach for conversational systems. Their approach was to treat dialogue learning as the problem of learning a mapping from dialogue histories to system responses and to apply an encoder-decoder model (Sutskever, Vinyals, and Le, 2014) to train the entire system. The proposed model still had significant limitations (Glasmachers, 2017), such as missing policies to learn a dialogue flow and the inability to incorporate additional knowledge, which is especially crucial for training a task-oriented agent. To overcome the first problem, Dhingra et al. (2016) introduced an end-to-end reinforcement learning approach that jointly trains the dialogue state tracker and policy learning units. The second weakness was addressed by Eric and Manning (2017), who augmented existing recurrent network architectures with a differentiable attention-based key-value retrieval mechanism over the entries of a knowledge base. This approach was inspired by Key-Value Memory Networks (Miller et al., 2016). Later on, Wu, Socher, and Xiong (2019) proposed Global-to-Local Memory Pointer Networks (GLMP), where the authors incorporated knowledge bases directly into the learning framework. In their model, a global memory encoder and a local memory decoder are employed to share external knowledge. The encoder encodes the dialogue history, modifies the global contextual representation, and generates a global memory pointer. The decoder first produces a sketch response with empty slots. Next, it uses the global memory pointer to filter the external knowledge for relevant information and then instantiates the slots via the local memory pointers.

Despite having better adaptability compared to any rule-based system and being simpler to train, end-to-end approaches remain unattainable for commercial conversational agents operating on real-world data. A well and carefully constructed task-oriented dialogue system in an observed domain, using handcrafted rules (in the NLU and DST units) and predefined responses for NLG, still outperforms end-to-end systems due to its robustness (Glasmachers, 2017; Wu, Socher, and Xiong, 2019).

hybrid approaches: Though end-to-end learning is an attractive solution for conversational systems, current techniques are data-intensive and require large amounts of dialogues to learn simple behaviors. To overcome this barrier, Williams, Asadi, and Zweig (2017) introduce Hybrid Code Networks (HCNs), which are an ensemble of retrieval and trainable units. The system utilizes a Recurrent Neural Network (RNN) for the Natural Language Understanding (NLU) block, domain-specific knowledge that is encoded as software (i.e., API calls), and custom action templates used for the Natural Language Generation (NLG) unit. Compared to existing end-to-end methods, the authors report that their approach considerably reduces the amount of data required for training (Williams, Asadi, and Zweig, 2017). According to the authors, HCNs achieve state-of-the-art performance on the bAbI dialog dataset (Bordes, Boureau, and Weston, 2016) and outperform two commercial customer dialogue systems.4

Along with this work, Wen et al. (2016) propose a neural network-based model that is end-to-end trainable but still modularly connected. In their work, the authors treat dialogue as a sequence-to-sequence mapping problem (Sutskever, Vinyals, and Le, 2014), augmented with the dialogue history and the current database search outcome (Wen et al., 2016). The authors explain the learning process as follows: At every turn, the system takes a user input represented as a sequence of tokens and converts it into two internal representations: a distributed representation of the intent and a probability distribution over slot-value pairs, called the belief state (Young et al., 2013). The database operator then picks the most probable values in the belief state to form the search result. At the same time, the intent representation and the belief state are transformed and combined by a Policy Learning (PL) unit to form a single vector representing the next system action. This system action is then used to generate a sketch response token by token. The final system response is composed by substituting the actual values of the database entries into the sketch sentence structure (Wen et al., 2016).

Hybrid models appear likely to replace the established rule- and template-based approaches currently utilized in industrial settings. The performance of most Natural Language Understanding (NLU) components (i.e., Named Entity Recognition (NER), domain and intent classification) is reasonably high to apply them to the development of commercial conversational agents, whereas the Natural Language Generation (NLG) unit can still be realized as a template-based solution by employing predefined response candidates and predicting the next system action with deep learning systems.

4 Internal Microsoft Research datasets in the customer support and troubleshooting domains were employed for training and evaluation.


2.3 target domain & task definition

MUMIE5 is an open-source e-learning platform for studying and teaching mathematics and computer science. The Online Mathematik Brückenkurs (OMB+) is one of the projects powered by MUMIE. Specifically, the Online Mathematik Brückenkurs (OMB+)6 is a German online learning platform that assists students who are preparing for engineering or computer science studies at a university.

The course's central purpose is to assist students in reviving their mathematical skills so that they can follow the upcoming university courses. The online course is thematically segmented into 13 parts and includes free mathematical classes with theoretical and practical content. Every chapter begins with an overview of its sections and consists of explanatory texts with built-in videos, examples, and quick checks of understanding. Furthermore, various examination modes to practice the content (i.e., training, exercises) and to check the understanding of the theory (i.e., quiz) are available. Besides that, OMB+ provides the possibility to get assistance from a human tutor. Usually, students and tutors interact via a chat interface in written form. The language of communication is German.

The current dilemma of the OMB+ platform is that the number of students grows significantly every year, but it is challenging to find qualified tutors, and it is expensive to hire more human tutors. This results in a longer waiting period for students until their problems can be considered and processed. In general, all student questions can be grouped into three main categories: organizational questions (e.g., course, certificate), contextual questions (e.g., content, theorem), and mathematical questions (e.g., exercises, solutions). To assist a student with a mathematical question, a tutor has to know the following regular information: what kind of topic (or sub-topic) the student has a problem with; which examination mode (quiz, chapter-level training or exercise, section-level training or exercise, or final examination) the student is working on right now; and, finally, the exact question number and exact problem formulation. This means that a tutor has to ask the same onboarding questions every time a new conversation starts. This is, however, very time-consuming and could be successfully solved within an Intelligent Process Automation (IPA) system.

The work presented in Chapter 3 and Chapter 4 was enabled by the collaboration with the MUMIE and Online Mathematik Brückenkurs (OMB+) team and addresses the abovementioned problem. In Chapter 3, we implement an Intelligent Process Automation (IPA) dialogue system with rule- and retrieval-based algorithms that interacts with students and performs the onboarding. This system is multipurpose: first, it eases the assistance process for human tutors by reducing repetitive and time-consuming activities and therefore allows workers to focus on more challenging tasks. Second, by interacting with students, it augments the resources with structured and labeled training data. The latter, in turn, allowed us to re-train specific system components utilizing machine and deep learning methods. We describe this procedure in Chapter 4.

5 MUMIE: https://www.mumie.net
6 https://www.ombplus.de

We note that the previously collected human-to-human data from the OMB+ platform could not be used for training a machine learning system, since it is highly unstructured and contains no direct question-answer matching, which is crucial for trainable conversational algorithms. We analyzed these conversations to gain insights from them, which we used to define the dialogue flow.

We also do not consider the automation of mathematical questions in this work, since this task would require not only precise language understanding and extensive commonsense knowledge but also the accurate perception of mathematical notation. The tutoring process at OMB+ differs considerably from standard tutoring methods. Here, instructors attempt to guide students by giving hints and tips instead of providing an out-of-the-box solution. Existing human-to-human dialogues revealed that such a task can be challenging even for human tutors, since it is not always directly perceptible what precisely the student does not understand or has difficulties with. Furthermore, dialogues containing mathematical explanations can last for a long time, and the topic (i.e., intent and final goal) can shift multiple times during the conversation.

To our knowledge, this task would not be feasible to solve sufficiently with current machine learning algorithms, especially considering the missing training data. A rule-based system would not be the right solution for this purpose either, due to the large number of complex rules that would need to be defined. Thus, in this work, we focus on the automation of less cognitively demanding tasks, which are still relevant and must be solved in the first instance to ease the tutoring process for instructors.

2.4 summary

Conversational agents are a highly demanded solution for industry due to their potential ability to support a large number of users simultaneously while performing a flawless natural language conversation. Many messenger platforms are created, and custom chatbots in various constellations emerge daily. Unfortunately, most of the current conversational agents are often disappointing to use due to the lack of substantial natural language understanding and reasoning. Furthermore, Natural Language Processing (NLP) and Artificial Intelligence (AI) techniques have limitations when applied to domain-specific and task-oriented problems, especially if no structured training data is available.

Despite that, conversational agents can be successfully applied to solve less cognitive but still highly relevant tasks – that is, within the Intelligent Process Automation (IPA) domain. Such IPA-bots would undertake tedious and time-consuming responsibilities to free up the knowledge worker for more complex and intricate tasks. IPA-bots can be seen as members of the goal-oriented dialogue family and are, for the most part, realized either in a rule-based manner or through hybrid models when it comes to a real-world setting.


3 MULTIPURPOSE IPA VIA CONVERSATIONAL ASSISTANT

This chapter partially covers work already published at international peer-reviewed conferences. The relevant publication is Moiseeva et al., 2020. The research described in this chapter was carried out in its entirety by the author of this thesis. The other authors of the publication acted as advisers. At the time of writing this thesis, the system described in this work was deployed on the Online Mathematik Brückenkurs (OMB+) platform.

As we outlined in the introduction, Intelligent Process Automation (IPA) is an emerging technology with the primary goal of assisting the knowledge worker by taking care of repetitive, routine, and low-cognitive tasks. Conversational assistants that interact with users in a natural language are a potential application for the IPA domain. Such smart virtual agents can support the user by responding to various inquiries and performing routine tasks carried out in a natural language (e.g., customer support).

In this work, we tackle the challenge of implementing an IPA conversational agent in a real-world industrial context under conditions of missing training data. Our system has two meaningful benefits. First, it reduces monotonous and time-consuming tasks and hence lets workers concentrate on more intellectual processes. Second, by interacting with users, it augments the resources with structured and labeled training data. The latter, in turn, can be used to retrain the system utilizing machine and deep learning methods.

3.1 introduction

Conversational dialogue systems can give attention to several users simultaneously while supporting natural communication. They are thus exceptionally needed for practical assistance applications where a high number of users are involved at the same time. Typically, dialogue agents require a lot of natural language understanding and commonsense knowledge to hold a conversation reasonably and intelligently. Therefore, nowadays it is still challenging to completely substitute human assistance with an intelligent agent. However, a dialogue system can be successfully employed as part of an Intelligent Process Automation (IPA) application. In this case, it would take care of simple but routine and time-consuming tasks performed in a natural language, while allowing a human worker to focus on more cognitively demanding processes.

Recent research in the dialogue generation domain is conducted by employing Artificial Intelligence (AI) techniques like machine and deep learning (Lowe et al., 2017; Wen et al., 2016). However, those methods require high-quality data for training, which is often not directly available for domain-specific and industrial problems. Even though companies gather and store vast volumes of data, it is often of low quality with regard to machine learning approaches (Daniel, 2015), especially when it concerns dialogue data, which has to be properly structured as well as annotated. Conversely, if trained on artificially created datasets (which is often the case in a research setting), the systems can solve only specific problems and are hardly adaptable to other domains. Hence, despite the popularity of deep learning end-to-end models, one still needs to rely on traditional pipelines in practical dialogue engineering, mainly while introducing a new domain.

3.1.1 Outline and Contributions

This work addresses the challenge of implementing a dialogue system for IPA purposes within a practical e-learning domain under conditions of missing training data. The chapter is structured as follows: In Section 3.2, we introduce our method and describe the design of the rule-based dialogue architecture with its main components, such as the Natural Language Understanding (NLU), Dialogue Manager (DM), and Natural Language Generation (NLG) units. In Section 3.3, we present the results of our dual-focused evaluation, and Section 3.4 describes the format of the collected structured data. Finally, we conclude in Section 3.5.

Our contributions within this work are as follows:

• We implemented a robust conversational system for IPA purposes within a practical e-learning domain and under the conditions of missing training (i.e., dialogue) data.

• The system has two main objectives:

  – First, it reduces repetitive and time-consuming activities and allows workers of the e-learning platform to focus solely on mathematical inquiries.

  – Second, by interacting with users, it augments the resources with structured and partially labeled training data for further implementation of trainable dialogue components.


3.2 model

The central purpose of the introduced system is to interact with students at the beginning of every chat conversation and gather information on the topic (and sub-topic), examination mode and level, question number, and exact problem formulation. Such an onboarding process saves time for human tutors and allows them to handle mathematical questions exclusively. Furthermore, the system is realized in such a way that it accumulates labeled and structured dialogues in the background.

Figure 4 displays the complete conversation flow. After the system accepts user input, it analyzes it at different levels and extracts information if such is provided. If some of the information required for tutoring is missing, the system asks the student to provide it. When all the information points are collected, they are automatically validated and forwarded to a human tutor. The latter can then directly proceed with the assistance process. In the following, we explain the key components of the system.

3.2.1 OMB+ Design

Figure 3 illustrates the internal structure and design of the OMB+ platform. It has topics and sub-topics, as well as four different examination modes. Each topic (Figure 3, tag 1) corresponds to a chapter level and always has sub-topics (Figure 3, tag 2) that correspond to a section level. The examination modes training and exercise are ambiguous, because they correspond either to a chapter (Figure 3, tag 3) or to a section (Figure 3, tag 5) level, and it is important to differentiate between them, since they contain different types of content. The mode final examination (Figure 3, tag 4) always corresponds to a chapter level, whereas quiz (Figure 3, tag 5) can belong only to a section level. According to the design of the OMB+ platform, there are several ways in which a possible dialogue flow can proceed.

3.2.2 Preprocessing

In a natural language conversation, one may respond in various forms, making the extraction of data from user-generated text challenging due to potential misspellings or confusable spellings (e.g., Aufgabe 1a, Aufgabe 1 (a)). Hence, to facilitate reliable recognition of intents and extraction of entities, we normalize (e.g., correct misspellings, detect synonyms) and preprocess every user input before passing it on to the Natural Language Understanding (NLU) module.


Figure 3: OMB+ Online Learning Platform, where 1 is the Topic (corresponds to a chapter level), 2 is a Sub-Topic (corresponds to a section level), 3 is the chapter-level examination mode, 4 is the Final Examination mode (available only at chapter level), 5 are the examination modes Exercise and Training (available at section level) and Quiz (available only at section level), and 6 is the OMB+ Chat.

We perform both linguistic and domain-specific preprocessing, which includes the following steps (a brief code sketch follows the list):

• lowercasing and stemming of the words in the input query;

• removal of stop words and punctuation;

• all mentions of X in mathematical formulations are removed to avoid confusion with the Roman numeral 10 ("X");

• in a combination of the type word "Kapitel/Aufgabe" + digit written as a word (e.g., "eins", "zweites", etc.), the word is replaced with a digit ("Kapitel eins" → "Kapitel 1", "im fünften Kapitel" → "im Kapitel 5"); Roman numerals are replaced with a digit as well ("Kapitel IV" → "Kapitel 4");

• detected ambiguities are normalized (e.g., "Trainingsaufgabe" → "Training");

• recognized misspellings resp. typing errors are corrected (e.g., "Difeernzialrechnung" → "Differentialrechnung");

• permalinks are parsed and analyzed; from each permalink it is possible to extract the topic, examination mode, and question number.
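The following minimal Python sketch illustrates a few of the normalization steps listed above; the lookup tables are small illustrative excerpts, not the full resources used in the deployed system.

    import re

    # Illustrative lookup tables; the real system's synonym and misspelling
    # lists are much larger and are not reproduced here.
    NUMBER_WORDS = {"eins": "1", "zwei": "2", "zweites": "2", "drei": "3",
                    "vier": "4", "fünften": "5"}
    ROMAN_NUMERALS = {"I": "1", "II": "2", "III": "3", "IV": "4", "V": "5"}
    CORRECTIONS = {"difeernzialrechnung": "differentialrechnung",
                   "trainingsaufgabe": "training"}

    def normalize(utterance: str) -> str:
        text = utterance.lower()
        # Correct known misspellings and normalize ambiguous terms.
        for wrong, right in CORRECTIONS.items():
            text = text.replace(wrong, right)
        # "Kapitel eins" -> "Kapitel 1", "im fünften Kapitel" -> "im Kapitel 5".
        for word, digit in NUMBER_WORDS.items():
            text = re.sub(rf"\b(kapitel|aufgabe)\s+{word}\b", rf"\1 {digit}", text)
            text = re.sub(rf"\bim\s+{word}\s+kapitel\b", f"im kapitel {digit}", text)
        # Roman numerals after "Kapitel"/"Aufgabe" -> digits.
        for roman, digit in ROMAN_NUMERALS.items():
            text = re.sub(rf"\b(kapitel|aufgabe)\s+{roman.lower()}\b",
                          rf"\1 {digit}", text)
        return text

    print(normalize("Ich habe eine Frage im fünften Kapitel, Trainingsaufgabe 1a"))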


3.2.3 Natural Language Understanding

We implement the Natural Language Understanding (NLU) unit utilizing handcrafted rules, Regular Expressions (RegEx), and the Elasticsearch1 API.

intent classification: According to the design of the OMB+ platform, all student questions can be classified into three categories: organizational questions (e.g., course, certificate), contextual questions (e.g., content, theorem), and mathematical questions (e.g., exercises, solutions). To classify the input query by its intent (i.e., category), we employ weighted keyword information in the form of handcrafted rules. We assume that specific words are explicitly associated with a corresponding intent (e.g., theorem or root denote a mathematical question; feedback or certificate denote an organizational inquiry). If no intent can be assigned, it is assumed that the NLU unit was not capable of understanding, and the intent is unknown. In this case, the virtual agent requests the user to provide the intent manually by picking one of the three options mentioned above. The queries from the organizational and contextual categories are directly handed over to a human tutor, while mathematical questions are analyzed by the automated agent for further information extraction.
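A minimal sketch of such a weighted keyword classifier is given below; the keyword lists, weights, and threshold are invented placeholders, and the actual handcrafted rules of the system are more extensive.

    # Illustrative keyword lists; the deployed system's weighted rules
    # are maintained in German and are considerably larger.
    INTENT_KEYWORDS = {
        "math": {"aufgabe": 2.0, "theorem": 1.5, "wurzel": 1.5, "gleichung": 1.5},
        "organizational": {"zertifikat": 2.0, "feedback": 1.5, "anmeldung": 1.5},
        "contextual": {"regel": 1.5, "definition": 1.5, "beispiel": 1.0},
    }

    def classify_intent(normalized_utterance: str, threshold: float = 1.0) -> str:
        """Score each intent by the summed weights of matched keywords."""
        tokens = normalized_utterance.split()
        scores = {intent: sum(weight for word, weight in keywords.items()
                              if word in tokens)
                  for intent, keywords in INTENT_KEYWORDS.items()}
        best_intent, best_score = max(scores.items(), key=lambda item: item[1])
        return best_intent if best_score >= threshold else "unknown"

    print(classify_intent("wie bekomme ich ein zertifikat"))  # organizational

If no keyword scores above the threshold, the intent remains unknown and the manual selection described above is triggered.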

entity extraction: In the next step, the system retrieves the entities from the user message. We specify the following five prerequisite entities, which have to be collected before the dialogue can be handed over to a human tutor: topic, sub-topic, examination mode and level, and question number. This part is implemented using ElasticSearch (ES) and RegEx. To facilitate the use of ES, we indexed the OMB+ web page in an internal database. Besides indexing the titles of topics and sub-topics, we also provided supplementary information on possible synonyms and writing styles. We additionally filed OMB+ permalinks, which point to the site pages. To query the resulting database, we employ the internal Elasticsearch multi-match function and set the minimum-should-match parameter to 20. This parameter specifies the number of terms that must match for a document to be considered relevant. Besides that, we adopted fuzziness with the maximum edit distance set to 2 characters. The fuzzy query uses similarity based on the Levenshtein edit distance (Levenshtein, 1966). Finally, the system produces a ranked list of potential matching entries found in the database within the predefined relevance threshold (we set it to θ = 15). We then pick the most probable entry as the right one and select the corresponding entity from the user input.
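The following sketch shows how such a fuzzy multi-match query could look with the official Elasticsearch Python client (8.x-style API); the index name, field names, and example threshold are illustrative assumptions and do not reproduce the internal database of the system.

    from elasticsearch import Elasticsearch

    # Host, index, and field names are placeholders for illustration only.
    es = Elasticsearch("http://localhost:9200")

    def search_entity(user_text: str, threshold: float):
        """Fuzzy multi-match query over indexed topic / sub-topic titles."""
        response = es.search(
            index="omb_topics",
            query={
                "multi_match": {
                    "query": user_text,
                    "fields": ["title", "synonyms"],
                    "minimum_should_match": 20,
                    "fuzziness": 2,   # maximum Levenshtein edit distance
                }
            },
        )
        hits = response["hits"]["hits"]
        # Keep only candidates above the relevance threshold, best first.
        return [hit["_source"] for hit in hits if hit["_score"] >= threshold]

    candidates = search_entity("rechenregeln und potenzen", threshold=15)
    best_match = candidates[0] if candidates else None

The top-ranked entry above the threshold is then taken as the extracted entity, as described in the text.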

Overall, the Natural Language Understanding (NLU) module receives the user input as preprocessed text and examines it against all predefined RegEx statements and for a match in the ElasticSearch (ES) database. We use ES to retrieve entities for the topic, sub-topic, and examination mode, whereas RegEx is used to extract the information on the question number. Every time an entity is extracted, it is filled into the Informational Dictionary (ID). The ID has the following six slots to be filled in: topic, sub-topic, examination level, examination mode, question number, and exact problem formulation (this information is requested in further steps).

1 https://www.elastic.co/products/elasticsearch

3.2.4 Dialogue Manager

A conventional Dialogue Manager (DM) consists of two interacting components. The first component is the Dialogue State Tracker (DST), which maintains a representation of the current conversational state, and the second is the Policy Learning (PL) unit, which defines the next system action (i.e., response).

In our model, every next action of the agent is determined by the state of the previously obtained information accumulated in the Informational Dictionary (ID). For instance, if the system recognizes that the student works on the final examination, it also knows (defined by hand-coded logic in the predefined rules) that there is no necessity to ask for a sub-topic, because the final examination always corresponds to a chapter level (due to the layout of the OMB+ platform). If the system recognizes that the user struggles with solving a quiz, it has to ask for both the corresponding topic and sub-topic, because a quiz always relates to a section level.

To discover all of the potential conversational flows, we implement Mutually Exclusive Rules (MER), which indicate that two events e1 and e2 are mutually exclusive or disjoint, since they cannot both occur at the same time (i.e., the intersection of these events is empty: P(A ∩ B) = 0). Additionally, we defined transition and mapping rules. Hence, only particular entities can co-occur, while others must be excluded from the specific rule combination. Assume a topic t = Geometry. According to the MER, this topic can co-occur with one of the pertaining sub-topics defined in the OMB+ (e.g., s = Angle). Thus, the level of the examination would be l = Section (but l ≠ Chapter), because the student already reported a sub-section he or she is working on. In this specific case, the examination mode could be e ∈ [Training, Exercise, Quiz], but not e = Final_Examination. This is because e = Final_Examination can co-occur only with a topic (chapter level), but not with a sub-topic (section level). The question number q can be arbitrary and has to co-occur with every other combination.


This allows a formal solution to be found. Assume a list of all theoretically possible dialogue states S = [topic, sub-topic, training, exercise, chapter level, section level, quiz, final examination, question number], and for each element s_n in S it holds that:

    s_n = 1   if the information is given
          0   otherwise                                              (1)

This gives all general (resp. possible) dialogue states without reference to the design of the OMB+ platform. However, to make the dialogue states fully suitable for the OMB+, we select from the general states only those which are valid. To define the validity of a state, we specify the following five MERs:

R1: Topic

    ¬T
     T

Table 2: Rule 1 – Admissible topic configurations.

Rule (R1) in Table 2 denotes the admissible configurations for topic and means that the topic can either be given (T) or not (¬T).

R2: Examination Mode

    ¬TR  ¬E  ¬Q  ¬FE
     TR  ¬E  ¬Q  ¬FE
    ¬TR   E  ¬Q  ¬FE
    ¬TR  ¬E   Q  ¬FE
    ¬TR  ¬E  ¬Q   FE

Table 3: Rule 2 – Admissible examination mode configurations.

Rule (R2) in Table 3 denotes that either no information on the examination mode is given, or the examination mode is Training (TR), Exercise (E), Quiz (Q), or Final Examination (FE), but not more than one mode at the same time.

R3: Level

    ¬CL  ¬SL
     CL  ¬SL
    ¬CL   SL

Table 4: Rule 3 – Admissible level configurations.

Rule (R3) in Table 4 indicates that either no level information is provided, or the level corresponds to the chapter level (CL) or to the section level (SL), but not to both at the same time.


R4: Examination & Level

    ¬TR  ¬E  ¬SL  ¬CL
     TR  ¬E   SL  ¬CL
     TR  ¬E  ¬SL   CL
     TR  ¬E  ¬SL  ¬CL
    ¬TR   E   SL  ¬CL
    ¬TR   E  ¬SL   CL
    ¬TR   E  ¬SL  ¬CL

Table 5: Rule 4 – Admissible examination mode (only for Training and Exercise) and corresponding level configurations.

Rule (R4) in Table 5 means that the Training (TR) and Exercise (E) examination modes can belong either to the chapter level (CL) or to the section level (SL), but not to both at the same time.

R5: Topic & Sub-Topic

    ¬T  ¬ST
     T  ¬ST
    ¬T   ST
     T   ST

Table 6: Rule 5 – Admissible topic and corresponding sub-topic configurations.

Rule (R5) in Table 6 symbolizes that we can be given either only a topic (T), or the combination of a topic and a sub-topic (ST) at the same time, or only a sub-topic, or no information on this point at all.

We then define a valid dialogue state as a dialogue state that meets all the requirements of the abovementioned rules:

    ∀s (State(s) ∧ R1-5(s)) → Valid(s)                               (2)

Once we have the valid dialogue states, we perform a mapping from each valid dialogue state to the next possible system action. For that, we first define five transition rules. The order of the transition rules is important.

    T1 = ¬T                                                          (3)

means that no topic (T) is found in the ID (i.e., it could not be extracted from the user input);

    T2 = ¬EM                                                         (4)

indicates that no examination mode (EM) is found in the ID;


    T3 = EM, where EM ∈ [TR, E]                                      (5)

denotes that the extracted examination mode (EM) is either Training (TR) or Exercise (E);

    T4 = { (T, ¬ST, TR, SL),
           (T, ¬ST, E, SL),
           (T, ¬ST, Q) }                                             (6)

means that no sub-topic (ST) is provided by the user, but the ID already contains either the combination of topic (T), training (TR), and section level (SL), or the combination of topic, exercise (E), and section level, or the combination of topic and quiz (Q);

    T5 = ¬QNR                                                        (7)

indicates that no question number (QNR) was provided by the student (or it could not be successfully extracted).

Finally, we assembled the list of possible next actions for the system:

    A = { Ask for Topic,
          Ask for Examination Mode,
          Ask for Level,
          Ask for Sub-Topic,
          Ask for Question Nr }                                      (8)

Following the transition rules, we map each valid dialogue state to a possible next action a_m in A:

    ∃a ∀s (Valid(s) ∧ T1(s)) → a(s, T)                               (9)

In the case where no topic is provided, the next action is to ask for the topic (T).

    ∃a ∀s (Valid(s) ∧ T2(s)) → a(s, EM)                              (10)

If no examination mode is provided by the user (or it could not be successfully extracted from the user query), the next action is defined as ask for examination mode (EM).

    ∃a ∀s (Valid(s) ∧ T3(s)) → a(s, L)                               (11)

In the case where we know that the examination mode EM ∈ [Training, Exercise], we have to ask about the level (i.e., training at chapter level or training at section level); thus, the next action is ask for level (L).


    ∃a ∀s (Valid(s) ∧ T4(s)) → a(s, ST)                              (12)

If no sub-topic is provided, but the examination mode EM ∈ [Training, Exercise] is at section level, the next action is defined as ask for sub-topic (ST).

    ∃a ∀s (Valid(s) ∧ T5(s)) → a(s, QNR)                             (13)

If no question number is provided by the user, then the next action is ask for question number (QNR).

Following this scheme, we generate 56 state transitions, which specify the next system actions. Upon entering a new conversation state, we compare the extracted (or updated) data in the Informational Dictionary (ID) with the valid dialogue states and select the mapped action as the system's next action.
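The following minimal Python sketch illustrates how valid dialogue states could be enumerated and mapped to next actions; the rule encoding is deliberately simplified and does not reproduce the exact rule set or the 56 transitions of the deployed system.

    from itertools import product

    SLOTS = ["topic", "sub_topic", "training", "exercise",
             "chapter_level", "section_level", "quiz", "final_exam", "question_nr"]

    def is_valid(state: dict) -> bool:
        """Simplified check of the mutually exclusive rules (cf. R2-R4)."""
        modes = [state["training"], state["exercise"], state["quiz"], state["final_exam"]]
        if sum(modes) > 1:                                    # at most one examination mode
            return False
        if state["chapter_level"] and state["section_level"]:  # at most one level
            return False
        if state["quiz"] and state["chapter_level"]:          # quiz only at section level
            return False
        if state["final_exam"] and state["section_level"]:    # final exam only at chapter level
            return False
        return True

    def next_action(state: dict) -> str:
        """Transition rules in the spirit of T1-T5, evaluated in order."""
        if not state["topic"]:
            return "ask_for_topic"
        if not any([state["training"], state["exercise"], state["quiz"], state["final_exam"]]):
            return "ask_for_examination_mode"
        if (state["training"] or state["exercise"]) and not (
                state["chapter_level"] or state["section_level"]):
            return "ask_for_level"
        if not state["sub_topic"] and (state["quiz"] or state["section_level"]):
            return "ask_for_sub_topic"
        if not state["question_nr"]:
            return "ask_for_question_nr"
        return "hand_over_to_tutor"

    # Enumerate all binary state combinations, keep the valid ones,
    # and map each to its next action.
    valid_states = [dict(zip(SLOTS, bits)) for bits in product([0, 1], repeat=len(SLOTS))
                    if is_valid(dict(zip(SLOTS, bits)))]
    transitions = {tuple(s.values()): next_action(s) for s in valid_states}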

flexibility of the dst: The DST transitions outlined above are meant to match the current configuration of the OMB+ learning platform. Still, further MERs can be effortlessly generated to create new transitions. To examine this, we performed tests with the previous design of the platform, where all topics except for the first one had sub-topics, whereas the first topic contained both sub-topics and sub-sub-topics. We could produce the missing transitions with the described approach. The number of permissible transitions in this case increased from 56 to 117.

3.2.5 Meta Policy

Since the proposed system is expected to operate in a real-world setting, we had to develop additional policies that manage the dialogue flow and verify the system's accuracy. We explain these policies below.

completeness of the informational dictionary: The model automatically verifies the completeness of the Informational Dictionary (ID), which is determined by the number of essential slots filled in the informational dictionary. There are six discrete cases in which the ID is deemed to be complete. For example, if a user works in the final examination mode, the agent does not have to request a sub-topic or examination level. Hence, the ID has to be filled only with data for the topic, examination type, and question number. Whereas, if the user works in training mode, the system has to collect information about the topic, sub-topic, examination level, examination mode, and the question number. Once the ID is complete, it is passed on to the verification step. Otherwise, the agent proceeds according to the next action. The system extracts information in each dialogue state, and therefore, if the user gives updated information on any subject later in the dialogue history, the corresponding slot will be updated in the ID.

Below are examples of two of the final cases (out of six) in which the Informational Dictionary (ID) is considered to be complete:

    Case1 = { intent = Math,
              topic ≠ None,
              exam mode = Final Examination,
              level = None,
              question nr ≠ None }                                   (14)

Table 7: Case 1 – Any topic; examination mode is final examination; examination level does not matter; any question number.

    Case2 = { intent = Math,
              topic ≠ None,
              sub-topic ≠ None,
              exam mode ∈ [Training, Exercise],
              level = Section,
              question nr ≠ None }                                   (15)

Table 8: Case 2 – Any topic; any related sub-topic; examination mode is either training or exercise; examination level is section; any question number.
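A minimal sketch of the completeness check, covering only the two cases shown in Tables 7 and 8, could look as follows; the slot names are paraphrased for illustration and the remaining four cases are omitted.

    def id_is_complete(info: dict) -> bool:
        """Check the two completeness cases shown in Tables 7 and 8."""
        if info.get("intent") != "Math":
            return False
        # Case 1: final examination -> topic and question number suffice.
        if (info.get("exam_mode") == "Final Examination"
                and info.get("topic") and info.get("question_nr")):
            return True
        # Case 2: training/exercise at section level -> sub-topic also required.
        if (info.get("exam_mode") in ("Training", "Exercise")
                and info.get("level") == "Section"
                and info.get("topic") and info.get("sub_topic")
                and info.get("question_nr")):
            return True
        return False

    example = {"intent": "Math", "topic": "Elementares Rechnen",
               "exam_mode": "Final Examination", "question_nr": "2b"}
    print(id_is_complete(example))  # True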

verification step: After the system has collected all the essential data (i.e., the ID is complete), it continues to the final verification step. Here, the obtained data is presented to the current student in the dialogue session. The student is asked to review the correctness of the data and, if any entries are faulty, to correct them. The Informational Dictionary (ID) is, where necessary, updated with the user-provided data. This procedure repeats until the user confirms the correctness of the compiled data.

fallback policy: In cases where the system fails to retrieve information from a student's query, it re-asks the user and attempts to retrieve the information from a follow-up response. The maximum number of re-ask trials is a parameter and is set to three (r = 3). If the system is still incapable of extracting the information, the user input is considered as the ground truth and filled into the appropriate slot in the ID. An exception to this rule applies where the user has to specify the intent manually: after three unclassified trials, the dialogue session is directly handed over to a human tutor.


human request: In every dialogue state, a user can switch to a human tutor. For this, the user can enter the "human" (or "Mensch" in German) keyword. Therefore, every user message is examined for the presence of this keyword.

3.2.6 Response Generation

The semantic representation of the system's next action is transformed into a natural language utterance. Hence, each possible action is mapped to precisely one response that is predefined in a template. Some of the responses are fixed (e.g., "Welches Kapitel bearbeitest du gerade?")2. Others have placeholders for custom values. In the latter case, the utterance can be formulated depending on the Informational Dictionary (ID).

Assume the predefined response "Deine Angaben waren wie folgt: a) Kapitel K, b) Abschnitt A, c) Aufgabentyp M, d) Aufgabe Q, e) Ebene E. Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein."3 The slots [K, A, M, Q, E] are the placeholders for the information stored in the ID. During the conversation, the system selects the template according to the next action and fills the slots with extracted values if required. After the extracted information is filled into the corresponding placeholders of the utterance, the final response could be as follows: "Deine Angaben waren wie folgt: a) Kapitel: I Elementares Rechnen, b) Abschnitt: Rechenregel für Potenzen, c) Aufgabentyp: Quiz, d) Aufgabe: 1(a), e) Ebene: Abschnitt. Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein."4
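A minimal sketch of this template-based response generation is shown below; the action names and placeholder keys are illustrative, while the template text mirrors the example above.

    # Template texts follow the examples in the text; action names are illustrative.
    TEMPLATES = {
        "ask_for_topic": "Welches Kapitel bearbeitest du gerade?",
        "final_statement": (
            "Deine Angaben waren wie folgt: a) Kapitel: {topic}, "
            "b) Abschnitt: {sub_topic}, c) Aufgabentyp: {exam_mode}, "
            "d) Aufgabe: {question_nr}, e) Ebene: {level}. "
            "Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein."
        ),
    }

    def generate_response(action: str, info: dict) -> str:
        """Select the template for the next action and fill in the ID values."""
        template = TEMPLATES[action]
        return template.format(**info) if "{" in template else template

    info = {"topic": "I Elementares Rechnen", "sub_topic": "Rechenregel für Potenzen",
            "exam_mode": "Quiz", "question_nr": "1(a)", "level": "Abschnitt"}
    print(generate_response("final_statement", info))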

Below, we demonstrate five sample dialogues (Tables 9–13) in the original German language, illustrating the following variable flows:

• a short dialogue where certain information is already provided in the first student input;

• an organizational question case;

• a contextual question case;

• an example with manual intent classification;

• an example with the fallback and human request policies;

• a long flow with entry correction at the final verification phase.

2 Translates to: "Which chapter are you currently working on?"
3 Translates to: "Your details were as follows: a) Chapter, b) Section, c) Examination Mode, d) Question Number, e) Level. Did I understand you correctly? Please answer yes or no."
4 Translates to: "Your details were as follows: a) Elementary Calculus, b) Power Laws, c) Quiz, d) Question 1 (a), e) Section Level. Did I understand you correctly? Please answer yes or no."


Student: Hallo, ich habe Schwierigkeiten beim Kapitel I Elementares Rechnen mit der Übungsaufgabe 1 a.

Next-Action: [Ask for a Level: Chapter or Section]
Bot: Geht es um eine Aufgabe auf Kapitel-Ebene (z.B. um eine Trainingsaufgabe im Kapitel "Geometrie") oder um eine Aufgabe auf der Ebene eines Abschnitts in einem Kapitel (z.B. um eine Aufgabe zum Abschnitt "Winkel" im Kapitel "Geometrie")? Antworte bitte mit KAP für Kapitel und SEK für Abschnitt.

Student: Kapitel

Next-Action: [Final Statement]
Bot: Deine Angaben waren wie folgt:
a) Kapitel: I Elementares Rechnen
b) Aufgabentyp: Exercise
c) Aufgabe: 1a
d) Ebene: Kapitel
Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein.

Student: yep

Next-Action: [Human Hand-Over]
Bot: Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir.

Table 9: Showcase 1 – Short flow; most of the information was extracted from the initial user input. In bold are the extracted entities.

Student: Hallo, ich habe eine Frage bzgl. eines Zertifikates, und zwar wie kann ich es austellen lassen?

Next-Action: [Organisational Question]
Bot: Es scheint sich wohl um eine organisatorische Frage zu handeln. Bitte fasse deine Frage in einem kurzen Text zusammen, damit ich sie an meinen menschlichen Kollegen weiterleiten kann.

Student: Wie bekomme ich ein Zertifikat?

Next-Action: [Human Hand-Over]
Bot: Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir.

Table 10: Showcase 2 – Organisational question. Underlined are the keywords which point to the organisational intent.


Student: hallo

Next-Action: [UNK]
Bot: Hast du eine Frage zu einer Aufgabe (MATH), zu einem Text im Kurs (TEXT) oder eine organisatorische Frage (ORG)?

Student: ma

Next-Action: [UNK]
Bot: Hast du eine Frage zu einer Aufgabe (MATH), zu einem Text im Kurs (TEXT) oder eine organisatorische Frage (ORG)?

Student: math

Next-Action: [Ask for Topic]
Bot: Welches Kapitel bearbeitest du gerade?

Student: Lineare Gleichungssysteme

Next-Action: [Ask for Examination Mode]
Bot: Bearbeitest du eine Übungsaufgabe, eine Trainingsaufgabe, einen Quiz oder eine Schlussprüfung?

Student: human

Next-Action: [Human Hand-Over]
Bot: Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir.

Table 11: Showcase 3 – Manual intent classification, fallback, and human request policies (in bold type).

Student: Halli-Hallo, könnte mir bitte jemand die Regel 72 (vertikale streckung und stauchung des graphen) erklären? Danke im Voraus

Next-Action: [Contextual Question]
Bot: Es scheint sich wohl um eine inhaltliche Frage zu handeln. Bitte fasse deine Frage in einem kurzen Text zusammen, damit ich sie an meinen menschlichen Kollegen weiterleiten kann.

Student: Ich habe Schwierigkeiten beim Verstehen der Regel 72. Was ist genau die vertikale Streckung des Graphen?

Next-Action: [Human Hand-Over]
Bot: Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir.

Table 12: Showcase 4. Contextual Question. Underlined are the keywords which point to the contextual intent.


Student: Hallo, ich kann beim Kapitel 1 nicht weiterkommen. Trainingsaufgabe 1 (a)

Next-Action: [Ask for Level]
Bot: Geht es um eine Aufgabe auf Kapitel-Ebene (z.B. um eine Trainingsaufgabe im Kapitel Geometrie) oder um eine Aufgabe auf der Ebene eines Abschnitts in einem Kapitel (z.B. um eine Aufgabe zum Abschnitt Winkel im Kapitel Geometrie)? Antworte bitte mit KAP für Kapitel und SEK für Abschnitt.

Student: ich glaube es ist eine Sektion

Next-Action: [Ask for Sub-Topic]
Bot: Wie heißt der Abschnitt, in dem sich die Aufgabe befindet, auf die sich deine Frage bezieht?

Student: ich bearbeite rechenregeln und potenzen

Next-Action: [Final Statement]
Bot: Deine Angaben waren wie folgt:
a) Kapitel: I Elementares Rechnen
b) Abschnitt: Rechenregel für Potenzen
c) Aufgabentyp: Quiz
d) Aufgabe: 1 (a)
e) Ebene: Abschnitt
Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein.

Student: nope

Next-Action: [Ask_For]
Bot: Welche der folgenden Punkte habe ich falsch erkannt? Schreibe bitte, welche der folgenden Punkte es sind: a, b, c, d, e.

Student: c

State-Transition: [Correct-Fields]
Bot: Gib bitte die korrekte Information für c) ein.

Student: Training

State-Transition: [Final Statement]
Bot: Deine Angaben waren wie folgt:
a) Kapitel: I Elementares Rechnen
b) Abschnitt: Rechenregel für Potenzen
c) Aufgabentyp: Training
d) Aufgabe: 1 (a)
e) Ebene: Abschnitt
Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein.

Student: yes

State-Transition: [Human Hand-Over]
Bot: Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir.

Table 13: Showcase 5. Long flow; correction of entries. In bold type are the extracted entities.


Figure 4: Dialogue Flow. Abbreviations: UNK (unknown), ID (informational dictionary), RegEx (regular expressions).


3.3 evaluation

In order to get feedback on the quality, functionality, and usefulness of the proposed model, we evaluated it in two steps: first, with an automated method (Section 3.3.1), utilizing 130 manually annotated conversations to verify the robustness of the system; second, with the help of human tutors (Section 3.3.2) from OMB+ to examine the user experience. We describe the details as well as the most common errors below.

3.3.1 Automated Evaluation

To conduct the automated evaluation, we manually assembled a dataset with 130 conversational flows. We predefined viable initial user questions, user answers, as well as gold system responses. These flows cover the most frequent dialogues that we previously observed in human-to-human dialogue data. We examined our system by implementing a self-chatting evaluation bot. The evaluation cycle is as follows:

• The system accepts the first question from a predefined dialogue via an API request, preprocesses and analyzes it to determine the intent and extract entities.

• Then it estimates the appropriate next action and responds accordingly.

• This response is then compared to the system's gold answer. If the predicted result is correct, the system receives the next predefined user input from the templated dialogue and responds again as defined above. This procedure proceeds until the dialogue terminates (i.e., the ID is complete). Otherwise, the system reports an unsuccessful case number.

The system successfully passed this evaluation for all 130 cases.
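A compressed sketch of such a self-chatting evaluation loop is given below; the dialogue format and the respond() callable are assumptions made for illustration, since the actual evaluation code is not part of the thesis text.

# Sketch of the self-chatting evaluation over predefined gold dialogues (assumed data format).
def evaluate(dialogues, respond):
    """dialogues: list of dicts {"id": int, "turns": [(user_input, gold_bot_response), ...]}
    respond: callable that sends a user input to the system API and returns its reply."""
    failed = []
    for dialogue in dialogues:
        for user_input, gold_response in dialogue["turns"]:
            prediction = respond(user_input)   # intent/entity analysis and next-action estimation happen inside
            if prediction != gold_response:    # compare against the gold system answer
                failed.append(dialogue["id"])  # report the unsuccessful case number
                break                          # stop this flow and move on to the next dialogue
    print(f"passed {len(dialogues) - len(failed)} / {len(dialogues)} dialogues, failed: {failed}")
    return failed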

3.3.2 Human Evaluation & Error Analysis

To evaluate the system on irregular cases, we carried out experiments with human tutors from the OMB+ platform. The tutors are experienced regarding rare or complex issues, ambiguous responses, misspellings, and other infrequent but still relevant problems which occur during a natural language dialogue. In the following, we review some common errors and describe additional observations.

misspellings and confusable spelling frequently occur in user-generated text. Since we attempt to keep the conversation natural to provide a better user experience, we do not instruct students to write formally. Therefore, we have to account for various writing issues. One of the prevalent problems is misspellings: German words are commonly long and can be complex, and because users often type quickly, this can lead to a wrong order of characters within a given word. To tackle this challenge, we utilized fuzzy match within ES. However, the maximum allowed edit distance in ElasticSearch (ES) is set to 2 characters, which means that all misspellings beyond this threshold cannot be correctly identified by ES (e.g., Differenzialrechnung vs. Differnetialrechnung). Another representative example is the formulation of the section or question number: the equivalent information can be communicated in several distinct ways, which has to be handled by the RegEx unit (e.g., Aufgabe 5 a, Aufgabe V a, Aufgabe 5 (a)). A similar problem occurs with confusable spelling (i.e., Differenzialrechnung vs. Differenzialgleichung).

We analyzed the cases stated above and improved our model by adding some of the most common misspellings to the ElasticSearch (ES) database or handling them with RegEx during the preprocessing step.

elasticsearch threshold We observed both cases where the system failed to extract information although the user provided it, and examples where ES extracted information not mentioned in a user query. According to our understanding, these errors occur due to the relevancy scoring algorithm of ElasticSearch (ES), where a document's score is a combination of textual similarity and other metadata-based scores. Our examination revealed that ES mostly fails to extract the information if the user message is rather short (e.g., about 5 tokens). To overcome this difficulty, we coupled the current input u_t with the dialogue history (h = u_{t..t-2}). That eliminated the problem and improved the retrieval quality.

Solving the opposite problem, where ES extracts false information (or information that was not mentioned in a query), was more challenging. We learned that the problem comes from short words or sub-words (i.e., suffixes, prefixes) which ES locates in the database and considers credible enough. The ES documentation suggests getting rid of stop words to eliminate this behavior; however, this did not improve the search. Also, fine-tuning of ES parameters such as the relevance threshold, prefix length⁵, and the minimum should match⁶ parameter did not result in notable improvements. To cope with this problem, we implemented a verification step where a user can edit the erroneously retrieved data.

5 The amount of fuzzified characters. This parameter helps to decrease the number of terms which must be reviewed.

6 Indicates the number of terms that must match for a document to be considered relevant.
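For illustration, a fuzzy match query with the parameters discussed above (edit distance, prefix length, minimum should match) could be issued against ES as in the following sketch; the index name omb_entities and the field chapter_name are invented for the example.

# Sketch: fuzzy entity lookup in Elasticsearch (index and field names are hypothetical).
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def lookup_chapter(user_text: str):
    """Return the best-matching chapter name for a (possibly misspelled) user message."""
    result = es.search(
        index="omb_entities",
        query={
            "match": {
                "chapter_name": {
                    "query": user_text,
                    "fuzziness": 2,                # maximum edit distance supported by ES
                    "prefix_length": 1,            # initial characters exempt from fuzzification
                    "minimum_should_match": "75%"  # fraction of query terms that must match
                }
            }
        },
    )
    hits = result["hits"]["hits"]
    return hits[0]["_source"]["chapter_name"] if hits else None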


3.4 structured dialogue acquisition

As we stated, the implemented system not only supports the human tutor by assisting students but also collects structured and labeled training data in the background. In a trial run of the rule-based system, we accumulated a toy dataset with dialogues. The assembled dataset includes the following (a sketch of a single stored record is given after the list):

• Plain dialogues with unique dialogue indexes.

• Plain ID information collected for the whole conversation.

• Pairs of questions (i.e., user requests) and responses (i.e., bot replies) with unique dialogue and turn indexes.

• Triples in the form of (Question, Next Action, Response). Information on the next system action could be employed to train a DM unit with (deep) machine learning algorithms.

• For every conversational state, we keep the entities that the system was able to extract, along with their positions in the utterance. This information could be used to train a custom domain-specific NER model.
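To make the structure of the accumulated data concrete, a single stored record might look like the sketch below; the field names are illustrative assumptions rather than the exact schema.

# Hypothetical example of one stored training record (field names are illustrative only).
record = {
    "dialogue_id": 42,
    "turn_id": 3,
    "question": "ich bearbeite rechenregeln und potenzen",   # user request
    "next_action": "Final Statement",                        # label for the DM unit
    "response": "Deine Angaben waren wie folgt: ...",        # bot reply
    "entities": [                                            # labels for a custom NER model
        {"label": "Subtopic",
         "surface": "rechenregeln und potenzen",             # span as typed by the user
         "normalized": "Rechenregel für Potenzen",           # universal standard form
         "start": 14, "end": 39}
    ],
    "informational_dictionary": {"Kapitel": "I Elementares Rechnen",
                                 "Abschnitt": "Rechenregel für Potenzen",
                                 "Aufgabentyp": "Quiz", "Aufgabe": "1 (a)",
                                 "Ebene": "Abschnitt"},
}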

3.5 summary

In this work, we realized a conversational agent for IPA purposes that addresses two problems. First, it decreases repetitive and time-consuming activities, which allows workers of the e-learning platform to focus solely on mathematical and hence cognitively demanding questions. Second, by interacting with users, it augments the resources with structured and labeled training data. The latter can be utilized for further implementation of learnable dialogue components.

The realization of such a system was connected with multiple challenges. Among others were missing structured conversational data, ambiguous or erroneous user-generated text, and the necessity to deal with existing corporate tools and their design.

The proposed conversational agent enabled us to accumulate structured, labeled data without any special effort from the human (i.e., tutors') side (e.g., manual annotation and post-processing of existing data, or changes to the conversational structure within the tutoring process). Once we had collected structured dialogues, we were able to re-train particular segments of the system with deep learning methods and achieved consistent performance for both of the proposed tasks. We report on the re-implemented elements in Chapter 4.

At the time of writing this thesis, the system described in this work was deployed at the OMB+ platform.


3.6 future work

Possible extensions of our work are:

• In this work, we implemented a rule-based dialogue model from scratch and adjusted it for the specific domain (i.e., e-learning). The core of the model is a dialogue manager that determines the current state of the conversation and the possible next action. Rule-based systems are generally considered to be hardly adaptable to new domains; however, our dialogue manager proved to be flexible to slight modifications in a workflow. One possible direction of future work would thus be the investigation of the general adaptability of the dialogue manager core to other scenarios and domains (e.g., a different course).

• A further possible extension would be the introduction of different languages. The current model is implemented for the German language, but the OMB+ platform provides this course also in English and Chinese. Therefore, it would be interesting to investigate whether it is possible to adjust the existing system to other languages without drastic changes in the model.


4 DEEP LEARNING RE-IMPLEMENTATION OF UNITS

This chapter partially covers work already published at international peer-reviewed conferences. The relevant publication is Moiseeva et al. 2020. The research described in this chapter was carried out in its entirety by the author of this thesis. The other authors of the publication acted as advisors.

In this chapter, we address the challenge of re-implementing two central components of our IPA dialogue system described in Chapter 3 by employing deep learning techniques. Since the data for training was collected in a (short) trial run of the rule-based system, it is not sufficient to train a neural network from scratch. Therefore, we utilize Transfer Learning (TL) methods and employ the previously pre-trained Bidirectional Encoder Representations from Transformers (BERT) model, which we fine-tune for two of our custom tasks.

4.1 introduction

Data mining and machine learning technologies have gained notable advancement in many research fields, including but not limited to computer vision, robotics, advanced analytics, and computational linguistics (Pan and Yang 2009; Trautmann et al. 2019; Werner et al. 2015). Besides that, in recent years, deep neural networks, with their ability to accurately map from inputs to outputs (i.e., labels, images, sentences) while analyzing patterns in data, became a potential solution for many industrial automatization tasks (e.g., fake news detection, sentiment analysis).

Nevertheless, conventional supervised models still have limitations when applied to real-world data and industrial tasks. The first challenge here is the training phase, since a robust model requires an immense amount of structured and labeled data. Still, there are just a few cases where such data is publicly available (e.g., research datasets), but for most tasks, domains, and languages, the data has to be gathered over a long time. The second hurdle is that, if trained on existing data from a distinct domain, a supervised model cannot generalize to conditions that are different from those faced in the training dataset. Most machine learning methods perform correctly only under a general assumption: the training and test data are mapped to the same feature space and the same distribution (Pan and Yang 2009). When the distribution changes, most statistical models need to be retrained from scratch using newly obtained training samples. This is especially crucial when applying such approaches to real-world data, which is usually very noisy and contains a large number of scenarios. In this case, a model would be ill-trained to make decisions about novel patterns. Furthermore, for most real-world applications, it is expensive or even not feasible at all to recollect the required training data and retrain the model. Hence, the possibility to transfer the knowledge obtained during training on existing data to a new, unseen domain is one possible solution for industrial problems. Such a technique is called Transfer Learning (TL), and it allows dealing with the abovementioned scenarios by leveraging existing labeled data of related tasks or domains.

4.1.1 Outline and Contributions

This work addresses the challenge of re-implementing two out of three central components of our rule-based IPA conversational assistant with deep learning methods. As we discussed in Chapter 2, hybrid dialogue models (i.e., consisting of both rule-based and trainable components) appear to be replacing the established rule- and template-based methods, which are for the most part employed in industrial settings. To investigate this assumption and examine current state-of-the-art deep learning techniques, we conducted experiments on our custom domain-specific data. Since the available data were collected in a trial run and the obtained dataset was too small to train a conventional machine learning model from scratch, we utilized the Transfer Learning (TL) approach and fine-tuned an existing pre-trained model for our target domain.

For the experiments, we defined two tasks:

• First, we considered the NER problem in a custom domain setting. We defined a sequence labeling task and employed Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al. 2018) as one of the transfer learning methods widely utilized in the NLP area. We applied the model to our dataset and fine-tuned it for seven (7) domain-specific (i.e., e-learning) entities.

• Second, we examined the effectiveness of transfer learning for the dialogue manager core. The DM is one of the fundamental parts of the conversational agent and requires compelling learning techniques to hold the conversation intelligently. For that experiment, we defined a classification task and applied the BERT model to predict the system's next action for every given user input in a conversation. We then computed the macro F1-score for 14 possible classes (i.e., actions) and an average dialogue accuracy.


The main characteristics of both settings are:

• the set of named entities and labels (i.e., actions) differs between the source and target domain;

• for both settings, we utilized the BERT model in German and multilingual versions with additional parameters;

• the dataset used to train the target model is small and consists of highly domain-specific labels; the latter often appears in industrial scenarios.

We finally verified that the selected method performed reasonably well on both tasks. We reached a performance of 0.93 macro F1 points for Named Entity Recognition (NER) and 0.75 macro F1 points for the Next Action Prediction (NAP) task. Therefore, we conclude that both NER and NAP components could be employed to substitute or extend the existing rule-based modules.

4.2 related work

As discussed in the previous section, the main benefit of Transfer Learning (TL) is that it allows the domains and tasks used during the training and testing phases to be different. Initial steps in the formulation of transfer learning were made after the NeurIPS-95¹ workshop, which focused on the demand for lifelong machine-learning methods that preserve and reuse previously learned knowledge (Chen et al. 2018). Research on transfer learning has attracted more and more attention since then, especially in the computer vision domain, and in particular for image recognition tasks (Donahue et al. 2014; Sharif Razavian et al. 2014).

Recent studies have shown that TL can also be successfully applied within the Natural Language Processing (NLP) domain, especially on semantically equivalent tasks (Dai et al. 2019; Devlin et al. 2018; Mou et al. 2016). Several research works were carried out specifically on the Named Entity Recognition (NER) task and explored the capabilities of transfer learning when applied to different named entity categories (i.e., different output spaces). For instance, Qu et al. (2016) pre-trained a linear-chain Conditional Random Field (CRF) on a large amount of labeled data in the source domain. Then they introduced a two-linear-layer neural network that learns the difference between the source and target label distributions. Finally, the authors initialized another CRF with the parameters learned in the linear layers to predict the labels of the target domain. Kim et al. (2015) conducted experiments with transferring features and model parameters between related domains where the label classes are different but may have a semantic similarity. Their primary approach was to construct label embeddings to automatically map the source and target label classes to improve the transfer (Chen et al. 2018).

1 Conference on Neural Information Processing Systems. Formerly NIPS.


However, there are also models trained in such a manner that they can be applied to multiple NLP tasks simultaneously. Among such architectures are ULMFit (Howard and Ruder 2018), BERT (Devlin et al. 2018), GPT-2 (Radford et al. 2019), and Transformer-XL (Dai et al. 2019), powerful models that were recently proposed for transfer learning within the NLP domain. These architectures are called language models since they attempt to learn language structure from large textual collections in an unsupervised manner and then predict the next word in a sequence based on previously observed context words. The Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al. 2018) model is one of the most popular architectures among the abovementioned, and it is also one of the first that showed close to human-level performance on the General Language Understanding Evaluation (GLUE) benchmark² (Wang et al. 2018), which includes tasks on sentiment analysis, question answering, semantic similarity, and language inference. The architecture of BERT consists of stacked transformer blocks that are pre-trained on a large corpus consisting of 800M words from modern English books (i.e., the BooksCorpus by Zhu et al.) and 2,500M words of text from pre-processed (i.e., without markup) English Wikipedia articles. Overall, BERT advanced the state-of-the-art performance for eleven (11) NLP tasks. We describe this approach in detail in the subsequent section.

4.3 bert: bidirectional encoder representations from transformers

The Bidirectional Encoder Representations from Transformers (BERT) architecture is divided into two steps: pre-training and fine-tuning. The pre-training step was enabled through the following two tasks:

• Masked Language Model: the idea behind this approach is to train a deep bidirectional representation that is different from conventional language models, which are either trained left-to-right or right-to-left. To train a bidirectional model, the authors mask out a certain number of tokens in a sequence. The model is then asked to predict the original words for the masked tokens. It is essential that, in contrast to denoising auto-encoders (Vincent et al. 2008), the model does not need to reconstruct the entire input; instead, it attempts to predict only the masked words. Since the model does not know which words exactly it will be asked about, it learns a representation for every token in a sequence (Devlin et al. 2018).

• Next Sentence Prediction: aims to capture relationships between two sentences A and B. In order to train the model, the authors pre-trained a binarized next sentence prediction task, where pairs of sequences were labeled according to the criteria: A and B either follow each other directly in the corpus (label IsNext) or are both taken from random places (label NotNext). The model is then asked to predict whether the former or the latter is the case. The authors demonstrated that pre-training towards this task is extremely useful for both question answering and natural language inference tasks (Devlin et al. 2018).

2 https://gluebenchmark.com

Furthermore, BERT uses WordPiece tokenization for embeddings (Wu et al. 2016), which segments words into subword-level tokens. This approach allows the model to make additional inferences based on word structure (e.g., an -ing ending denotes similar grammatical categories).

The fine-tuning step follows the pre-training process. For that, the BERT model is initialized with the parameters obtained during the pre-training step, and a task-specific fully-connected layer is added after the transformer blocks, which maps the general representations to custom labels (employing the softmax operation).

self-attention The key operation of the BERT architecture is self-attention, which is a sequence-to-sequence operation with a conventional encoder-decoder structure (Vaswani et al. 2017). The encoder maps an input sequence (x_1, x_2, ..., x_n) to a sequence of continuous representations z = (z_1, z_2, ..., z_n). Given z, the decoder then generates an output sequence (y_1, y_2, ..., y_m) of symbols, one element at a time (Vaswani et al. 2017). To produce an output vector y_i, the self-attention operation takes a weighted average over all input vectors:

y_i = \sum_j w_{ij} x_j    (16)

where j indexes over the entire sequence and the weights sum to one (1) over all j. The weight w_{ij} is derived from a dot-product function over x_i and x_j (Bloem 2019; Vaswani et al. 2017):

w'_{ij} = x_i^T x_j    (17)

The dot product returns a value between negative and positive infinity, and a softmax maps these values to [0, 1], ensuring that they sum to one (1) over the entire sequence (Bloem 2019; Vaswani et al. 2017):

w_{ij} = \frac{\exp(w'_{ij})}{\sum_j \exp(w'_{ij})}    (18)

Self-attention is not the only operation in transformers, but it is the essential one, since it propagates information between vectors. Other operations in the transformer are applied to every vector in the input sequence without interactions between vectors (Bloem 2019; Vaswani et al. 2017).
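Equations (16)-(18) can be summarized in a few lines of code; the sketch below implements this simplified, unparameterized form of self-attention (without the learned query, key, and value projections that the full transformer additionally applies).

import numpy as np

def basic_self_attention(x: np.ndarray) -> np.ndarray:
    """x: input sequence of shape (n, d); returns outputs y of the same shape.
    Implements Eq. (16)-(18): raw weights as dot products, then a row-wise softmax."""
    w_raw = x @ x.T                                    # Eq. (17): w'_ij = x_i^T x_j
    w = np.exp(w_raw - w_raw.max(axis=1, keepdims=True))
    w = w / w.sum(axis=1, keepdims=True)               # Eq. (18): softmax over j
    return w @ x                                       # Eq. (16): y_i = sum_j w_ij x_j

x = np.random.randn(5, 8)    # a toy sequence of 5 token vectors with dimension 8
y = basic_self_attention(x)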


4.4 data & descriptive statistics

The benchmark that we accumulated through the trial run includes 300 structured and partially labeled dialogues, with the average length of a dialogue being six (6) utterances. All conversations were held in the German language. Detailed general statistics can be found in Table 14.

Max Len Dialogue (in utterances): 15
Avg Len Dialogue (in utterances): 6
Max Len Utterance (in tokens): 100
Avg Len Utterance (in tokens): 9
Overall Unique Action Labels: 13
Overall Unique Entity Labels: 7
Train – Dialogues (# Utterances): 200 (1161)
Eval – Dialogues (# Utterances): 50 (279)
Test – Dialogues (# Utterances): 50 (300)

Table 14: General statistics for the conversational dataset.

The dataset covers 14 unique next actions and seven (7) unique named entities. Detailed information on next actions and entities can be found in Table 15 and Table 16. Below, we list the possible actions with the corresponding explanations and predefined mapped statements.

unk – requests to define the intent of the question: mathematical, organizational, or contextual (i.e., a fallback policy in case the automated intent recognition unit fails) → "Hast du eine Frage zu einer Aufgabe (MATH), zu einem Text im Kurs (TEXT) oder eine organisatorische Frage (ORG)?"

org – hands the discussion over to a human tutor in case of an organizational question → "Es scheint sich wohl um eine organisatorische Frage zu handeln. Bitte fasse deine Frage in einem kurzen Text zusammen, damit ich sie an meinen menschlichen Kollegen weiterleiten kann."

text – hands the discussion over to a human tutor in case of a contextual question → "Es scheint sich wohl um eine inhaltliche Frage zu handeln. Bitte fasse deine Frage in einem kurzen Text zusammen, damit ich sie an meinen menschlichen Kollegen weiterleiten kann."

topic – requests information on the topic → "Welches Kapitel bearbeitest du gerade? Antworte bitte mit der Kapitelnummer, z.B. IA, IB, II, oder mit dem Kapitelnamen, z.B. Geometrie."

examination – requests information about the examination mode → "Bearbeitest du eine Übungsaufgabe, eine Trainingsaufgabe, einen Quiz oder eine Schlussprüfung?"

subtopic – requests information about the subtopic in case of quiz or section-level training/exercise modes → "Wie heißt der Abschnitt, in dem sich die Aufgabe befindet, auf die sich deine Frage bezieht?"

level – requests information about the level of the examination mode: chapter or section → "Geht es um eine Aufgabe auf Kapitel-Ebene (z.B. um eine Trainingsaufgabe im Kapitel Geometrie) oder um eine Aufgabe auf der Ebene eines Abschnitts in einem Kapitel (z.B. um eine Aufgabe zum Abschnitt Winkel im Kapitel Geometrie)? Antworte bitte mit KAP für Kapitel und SEK für Abschnitt."

question number – requests information about the question number → "Welche Aufgabe bearbeitest du gerade? Wenn du z.B. bei Teilaufgabe c in der zweiten Trainingsaufgabe (oder der zweiten Übungsaufgabe) bist, antworte mit 2c. Bearbeitest du gerade einen Quiz oder eine Schlussprüfung, gib bitte nur die Teilaufgabe an."

final request – considers the Informational Dictionary (ID) to be complete and initiates the verification step → "Deine Angaben waren wie folgt: ... Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein."

verify request – requests the student to verify the assembled information → "Welche der folgenden Punkte habe ich falsch erkannt? Schreibe bitte, welche der folgenden Punkte es sind: [a, b, c, ...]³"

correct request – requests the student to provide the correct information if there is an error in the assembled data → "Gib bitte die korrekte Information für [a]⁴ ein."

exact question – requests the student to provide a detailed explanation of the problem → "Bitte fasse deine Frage in einem kurzen Text zusammen, damit ich sie an meinen menschlichen Kollegen weiterleiten kann."

human handover – finalizes the conversation → "Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir."

3 Variables that depend on the state of the Informational Dictionary (ID).
4 Variable that was selected as erroneous and needs to be corrected.


Action            Counts
Final Request     321
Human Handover    300
Exact Question    286
Question Number   176
Examination       175
Topic             137
Level             130
Unk               80
Subtopic          55
Correct Request   40
Verify Request    34
Org               17
Text              13

Table 15: Detailed statistics on possible system actions. The column Counts denotes the number of occurrences of each action in the entire dataset.

Entity         Counts
Question Nr    317
Chapter        311
Examination    303
Subtopic       198
Level          80
Intent         70

Table 16: Detailed statistics on possible named entities. The column Counts denotes the number of occurrences of each entity in the entire dataset.

4.5 named entity recognition

We defined a sequence labeling task to extract custom entities from user input. We considered seven (7) viable entities (see Table 16) to be recognized by the model: topic, subtopic, examination mode and level, question number, intent, as well as the entity other for the remaining words in the utterance. Since the data collected from the rule-based system already includes information on the entities, we are able to train a domain-specific NER unit. However, since the original user input was informal, the same information could be provided in different writing styles; that means a single entity could have different surface forms (e.g., synonyms, writing styles). Therefore, entities that we extracted from the rule-based system were transformed into a universal standard (e.g., official chapter names). To consider all of the variable entity forms while post-labeling the original dataset, we determined generic entity names (e.g., chapter, question nr) and mapped variations of entities from the user input (e.g., Chapter = [Elementary Calculus, Chapter I, ...]) to them. The overall dataset consists of 300 labeled dialogues, where 200 (with 1161 utterances) were employed for training and 100 for the evaluation and test sets (50 dialogues with ca. 300 utterances for each set, respectively).


4.5.1 Model Settings

We conducted experiments with German and multilingual BERT implementations⁵. The capitalization of words is significant for the German language; therefore, we ran the experiments on the capitalized data while preserving the original punctuation. We employed the available base model for both the multilingual and German BERT implementations in the cased version. We set the learning rate for both models to 1e-4, and the maximum length of the tokenized input was set to 128 tokens. We conducted the experiments multiple times with different seeds for a maximum of 50 epochs, with the training batch size set to 32. We utilized AdamW as the optimizer and applied early stopping if the performance did not improve significantly after 5 epochs.
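A condensed sketch of such a token classification fine-tuning setup with the HuggingFace transformers library is shown below; the label list and the data handling are placeholders, and the alignment of word labels to word-piece tokens is omitted.

# Sketch: fine-tuning German BERT for token classification (NER); data loading omitted.
import torch
from torch.optim import AdamW
from transformers import BertTokenizerFast, BertForTokenClassification

labels = ["Other", "Chapter", "Subtopic", "Examination", "Level", "Question_Nr", "Intent"]
tokenizer = BertTokenizerFast.from_pretrained("bert-base-german-cased")
model = BertForTokenClassification.from_pretrained("bert-base-german-cased",
                                                   num_labels=len(labels))
optimizer = AdamW(model.parameters(), lr=1e-4)

def training_step(texts, word_label_ids):
    # word_label_ids must already be aligned to the word-piece tokens (alignment code omitted)
    enc = tokenizer(texts, padding=True, truncation=True, max_length=128, return_tensors="pt")
    out = model(**enc, labels=word_label_ids)   # cross-entropy over the 7 entity labels
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return out.loss.item()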

4.6 next action prediction

We defined a classification problem where we aim to predict the system's next action given the user input. We assumed 14 custom actions (see Table 15) that we considered to be our classes. While collecting the toy dataset, every input was automatically labeled by the rule-based system with the corresponding next action and the dialogue id. Thus, no additional post-labeling was required in this step.

We investigated two settings for this task:

• Default Setting: utilizing a user input and the corresponding class (i.e., next action) without any supplementary context. By default, we conduct all of our experiments in this setting.

• Extended Setting: employing a user input, the corresponding next action, and the previous system action as a source of supplementary context. For this setting, we experimented with the best-performing model from the default setting.

The overall dataset consists of 300 labeled dialogues, where 200 (with 1161 utterances) were employed for training and 100 for the evaluation and test sets (50 dialogues with ca. 300 utterances for each set, respectively).

4.6.1 Model Settings

For the NAP task, we carried out experiments with German and multilingual BERT implementations as well. We examined the performance of both capitalized and lower-cased inputs, as well as plain and preprocessed data. For the multilingual BERT, we used the base model in both cased and uncased modifications. For the German BERT, we utilized the base model in the cased modification only⁶. For both models, we set the learning rate to 4e-5, and the maximum length of the tokenized input was set to 128 tokens. We conducted the experiments multiple times with different seeds for a maximum of 300 epochs, with the training batch size set to 32. We utilized AdamW as the optimizer and applied early stopping if the performance did not improve significantly after 15 epochs.

5 https://github.com/huggingface/transformers
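The NAP setup can be sketched analogously as sequence classification; in the extended setting, the previous system action is passed as a second text segment, which is one plausible reading of the description above rather than the exact published implementation. The action list below is assumed for illustration.

# Sketch: next action prediction as sequence classification (extended setting).
import torch
from torch.optim import AdamW
from transformers import BertTokenizerFast, BertForSequenceClassification

actions = ["Unk", "Org", "Text", "Topic", "Subtopic", "Examination", "Level",
           "Question Number", "Final Request", "Verify Request", "Correct Request",
           "Exact Question", "Human Handover", "Other"]   # 14 classes assumed for the example
tokenizer = BertTokenizerFast.from_pretrained("bert-base-multilingual-cased")
model = BertForSequenceClassification.from_pretrained("bert-base-multilingual-cased",
                                                      num_labels=len(actions))
optimizer = AdamW(model.parameters(), lr=4e-5)

def encode(user_input: str, previous_action: str = None):
    # extended setting: pass the previous system action as the second segment (text pair)
    if previous_action is not None:
        return tokenizer(previous_action, user_input, truncation=True,
                         max_length=128, return_tensors="pt")
    return tokenizer(user_input, truncation=True, max_length=128, return_tensors="pt")

model.eval()
with torch.no_grad():
    enc = encode("ich glaube es ist eine Sektion", previous_action="Ask for Level")
    logits = model(**enc).logits
predicted_action = actions[int(logits.argmax(dim=-1))]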

4.7 evaluation and results

To evaluate the models, we computed the word-level macro F1 score for the Named Entity Recognition (NER) task and the utterance-level macro F1 score for the Next Action Prediction (NAP) task. The word-level F1 is estimated as the average of the F1 scores per class, each computed from all words in the evaluation and test sets. The outcomes for the NER task are presented in Table 17. For the utterance-level F1, a single class label (i.e., next action) is taken for the whole utterance. The results for the NAP task are shown in Table 18.

We additionally computed the average dialogue accuracy for the best-performing NAP models. This score indicates how well the predicted next actions compose the conversational flow. The average dialogue accuracy was estimated for 50 conversations in the evaluation and test sets, respectively. The results are displayed in Table 19.
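Both scores can be computed with standard tooling, as in the generic sketch below; the dialogue accuracy function reflects our reading of the measure (fraction of correctly predicted actions per conversation, averaged over conversations) and is not taken verbatim from the thesis.

# Sketch: macro F1 and average dialogue accuracy (the latter uses an assumed definition).
from sklearn.metrics import f1_score

def macro_f1(gold_labels, predicted_labels):
    # works both for word-level (NER) and utterance-level (NAP) label sequences
    return f1_score(gold_labels, predicted_labels, average="macro")

def average_dialogue_accuracy(dialogues):
    """dialogues: list of (gold_actions, predicted_actions) pairs, one pair per conversation."""
    per_dialogue = []
    for gold, pred in dialogues:
        correct = sum(g == p for g, p in zip(gold, pred))
        per_dialogue.append(correct / len(gold))
    return sum(per_dialogue) / len(per_dialogue)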

The obtained results for the NER task revealed that German BERT performed significantly better than the multilingual BERT model. The performance of the custom NER unit is at 0.93 macro F1 points for all possible named entities (see Table 17). In contrast, for the NAP task, the multilingual BERT model obtained better performance than the German BERT model. The best-performing method in the default setting gained a macro F1 of 0.677 points for 14 possible classes, whereas the model in the extended setting performed better: its best macro F1 score is 0.752 for the same number of classes (see Table 18).

Regarding the dialogue accuracy, the extended system fine-tuned with multilingual BERT obtained better outcomes than the default one (see Table 19). The overall conclusion for the NAP is that the capitalized setting increased the performance of the model, whereas the inclusion of punctuation did not influence the results.

6 An uncased pre-trained modification of the model was not available.


Task   Model   Cased   Punct   F1 (Eval)   F1 (Test)
NER    GER     yes     yes     0.971       0.930
NER    Mult    yes     yes     0.926       0.905

Table 17: Word-level F1 for the Named Entity Recognition (NER) task. In bold: best performance for evaluation and test sets.

Task   Model   Cased   Punct   Extended   F1 (Eval)   F1 (Test)
NAP    GER     yes     no      no         0.711       0.673
NAP    GER     yes     yes     no         0.701       0.606
NAP    Mult    yes     yes     no         0.688       0.625
NAP    Mult    yes     no      no         0.769       0.677
NAP    Mult    yes     no      yes        0.810       0.752
NAP    Mult    no      yes     no         0.664       0.596
NAP    Mult    no      no      no         0.742       0.502

Table 18: Utterance-level F1 for the Next Action Prediction (NAP) task. Underlined: best performance for evaluation and test sets in the default setting (without previous action context). In bold: best performance for evaluation and test sets in the extended setting (with previous action context).

Model          Accuracy (Eval)   Accuracy (Test)
NAP default    0.765             0.724
NAP extended   0.813             0.801

Table 19: Average dialogue accuracy computed for the Next Action Prediction (NAP) task for the best performing models. In bold: best performance for evaluation and test sets.


[Figure 5: Macro F1 evaluation, NAP – default model. Comparison between German and multilingual BERT.]

[Figure 6: Macro F1 test, NAP – default model. Comparison between German and multilingual BERT.]

[Figure 7: Macro F1 evaluation, NAP – extended model.]

[Figure 8: Macro F1 test, NAP – extended model.]

[Figure 9: Macro F1 evaluation, NER – comparison between German and multilingual BERT.]

[Figure 10: Macro F1 test, NER – comparison between German and multilingual BERT.]

4.8 error analysis

We then investigated the cases where the models failed to predict the correct action or labeled the named entity span erroneously. Below, we outline the most common errors.

next action prediction One of the prevalent flaws in the default model was the mismatch between two consecutive actions: the actions question number and subtopic. That is due to the position of these actions in the conversational flow. The occurrence of both actions in the dialogue is not strict and largely depends on the previous action. To explain this, we illustrate the following possible conversational scenarios:

• The system can directly proceed with the request for the question number if the examination mode is the final examination.

• If the examination mode is training or exercise, the system has to ask about the level first (i.e., chapter or section).

• If it is chapter-level, the system can again directly proceed with the request for the question number.

• In the case of section-level, the system has to ask about the subsection first (if this information was not provided before).

The examination of the extended model showed that the inclusion of supplementary context (i.e., the previous action) improved the performance of the system by about 6%.

named entity recognition The cases where the system failed cover mismatches between the labels chapter and other, and between the labels question number and other. This failure arose due to the imperfectly labeled span of a multi-word named entity (e.g., "Elementary Calculus: Proportionality, Percentage"). The first or last words in the named entity were excluded from the span and erroneously labeled with the other label.

4.9 summary

In this chapter, we re-implemented two of the three fundamental units of our IPA conversational agent, Named Entity Recognition (NER) and Dialogue Manager (DM), using the methods of Transfer Learning (TL), specifically the Bidirectional Encoder Representations from Transformers (BERT) model.

We believe that the obtained results are reasonably high considering the comparatively small amount of training data we employed to fine-tune the models. Therefore, we conclude that both NAP and NER segments could be used to replace or extend the existing rule-based modules. Rule-based components, as well as ElasticSearch (ES), are limited in their capacities and not flexible to novel patterns, whereas the trainable units generalize better and could possibly decrease the number of inaccurate predictions in case of accidental dialogue behavior. Furthermore, to increase the overall robustness, rule-based and trainable components could be used synchronously: when one system fails (e.g., the ElasticSearch (ES) module cannot extract an entity), the conversation continues based on the prediction obtained from the other model.

4.10 future work

There are several possible directions for future work:

• At the time of writing this thesis, both units were trained and investigated on their own, that is, without being incorporated into the initial IPA conversational system. Thus, one potential direction of future work would be the embodiment of these units into the original system and the examination of possible failure cases. Furthermore, the resulting hybrid system could then impose additional meta policies to prevent unexpected system behavior.

• Further investigation could be towards multi-language modality for the re-implemented units. Since the Online Mathematik Brückenkurs (OMB+) platform also supports English and Chinese, it would be interesting to examine whether a simple translation from the target language (i.e., English, Chinese) to the source language (i.e., German) would be sufficient to employ the already-assembled dataset and pre-trained units. Presumably, it should be achievable in the case of the Next Action Prediction (NAP) unit, but it could lead to a noticeable performance drop for the Named Entity Recognition (NER) task, since there could be a considerable gap between translated and original entity spans.

• Last but not least, one of the further extensions of the IPA would be the eventual applicability to more cognitively demanding tasks. For instance, it is of interest to extend the model to more complex domains, like the handling of mathematical questions. Would it be possible to automate the human assistance at least for less complicated mathematical inquiries, or is it still unachievable with currently existing NLP techniques and machine learning methods?


Part II

CLUSTERING-BASED TREND ANALYZER


5 FOUNDATIONS

As discussed in Chapter 1, predictive analytics is one of the potential Intelligent Process Automation (IPA) tasks due to its capability to forecast future events by using current and historical data and thereby assist knowledge workers with, for instance, sales or investments. Modern machine learning algorithms allow an intelligent and fast analysis of large amounts of data. However, the evaluation of such algorithms and models remains one of the grave problems in this domain. This is mostly due to the lack of publicly available benchmarks for the evaluation of topic modeling and trend detection algorithms. Hence, in this part of the thesis, we address this challenge and assemble a benchmark for detecting both trends and downtrends (i.e., topics that become less frequent over time). To the best of our knowledge, the task of downtrend detection has not been well addressed before.

In this chapter, we introduce the concepts required by the second part of the thesis. We begin with an introduction to the topic modeling task and define the notion of an emerging trend in Section 5.1. Next, we examine the related work, existing approaches to topic modeling and trend detection, and their limitations in Section 5.2. In Chapter 6, we introduce our TRENDNERT benchmark for trend and downtrend detection and describe the method we employed for its assembly.

5.1 motivation and challenges

Science changes and evolves rapidly: novel research areas emerge while others fade away. Keeping pace with these changes is challenging. Therefore, recognizing and forecasting emerging research trends is of significant importance for researchers and academic publishers, as well as for funding agencies, stakeholders, and innovation companies.

Previously, the task of trend detection was mostly solved by domain experts who used specialized and cumbersome tools to investigate the data and get useful insights from it (e.g., through data visualization or statistics). However, manual analysis of large amounts of data can be time-consuming, and hiring domain experts is very costly. Furthermore, the overall increase of research data (e.g., Google Scholar, PubMed, DBLP) in the past decade makes the approach based on human domain experts less and less scalable. In contrast, automated approaches become a more desirable solution for the task of emerging trend identification (Kontostathis et al. 2004; Salatino 2015). Due to the large number of electronic documents appearing on the web daily, several machine learning techniques were developed to find patterns of word occurrences in those documents using hierarchical probabilistic models. These approaches are called topic models, and they allow the analysis of extensive textual collections and the extraction of valuable insights from them. The main advantage of topic models is their ability to discover hidden patterns of word use and to connect documents containing similar patterns. In other words, the idea of a topic model is that documents are a mixture of topics, where a topic is defined as a probability distribution over words (Alghamdi and Alfalqi 2015).

Topics have the property of being able to evolve; thus, modeling topics without considering time may confound precise topic identification. Modeling topics while considering time is called topic evolution modeling, and it can reveal important hidden information in the document collection, allowing topics to be recognized along with their temporal appearance. This approach also provides the ability to check the evolution of topics during a given time window and to examine whether a given topic is gaining interest and popularity over time (i.e., becoming an emerging trend) or whether its development is stable. Considering the latter approach, we turn to the task of emerging trend identification.

How is an emerging trend defined? An emerging trend is a topic that is gaining interest and utility over time, and it has two attributes: novelty and growth (Small, Boyack, and Klavans 2014; Tu and Seng 2012). Rotolo, Hicks, and Martin (2015) explain the correlation of these two attributes in the following way: up to the point of emergence, a research topic is characterized by a high level of novelty. At that time, it does not attract any considerable attention from the scientific community, and due to the limited impact, its growth is nearly flat. After some turning points (i.e., the appearance of significant scientific publications), the research topic starts to grow faster and may become a trend if the growth is fast; however, the level of novelty decreases continuously once the emergence becomes clear. Then, in the final phase, the topic becomes well-established while its novelty levels out (He and Chen 2018).

To detect trends in textual data, one employs computational analysis techniques and Natural Language Processing (NLP). In the subsequent section, we give an overview of the existing approaches for topic modeling and trend detection.

5.2 detection of emerging research trends

The task of trend detection is closely associated with the topic modeling task, since in the first instance it detects the latent topics in a substantial collection of data. It secondarily employs techniques to investigate the evolution of a topic and whether it has the potential to become a trend.


Topic modeling has an extensive history of applications in the scientific domain, including but not limited to studies of temporal trends (Griffiths and Steyvers 2004; Wang and McCallum 2006) and investigation of impact prediction (Yogatama et al. 2011). Below, we give a general overview of the most significant methods used in this domain, explain their importance, and point out their potential limitations, if any.

5.2.1 Methods for Topic Modeling

In order to extract a topic from textual documents (i.e., research papers, blog posts), the precise definition of the model for topic representation is crucial.

latent semantic analysis, formerly called Latent Semantic Indexing, is one of the fundamental methods in the topic modeling domain, proposed by Deerwester et al. in 1990. Its primary goal is to generate vector-based representations for documents and terms and then employ these representations to compute the similarity between documents (resp. terms) in order to select the most related ones. In this regard, Latent Semantic Analysis (LSA) takes a matrix of documents and terms and then decomposes it into a separate document-topic matrix and a topic-term matrix. Before turning to the central task of finding latent topics that capture the relationships among the words and documents, it also performs dimensionality reduction utilizing truncated Singular Value Decomposition. This technique is usually applied to reduce the sparseness, noisiness, and redundancy across dimensions in the document-topic matrix. Finally, using the resulting document and term vectors, one applies measures such as cosine similarity to evaluate the similarity of different documents, words, or queries.

One of the extensions to traditional LSA is probabilistic Latent Semantic Analysis (pLSA), which was introduced by Hofmann in 1999 to fix several shortcomings. The foremost advantage of the follow-up method was that it could successfully automate document indexing, which in turn improved LSA in a probabilistic sense by using a generative model (Alghamdi and Alfalqi 2015). Furthermore, pLSA can differentiate between diverse contexts of word usage without utilizing dictionaries or a thesaurus. This yields better polysemy disambiguation and discloses typical similarities by grouping words that share a similar context. Among the works that employ pLSA is the approach proposed by Gohr et al. (2009), where the authors apply pLSA in a window that slides across the stream of documents to analyze the evolution of topics. Also, Mei et al. (2008) use pLSA in order to create a network of topics.


latent dirichlet allocation was developed by Blei, Ng, and Jordan (2003) to overcome the limitations of both the LSA and pLSA models. Nowadays, Latent Dirichlet Allocation (LDA) is considered to be one of the most utilized and robust probabilistic topic models. The fundamental concept of LDA is the assumption that every document is represented as a mixture of topics, where each topic is a discrete probability distribution that defines the probability of each word to appear in a given topic. These topic probabilities provide a compressed representation of a document. In other words, LDA models all of the d documents in the collection as a mixture over n latent topics, where each of the topics describes a multinomial distribution over a word w in the vocabulary (Alghamdi and Alfalqi 2015).

LDA fixes such weaknesses of pLSA as the incapacity to assign a probability to previously unseen documents (because pLSA learns the topic mixtures only for documents seen in the training phase) and the overfitting problem (i.e., the number of parameters in pLSA increases linearly with the size of the corpus) (Salatino 2015).

An extension of the conventional LDA is the hierarchical LDA (Griffiths et al. 2004), where topics are grouped in hierarchies. Each node in the hierarchy is associated with a topic, and a topic is a distribution across words. Another comparable approach is the Relational Topic Model (Chang and Blei 2010), which is a mixture of a topic model and a network model for groups of linked documents. The attribute of every document is its text (i.e., discrete observations taken from a fixed vocabulary), and the links between documents are hyperlinks or citations.

LDA is a prevalent method for discovering topics (Blei and Lafferty 2006; Bolelli, Ertekin, and Giles 2009; Bolelli et al. 2009; Hall, Jurafsky, and Manning 2008; Wang and Blei 2011; Wang and McCallum 2006). However, it has been found that training and fine-tuning a stable LDA model can be very challenging and time-consuming (Agrawal, Fu, and Menzies 2018).
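As an illustration of how such a topic model is typically fitted in practice, the sketch below trains a small LDA model with the gensim library on a toy corpus; it is a generic usage example and not part of the pipeline described in this thesis.

# Sketch: fitting a small LDA topic model with gensim (toy corpus for illustration).
from gensim.corpora import Dictionary
from gensim.models import LdaModel

documents = [
    ["neural", "network", "training", "gradient"],
    ["topic", "model", "latent", "dirichlet", "allocation"],
    ["gradient", "descent", "optimization", "network"],
    ["trend", "detection", "topic", "evolution"],
]
dictionary = Dictionary(documents)                        # word <-> id mapping
corpus = [dictionary.doc2bow(doc) for doc in documents]   # bag-of-words representation

lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2,
               passes=10, random_state=0)
for topic_id, words in lda.print_topics(num_words=5):
    print(topic_id, words)                                # each topic is a distribution over words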

correlated topic model One of the main drawbacks of LDA is its incapability of capturing dependencies among the topics (Alghamdi and Alfalqi 2015; He et al. 2017). Therefore, Blei and Lafferty (2007) proposed the Correlated Topic Model as an extension of the LDA approach. In this model, the authors replaced the Dirichlet distribution with a logistic-normal distribution that models pairwise topic correlations with the Gaussian covariance matrix (He et al. 2017).

However, one of the shortcomings of this model is the computational cost. The number of parameters in the covariance matrix increases quadratically with the number of topics, and parameter estimation for the full-rank matrix can be inaccurate in high-dimensional space (He et al. 2017). Therefore, He et al. (2017) proposed their own method to overcome this problem. The authors introduced a model which learns dense topic embeddings and captures topic correlations through the similarity between the topic vectors. According to He et al., their method allows efficient inference in the low-dimensional embedding space and significantly reduces the previous cubic or quadratic time complexity to linear with respect to the topic number.

5.2.2 Topic Evolution of Scientific Publications

Although the topic modeling task is essential, it has a significant offspring task, namely the modeling of topic evolution. Methods able to identify topics within the context of time are highly applicable in the scientific domain. They allow understanding of topic lineage and reveal how research on one topic influences another. Generally, these methods can be classified according to the primary sources of information they employ: text, citations, or keyphrases.

text-based Extensive textual collections have dynamic co-occurrences of word patterns that change over time. Thus, models that track topic evolution over time should consider both word co-occurrence patterns and the timestamps (Blei and Lafferty 2006).

In 2006, Wang and McCallum proposed a Non-Markov Continuous Time model for the detection of topical trends in the collection of NeurIPS publications (an overall time span of 17 years was considered). The authors made the following assumption: if a pattern of word co-occurrence exists for a short time, then the system creates a narrow-time-distribution topic, whereas for long-term patterns the system generates a broad-time-distribution topic. The principal property of this approach is that it models topic evolution without discretizing time, i.e., the state at time t + t1 is independent of the state at time t (Wang and McCallum 2006).

Another work, by Blei and Lafferty (2006), proposed the Dynamic Topic Models. The authors assumed that the collection of documents is organized based on certain time spans, and the documents belonging to every time span are represented with a K-component model. Thereby, the topics are associated with time span t and emerge from the topics corresponding to time span t-1. One of the main differences of this model compared to other existing approaches is that it utilizes a Gaussian distribution for the topic parameters instead of the conventional Dirichlet distribution and can therefore capture the evolution over various time spans.


citation-based analysis is the second popular direction that is considered to be effective for trend identification (He et al. 2009; Le, Ho, and Nakamori 2005; Shibata, Kajikawa, and Takeda 2009; Shibata et al. 2008; Small 2006). This method assumes that citations in their various forms (i.e., bibliographic citations, co-citation networks, citation graphs) indicate a meaningful relationship between topics, and it uses citations to model topic evolution in the scientific domain. Nie and Sun (2017) utilize this approach along with Latent Dirichlet Allocation (LDA) and k-means clustering to identify research trends. The authors first use LDA to extract features and determine the optimal number of topics. Then they use k-means to obtain thematic clusters, and finally, they compute citation functions to identify the changes of clusters over time.

However, the main drawback of the citation-based approach is that a consistent collection of publications associated with a specific research topic is required for these techniques to detect the cluster and novelty of a particular topic (He and Chen 2018). Furthermore, there are also many trend detection scenarios (e.g., research news articles or research blog posts) in which citations are not readily available. That makes this approach not applicable to such data.

keyphrase-based Another standard solution is based on the useof keyphrase information extracted from research papers In thiscase every keyword is representative of a single research topicSystems like Saffron (Monaghan et al 2010) as well as Saffron-based models use this approach Saffron is a research toolkitthat provides algorithms for keyphrase extraction entity link-ing and taxonomy extraction Asooja et al (2016) utilize Saffronto extract the keywords from LREC proceedings and then pro-posed trend forecast based on regression models as an extensionto Saffron

However this method raises many questions including the fol-lowing how to deal with the noisiness of keywords (ie notrelevant keywords) and how to handle keywords that have theirhierarchies (ie sub-areas) Other drawbacks of this approachinclude the ambiguity of keywords (ie ldquoJavardquo as an island andldquoJavardquo as a programming language) or necessity to deal withsynonyms which could be treated as different topics

5.3 Summary

An emerging trend detection algorithm aims to recognize topics that were previously inconspicuous but are now gaining importance and interest within a specific domain. Knowledge of emerging trends is especially essential for stakeholders: researchers, academic publishers, and funding organizations. Assume a business analyst in a


biotech company who is interested in analyzing technical articles for recent trends that could potentially impact the company's investments. A manual review of all the available information in a specific domain would be extremely time-consuming or even infeasible. However, this process can be supported by Intelligent Process Automation (IPA) algorithms. Such software would assist human experts who are tasked with identifying emerging trends through automatic analysis of extensive textual collections and visualization of the occurrences and tendencies of a given topic.


6 Benchmark for Scientific Trend Detection

This chapter covers work already published at international peer-reviewed conferences. The relevant publication is Moiseeva and Schütze (2020). The research described in this chapter was carried out in its entirety by the author of the thesis. The other author of the publication acted as an advisor.

Computational analysis and modeling of the evolution of trends is a significant area of research because of its socio-economic impact. However, no large publicly available benchmark for trend detection currently exists, making a comparative evaluation of methods impossible. We remedy this situation by publishing the benchmark TRENDNERT, consisting of a set of gold trends (resp. downtrends) and document labels that is available as an unlimited download, and a large underlying document collection that can also be obtained for free. We propose Mean Average Precision (MAP) as an evaluation measure for trend detection and apply this measure in an investigation of several baselines.

6.1 Motivation

What is an emerging research topic, and how is it defined in the literature? At the time of emergence, a research topic often does not attract much recognition from the scientific community and is exemplified by just a few publications. He and Chen (2018) associate the appearance of a trend with some triggering event, e.g., the publication of an article in a high-impact journal. Later, the research topic starts to grow faster and becomes a trend (He and Chen, 2018).

We adopt this as our key characterization of a trend in this chapter: a trend is a research topic with a strongly increasing number of publications in a particular time interval. In contrast to trends, there are also downtrends, which refer to topics whose demand or growth decreases as they evolve. Therefore, we define a downtrend as the converse of a trend: a downtrend is a research topic with a strongly decreasing number of publications in a particular time interval. Downtrends can be as important to detect as trends, e.g., for a funding agency planning its budget.

To detect both types of topics (i.e., trends and downtrends) in textual data, one employs computational analysis techniques and NLP. These methods enable the analysis of extensive textual collections and provide valuable insights into topical trendiness and evolution over


time. Despite the overall importance of trend analysis, there is, to our knowledge, no large publicly available benchmark for trend detection. This makes a comparative evaluation of methods and algorithms impossible. Most previous work has used proprietary or moderately sized datasets, or evaluation measures that were not directly tied to the objectives of trend detection (Gollapalli and Li, 2015).

We remedy this situation by publishing the benchmark TRENDNERT.1 The benchmark consists of the following:

1. A set of gold trends, compiled based on an extensive analysis of the meta-literature.

2. A set of labeled documents (based on more than one million scientific publications), where each document is assigned to a trend, a downtrend, or the class of flat topics, accompanied by the topic title.

3. An evaluation script to compute Mean Average Precision (MAP) as the measure of trend detection accuracy.

We make the benchmark available as an unlimited download.2 The underlying research corpus can be obtained for free from Semantic Scholar3 (Ammar et al., 2018).

6.1.1 Outline and Contributions

In this work, we address the challenge of benchmark creation for (down)trend detection. This chapter is structured as follows: Section 6.2 introduces our model for corpus creation and its fundamental components. In Section 6.3, we present the crowdsourcing procedure for the benchmark labeling. Section 6.5 outlines the proposed baselines for trend detection, our experimental setup, and the outcomes. Finally, we give the details of the distributed benchmark in Section 6.6 and conclude in Section 6.7.

Our contributions within this work are as follows:

• TRENDNERT is the first publicly available benchmark for (down)trend detection.

• TRENDNERT is based on a collection of more than one million documents. It is among the largest collections that have been used for trend detection and therefore offers a realistic setting for developing trend detection algorithms.

• TRENDNERT addresses the task of detecting both trends and downtrends. To the best of our knowledge, the task of downtrend detection has not been addressed before.

1. The name is a concatenation of trend and its ananym dnert, because the benchmark supports the evaluation of both trends and downtrends.

2. https://doi.org/10.17605/OSF.IO/DHZCT

3. https://api.semanticscholar.org/corpus


6.2 Corpus Creation

In this section, we explain the methodology we used to create the TRENDNERT benchmark.

6.2.1 Underlying Data

We employ a corpus provided by Semantic Scholar.4 It includes more than one million papers, published mostly between 2000 and 2015 in about 5,000 computer science journals and conference proceedings. Every record in the collection consists of a title, key phrases, abstract, full text, and metadata.

6.2.2 Stratification

The distribution of documents over time in the entire original dataset is skewed (see Figure 11). We discovered that clustering the entire collection, as well as a random sample, produces distorted results because more weight is given to later years than to earlier years. Therefore, we first generated a stratified sample of the original document collection. To this end, we randomly pick 10,000 documents for each year between 2000 and 2016, for an overall sample size of 160,000.
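A minimal sketch of this stratification step is given below. It assumes the corpus has already been parsed into records with a year field; the field name, the helper name, and the exact year range and per-year quota are illustrative assumptions, not the distributed preprocessing code.

```python
import random
from collections import defaultdict

def stratified_sample(documents, years=range(2000, 2017), per_year=10_000, seed=0):
    """Draw an equally sized random sample of documents for every year."""
    by_year = defaultdict(list)
    for doc in documents:
        by_year[doc["year"]].append(doc)  # 'year' field assumed in the parsed records

    rng = random.Random(seed)
    sample = []
    for year in years:
        pool = by_year.get(year, [])
        # Sample without replacement; keep everything if a year has fewer documents.
        sample.extend(rng.sample(pool, min(per_year, len(pool))))
    return sample
```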

Figure 11: Overall distribution of papers in the entire dataset, years 1975 to 2016.

4. https://www.semanticscholar.org


6.2.3 Document Representations & Clustering

As mentioned, this work's primary focus was the creation of a benchmark, not the development of new algorithms or models for trend detection. Therefore, for our experiments, we selected an algorithm based on k-means clustering as a simple baseline method for trend detection.

In conventional document clustering, documents are usually represented as bag-of-words (BOW) feature vectors (Manning, Raghavan, and Schütze, 2008). Those, however, have a major weakness: they neglect the semantics of words (Le and Mikolov, 2014). Recent work in the representation learning domain, in particular doc2vec, the method proposed by Le and Mikolov (2014), can provide representations for document clustering that overcome this weakness.

The doc2vec5 algorithm is inspired by techniques for learning word vectors (Mikolov et al., 2013) and is capable of capturing semantic regularities in textual collections. It is an unsupervised approach for learning continuous distributed vector representations for text (i.e., paragraphs, documents). The approach maps documents into a vector space such that semantically related documents are assigned similar vector representations (e.g., an article about "genomics" is closer to an article about "gene expression" than to an article about "fuzzy sets"). Formally, each paragraph is mapped to an individual vector, represented by a column in matrix D, and every word is also mapped to an individual vector, represented by a column in matrix W. The paragraph vector and word vectors are concatenated to predict the next word in a context (Le and Mikolov, 2014). Other work has already successfully employed this type of document representation for topic modeling, combining it with both Latent Dirichlet Allocation (LDA) and clustering approaches (Curiskis et al., 2019; Dieng, Ruiz, and Blei, 2019; Moody, 2016; Xie and Xing, 2013).

In our work, we run doc2vec on the stratified sample of 160,000 papers and represent each document as a length-normalized vector (i.e., a document embedding). These vectors are then clustered into k = 1000 clusters using the scikit-learn6 implementation of k-means (MacQueen, 1967) with default parameters. The combination of document representations with a clustering algorithm is conceptually simple and interpretable. Also, the comprehensive comparative evaluation of topic modeling methods utilizing document embeddings performed by Curiskis et al. (2019) showed that doc2vec feature representations with k-means clustering outperform several other methods7 on three evaluation measures (Normalized Mutual Information, Adjusted Mutual Information, and Adjusted Rand Index).

5. We use the implementation provided by Gensim: https://radimrehurek.com/gensim/models/doc2vec.html

6. https://scikit-learn.org/stable

7. Hierarchical clustering, k-medoids, NMF, and LDA.
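The embedding and clustering pipeline can be sketched as follows, using Gensim's Doc2Vec and scikit-learn's KMeans; the hyperparameters shown (vector size, epochs) are illustrative assumptions, since the text only specifies k = 1000 and default k-means parameters.

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize

def embed_and_cluster(texts, n_clusters=1000, vector_size=300, epochs=20):
    """texts: a list of token lists, one per document (tokenization not shown)."""
    tagged = [TaggedDocument(words=tokens, tags=[i]) for i, tokens in enumerate(texts)]
    model = Doc2Vec(tagged, vector_size=vector_size, min_count=5, epochs=epochs, workers=4)

    # Length-normalize the learned document embeddings (model.dv in Gensim >= 4;
    # older versions expose them as model.docvecs).
    vectors = normalize([model.dv[i] for i in range(len(texts))])

    # k-means into k = 1000 clusters with scikit-learn's default parameters.
    labels = KMeans(n_clusters=n_clusters, random_state=0).fit_predict(vectors)
    return vectors, labels
```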


We run 10 trials of stratification and clustering, resulting in 10 different clusterings. We do this to protect against the variability of clustering and because we do not want to rely on a single clustering for proposing (down)trends for the benchmark.

6.2.4 Trend & Downtrend Estimation

Recall our definitions of trend and downtrend: a trend (resp. downtrend) is defined as a research topic that has a strongly increasing (resp. decreasing) number of publications in a particular time interval.

Linear trend estimation is a widely used technique in predictive (time-series) analysis that supports statements about tendencies in the data by correlating the measurements with the times at which they occurred (Hess, Iyer, and Malm, 2001). Considering the definition of trend (resp. downtrend) and the main concept of linear trend estimation models, we utilize linear regression to identify trend and downtrend candidates in the resulting clustering sets.

Specifically, we count the number of documents per year for a cluster and then estimate the parameters of the best-fitting line (i.e., its slope). Clusters are then ranked according to the resulting slope, and the n = 150 clusters with the largest (resp. smallest) slope are selected as trend (resp. downtrend) candidates. Thus, our definition of a single-clustering (down)trend candidate is a cluster with an extreme ascending or descending slope. Figure 12 and Figure 13 give two examples of (down)trend candidates.
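A small sketch of this slope-based ranking is shown below; the input arrays (per-document years and cluster labels) are hypothetical names for the outputs of the previous steps.

```python
import numpy as np

def cluster_slopes(doc_years, labels, years=range(2000, 2016)):
    """Fit a line to the per-year document counts of every cluster and
    return a {cluster_id: slope} dictionary. doc_years[i] is the publication
    year of document i and labels[i] its cluster id (both numpy arrays)."""
    years = np.array(list(years))
    slopes = {}
    for c in np.unique(labels):
        counts = np.array([np.sum((labels == c) & (doc_years == y)) for y in years])
        slope, _intercept = np.polyfit(years, counts, deg=1)
        slopes[c] = slope
    return slopes

# Ranking: the 150 clusters with the largest slope become trend candidates,
# the 150 with the smallest slope become downtrend candidates.
# trend_candidates = sorted(slopes, key=slopes.get, reverse=True)[:150]
# downtrend_candidates = sorted(slopes, key=slopes.get)[:150]
```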

6.2.5 (Down)trend Candidate Validation

As detailed in Section 6.2.3, we run ten series of clustering, resulting in ten discrete clustering sets. We then estimated the (down)trend candidates according to the procedure described in Section 6.2.4. Consequently, for each of the ten clustering sets, we obtained 150 trend and 150 downtrend candidates.

Nonetheless, for the benchmark, among all the (down)trend candidates across the ten clustering sets, we want to keep only those that are consistent over all sets. To obtain consistent (down)trend candidates, we apply the Jaccard coefficient, a metric used for measuring the similarity and diversity of sample sets. We define an equivalence relation ∼R: two candidates (i.e., clusters) are equivalent if their Jaccard coefficient is > τsim, where τsim = 0.5. Following this notion, we compute the equivalence of (down)trend candidates across the ten sets.
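One simple way to compute this equivalence is sketched below, assuming each candidate is represented as a set of document IDs; the greedy grouping routine is an illustrative assumption, since the text does not specify the exact grouping procedure.

```python
TAU_SIM = 0.5

def jaccard(a, b):
    """Jaccard coefficient of two clusters given as sets of document IDs."""
    return len(a & b) / len(a | b) if (a or b) else 0.0

def group_candidates(candidates):
    """Greedily group (down)trend candidates from different clusterings into
    equivalence classes: a candidate joins a class if its Jaccard coefficient
    with some member of that class exceeds TAU_SIM."""
    classes = []  # each class is a list of candidate sets
    for cand in candidates:
        for cls in classes:
            if any(jaccard(cand, member) > TAU_SIM for member in cls):
                cls.append(cand)
                break
        else:
            classes.append([cand])
    return classes
```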

Finally, we call an equivalence class an equivalence-class trend candidate (resp. equivalence-class downtrend candidate) if it contains trend (resp. downtrend) candidates in at least half of the clusterings. This


Figure 12: A trend candidate (positive slope). Topic: Sentiment and Emotion Analysis (using Social Media Channels).

Figure 13: A downtrend candidate (negative slope). Topic: XML.


procedure gave us in total 110 equivalence-class trend candidates and 107 equivalence-class downtrend candidates that we then annotate for our benchmark.8

6.3 Benchmark Annotation

To annotate the equivalence-class (down)trend candidates obtained from our model, we used the Figure-Eight9 platform. It provides an integrated quality-control mechanism and a training phase before the actual annotation (Kolhatkar, Zinsmeister, and Hirst, 2013), which minimizes the rate of poorly qualified annotators.

We design our annotation task as follows. A worker is shown a description of a particular (down)trend candidate along with the list of possible (down)trend tags. Descriptions are mixed and presented in random order. The temporal presentation order of candidates is also arbitrary. The worker is asked to pick from the list of (down)trend tags the item that matches the cluster (resp. candidate) best. Even if there are several items that are a possible match, the worker must choose only one. If there is no matching item, the worker can select the option other. Three different workers evaluated each cluster, and the final label is the majority label. If there is no majority label, we present the cluster repeatedly to the workers until there is a majority.

The descriptors of candidates are collected through the following procedure:

• First, we review the documents assigned to a candidate cluster.

• Then, we extract keywords and titles of the papers attributed to the cluster. Semantic Scholar provides keywords and titles as part of the metadata.

• Finally, we compute the 15 most frequent keywords and randomly select 15 titles. These keywords and titles form the descriptor of the cluster candidate presented to crowd-workers (a sketch of this step follows below).
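A minimal sketch of the descriptor construction, assuming each paper is a metadata dict with keyPhrases and title fields (the field names follow the Semantic Scholar records but are assumptions here):

```python
import random
from collections import Counter

def cluster_descriptor(papers, n_keywords=15, n_titles=15, seed=0):
    """Build the descriptor shown to crowd-workers for one candidate cluster."""
    counts = Counter(kw.lower() for p in papers for kw in p.get("keyPhrases", []))
    top_keywords = [kw for kw, _ in counts.most_common(n_keywords)]

    titles = [p["title"] for p in papers if p.get("title")]
    sampled_titles = random.Random(seed).sample(titles, min(n_titles, len(titles)))
    return top_keywords, sampled_titles
```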

We create the (down)trend tags based on keywords, titles, and metadata such as the publication venue. The cluster candidates were manually labeled by graduate employees (i.e., domain experts in computer science), considering the abovementioned items.

6.3.1 Inter-annotator Agreement

To ensure the quality of the obtained annotations, we computed the Inter-Annotator Agreement (IAA), a measure of how well two (or more)

8. Note that the overall number of 217 refers to the number of (down)trend candidates and not to the number of documents. Each candidate contains hundreds of documents.

9. Formerly called CrowdFlower.


annotators can make the same decision for a particular category. To do that, we used the following metrics.

krippendorff's α is a reliability coefficient developed to measure the agreement among observers or measuring instruments that draw distinctions among typically unstructured phenomena or assign computable values to them (Krippendorff, 2011). We choose Krippendorff's α as an inter-annotator agreement measure because, unlike other specialized coefficients, it is a generalization of several reliability criteria and applies to situations like ours, where we have more than two annotators and a given annotator only labels a subset of the data (i.e., we have to deal with missing values).

Krippendorff (2011) proposed several metric variations, each of which is associated with a specific set of weights. We use the variant of Krippendorff's α that handles any number of observers and missing data. In our case, Krippendorff's α for the overall number of 217 annotated candidates is α = 0.798. According to Landis and Koch (1977), this value corresponds to a substantial level of agreement. (A toy computation is sketched at the end of this subsection.)

figure-eight agreement rate: Figure-Eight10 provides an agreement score c for each annotated unit u, which is based on the majority vote of the trusted workers. The score has been shown to perform well compared to classic metrics (Kolhatkar, Zinsmeister, and Hirst, 2013). We computed the average score C over the 217 annotated candidates; C is 0.799.

Since both Krippendorff's α and the internal Figure-Eight IAA are rather high, we consider the obtained annotations for the benchmark to be of good quality.
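As a toy illustration of the α computation with missing labels, the sketch below uses the third-party krippendorff Python package (an assumption; it is not necessarily the tool used for the reported value). Rows are annotators, columns are annotated candidates, and np.nan marks candidates a worker did not label.

```python
import numpy as np
import krippendorff  # pip install krippendorff

# Toy reliability matrix: 3 annotators x 5 candidates, labels coded as integers.
reliability_data = np.array([
    [0,      0, 1, 2,      np.nan],
    [0,      0, 1, np.nan, 3],
    [np.nan, 0, 1, 2,      3],
])

alpha = krippendorff.alpha(reliability_data=reliability_data,
                           level_of_measurement="nominal")
print(f"Krippendorff's alpha: {alpha:.3f}")
```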

6.3.2 Gold Trends

We compiled a list of computer science trends for the year 2016 based on the analysis of twelve survey publications (Ankerholz, 2016; Augustin, 2016; Brooks, 2016; Frot, 2016; Harriet, 2015; IEEE Computer Society, 2016; Markov, 2015; Meyerson and Mariette, 2016; Nessma, 2015; Reese, 2015; Rivera, 2016; Zaino, 2016). For this year, we identified a total of 31 trends; see Table 20.

Since there is variation in the naming of a specific trend, we created a unique name for each of them. Out of the 31 trends, 28 (90.3%) are instantiated as trends by our model, which we confirmed in the crowdsourcing procedure.11 We searched for the three missing trends using a variety of techniques (i.e., inspection of non-trends and downtrends, random browsing of documents not assigned to trend

10. Formerly the CrowdFlower platform.

11. The author verified this based on her judgment of the equivalence of one of the 31 gold trend names in the literature with one of the trend tags.


candidates, and keyword searches). Two of the missing trends do not seem to occur in the collection at all: "Human Augmentation" and "Food and Water Technology". The third missing trend, "Cryptocurrency and Cryptography", does occur in the collection, but its occurrence rate is very small (ca. 0.01). We do not consider these three trends in the rest of this chapter.

Computer Science Trend                        Computer Science Trend

Autonomous Agents and Systems                 Natural Language Processing
Autonomous Vehicles                           Open Source
Big Data Analytics                            Privacy
Bioinformatics and (Bio) Neuroscience         Quantum Computing
Biometrics and Personal Identification        Recommender Systems and Social Networks
Cloud Computing and Software as a Service     Reinforcement Learning
Cryptocurrency and Cryptography*              Renewable Energy
Cyber Security                                Robotics
E-business                                    Semantic Web
Food and Water Technology*                    Sentiment and Emotion Analysis
Game-based Learning                           Smart Cities
Games and (Virtual) Augmented Reality         Supply Chains and RFIDs
Human Augmentation*                           Technology for Climate Change
Machine/Deep Learning                         Transportation and Energy
Medical Advances and DNA Computing            Wearables
Mobile Computing

Table 20: The 31 areas identified as computer science trends for the year 2016 in the media. 28 of these trends are also instantiated by our model and thus covered in our benchmark. Topics marked with an asterisk (*) were not found in our underlying data collection.

In summary, based on our analysis of the meta-literature, we identified 28 gold trends that also occur in our collection. We will use them as the gold standard that trend detection algorithms should aim to discover.

6.4 Evaluation Measure for Trend Detection

Whether something is a trend is a graded concept. For this reason, we adopt an evaluation measure based on a ranking of candidates, specifically Mean Average Precision (MAP). The score denotes the average of the precision values at the ranks of correctly identified and non-redundant trends.

We approach trend detection as computing a ranked list of sets of documents, where each set is a trend candidate. We refer to documents as trendy, downtrendy, and flat, depending on which class they are assigned to in our benchmark. We consider a trend candidate


c as a correct recognition of gold trend t if it satisfies the following requirements:

• c is trendy;

• t is the largest trend in c;

• |t ∩ c| / |c| > ρ, where ρ is a coverage parameter;

• t was not recognized earlier in the list.

These criteria give us a true positive (i.e., a recognition of a gold trend) or a false positive (otherwise) for every position of the ranked list, and a precision score for each gold trend. Precision for a gold trend that was not observed is 0. We then compute MAP; see Table 21 for an example.

Gold Trends          Gold Downtrends

Cloud Computing      Compilers
Bioinformatics       Petri Nets
Sentiment Analysis   Fuzzy Sets
Privacy              Routing Protocols

Trend Candidate            |t ∩ c| / |c|   tp/fp   P

c1   Cloud Computing       0.60            tp      1.00
c2   Privacy               0.20            fp      0.50
c3   Cloud Computing       0.70            fp      0.33
c4   Bioinformatics        0.55            tp      0.50
c5   Sentiment Analysis    0.95            tp      0.60
c6   Sentiment Analysis    0.59            fp      0.50
c7   Bioinformatics        0.41            fp      0.43
c8   Compilers             0.60            fp      0.38
c9   Petri Nets            0.80            fp      0.33
c10  Privacy               0.33            fp      0.30

Table 21: Example of the proposed MAP evaluation (with ρ = 0.5). MAP is the average of the precision values 1.0 (Cloud Computing), 0.5 (Bioinformatics), 0.6 (Sentiment Analysis), and 0.0 (Privacy), i.e., MAP = 2.1/4 = 0.525. True positive: tp. False positive: fp. Precision: P.
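The MAP computation described above can be sketched as follows; the tuple encoding of candidates (trendiness flag, majority gold trend, coverage) is our own illustration and not the distributed evaluation script. It reproduces the example from Table 21.

```python
def mean_average_precision(ranked_candidates, gold_trends, rho=0.5):
    """MAP for trend detection. Each ranked candidate is a tuple
    (is_trendy, largest_gold_trend, coverage) with coverage = |t ∩ c| / |c|.
    A gold trend that is never correctly recognized contributes precision 0."""
    recognized = {}      # gold trend -> precision at its first correct recognition
    true_positives = 0
    for rank, (is_trendy, trend, coverage) in enumerate(ranked_candidates, start=1):
        correct = (is_trendy and trend in gold_trends
                   and coverage > rho and trend not in recognized)
        if correct:
            true_positives += 1
            recognized[trend] = true_positives / rank
    return sum(recognized.get(t, 0.0) for t in gold_trends) / len(gold_trends)

# The ranking from Table 21; c8 and c9 consist of downtrendy documents,
# so they are not trendy.
gold = ["Cloud Computing", "Bioinformatics", "Sentiment Analysis", "Privacy"]
ranking = [
    (True, "Cloud Computing", 0.60), (True, "Privacy", 0.20),
    (True, "Cloud Computing", 0.70), (True, "Bioinformatics", 0.55),
    (True, "Sentiment Analysis", 0.95), (True, "Sentiment Analysis", 0.59),
    (True, "Bioinformatics", 0.41), (False, "Compilers", 0.60),
    (False, "Petri Nets", 0.80), (True, "Privacy", 0.33),
]
print(mean_average_precision(ranking, gold))  # 0.525
```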

6.5 Trend Detection Baselines

In this section, we investigate the impact of four different configuration choices on the performance of trend detection by way of ablation.


6.5.1 Configurations

abstracts vs. full texts: The underlying data collection contains both abstracts and full texts.12 Our initial expectation was that the full text of a document comprises more information than the abstract, so it should be a more reliable basis for trend detection. This is one of the configurations we test in the ablation: we apply our proposed method separately to the full-text collection (solely the document texts, without abstracts) and to the abstract collection (only the abstract texts) and observe the results.

stratification: Due to the uneven distribution of documents over the years, the later years (with an increased volume of publications) may have too much impact on the results compared to earlier years. To investigate this, we conduct an ablation experiment where we compare (i) 160k documents randomly sampled from the entire collection and (ii) stratified sampling. In stratified sampling, we select 10k documents for each of the years 2000 – 2015, resulting in a sampled collection of 160k documents.

length Lt of interval: Clusters are ranked by trendiness, and the resulting ranked list is evaluated. To measure the trendiness or growth of a topic over time, we fit a line to an interval {(i, ni) | i0 ≤ i < i0 + Lt} by linear regression, where Lt is the length of the interval, i is one of the 16 years (2000, ..., 2015), and ni is the number of documents that were assigned to cluster c in that year. As a simple default baseline, we apply the regression to half of the entire interval, i.e., Lt = 8. There are nine such intervals in our 16-year period. As the final measure of growth for the cluster, we take the maximum of the nine individual slopes (see the sketch after this list). To determine how much this configuration choice affects our results, we also test a linear regression over the entire time span, i.e., Lt = 16. In this case, there is a single interval.

clustering method: We consider Gaussian Mixture Models (GMM) and k-means. We use GMM with the spherical type of covariance, where each component has a single, isotropic variance. For both, we proceed as described in Section 6.2.3.
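A small sketch of the interval-based growth measure referenced above, assuming a per-cluster list of yearly document counts has already been computed (the function name is illustrative):

```python
import numpy as np

def growth_score(counts_per_year, window=8):
    """Trendiness of one cluster: the maximal slope of a line fitted to any
    contiguous interval of `window` years. counts_per_year holds the yearly
    document counts n_i for the years 2000..2015 (16 values)."""
    counts = np.asarray(counts_per_year, dtype=float)
    x = np.arange(window)
    slopes = [np.polyfit(x, counts[start:start + window], deg=1)[0]
              for start in range(len(counts) - window + 1)]
    return max(slopes)

# With 16 years and window = 8 there are 16 - 8 + 1 = 9 intervals;
# window = 16 reduces to a single regression over the whole time span.
```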

6.5.2 Experimental Setup and Results

We conducted experiments on the Semantic Scholar corpus and evaluated a ranked list of trend candidates against the benchmark, considering the configurations mentioned above. We adopted Mean Average Precision (MAP) as the principal evaluation measure. As a secondary evaluation measure, we computed recall at 50 (R50), which estimates the percentage of gold trends found in the list of the 50 highest-ranked trend candidates. We found that configuration (0) in Table 22 works best.

12. Semantic Scholar provided the underlying dataset at the beginning of our work in 2016.


We then conducted ablation experiments to discover the importance of the configuration choices.

                 (0)        (1)        (2)        (3)        (4)

document part    A          F          A          A          A
stratification   yes        yes        no         yes        yes
Lt               8          8          8          16         8
clustering       GMMsph     GMMsph     GMMsph     GMMsph     kµ
MAP              36 (03)    07 (19)    30 (03)    32 (04)    34 (21)
R50 avg          50 (012)   25 (018)   50 (016)   46 (018)   53 (014)
R50 max          61         37         53         61         61

Table 22: Ablation results. Standard deviations are in parentheses. GMM clustering is performed with the spherical (sph) type of covariance. K-means clustering is denoted as kµ in the ablation table.

comparing (0) and (1):13 We observed that abstracts (A) are a more reliable representation for our trend detection baseline than full texts (F). A possible explanation for this observation is that an abstract is a summary of a scientific paper that covers only the central points and is semantically very concise and rich. In contrast, the full text contains numerous parts that are secondary (e.g., future work, related work) and may thus skew the document's overall representation.

comparing (0) and (2): We recognized that stratification (yes) improves results compared to the setting with non-stratified (no), randomly sampled data. We would expect the effect of stratification to be even more substantial for collections in which the distribution of documents over time is more skewed than in ours.

comparing (0) and (3): The length of the interval is a vital configuration choice. As we assumed, the 16-year interval is rather long, and an 8-year interval can be a better choice, primarily if one aims to find short-term trends.

comparing (0) and (4): We recognized that the topics we obtain from k-means have a similar nature to those from GMM. However, GMMs, which take variance and covariance into account and estimate a soft assignment, still perform slightly better.

6.6 Benchmark Description

Below, we report the contents of the TRENDNERT benchmark:14

1. Two files containing information on documents: one file with the documents assigned to (down)trends and the other file with the documents assigned to flat topics.

13. Numbers (0), (1), (2), (3), (4) denote the configuration choices in the ablation table.

14. https://doi.org/10.17605/OSF.IO/DHZCT


2. A script to produce document hash codes.

3. A file mapping every hash code to an internal ID in the benchmark.

4. A script to compute MAP (the default setting is ρ = 0.25).

5. README.md, a file with guidance notes.

Every document in the benchmark contains the following information:

• Paper ID: the internal ID assigned to each document in the original Semantic Scholar collection.

• Cluster ID: the unique ID of the meta-cluster in which the document was observed.

• Label/Tag: the name of the gold (down)trend assigned by crowd-workers (e.g., Cyber Security, Fuzzy Sets and Systems).

• Type of (down)trend candidate: trend (T), downtrend (D), or flat topic (F).

• Hash ID15 of each paper from the original Semantic Scholar collection.

6.7 Summary

Emerging trend detection is a promising task for Intelligent Process Automation (IPA) that allows companies and funding agencies to automatically analyze extensive textual collections in order to detect emerging fields and forecast future developments using machine learning algorithms. Thus, the availability of public benchmarks for trend detection is of significant importance, as only thereby can one evaluate the performance and robustness of the techniques employed.

Within this work, we release TRENDNERT, the first publicly available benchmark for (down)trend detection, which offers a realistic setting for developing trend detection algorithms. TRENDNERT also supports downtrend detection, a significant problem that was not addressed before. We also present several experimental findings on trend detection: first, stratification improves trend detection if the distribution of documents is skewed over time; second, abstract-based trend detection performs better than full-text-based trend detection if straightforward models are used.

6.8 Future Work

There are several possible directions for future work:

• The presented approach to estimating the trendiness of a topic by means of linear regression is straightforward and can be

15. MD5 hash of First Author + Title + Year.


replaced by more sophisticated alternatives such as Kendall's τ correlation coefficient or a locally weighted least squares regression smoother (LOESS) (Cleveland, 1979).

• It would also be interesting to replace the k-means clustering algorithm that we adopted within this work with an LDA model while retaining the document representations, as was done by Dieng, Ruiz, and Blei (2019). The question to investigate would then be whether topic modeling outperforms the simple clustering algorithm in this particular setting, and whether the resulting system would benefit from the topic modeling algorithm.

• Furthermore, it would be of interest to conduct more research on the following question: what kind of document representation can make effective use of the information in full texts? Recently proposed contextualized word embeddings could be a possible direction for these experiments.


7 Discussion and Future Work

As we observed in the two parts of this thesis, Intelligent Process Automation (IPA) is a challenging research area due to its multifacetedness and its application in real-world scenarios. Its main objective is to automate time-consuming and routine tasks and to free up the knowledge worker for more cognitively demanding tasks. However, this undertaking faces many difficulties, such as a lack of resources for both training and evaluation of algorithms, as well as insufficient performance of machine learning techniques when applied to real-world data and practical problems. In this thesis, we investigated two IPA applications within the Natural Language Processing (NLP) domain, conversational agents and predictive analytics, and addressed some of the most common issues.

• In the first part of the thesis, we addressed the challenge of implementing a robust conversational system for IPA purposes within the practical e-learning domain under the condition of missing training data. The primary purpose of the resulting system is to perform the onboarding process between tutors and students. This onboarding reduces repetitive and time-consuming activities while allowing tutors to focus solely on mathematical questions, which is the core task with the most impact on students. Besides that, by interacting with users, the system augments the resources with structured and labeled training data that we used to implement learnable dialogue components.

The realization of such a system was associated with several challenges. Among others were missing structured data and ambiguous user-generated content that requires precise language understanding. Also, a system that simultaneously communicates with a large number of users has to maintain additional meta policies, such as fallback and check-up policies, to hold the conversation intelligently.

• Moreover, we have shown a potential extension of the rule-based dialogue system using transfer learning. We utilized a small dataset of structured dialogues obtained in a trial run of the rule-based system and applied the current state-of-the-art Bidirectional Encoder Representations from Transformers (BERT) architecture to solve the domain-specific tasks. Specifically, we retrained two of the three components in the conversational system, Named Entity Recognition (NER) and the Dialogue Manager (DM) (i.e., the NAP task), and confirmed the applicability of such


algorithms in a real-world, low-resource scenario by showing the reasonably high performance of the resulting models.

• Last but not least, in the second part of the thesis, we addressed the issue of missing publicly available benchmarks for the evaluation of machine learning algorithms on the trend detection task. To the best of our knowledge, we released the first open benchmark for this purpose, which offers a realistic setting for developing trend detection algorithms. Besides that, this benchmark supports downtrend detection, a significant problem that was not sufficiently addressed before. We additionally proposed an evaluation measure and presented several experimental findings on trend detection.

The ultimate objective of Intelligent Process Automation (IPA) should be the creation of robust algorithms under low-resource and domain-specific conditions. This is still a challenging task; however, some of the experiments presented in this thesis revealed that this goal can potentially be achieved by means of Transfer Learning (TL) techniques.


Bibliography

Agrawal Amritanshu Wei Fu and Tim Menzies (2018) bdquoWhat iswrong with topic modeling And how to fix it using search-basedsoftware engineeringldquo In Information and Software Technology 98pp 74ndash88 (cit on p 66)

Alan M (1950) bdquoTuringldquo In Computing machinery and intelligenceMind 59236 pp 433ndash460 (cit on p 7)

Alghamdi Rubayyi and Khalid Alfalqi (2015) bdquoA survey of topicmodeling in text miningldquo In Int J Adv Comput Sci Appl(IJACSA)61 (cit on pp 64ndash66)

Ammar Waleed Dirk Groeneveld Chandra Bhagavatula Iz BeltagyMiles Crawford Doug Downey Jason Dunkelberger Ahmed El-gohary Sergey Feldman Vu Ha et al (2018) bdquoConstruction ofthe literature graph in semantic scholarldquo In arXiv preprint arXiv1805 02262 (cit on p 72)

Ankerholz Amber (2016) 2016 Future of Open Source Survey Says OpenSource Is The Modern Architecture httpswwwlinuxcomnews2016-future-open-source-survey-says-open-source-modern-

architecture (cit on p 78)Asooja Kartik Georgeta Bordea Gabriela Vulcu and Paul Buitelaar

(2016) bdquoForecasting emerging trends from scientific literatureldquoIn Proceedings of the Tenth International Conference on Language Re-sources and Evaluation (LREC 2016) pp 417ndash420 (cit on p 68)

Augustin Jason (2016) Emergin Science and Technology Trends A Syn-thesis of Leading Forecast httpwwwdefenseinnovationmarket-placemilresources2016_SciTechReport_16June2016pdf (citon p 78)

Blei David M and John D Lafferty (2006) bdquoDynamic topic modelsldquoIn Proceedings of the 23rd international conference on Machine learn-ing ACM pp 113ndash120 (cit on pp 66 67)

Blei David M John D Lafferty et al (2007) bdquoA correlated topic modelof scienceldquo In The Annals of Applied Statistics 11 pp 17ndash35 (citon p 66)

Blei David M Andrew Y Ng and Michael I Jordan (2003) bdquoLatentdirichlet allocationldquo In Journal of machine Learning research 3Janpp 993ndash1022 (cit on p 66)

Bloem, Peter (2019). Transformers from Scratch. http://www.peterbloem.nl/blog/transformers (cit. on p. 45).

Bocklisch Tom Joey Faulkner Nick Pawlowski and Alan Nichol(2017) bdquoRasa Open source language understanding and dialoguemanagementldquo In arXiv preprint arXiv171205181 (cit on p 15)

Bolelli Levent Seyda Ertekin and C Giles (2009) bdquoTopic and trenddetection in text collections using latent dirichlet allocationldquo InAdvances in Information Retrieval pp 776ndash780 (cit on p 66)


Bolelli Levent Seyda Ertekin Ding Zhou and C Lee Giles (2009)bdquoFinding topic trends in digital librariesldquo In Proceedings of the 9thACMIEEE-CS joint conference on Digital libraries ACM pp 69ndash72

(cit on p 66)Bordes Antoine Y-Lan Boureau and Jason Weston (2016) bdquoLearning

end-to-end goal-oriented dialogldquo In arXiv preprint arXiv160507683(cit on pp 16 17)

Brooks Chuck (2016) 7 Top Tech Trends Impacting Innovators in 2016httpinnovationexcellencecomblog201512267-top-

tech-trends-impacting-innovators-in-2016 (cit on p 78)Burtsev Mikhail Alexander Seliverstov Rafael Airapetyan Mikhail

Arkhipov Dilyara Baymurzina Eugene Botvinovsky NickolayBushkov Olga Gureenkova Andrey Kamenev Vasily Konovalovet al (2018) bdquoDeepPavlov An Open Source Library for Conver-sational AIldquo In (cit on p 15)

Chang Jonathan David M Blei et al (2010) bdquoHierarchical relationalmodels for document networksldquo In The Annals of Applied Statis-tics 41 pp 124ndash150 (cit on p 66)

Chen Hongshen Xiaorui Liu Dawei Yin and Jiliang Tang (2017)bdquoA survey on dialogue systems Recent advances and new fron-tiersldquo In Acm Sigkdd Explorations Newsletter 192 pp 25ndash35 (citon pp 14ndash16)

Chen Lingzhen Alessandro Moschitti Giuseppe Castellucci AndreaFavalli and Raniero Romagnoli (2018) bdquoTransfer Learning for In-dustrial Applications of Named Entity Recognitionldquo In NL4AIAI IA pp 129ndash140 (cit on p 43)

Cleveland William S (1979) bdquoRobust locally weighted regression andsmoothing scatterplotsldquo In Journal of the American statistical asso-ciation 74368 pp 829ndash836 (cit on p 84)

Collobert, Ronan, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa (2011). "Natural language processing (almost) from scratch." In: Journal of Machine Learning Research 12.Aug, pp. 2493–2537 (cit. on p. 16).

Cuayáhuitl, Heriberto, Simon Keizer, and Oliver Lemon (2015). "Strategic dialogue management via deep reinforcement learning." In: arXiv preprint arXiv:1511.08099 (cit. on p. 14).

Cui Lei Shaohan Huang Furu Wei Chuanqi Tan Chaoqun Duanand Ming Zhou (2017) bdquoSuperagent A customer service chat-bot for e-commerce websitesldquo In Proceedings of ACL 2017 SystemDemonstrations pp 97ndash102 (cit on p 8)

Curiskis Stephan A Barry Drake Thomas R Osborn and Paul JKennedy (2019) bdquoAn evaluation of document clustering and topicmodelling in two online social networks Twitter and Redditldquo InInformation Processing amp Management (cit on p 74)

Dai Zihang Zhilin Yang Yiming Yang William W Cohen Jaime Car-bonell Quoc V Le and Ruslan Salakhutdinov (2019) bdquoTransformer-xl Attentive language models beyond a fixed-length contextldquo InarXiv preprint arXiv190102860 (cit on pp 43 44)


Daniel Ben (2015) bdquoBig Data and analytics in higher education Op-portunities and challengesldquo In British journal of educational tech-nology 465 pp 904ndash920 (cit on p 22)

Deerwester Scott Susan T Dumais George W Furnas Thomas K Lan-dauer and Richard Harshman (1990) bdquoIndexing by latent seman-tic analysisldquo In Journal of the American society for information sci-ence 416 pp 391ndash407 (cit on p 65)

Deoras Anoop Kaisheng Yao Xiaodong He Li Deng Geoffrey Ger-son Zweig Ruhi Sarikaya Dong Yu Mei-Yuh Hwang and Gre-goire Mesnil (2015) Assignment of semantic labels to a sequence ofwords using neural network architectures US Patent App 14016186

(cit on p 13)Devlin Jacob Ming-Wei Chang Kenton Lee and Kristina Toutanova

(2018) bdquoBert Pre-training of deep bidirectional transformers forlanguage understandingldquo In arXiv preprint arXiv181004805 (citon pp 42ndash45)

Dhingra Bhuwan Lihong Li Xiujun Li Jianfeng Gao Yun-NungChen Faisal Ahmed and Li Deng (2016) bdquoTowards end-to-end re-inforcement learning of dialogue agents for information accessldquoIn arXiv preprint arXiv160900777 (cit on p 16)

Dieng Adji B Francisco JR Ruiz and David M Blei (2019) bdquoTopicModeling in Embedding Spacesldquo In arXiv preprint arXiv190704907(cit on pp 74 84)

Donahue Jeff Yangqing Jia Oriol Vinyals Judy Hoffman Ning ZhangEric Tzeng and Trevor Darrell (2014) bdquoDecaf A deep convolu-tional activation feature for generic visual recognitionldquo In Inter-national conference on machine learning pp 647ndash655 (cit on p 43)

Eric Mihail and Christopher D Manning (2017) bdquoKey-value retrievalnetworks for task-oriented dialogueldquo In arXiv preprint arXiv 170505414 (cit on p 16)

Fang Hao Hao Cheng Maarten Sap Elizabeth Clark Ari HoltzmanYejin Choi Noah A Smith and Mari Ostendorf (2018) bdquoSoundingboard A user-centric and content-driven social chatbotldquo In arXivpreprint arXiv180410202 (cit on p 10)

Friedrich Fabian Jan Mendling and Frank Puhlmann (2011) bdquoPro-cess model generation from natural language textldquo In Interna-tional Conference on Advanced Information Systems Engineering pp 482

ndash496 (cit on p 2)Frot Mathilde (2016) 5 Trends in Computer Science Research https

wwwtopuniversitiescomcoursescomputer-science-info-

rmation-systems5-trends-computer-science-research (citon p 78)

Glasmachers Tobias (2017) bdquoLimits of end-to-end learningldquo In arXivpreprint arXiv170408305 (cit on pp 16 17)

Goddeau David Helen Meng Joseph Polifroni Stephanie Seneffand Senis Busayapongchai (1996) bdquoA form-based dialogue man-ager for spoken language applicationsldquo In Proceeding of FourthInternational Conference on Spoken Language Processing ICSLPrsquo96Vol 2 IEEE pp 701ndash704 (cit on p 14)


Gohr Andre Alexander Hinneburg Rene Schult and Myra Spiliopou-lou (2009) bdquoTopic evolution in a stream of documentsldquo In Pro-ceedings of the 2009 SIAM International Conference on Data MiningSIAM pp 859ndash870 (cit on p 65)

Gollapalli Sujatha Das and Xiaoli Li (2015) bdquoEMNLP versus ACLAnalyzing NLP research over timeldquo In EMNLP pp 2002ndash2006

(cit on p 72)Griffiths Thomas L and Mark Steyvers (2004) bdquoFinding scientific top-

icsldquo In Proceedings of the National academy of Sciences 101suppl 1pp 5228ndash5235 (cit on p 65)

Griffiths Thomas L Michael I Jordan Joshua B Tenenbaum andDavid M Blei (2004) bdquoHierarchical topic models and the nestedChinese restaurant processldquo In Advances in neural information pro-cessing systems pp 17ndash24 (cit on p 66)

Hall David Daniel Jurafsky and Christopher D Manning (2008) bdquoStu-dying the history of ideas using topic modelsldquo In Proceedingsof the conference on empirical methods in natural language processingAssociation for Computational Linguistics pp 363ndash371 (cit onp 66)

Harriet, Taylor (2015). Privacy will hit tipping point in 2016. http://www.cnbc.com/2015/11/09/privacy-will-hit-tipping-point-in-2016.html (cit. on p. 78).
He, Jiangen and Chaomei Chen (2018). "Predictive Effects of Novelty

Measured by Temporal Embeddings on the Growth of ScientificLiteratureldquo In Frontiers in Research Metrics and Analytics 3 p 9

(cit on pp 64 68 71)He Junxian Zhiting Hu Taylor Berg-Kirkpatrick Ying Huang and

Eric P Xing (2017) bdquoEfficient correlated topic modeling with topicembeddingldquo In Proceedings of the 23rd ACM SIGKDD InternationalConference on Knowledge Discovery and Data Mining ACM pp 225ndash233 (cit on pp 66 67)

He Qi Bi Chen Jian Pei Baojun Qiu Prasenjit Mitra and Lee Giles(2009) bdquoDetecting topic evolution in scientific literature How cancitations helpldquo In Proceedings of the 18th ACM conference on In-formation and knowledge management ACM pp 957ndash966 (cit onp 68)

Henderson Matthew Blaise Thomson and Jason D Williams (2014)bdquoThe second dialog state tracking challengeldquo In Proceedings of the15th Annual Meeting of the Special Interest Group on Discourse andDialogue (SIGDIAL) pp 263ndash272 (cit on p 15)

Henderson Matthew Blaise Thomson and Steve Young (2013) bdquoDeepneural network approach for the dialog state tracking challengeldquoIn Proceedings of the SIGDIAL 2013 Conference pp 467ndash471 (cit onp 14)

Hess Ann Hari Iyer and William Malm (2001) bdquoLinear trend analy-sis a comparison of methodsldquo In Atmospheric Environment 3530Visibility Aerosol and Atmospheric Optics pp 5211 ndash5222 issn1352-2310 doi httpsdoiorg101016S1352- 2310(01)00342-9 url httpwwwsciencedirectcomsciencearticlepiiS1352231001003429 (cit on p 75)


Hofmann Thomas (1999) bdquoProbabilistic latent semantic analysisldquo InProceedings of the Fifteenth conference on Uncertainty in artificial in-telligence Morgan Kaufmann Publishers Inc pp 289ndash296 (cit onp 65)

Honnibal Matthew and Ines Montani (2017) bdquospacy 2 Natural lan-guage understanding with bloom embeddings convolutional neu-ral networks and incremental parsingldquo In To appear 7 (cit onp 15)

Howard Jeremy and Sebastian Ruder (2018) bdquoUniversal languagemodel fine-tuning for text classificationldquo In arXiv preprint arXiv180106146 (cit on p 44)

IEEE Computer Society (2016) Top 9 Computing Technology Trends for2016 httpswwwscientificcomputingcomnews201601top-9-computing-technology-trends-2016 (cit on p 78)

Ivancic Lucija Dalia Suša Vugec and Vesna Bosilj Vukšic (2019) bdquoRobo-tic Process Automation Systematic Literature Reviewldquo In In-ternational Conference on Business Process Management Springerpp 280ndash295 (cit on pp 1 2)

Jain Mohit Pratyush Kumar Ramachandra Kota and Shwetak NPatel (2018) bdquoEvaluating and informing the design of chatbotsldquoIn Proceedings of the 2018 Designing Interactive Systems ConferenceACM pp 895ndash906 (cit on p 8)

Khanpour Hamed Nishitha Guntakandla and Rodney Nielsen (2016)bdquoDialogue act classification in domain-independent conversationsusing a deep recurrent neural networkldquo In Proceedings of COL-ING 2016 the 26th International Conference on Computational Lin-guistics Technical Papers pp 2012ndash2021 (cit on p 13)

Kim Young-Bum Karl Stratos Ruhi Sarikaya and Minwoo Jeong(2015) bdquoNew transfer learning techniques for disparate label setsldquoIn Proceedings of the 53rd Annual Meeting of the Association for Com-putational Linguistics and the 7th International Joint Conference onNatural Language Processing (Volume 1 Long Papers) pp 473ndash482

(cit on p 43)Kolhatkar Varada Heike Zinsmeister and Graeme Hirst (2013) bdquoAn-

notating anaphoric shell nouns with their antecedentsldquo In Pro-ceedings of the 7th Linguistic Annotation Workshop and Interoperabil-ity with Discourse pp 112ndash121 (cit on pp 77 78)

Kontostathis April Leon M Galitsky William M Pottenger SomaRoy and Daniel J Phelps (2004) bdquoA survey of emerging trend de-tection in textual data miningldquo In Survey of text mining Springerpp 185ndash224 (cit on p 63)

Krippendorff Klaus (2011) bdquoComputing Krippendorffrsquos alpha- relia-bilityldquo In Annenberg School for Communication (ASC) DepartmentalPapers 43 (cit on p 78)

Kurata Gakuto Bing Xiang Bowen Zhou and Mo Yu (2016) bdquoLever-aging sentence-level information with encoder lstm for semanticslot fillingldquo In arXiv preprint arXiv160101530 (cit on p 13)

Landis J Richard and Gary G Koch (1977) bdquoThe measurement ofobserver agreement for categorical dataldquo In biometrics pp 159ndash174 (cit on p 78)


Le Minh-Hoang Tu-Bao Ho and Yoshiteru Nakamori (2005) bdquoDe-tecting emerging trends from scientific corporaldquo In InternationalJournal of Knowledge and Systems Sciences 22 pp 53ndash59 (cit onp 68)

Le Quoc V and Tomas Mikolov (2014) bdquoDistributed Representationsof Sentences and Documentsldquo In ICML Vol 14 pp 1188ndash1196

(cit on p 74)Lee Ji Young and Franck Dernoncourt (2016) bdquoSequential short-text

classification with recurrent and convolutional neural networksldquoIn arXiv preprint arXiv160303827 (cit on p 13)

Leopold Henrik Han van der Aa and Hajo A Reijers (2018) bdquoIden-tifying candidate tasks for robotic process automation in textualprocess descriptionsldquo In Enterprise Business-Process and Informa-tion Systems Modeling Springer pp 67ndash81 (cit on p 2)

Levenshtein Vladimir I (1966) bdquoBinary codes capable of correctingdeletions insertions and reversalsldquo In Soviet physics doklady 8pp 707ndash710 (cit on p 25)

Liao Vera Masud Hussain Praveen Chandar Matthew Davis Yasa-man Khazaeni Marco Patricio Crasso Dakuo Wang Michael Mu-ller N Sadat Shami Werner Geyer et al (2018) bdquoAll Work andNo Playldquo In Proceedings of the 2018 CHI Conference on HumanFactors in Computing Systems ACM p 3 (cit on p 8)

Lowe Ryan Thomas Nissan Pow Iulian Vlad Serban Laurent Char-lin Chia-Wei Liu and Joelle Pineau (2017) bdquoTraining end-to-enddialogue systems with the ubuntu dialogue corpusldquo In Dialogueamp Discourse 81 pp 31ndash65 (cit on p 22)

Luger Ewa and Abigail Sellen (2016) bdquoLike having a really bad PAthe gulf between user expectation and experience of conversa-tional agentsldquo In Proceedings of the 2016 CHI Conference on HumanFactors in Computing Systems ACM pp 5286ndash5297 (cit on p 8)

MacQueen James (1967) bdquoSome methods for classification and analy-sis of multivariate observationsldquo In Proceedings of the fifth Berkeleysymposium on mathematical statistics and probability Vol 1 OaklandCA USA pp 281ndash297 (cit on p 74)

Manning, Christopher D., Prabhakar Raghavan, and Hinrich Schütze (2008). Introduction to Information Retrieval. Web publication at informationretrieval.org. Cambridge University Press (cit. on p. 74).

Markov Igor (2015) 13 Of 2015rsquos Hottest Topics In Computer ScienceResearch httpswwwforbescomsitesquora2015042213-of-2015s-hottest-topics-in-computer-science-research

fb0d46b1e88d (cit on p 78)Martin James H and Daniel Jurafsky (2009) Speech and language pro-

cessing An introduction to natural language processing computationallinguistics and speech recognition PearsonPrentice Hall Upper Sad-dle River

Mei Qiaozhu Deng Cai Duo Zhang and ChengXiang Zhai (2008)bdquoTopic modeling with network regularizationldquo In Proceedings ofthe 17th international conference on World Wide Web ACM pp 101ndash110 (cit on p 65)


Meyerson Bernard and DiChristina Mariette (2016) These are the top10 emerging technologies of 2016 httpwww3weforumorgdocsGAC16_Top10_Emerging_Technologies_2016_reportpdf (cit onp 78)

Mikolov Tomas Kai Chen Greg Corrado and Jeffrey Dean (2013)bdquoEfficient estimation of word representations in vector spaceldquo InarXiv preprint arXiv13013781 (cit on p 74)

Miller Alexander Adam Fisch Jesse Dodge Amir-Hossein KarimiAntoine Bordes and Jason Weston (2016) bdquoKey-value memorynetworks for directly reading documentsldquo In (cit on p 16)

Mohanty Soumendra and Sachin Vyas (2018) bdquoHow to Compete inthe Age of Artificial Intelligenceldquo In (cit on pp III V 1 2)

Moiseeva, Alena and Hinrich Schütze (2020). "TRENDNERT: A Benchmark for Trend and Downtrend Detection in a Scientific Domain." In: Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (cit. on p. 71).

Moiseeva Alena Dietrich Trautmann Michael Heimann and Hin-rich Schuumltze (2020) bdquoMultipurpose Intelligent Process Automa-tion via Conversational Assistantldquo In arXiv preprint arXiv200102284 (cit on pp 21 41)

Monaghan Fergal Georgeta Bordea Krystian Samp and Paul Buite-laar (2010) bdquoExploring your research Sprinkling some saffron onsemantic web dog foodldquo In Semantic Web Challenge at the Inter-national Semantic Web Conference Vol 117 Citeseer pp 420ndash435

(cit. on p. 68).
Monostori, László, András Márkus, Hendrik Van Brussel, and E. Westkämpfer (1996). "Machine learning approaches to manufacturing." In: CIRP Annals 45.2, pp. 675–712 (cit. on p. 15).

Moody Christopher E (2016) bdquoMixing dirichlet topic models andword embeddings to make lda2vecldquo In arXiv preprint arXiv160502019 (cit on p 74)

Mou Lili Zhao Meng Rui Yan Ge Li Yan Xu Lu Zhang and ZhiJin (2016) bdquoHow transferable are neural networks in nlp applica-tionsldquo In arXiv preprint arXiv160306111 (cit on p 43)

Mrkšić, Nikola, Diarmuid Ó Séaghdha, Tsung-Hsien Wen, Blaise Thomson, and Steve Young (2016). "Neural belief tracker: Data-driven dialogue state tracking." In: arXiv preprint arXiv:1606.03777 (cit. on p. 14).

Nessma Joussef (2015) Top 10 Hottest Research Topics in Computer Sci-ence http www pouted com top - 10 - hottest - research -

topics-in-computer-science (cit on p 78)Nie Binling and Shouqian Sun (2017) bdquoUsing text mining techniques

to identify research trends A case study of design researchldquo InApplied Sciences 74 p 401 (cit on p 68)

Pan Sinno Jialin and Qiang Yang (2009) bdquoA survey on transfer learn-ingldquo In IEEE Transactions on knowledge and data engineering 2210pp 1345ndash1359 (cit on p 41)

Qu Lizhen Gabriela Ferraro Liyuan Zhou Weiwei Hou and Tim-othy Baldwin (2016) bdquoNamed entity recognition for novel types


by transfer learningldquo In arXiv preprint arXiv161009914 (cit onp 43)

Radford Alec Jeffrey Wu Rewon Child David Luan Dario Amodeiand Ilya Sutskever (2019) bdquoLanguage models are unsupervisedmultitask learnersldquo In OpenAI Blog 18 (cit on p 44)

Reese Hope (2015) 7 trends for artificial intelligence in 2016 rsquoLike 2015on steroidsrsquo httpwwwtechrepubliccomarticle7-trends-for-artificial-intelligence-in-2016-like-2015-on-steroids

(cit on p 78)Rivera Maricel (2016) Is Digital Game-Based Learning The Future Of

Learning httpselearningindustrycomdigital-game-based-

learning-future (cit on p 78)Rotolo Daniele Diana Hicks and Ben R Martin (2015) bdquoWhat is an

emerging technologyldquo In Research Policy 4410 pp 1827ndash1843

(cit on p 64)Salatino Angelo (2015) bdquoEarly detection and forecasting of research

trendsldquo In (cit on pp 63 66)Serban Iulian V Alessandro Sordoni Yoshua Bengio Aaron Courville

and Joelle Pineau (2016) bdquoBuilding end-to-end dialogue systemsusing generative hierarchical neural network modelsldquo In Thirti-eth AAAI Conference on Artificial Intelligence (cit on p 11)

Sharif Razavian Ali Hossein Azizpour Josephine Sullivan and Ste-fan Carlsson (2014) bdquoCNN features off-the-shelf an astoundingbaseline for recognitionldquo In Proceedings of the IEEE conference oncomputer vision and pattern recognition workshops pp 806ndash813 (citon p 43)

Shibata Naoki Yuya Kajikawa and Yoshiyuki Takeda (2009) bdquoCom-parative study on methods of detecting research fronts using dif-ferent types of citationldquo In Journal of the American Society for in-formation Science and Technology 603 pp 571ndash580 (cit on p 68)

Shibata Naoki Yuya Kajikawa Yoshiyuki Takeda and KatsumoriMatsushima (2008) bdquoDetecting emerging research fronts basedon topological measures in citation networks of scientific publica-tionsldquo In Technovation 2811 pp 758ndash775 (cit on p 68)

Small Henry (2006) bdquoTracking and predicting growth areas in sci-enceldquo In Scientometrics 683 pp 595ndash610 (cit on p 68)

Small Henry Kevin W Boyack and Richard Klavans (2014) bdquoIden-tifying emerging topics in science and technologyldquo In ResearchPolicy 438 pp 1450ndash1467 (cit on p 64)

Su, Pei-Hao, Nikola Mrkšić, Íñigo Casanueva, and Ivan Vulić (2018). "Deep Learning for Conversational AI." In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Tutorial Abstracts, pp. 27–32 (cit. on p. 12).

Sutskever Ilya Oriol Vinyals and Quoc V Le (2014) bdquoSequence tosequence learning with neural networksldquo In Advances in neuralinformation processing systems pp 3104ndash3112 (cit on pp 16 17)

Trautmann, Dietrich, Johannes Daxenberger, Christian Stab, Hinrich Schütze, and Iryna Gurevych (2019). "Robust Argument Unit Recognition and Classification." In: arXiv preprint arXiv:1904.09688 (cit. on p. 41).


Tu Yi-Ning and Jia-Lang Seng (2012) bdquoIndices of novelty for emerg-ing topic detectionldquo In Information processing amp management 482pp 303ndash325 (cit on p 64)

Turing Alan (1949) London Times letter to the editor (cit on p 7)Ultes Stefan Lina Barahona Pei-Hao Su David Vandyke Dongho

Kim Inigo Casanueva Paweł Budzianowski Nikola Mrkšic Tsu-ng-Hsien Wen et al (2017) bdquoPydial A multi-domain statistical di-alogue system toolkitldquo In Proceedings of ACL 2017 System Demon-strations pp 73ndash78 (cit on p 15)

Vaswani Ashish Noam Shazeer Niki Parmar Jakob Uszkoreit LlionJones Aidan N Gomez Łukasz Kaiser and Illia Polosukhin (2017)bdquoAttention is all you needldquo In Advances in neural information pro-cessing systems pp 5998ndash6008 (cit on p 45)

Vincent Pascal Hugo Larochelle Yoshua Bengio and Pierre-AntoineManzagol (2008) bdquoExtracting and composing robust features withdenoising autoencodersldquo In Proceedings of the 25th internationalconference on Machine learning ACM pp 1096ndash1103 (cit on p 44)

Vodolaacuten Miroslav Rudolf Kadlec and Jan Kleindienst (2015) bdquoHy-brid dialog state trackerldquo In arXiv preprint arXiv151003710 (citon p 14)

Wang Alex Amanpreet Singh Julian Michael Felix Hill Omer Levyand Samuel R Bowman (2018) bdquoGlue A multi-task benchmarkand analysis platform for natural language understandingldquo InarXiv preprint arXiv180407461 (cit on p 44)

Wang, Chong and David M. Blei (2011). „Collaborative topic modeling for recommending scientific articles." In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, pp. 448–456 (cit. on p. 66).
Wang, Xuerui and Andrew McCallum (2006). „Topics over time: a non-Markov continuous-time model of topical trends." In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, pp. 424–433 (cit. on pp. 65–67).
Wang, Zhuoran and Oliver Lemon (2013). „A simple and generic belief tracking mechanism for the dialog state tracking challenge: On the believability of observed information." In: Proceedings of the SIGDIAL 2013 Conference, pp. 423–432 (cit. on p. 14).
Weizenbaum, Joseph et al. (1966). „ELIZA – a computer program for the study of natural language communication between man and machine." In: Communications of the ACM 9.1, pp. 36–45 (cit. on p. 7).
Wen, Tsung-Hsien, Milica Gasic, Nikola Mrksic, Pei-Hao Su, David Vandyke and Steve Young (2015). „Semantically conditioned LSTM-based natural language generation for spoken dialogue systems." In: arXiv preprint arXiv:1508.01745 (cit. on p. 14).
Wen, Tsung-Hsien, David Vandyke, Nikola Mrksic, Milica Gasic, Lina M. Rojas-Barahona, Pei-Hao Su, Stefan Ultes and Steve Young (2016). „A network-based end-to-end trainable task-oriented dialogue system." In: arXiv preprint arXiv:1604.04562 (cit. on pp. 11, 16, 17, 22).

Werner, Alexander, Dietrich Trautmann, Dongheui Lee and Roberto Lampariello (2015). „Generalization of optimal motion trajectories for bipedal walking." In: pp. 1571–1577 (cit. on p. 41).
Williams, Jason D., Kavosh Asadi and Geoffrey Zweig (2017). „Hybrid code networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning." In: arXiv preprint arXiv:1702.03274 (cit. on p. 17).
Williams, Jason, Antoine Raux and Matthew Henderson (2016). „The dialog state tracking challenge series: A review." In: Dialogue & Discourse 7.3, pp. 4–33 (cit. on p. 12).
Wu, Chien-Sheng, Richard Socher and Caiming Xiong (2019). „Global-to-local Memory Pointer Networks for Task-Oriented Dialogue." In: arXiv preprint arXiv:1901.04713 (cit. on pp. 15–17).
Wu, Yonghui, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey et al. (2016). „Google's neural machine translation system: Bridging the gap between human and machine translation." In: arXiv preprint arXiv:1609.08144 (cit. on p. 45).

Xie, Pengtao and Eric P. Xing (2013). „Integrating Document Clustering and Topic Modeling." In: Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence (cit. on p. 74).
Yan, Zhao, Nan Duan, Peng Chen, Ming Zhou, Jianshe Zhou and Zhoujun Li (2017). „Building task-oriented dialogue systems for online shopping." In: Thirty-First AAAI Conference on Artificial Intelligence (cit. on p. 14).
Yogatama, Dani, Michael Heilman, Brendan O'Connor, Chris Dyer, Bryan R. Routledge and Noah A. Smith (2011). „Predicting a scientific community's response to an article." In: Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, pp. 594–604 (cit. on p. 65).
Young, Steve, Milica Gašić, Blaise Thomson and Jason D. Williams (2013). „POMDP-based statistical spoken dialog systems: A review." In: Proceedings of the IEEE 101.5, pp. 1160–1179 (cit. on p. 17).
Zaino, Jennifer (2016). 2016 Trends for Semantic Web and Semantic Technologies. http://www.dataversity.net/2017-predictions-semantic-web-semantic-technologies (cit. on p. 78).
Zhou, Hao, Minlie Huang et al. (2016). „Context-aware natural language generation for spoken dialogue systems." In: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pp. 2032–2041 (cit. on pp. 14, 15).
Zhu, Yukun, Ryan Kiros, Rich Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba and Sanja Fidler (2015). „Aligning books and movies: Towards story-like visual explanations by watching movies and reading books." In: Proceedings of the IEEE International Conference on Computer Vision, pp. 19–27 (cit. on p. 44).


guage. Within this thesis, we realized the IPA conversational bot that takes care of routine and time-consuming tasks regularly performed by human tutors of an online mathematical course. This bot is deployed in a real-world setting within the OMB+ mathematical platform. Conducting experiments for this part, we observed two possibilities to build the conversational agent in industrial settings: first, with purely rule-based methods, considering the missing training data and individual aspects of the target domain (i.e., e-learning). Second, we re-implemented two of the main system components (i.e., Natural Language Understanding (NLU) and Dialogue Manager (DM) units) using the current state-of-the-art deep learning architecture (i.e., Bidirectional Encoder Representations from Transformers (BERT)) and investigated their performance and potential use as a part of a hybrid model (i.e., containing both rule-based and machine learning methods).

The second part of the thesis (Chapter 5 – Chapter 6) considers an IPA subproblem within the predictive analytics domain and addresses the task of scientific trend forecasting. Predictive analytics forecasts future outcomes based on historical and current data. Therefore, using the benefits of advanced analytics models, an organization can, for instance, reliably determine trends and emerging topics and then use this information while making significant business decisions (i.e., investments). In this work, we dealt with the trend detection task; specifically, we addressed the lack of publicly available benchmarks for evaluating trend detection algorithms. We assembled a benchmark for the detection of both scientific trends and downtrends (i.e., topics that become less frequent over time). To the best of our knowledge, the task of downtrend detection has not been addressed before. The resulting benchmark is based on a collection of more than one million documents, which is among the largest that has been used for trend detection before, and therefore offers a realistic setting for the development of trend detection algorithms.

Z U S A M M E N F A S S U N G

Robotic Process Automation (RPA) is a type of software bot that imitates manual human activities such as entering data into a system, logging into user accounts, or executing simple but repetitive workflows (Mohanty and Vyas, 2018). One of the main advantages, and at the same time a drawback, of RPA-bots is their ability to fulfill the assigned task with pinpoint accuracy. On the one hand, such a system is able to carry out the task accurately, diligently, and quickly. On the other hand, it is highly susceptible to changes in defined scenarios. Since an RPA-bot is designed for a specific task, it is often not possible to adapt it to other domains or even to simple changes in a workflow (Mohanty and Vyas, 2018). This inability to adapt to changing conditions gave rise to a further area of improvement for RPA-bots: Intelligent Process Automation (IPA) systems.

IPA-bots combine RPA with Artificial Intelligence (AI) and can perform complex and more cognitively demanding tasks that require, among other things, reasoning and natural language understanding. These systems take over time-consuming and routine tasks, thereby enabling a smart workflow, and free up skilled workers to carry out more complicated tasks. Until now, IPA techniques have mainly been applied in the field of computer vision. Recently, however, Natural Language Processing (NLP) has also become one of the potential applications for IPA, owing to its ability to interpret human language. NLP methods are used to analyze large amounts of textual data and to respond to various inquiries. Likewise, if the available data is unstructured or has no predefined format (e.g., emails), or if it comes in a variable format (e.g., invoices, legal documents), NLP techniques are applied to extract the relevant information, which can then be used to solve various problems.

However, NLP in the context of IPA is not limited to the extraction of relevant data from text documents. One of the central applications of IPA are conversational agents, which are used for human-machine interaction. Conversational agents are in enormous demand because they are able to support a large number of users simultaneously while communicating in a natural language. The implementation of a chat system comprises different types of NLP subtasks, starting with natural language understanding (e.g., intent recognition, entity extraction), through dialogue management (e.g., determining the next possible bot action), up to response generation. A typical dialogue system for IPA purposes usually handles uncomplicated customer service requests (e.g., answering FAQs), so that employees can concentrate on more complex inquiries.

This dissertation covers two areas that are united by the broader theme, namely Intelligent Process Automation (IPA) using statistical Natural Language Processing (NLP) methods.

The first block of this work (Chapter 2 – Chapter 4) deals with the implementation of a conversational agent for IPA purposes within the e-learning domain. As already mentioned, chatbots are one of the central applications of the IPA domain, since they can effectively perform time-consuming tasks in a natural language. The IPA conversational bot realized in this work likewise takes care of routine and time-consuming tasks otherwise performed by tutors in an online mathematics course in German. This bot is in daily use within the mathematical platform OMB+. While conducting experiments, we observed two possibilities for developing the conversational agent in an industrial setting: first, with purely rule-based methods, under the conditions of missing training data and particular aspects of the target domain (i.e., e-learning). Second, we re-implemented two of the main system components (natural language understanding module, dialogue manager) with the currently most advanced deep learning algorithm and investigated the performance of these components.

The second part of the doctoral thesis (Chapter 5 – Chapter 6) considers an IPA problem within the predictive analytics domain. Predictive analytics aims to make forecasts about future outcomes on the basis of historical and current data. With the help of such forecasting systems, a company can, for example, reliably determine trends or newly emerging topics and then use this information in important business decisions (e.g., investments). In this part of the work, we deal with the subproblem of trend forecasting, in particular with the lack of publicly available benchmarks for evaluating trend detection algorithms. We assembled and published a benchmark for detecting both trends and downtrends. To the best of our knowledge, the task of downtrend detection has not been addressed before. The resulting benchmark is based on a collection of more than one million documents, which is among the largest used for trend detection so far, and thus offers a realistic setting for the development of trend detection algorithms.

A C K N O W L E D G E M E N T S

This thesis was possible due to several people who contributed significantly to my work. I owe my gratitude to each of them for guidance, encouragement, and inspiration during my doctoral study.

First of all, I would like to thank my advisor, Prof. Dr. Hinrich Schütze, for his mentorship, support, and thoughtful guidance. My special gratitude goes to Prof. Dr. Ruedi Seiler and his fantastic team at Integral-Learning GmbH. I am thankful for your valuable input, interest in our collaborative work, and for your continuous support. I would also like to show gratitude to Prof. Dr. Alexander Fraser for co-advising, support, and helpful feedback.

Finally, none of this would be possible without my family's support; there are no words to express the depth of gratitude I feel for them. My earnest appreciation goes to my husband Dietrich: thank you for all the time you were by my side and for the encouragement you provided me through these years. I am very grateful to my parents for their understanding and backing at any moment. I appreciate the love of my grandparents; it is hard to realize how significant their care is for me. Besides that, I am also grateful to my sister Alexandra, elder brother Denis, and parents-in-law for merely being always there for me.

C O N T E N T S

List of Figures XI
List of Tables XII
Acronyms XIII
1 introduction 1
1.1 Outline 3

i e-learning conversational assistant 5
2 foundations 7
2.1 Introduction 7
2.2 Conversational Assistants 9
2.2.1 Taxonomy 9
2.2.2 Conventional Architecture 12
2.2.3 Existing Approaches 15
2.3 Target Domain & Task Definition 18
2.4 Summary 19
3 multipurpose ipa via conversational assistant 21
3.1 Introduction 21
3.1.1 Outline and Contributions 22
3.2 Model 23
3.2.1 OMB+ Design 23
3.2.2 Preprocessing 23
3.2.3 Natural Language Understanding 25
3.2.4 Dialogue Manager 26
3.2.5 Meta Policy 30
3.2.6 Response Generation 32
3.3 Evaluation 37
3.3.1 Automated Evaluation 37
3.3.2 Human Evaluation & Error Analysis 37
3.4 Structured Dialogue Acquisition 39
3.5 Summary 39
3.6 Future Work 40
4 deep learning re-implementation of units 41
4.1 Introduction 41
4.1.1 Outline and Contributions 42
4.2 Related Work 43
4.3 BERT: Bidirectional Encoder Representations from Transformers 44
4.4 Data & Descriptive Statistics 46
4.5 Named Entity Recognition 48
4.5.1 Model Settings 49
4.6 Next Action Prediction 49
4.6.1 Model Settings 49
4.7 Evaluation and Results 50
4.8 Error Analysis 58
4.9 Summary 58
4.10 Future Work 59

ii clustering-based trend analyzer 61
5 foundations 63
5.1 Motivation and Challenges 63
5.2 Detection of Emerging Research Trends 64
5.2.1 Methods for Topic Modeling 65
5.2.2 Topic Evolution of Scientific Publications 67
5.3 Summary 68
6 benchmark for scientific trend detection 71
6.1 Motivation 71
6.1.1 Outline and Contributions 72
6.2 Corpus Creation 73
6.2.1 Underlying Data 73
6.2.2 Stratification 73
6.2.3 Document representations & Clustering 74
6.2.4 Trend & Downtrend Estimation 75
6.2.5 (Down)trend Candidates Validation 75
6.3 Benchmark Annotation 77
6.3.1 Inter-annotator Agreement 77
6.3.2 Gold Trends 78
6.4 Evaluation Measure for Trend Detection 79
6.5 Trend Detection Baselines 80
6.5.1 Configurations 81
6.5.2 Experimental Setup and Results 81
6.6 Benchmark Description 82
6.7 Summary 83
6.8 Future Work 83
7 discussion and future work 85

bibliography 87

L I S T O F F I G U R E S

Figure 1 Taxonomy of dialogue systems 10
Figure 2 Example of a conventional dialogue architecture 12
Figure 3 OMB+ online learning platform 24
Figure 4 Rule-Based System: Dialogue flow 36
Figure 5 Macro F1 evaluation, NAP – default model. Comparison between German and multilingual BERT 52
Figure 6 Macro F1 test, NAP – default model. Comparison between German and multilingual BERT 53
Figure 7 Macro F1 evaluation, NAP – extended model 54
Figure 8 Macro F1 test, NAP – extended model 55
Figure 9 Macro F1 evaluation, NER – Comparison between German and multilingual BERT 56
Figure 10 Macro F1 test, NER – Comparison between German and multilingual BERT 57
Figure 11 (Down)trend detection: Overall distribution of papers in the entire dataset 73
Figure 12 A trend candidate (positive slope). Topic: Sentiment and Emotion Analysis (using Social Media Channels) 76
Figure 13 A downtrend candidate (negative slope). Topic: XML 76

L I S T O F T A B L E S

Table 1 Natural language representation of a sentence 13
Table 2 Rule 1: Admissible configurations 27
Table 3 Rule 2: Admissible configurations 27
Table 4 Rule 3: Admissible configurations 27
Table 5 Rule 4: Admissible configurations 28
Table 6 Rule 5: Admissible configurations 28
Table 7 Case 1: ID completeness 31
Table 8 Case 2: ID completeness 31
Table 9 Showcase 1: Short flow 33
Table 10 Showcase 2: Organisational question 33
Table 11 Showcase 3: Fallback and human-request policies 34
Table 12 Showcase 4: Contextual question 34
Table 13 Showcase 5: Long flow 35
Table 14 General statistics for conversational dataset 46
Table 15 Detailed statistics on possible system actions in conversational dataset 48
Table 16 Detailed statistics on possible named entities in conversational dataset 48
Table 17 F1-results for the Named Entity Recognition task 51
Table 18 F1-results for the Next Action Prediction task 51
Table 19 Average dialogue accuracy computed for the Next Action Prediction task 51
Table 20 Trend detection: List of gold trends 79
Table 21 Trend detection: Example of MAP evaluation 80
Table 22 Trend detection: Ablation results 82

A C R O N Y M S

AI Artificial Intelligence

API Application Programming Interface

BERT Bidirectional Encoder Representations from Transformers

BiLSTM Bidirectional Long Short Term Memory

CRF Conditional Random Fields

DM Dialogue Manager

DST Dialogue State Tracker

DSTC Dialogue State Tracking Challenge

EC Equivalence Classes

ES ElasticSearch

GM Gaussian Models

IAA Inter Annotator Agreement

ID Informational Dictionary

IPA Intelligent Process Automation

ITS Intelligent Tutoring System

LDA Latent Dirichlet Allocation

LSA Latent Semantic Analysis

LSTM Long Short Term Memory

MAP Mean Average Precision

MER Mutually Exclusive Rules

NAP Next Action Prediction

NER Named Entity Recognition

NLG Natural Language Generation

NLP Natural Language Processing

NLU Natural Language Understanding

OMB+ Online Mathematik Brückenkurs

PL Policy Learning

pLSA probabilistic Latent Semantic Analysis

REST Representational State Transfer

RNN Recurrent Neural Network

RPA Robotic Process Automation

RegEx Regular Expressions

TL Transfer Learning

ToDS Task-oriented Dialogue System


1 I N T R O D U C T I O N

Robotic Process Automation (RPA) is a type of software bot that imitates manual human activities like entering data into a system, logging into accounts, or executing simple but repetitive workflows (Mohanty and Vyas, 2018). In other words, RPA-systems allow business applications to work without human involvement by replicating human worker activities. A further benefit of RPA-bots is that they are much cheaper compared to the employment of a human worker, and they are also able to work around the clock. Overall, the advantages of software bots are hugely appealing and include cost savings, accuracy and compliance while executing processes, improved responsiveness (i.e., bots are faster than human workers), and, finally, such bots are usually agile and multi-skilled (Ivancic, Vugec and Vukšic, 2019).

However, one of the main benefits and, at the same time, drawbacks of RPA-systems is their ability to precisely fulfill the assigned task. On the one hand, the bot can carry out the task accurately and diligently. On the other hand, it is highly susceptible to changes in defined scenarios. Being designed for a particular task, the RPA-bot is often not adaptable to other domains or even simple changes in a workflow (Mohanty and Vyas, 2018). This inability to adapt to changing conditions gave rise to a further area of improvement for RPA-bots – Intelligent Process Automation (IPA) systems.

IPA-bots combine Robotic Process Automation (RPA) with Artificial Intelligence (AI) and thus can perform complex and more cognitively demanding tasks that require reasoning and language understanding. Hence, IPA-bots go beyond automating simple "click tasks" (as in RPA) and can accomplish jobs more intelligently, employing machine learning algorithms and advanced analytics. Such IPA-systems undertake time-consuming and routine tasks and thus enable smart workflows and free up skilled workers to accomplish higher-value activities.

Previously, IPA techniques were primarily employed within the computer vision domain, starting with the automation of simple display click-tasks and going towards sophisticated self-driving cars with their ability to interpret a stream of situational data obtained from sensors and cameras. However, in recent times, Natural Language Processing (NLP) became one of the potential applications for IPA as well, due to its ability to understand and interpret human language. NLP methods are used to analyze large amounts of textual data and to respond to various inquiries. Furthermore, if the available data is unstructured and does not have any predefined format (i.e., customer emails), or if it is available in a variable format (i.e., invoices, legal documents), then Natural Language Processing (NLP) techniques are applied to extract the relevant information. This data can then be utilized to solve distinct problems (i.e., natural language understanding, predictive analytics) (Mohanty and Vyas, 2018). For instance, in the case of predictive analytics, such a bot would make forecasts about the future using current and historical data and therethrough assist with sales or investments.

However, NLP in the context of Intelligent Process Automation (IPA) is not limited to the extraction of relevant data and insights from textual documents. One of the central applications of NLP within the IPA domain are conversational interfaces (e.g., chatbots) that are used to enable human-to-machine interaction. This type of software is commonly used for customer support or within recommendation and booking systems. Conversational agents gain enormous demand due to their ability to simultaneously support a large number of users while communicating in a natural language. In a conventional chatbot system, a user provides input in a natural language; the bot then processes this query to extract potentially useful information and responds with a relevant reply. This process comprises multiple stages and involves different types of NLP subtasks, starting with Natural Language Understanding (NLU) (e.g., intent recognition, named entity extraction) and going towards dialogue management (i.e., determining the next possible bot's action considering the dialogue history) and response generation (e.g., converting the semantic representation of the next bot's action into a natural language utterance). A typical dialogue system for Intelligent Process Automation (IPA) purposes usually undertakes shallow customer support requests (e.g., answering of FAQs), allowing human workers to focus on more sophisticated inquiries.

Natural Language Processing (NLP) techniques may also be used indirectly for IPA purposes. There are attempts to automatically identify and classify tasks extracted from textual process descriptions as manual, user, or automated. The goal of such an NLP application is to reduce the effort required to identify suitable candidates for further robotic process automation (Friedrich, Mendling and Puhlmann, 2011; Leopold, Aa and Reijers, 2018).

Intelligent Process Automation (IPA) is an emerging technology and evolves mainly due to the recognized potential benefit of combining Robotic Process Automation (RPA) and Artificial Intelligence (AI) techniques to solve industrial problems (Ivancic, Vugec and Vukšic, 2019; Mohanty and Vyas, 2018). Industries are typically interested in being responsive to customers and markets (Mohanty and Vyas, 2018). Thus, most customer support applications need to be real-time, active, and highly robust to achieve higher client satisfaction rates. The manual accomplishment of such tasks is often time-consuming, expensive, and error-prone. Furthermore, due to the massive amounts of textual data available for analysis, it does not seem feasible to process this data entirely by human means.

1.1 outline

This thesis covers two topics united by the broader theme, which is Intelligent Process Automation (IPA) utilizing statistical Natural Language Processing (NLP) methods.

the first block of the thesis (Chapter 2 – Chapter 4) addresses the challenge of implementing a conversational agent for Intelligent Process Automation (IPA) purposes within the e-learning field. As mentioned above, chatbots are one of the central applications of the IPA domain, since they can effectively perform time-consuming tasks requiring natural language understanding, allowing human workers to concentrate on more valuable tasks. The IPA conversational bot that was realized within this thesis takes care of routine and time-consuming tasks performed by human tutors within an online mathematical course. This bot is deployed in a real-world setting and interacts with students within the OMB+ mathematical platform. Conducting experiments, we observed two possibilities to build the conversational agent in industrial settings: first, with purely rule-based methods, considering the missing training data and individual aspects of the target domain (i.e., e-learning). Second, we re-implemented two of the main system components (i.e., Natural Language Understanding (NLU) and Dialogue Manager (DM) units) using the current state-of-the-art deep learning algorithm (i.e., Bidirectional Encoder Representations from Transformers (BERT)) and investigated their performance and potential use as a part of a hybrid model (i.e., containing both rule-based and machine learning methods).

the second block of the thesis (Chapter 5 – Chapter 6) considers an Intelligent Process Automation (IPA) problem within the predictive analytics domain and addresses the task of scientific trend detection. Predictive analytics makes forecasts about future outcomes based on historical and current data. Hence, organizations can benefit through advanced analytics models by detecting trends and their behaviors and then employing this information while making crucial business decisions (i.e., investments). In this work, we deal with the subproblem of the trend detection task: specifically, we address the lack of publicly available benchmarks for the evaluation of trend detection algorithms. Therefore, we assembled a benchmark for the detection of both trends and downtrends (i.e., topics that become less frequent over time). To the best of our knowledge, the task of downtrend detection has not been well addressed before. Furthermore, the resulting benchmark is based on a collection of more than one million documents, which is among the largest that has been used for trend detection before, and therefore offers a realistic setting for developing trend detection algorithms.

All main chapters contain their specific introduction, contribution list, related work section, and summary.

Part I

E - L E A R N I N G C O N V E R S A T I O N A L A S S I S T A N T

2 F O U N D A T I O N S

In this block, we address the challenge of designing and implementing a conversational assistant for Intelligent Process Automation (IPA) purposes within the e-learning domain. The system aims to support human tutors of the Online Mathematik Brückenkurs (OMB+) platform during the tutoring round by undertaking routine and time-consuming responsibilities. Specifically, the system intelligently automates the process of information accumulation and validation that precedes every conversation. Therethrough, the IPA-bot frees up tutors and allows them to concentrate on more complicated and cognitively demanding tasks.

In this chapter, we introduce the fundamental concepts required by the topic of the first part of the thesis. We begin with the foundations of conversational assistants in Section 2.2, where we present the taxonomy of existing dialogue systems in Section 2.2.1, give an overview of the conventional dialogue architecture with its most essential components in Section 2.2.2, and describe the existing approaches for building a conversational agent in Section 2.2.3. We finally introduce the target domain for our experiments in Section 2.3.

2.1 introduction

Conversational assistants – a type of software that interacts with people via written or spoken natural language – have become widely popularized in recent times. Today's conversational assistants support a broad range of everyday skills, including but not limited to creating a reminder, answering factoid questions, and controlling smart home devices.

Initially, the notion of an artificial companion capable of conversing in a natural language was anticipated by Alan Turing in 1949 (Turing, 1949). Still, the first significant step in this direction was made in 1966 with the appearance of Weizenbaum's system ELIZA (Weizenbaum, 1966). Its successor was ALICE (Artificial Linguistic Internet Computer Entity) – a natural language processing dialogue system that conversed with a human by applying heuristic pattern matching rules. Despite being remarkably popular and technologically advanced for their days, both systems were unable to pass the Turing test (Alan, 1950). Following a long silent period, the next wave of growth of intelligent agents was caused by the advances in Natural Language Processing (NLP). This was enabled by access to low-cost memory, higher processing speed, vast amounts of available data, and sophisticated machine learning algorithms. Hence, one of the most notable leaps in the history of conversational agents began in the year 2011 with intelligent personal assistants. At that time, Apple's Siri, followed by Microsoft's Cortana and Amazon's Alexa in 2014, and then Google Assistant in 2016, came to the scene. The main focus of these agents was not to mimic humans, as in ELIZA or ALICE, but to assist a human by answering simple factoid questions and setting notification tasks across a broad range of content. Another type of conversational agent, the Task-oriented Dialogue System (ToDS), became a focus of industries in the year 2016. For the most part, these bots aimed to support customer service (Cui et al., 2017) and respond to Frequently Asked Questions (Liao et al., 2018). Their conversational style provided more depth than that of personal assistants but a narrower range, since they were patterned for task solving and not for open-domain discussions.

One of the central advantages of conversational agents, and the reason why they are attracting considerable interest, is their ability to give attention to several users simultaneously while supporting natural language communication. Due to this, conversational assistants gain momentum in numerous kinds of practical support applications (i.e., customer support) where a high number of users are involved at the same time. Classical engagement of human assistants is very cost-inefficient, and such support is usually not full-time accessible. Therefore, automating human assistance is desirable, but also an ambitious task. Due to the lack of solid Natural Language Understanding (NLU), advancing beyond primary tasks is still challenging (Luger and Sellen, 2016), and even the most prosperous dialogue systems often fail to meet users' expectations (Jain et al., 2018). Significant challenges involved in the design and implementation of a conversational agent vary from domain engineering to Natural Language Understanding (NLU) and the design of an adaptive domain-specific conversational flow. The quintessential problems and how-to questions while building conversational assistants include:

• How to deal with variability and flexibility of language?

• How to get high-quality in-domain data?

• How to build meaningful representations?

• How to integrate commonsense and domain knowledge?

• How to build a robust system?

Besides that, current Natural Language Processing (NLP) and Artificial Intelligence (AI) techniques have weaknesses when applied to domain-specific task-oriented problems, especially if the underlying data comes from an industrial source. As machine learning techniques heavily rely on structured and cleaned training data to deliver consistent performance, typically noisy real-world data without access to additional knowledge databases negatively impacts the classification accuracy.

Despite that, conversational agents could be successfully employed to solve less cognitive but still highly relevant tasks. In this case, a dialogue agent could be operated as a part of the Intelligent Process Automation (IPA) system, where it takes care of straightforward but routine and time-consuming functions performed in a natural language, while allowing a human worker to focus on more cognitively demanding processes.

2.2 conversational assistants

The development and design of a particular dialogue system usually vary by its objectives, among the most fundamental of which are:

• Purpose: Should it be an open-ended dialogue system (e.g., chit-chat), a task-oriented system (e.g., customer support), or a personal assistant (e.g., Google Assistant)?

• Skills: What kind of behavior should a system demonstrate?

• Data: Which data is required to develop (resp. train) specific blocks, and how could it be assembled and curated?

• Deployment: Will the final system be integrated into a particular software platform? What are the challenges in choosing suitable tools? How will the deployment happen?

• Implementation: Should it be a rule-based, information-retrieval, machine-learning, or a hybrid system?

In the following sections, we will take a look at the most common taxonomies of dialogue systems and at different design and implementation strategies.

2.2.1 Taxonomy

According to their goals, conversational agents could be classified into two main categories: task-oriented systems (e.g., for customer support) and social bots (e.g., for chit-chat purposes). Furthermore, systems also vary regarding their implementation characteristics. Below we provide a detailed overview of the most common variations.

task-oriented dialogue systems are famous for their ability to lead a dialogue with the ultimate goal of solving a user's problem (see Figure 1). A customer support agent or a flight/restaurant booking system exemplifies this class of conversational assistants. Such systems are more meaningful and useful than those with an entirely social goal; however, they usually entail more complications during the implementation stage. Since Task-oriented Dialogue Systems (ToDS) require a precise understanding of the user's intent (i.e., goal), the Natural Language Understanding (NLU) segment must perform flawlessly. Besides that, if applied in an industrial setting, ToDS are usually built modular, which means they mostly rely on handcrafted rules and predefined templates and are restricted in their abilities. Personal assistants (i.e., Google Assistant) are members of the task-oriented family as well. Though, compared to standard agents, they involve more social conversation and could be used for entertainment purposes (e.g., "Ok Google, tell me a joke"). The latter is typically not available in conventional task-oriented systems.

social bots and chit-chat systems, in contrast, regularly do not hold any specific goal and are designed for small-talk and entertainment conversations (see Figure 1). Those agents are usually end-to-end trainable and highly data-driven, which means they require large amounts of training data. However, due to the entirely learnable training, such systems often produce inappropriate responses, making them extremely unreliable for practical applications. Like chit-chat systems, social bots are rich in social conversation; however, they also possess an ability to solve uncomplicated user requests and conduct a more meaningful dialogue (Fang et al., 2018). Those requests are not the same as in task-oriented systems and are mostly pointed at answering simple factoid questions.

Figure 1: Taxonomy of dialogue systems according to the measure of their task accomplishment and social contribution. Figure originates from Fang et al., 2018.

The aforementioned system types could be decomposed further according to their implementation characteristics:

open domain & closed domain: A Task-oriented Dialogue System (ToDS) is typically built within a closed-domain setting. The space of potential inputs and outputs is restricted, since the system attempts to achieve a specific goal. Customer support or shopping assistants are cases of closed-domain problems (Wen et al., 2016). These systems do not need to be able to talk about off-site topics, but they have to fulfill their particular tasks as efficiently as possible.

Chit-chat systems, in contrast, do not assume a well-defined goal or intention and thus could be built within an open domain. This means the conversation can progress in any direction and with minimal or no functional purpose (Serban et al., 2016). However, the infinite number of possible topics and the fact that a certain amount of commonsense is required to create reasonable responses make an open-domain setting a hard problem.

content-driven & user-centric: Social bots and chit-chat systems typically utilize a content-driven dialogue manner. That means that their responses are based on large and dynamic content derived from daily web mining and knowledge graphs. This approach builds a dialogue based on trendy content from diverse sources, and thus it is an ideal way for social bots to catch users' attention and bring a user into the dialogue.

User-centric approaches, in turn, assume precise language understanding to detect the sentiment of users' statements. According to the sentiment, the dialogue manager learns users' personalities, handles rapid topic changes, and tracks engagement. In this case, the dialogue builds on specific user interests.

long & short conversations: Models with a short conversation style attempt to create a single response to a single input (e.g., weather forecast). In the case of a long conversation, a system has to go through multiple turns and keep track of what has been said before (i.e., the dialogue history). The longer the conversation, the more challenging it is to train a system. Customer support conversations are typically long conversational threads with multiple questions.

response generation: One of the essential elements of a dialogue system is response generation. This can be done either in a retrieval-based manner or by using generative models (i.e., Natural Language Generation (NLG)).

The former usually utilizes predefined responses and heuristics to select a relevant response based on the input and context. The heuristic could be a rule-based expression match (i.e., Regular Expressions (RegEx)) or an ensemble of machine learning classifiers (e.g., entity extraction, prediction of the next action). Such systems do not generate any original text; instead, they select a response from a fixed set of predefined candidates. These models are usually used in an industrial setting, where the system must show high robustness, performance, and interpretability of the outcomes.
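To make the retrieval-based manner concrete, the following minimal sketch ranks a small pool of predefined responses by TF-IDF cosine similarity to the user input. The candidate texts and the single-heuristic scoring are illustrative assumptions only; deployed systems typically combine several hand-tuned rules and classifiers rather than one similarity score.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    # Toy pool of predefined candidate responses (hypothetical content).
    candidates = [
        "You can reset your password in the account settings.",
        "Our support team is available from 9 am to 5 pm.",
        "You can track your order on the delivery page.",
    ]

    vectorizer = TfidfVectorizer().fit(candidates)
    candidate_vectors = vectorizer.transform(candidates)

    def select_response(user_input: str) -> str:
        # Rank all predefined candidates by similarity to the user input
        # and return the best-matching one (no text is generated).
        scores = cosine_similarity(vectorizer.transform([user_input]),
                                   candidate_vectors)[0]
        return candidates[scores.argmax()]

    print(select_response("how do i reset my password"))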

The latter, in contrast, do not rely on predefined replies. Since such models are typically based on (deep) machine learning techniques, they attempt to generate new responses from scratch. This is, however, a complicated task and is mostly used for vanilla chit-chat systems or in a research domain where structured training data is available.

2.2.2 Conventional Architecture

In this and the following subsection, we review the pipeline and methods for Task-oriented Dialogue Systems (ToDS). Such agents are typically composed of several components (Figure 2), addressing diverse tasks required to facilitate a natural language dialogue (Williams, Raux and Henderson, 2016).

Figure 2: Example of a conventional dialogue architecture with its main components in the bounding box: Language Understanding (NLU), Dialogue Manager (DM), Response Generation (NLG). Figure originates from the NAACL 2018 Tutorial, Su et al., 2018.

A conventional dialogue architecture implements this with the following components:

natural language understanding unit has the following objectives: it parses the user input, classifies the domain (i.e., customer support, restaurant booking; not required for single-domain systems) and the intent (i.e., find-flight, book-taxi), and extracts entities (i.e., Italian food, Berlin).

Generally, the NLU unit maps every user utterance into semantic frames that are predefined according to different scenarios. Table 1 illustrates an example of a sentence representation, where the departure location (Berlin) and the arrival location (Munich) are specified as slot values, and the domain (airline travel) and intent (find-flight) are specified as well. Typically, there are two kinds of possible representations. The first one is the utterance category, represented in the form of the user's intent. The second could be word-level information extraction in the form of Named Entity Recognition (NER) or Slot Filling tasks.

Sentence         Find   flight   from   Berlin   to   Munich   today
Slots/Concepts   O      O        O      B-dept   O    B-arr    B-date
Named Entity     O      O        O      B-city   O    B-city   O
Intent           Find_Flight
Domain           Airline Travel

Table 1: An illustrative example of a sentence representation.1
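As an illustration of such a semantic frame, the minimal sketch below fills the frame of Table 1 using naive regular expressions. The field names and toy patterns are hypothetical placeholders for whatever rule-based or learned NLU component produces the frame in a real system.

    import re

    def parse_utterance(text: str) -> dict:
        # Build a simple semantic frame: domain, intent, and slot values.
        frame = {"domain": "airline_travel", "intent": None, "slots": {}}
        if re.search(r"\b(find|book)\b.*\bflight\b", text, re.I):
            frame["intent"] = "find_flight"
        match = re.search(r"from (\w+) to (\w+)", text, re.I)
        if match:
            frame["slots"]["departure_city"] = match.group(1)
            frame["slots"]["arrival_city"] = match.group(2)
        if re.search(r"\btoday\b", text, re.I):
            frame["slots"]["date"] = "today"
        return frame

    print(parse_utterance("Find flight from Berlin to Munich today"))
    # {'domain': 'airline_travel', 'intent': 'find_flight',
    #  'slots': {'departure_city': 'Berlin', 'arrival_city': 'Munich', 'date': 'today'}}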

A dialog act classification sub-module, also known as intent classification, is dedicated to detecting the primary user's goal at every dialogue state. It labels each user's utterance with one of the predefined intents. Deep neural networks can be directly applied to this conventional classification problem (Khanpour, Guntakandla and Nielsen, 2016; Lee and Dernoncourt, 2016).

Slot filling is another sub-module for language understanding. Unlike intent detection (i.e., a classification problem), slot filling is defined as a sequence labeling problem, where words in the sequence are labeled with semantic tags. The input is a sequence of words, and the output is a sequence of slots, one for every word. Variations of Recurrent Neural Network (RNN) architectures (LSTM, BiLSTM) (Deoras et al., 2015) and (attention-based) encoder-decoder architectures (Kurata et al., 2016) have been employed for this task.
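The sketch below shows the sequence-labeling formulation with a generic BiLSTM tagger in PyTorch: one slot tag is predicted per input token, as in the Slots/Concepts row of Table 1. The vocabulary size, tag set size, and dimensions are toy assumptions, and the model is not the exact architecture of any of the works cited above.

    import torch
    import torch.nn as nn

    class BiLSTMTagger(nn.Module):
        def __init__(self, vocab_size, tagset_size, emb_dim=64, hidden_dim=64):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)
            self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                                bidirectional=True)
            self.out = nn.Linear(2 * hidden_dim, tagset_size)

        def forward(self, token_ids):                    # (batch, seq_len)
            hidden, _ = self.lstm(self.embed(token_ids))
            return self.out(hidden)                      # (batch, seq_len, tagset_size)

    # One slot tag per token, e.g. O O O B-dept O B-arr B-date.
    model = BiLSTMTagger(vocab_size=1000, tagset_size=7)
    tokens = torch.randint(0, 1000, (1, 7))              # a single 7-token utterance
    tag_scores = model(tokens)
    predicted_tags = tag_scores.argmax(dim=-1)           # indices into the slot tag set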


dialogue manager unit is subdivided into two components: the first is a Dialogue State Tracker (DST) that holds the conversational state, and the second is a Policy Learning (PL) unit that defines the system's next action.

The Dialogue State Tracker (DST) is the core component to guarantee a robust dialogue flow, since it manages the user's intent at every turn. The conventional methods, which are broadly applied in commercial implementations, often adopt handcrafted rules to pick the most probable candidate (Goddeau et al., 1996; Wang and Lemon, 2013) among the predefined ones. Though, a variety of statistical approaches have emerged since the Dialogue State Tracking Challenge (DSTC)2 was arranged by Jason D. Williams in the year 2013. Among the deep-learning based approaches is the Belief Tracker proposed by Henderson, Thomson and Young (2013). It employs a sliding window to produce a sequence of probability distributions over an arbitrary number of possible values (Chen et al., 2017). In the work of Mrkšić et al. (2016), the authors introduced a Neural Belief Tracker to identify the slot-value pairs. The system used as input the user intents preceding the user input, the user utterance itself, and a candidate slot-value pair which it needs to decide about. Then the system iterated over all candidate slot-value pairs to determine which ones have just been expressed by the user (Chen et al., 2017). Finally, Vodolán, Kadlec and Kleindienst proposed a Hybrid Dialog State Tracker that achieved the state-of-the-art performance for the DSTC2 shared task. Their model used a separately learned decoder coupled with a rule-based system.

Based on the state representation from the Dialogue State Tracker (DST), the PL unit generates the next system action, which is then decoded by the Natural Language Generation (NLG) module. Usually, a rule-based agent is used to initialize the system (Yan et al., 2017), and then supervised learning is applied to the actions generated by these rules. Alternatively, the dialogue policy can be trained end-to-end with reinforcement learning to manage the system's policy making toward the final performance (Cuayáhuitl, Keizer and Lemon, 2015).
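As a purely illustrative sketch of the handcrafted variant of these two components, the snippet below keeps the most recent value for every slot (state tracking) and picks the next system action with simple rules (policy). The slot and action names are assumptions for the toy flight scenario above and do not correspond to a specific published tracker.

    REQUIRED_SLOTS = ["departure_city", "arrival_city", "date"]

    def track_state(state: dict, nlu_frame: dict) -> dict:
        # Dialogue State Tracker: merge newly extracted slot values,
        # keeping the most recent value for every slot.
        state = dict(state)
        state.update({k: v for k, v in nlu_frame.get("slots", {}).items() if v})
        return state

    def next_action(state: dict) -> str:
        # Handcrafted policy: request missing slots, otherwise act on the goal.
        for slot in REQUIRED_SLOTS:
            if slot not in state:
                return f"request_{slot}"
        return "offer_flight"

    state = {}
    state = track_state(state, {"slots": {"departure_city": "Berlin"}})
    print(next_action(state))   # request_arrival_city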

natural language generation unit transforms the next action into a natural language response. Conventional approaches to Natural Language Generation (NLG) typically adopt a template-based style, where a set of rules is defined to map every next action to a predefined natural language response. Such approaches are simple, generally error-free, and easy to control; however, their implementation is time-consuming, and the style is repetitive and hardly scalable.
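Continuing the toy example from the dialogue manager sketch, a template-based NLG unit can be as small as a lookup table from action names to response templates. The action names and template texts below are hypothetical.

    TEMPLATES = {
        "request_arrival_city": "Where would you like to fly to?",
        "request_date": "On which date would you like to travel?",
        "offer_flight": "I found a flight from {departure_city} to {arrival_city} on {date}.",
    }

    def generate(action: str, state: dict) -> str:
        # Map the chosen system action to its predefined surface form,
        # filling slot values from the tracked dialogue state.
        return TEMPLATES[action].format(**state)

    print(generate("request_arrival_city", {"departure_city": "Berlin"}))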

Among the deep learning techniques is the work by Wen et al. (2015), who introduced neural network-based approaches to NLG with an LSTM-based structure. Later, Zhou and Huang (2016) adopted this model along with the question information, semantic slot values, and dialogue act types to generate more precise answers. Finally, Wu, Socher and Xiong (2019) proposed Global-to-local Memory Pointer Networks (GLMP) – an architecture that incorporates knowledge bases directly into a learning framework (due to the Memory Networks component) and is able to re-use information from the user input (due to the Pointer Networks component).

2 Currently the Dialog System Technology Challenge.

In particular cases, the dialogue flow can include speech recognition and speech synthesis units (if managing a spoken natural language system) and a connection to third-party APIs to request information stored in external databases (e.g., weather forecast), as depicted in Figure 2.

2.2.3 Existing Approaches

As stated above, a conventional task-oriented dialogue system is implemented in a pipeline manner. That means that once the system receives a user query, it interprets it and acts according to the dialogue state and the corresponding policy. Additionally, based on the understanding, it may access the knowledge base to find the demanded information there. Finally, the system transforms the system's next action (and, if given, the extracted information) into its surface form as a natural language response (Monostori et al., 1996).

Individual components (i.e., Natural Language Understanding (NLU), Dialogue Manager (DM), Natural Language Generation (NLG)) of a particular conversational system could be implemented using different approaches, starting with entirely rule- and template-based methods and going towards hybrid approaches (using learnable components along with handcrafted units) and end-to-end trainable machine learning methods.

rule-based approaches Though many of the latest research approaches handle Natural Language Understanding (NLU) and Natural Language Generation (NLG) units by using statistical Natural Language Processing (NLP) models (Bocklisch et al., 2017; Burtsev et al., 2018; Honnibal and Montani, 2017), most of the industrially deployed dialogue systems still utilise handcrafted features and rules for the state tracking, action prediction, intent recognition, and slot filling tasks (Chen et al., 2017; Ultes et al., 2017). So, most of the PyDial3 framework modules offer rule-based implementations using Regular Expressions (RegEx), rule-based dialogue trackers (Henderson, Thomson and Williams, 2014), and handcrafted policies. Also, in order to map the next action to text, Ultes et al. (2017) suggest rules along with a template-based response generation.

3 PyDial Framework: http://www.camdial.org/pydial

The rule-based approach ensures robustness and stable performancethat is crucial for industrial systems that communicate with a largenumber of users simultaneously However it is highly expensive andtime-consuming to deploy a real dialogue system built in that man-ner The major disadvantage is that the usage of handcrafted systemsis restricted to a specific domain and possible adaptation requiresextensive manual engineering

end-to-end learning approaches Due to the recent advance of end-to-end neural generative models (Collobert et al., 2011), many efforts have been made to build an end-to-end trainable architecture for conversational agents. Rather than using the traditional pipeline (see Figure 2), the end-to-end model is conceived as a single module (Chen et al., 2017).

Wen et al. (2016), followed by Bordes, Boureau, and Weston (2016), were among the pioneers who adapted an end-to-end trainable approach for conversational systems. Their approach was to treat dialogue learning as the problem of learning a mapping from dialogue histories to system responses and to apply an encoder-decoder model (Sutskever, Vinyals, and Le, 2014) to train the entire system. The proposed model still had significant limitations (Glasmachers, 2017), such as missing policies to learn a dialogue flow and the inability to incorporate additional knowledge, which is especially crucial for training a task-oriented agent. To overcome the first problem, Dhingra et al. (2016) introduced an end-to-end reinforcement learning approach that jointly trains the dialogue state tracker and policy learning units. The second weakness was addressed by Eric and Manning (2017), who augmented existing recurrent network architectures with a differentiable attention-based key-value retrieval mechanism over the entries of a knowledge base. This approach was inspired by Key-Value Memory Networks (Miller et al., 2016). Later on, Wu, Socher, and Xiong (2019) proposed Global to Local Memory Pointer Networks (GLMP), where the authors incorporated knowledge bases directly into a learning framework. In their model, a global memory encoder and a local memory decoder are employed to share external knowledge. The encoder encodes the dialogue history, modifies the global contextual representation, and generates a global memory pointer. The decoder first produces a sketch response with empty slots. Next, it passes the global memory pointer to filter the external knowledge for relevant information and then instantiates the slots via the local memory pointers.

Despite having better adaptability compared to any rule-based system and being simpler to train, end-to-end approaches remain unattainable for commercial conversational agents that operate on real-world data. A well and carefully constructed task-oriented dialogue system in an observed domain, using handcrafted rules (in the NLU and DST units) and predefined responses for NLG, still outperforms the end-to-end systems due to its robustness (Glasmachers, 2017; Wu, Socher, and Xiong, 2019).

hybrid approaches Though end-to-end learning is an attractive solution for conversational systems, current techniques are data-intensive and require large amounts of dialogues to learn simple behaviors. To overcome this barrier, Williams, Asadi, and Zweig (2017) introduce Hybrid Code Networks (HCNs), which are an ensemble of retrieval and trainable units. The system utilizes a Recurrent Neural Network (RNN) for the Natural Language Understanding (NLU) block, domain-specific knowledge that is encoded as software (i.e., API calls), and custom action templates used for the Natural Language Generation (NLG) unit. Compared to existing end-to-end methods, the authors report that their approach considerably reduces the amount of data required for training (Williams, Asadi, and Zweig, 2017). According to the authors, HCNs achieve state-of-the-art performance on the bAbI dialog dataset (Bordes, Boureau, and Weston, 2016) and outperform two commercial customer dialogue systems4.

Along with this work, Wen et al. (2016) propose a neural network-based model that is end-to-end trainable but still modularly connected. In their work, the authors treat dialogue as a sequence-to-sequence mapping problem (Sutskever, Vinyals, and Le, 2014), augmented with the dialogue history and the current database search outcome (Wen et al., 2016). The authors explain the learning process as follows: At every turn, the system takes a user input represented as a sequence of tokens and converts it into two internal representations: a distributed representation of the intent and a probability distribution over slot-value pairs called the belief state (Young et al., 2013). The database operator then picks the most probable values in the belief state to form the search result. At the same time, the intent representation and the belief state are transformed and combined by a Policy Learning (PL) unit to form a single vector representing the next system action. This system action is then used to generate a sketch response token by token. The final system response is composed by substituting the actual values of the database entries into the sketch sentence structure (Wen et al., 2016).

Hybrid models appear poised to replace the established rule- and template-based approaches currently utilized in industrial settings. The performance of most Natural Language Understanding (NLU) components (i.e., Named Entity Recognition (NER), domain and intent classification) is reasonably high, so they can be applied for the development of commercial conversational agents, whereas the Natural Language Generation (NLG) unit could still be realized as a template-based solution by employing predefined response candidates and predicting the next system action with deep learning methods.

4 Internal Microsoft Research datasets in the customer support and troubleshooting domains were employed for training and evaluation.


23 target domain & task definition

MUMIE5 is an open-source e-learning platform for studying and teaching mathematics and computer science. Online Mathematik Brückenkurs (OMB+) is one of the projects powered by MUMIE. Specifically, Online Mathematik Brückenkurs (OMB+)6 is a German online learning platform that assists students who are preparing for an engineering or computer science study at a university.

The course's central purpose is to assist students in reviving their mathematical skills so that they can follow the upcoming university courses. This online course is thematically segmented into 13 parts and includes free mathematical classes with theoretical and practical content. Every chapter begins with an overview of its sections and consists of explanatory texts with built-in videos, examples, and quick checks of understanding. Furthermore, various examination modes to practice the content (i.e., training exercises) and to check the understanding of the theory (i.e., quiz) are available. Besides that, OMB+ provides the possibility to get assistance from a human tutor. Usually, the students and tutors interact via a chat interface in written form. The language of communication is German.

The current dilemma of the OMB+ platform is that the number of students grows significantly every year, but it is challenging to find qualified tutors – and it is expensive to hire more human tutors. This results in a longer waiting period for students until their problems can be considered and processed. In general, all student questions can be grouped into three main categories: organizational questions (i.e., course, certificate), contextual questions (i.e., content, theorem), and mathematical questions (exercises, solutions). To assist a student with a mathematical question, a tutor has to know the following regular information: What kind of topic (or subtopic) does the student have a problem with? At which examination mode (quiz, chapter-level training or exercise, section-level training or exercise, or final examination) is the student working right now? And finally, the exact question number and exact problem formulation. This means that a tutor has to ask the same onboarding questions every time a new conversation starts. This is, however, very time-consuming and could be successfully solved within an Intelligent Process Automation (IPA) system.

The work presented in Chapter 3 and Chapter 4 was enabled by the collaboration with the MUMIE and Online Mathematik Brückenkurs (OMB+) team and addresses the abovementioned problem. Within Chapter 3, we implemented an Intelligent Process Automation (IPA) dialogue system with rule- and retrieval-based algorithms that interacts with students and performs the onboarding. This system is multipurpose: on the one hand, it eases the assistance process for human tutors by reducing repetitive and time-consuming activities and therefore allows workers to focus on more challenging tasks. Second, by interacting

5 Mumie: https://www.mumie.net
6 https://www.ombplus.de


with students, it augments the resources with structured and labeled training data. The latter, in turn, allowed us to re-train specific system components utilizing machine and deep learning methods. We describe this procedure in Chapter 4.

We report that the earlier collected human-to-human data from the OMB+ platform could not be used for training a machine learning system, since it is highly unstructured and contains no direct question-answer matching, which is crucial for trainable conversational algorithms. We analyzed these conversations to gain insights that we then used to define the dialogue flow.

We also do not consider the automatization of mathematical questions in this work, since this task would require not only precise language understanding and extensive commonsense knowledge but also the accurate perception of mathematical notation. The tutoring process at OMB+ differs considerably from standard tutoring methods: here, instructors attempt to guide students by giving hints and tips instead of providing an out-of-the-box solution. Existing human-to-human dialogues revealed that such a task can be challenging even for human tutors, since it is not always directly perceptible what precisely the student does not understand or has difficulties with. Furthermore, dialogues containing mathematical explanations can last for a long time, and the topic (i.e., intent and final goal) can shift multiple times during the conversation.

To our knowledge, this task would not be feasible to solve sufficiently well with current machine learning algorithms, especially considering the missing training data. A rule-based system would not be the right solution for this purpose either, due to the large number of complex rules that would need to be defined. Thus, in this work, we focus on the automatization of less cognitively demanding tasks, which are still relevant and must be solved in the first instance to ease the tutoring process for instructors.

24 summary

Conversational agents are in high demand in industry due to their potential ability to support a large number of users in parallel while performing a flawless natural language conversation. Many messenger platforms are created, and custom chatbots in various constellations emerge daily. Unfortunately, most of the current conversational agents are often disappointing to use due to the lack of substantial natural language understanding and reasoning. Furthermore, Natural Language Processing (NLP) and Artificial Intelligence (AI) techniques have limitations when applied to domain-specific and task-oriented problems, especially if no structured training data is available.

Despite that, conversational agents can be successfully applied to solve less cognitive but still highly relevant tasks – that is, within the Intelligent Process Automation (IPA) domain. Such IPA-bots would undertake tedious and time-consuming responsibilities to free up the knowledge worker for more complex and intricate tasks. IPA-bots can be seen as members of the goal-oriented dialogue family and are, for the most part, realized either in a rule-based manner or through hybrid models when it comes to a real-world setting.


3 multipurpose ipa via conversational assistant

This chapter partially covers work already published at international peer-reviewed conferences. The relevant publication is Moiseeva et al., 2020. The research described in this chapter was carried out in its entirety by the author of this thesis. The other authors of the publication acted as advisers. At the time of writing this thesis, the system described in this work was deployed at the Online Mathematik Brückenkurs (OMB+) platform.

As we outlined in the introduction, Intelligent Process Automation (IPA) is an emerging technology with the primary goal of assisting the knowledge worker by taking care of repetitive, routine, and low-cognitive tasks. Conversational assistants that interact with users in a natural language are a potential application for the IPA domain. Such smart virtual agents can support the user by responding to various inquiries and performing routine tasks carried out in a natural language (i.e., customer support).

In this work, we tackle the challenge of implementing an IPA conversational agent in a real-world industrial context and under conditions of missing training data. Our system has two meaningful benefits: First, it decreases monotonous and time-consuming tasks and hence lets workers concentrate on more intellectual processes. Second, by interacting with users, it augments the resources with structured and labeled training data. The latter, in turn, can be used to retrain the system utilizing machine and deep learning methods.

31 introduction

Conversational dialogue systems can give attention to several users simultaneously while supporting natural communication. They are thus exceptionally needed for practical assistance applications where a high number of users are involved at the same time. Regularly, dialogue agents require a lot of natural language understanding and commonsense knowledge to hold a conversation reasonably and intelligently. Therefore, it is nowadays still challenging to completely substitute human assistance with an intelligent agent. However, a dialogue system could be successfully employed as part of an Intelligent Process Automation (IPA) application. In this case, it would take care of simple but routine and time-consuming tasks performed


in a natural language, while allowing a human worker to focus on more cognitively demanding processes.

Recent research in the dialogue generation domain is conducted by employing Artificial Intelligence (AI) techniques like machine and deep learning (Lowe et al., 2017; Wen et al., 2016). However, those methods require high-quality data for training that is often not directly available for domain-specific and industrial problems. Even though companies gather and store vast volumes of data, it is often of low quality with regard to machine learning approaches (Daniel, 2015), especially if it concerns dialogue data, which has to be properly structured as well as annotated. Whereas, if trained on artificially created datasets (which is often the case in the research setting), the systems can solve only specific problems and are hardly adaptable to other domains. Hence, despite the popularity of deep learning end-to-end models, one still needs to rely on traditional pipelines in practical dialogue engineering, mainly while introducing a new domain.

311 Outline and Contributions

This work addresses the challenge of implementing a dialogue system for IPA purposes within the practical e-learning domain under conditions of missing training data. The chapter is structured as follows: In Section 32, we introduce our method and describe the design of the rule-based dialogue architecture with its main components, such as the Natural Language Understanding (NLU), Dialogue Manager (DM), and Natural Language Generation (NLG) units. In Section 33, we present the results of our dual-focused evaluation, and Section 34 describes the format of the collected structured data. Finally, we conclude in Section 35.

Our contributions within this work are as follows

• We implemented a robust conversational system for IPA purposes within a practical e-learning domain and under the conditions of missing training (i.e., dialogue) data.

• The system has two main objectives:

– First, it reduces repetitive and time-consuming activities and allows workers of the e-learning platform to focus solely on mathematical inquiries.

– Second, by interacting with users, it augments the resources with structured and partially labeled training data for the further implementation of trainable dialogue components.


32 model

The central purpose of the introduced system is to interact with students at the beginning of every chat conversation and gather information on the topic (and sub-topic), examination mode and level, question number, and exact problem formulation. Such an onboarding process saves time for human tutors and allows them to handle mathematical questions solely. Furthermore, the system is realized in a way such that it accumulates labeled and structured dialogues in the background.

Figure 4 displays the overall conversation flow. After the system accepts user input, it analyzes it at different levels and extracts information if such is provided. If some of the information required for tutoring is missing, the system asks the student to provide it. When all the information points are collected, they are automatically validated and forwarded to a human tutor. The latter can then directly proceed with the assistance process. In the following, we explain the key components of the system.

321 OMB+ Design

Figure 3 illustrates the internal structure and design of the OMB+ platform. It has topics and sub-topics, as well as four different examination modes. Each topic (Figure 3, tag 1) corresponds to a chapter level and always has sub-topics (Figure 3, tag 2) that correspond to a section level. The examination modes training and exercise are ambiguous because they correspond to either a chapter (Figure 3, tag 3) or a section (Figure 3, tag 5) level, and it is important to differentiate between them since they contain different types of content. The mode final examination (Figure 3, tag 4) always corresponds to a chapter level, whereas quiz (Figure 3, tag 5) can belong only to a section level. According to the design of the OMB+ platform, there are several ways in which a possible dialogue flow can proceed.

322 Preprocessing

In a natural language conversation, one may respond in various forms, making the extraction of data from user-generated text challenging due to potential misspellings or confusable spellings (i.e., Aufgabe 1a, Aufgabe 1 (a)). Hence, to facilitate a substantial recognition of intents and extraction of entities, we normalize (e.g., correct misspellings, detect synonyms) and preprocess every user input before moving forward to the Natural Language Understanding (NLU) module.


Figure 3: OMB+ Online Learning Platform, where 1 is the Topic (corresponds to a chapter level), 2 is a Sub-Topic (corresponds to a section level), 3 is the chapter-level examination mode, 4 is the Final Examination mode (available only at chapter level), 5 are the Examination Modes Exercise, Training (available at section levels), and Quiz (available only at section level), and 6 is the OMB+ Chat.

We perform both linguistic and domain-specific preprocessing, which includes the following steps (a minimal normalization sketch follows the list):

• lowercasing and stemming of words in the input query;

• removal of stop words and punctuation;

• all mentions of X in mathematical formulations are removed to avoid confusion with the roman number 10 ("X");

• in a combination of the type word "Kapitel/Aufgabe" + digit written as a word (i.e., "eins", "zweites", etc.), the word is replaced with a digit ("Kapitel eins" → "Kapitel 1", "im fünften Kapitel" → "im Kapitel 5"); roman numbers are replaced with a digit as well ("Kapitel IV" → "Kapitel 4");

• detected ambiguities are normalized (e.g., "Trainingsaufgabe" → "Training");

• recognized misspellings resp. type errors are corrected (e.g., "Difeernzialrechnung" → "Differentialrechnung");

• permalinks are parsed and analyzed; from each permalink it is possible to extract the topic, examination mode, and question number.
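The following minimal sketch illustrates the domain-specific part of this normalization; the lookup tables and the exact rules are illustrative assumptions, not the deployed rule set:

    import re

    # Illustrative lookup tables (assumptions, not the full production lists).
    NUMBER_WORDS = {"eins": "1", "zwei": "2", "drei": "3", "vier": "4", "fuenf": "5"}
    ROMAN_NUMBERS = {"ii": "2", "iii": "3", "iv": "4", "v": "5", "vi": "6"}
    MISSPELLINGS = {"difeernzialrechnung": "differentialrechnung"}
    SYNONYMS = {"trainingsaufgabe": "training"}

    def preprocess(query: str) -> str:
        text = query.lower()
        # normalize known misspellings and ambiguous terms
        for wrong, right in {**MISSPELLINGS, **SYNONYMS}.items():
            text = text.replace(wrong, right)
        # "kapitel eins" -> "kapitel 1", "aufgabe iv" -> "aufgabe 4"
        def to_digit(match: re.Match) -> str:
            word = match.group(2)
            digit = NUMBER_WORDS.get(word) or ROMAN_NUMBERS.get(word) or word
            return f"{match.group(1)} {digit}"
        text = re.sub(r"\b(kapitel|aufgabe)\s+([a-zäöüß]+)\b", to_digit, text)
        # drop free-standing "x" (a variable in formulas) to avoid clashes with roman 10
        text = re.sub(r"\bx\b", " ", text)
        return re.sub(r"\s+", " ", text).strip()

    print(preprocess("Kapitel eins, Aufgabe IV"))  # -> "kapitel 1, aufgabe 4"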

[ September 4 2020 at 1017 ndash classicthesis ]

32 model 25

323 Natural Language Understanding

We implement the Natural Language Understanding (NLU) unit utilizing handcrafted rules, Regular Expressions (RegEx), and the Elasticsearch1 API.

intent classification According to the design of the OMB+platform all student questions can be classified into three cate-gories organizational questions (ie course certificate) contextualquestions (ie content theorem) and mathematical questions (exer-cises solutions) To classify the input query by its intent (ie cat-egory) we employ weighted key-word information in the formof handcrafted rules We assume that specific words are explic-itly associated with a corresponding intent (eg theorem or rootdenote the mathematical question feedback or certificate denoteorganizational inquire) If no intent could be ordered then itis assumed that the NLU unit was not capable of understand-ing and the intent is unknown In this case the virtual agentrequests the user to provide an intent manually by picking onefrom the mentioned three options The queries from organiza-tional and theoretical categories are directly handed over to ahuman tutor while mathematical questions are analyzed by theautomated agent for further information extraction
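A minimal sketch of such weighted keyword matching is given below; the keyword lists, weights, and threshold are illustrative assumptions rather than the hand-tuned rules used in the deployed system:

    # Illustrative keyword weights per intent; the deployed rules are hand-tuned.
    INTENT_KEYWORDS = {
        "mathematical":   {"aufgabe": 2.0, "theorem": 1.5, "wurzel": 1.5, "quiz": 1.0},
        "organizational": {"zertifikat": 2.0, "feedback": 1.5, "anmeldung": 1.0},
        "contextual":     {"regel": 1.5, "abschnitt": 1.0, "definition": 1.0},
    }

    def classify_intent(tokens: list[str], threshold: float = 1.0) -> str:
        scores = {
            intent: sum(weight for token in tokens
                        for keyword, weight in keywords.items() if keyword in token)
            for intent, keywords in INTENT_KEYWORDS.items()
        }
        best_intent, best_score = max(scores.items(), key=lambda item: item[1])
        # below the threshold the intent stays "unknown" and is asked for manually
        return best_intent if best_score >= threshold else "unknown"

    print(classify_intent("wie bekomme ich ein zertifikat".split()))  # -> organizational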

entity extraction In the next step, the system retrieves the entities from a user message. We specify the following five prerequisite entities, which have to be collected before the dialogue can be handed over to a human tutor: topic, sub-topic, examination mode and level, and question number. This part is implemented using ElasticSearch (ES) and RegEx. To facilitate the use of ES, we indexed the OMB+ web page in an internal database. Besides indexing the titles of topics and sub-topics, we also provided supplementary information on possible synonyms and writing styles. We additionally stored OMB+ permalinks, which point to the site pages. To query the resulting database, we employ the internal Elasticsearch multi_match function and set the minimum_should_match parameter to 20%. This parameter specifies the number of terms that must match for a document to be considered relevant. Besides that, we adopted fuzziness with the maximum edit distance set to 2 characters; the fuzzy query uses similarity based on Levenshtein edit distance (Levenshtein, 1966). Finally, the system produces a ranked list of potential matching entries found in the database within the predefined relevance threshold (we set it to θ = 1.5). We then pick the most probable entry as the right one and select the corresponding entity from the user input.
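A sketch of such a lookup with the Python Elasticsearch client is shown below; the index and field names are assumptions, while the fuzziness, minimum_should_match, and threshold values mirror the settings described above:

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")  # assumed local instance

    def lookup_entity(query_text: str, index: str = "omb_topics", threshold: float = 1.5):
        body = {
            "query": {
                "multi_match": {
                    "query": query_text,
                    "fields": ["title", "synonyms"],   # assumed field names
                    "fuzziness": 2,                    # Levenshtein edit distance <= 2
                    "minimum_should_match": "20%",
                }
            }
        }
        hits = es.search(index=index, body=body)["hits"]["hits"]
        # keep only hits above the relevance threshold and return the best-scoring one
        hits = [hit for hit in hits if hit["_score"] >= threshold]
        return hits[0]["_source"] if hits else None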

Overall, the Natural Language Understanding (NLU) module receives the user input as preprocessed text and examines it across all

1 https://www.elastic.co/products/elasticsearch


predefined RegEx statements and for a match in the ElasticSearch (ES) database. We use ES to retrieve entities for the topic, sub-topic, and examination mode, whereas RegEx is used to extract the information on a question number. Every time an entity is extracted, it is filled into the Informational Dictionary (ID). The ID has the following six slots to be filled in: topic, sub-topic, examination level, examination mode, question number, and exact problem formulation (this information is requested in further steps).

324 Dialogue Manager

A conventional Dialogue Manager (DM) consists of two interacting components: the first component is the Dialogue State Tracker (DST), which maintains a representation of the current conversational state, and the second is Policy Learning (PL), which defines the next system action (i.e., response).

In our model, every agent's next action is determined by the state of the previously obtained information accumulated in the Informational Dictionary (ID). For instance, if the system recognizes that the student works on the final examination, it also knows (defined by hand-coded logic in the predefined rules) that there is no necessity to ask for a sub-topic, because the final examination always corresponds to a chapter level (due to the layout of the OMB+ platform). If the system recognizes that the user struggles with solving a quiz, it has to ask for both the corresponding topic and sub-topic, because a quiz always relates to a section level.

To discover all of the potential conversational flows, we implement Mutually Exclusive Rules (MER), which indicate that two events e_1 and e_2 are mutually exclusive or disjoint since they cannot both occur at the same time (i.e., the intersection of these events is empty: P(A ∩ B) = 0). Additionally, we defined transition and mapping rules. Hence, only particular entities can co-occur, while others must be excluded from the specific rule combination. Assume a topic t = Geometry. According to MER, this topic can co-occur with one of the pertaining sub-topics defined at OMB+ (i.e., s = Angle). Thus, the level of the examination would be l = Section (but l ≠ Chapter), because the student already reported a sub-section he or she is working on. In this specific case, the examination mode could be e ∈ {Training, Exercise, Quiz}, but not e = Final_Examination. This is because e = Final_Examination can co-occur only with a topic (chapter level) but not with a sub-topic (section level). The question number q can be arbitrary and has to co-occur with every other combination.


This allows a formal solution to be found. Assume a list of all theoretically possible dialogue states S = [topic, sub-topic, training, exercise, chapter level, section level, quiz, final examination, question number]; for each element s_n in S it holds that

s_n = \begin{cases} 1 & \text{if the information is given} \\ 0 & \text{otherwise} \end{cases}    (1)

This gives all general (resp. possible) dialogue states without reference to the design of the OMB+ platform. However, to make the dialogue states fully suitable for OMB+, from the general states we select only those which are valid. To define the validity of a state, we specify the following five MERs.

R1: Topic

¬T
 T

Table 2: Rule 1 – Admissible topic configurations.

Rule (R1) in Table 2 denotes the admissible configurations for topic and means that the topic can either be given (T) or not (¬T).

R2: Examination Mode

¬TR  ¬E  ¬Q  ¬FE
 TR  ¬E  ¬Q  ¬FE
¬TR   E  ¬Q  ¬FE
¬TR  ¬E   Q  ¬FE
¬TR  ¬E  ¬Q   FE

Table 3: Rule 2 – Admissible examination mode configurations.

Rule (R2) in Table 3 denotes that either no information on the examination mode is given, or the examination mode is Training (TR), Exercise (E), Quiz (Q), or Final Examination (FE), but not more than one mode at the same time.

R3: Level

¬CL  ¬SL
 CL  ¬SL
¬CL   SL

Table 4: Rule 3 – Admissible level configurations.

Rule (R3) in Table 4 indicates that either no level information is provided, or the level corresponds to the chapter level (CL) or to the section level (SL), but not to both at the same time.


R4: Examination & Level

¬TR  ¬E  ¬SL  ¬CL
 TR  ¬E   SL  ¬CL
 TR  ¬E  ¬SL   CL
 TR  ¬E  ¬SL  ¬CL
¬TR   E   SL  ¬CL
¬TR   E  ¬SL   CL
¬TR   E  ¬SL  ¬CL

Table 5: Rule 4 – Admissible examination mode (only for Training and Exercise) and corresponding level configurations.

Rule (R4) in Table 5 means that the Training (TR) and Exercise (E) examination modes can belong either to the chapter level (CL) or to the section level (SL), but not to both at the same time.

R5: Topic & Sub-Topic

¬T  ¬ST
 T  ¬ST
¬T   ST
 T   ST

Table 6: Rule 5 – Admissible topic and corresponding sub-topic configurations.

Rule (R5) in Table 6 means that we could be given only a topic (T), the combination of a topic and a sub-topic (ST) at the same time, only a sub-topic, or no information on this point at all.

We then define a valid dialogue state as a dialogue state that meets all requirements of the abovementioned rules:

\forall s \, \bigl(\mathrm{State}(s) \land R_{1 \dots 5}(s)\bigr) \rightarrow \mathrm{Valid}(s)    (2)

Once we have the valid states for dialogues, we perform a mapping from each valid dialogue state to the next possible system action. For that, we first define five transition rules. The order of the transition rules is important.

T_1 = \lnot T    (3)

means that no topic (T) is found in the ID (i.e., it could not be extracted from the user input).

T_2 = \lnot EM    (4)

indicates that no examination mode (EM) is found in the ID.


T_3 = EM, \text{ where } EM \in \{TR, E\}    (5)

denotes that the extracted examination mode (EM) is either Training (TR) or Exercise (E).

T_4 = \begin{cases} T, \lnot ST, TR, SL \\ T, \lnot ST, E, SL \\ T, \lnot ST, Q \end{cases}    (6)

means that no sub-topic (ST) is provided by the user, but the ID already contains either the combination of topic (T), training (TR), and section level (SL), or the combination of topic, exercise (E), and section level, or the combination of topic and quiz (Q).

T_5 = \lnot QNR    (7)

indicates that no question number (QNR) was provided by the student (or it could not be successfully extracted).

Finally, we assumed the following list of possible next actions for the system:

A = \{\text{Ask for Topic}, \text{Ask for Examination Mode}, \text{Ask for Level}, \text{Ask for Sub-Topic}, \text{Ask for Question Nr}\}    (8)

Following the transition rules, we map each valid dialogue state to the possible next action a_m in A:

\exists a \, \forall s \, \bigl(\mathrm{Valid}(s) \land T_1(s)\bigr) \rightarrow a(s, T)    (9)

in the case where no topic is provided, the next action is to ask for the topic (T);

\exists a \, \forall s \, \bigl(\mathrm{Valid}(s) \land T_2(s)\bigr) \rightarrow a(s, EM)    (10)

if no examination mode is provided by the user (or it could not be successfully extracted from the user query), the next action is defined as ask for examination mode (EM);

\exists a \, \forall s \, \bigl(\mathrm{Valid}(s) \land T_3(s)\bigr) \rightarrow a(s, L)    (11)

in the case where we know that the examination mode EM ∈ {Training, Exercise}, we have to ask about the level (i.e., training at chapter level or training at section level); thus, the next action is ask for level (L);


\exists a \, \forall s \, \bigl(\mathrm{Valid}(s) \land T_4(s)\bigr) \rightarrow a(s, ST)    (12)

if no sub-topic is provided, but the examination mode EM ∈ {Training, Exercise} is at section level, the next action is defined as ask for sub-topic (ST);

\exists a \, \forall s \, \bigl(\mathrm{Valid}(s) \land T_5(s)\bigr) \rightarrow a(s, QNR)    (13)

if no question number is provided by the user, then the next action is ask for question number (QNR).

Following this scheme, we generate 56 state transitions, which specify the next system actions. When in a new conversation state, we compare the extracted (or updated) data in the Informational Dictionary (ID) with the valid dialogue states and select the mapped action as the next system action.
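The following sketch condenses the ordered transition rules T1–T5 into executable form; the slot and action names are illustrative, and the real system resolves the mapping via the precomputed 56 state transitions rather than via if-statements:

    # Minimal sketch of the ordered transition rules T1-T5 over the ID slots.
    def next_action(id_state: dict) -> str | None:
        topic     = id_state.get("topic")
        sub_topic = id_state.get("sub_topic")
        mode      = id_state.get("exam_mode")   # "training", "exercise", "quiz", "final_exam"
        level     = id_state.get("level")       # "chapter" or "section"
        question  = id_state.get("question_nr")

        if topic is None:                                         # T1
            return "ask_for_topic"
        if mode is None:                                          # T2
            return "ask_for_examination_mode"
        if mode in ("training", "exercise") and level is None:    # T3
            return "ask_for_level"
        if sub_topic is None and (mode == "quiz" or
                                  (mode in ("training", "exercise") and level == "section")):  # T4
            return "ask_for_sub_topic"
        if question is None:                                      # T5
            return "ask_for_question_nr"
        return None  # ID complete -> proceed to the verification step

    print(next_action({"topic": "Geometrie", "exam_mode": "quiz", "question_nr": None}))
    # -> "ask_for_sub_topic"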

flexibility of dst The DST transitions outlined above are meant to reflect the current configuration of the OMB+ learning platform. Still, further MERs could be effortlessly generated to create new transitions. To examine this, we performed tests with the previous design of the platform, where all the topics except for the first one had sub-topics, whereas the first topic contained both sub-topics and sub-sub-topics. We could produce the missing transitions with the described approach; the number of permissible transitions in this case increased from 56 to 117.

325 Meta Policy

Since the proposed system is expected to operate in a real-world setting, we had to develop additional policies that manage the dialogue flow and verify the system's accuracy. We explain these policies below.

completeness of informational dictionary The model automatically verifies the completeness of the Informational Dictionary (ID), which is determined by the number of essential slots filled in the informational dictionary. There are 6 discrete cases in which the ID is deemed to be complete. For example, if a user works on the final examination mode, the agent does not have to request a sub-topic or examination level. Hence, the ID has to be filled only with data for the topic, examination type, and question number. Whereas, if the user works on the training mode, the system has to collect information about the topic, sub-topic, examination level, examination mode, and the question number. Once the ID is complete, it is passed to the verification step; otherwise, the agent proceeds according to the next action. The system extracts information in each dialogue state, and therefore, if the user gives updated information on any subject later in the dialogue history, the corresponding slot will be updated in the ID.

Below are examples of two final cases (out of six) where the Informational Dictionary (ID) is considered to be complete.

\mathrm{Case}_1 = \begin{cases} \text{intent} = \text{Math} \\ \text{topic} \neq \text{None} \\ \text{exam mode} = \text{Final Examination} \\ \text{level} = \text{None} \\ \text{question nr} \neq \text{None} \end{cases}    (14)

Table 7: Case 1 – Any topic, examination mode is final examination, examination level does not matter, any question number.

\mathrm{Case}_2 = \begin{cases} \text{intent} = \text{Math} \\ \text{topic} \neq \text{None} \\ \text{sub-topic} \neq \text{None} \\ \text{exam mode} \in \{\text{Training}, \text{Exercise}\} \\ \text{level} = \text{Section} \\ \text{question nr} \neq \text{None} \end{cases}    (15)

Table 8: Case 2 – Any topic, any related sub-topic, examination mode is either training or exercise, examination level is section, any question number.

verification step After the system has collected all the essential data (i.e., the ID is complete), it continues to the final verification step. Here, the obtained data is presented to the current student in the dialogue session. The student is asked to review the correctness of the data and, if any entries are faulty, to correct them. The Informational Dictionary (ID) is, where necessary, updated with the user-provided data. This procedure repeats until the user confirms the correctness of the compiled data.

fallback policy In the cases where the system fails to retrieve information from a student's query, it re-asks the user and attempts to retrieve the information from a follow-up response. The maximum number of re-ask trials is a parameter and is set to three (r = 3). If the system is still incapable of extracting the information, the user input is considered as the ground truth and filled into the appropriate slot in the ID. An exception to this rule applies where the user has to specify the intent manually: after three unclassified trials, a dialogue session is directly handed over to a human tutor.
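A minimal sketch of this re-ask loop is shown below; the function names are illustrative, and only the retry limit r = 3 is taken from the description above:

    MAX_RETRIES = 3  # r = 3 re-ask trials

    def fill_slot_with_fallback(ask, extract, is_intent_slot=False):
        """ask() prompts the user and returns the reply; extract() returns a value or None."""
        reply = None
        for _ in range(MAX_RETRIES):
            reply = ask()
            value = extract(reply)
            if value is not None:
                return value
        if is_intent_slot:
            return "HUMAN_HANDOVER"  # an unresolved intent goes straight to a tutor
        return reply                 # the last raw reply is stored as ground truth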


human request In every dialogue state, a user can switch to a human tutor. For this, the user can enter the "human" (or "Mensch" in the German language) keyword. Therefore, every user message is checked for the presence of this keyword.

326 Response Generation

The semantic representation of the system's next action is transformed into a natural language utterance. Hence, each possible action is mapped to precisely one response that is predefined in a template. Some of the responses are fixed (i.e., "Welches Kapitel bearbeitest du gerade?")2. Others have placeholders for custom values. In the latter case, the utterance can be formulated depending on the Informational Dictionary (ID).

Assume the predefined response "Deine Angaben waren wie folgt: a) Kapitel K, b) Abschnitt A, c) Aufgabentyp M, d) Aufgabe Q, e) Ebene E. Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein."3 The slots [K, A, M, Q, E] are the placeholders for the information stored in the ID. During the conversation, the system selects the template according to the next action and fills the slots with extracted values if required. After the extracted information is filled into the corresponding placeholders in the utterance, the final response could be as follows: "Deine Angaben waren wie folgt: a) Kapitel: I Elementares Rechnen, b) Abschnitt: Rechenregel für Potenzen, c) Aufgabentyp: Quiz, d) Aufgabe: 1(a), e) Ebene: Abschnitt. Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein."4
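A minimal sketch of this template filling is given below; the template store and slot keys are illustrative:

    # Illustrative template store; {K}/{A}/{M}/{Q}/{E} mirror the placeholders above.
    TEMPLATES = {
        "ask_for_topic": "Welches Kapitel bearbeitest du gerade?",
        "final_statement": ("Deine Angaben waren wie folgt: a) Kapitel {K}, b) Abschnitt {A}, "
                            "c) Aufgabentyp {M}, d) Aufgabe {Q}, e) Ebene {E}. "
                            "Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein."),
    }

    def render(action: str, info: dict) -> str:
        # fixed templates simply ignore the unused slot values
        return TEMPLATES[action].format(**info)

    print(render("final_statement",
                 {"K": "I Elementares Rechnen", "A": "Rechenregel für Potenzen",
                  "M": "Quiz", "Q": "1(a)", "E": "Abschnitt"}))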

Below, we demonstrate five sample dialogues in the original German language with variable flows:

• a short dialogue where certain information is already provided in the first student input;

• an organizational question case;

• a contextual question case;

• an example with manual intent classification;

• an example with the fallback and human policies;

• a long flow with entry correction in the final verification phase.

2 Translates to: "Which chapter are you currently working on?"
3 Translates to: "Your details were as follows: a) Chapter, b) Section, c) Examination Mode, d) Question Number, e) Level. Did I understand you correctly? Please answer yes or no."
4 Translates to: "Your details were as follows: a) Elementary Calculus, b) Power Laws, c) Quiz, d) Question 1 (a), e) Section Level. Did I understand you correctly? Please answer yes or no."


Student: Hallo, ich habe Schwierigkeiten beim Kapitel I Elementares Rechnen mit der Übungsaufgabe 1 a.

Next-Action: [Ask for a Level: Chapter or Section]
Bot: Geht es um eine Aufgabe auf Kapitel-Ebene (z.B. um eine Trainingsaufgabe im Kapitel "Geometrie") oder um eine Aufgabe auf der Ebene eines Abschnitts in einem Kapitel (z.B. um eine Aufgabe zum Abschnitt "Winkel" im Kapitel "Geometrie")? Antworte bitte mit KAP für Kapitel und SEK für Abschnitt.

Student: Kapitel

Next-Action: [Final-Statement]
Bot: Deine Angaben waren wie folgt:
a) Kapitel: I Elementares Rechnen
b) Aufgabentyp: Exercise
c) Aufgabe: 1a
d) Ebene: Kapitel
Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein.

Student: yep

Next-Action: [Human Hand-Over]
Bot: Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir.

Table 9: Showcase 1 – Short flow, most of the information was extracted from the initial user input. In bold are the extracted entities.

Student: Hallo, ich habe eine Frage bzgl. eines Zertifikates, und zwar wie kann ich es austellen lassen?

Next-Action: [Organisational Question]
Bot: Es scheint sich wohl um eine organisatorische Frage zu handeln. Bitte fasse deine Frage in einem kurzen Text zusammen, damit ich sie an meinen menschlichen Kollegen weiterleiten kann.

Student: Wie bekomme ich ein Zertifikat?

Next-Action: [Human Hand-Over]
Bot: Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir.

Table 10: Showcase 2 – Organisational question. Underlined are the keywords which point to the organisational intent.


Student: hallo

Next-Action: [UNK]
Bot: Hast du eine Frage zu einer Aufgabe (MATH), zu einem Text im Kurs (TEXT) oder eine organisatorische Frage (ORG)?

Student: ma

Next-Action: [UNK]
Bot: Hast du eine Frage zu einer Aufgabe (MATH), zu einem Text im Kurs (TEXT) oder eine organisatorische Frage (ORG)?

Student: math

Next-Action: [Ask for Topic]
Bot: Welches Kapitel bearbeitest du gerade?

Student: Lineare Gleichungssysteme

Next-Action: [Ask for Examination Mode]
Bot: Bearbeitest du eine Übungsaufgabe, eine Trainingsaufgabe, einen Quiz oder eine Schlussprüfung?

Student: human

Next-Action: [Human Hand-Over]
Bot: Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir.

Table 11: Showcase 3 – Manual intent classification, Fallback and Human Request policies (in bold type).

Student: Halli-Hallo, könnte mir bitte jemand die Regel 72 (vertikale streckung und stauchung des graphen) erklären? Danke im Voraus.

Next-Action: [Contextual Question]
Bot: Es scheint sich wohl um eine inhaltliche Frage zu handeln. Bitte fasse deine Frage in einem kurzen Text zusammen, damit ich sie an meinen menschlichen Kollegen weiterleiten kann.

Student: Ich habe Schwierigkeiten beim Verstehen der Regel 72. Was ist genau die vertikale Streckung des Graphen?

Next-Action: [Human Hand-Over]
Bot: Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir.

Table 12: Showcase 4 – Contextual question. Underlined are the keywords which point to the contextual intent.


Student: Hallo, ich kann beim Kapitel 1 nicht weiterkommen, Trainingsaufgabe 1 (a).

Next-Action: [Ask for Level]
Bot: Geht es um eine Aufgabe auf Kapitel-Ebene (z.B. um eine Trainingsaufgabe im Kapitel Geometrie) oder um eine Aufgabe auf der Ebene eines Abschnitts in einem Kapitel (z.B. um eine Aufgabe zum Abschnitt Winkel im Kapitel Geometrie)? Antworte bitte mit KAP für Kapitel und SEK für Abschnitt.

Student: ich glaube es ist eine Sektion

Next-Action: [Ask for Sub-Topic]
Bot: Wie heißt der Abschnitt, in dem sich die Aufgabe befindet, auf die sich deine Frage bezieht?

Student: ich bearbeite rechenregeln und potenzen

Next-Action: [Final Statement]
Bot: Deine Angaben waren wie folgt:
a) Kapitel: I Elementares Rechnen
b) Abschnitt: Rechenregel für Potenzen
c) Aufgabentyp: Quiz
d) Aufgabe: 1 (a)
e) Ebene: Abschnitt
Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein.

Student: nope

Next-Action: [Ask_For]
Bot: Welche der folgenden Punkten habe ich falsch erkannt? Schreibe bitte, welche der folgenden Punkte es sind: a, b, c, d, e.

Student: c

State-Transition: [Correct-Fields]
Bot: Gib bitte die korrekte Information für c) ein.

Student: Training

State-Transition: [Final Statement]
Bot: Deine Angaben waren wie folgt:
a) Kapitel: I Elementares Rechnen
b) Abschnitt: Rechenregel für Potenzen
c) Aufgabentyp: Training
d) Aufgabe: 1 (a)
e) Ebene: Abschnitt
Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein.

Student: yes

State-Transition: [Human Hand-Over]
Bot: Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir.

Table 13: Showcase 5 – Long flow, correction of entries. In bold type are the extracted entities.


Figure 4: Dialogue Flow. Abbreviations: UNK – unknown, ID – informational dictionary, RegEx – regular expressions.


33 evaluation

In order to get feedback on the quality, functionality, and usefulness of the proposed model, we evaluated it in two steps: first, with an automated method (Section 331), utilizing 130 manually annotated conversations to verify the robustness of the system; second, with the help of human tutors (Section 332) from OMB+, to examine the user experience. We describe the details as well as the most common errors below.

331 Automated Evaluation

To manage an automated evaluation, we manually assembled a dataset with 130 conversational flows. We predefined viable initial user questions, user answers, as well as gold system responses. These flows cover the most frequent dialogues that we previously observed in the human-to-human dialogue data. We examined our system by implementing a self-chatting evaluation bot. The evaluation cycle is as follows:

• The system accepts the first question from a predefined dialogue via an API request, preprocesses, and analyzes it to determine the intent and extract entities.

• Then it estimates the appropriate next action and responds accordingly.

• This response is then compared to the system's gold answer. If the predicted result is correct, then the system receives the next predefined user input from the templated dialogue and responds again as defined above. This procedure proceeds until the dialogue terminates (i.e., the ID is complete). Otherwise, the system reports an unsuccessful case number.

The system successfully passed this evaluation for all 130 cases.
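A minimal sketch of this self-chatting loop is shown below, assuming a local HTTP endpoint for the dialogue system; the endpoint and response format are assumptions:

    import requests

    def evaluate(flows, endpoint="http://localhost:5000/chat"):
        """flows: list of dialogues, each a list of (user_turn, gold_bot_response) pairs."""
        failed_cases = []
        for case_nr, flow in enumerate(flows):
            session = requests.Session()  # one session per dialogue keeps the ID state
            for user_turn, gold_response in flow:
                predicted = session.post(endpoint, json={"text": user_turn}).json()["response"]
                if predicted != gold_response:
                    failed_cases.append(case_nr)  # report the unsuccessful case number
                    break
        return failed_cases  # an empty list means all flows passed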

332 Human Evaluation & Error Analysis

To evaluate the system on irregular cases, we carried out experiments with human tutors from the OMB+ platform. The tutors are experienced regarding rare or complex issues, ambiguous responses, misspellings, and other infrequent but still relevant problems which occur during a natural language dialogue. In the following, we review some common errors and describe additional observations.

misspellings and confusable spelling frequently occur in user-generated text. Since we attempt to let the conversation remain natural to provide a better user experience, we do not instruct students to write formally. Therefore, we have to account for various writing issues. One of the prevalent problems is misspellings: German words are commonly long and can be complex, and because users often type quickly, this can lead to a wrong order of characters within a given word. To tackle this challenge, we utilized fuzzy matching within ES. However, the maximum allowed edit distance in ElasticSearch (ES) is set to 2 characters, which means that all misspellings beyond this threshold could not be correctly identified by ES (e.g., Differenzialrechnung vs. Differnetialrechnung). Another representative example is the formulation of the section or question number: the equivalent information can be communicated in several discrete ways, which has to be considered by the RegEx unit (e.g., Aufgabe 5 a, Aufgabe V a, Aufgabe 5 (a)). A similar problem occurs with confusable spelling (i.e., Differenzialrechnung vs. Differenzialgleichung).

We analyzed the cases stated above and improved our model by adding some of the most common misspellings to the ElasticSearch (ES) database or handling them with RegEx during the preprocessing step.

elasticsearch threshold We observed both cases where the system failed to extract information although the user provided it, and examples where ES extracted information not mentioned in a user query. According to our understanding, these errors occur due to the relevancy scoring algorithm of ElasticSearch (ES), where a document's score is a combination of textual similarity and other metadata-based scores. Our examination revealed that ES mostly fails to extract the information if the user message is rather short (e.g., about 5 tokens). To overcome this difficulty, we coupled the current input u_t with the dialogue history (h = u_{t, ..., t-2}). That eliminated the problem and improved the retrieval quality.

To solve the opposite problem, where ES extracts false information (or information that was not mentioned in a query), was more challenging. We learned that the problem comes from short words or sub-words (i.e., suffixes, prefixes), which ES locates in the database and considers credible enough. The ES documentation suggests getting rid of stop words to eliminate this behavior; however, this did not improve the search. Also, fine-tuning of ES parameters such as the relevance threshold, prefix length5, and the minimum should match6 parameter did not result in notable improvements. To cope with this problem, we implemented a verification step where a user can edit the erroneously retrieved data.

5 The amount of fuzzified characters. This parameter helps to decrease the number of terms which must be reviewed.
6 Indicates the number of terms that must match for a document to be considered relevant.


34 structured dialogue acquisition

As we stated, the implemented system not only supports the human tutor by assisting students but also collects structured and labeled training data in the background. In a trial run of the rule-based system, we accumulated a toy dataset of dialogues. The assembled dataset includes the following (an illustrative record layout follows the list):

• Plain dialogues with unique dialogue indexes.

• Plain ID information collected for the whole conversation.

• Pairs of questions (i.e., user requests) and responses (i.e., bot replies) with unique dialogue and turn indexes.

• Triples in the form of (Question, Next Action, Response). Information on the next system action could be employed to train a DM unit with (deep) machine learning algorithms.

• For every conversational state, we keep the entities that the system was able to extract, along with their position in the utterance. This information could be used to train a custom domain-specific NER model.
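For illustration, a single collected turn could be stored as the following record; the field names and values are assumptions about the layout rather than the exact schema:

    # Illustrative record for one dialogue turn (field names are assumptions).
    turn_record = {
        "dialogue_id": "d-0042",
        "turn_id": 3,
        "question": "ich bearbeite rechenregeln und potenzen",
        "next_action": "Final Statement",
        "response": "Deine Angaben waren wie folgt: ...",
        "entities": [
            # character offsets into the question string
            {"label": "sub_topic", "start": 14, "end": 39,
             "value": "rechenregeln und potenzen"},
        ],
    }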

35 summary

In this work, we realized a conversational agent for IPA purposes that addresses two problems: First, it decreases repetitive and time-consuming activities, which allows workers of the e-learning platform to focus solely on mathematical and hence cognitively demanding questions. Second, by interacting with users, it augments the resources with structured and labeled training data. The latter can be utilized for the further implementation of learnable dialogue components.

The realization of such a system was connected with multiple challenges. Among others were missing structured conversational data, ambiguous or erroneous user-generated text, and the necessity to deal with existing corporate tools and their design.

The proposed conversational agent enabled us to accumulate structured, labeled data without any special efforts from the human (i.e., tutor) side (e.g., manual annotation and post-processing of existing data, changes to the conversational structure within the tutoring process). Once we collected structured dialogues, we were able to re-train particular segments of the system with deep learning methods and achieved consistent performance for both of the proposed tasks. We report on the re-implemented elements in Chapter 4.

At the time of writing this thesis, the system described in this work was deployed at the OMB+ platform.


36 future work

Possible extensions of our work are

• In this work, we implemented a rule-based dialogue model from scratch and adjusted it for the specific domain (i.e., e-learning). The core of the model is a dialogue manager that determines the current state of the conversation and the possible next action. Rule-based systems are generally considered to be hardly adaptable to new domains; however, our dialogue manager proved to be flexible to slight modifications in the workflow. One of the possible directions of future work would thus be the investigation of the general adaptability of the dialogue manager core to other scenarios and domains (e.g., a different course).

• A further possible extension would be the introduction of different languages. The current model is implemented for the German language, but the OMB+ platform provides this course also in English and Chinese. Therefore, it would be interesting to investigate whether it is possible to adjust the existing system to other languages without drastic changes in the model.


4 deep learning re-implementation of units

This chapter partially covers work already published at international peer-reviewed conferences. The relevant publication is Moiseeva et al., 2020. The research described in this chapter was carried out in its entirety by the author of this thesis. The other authors of the publication acted as advisors.

In this chapter, we address the challenge of re-implementing two central components of our IPA dialogue system described in Chapter 3 by employing deep learning techniques. Since the data for training was collected in a (short) trial run of the rule-based system, it is not sufficient to train a neural network from scratch. Therefore, we utilize Transfer Learning (TL) methods and employ the previously pre-trained Bidirectional Encoder Representations from Transformers (BERT) model, which we fine-tune for two of our custom tasks.

41 introduction

Data mining and machine learning technologies have gained notable advancement in many research fields, including but not limited to computer vision, robotics, advanced analytics, and computational linguistics (Pan and Yang, 2009; Trautmann et al., 2019; Werner et al., 2015). Besides that, in recent years, deep neural networks, with their ability to accurately map from inputs to outputs (i.e., labels, images, sentences) while analyzing patterns in data, became a potential solution for many industrial automatization tasks (e.g., fake news detection, sentiment analysis).

Nevertheless, conventional supervised models still have limitations when applied to real-world data and industrial tasks. The first challenge here is the training phase, since a robust model requires an immense amount of structured and labeled data. Still, there are just a few cases where such data is publicly available (e.g., research datasets), but for most of the tasks, domains, and languages, the data has to be gathered over a long time. The second hurdle is that, if trained on existing data from a distinct domain, a supervised model cannot generalize to conditions that are different from those faced in the training dataset. Most of the machine learning methods perform correctly only under a general assumption: the training and test data are mapped to the same feature space and the same distribution (Pan and Yang, 2009). When the distribution changes, most statistical models need to be retrained from scratch using newly obtained training


samples. This is especially crucial when applying such approaches to real-world data, which is usually very noisy and contains a large number of scenarios; in this case, a model would be ill-trained to make decisions about novel patterns. Furthermore, for most real-world applications, it is expensive or even not feasible at all to recollect the required training data to retrain the model. Hence, the possibility to transfer the knowledge obtained during training on existing data to a new, unseen domain is one of the possible solutions for industrial problems. Such a technique is called Transfer Learning (TL), and it allows dealing with the abovementioned scenarios by leveraging existing labeled data of some related tasks or domains.

411 Outline and Contributions

This work addresses the challenge of re-implementing two out of three central components of our rule-based IPA conversational assistant with deep learning methods. As we discussed in Chapter 2, hybrid dialogue models (i.e., consisting of both rule-based and trainable components) appear poised to replace the established rule- and template-based methods, which are for the most part employed in the industrial setting. To investigate this assumption and examine current state-of-the-art deep learning techniques, we conducted experiments on our custom domain-specific data. Since the available data were collected in a trial run and the obtained dataset was too small to train a conventional machine learning model from scratch, we utilized the Transfer Learning (TL) approach and fine-tuned an existing pre-trained model for our target domain.

For the experiments we defined two tasks

• First, we considered the NER problem in a custom domain setting. We defined a sequence labeling task and employed Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al., 2018) as one of the transfer learning methods widely utilized in the NLP area. We applied the model to our dataset and fine-tuned it for seven (7) domain-specific (i.e., e-learning) entities.

• Second, we examined the effectiveness of transfer learning for the dialogue manager core. The DM is one of the fundamental parts of the conversational agent and requires compelling learning techniques to hold the conversation intelligently. For that experiment, we defined a classification task and applied the BERT model to predict the system's next action for every given user input in a conversation. We then computed the macro F1-score for 14 possible classes (i.e., actions) and an average dialogue accuracy.


The main characteristics of both settings are

• the set of named entities and labels (i.e., actions) between the source and target domain is different;

• for both settings, we utilized the BERT model in its German and multilingual versions with additional parameters;

• the dataset used to train the target model is small and consists of highly domain-specific labels; the latter often appears in industrial scenarios.

We finally verified that the selected method performed reasonably well on both tasks. We reached a performance of 0.93 macro F1 points for the Named Entity Recognition (NER) task and 0.75 macro F1 points for the Next Action Prediction (NAP) task. Therefore, we conclude that both NER and NAP components could be employed to substitute or extend the existing rule-based modules.
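As an illustration of the fine-tuning setup, the following sketch uses the Hugging Face transformers library for the NAP classification task; the model checkpoint name, label index, and example sentence are assumptions, and the NER task would analogously use AutoModelForTokenClassification with the seven entity labels:

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    MODEL_NAME = "bert-base-german-cased"  # assumed German BERT checkpoint
    NUM_ACTIONS = 14                       # one class per possible next system action

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=NUM_ACTIONS)

    # One illustrative training step on a single (user turn, next action) pair.
    batch = tokenizer("Hallo, ich habe eine Frage zu Aufgabe 1a",
                      return_tensors="pt", truncation=True, padding=True)
    gold_action = torch.tensor([5])        # hypothetical index of the gold next action
    loss = model(**batch, labels=gold_action).loss
    loss.backward()                        # followed by an optimizer step in a real loop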

42 related work

As discussed in the previous section, the main benefit of Transfer Learning (TL) is that it allows the domains and tasks used during the training and testing phases to be different. Initial steps in the formulation of transfer learning were made after the NeurIPS-951 workshop, which was focused on the demand for lifelong machine-learning methods that preserve and reuse previously learned knowledge (Chen et al., 2018). Research on transfer learning has attracted more and more attention since then, especially in the computer vision domain, and in particular for image recognition tasks (Donahue et al., 2014; Sharif Razavian et al., 2014).

Recent studies have shown that TL can also be successfully applied within the Natural Language Processing (NLP) domain, especially on semantically equivalent tasks (Dai et al., 2019; Devlin et al., 2018; Mou et al., 2016). Several research works were carried out specifically on the Named Entity Recognition (NER) task and explored the capabilities of transfer learning when applied to different named entity categories (i.e., different output spaces). For instance, Qu et al. (2016) pre-trained a linear-chain Conditional Random Field (CRF) on a large amount of labeled data in the source domain. Then they introduced a neural network with two linear layers that learns the difference between the source and target label distributions. Finally, the authors initialized another CRF with the parameters learned in the linear layers to predict the labels of the target domain. Kim et al. (2015) conducted experiments with transferring features and model parameters between related domains where the label classes are different but may have a semantic similarity. Their primary approach was to construct label embeddings to automatically map the source and target label classes to improve the transfer (Chen et al., 2018).

1 Conference on Neural Information Processing Systems; formerly NIPS.


However, there are also models trained in a manner such that they could be applied to multiple NLP tasks simultaneously. Among such architectures are ULMFit (Howard and Ruder, 2018), BERT (Devlin et al., 2018), GPT-2 (Radford et al., 2019), and Transformer-XL (Dai et al., 2019) – powerful models that were recently proposed for transfer learning within the NLP domain. These architectures are called language models since they attempt to learn language structure from large textual collections in an unsupervised manner and then predict the next word in a sequence based on previously observed context words. The Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al., 2018) model is one of the most popular architectures among the abovementioned, and it is also one of the first that showed close to human-level performance on the General Language Understanding Evaluation (GLUE) benchmark (https://gluebenchmark.com) (Wang et al., 2018), which includes tasks on sentiment analysis, question answering, semantic similarity, and language inference. The architecture of BERT consists of stacked transformer blocks that are pre-trained on a large corpus consisting of 800M words from modern English books (i.e., the BooksCorpus by Zhu et al.) and 2,500M words of text from pre-processed (i.e., without markup) English Wikipedia articles. Overall, BERT advanced the state-of-the-art performance for eleven (11) NLP tasks. We describe this approach in detail in the subsequent section.

4.3 bert: bidirectional encoder representations from transformers

The Bidirectional Encoder Representations from Transformers (BERT) architecture is divided into two steps: pre-training and fine-tuning. The pre-training step was enabled through the two following tasks:

• Masked Language Model – the idea behind this approach is to train a deep bidirectional representation that is different from conventional language models, which are either trained left-to-right or right-to-left. To train a bidirectional model, the authors mask out a certain number of tokens in a sequence. The model is then asked to predict the original words for the masked tokens. It is essential that, in contrast to denoising auto-encoders (Vincent et al., 2008), the model does not need to reconstruct the entire input; instead, it attempts to predict the masked words. Since the model does not know which words exactly it will be asked about, it learns a representation for every token in a sequence (Devlin et al., 2018).

• Next Sentence Prediction – aims to capture relationships between two sentences A and B. In order to train the model, the authors pre-trained a binarized next sentence prediction task where pairs of sequences were labeled according to the criteria: A and B either follow each other directly in the corpus (label IsNext) or are both taken from random places (label NotNext). The model is then asked to predict whether the former or the latter is the case. The authors demonstrated that pre-training towards this task is extremely useful for both question answering and natural language inference tasks (Devlin et al., 2018).

Furthermore, BERT uses WordPiece tokenization for embeddings (Wu et al., 2016), which segments words into subword-level tokens. This approach allows the model to make additional inferences based on word structure (e.g., the -ing ending denotes similar grammatical categories).

The fine-tuning step follows the pre-training process. For that, the BERT model is initialized with the parameters obtained during the pre-training step, and a task-specific fully-connected layer is used after the transformer blocks, which maps the general representations to custom labels (employing a softmax operation).
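As an illustration of this fine-tuning scheme, the following minimal sketch places a task-specific classification layer on top of a pre-trained BERT encoder using the HuggingFace transformers library; the model name and label count are placeholders, and this is a sketch rather than the exact implementation used in this work.

```python
import torch
from torch import nn
from transformers import BertModel

class BertWithTaskHead(nn.Module):
    """Pre-trained BERT encoder followed by a task-specific linear layer."""

    def __init__(self, num_labels, model_name="bert-base-german-cased"):
        super().__init__()
        self.encoder = BertModel.from_pretrained(model_name)  # pre-trained weights
        self.classifier = nn.Linear(self.encoder.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask):
        outputs = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls_vector = outputs.last_hidden_state[:, 0]  # [CLS] token summarizes the sequence
        logits = self.classifier(cls_vector)          # map representations to custom labels
        return torch.softmax(logits, dim=-1)          # softmax over the label set
```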

self-attention: The key operation of the BERT architecture is self-attention, which is a sequence-to-sequence operation with a conventional encoder-decoder structure (Vaswani et al., 2017). The encoder maps an input sequence $(x_1, x_2, \dots, x_n)$ to a sequence of continuous representations $z = (z_1, z_2, \dots, z_n)$. Given $z$, the decoder then generates an output sequence $(y_1, y_2, \dots, y_m)$ of symbols, one element at a time (Vaswani et al., 2017). To produce the output vector $y_i$, the self-attention operation takes a weighted average over all input vectors:

$y_i = \sum_j w_{ij} x_j$   (16)

where $j$ is an index over the entire sequence and the weights sum to one (1) over all $j$. The weight $w_{ij}$ is derived from a dot product function over $x_i$ and $x_j$ (Bloem, 2019; Vaswani et al., 2017):

$w'_{ij} = x_i^T x_j$   (17)

The dot product returns a value between negative and positive infinity, and a softmax maps these values to $[0, 1]$ to ensure that they sum to one (1) over the entire sequence (Bloem, 2019; Vaswani et al., 2017):

$w_{ij} = \frac{\exp w'_{ij}}{\sum_j \exp w'_{ij}}$   (18)

Self-attention is not the only operation in transformers, but it is the essential one since it propagates information between vectors. Other operations in the transformer are applied to every vector in the input sequence without interactions between vectors (Bloem, 2019; Vaswani et al., 2017).
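To make Equations 16–18 concrete, here is a small NumPy sketch of this basic (unparameterized) self-attention operation; the input is a toy sequence of random vectors.

```python
import numpy as np

def basic_self_attention(X):
    """Each output y_i is a weighted average of all inputs x_j (Eq. 16);
    raw weights are dot products x_i^T x_j (Eq. 17), normalized row-wise
    with a softmax so that they sum to one (Eq. 18)."""
    raw = X @ X.T                                  # w'_ij = x_i^T x_j
    raw -= raw.max(axis=1, keepdims=True)          # for numerical stability
    weights = np.exp(raw)
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over j
    return weights @ X                             # y_i = sum_j w_ij x_j

X = np.random.randn(5, 8)    # a sequence of 5 input vectors of dimension 8
Y = basic_self_attention(X)  # 5 output vectors of the same dimension
```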


4.4 data & descriptive statistics

The benchmark that we accumulated through the trial-run includes 300 structured and partially labeled dialogues, with the average length of a dialogue being six (6) utterances. All the conversations were held in the German language. Detailed general statistics can be found in Table 14.

                                      Value
Max Len Dialogue (in utterances)       15
Avg Len Dialogue (in utterances)        6
Max Len Utterance (in tokens)         100
Avg Len Utterance (in tokens)           9
Overall Unique Action Labels           13
Overall Unique Entity Labels            7
Train – Dialogues (Utterances)        200 (1161)
Eval – Dialogues (Utterances)          50 (279)
Test – Dialogues (Utterances)          50 (300)

Table 14: General statistics for the conversational dataset.

The dataset covers 14 unique next actions and seven (7) unique named entities. Detailed information on next actions and entities can be found in Table 15 and Table 16. Below we list the possible actions with the corresponding explanation and predefined mapped statements.

unk – requests to define the intent of the question: mathematical, organizational, or contextual (i.e., fallback policy in case the automated intent recognition unit fails) → "Hast du eine Frage zu einer Aufgabe (MATH), zu einem Text im Kurs (TEXT) oder eine organisatorische Frage (ORG)?"

org – hands the discussion over to a human tutor in case of an organizational question → "Es scheint sich wohl um eine organisatorische Frage zu handeln. Bitte fasse deine Frage in einem kurzen Text zusammen, damit ich sie an meinen menschlichen Kollegen weiterleiten kann."

text – hands the discussion over to a human tutor in case of a contextual question → "Es scheint sich wohl um eine inhaltliche Frage zu handeln. Bitte fasse deine Frage in einem kurzen Text zusammen, damit ich sie an meinen menschlichen Kollegen weiterleiten kann."


topic – requests information on the topic → "Welches Kapitel bearbeitest du gerade? Antworte bitte mit der Kapitelnummer, z.B. IA, IB, II, oder mit dem Kapitelnamen, z.B. Geometrie."

examination – requests information about the examination mode → "Bearbeitest du eine Übungsaufgabe, eine Trainingsaufgabe, einen Quiz oder eine Schlussprüfung?"

subtopic – requests information about the subtopic in case of quiz or section-level training/exercise modes → "Wie heißt der Abschnitt, in dem sich die Aufgabe befindet, auf die sich deine Frage bezieht?"

level – requests information about the level of the examination mode: chapter or section → "Geht es um eine Aufgabe auf Kapitel-Ebene (z.B. um eine Trainingsaufgabe im Kapitel Geometrie) oder um eine Aufgabe auf der Ebene eines Abschnitts in einem Kapitel (z.B. um eine Aufgabe zum Abschnitt Winkel im Kapitel Geometrie)? Antworte bitte mit KAP für Kapitel und SEK für Abschnitt."

question number – requests information about the question number → "Welche Aufgabe bearbeitest du gerade? Wenn du z.B. bei Teilaufgabe c in der zweiten Trainingsaufgabe (oder der zweiten Übungsaufgabe) bist, antworte mit 2c. Bearbeitest du gerade einen Quiz oder eine Schlussprüfung, gib bitte nur die Teilaufgabe an."

final request – considers the Informational Dictionary (ID) to be complete and initiates the verification step → "Deine Angaben waren wie folgt: Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein."

verify request – requests the student to verify the assembled information → "Welche der folgenden Punkte habe ich falsch erkannt? Schreibe bitte, welche der folgenden Punkte es sind: [a, b, c, ...]"

correct request – requests the student to provide the correct information if there is an error in the assembled data → "Gib bitte die korrekte Information für [a] ein."

exact question – requests the student to provide a detailed explanation of the problem → "Bitte fasse deine Frage in einem kurzen Text zusammen, damit ich sie an meinen menschlichen Kollegen weiterleiten kann."

human handover – finalizes the conversation → "Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir."

3 Variables that depend on the state of the Informational Dictionary (ID).
4 Variable that was selected as erroneous and needs to be corrected.


Action            Counts    Action            Counts
Final Request        321    Unk                   80
Human Handover       300    Subtopic              55
Exact Question       286    Correct Request       40
Question Number      176    Verify Request        34
Examination          175    Org                   17
Topic                137    Text                  13
Level                130

Table 15: Detailed statistics on possible system actions. The Counts columns denote the number of occurrences of each action in the entire dataset.

Entity          Counts
Question Nr        317
Chapter            311
Examination        303
Subtopic           198
Level               80
Intent              70

Table 16: Detailed statistics on possible named entities. The Counts column denotes the number of occurrences of each entity in the entire dataset.

4.5 named entity recognition

We defined a sequence labeling task to extract custom entities from user input. We considered seven (7) viable entities (see Table 16) to be recognized by the model: topic, subtopic, examination mode, level, question number, and intent, as well as the entity other for the remaining words in the utterance. Since the data collected from the rule-based system already include information on the entities, we were able to train a domain-specific NER unit. However, since the original user input was informal, the same information could be provided in different writing styles. That means that a single entity could have different surface forms (e.g., synonyms, writing styles). Therefore, entities that we extracted from the rule-based system were transformed into a universal standard (e.g., official chapter names). To consider all of the variable entity forms while post-labeling the original dataset, we determined generic entity names (e.g., chapter, question nr) and mapped variations of entities from the user input (e.g., Chapter = [Elementary Calculus, Chapter I, ...]) to them. The overall dataset consists of 300 labeled dialogues, where 200 (with 1161 utterances) were employed for training and 100 for evaluation and test sets (50 dialogues with ca. 300 utterances for each set, respectively).


4.5.1 Model Settings

We conducted experiments with German and multilingual BERT implementations (https://github.com/huggingface/transformers). The capitalization of words is significant for the German language; therefore, we ran the experiments on the capitalized data while preserving the original punctuation. We employed the available base model for both multilingual and German BERT implementations in the cased version. We initiated the learning rate for both models to 1e-4, and the maximum length of the tokenized input was set to 128 tokens. We conducted the experiments multiple times with different seeds for a maximum of 50 epochs, with the training batch size set to 32. We utilized AdamW as the optimizer and applied early stopping if the performance did not improve significantly after 5 epochs.
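A condensed sketch of this fine-tuning setup with the HuggingFace transformers library is given below; the model identifier and the data handling are placeholders, and the actual training loop (multiple seeds, up to 50 epochs, early stopping with patience 5) is omitted for brevity.

```python
import torch
from transformers import BertTokenizerFast, BertForTokenClassification

MODEL_NAME = "bert-base-german-cased"  # or "bert-base-multilingual-cased"
NUM_ENTITY_LABELS = 7                  # domain-specific entities, incl. "other"

tokenizer = BertTokenizerFast.from_pretrained(MODEL_NAME)
model = BertForTokenClassification.from_pretrained(MODEL_NAME,
                                                   num_labels=NUM_ENTITY_LABELS)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

def encode(utterance):
    # inputs are truncated/padded to the maximum length of 128 tokens
    return tokenizer(utterance, truncation=True, padding="max_length",
                     max_length=128, return_tensors="pt")
```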

4.6 next action prediction

We defined a classification problem where we aim to predict the system's next action according to the given user input. We assumed 14 custom actions (see Table 15) that we consider to be our classes. While collecting the toy dataset, every input was automatically labeled by the rule-based system with the corresponding next action and the dialogue-id. Thus, no additional post-labeling was required at this step.

We investigated two settings for this task:

• Default Setting: Utilizing a user input and the corresponding class (i.e., next action) without any supplementary context. By default, we conduct all of our experiments in this setting.

• Extended Setting: Employing a user input, the corresponding next action, and the previous system action as a source of supplementary context. For this setting, we experimented with the best performing model from the default setting.
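One plausible way to realize the extended setting – not necessarily the exact encoding used in this work – is to prepend the previous system action to the user utterance so that a standard sequence classifier receives both pieces of information:

```python
def build_nap_input(user_utterance, previous_action=None):
    """Assemble the classifier input for Next Action Prediction.

    Default setting : the user utterance alone.
    Extended setting: the previous system action is prepended as extra
    context (one possible encoding; [SEP] is the BERT separator token)."""
    if previous_action is None:
        return user_utterance
    return f"{previous_action} [SEP] {user_utterance}"

# default setting
build_nap_input("Ich bearbeite gerade eine Trainingsaufgabe")
# extended setting (hypothetical previous action label)
build_nap_input("Ich bearbeite gerade eine Trainingsaufgabe", previous_action="examination")
```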

The overall dataset consists of 300 labeled dialogues, where 200 (with 1161 utterances) were employed for training and 100 for evaluation and test sets (50 dialogues with ca. 300 utterances for each set, respectively).

4.6.1 Model Settings

For the NAP task, we carried out experiments with German and multilingual BERT implementations as well. We examined the performance of both capitalized and lower-cased inputs, as well as plain and preprocessed data. For the multilingual BERT, we used the base model in both cased and uncased modifications. For the German BERT, we utilized the base model in the cased modification only (an uncased pre-trained modification of the model was not available). For both models, we initiated the learning rate to 4e-5, and the maximum length of the tokenized input was set to 128 tokens. We conducted the experiments multiple times with different seeds for a maximum of 300 epochs, with the training batch size set to 32. We utilized AdamW as the optimizer and applied early stopping if the performance did not improve significantly after 15 epochs.

4.7 evaluation and results

To evaluate the models, we computed the word-level macro F1 score for the Named Entity Recognition (NER) task and the utterance-level macro F1 score for the Next Action Prediction (NAP) task. The word-level F1 is estimated as the average of the F1 scores per class, each computed from all words in the evaluation and test sets. The outcomes for the NER task are represented in Table 17. For the utterance-level F1, a single class label (i.e., next action) is taken for the whole utterance. The results for the NAP task are shown in Table 18.

We additionally computed the average dialogue accuracy for the best performing NAP models. This score indicates how well the predicted next actions compose the conversational flow. The average dialogue accuracy was estimated for 50 conversations in the evaluation and test sets, respectively. The results are displayed in Table 19.
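The scores described above can be reproduced with standard tooling; the following sketch (using scikit-learn, with placeholder inputs) shows one way to compute the macro F1 and the average dialogue accuracy, assuming the latter is the per-dialogue accuracy averaged over all conversations.

```python
from sklearn.metrics import f1_score

def macro_f1(gold, predicted):
    """Macro F1: unweighted mean of per-class F1 scores.
    Items are words for NER and utterances for NAP."""
    return f1_score(gold, predicted, average="macro")

def average_dialogue_accuracy(dialogues):
    """dialogues: list of (gold_actions, predicted_actions) pairs, one per conversation.
    Accuracy is computed within each dialogue and then averaged."""
    per_dialogue = [
        sum(g == p for g, p in zip(gold, pred)) / len(gold)
        for gold, pred in dialogues
    ]
    return sum(per_dialogue) / len(per_dialogue)
```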

The obtained results for the NER task revealed that German BERT performed significantly better than the multilingual BERT model. The performance of the custom NER unit is at 0.93 macro F1 points for all possible named entities (see Table 17). In contrast, for the NAP task, the multilingual BERT model obtained better performance than the German BERT model. The best performing method in the default setting gained a macro F1 of 0.677 points for 14 possible classes, whereas the model in the extended setting performed better – its best macro F1 score is 0.752 for the same number of classes (see Table 18).

Regarding the dialogue accuracy, the extended system fine-tuned with multilingual BERT obtained better outcomes than the default one (see Table 19). The overall conclusion for the NAP is that the capitalized setting increased the performance of the model, whereas the inclusion of punctuation did not influence the results.



Task  Model  Cased  Punct  F1 Eval  F1 Test
NER   GER    ✓      ✓      0.971    0.930
NER   Mult   ✓      ✓      0.926    0.905

Table 17: Word-level F1 for the Named Entity Recognition (NER) task. In bold: best performance for evaluation and test sets.

Task  Model  Cased  Punct  Extended  F1 Eval  F1 Test
NAP   GER    ✓      ✗      ✗         0.711    0.673
NAP   GER    ✓      ✓      ✗         0.701    0.606
NAP   Mult   ✓      ✓      ✗         0.688    0.625
NAP   Mult   ✓      ✗      ✗         0.769    0.677
NAP   Mult   ✓      ✗      ✓         0.810    0.752
NAP   Mult   ✗      ✓      ✗         0.664    0.596
NAP   Mult   ✗      ✗      ✗         0.742    0.502

Table 18: Utterance-level F1 for the Next Action Prediction (NAP) task. Underlined: best performance for evaluation and test sets in the default setting (without previous action context). In bold: best performance for evaluation and test sets in the extended setting (with previous action context).

Model         Accuracy Eval  Accuracy Test
NAP default   0.765          0.724
NAP extended  0.813          0.801

Table 19: Average dialogue accuracy computed for the Next Action Prediction (NAP) task for the best performing models. In bold: best performance for evaluation and test sets.


Figure 5: Macro F1, evaluation, NAP – default model. Comparison between German and multilingual BERT.


Figure 6: Macro F1, test, NAP – default model. Comparison between German and multilingual BERT.


Figure 7: Macro F1, evaluation, NAP – extended model.


Figure 8: Macro F1, test, NAP – extended model.


Figure 9: Macro F1, evaluation, NER – Comparison between German and multilingual BERT.


Figure 10: Macro F1, test, NER – Comparison between German and multilingual BERT.


4.8 error analysis

We then investigated the cases where the models failed to predict the correct action or labeled the named entity span erroneously. Below we outline the most common errors.

next action prediction: One of the prevalent flaws in the default model was the mismatch between two consecutive actions – the actions question number and subtopic. That is due to the position of these actions in the conversational flow. The occurrence of both actions in the dialogue is not strict and largely depends on the previous action. To explain this, we illustrate the following possible conversational scenarios:

• The system can directly proceed with the request for the question number if the examination mode is the final examination.

• If the examination mode is training or exercise, the system has to ask about the level first (i.e., chapter or section).

• If it is a chapter-level, the system can again directly proceed with the request for the question number.

• In the case of section-level, the system has to ask about the subsection first (if this information was not provided before).

The examination of the extended model showed that the induction of supplementary context (i.e., the previous action) improved the performance of the system by about 6.0%.

named entity recognition: The cases where the system failed cover mismatches between the labels chapter and other and the labels question number and other. This failure arose due to the imperfectly labeled span of a multi-word named entity (e.g., "Elementary Calculus Proportionality Percentage"). The first or last words in the named entity were excluded from the span and erroneously labeled with the other label.

4.9 summary

In this chapter, we re-implemented two of the three fundamental units of our IPA conversational agent – Named Entity Recognition (NER) and Dialogue Manager (DM) – using the methods of Transfer Learning (TL), specifically the Bidirectional Encoder Representations from Transformers (BERT) model.

We believe that the obtained results are reasonably high considering the comparatively little amount of training data we employed to fine-tune the models. Therefore, we conclude that both NAP and NER segments could be used to replace or extend the existing rule-based modules. Rule-based components, as well as ElasticSearch (ES), are limited in their capacities and not flexible to novel patterns, whereas the trainable units generalize better and could possibly decrease the amount of inaccurate predictions in case of accidental dialogue behavior. Furthermore, to increase the overall robustness, rule-based and trainable components could be used synchronously: when one system fails (e.g., the ElasticSearch (ES) module cannot extract an entity), the conversation continues on the prediction obtained from the other model.

4.10 future work

There are several possible directions for future work:

• At the time of writing of this thesis, both units were trained and investigated on their own – that is, without being incorporated into the initial IPA conversational system. Thus, one of the potential directions of future work would be the embodiment of these units into the original system and the examination of possible failure cases. Furthermore, the resulting hybrid system could then imply the constraint of additional meta policies to prevent unexpected system behavior.

• Further investigation could be towards the multi-language modality for the re-implemented units. Since the Online Mathematik Brückenkurs (OMB+) platform also supports English and Chinese, it would be interesting to examine whether a simple translation from the target language (i.e., English, Chinese) to the source language (i.e., German) would be sufficient to employ the already-assembled dataset and pre-trained units. Presumably, it should be achievable in the case of the Next Action Prediction (NAP) unit but could lead to a noticeable performance drop for the Named Entity Recognition (NER) task, since there could be a considerable gap between translated and original entity spans.

• Last but not least, one of the further extensions for the IPA would be the eventual applicability to more cognitively demanding tasks. For instance, it is of interest to extend the model to more complex domains like the handling of mathematical questions. Would it be possible to automate the human assistance at least for less complicated mathematical inquiries, or is it still unachievable with currently existing NLP techniques and machine learning methods?


Part II

CLUSTERING-BASED TREND ANALYZER


5 foundations

As discussed in Chapter 1, predictive analytics is one of the potential Intelligent Process Automation (IPA) tasks due to its capability to forecast future events by using current and historical data and therethrough assist knowledge workers with, for instance, sales or investments. Modern machine learning algorithms allow an intelligent and fast analysis of large amounts of data. However, the evaluation of such algorithms and models remains one of the grave problems in this domain. This is mostly due to the lack of publicly available benchmarks for the evaluation of topic modeling and trend detection algorithms. Hence, in this part of the thesis, we are addressing this challenge and assembling a benchmark for detecting both trends and downtrends (i.e., topics that become less frequent over time). To the best of our knowledge, the task of downtrend detection has not been well addressed before.

In this chapter, we introduce the concepts required by the second part of the thesis. We begin with an introduction to the topic modeling task and define the emerging trend in Section 5.1. Next, we examine the related work, existing approaches to topic modeling and trend detection, and their limitations in Section 5.2. In Chapter 6, we introduce our TRENDNERT benchmark for trend and downtrend detection and describe the method we employed for its assembly.

5.1 motivation and challenges

Science changes and evolves rapidly: novel research areas emerge while others fade away. Keeping pace with these changes is challenging. Therefore, recognizing and forecasting emerging research trends is of significant importance for researchers, academic publishers, as well as for funding agencies, stakeholders, and innovation companies.

Previously, the task of trend detection was mostly solved by domain experts who used specialized and cumbersome tools to investigate the data to get useful insights from it (e.g., through data visualization or statistics). However, manual analysis of large amounts of data can be time-consuming, and hiring domain experts is very costly. Furthermore, the overall increase of research data (i.e., Google Scholar, PubMed, DBLP) in the past decade makes the approach based on human domain experts less and less scalable. In contrast, automated approaches become a more desirable solution for the task of emerging trend identification (Kontostathis et al., 2004; Salatino, 2015). Due to the large number of electronic documents appearing on the web daily, several machine learning techniques were developed to find patterns of word occurrences in those documents using hierarchical probabilistic models. These approaches are called topic models, and they allow the analysis of extensive textual collections and provide valuable insights from them. The main advantage of topic models is their ability to discover hidden patterns of word-use and to connect documents containing similar patterns. In other words, the idea of a topic model is that documents are a mixture of topics, where a topic is defined as a probability distribution over words (Alghamdi and Alfalqi, 2015).

Topics have the property of being able to evolve; thus, modeling topics without considering time may confuse the precise topic identification. Modeling topics while considering time, in contrast, is called topic evolution modeling, and it can reveal important hidden information in the document collection, allowing one to recognize topics together with their time of appearance. This approach also provides the ability to check the evolution of topics during a given time window and examine whether a given topic is gaining interest and popularity over time (i.e., becoming an emerging trend) or if its development is stable. Considering the latter approach, we turn to the task of emerging trend identification.

How is an emerging trend defined? An emerging trend is a topic that is gaining interest and utility over time, and it has two attributes: novelty and growth (Small, Boyack, and Klavans, 2014; Tu and Seng, 2012). Rotolo, Hicks, and Martin (2015) explain the correlation of these two attributes in the following way: up to the point of the emergence, a research topic is specified by a high level of novelty. At that time, it does not attract any considerable attention from the scientific community, and due to the limited impact, its growth is nearly flat. After some turning points (i.e., the appearance of significant scientific publications), the research topic starts to grow faster and may become a trend if the growth is fast; however, the level of novelty decreases continuously once the emergence becomes clear. Then, at the final phase, the topic becomes well-established while its novelty levels out (He and Chen, 2018).

To detect trends in textual data, one employs computational analysis techniques and Natural Language Processing (NLP). In the subsequent section, we give an overview of the existing approaches for topic modeling and trend detection.

5.2 detection of emerging research trends

The task of trend detection is closely associated with the topic modeling task since, in the first instance, it detects the latent topics in a substantial collection of data. It secondarily employs techniques to investigate the evolution of a topic and whether it has the potential to become a trend.


Topic modeling has an extensive history of applications in the scientific domain, including but not limited to studies of temporal trends (Griffiths and Steyvers, 2004; Wang and McCallum, 2006) and investigation of impact prediction (Yogatama et al., 2011). Below we give a general overview of the most significant methods used in this domain, explain their importance, and, if any, the potential limitations.

5.2.1 Methods for Topic Modeling

In order to extract a topic from textual documents (i.e., research papers, blog posts), the precise definition of the model for topic representation is crucial.

latent semantic analysis, formerly called Latent Semantic Indexing, is one of the fundamental methods in the topic modeling domain, proposed by Deerwester et al. in 1990. Its primary goal is to generate vector-based representations for documents and terms and then employ these representations to compute the similarity between documents (resp. terms) in order to select the most related ones. In this regard, Latent Semantic Analysis (LSA) takes a matrix of documents and terms and then decomposes it into a separate document-topic matrix and a topic-term matrix. Before turning to the central task of finding latent topics that capture the relationships among the words and documents, it also performs dimensionality reduction utilizing truncated Singular Value Decomposition. This technique is usually applied to reduce the sparseness, noisiness, and redundancy across dimensions in the document-topic matrix. Finally, using the resulting document and term vectors, one applies measures such as cosine similarity to evaluate the similarity of different documents, words, or queries.

One of the extensions to traditional LSA is probabilistic Latent Semantic Analysis (pLSA), which was introduced by Hofmann in 1999 to fix several shortcomings. The foremost advantage of the follow-up method was that it could successfully automate document indexing, which in turn improved LSA in a probabilistic sense by using a generative model (Alghamdi and Alfalqi, 2015). Furthermore, pLSA can differentiate between diverse contexts of word usage without utilizing dictionaries or a thesaurus. That leads to better polysemy disambiguation and discloses typical similarities by grouping words that share a similar context. Among the works that employ pLSA is the approach proposed by Gohr et al. (2009), where the authors apply pLSA in a window that slides across the stream of documents to analyze the evolution of topics. Also, Mei et al. (2008) use pLSA in order to create a network of topics.


latent dirichlet allocation was developed by Blei, Ng, and Jordan (2003) to overcome the limitations of both the LSA and pLSA models. Nowadays, Latent Dirichlet Allocation (LDA) is considered to be one of the most utilized and robust probabilistic topic models. The fundamental concept of LDA is the assumption that every document is represented as a mixture of topics, where each topic is a discrete probability distribution that defines the probability of each word to appear in a given topic. These topic probabilities provide a compressed representation of a document. In other words, LDA models all of the d documents in the collection as a mixture over n latent topics, where each of the topics describes a multinomial distribution over a word w in the vocabulary (Alghamdi and Alfalqi, 2015).

LDA fixes such weaknesses of pLSA as the incapacity to assign a probability to previously unseen documents (because pLSA learns the topic mixtures only for documents seen in the training phase) and the overfitting problem (i.e., the number of parameters in pLSA increases linearly with the size of the corpus) (Salatino, 2015).

An extension of the conventional LDA is the hierarchical LDA (Griffiths et al., 2004), where topics are grouped in hierarchies. Each node in the hierarchy is associated with a topic, and a topic is a distribution across words. Another comparable approach is the Relational Topic Model (Chang and Blei, 2010), which is a mixture of a topic model and a network model for groups of linked documents. The attribute of every document is its text (i.e., discrete observations taken from a fixed vocabulary), and the links between documents are hyperlinks or citations.

LDA is a prevalent method for discovering topics (Blei and Lafferty, 2006; Bolelli, Ertekin, and Giles, 2009; Bolelli et al., 2009; Hall, Jurafsky, and Manning, 2008; Wang and Blei, 2011; Wang and McCallum, 2006). However, it was observed that training and fine-tuning a stable LDA model can be very challenging and time-consuming (Agrawal, Fu, and Menzies, 2018).

correlated topic model: One of the main drawbacks of the LDA is its incapability of capturing dependencies among the topics (Alghamdi and Alfalqi, 2015; He et al., 2017). Therefore, Blei and Lafferty (2007) proposed the Correlated Topic Model as an extension of the LDA approach. In this model, the authors replaced the Dirichlet distribution with a logistic-normal distribution that models pairwise topic correlations with the Gaussian covariance matrix (He et al., 2017).

However, one of the shortcomings of this model is the computational cost. The number of parameters in the covariance matrix increases quadratically with the number of topics, and parameter estimation for the full-rank matrix can be inaccurate in high-dimensional space (He et al., 2017). Therefore, He et al. (2017) proposed their method to overcome this problem. The authors introduced a model which learns dense topic embeddings and captures topic correlations through the similarity between the topic vectors. According to He et al., their method allows efficient inference in the low-dimensional embedding space and significantly reduces the previous cubic or quadratic time complexity to linear with respect to the topic number.

5.2.2 Topic Evolution of Scientific Publications

Although the topic modeling task is essential, it has a significant offspring task, namely the modeling of topic evolution. Methods able to identify topics within the context of time are highly applicable in the scientific domain. They allow understanding of the topic lineage and reveal how research on one topic influences another. Generally, these methods could be classified according to the primary sources of information they employ: text, citations, or key phrases.

text-based: Extensive textual collections have dynamic co-occurrences of word patterns that change over time. Thus, models that track topic evolution over time should consider both the word co-occurrence patterns and the timestamps (Blei and Lafferty, 2006).

In 2006, Wang and McCallum proposed a Non-Markov Continuous Time model for the detection of topical trends in the collection of NeurIPS publications (an overall timestamp of 17 years was considered). The authors made the following assumption: if the pattern of word co-occurrence exists for a short time, then the system creates a narrow-time-distribution topic, whereas for long-term patterns the system generates a broad-time-distribution topic. The principal objective of this approach is that it models topic evolution without discretizing the time, i.e., the state at time t + t1 is independent of the state at time t (Wang and McCallum, 2006).

Another work, done by Blei and Lafferty (2006), proposed the Dynamic Topic Models. The authors assumed that the collection of documents is organized based on certain time spans and the documents belonging to every time span are represented with a K-component model. Thereby, the topics are associated with time span t and emerge from topics corresponding to time span t − 1. One of the main differences of this model compared to other existing approaches is that it utilizes a Gaussian distribution for the topic parameters instead of the conventional Dirichlet distribution and can therefore capture the evolution over various time spans.


citation-based analysis is the second popular direction that is considered to be effective for trend identification (He et al., 2009; Le, Ho, and Nakamori, 2005; Shibata, Kajikawa, and Takeda, 2009; Shibata et al., 2008; Small, 2006). This method assumes that citations in their various forms (i.e., bibliographic citations, co-citation networks, citation graphs) indicate a meaningful relationship between topics and uses citations to model the topic evolution in the scientific domain. Nie and Sun (2017) utilize this approach along with Latent Dirichlet Allocation (LDA) and k-means clustering to identify research trends. The authors first use LDA to extract features and determine the optimal number of topics. Then they use k-means to obtain thematic clusters, and finally, they compute citation functions to identify the changes of clusters over time.

However, the citation-based approach's main drawback is that a consistent collection of publications associated with a specific research topic is required for these techniques to detect the cluster and novelty of the particular topic (He and Chen, 2018). Furthermore, there are also many trend detection scenarios (e.g., research news articles or research blog posts) in which citations are not readily available. That makes this approach not applicable to this kind of data.

keyphrase-based: Another standard solution is based on the use of keyphrase information extracted from research papers. In this case, every keyword is representative of a single research topic. Systems like Saffron (Monaghan et al., 2010), as well as Saffron-based models, use this approach. Saffron is a research toolkit that provides algorithms for keyphrase extraction, entity linking, and taxonomy extraction. Asooja et al. (2016) utilize Saffron to extract the keywords from LREC proceedings and then propose a trend forecast based on regression models as an extension to Saffron.

However, this method raises many questions, including the following: how to deal with the noisiness of keywords (i.e., not relevant keywords) and how to handle keywords that have their own hierarchies (i.e., sub-areas). Other drawbacks of this approach include the ambiguity of keywords (i.e., "Java" as an island and "Java" as a programming language) or the necessity to deal with synonyms, which could be treated as different topics.

5.3 summary

An emerging trend detection algorithm aims to recognize topics that were earlier inappreciable but are now gaining importance and interest within a specific domain. Knowledge of emerging trends is especially essential for stakeholders, researchers, academic publishers, and funding organizations. Assume a business analyst in a biotech company who is interested in the analysis of technical articles for recent trends that could potentially impact the company's investments. A manual review of all the available information in a specific domain would be extremely time-consuming or even not feasible. However, this process could be solved through Intelligent Process Automation (IPA) algorithms. Such software would assist human experts who are tasked with identifying emerging trends through automatic analysis of extensive textual collections and visualization of occurrences and tendencies of a given topic.


6 benchmark for scientific trend detection

This chapter covers work already published at international peer-reviewed conferences. The relevant publication is Moiseeva and Schütze, 2020. The research described in this chapter was carried out in its entirety by the author of the thesis. The other author of the publication acted as an advisor.

Computational analysis and modeling of the evolution of trends is a significant area of research because of its socio-economic impact. However, no large publicly available benchmark for trend detection currently exists, making a comparative evaluation of methods impossible. We remedy this situation by publishing the benchmark TRENDNERT, consisting of a set of gold trends (resp. downtrends) and document labels that is available as an unlimited download, and a large underlying document collection that can also be obtained for free. We propose Mean Average Precision (MAP) as an evaluation measure for trend detection and apply this measure in an investigation of several baselines.

6.1 motivation

What is an emerging research topic, and how is it defined in the literature? At the time of emergence, a research topic often does not attract much recognition from the scientific society and is exemplified by just a few publications. He and Chen (2018) associate the appearance of a trend with some triggering event, e.g., the publication of an article in a high-impact journal. Later, the research topic starts to grow faster and becomes a trend (He and Chen, 2018).

We adopt this as our key representation of trend in this paper: a trend is a research topic with a strongly increasing number of publications in a particular time interval. In contrast to trends, there are also downtrends, which refer to topics that move lower in their demand or growth as they evolve. Therefore, we define a downtrend as the converse of a trend: a downtrend is a research topic that has a strongly decreasing number of publications in a particular time interval. Downtrends can be as important to detect as trends, e.g., for a funding agency planning its budget.

To detect both types of topics (i.e., trends and downtrends) in textual data, one employs computational analysis techniques and NLP. These methods enable the analysis of extensive textual collections and gain valuable insights into topical trendiness and evolution over time. Despite the overall importance of trend analysis, there is, to our knowledge, no large publicly available benchmark for trend detection. This makes a comparative evaluation of methods and algorithms impossible. Most of the previous work has used proprietary or moderately sized datasets, or evaluation measures were not directly bound to the objectives of trend detection (Gollapalli and Li, 2015).

We remedy this situation by publishing the benchmark TRENDNERT. The benchmark consists of the following:

1. A set of gold trends compiled based on an extensive analysis of the meta-literature.

2. A set of labeled documents (based on more than one million scientific publications), where each document is assigned to a trend, a downtrend, or the class of flat topics, and supported by the topic title.

3. An evaluation script to compute Mean Average Precision (MAP) as the measure for the accuracy of trend detection.

We make the benchmark available as unlimited downloads. The underlying research corpus can be obtained for free from Semantic Scholar (Ammar et al., 2018).

6.1.1 Outline and Contributions

In this work, we address the challenge of benchmark creation for (down)trend detection. This chapter is structured as follows: Section 6.2 introduces our model for corpus creation and its fundamental components. In Section 6.3, we present the crowdsourcing procedure for the benchmark labeling. Section 6.5 outlines the proposed baseline for trend detection, our experimental setup, and outcomes. Finally, we illustrate the details on the distributed benchmark in Section 6.6 and conclude in Section 6.7.

Our contributions within this work are as follows:

• TRENDNERT is the first publicly available benchmark for (down)trend detection.

• TRENDNERT is based on a collection of more than one million documents. It is among the largest that have been used for trend detection and therefore offers a realistic setting for developing trend detection algorithms.

• TRENDNERT addresses the task of detecting both trends and downtrends. To the best of our knowledge, the task of downtrend detection has not been addressed before.

1 The name is a concatenation of trend and its ananym dnert, because it supports the evaluation of both trends and downtrends.
2 https://doi.org/10.17605/OSF.IO/DHZCT
3 https://api.semanticscholar.org/corpus


6.2 corpus creation

In this section, we explain the methodology we used to create the TRENDNERT benchmark.

6.2.1 Underlying Data

We employ a corpus provided by Semantic Scholar. It includes more than one million papers published mostly between 2000 and 2015 in about 5000 computer science journals and conference proceedings. Every record in the collection consists of a title, key phrases, abstract, full text, and metadata.

6.2.2 Stratification

The distribution of documents over time in the entire original dataset is skewed (see Figure 11). We discovered that clustering the entire collection, as well as a random sample, produces corrupt results because more weight is given to later years than to earlier years. Therefore, we first generated a stratified sample of the original document collection. To this end, we randomly pick 10,000 documents for each year between 2000 and 2016 for an overall sample size of 160,000.
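A minimal sketch of this stratification step is shown below; the document format (a dict with a 'year' field) and the exact year range are assumptions for illustration only.

```python
import random
from collections import defaultdict

def stratified_sample(documents, years=range(2000, 2016), per_year=10_000, seed=0):
    """Draw the same number of documents for every publication year so that
    densely populated later years do not dominate the clustering."""
    random.seed(seed)
    by_year = defaultdict(list)
    for doc in documents:
        if doc["year"] in years:
            by_year[doc["year"]].append(doc)
    sample = []
    for year in years:
        sample.extend(random.sample(by_year[year], per_year))
    return sample  # per_year documents for each year in the chosen range
```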

Figure 11: Overall distribution of papers in the entire dataset. Years 1975 to 2016.

4 https://www.semanticscholar.org


6.2.3 Document Representations & Clustering

As we mentioned, this work's primary focus was the creation of a benchmark, not the development of new algorithms or models for trend detection. Therefore, for our experiments, we have selected an algorithm based on k-means clustering as a simple baseline method for trend detection.

In conventional document clustering, documents are usually represented as bag-of-words (BOW) feature vectors (Manning, Raghavan, and Schütze, 2008). Those, however, have a major weakness: they neglect the semantics of words (Le and Mikolov, 2014). Still, recent work in the representation learning domain, and particularly doc2vec – the method proposed by Le and Mikolov (2014) – can provide representations for document clustering that overcome this weakness.

The doc2vec algorithm is inspired by techniques for learning word vectors (Mikolov et al., 2013) and is capable of capturing semantic regularities in textual collections. This is an unsupervised approach for learning continuous distributed vector representations for text (i.e., paragraphs, documents). This approach maps documents into a vector space such that semantically related documents are assigned similar vector representations (e.g., an article about "genomics" is closer to an article about "gene expression" than to an article about "fuzzy sets"). Formally, each paragraph is mapped to an individual vector represented by a column in matrix D, and every word is also mapped to an individual vector represented by a column in matrix W. The paragraph vector and word vectors are concatenated to predict the next word in a context (Le and Mikolov, 2014). Other work has already successfully employed this type of document representation for topic modeling, combining it with both Latent Dirichlet Allocation (LDA) and clustering approaches (Curiskis et al., 2019; Dieng, Ruiz, and Blei, 2019; Moody, 2016; Xie and Xing, 2013).

In our work, we run doc2vec on the stratified sample of 160,000 papers and represent each document as a length-normalized vector (i.e., a document embedding). These vectors are then clustered into k = 1000 clusters using the scikit-learn implementation of k-means (MacQueen, 1967) with default parameters. The combination of document representations with a clustering algorithm is conceptually simple and interpretable. Also, the comprehensive comparative evaluation of topic modeling methods utilizing document embeddings performed by Curiskis et al. (2019) showed that doc2vec feature representations with k-means clustering outperform several other methods on three evaluation measures (Normalized Mutual Information, Adjusted Mutual Information, and Adjusted Rand Index).

5 We use the implementation provided by Gensim: https://radimrehurek.com/gensim/models/doc2vec.html
6 https://scikit-learn.org/stable
7 Hierarchical clustering, k-medoids, NMF, and LDA.


We run 10 trials of stratification and clustering, resulting in 10 different clusterings. We do this to protect against the variability of clustering and because we do not want to rely on a single clustering for proposing (down)trends for the benchmark.
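The document representation and clustering step can be sketched as follows with Gensim (4.x) and scikit-learn; the doc2vec hyperparameters shown here are illustrative and not necessarily the ones used for the benchmark.

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize

def cluster_documents(tokenized_texts, k=1000):
    """tokenized_texts: list of token lists, one per paper in the stratified sample.
    Learns doc2vec embeddings, length-normalizes them, and clusters them with k-means."""
    tagged = [TaggedDocument(words=tokens, tags=[i])
              for i, tokens in enumerate(tokenized_texts)]
    model = Doc2Vec(tagged, vector_size=300, min_count=5, epochs=20)  # illustrative parameters
    vectors = normalize(model.dv.vectors)               # length-normalized document embeddings
    labels = KMeans(n_clusters=k).fit_predict(vectors)  # topic (cluster id) per document
    return labels
```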

6.2.4 Trend & Downtrend Estimation

Recalling our definitions of trend and downtrend: a trend (resp. downtrend) is defined as a research topic that has a strongly increasing (resp. decreasing) number of publications in a particular time interval.

Linear trend estimation is a widely used technique in predictive (time-series) analysis that proves statements about tendencies in the data by correlating the measurements to the times at which they occurred (Hess, Iyer, and Malm, 2001). Considering the definition of trend (resp. downtrend) and the main concept of linear trend estimation models, we utilize linear regression to identify trend and downtrend candidates in the resulting clustering sets.

Specifically, we count the number of documents per year for a cluster and then estimate the parameters of the best fitting line (i.e., its slope). Clusters are then ranked according to the resulting slope, and the n = 150 clusters with the largest (resp. smallest) slope are selected as trend (resp. downtrend) candidates. Thus, our definition of a single-clustering (down)trend candidate is a cluster with an extreme ascending or descending slope. Figure 12 and Figure 13 give two examples of (down)trend candidates.
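A sketch of this slope-based ranking is given below; it assumes each cluster is represented by the list of publication years of its documents.

```python
import numpy as np

def cluster_slope(publication_years):
    """Fit a least-squares line to the documents-per-year counts of one cluster
    and return its slope (positive: growing topic, negative: shrinking topic)."""
    years, counts = np.unique(publication_years, return_counts=True)
    slope, _intercept = np.polyfit(years, counts, deg=1)
    return slope

def rank_candidates(clusters, n=150):
    """clusters: mapping cluster_id -> list of publication years of its documents.
    Returns the n clusters with the largest slopes (trend candidates) and the
    n clusters with the smallest slopes (downtrend candidates)."""
    slopes = {cid: cluster_slope(years) for cid, years in clusters.items()}
    ranked = sorted(slopes, key=slopes.get, reverse=True)
    return ranked[:n], ranked[-n:]
```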

6.2.5 (Down)trend Candidates Validation

As detailed in Section 6.2.3, we run ten series of clustering, resulting in ten discrete clustering sets. We then estimated the (down)trend candidates according to the procedure described in Section 6.2.4. Consequently, for each of the ten clustering sets, we obtained 150 trend and 150 downtrend candidates.

Nonetheless, for the benchmark, among all the (down)trend candidates across the ten clustering sets, we want to sort out only those that are consistent over all sets. To obtain consistent (down)trend candidates, we apply the Jaccard coefficient, a metric that is used for measuring the similarity and diversity of sample sets. We define an equivalence relation ∼R: two candidates (i.e., clusters) are equivalent if their Jaccard coefficient is > τ_sim, where τ_sim = 0.5. Following this notion, we compute the equivalence of (down)trend candidates across the ten sets.
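A minimal sketch of this equivalence test, with clusters represented as sets of document ids, could look as follows:

```python
def jaccard(cluster_a, cluster_b):
    """Jaccard coefficient between two clusters given as sets of document ids."""
    union = len(cluster_a | cluster_b)
    return len(cluster_a & cluster_b) / union if union else 0.0

def are_equivalent(cluster_a, cluster_b, tau_sim=0.5):
    """Two (down)trend candidates from different clusterings are considered
    equivalent if their Jaccard coefficient exceeds tau_sim."""
    return jaccard(cluster_a, cluster_b) > tau_sim
```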

Finally, we define an equivalence class as an equivalence class trend candidate (resp. equivalence class downtrend candidate) if it contains trend (resp. downtrend) candidates in at least half of the clusterings. This procedure gave us in total 110 equivalence class trend candidates and 107 equivalence class downtrend candidates that we then annotate for our benchmark. Note that this overall number of 217 refers to the number of (down)trend candidates and not to the number of documents; each candidate contains hundreds of documents.

Figure 12: A trend candidate (positive slope). Topic: Sentiment and Emotion Analysis (using Social Media Channels).

Figure 13: A downtrend candidate (negative slope). Topic: XML.

6.3 benchmark annotation

To annotate the equivalence class (down)trend candidates obtained from our model, we used the Figure-Eight (formerly called CrowdFlower) platform. It provides an integrated quality-control mechanism and a training phase before the actual annotation (Kolhatkar, Zinsmeister, and Hirst, 2013), which minimizes the rate of poorly qualified annotators.

We design our annotation task as follows. A worker is shown a description of a particular (down)trend candidate along with the list of possible (down)trend tags. Descriptions are mixed and presented in random order. The temporal presentation order of candidates is also arbitrary. The worker is asked to pick from the list of (down)trend tags an item that matches the cluster (resp. candidate) best. Even if there are several items that are a possible match, the worker must choose only one. If there is no matching item, the worker can select the option other. Three different workers evaluated each cluster, and the final label is the majority label. If there is no majority label, we present the cluster repeatedly to the workers until there is a majority.

The descriptors of candidates are collected through the following procedure:

• First, we review the documents assigned to a candidate cluster.

• Then, we extract keywords and titles of the papers attributed to the cluster. Semantic Scholar provides keywords and titles as part of the metadata.

• Finally, we compute the top 15 most frequent keywords and randomly select 15 titles. These keywords and titles are the descriptor of the cluster candidate presented to crowd-workers.

We create the (down)trend tags based on keywords, titles, and metadata such as publication venue. The cluster candidates were manually labeled by graduate employees (i.e., domain experts in the computer science domain) considering the abovementioned items.

6.3.1 Inter-annotator Agreement

To ensure the quality of the obtained annotations, we computed the Inter Annotator Agreement (IAA) – a measure of how well two (or more) annotators can make the same decision for a particular category. To do that, we used the following metrics:

krippendorff's α is a reliability coefficient developed to measure the agreement among observers or measuring instruments drawing distinctions among typically unstructured phenomena or assigning computable values to them (Krippendorff, 2011). We chose Krippendorff's α as an inter-annotator agreement measure because it – unlike other specialized coefficients – is a generalization of several reliability criteria and applies to situations like ours, where we have more than two annotators and a given annotator only labels a subset of the data (i.e., we have to deal with missing values).

Krippendorff (2011) has proposed several metric variations, each of which is associated with a specific set of weights. We use Krippendorff's α considering any number of observers and missing data. In this case, Krippendorff's α for the overall number of 217 annotated documents is α = 0.798. According to Landis and Koch (1977), this value corresponds to a substantial level of agreement.

figure-eight agreement rate: Figure-Eight provides an agreement score c for each annotated unit u, which is based on the majority vote of the trusted workers. The score has been proven to perform well compared to the classic metrics (Kolhatkar, Zinsmeister, and Hirst, 2013). We computed the average C for the 217 annotated documents; C is 0.799.

Since both Krippendorff's α and the internal Figure-Eight IAA are rather high, we consider the obtained annotations for the benchmark to be of good quality.
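For reference, Krippendorff's α for a layout like ours (annotators in rows, annotated candidates in columns, missing labels as NaN) can be computed with the krippendorff Python package; the data below are toy values, not the actual annotations.

```python
import numpy as np
import krippendorff  # pip install krippendorff

# rows = annotators, columns = (down)trend candidates; NaN marks items a
# given annotator did not label (toy data for illustration only)
reliability_data = np.array([
    [1, 2, np.nan, 3],
    [1, 2, 2,      3],
    [np.nan, 2, 2, 3],
])
alpha = krippendorff.alpha(reliability_data=reliability_data,
                           level_of_measurement="nominal")
print(round(alpha, 3))
```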

6.3.2 Gold Trends

We compiled a list of computer science trends for the year 2016, based on the analysis of twelve survey publications (Ankerholz, 2016; Augustin, 2016; Brooks, 2016; Frot, 2016; Harriet, 2015; IEEE Computer Society, 2016; Markov, 2015; Meyerson and Mariette, 2016; Nessma, 2015; Reese, 2015; Rivera, 2016; Zaino, 2016). For this year, we identified a total of 31 trends; see Table 20.

Since there is variation in the naming of a specific trend, we created a unique name for each of them. Out of the 31 trends, 28 (90.3%) are instantiated as trends by our model, which we confirmed in the crowdsourcing procedure11. We searched for the three missing trends using a variety of techniques (i.e., inspection of non-trends and downtrends, random browsing of documents not assigned to trend candidates, and keyword searches). Two of the missing trends do not seem to occur in the collection at all: "Human Augmentation" and "Food and Water Technology". The third missing trend, "Cryptocurrency and Cryptography", does occur in the collection, but its occurrence rate is very small (ca. 0.01%). We do not consider these three trends in the rest of this chapter.

10 Former CrowdFlower platform.

11 The author verified this based on her judgment of the equivalence of one of the 31 gold trend names in the literature with one of the trend tags.

Computer Science Trend | Computer Science Trend
Autonomous Agents and Systems | Natural Language Processing
Autonomous Vehicles | Open Source
Big Data Analytics | Privacy
Bioinformatics and (Bio) Neuroscience | Quantum Computing
Biometrics and Personal Identification | Recommender Systems and Social Networks
Cloud Computing and Software as a Service | Reinforcement Learning
Cryptocurrency and Cryptography* | Renewable Energy
Cyber Security | Robotics
E-business | Semantic Web
Food and Water Technology* | Sentiment and Emotion Analysis
Game-based Learning | Smart Cities
Games and (Virtual) Augmented Reality | Supply Chains and RFIDs
Human Augmentation* | Technology for Climate Change
Machine/Deep Learning | Transportation and Energy
Medical Advances and DNA Computing | Wearables
Mobile Computing |

Table 20: 31 areas identified as computer science trends for the year 2016 in the media. 28 of these trends are instantiated by our model as well and thus covered in our benchmark. Topics marked with an asterisk (*) were not found in our underlying data collection.

In summary, based on our analysis of meta-literature, we identified 28 gold trends that also occur in our collection. We will use them as the gold standard that trend detection algorithms should aim to discover.

6.4 evaluation measure for trend detection

Whether something is a trend is a graded concept. For this reason, we adopt an evaluation measure based on a ranking of candidates, specifically Mean Average Precision (MAP). The score denotes the average of the precision values at the ranks of correctly identified and non-redundant trends.

We approach trend detection as computing a ranked list of sets of documents, where each set is a trend candidate. We refer to documents as trendy, downtrendy, and flat, depending on which class they are assigned to in our benchmark. We consider a trend candidate c as a correct recognition of a gold trend t if it satisfies the following requirements:

• c is trendy,

• t is the largest trend in c,

• |t ∩ c| / |c| > ρ, where ρ is a coverage parameter,

• t was not recognized earlier in the list.

These criteria give us a true positive (i.e., recognition of a gold trend) or a false positive (otherwise) for every position of the ranked list, and a precision score for each trend. The precision for a trend that was not observed is 0. We then compute MAP; see Table 21 for an example.

Gold Trends | Gold Downtrends
Cloud Computing | Compilers
Bioinformatics | Petri Nets
Sentiment Analysis | Fuzzy Sets
Privacy | Routing Protocols

Trend Candidate | |t ∩ c|/|c| | tp/fp | P
c1 Cloud Computing | 0.60 | tp | 1.00
c2 Privacy | 0.20 | fp | 0.50
c3 Cloud Computing | 0.70 | fp | 0.33
c4 Bioinformatics | 0.55 | tp | 0.50
c5 Sentiment Analysis | 0.95 | tp | 0.60
c6 Sentiment Analysis | 0.59 | fp | 0.50
c7 Bioinformatics | 0.41 | fp | 0.43
c8 Compilers | 0.60 | fp | 0.38
c9 Petri Nets | 0.80 | fp | 0.33
c10 Privacy | 0.33 | fp | 0.30

Table 21: Example of the proposed MAP evaluation (with ρ = 0.5). MAP is the average of the precision values 1.0 (Cloud Computing), 0.5 (Bioinformatics), 0.6 (Sentiment Analysis), and 0.0 (Privacy), i.e., MAP = 2.1/4 = 0.525. True positive: tp. False positive: fp. Precision: P.
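The following Python sketch is a simplified re-implementation (not the released evaluation script) of the proposed MAP and, analogously, of recall at k, which we use as a secondary measure in Section 6.5.2. It assumes each ranked candidate is reduced to a triple of the largest gold trend it contains, its coverage |t ∩ c|/|c|, and a flag indicating whether the candidate is trendy.

def trend_map(ranked_candidates, gold_trends, rho=0.5):
    # ranked_candidates: list of (trend_name, coverage, is_trendy), best-ranked first
    recognized, precisions, tp = set(), {}, 0
    for rank, (name, coverage, is_trendy) in enumerate(ranked_candidates, start=1):
        if is_trendy and name in gold_trends and coverage > rho and name not in recognized:
            recognized.add(name)
            tp += 1
            precisions[name] = tp / rank
    # gold trends that are never recognized contribute a precision of 0
    return sum(precisions.get(t, 0.0) for t in gold_trends) / len(gold_trends)

def recall_at_k(ranked_candidates, gold_trends, k=50, rho=0.5):
    recognized = {name for name, coverage, is_trendy in ranked_candidates[:k]
                  if is_trendy and name in gold_trends and coverage > rho}
    return len(recognized) / len(gold_trends)

gold = {"Cloud Computing", "Bioinformatics", "Sentiment Analysis", "Privacy"}
ranking = [("Cloud Computing", 0.60, True), ("Privacy", 0.20, True),
           ("Cloud Computing", 0.70, True), ("Bioinformatics", 0.55, True),
           ("Sentiment Analysis", 0.95, True), ("Sentiment Analysis", 0.59, True),
           ("Bioinformatics", 0.41, True), ("Compilers", 0.60, True),
           ("Petri Nets", 0.80, True), ("Privacy", 0.33, True)]
print(trend_map(ranking, gold))  # 0.525, as in Table 21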

6.5 trend detection baselines

In this section, we investigate the impact of four different configuration choices on the performance of trend detection by way of ablation.


6.5.1 Configurations

Abstracts vs. full texts: The underlying data collection contains both abstracts and full texts12. Our initial expectation was that the full text of a document comprises more information than the abstract and should therefore be a more reliable basis for trend detection. This is one of the configurations we test in the ablation: we apply our proposed method to the full-text collection (solely document texts, without abstracts) and to the abstract collection (only abstract texts) separately and observe the results.

Stratification: Due to the uneven distribution of documents over the years, the last years (with an increased volume of publications) may have too much impact on the results compared to earlier years. To investigate this, we conduct an ablation experiment where we compare (i) 160k documents randomly sampled from the entire collection and (ii) stratified sampling. In stratified sampling, we select 10k documents for each of the years 2000–2015, resulting in a sampled collection of 160k documents.
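A possible implementation of the two sampling schemes is sketched below; it is illustrative only, and pandas as well as the column name "year" are assumptions.

import pandas as pd

def random_sample(df, n=160_000, seed=0):
    return df.sample(n=n, random_state=seed)

def stratified_sample(df, per_year=10_000, seed=0):
    # df: one row per paper with a "year" column covering 2000-2015
    return (df.groupby("year", group_keys=False)
              .apply(lambda g: g.sample(n=min(per_year, len(g)), random_state=seed)))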

Length L_t of interval: Clusters are ranked by trendiness, and the resulting ranked list is evaluated. To measure the trendiness or growth of topics over time, we fit a line to an interval {(i, n_i) | i_0 ≤ i < i_0 + L_t} by linear regression, where L_t is the length of the interval, i is one of the 16 years (2000–2015), and n_i is the number of documents that were assigned to cluster c in that year. As a simple default baseline, we apply the regression to half of the entire interval, i.e., L_t = 8. There are nine such intervals in our 16-year period. As the final measure of growth for the cluster, we take the maximum of the nine individual slopes. To determine how much this configuration choice affects our results, we also test a linear regression over the entire time span, i.e., L_t = 16. In this case, there is a single interval.
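A minimal sketch of this growth measure is given below; our released code may differ in details such as normalization.

import numpy as np

def trendiness(counts_per_year, interval_len=8):
    # counts_per_year: n_i for the 16 years 2000-2015, for one cluster
    years = np.arange(len(counts_per_year))
    slopes = []
    for start in range(len(counts_per_year) - interval_len + 1):
        x = years[start:start + interval_len]
        y = np.asarray(counts_per_year[start:start + interval_len], dtype=float)
        slope, _ = np.polyfit(x, y, deg=1)  # fit n_i ~ slope * i + intercept
        slopes.append(slope)
    return max(slopes)  # maximal slope over the nine intervals (for L_t = 8)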

Clustering method: We consider Gaussian Mixture Models (GMM) and k-means. We use GMM with a spherical covariance type, i.e., each component has a single (spherical) variance. For both, we proceed as described in Section 6.2.3.
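For reference, the two clustering variants can be instantiated with scikit-learn as sketched below; the document vectors and the number of clusters are placeholders (the actual values follow Section 6.2.3 and are not repeated here).

import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.cluster import KMeans

doc_vectors = np.random.rand(1000, 300)  # placeholder document embeddings
n_clusters = 100                         # placeholder; see Section 6.2.3

gmm = GaussianMixture(n_components=n_clusters, covariance_type="spherical",
                      random_state=0)
gmm_labels = gmm.fit_predict(doc_vectors)      # soft model, hard labels via argmax

kmeans = KMeans(n_clusters=n_clusters, random_state=0)
kmeans_labels = kmeans.fit_predict(doc_vectors)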

6.5.2 Experimental Setup and Results

We conducted experiments on the Semantic Scholar corpus and evaluated a ranked list of trend candidates against the benchmark, considering the configurations mentioned above. We adopted Mean Average Precision (MAP) as the principal evaluation measure. As a secondary evaluation measure, we computed recall at 50 (R50), which estimates the percentage of gold trends found in the list of the 50 highest-ranked trend candidates. We found that configuration (0) in Table 22 works best. We then conducted ablation experiments to discover the importance of the configuration choices.

12 Semantic Scholar provided the underlying dataset at the beginning of our work in 2016.

 | (0) | (1) | (2) | (3) | (4)
document part | A | F | A | A | A
stratification | yes | yes | no | yes | yes
L_t | 8 | 8 | 8 | 16 | 8
clustering | GMM_sph | GMM_sph | GMM_sph | GMM_sph | k_µ
MAP | .36 (.03) | .07 (.19) | .30 (.03) | .32 (.04) | .34 (.21)
R50 avg | .50 (.12) | .25 (.18) | .50 (.16) | .46 (.18) | .53 (.14)
R50 max | .61 | .37 | .53 | .61 | .61

Table 22: Ablation results. Standard deviations are given in parentheses. GMM clustering is performed with the spherical (sph) type of covariance. K-means clustering is denoted as k_µ in the ablation table.

Comparing (0) and (1)13: We observed that abstracts (A) are a more reliable representation for our trend detection baseline than full texts (F). A possible explanation for this observation is that an abstract is a summary of a scientific paper that covers only the central points and is semantically very concise and rich. In contrast, the full text contains numerous parts that are secondary (e.g., future work, related work) and thus may skew the document's overall meaning.

Comparing (0) and (2): We recognized that stratification (yes) improves results compared to the setting with non-stratified (no), randomly sampled data. We would expect the effect of stratification to be even more substantial for collections in which the distribution of documents over time is more skewed than in ours.

Comparing (0) and (3): The length of the interval is a vital configuration choice. As we assumed, the 16-year interval is rather long, and an 8-year interval can be a better choice, primarily if one aims to find short-term trends.

Comparing (0) and (4): We recognized that the topics we obtain from k-means are similar in nature to those from GMM. However, GMMs, which take variance and covariance into account and estimate a soft assignment, still perform slightly better.

6.6 benchmark description

Below, we report the contents of the TRENDNERT benchmark14:

1. Two records containing information on documents: one file with the documents assigned to (down)trends and the other file with the documents assigned to flat topics.

13 Numbers (0), (1), (2), (3), (4) are the configuration choices in the ablation table.

14 https://doi.org/10.17605/OSF.IO/DHZCT


2. A script to produce document hash codes.

3. A file mapping every hash code to an internal ID in the benchmark.

4. A script to compute MAP (the default setting is ρ = 0.25).

5. README.md, a file with guidance notes.

Every document in the benchmark contains the following information:

• Paper ID: the internal ID assigned to each document in the original Semantic Scholar collection.

• Cluster ID: the unique ID of the meta-cluster in which the document was observed.

• Label/Tag: the name of the gold (down)trend assigned by crowd-workers (e.g., Cyber Security, Fuzzy Sets and Systems).

• Type of (down)trend candidate: trend (T), downtrend (D), or flat topic (F).

• Hash ID15 of each paper from the original Semantic Scholar collection (a sketch of the hash construction follows below).

15 MD5 hash of First Author + Title + Year.
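A hypothetical Python sketch of the hash-code construction described in footnote 15 is given below; the exact string normalization and concatenation used by the released script may differ.

import hashlib

def paper_hash(first_author, title, year):
    # MD5 over First Author + Title + Year (the concatenation format is an assumption)
    key = f"{first_author}{title}{year}"
    return hashlib.md5(key.encode("utf-8")).hexdigest()

print(paper_hash("Jane Doe", "An Example Title", 2015))  # invented example input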

6.7 summary

Emerging trend detection is a promising task for Intelligent Process Automation (IPA) that allows companies and funding agencies to automatically analyze extensive textual collections in order to detect emerging fields and forecast the future using machine learning algorithms. Thus, the availability of public benchmarks for trend detection is of significant importance, as only then can one evaluate the performance and robustness of the employed techniques.

Within this work, we release TRENDNERT – the first publicly available benchmark for (down)trend detection, which offers a realistic setting for developing trend detection algorithms. TRENDNERT also supports downtrend detection – a significant problem that had not been addressed before. We also present several experimental findings on trend detection. First, stratification improves trend detection if the distribution of documents is skewed over time. Second, abstract-based trend detection performs better than full-text-based detection if straightforward models are used.

6.8 future work

There are several possible directions for future work:

• The presented approach to estimating the trendiness of a topic by means of linear regression is straightforward and could be replaced by more sophisticated alternatives such as Kendall's τ correlation coefficient or a locally weighted regression smoother (LOESS) (Cleveland, 1979).
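For illustration, both alternatives are readily available in standard Python libraries (scipy and statsmodels; neither is part of the thesis code, and the yearly counts below are invented):

import numpy as np
from scipy.stats import kendalltau
from statsmodels.nonparametric.smoothers_lowess import lowess

years = np.arange(2000, 2016)
counts = np.array([12, 15, 14, 20, 22, 25, 31, 30, 38, 45, 50, 61, 70, 82, 95, 110])

tau, p_value = kendalltau(years, counts)    # strength of a monotonic trend
smoothed = lowess(counts, years, frac=0.5)  # locally weighted regression curve
print(round(tau, 2), round(p_value, 4))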

• It would also be a challenge to replace the k-means clustering algorithm that we adopted in this work with an LDA model while retaining the document representations, as was done in Dieng, Ruiz, and Blei (2019). The question to investigate would then be whether topic modeling outperforms the simple clustering algorithm in this particular setting, and whether the resulting system would benefit from the topic modeling algorithm.

• Furthermore, it would be of interest to conduct more research on the following question: what kind of document representation can make effective use of the information in full texts? Recently proposed contextualized word embeddings could be a possible direction for these experiments.


7 DISCUSSION AND FUTURE WORK

As we observed in the two parts of this thesis, Intelligent Process Automation (IPA) is a challenging research area due to its multifacetedness and its application in real-world scenarios. Its main objective is to automate time-consuming and routine tasks and to free up the knowledge worker for more cognitively demanding tasks. However, this task involves many difficulties, such as a lack of resources for both training and evaluation of algorithms, as well as insufficient performance of machine learning techniques when applied to real-world data and practical problems. In this thesis, we investigated two IPA applications within the Natural Language Processing (NLP) domain – conversational agents and predictive analytics – and addressed some of the most common issues.

• In the first part of the thesis, we addressed the challenge of implementing a robust conversational system for IPA purposes within a practical e-learning domain under the condition of missing training data. The primary purpose of the resulting system is to perform the onboarding process between tutors and students. This onboarding reduces repetitive and time-consuming activities while allowing tutors to focus solely on mathematical questions, which is the core task with the most impact on students. Besides that, by interacting with users, the system augments the resources with structured and labeled training data, which we used to implement learnable dialogue components.

The realization of such a system was associated with several challenges. Among others were missing structured data and ambiguous user-generated content that requires precise language understanding. Also, a system that simultaneously communicates with a large number of users has to maintain additional meta policies, such as fallback and check-up policies, to hold the conversation intelligently.

• Moreover, we have shown a potential extension of the rule-based dialogue system using transfer learning. We utilized a small dataset of structured dialogues obtained in a trial run of the rule-based system and applied the current state-of-the-art Bidirectional Encoder Representations from Transformers (BERT) architecture to solve the domain-specific tasks. Specifically, we retrained two of the three components in the conversational system – Named Entity Recognition (NER) and the Dialogue Manager (DM) (i.e., the NAP task) – and confirmed the applicability of such algorithms in a real-world low-resource scenario by showing the reasonably high performance of the resulting models.

• Last but not least, in the second part of the thesis, we addressed the issue of missing publicly available benchmarks for the evaluation of machine learning algorithms on the trend detection task. To the best of our knowledge, we released the first open benchmark for this purpose, which offers a realistic setting for developing trend detection algorithms. Besides that, this benchmark supports downtrend detection – a significant problem that had not been sufficiently addressed before. We additionally proposed an evaluation measure and presented several experimental findings on trend detection.

The ultimate objective of Intelligent Process Automation (IPA) should be the creation of robust algorithms under low-resource and domain-specific conditions. This is still a challenging task; however, some of the experiments presented in this thesis revealed that this goal can potentially be achieved by means of Transfer Learning (TL) techniques.


BIBLIOGRAPHY

Agrawal Amritanshu Wei Fu and Tim Menzies (2018) bdquoWhat iswrong with topic modeling And how to fix it using search-basedsoftware engineeringldquo In Information and Software Technology 98pp 74ndash88 (cit on p 66)

Alan M (1950) bdquoTuringldquo In Computing machinery and intelligenceMind 59236 pp 433ndash460 (cit on p 7)

Alghamdi Rubayyi and Khalid Alfalqi (2015) bdquoA survey of topicmodeling in text miningldquo In Int J Adv Comput Sci Appl(IJACSA)61 (cit on pp 64ndash66)

Ammar Waleed Dirk Groeneveld Chandra Bhagavatula Iz BeltagyMiles Crawford Doug Downey Jason Dunkelberger Ahmed El-gohary Sergey Feldman Vu Ha et al (2018) bdquoConstruction ofthe literature graph in semantic scholarldquo In arXiv preprint arXiv1805 02262 (cit on p 72)

Ankerholz Amber (2016) 2016 Future of Open Source Survey Says OpenSource Is The Modern Architecture httpswwwlinuxcomnews2016-future-open-source-survey-says-open-source-modern-

architecture (cit on p 78)Asooja Kartik Georgeta Bordea Gabriela Vulcu and Paul Buitelaar

(2016) bdquoForecasting emerging trends from scientific literatureldquoIn Proceedings of the Tenth International Conference on Language Re-sources and Evaluation (LREC 2016) pp 417ndash420 (cit on p 68)

Augustin Jason (2016) Emergin Science and Technology Trends A Syn-thesis of Leading Forecast httpwwwdefenseinnovationmarket-placemilresources2016_SciTechReport_16June2016pdf (citon p 78)

Blei David M and John D Lafferty (2006) bdquoDynamic topic modelsldquoIn Proceedings of the 23rd international conference on Machine learn-ing ACM pp 113ndash120 (cit on pp 66 67)

Blei David M John D Lafferty et al (2007) bdquoA correlated topic modelof scienceldquo In The Annals of Applied Statistics 11 pp 17ndash35 (citon p 66)

Blei David M Andrew Y Ng and Michael I Jordan (2003) bdquoLatentdirichlet allocationldquo In Journal of machine Learning research 3Janpp 993ndash1022 (cit on p 66)

Bloem Peter (2019) Transformers from Scratch httpwwwpeterbloemnlblogtransformers (cit on p 45)

Bocklisch Tom Joey Faulkner Nick Pawlowski and Alan Nichol(2017) bdquoRasa Open source language understanding and dialoguemanagementldquo In arXiv preprint arXiv171205181 (cit on p 15)

Bolelli Levent Seyda Ertekin and C Giles (2009) bdquoTopic and trenddetection in text collections using latent dirichlet allocationldquo InAdvances in Information Retrieval pp 776ndash780 (cit on p 66)


Bolelli Levent Seyda Ertekin Ding Zhou and C Lee Giles (2009)bdquoFinding topic trends in digital librariesldquo In Proceedings of the 9thACMIEEE-CS joint conference on Digital libraries ACM pp 69ndash72

(cit on p 66)Bordes Antoine Y-Lan Boureau and Jason Weston (2016) bdquoLearning

end-to-end goal-oriented dialogldquo In arXiv preprint arXiv160507683(cit on pp 16 17)

Brooks Chuck (2016) 7 Top Tech Trends Impacting Innovators in 2016httpinnovationexcellencecomblog201512267-top-

tech-trends-impacting-innovators-in-2016 (cit on p 78)Burtsev Mikhail Alexander Seliverstov Rafael Airapetyan Mikhail

Arkhipov Dilyara Baymurzina Eugene Botvinovsky NickolayBushkov Olga Gureenkova Andrey Kamenev Vasily Konovalovet al (2018) bdquoDeepPavlov An Open Source Library for Conver-sational AIldquo In (cit on p 15)

Chang Jonathan David M Blei et al (2010) bdquoHierarchical relationalmodels for document networksldquo In The Annals of Applied Statis-tics 41 pp 124ndash150 (cit on p 66)

Chen Hongshen Xiaorui Liu Dawei Yin and Jiliang Tang (2017)bdquoA survey on dialogue systems Recent advances and new fron-tiersldquo In Acm Sigkdd Explorations Newsletter 192 pp 25ndash35 (citon pp 14ndash16)

Chen Lingzhen Alessandro Moschitti Giuseppe Castellucci AndreaFavalli and Raniero Romagnoli (2018) bdquoTransfer Learning for In-dustrial Applications of Named Entity Recognitionldquo In NL4AIAI IA pp 129ndash140 (cit on p 43)

Cleveland, William S. (1979). "Robust locally weighted regression and smoothing scatterplots". In: Journal of the American Statistical Association 74.368, pp. 829–836 (cit. on p. 84).

Collobert Ronan Jason Weston Leacuteon Bottou Michael Karlen KorayKavukcuoglu and Pavel Kuksa (2011) bdquoNatural language pro-cessing (almost) from scratchldquo In Journal of machine learning re-search 12Aug pp 2493ndash2537 (cit on p 16)

Cuayaacutehuitl Heriberto Simon Keizer and Oliver Lemon (2015) bdquoStrate-gic dialogue management via deep reinforcement learningldquo InarXiv preprint arXiv151108099 (cit on p 14)

Cui Lei Shaohan Huang Furu Wei Chuanqi Tan Chaoqun Duanand Ming Zhou (2017) bdquoSuperagent A customer service chat-bot for e-commerce websitesldquo In Proceedings of ACL 2017 SystemDemonstrations pp 97ndash102 (cit on p 8)

Curiskis Stephan A Barry Drake Thomas R Osborn and Paul JKennedy (2019) bdquoAn evaluation of document clustering and topicmodelling in two online social networks Twitter and Redditldquo InInformation Processing amp Management (cit on p 74)

Dai Zihang Zhilin Yang Yiming Yang William W Cohen Jaime Car-bonell Quoc V Le and Ruslan Salakhutdinov (2019) bdquoTransformer-xl Attentive language models beyond a fixed-length contextldquo InarXiv preprint arXiv190102860 (cit on pp 43 44)


Daniel Ben (2015) bdquoBig Data and analytics in higher education Op-portunities and challengesldquo In British journal of educational tech-nology 465 pp 904ndash920 (cit on p 22)

Deerwester Scott Susan T Dumais George W Furnas Thomas K Lan-dauer and Richard Harshman (1990) bdquoIndexing by latent seman-tic analysisldquo In Journal of the American society for information sci-ence 416 pp 391ndash407 (cit on p 65)

Deoras Anoop Kaisheng Yao Xiaodong He Li Deng Geoffrey Ger-son Zweig Ruhi Sarikaya Dong Yu Mei-Yuh Hwang and Gre-goire Mesnil (2015) Assignment of semantic labels to a sequence ofwords using neural network architectures US Patent App 14016186

(cit on p 13)Devlin Jacob Ming-Wei Chang Kenton Lee and Kristina Toutanova

(2018) bdquoBert Pre-training of deep bidirectional transformers forlanguage understandingldquo In arXiv preprint arXiv181004805 (citon pp 42ndash45)

Dhingra Bhuwan Lihong Li Xiujun Li Jianfeng Gao Yun-NungChen Faisal Ahmed and Li Deng (2016) bdquoTowards end-to-end re-inforcement learning of dialogue agents for information accessldquoIn arXiv preprint arXiv160900777 (cit on p 16)

Dieng, Adji B., Francisco J. R. Ruiz, and David M. Blei (2019). "Topic Modeling in Embedding Spaces". In: arXiv preprint arXiv:1907.04907 (cit. on pp. 74, 84).

Donahue Jeff Yangqing Jia Oriol Vinyals Judy Hoffman Ning ZhangEric Tzeng and Trevor Darrell (2014) bdquoDecaf A deep convolu-tional activation feature for generic visual recognitionldquo In Inter-national conference on machine learning pp 647ndash655 (cit on p 43)

Eric Mihail and Christopher D Manning (2017) bdquoKey-value retrievalnetworks for task-oriented dialogueldquo In arXiv preprint arXiv 170505414 (cit on p 16)

Fang Hao Hao Cheng Maarten Sap Elizabeth Clark Ari HoltzmanYejin Choi Noah A Smith and Mari Ostendorf (2018) bdquoSoundingboard A user-centric and content-driven social chatbotldquo In arXivpreprint arXiv180410202 (cit on p 10)

Friedrich Fabian Jan Mendling and Frank Puhlmann (2011) bdquoPro-cess model generation from natural language textldquo In Interna-tional Conference on Advanced Information Systems Engineering pp 482

ndash496 (cit on p 2)Frot Mathilde (2016) 5 Trends in Computer Science Research https

wwwtopuniversitiescomcoursescomputer-science-info-

rmation-systems5-trends-computer-science-research (citon p 78)

Glasmachers Tobias (2017) bdquoLimits of end-to-end learningldquo In arXivpreprint arXiv170408305 (cit on pp 16 17)

Goddeau David Helen Meng Joseph Polifroni Stephanie Seneffand Senis Busayapongchai (1996) bdquoA form-based dialogue man-ager for spoken language applicationsldquo In Proceeding of FourthInternational Conference on Spoken Language Processing ICSLPrsquo96Vol 2 IEEE pp 701ndash704 (cit on p 14)


Gohr Andre Alexander Hinneburg Rene Schult and Myra Spiliopou-lou (2009) bdquoTopic evolution in a stream of documentsldquo In Pro-ceedings of the 2009 SIAM International Conference on Data MiningSIAM pp 859ndash870 (cit on p 65)

Gollapalli Sujatha Das and Xiaoli Li (2015) bdquoEMNLP versus ACLAnalyzing NLP research over timeldquo In EMNLP pp 2002ndash2006

(cit on p 72)Griffiths Thomas L and Mark Steyvers (2004) bdquoFinding scientific top-

icsldquo In Proceedings of the National academy of Sciences 101suppl 1pp 5228ndash5235 (cit on p 65)

Griffiths Thomas L Michael I Jordan Joshua B Tenenbaum andDavid M Blei (2004) bdquoHierarchical topic models and the nestedChinese restaurant processldquo In Advances in neural information pro-cessing systems pp 17ndash24 (cit on p 66)

Hall David Daniel Jurafsky and Christopher D Manning (2008) bdquoStu-dying the history of ideas using topic modelsldquo In Proceedingsof the conference on empirical methods in natural language processingAssociation for Computational Linguistics pp 363ndash371 (cit onp 66)

Harriet Taylor (2015) Privacy will hit tipping point in 2016 httpwwwcnbccom20151109privacy-will-hit-tipping-point-

in-2016html (cit on p 78)He Jiangen and Chaomei Chen (2018) bdquoPredictive Effects of Novelty

Measured by Temporal Embeddings on the Growth of ScientificLiteratureldquo In Frontiers in Research Metrics and Analytics 3 p 9

(cit on pp 64 68 71)He Junxian Zhiting Hu Taylor Berg-Kirkpatrick Ying Huang and

Eric P Xing (2017) bdquoEfficient correlated topic modeling with topicembeddingldquo In Proceedings of the 23rd ACM SIGKDD InternationalConference on Knowledge Discovery and Data Mining ACM pp 225ndash233 (cit on pp 66 67)

He Qi Bi Chen Jian Pei Baojun Qiu Prasenjit Mitra and Lee Giles(2009) bdquoDetecting topic evolution in scientific literature How cancitations helpldquo In Proceedings of the 18th ACM conference on In-formation and knowledge management ACM pp 957ndash966 (cit onp 68)

Henderson Matthew Blaise Thomson and Jason D Williams (2014)bdquoThe second dialog state tracking challengeldquo In Proceedings of the15th Annual Meeting of the Special Interest Group on Discourse andDialogue (SIGDIAL) pp 263ndash272 (cit on p 15)

Henderson Matthew Blaise Thomson and Steve Young (2013) bdquoDeepneural network approach for the dialog state tracking challengeldquoIn Proceedings of the SIGDIAL 2013 Conference pp 467ndash471 (cit onp 14)

Hess Ann Hari Iyer and William Malm (2001) bdquoLinear trend analy-sis a comparison of methodsldquo In Atmospheric Environment 3530Visibility Aerosol and Atmospheric Optics pp 5211 ndash5222 issn1352-2310 doi httpsdoiorg101016S1352- 2310(01)00342-9 url httpwwwsciencedirectcomsciencearticlepiiS1352231001003429 (cit on p 75)


Hofmann Thomas (1999) bdquoProbabilistic latent semantic analysisldquo InProceedings of the Fifteenth conference on Uncertainty in artificial in-telligence Morgan Kaufmann Publishers Inc pp 289ndash296 (cit onp 65)

Honnibal Matthew and Ines Montani (2017) bdquospacy 2 Natural lan-guage understanding with bloom embeddings convolutional neu-ral networks and incremental parsingldquo In To appear 7 (cit onp 15)

Howard Jeremy and Sebastian Ruder (2018) bdquoUniversal languagemodel fine-tuning for text classificationldquo In arXiv preprint arXiv180106146 (cit on p 44)

IEEE Computer Society (2016) Top 9 Computing Technology Trends for2016 httpswwwscientificcomputingcomnews201601top-9-computing-technology-trends-2016 (cit on p 78)

Ivancic Lucija Dalia Suša Vugec and Vesna Bosilj Vukšic (2019) bdquoRobo-tic Process Automation Systematic Literature Reviewldquo In In-ternational Conference on Business Process Management Springerpp 280ndash295 (cit on pp 1 2)

Jain Mohit Pratyush Kumar Ramachandra Kota and Shwetak NPatel (2018) bdquoEvaluating and informing the design of chatbotsldquoIn Proceedings of the 2018 Designing Interactive Systems ConferenceACM pp 895ndash906 (cit on p 8)

Khanpour Hamed Nishitha Guntakandla and Rodney Nielsen (2016)bdquoDialogue act classification in domain-independent conversationsusing a deep recurrent neural networkldquo In Proceedings of COL-ING 2016 the 26th International Conference on Computational Lin-guistics Technical Papers pp 2012ndash2021 (cit on p 13)

Kim, Young-Bum, Karl Stratos, Ruhi Sarikaya, and Minwoo Jeong (2015). "New transfer learning techniques for disparate label sets". In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp. 473–482 (cit. on p. 43).

Kolhatkar, Varada, Heike Zinsmeister, and Graeme Hirst (2013). "Annotating anaphoric shell nouns with their antecedents". In: Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse, pp. 112–121 (cit. on pp. 77, 78).

Kontostathis April Leon M Galitsky William M Pottenger SomaRoy and Daniel J Phelps (2004) bdquoA survey of emerging trend de-tection in textual data miningldquo In Survey of text mining Springerpp 185ndash224 (cit on p 63)

Krippendorff, Klaus (2011). "Computing Krippendorff's alpha-reliability". In: Annenberg School for Communication (ASC) Departmental Papers 43 (cit. on p. 78).

Kurata Gakuto Bing Xiang Bowen Zhou and Mo Yu (2016) bdquoLever-aging sentence-level information with encoder lstm for semanticslot fillingldquo In arXiv preprint arXiv160101530 (cit on p 13)

Landis, J. Richard and Gary G. Koch (1977). "The measurement of observer agreement for categorical data". In: Biometrics, pp. 159–174 (cit. on p. 78).


Le Minh-Hoang Tu-Bao Ho and Yoshiteru Nakamori (2005) bdquoDe-tecting emerging trends from scientific corporaldquo In InternationalJournal of Knowledge and Systems Sciences 22 pp 53ndash59 (cit onp 68)

Le Quoc V and Tomas Mikolov (2014) bdquoDistributed Representationsof Sentences and Documentsldquo In ICML Vol 14 pp 1188ndash1196

(cit on p 74)Lee Ji Young and Franck Dernoncourt (2016) bdquoSequential short-text

classification with recurrent and convolutional neural networksldquoIn arXiv preprint arXiv160303827 (cit on p 13)

Leopold Henrik Han van der Aa and Hajo A Reijers (2018) bdquoIden-tifying candidate tasks for robotic process automation in textualprocess descriptionsldquo In Enterprise Business-Process and Informa-tion Systems Modeling Springer pp 67ndash81 (cit on p 2)

Levenshtein Vladimir I (1966) bdquoBinary codes capable of correctingdeletions insertions and reversalsldquo In Soviet physics doklady 8pp 707ndash710 (cit on p 25)

Liao Vera Masud Hussain Praveen Chandar Matthew Davis Yasa-man Khazaeni Marco Patricio Crasso Dakuo Wang Michael Mu-ller N Sadat Shami Werner Geyer et al (2018) bdquoAll Work andNo Playldquo In Proceedings of the 2018 CHI Conference on HumanFactors in Computing Systems ACM p 3 (cit on p 8)

Lowe Ryan Thomas Nissan Pow Iulian Vlad Serban Laurent Char-lin Chia-Wei Liu and Joelle Pineau (2017) bdquoTraining end-to-enddialogue systems with the ubuntu dialogue corpusldquo In Dialogueamp Discourse 81 pp 31ndash65 (cit on p 22)

Luger Ewa and Abigail Sellen (2016) bdquoLike having a really bad PAthe gulf between user expectation and experience of conversa-tional agentsldquo In Proceedings of the 2016 CHI Conference on HumanFactors in Computing Systems ACM pp 5286ndash5297 (cit on p 8)

MacQueen James (1967) bdquoSome methods for classification and analy-sis of multivariate observationsldquo In Proceedings of the fifth Berkeleysymposium on mathematical statistics and probability Vol 1 OaklandCA USA pp 281ndash297 (cit on p 74)

Manning, Christopher D., Prabhakar Raghavan, and Hinrich Schütze (2008). Introduction to Information Retrieval. Web publication at informationretrieval.org. Cambridge University Press (cit. on p. 74).

Markov Igor (2015) 13 Of 2015rsquos Hottest Topics In Computer ScienceResearch httpswwwforbescomsitesquora2015042213-of-2015s-hottest-topics-in-computer-science-research

fb0d46b1e88d (cit on p 78)Martin James H and Daniel Jurafsky (2009) Speech and language pro-

cessing An introduction to natural language processing computationallinguistics and speech recognition PearsonPrentice Hall Upper Sad-dle River

Mei Qiaozhu Deng Cai Duo Zhang and ChengXiang Zhai (2008)bdquoTopic modeling with network regularizationldquo In Proceedings ofthe 17th international conference on World Wide Web ACM pp 101ndash110 (cit on p 65)


Meyerson Bernard and DiChristina Mariette (2016) These are the top10 emerging technologies of 2016 httpwww3weforumorgdocsGAC16_Top10_Emerging_Technologies_2016_reportpdf (cit onp 78)

Mikolov Tomas Kai Chen Greg Corrado and Jeffrey Dean (2013)bdquoEfficient estimation of word representations in vector spaceldquo InarXiv preprint arXiv13013781 (cit on p 74)

Miller Alexander Adam Fisch Jesse Dodge Amir-Hossein KarimiAntoine Bordes and Jason Weston (2016) bdquoKey-value memorynetworks for directly reading documentsldquo In (cit on p 16)

Mohanty, Soumendra and Sachin Vyas (2018). "How to Compete in the Age of Artificial Intelligence" (cit. on pp. III, V, 1, 2).

Moiseeva, Alena and Hinrich Schütze (2020). "TRENDNERT: A Benchmark for Trend and Downtrend Detection in a Scientific Domain". In: Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (cit. on p. 71).

Moiseeva, Alena, Dietrich Trautmann, Michael Heimann, and Hinrich Schütze (2020). "Multipurpose Intelligent Process Automation via Conversational Assistant". In: arXiv preprint arXiv:2001.02284 (cit. on pp. 21, 41).

Monaghan Fergal Georgeta Bordea Krystian Samp and Paul Buite-laar (2010) bdquoExploring your research Sprinkling some saffron onsemantic web dog foodldquo In Semantic Web Challenge at the Inter-national Semantic Web Conference Vol 117 Citeseer pp 420ndash435

(cit on p 68)Monostori Laacuteszloacute Andraacutes Maacuterkus Hendrik Van Brussel and E West-

kaumlmpfer (1996) bdquoMachine learning approaches to manufactur-ingldquo In CIRP annals 452 pp 675ndash712 (cit on p 15)

Moody Christopher E (2016) bdquoMixing dirichlet topic models andword embeddings to make lda2vecldquo In arXiv preprint arXiv160502019 (cit on p 74)

Mou Lili Zhao Meng Rui Yan Ge Li Yan Xu Lu Zhang and ZhiJin (2016) bdquoHow transferable are neural networks in nlp applica-tionsldquo In arXiv preprint arXiv160306111 (cit on p 43)

Mrkšic Nikola Diarmuid O Seacuteaghdha Tsung-Hsien Wen Blaise Thom-son and Steve Young (2016) bdquoNeural belief tracker Data-drivendialogue state trackingldquo In arXiv preprint arXiv160603777 (citon p 14)

Nessma Joussef (2015) Top 10 Hottest Research Topics in Computer Sci-ence http www pouted com top - 10 - hottest - research -

topics-in-computer-science (cit on p 78)Nie Binling and Shouqian Sun (2017) bdquoUsing text mining techniques

to identify research trends A case study of design researchldquo InApplied Sciences 74 p 401 (cit on p 68)

Pan Sinno Jialin and Qiang Yang (2009) bdquoA survey on transfer learn-ingldquo In IEEE Transactions on knowledge and data engineering 2210pp 1345ndash1359 (cit on p 41)

Qu Lizhen Gabriela Ferraro Liyuan Zhou Weiwei Hou and Tim-othy Baldwin (2016) bdquoNamed entity recognition for novel types


by transfer learningldquo In arXiv preprint arXiv161009914 (cit onp 43)

Radford Alec Jeffrey Wu Rewon Child David Luan Dario Amodeiand Ilya Sutskever (2019) bdquoLanguage models are unsupervisedmultitask learnersldquo In OpenAI Blog 18 (cit on p 44)

Reese Hope (2015) 7 trends for artificial intelligence in 2016 rsquoLike 2015on steroidsrsquo httpwwwtechrepubliccomarticle7-trends-for-artificial-intelligence-in-2016-like-2015-on-steroids

(cit on p 78)Rivera Maricel (2016) Is Digital Game-Based Learning The Future Of

Learning httpselearningindustrycomdigital-game-based-

learning-future (cit on p 78)Rotolo Daniele Diana Hicks and Ben R Martin (2015) bdquoWhat is an

emerging technologyldquo In Research Policy 4410 pp 1827ndash1843

(cit on p 64)Salatino Angelo (2015) bdquoEarly detection and forecasting of research

trendsldquo In (cit on pp 63 66)Serban Iulian V Alessandro Sordoni Yoshua Bengio Aaron Courville

and Joelle Pineau (2016) bdquoBuilding end-to-end dialogue systemsusing generative hierarchical neural network modelsldquo In Thirti-eth AAAI Conference on Artificial Intelligence (cit on p 11)

Sharif Razavian Ali Hossein Azizpour Josephine Sullivan and Ste-fan Carlsson (2014) bdquoCNN features off-the-shelf an astoundingbaseline for recognitionldquo In Proceedings of the IEEE conference oncomputer vision and pattern recognition workshops pp 806ndash813 (citon p 43)

Shibata Naoki Yuya Kajikawa and Yoshiyuki Takeda (2009) bdquoCom-parative study on methods of detecting research fronts using dif-ferent types of citationldquo In Journal of the American Society for in-formation Science and Technology 603 pp 571ndash580 (cit on p 68)

Shibata Naoki Yuya Kajikawa Yoshiyuki Takeda and KatsumoriMatsushima (2008) bdquoDetecting emerging research fronts basedon topological measures in citation networks of scientific publica-tionsldquo In Technovation 2811 pp 758ndash775 (cit on p 68)

Small Henry (2006) bdquoTracking and predicting growth areas in sci-enceldquo In Scientometrics 683 pp 595ndash610 (cit on p 68)

Small Henry Kevin W Boyack and Richard Klavans (2014) bdquoIden-tifying emerging topics in science and technologyldquo In ResearchPolicy 438 pp 1450ndash1467 (cit on p 64)

Su Pei-Hao Nikola Mrkšic Intildeigo Casanueva and Ivan Vulic (2018)bdquoDeep Learning for Conversational AIldquo In Proceedings of the 2018Conference of the North American Chapter of the Association for Com-putational Linguistics Tutorial Abstracts pp 27ndash32 (cit on p 12)

Sutskever Ilya Oriol Vinyals and Quoc V Le (2014) bdquoSequence tosequence learning with neural networksldquo In Advances in neuralinformation processing systems pp 3104ndash3112 (cit on pp 16 17)

Trautmann, Dietrich, Johannes Daxenberger, Christian Stab, Hinrich Schütze, and Iryna Gurevych (2019). "Robust Argument Unit Recognition and Classification". In: arXiv preprint arXiv:1904.09688 (cit. on p. 41).


Tu Yi-Ning and Jia-Lang Seng (2012) bdquoIndices of novelty for emerg-ing topic detectionldquo In Information processing amp management 482pp 303ndash325 (cit on p 64)

Turing Alan (1949) London Times letter to the editor (cit on p 7)Ultes Stefan Lina Barahona Pei-Hao Su David Vandyke Dongho

Kim Inigo Casanueva Paweł Budzianowski Nikola Mrkšic Tsu-ng-Hsien Wen et al (2017) bdquoPydial A multi-domain statistical di-alogue system toolkitldquo In Proceedings of ACL 2017 System Demon-strations pp 73ndash78 (cit on p 15)

Vaswani Ashish Noam Shazeer Niki Parmar Jakob Uszkoreit LlionJones Aidan N Gomez Łukasz Kaiser and Illia Polosukhin (2017)bdquoAttention is all you needldquo In Advances in neural information pro-cessing systems pp 5998ndash6008 (cit on p 45)

Vincent Pascal Hugo Larochelle Yoshua Bengio and Pierre-AntoineManzagol (2008) bdquoExtracting and composing robust features withdenoising autoencodersldquo In Proceedings of the 25th internationalconference on Machine learning ACM pp 1096ndash1103 (cit on p 44)

Vodolaacuten Miroslav Rudolf Kadlec and Jan Kleindienst (2015) bdquoHy-brid dialog state trackerldquo In arXiv preprint arXiv151003710 (citon p 14)

Wang Alex Amanpreet Singh Julian Michael Felix Hill Omer Levyand Samuel R Bowman (2018) bdquoGlue A multi-task benchmarkand analysis platform for natural language understandingldquo InarXiv preprint arXiv180407461 (cit on p 44)

Wang Chong and David M Blei (2011) bdquoCollaborative topic modelingfor recommending scientific articlesldquo In Proceedings of the 17thACM SIGKDD international conference on Knowledge discovery anddata mining ACM pp 448ndash456 (cit on p 66)

Wang Xuerui and Andrew McCallum (2006) bdquoTopics over time anon-Markov continuous-time model of topical trendsldquo In Pro-ceedings of the 12th ACM SIGKDD international conference on Knowl-edge discovery and data mining ACM pp 424ndash433 (cit on pp 65ndash67)

Wang Zhuoran and Oliver Lemon (2013) bdquoA simple and generic be-lief tracking mechanism for the dialog state tracking challengeOn the believability of observed informationldquo In Proceedings ofthe SIGDIAL 2013 Conference pp 423ndash432 (cit on p 14)

Weizenbaum Joseph et al (1966) bdquoELIZAmdasha computer program forthe study of natural language communication between man andmachineldquo In Communications of the ACM 91 pp 36ndash45 (cit onp 7)

Wen Tsung-Hsien Milica Gasic Nikola Mrksic Pei-Hao Su DavidVandyke and Steve Young (2015) bdquoSemantically conditioned lstm-based natural language generation for spoken dialogue systemsldquoIn arXiv preprint arXiv150801745 (cit on p 14)

Wen Tsung-Hsien David Vandyke Nikola Mrksic Milica Gasic LinaM Rojas-Barahona Pei-Hao Su Stefan Ultes and Steve Young(2016) bdquoA network-based end-to-end trainable task-oriented dia-logue systemldquo In arXiv preprint arXiv160404562 (cit on pp 1116 17 22)


Werner Alexander Dietrich Trautmann Dongheui Lee and RobertoLampariello (2015) bdquoGeneralization of optimal motion trajecto-ries for bipedal walkingldquo In pp 1571ndash1577 (cit on p 41)

Williams Jason D Kavosh Asadi and Geoffrey Zweig (2017) bdquoHybridcode networks practical and efficient end-to-end dialog controlwith supervised and reinforcement learningldquo In arXiv preprintarXiv170203274 (cit on p 17)

Williams Jason Antoine Raux and Matthew Henderson (2016) bdquoThedialog state tracking challenge series A reviewldquo In Dialogue ampDiscourse 73 pp 4ndash33 (cit on p 12)

Wu Chien-Sheng Richard Socher and Caiming Xiong (2019) bdquoGlobal-to-local Memory Pointer Networks for Task-Oriented DialogueldquoIn arXiv preprint arXiv190104713 (cit on pp 15ndash17)

Wu Yonghui Mike Schuster Zhifeng Chen Quoc V Le Moham-mad Norouzi Wolfgang Macherey Maxim Krikun Yuan CaoQin Gao Klaus Macherey et al (2016) bdquoGooglersquos neural ma-chine translation system Bridging the gap between human andmachine translationldquo In arXiv preprint arXiv160908144 (cit onp 45)

Xie Pengtao and Eric P Xing (2013) bdquoIntegrating Document Cluster-ing and Topic Modelingldquo In Proceedings of the 30th Conference onUncertainty in Artificial Intelligence (cit on p 74)

Yan Zhao Nan Duan Peng Chen Ming Zhou Jianshe Zhou andZhoujun Li (2017) bdquoBuilding task-oriented dialogue systems foronline shoppingldquo In Thirty-First AAAI Conference on Artificial In-telligence (cit on p 14)

Yogatama Dani Michael Heilman Brendan OrsquoConnor Chris DyerBryan R Routledge and Noah A Smith (2011) bdquoPredicting a scien-tific communityrsquos response to an articleldquo In Proceedings of the Con-ference on Empirical Methods in Natural Language Processing Asso-ciation for Computational Linguistics pp 594ndash604 (cit on p 65)

Young Steve Milica Gašic Blaise Thomson and Jason D Williams(2013) bdquoPomdp-based statistical spoken dialog systems A reviewldquoIn Proceedings of the IEEE 1015 pp 1160ndash1179 (cit on p 17)

Zaino Jennifer (2016) 2016 Trends for Semantic Web and Semantic Tech-nologies http www dataversity net 2017 - predictions -

semantic-web-semantic-technologies (cit on p 78)Zhou Hao Minlie Huang et al (2016) bdquoContext-aware natural lan-

guage generation for spoken dialogue systemsldquo In Proceedingsof COLING 2016 the 26th International Conference on ComputationalLinguistics Technical Papers pp 2032ndash2041 (cit on pp 14 15)

Zhu Yukun Ryan Kiros Rich Zemel Ruslan Salakhutdinov RaquelUrtasun Antonio Torralba and Sanja Fidler (2015) bdquoAligningbooks and movies Towards story-like visual explanations by watch-ing movies and reading booksldquo In Proceedings of the IEEE interna-tional conference on computer vision pp 19ndash27 (cit on p 44)


ZUSAMMENFASSUNG

Robotic Process Automation (RPA) is a type of software bot that mimics manual human activities such as entering data into a system, logging into user accounts, or executing simple but repetitive workflows (Mohanty and Vyas, 2018). However, one of the main advantages and, at the same time, drawbacks of RPA bots is their ability to carry out the given task with pinpoint accuracy. On the one hand, such a system is able to execute the task accurately, carefully, and quickly. On the other hand, it is very susceptible to changes in the defined scenarios. Since an RPA bot is designed for one specific task, it is often not possible to adapt it to other domains or even to simple changes in a workflow (Mohanty and Vyas, 2018). This inability to adapt to changing conditions led to a further area of improvement for RPA bots: Intelligent Process Automation (IPA) systems.

IPA bots combine RPA with Artificial Intelligence (AI) and can perform complex and cognitively more demanding tasks that require, among other things, reasoning and natural language understanding. These systems take over time-consuming and routine tasks, thereby enabling an intelligent workflow, and free up skilled workers to carry out more complicated tasks. So far, IPA techniques have mainly been applied in the field of computer vision. Recently, however, Natural Language Processing (NLP) has also become one of the potential applications for IPA, owing to its ability to interpret human language. NLP methods are used to analyze large amounts of textual data and to respond to various inquiries. Even if the available data are unstructured or have no predefined format (e.g., e-mails), or if they come in a variable format (e.g., invoices, legal documents), NLP techniques are likewise applied to extract the relevant information, which can then be used to solve various problems.

However, NLP in the context of IPA is not limited to the extraction of relevant data from text documents. One of the central applications of IPA are conversational agents, which are used for human-machine interaction. Conversational agents are in enormous demand, since they are able to support a large number of users simultaneously while communicating in a natural language. The implementation of a chat system comprises various types of NLP sub-tasks, starting with natural language understanding (e.g., intent recognition, entity extraction), via dialogue management (e.g., determining the next possible bot action), through to response generation. A typical dialogue system for IPA purposes usually takes over uncomplicated customer service requests (e.g., answering FAQs), so that employees can concentrate on more complex inquiries.

This dissertation comprises two areas that are united by the broader topic, namely Intelligent Process Automation (IPA) using statistical Natural Language Processing (NLP) methods.

The first block of this thesis (Chapter 2 – Chapter 4) deals with the implementation of a conversational agent for IPA purposes within the e-learning domain. As already mentioned, chatbots are one of the central applications for the IPA domain, since they can effectively carry out time-consuming tasks in a natural language. The IPA conversational bot realized in this work likewise takes care of routine and time-consuming tasks that are otherwise performed by tutors in an online mathematics course in German. This bot is in daily use within the mathematical platform OMB+. While conducting the experiments, we considered two ways of developing the conversational agent in an industrial setting: first, with purely rule-based methods under the conditions of missing training data and the particular aspects of the target domain (i.e., e-learning); second, we re-implemented two of the main system components (the natural language understanding module and the dialogue manager) with the currently most advanced deep learning algorithm and examined the performance of these components.

The second part of the dissertation (Chapter 5 – Chapter 6) considers an IPA problem within the predictive analytics area. Predictive analytics aims to make forecasts about future outcomes based on historical and current data. With the help of prediction systems, a company can therefore, for example, reliably determine trends or emerging topics and then use this information in important business decisions (e.g., investments). In this part of the thesis, we deal with the sub-problem of trend forecasting, in particular with the lack of publicly available benchmarks for the evaluation of trend detection algorithms. We compiled and published a benchmark for detecting both trends and downtrends. To the best of our knowledge, the task of downtrend detection has not been addressed before. The resulting benchmark is based on a collection of more than one million documents, which is among the largest used for trend detection so far, and thus offers a realistic setting for the development of trend detection algorithms.


A C K N O W L E D G E M E N T S

This thesis was possible due to several people who contributed signif-icantly to my work I owe my gratitude to each of them for guidanceencouragement and inspiration during my doctoral study

First of all I would like to thank my advisor Prof Dr HinrichSchuumltze for his mentorship support and thoughtful guidance Myspecial gratitude goes for Prof Dr Ruedi Seiler and his fantastic teamat Integral-Learning GmbH I am thankful for your valuable inputinterest in our collaborative work and for your continuous supportI would also like to show gratitude to Prof Dr Alexander Fraser forco-advising support and helpful feedback

Finally, none of this would be possible without my family's support; there are no words to express the depth of gratitude I feel for them. My earnest appreciation goes to my husband Dietrich: thank you for all the time you were by my side and for the encouragement you provided me through these years. I am very grateful to my parents for their understanding and backing at any moment. I appreciate the love of my grandparents; it is hard to realize how significant their care is for me. Besides that, I am also grateful to my sister Alexandra, elder brother Denis and parents-in-law for merely being always there for me.


C O N T E N T S

List of Figures XI
List of Tables XII
Acronyms XIII

1 introduction 1
1.1 Outline 3

i e-learning conversational assistant 5

2 foundations 7
2.1 Introduction 7
2.2 Conversational Assistants 9
2.2.1 Taxonomy 9
2.2.2 Conventional Architecture 12
2.2.3 Existing Approaches 15
2.3 Target Domain & Task Definition 18
2.4 Summary 19
3 multipurpose ipa via conversational assistant 21
3.1 Introduction 21
3.1.1 Outline and Contributions 22
3.2 Model 23
3.2.1 OMB+ Design 23
3.2.2 Preprocessing 23
3.2.3 Natural Language Understanding 25
3.2.4 Dialogue Manager 26
3.2.5 Meta Policy 30
3.2.6 Response Generation 32
3.3 Evaluation 37
3.3.1 Automated Evaluation 37
3.3.2 Human Evaluation & Error Analysis 37
3.4 Structured Dialogue Acquisition 39
3.5 Summary 39
3.6 Future Work 40
4 deep learning re-implementation of units 41
4.1 Introduction 41
4.1.1 Outline and Contributions 42
4.2 Related Work 43
4.3 BERT: Bidirectional Encoder Representations from Transformers 44
4.4 Data & Descriptive Statistics 46
4.5 Named Entity Recognition 48
4.5.1 Model Settings 49
4.6 Next Action Prediction 49
4.6.1 Model Settings 49
4.7 Evaluation and Results 50
4.8 Error Analysis 58
4.9 Summary 58
4.10 Future Work 59

ii clustering-based trend analyzer 61

5 foundations 63
5.1 Motivation and Challenges 63
5.2 Detection of Emerging Research Trends 64
5.2.1 Methods for Topic Modeling 65
5.2.2 Topic Evolution of Scientific Publications 67
5.3 Summary 68
6 benchmark for scientific trend detection 71
6.1 Motivation 71
6.1.1 Outline and Contributions 72
6.2 Corpus Creation 73
6.2.1 Underlying Data 73
6.2.2 Stratification 73
6.2.3 Document Representations & Clustering 74
6.2.4 Trend & Downtrend Estimation 75
6.2.5 (Down)trend Candidates Validation 75
6.3 Benchmark Annotation 77
6.3.1 Inter-annotator Agreement 77
6.3.2 Gold Trends 78
6.4 Evaluation Measure for Trend Detection 79
6.5 Trend Detection Baselines 80
6.5.1 Configurations 81
6.5.2 Experimental Setup and Results 81
6.6 Benchmark Description 82
6.7 Summary 83
6.8 Future Work 83
7 discussion and future work 85

bibliography 87


L I S T O F F I G U R E S

Figure 1 Taxonomy of dialogue systems 10

Figure 2 Example of a conventional dialogue architecture 12

Figure 3 OMB+ online learning platform 24

Figure 4 Rule-Based System: Dialogue flow 36

Figure 5 Macro F1 evaluation, NAP – default model. Comparison between German and multilingual BERT 52

Figure 6 Macro F1 test, NAP – default model. Comparison between German and multilingual BERT 53

Figure 7 Macro F1 evaluation, NAP – extended model 54

Figure 8 Macro F1 test, NAP – extended model 55

Figure 9 Macro F1 evaluation, NER – Comparison between German and multilingual BERT 56

Figure 10 Macro F1 test, NER – Comparison between German and multilingual BERT 57

Figure 11 (Down)trend detection: Overall distribution of papers in the entire dataset 73

Figure 12 A trend candidate (positive slope). Topic: Sentiment and Emotion Analysis (using Social Media Channels) 76

Figure 13 A downtrend candidate (negative slope). Topic: XML 76


L I S T O F TA B L E S

Table 1 Natural language representation of a sentence 13

Table 2 Rule 1 Admissible configurations 27

Table 3 Rule 2 Admissible configurations 27

Table 4 Rule 3 Admissible configurations 27

Table 5 Rule 4 Admissible configurations 28

Table 6 Rule 5 Admissible configurations 28

Table 7 Case 1 ID completeness 31

Table 8 Case 2 ID completeness 31

Table 9 Showcase 1 Short flow 33

Table 10 Showcase 2 Organisational question 33

Table 11 Showcase 3: Fallback and human-request policies 34

Table 12 Showcase 4 Contextual question 34

Table 13 Showcase 5 Long flow 35

Table 14 General statistics for conversational dataset 46

Table 15 Detailed statistics on possible system actions in conversational dataset 48

Table 16 Detailed statistics on possible named entities in conversational dataset 48

Table 17 F1-results for the Named Entity Recognition task 51

Table 18 F1-results for the Next Action Prediction task 51

Table 19 Average dialogue accuracy computed for the Next Action Prediction task 51

Table 20 Trend detection List of gold trends 79

Table 21 Trend detection Example of MAP evaluation 80

Table 22 Trend detection Ablation results 82


A C R O N Y M S

AI Artificial Intelligence

API Application Programming Interface

BERT Bidirectional Encoder Representations from Transformers

BiLSTM Bidirectional Long Short Term Memory

CRF Conditional Random Fields

DM Dialogue Manager

DST Dialogue State Tracker

DSTC Dialogue State Tracking Challenge

EC Equivalence Classes

ES ElasticSearch

GM Gaussian Models

IAA Inter Annotator Agreement

ID Informational Dictionary

IPA Intelligent Process Automation

ITS Intelligent Tutoring System

LDA Latent Dirichlet Allocation

LSA Latent Semantic Analysis

LSTM Long Short Term Memory

MAP Mean Average Precision

MER Mutually Exclusive Rules

NAP Next Action Prediction

NER Named Entity Recognition

NLG Natural Language Generation

NLP Natural Language Processing

NLU Natural Language Understanding

OMB+ Online Mathematik Brückenkurs

PL Policy Learning

pLSA probabilistic Latent Semantic Analysis

REST Representational State Transfer

RNN Recurrent Neural Network

RPA Robotic Process Automation

RegEx Regular Expressions

TL Transfer Learning

ToDS Task-oriented Dialogue System


1 I N T R O D U C T I O N

Robotic Process Automation (RPA) is a type of software bots that imitates manual human activities like entering data into a system, logging into accounts, or executing simple but repetitive workflows (Mohanty and Vyas 2018). In other words, RPA-systems allow business applications to work without human involvement by replicating human worker activities. A further benefit of RPA-bots is that they are much cheaper compared to the employment of a human worker, and they are also able to work around-the-clock. Overall, the advantages of software bots are hugely appealing and include cost savings, accuracy and compliance while executing processes, improved responsiveness (i.e. bots are faster than human workers), and finally, such bots are usually agile and multi-skilled (Ivancic, Vugec and Vukšic 2019).

However, one of the main benefits and, at the same time, drawbacks of the RPA-systems is their ability to precisely fulfill the assigned task. On the one hand, the bot can carry out the task accurately and diligently. On the other hand, it is highly susceptible to changes in defined scenarios. Being designed for a particular task, the RPA-bot is often not adaptable to other domains or even simple changes in a workflow (Mohanty and Vyas 2018). This inability to adapt to changing conditions gave rise to a further area of improvement for RPA-bots – Intelligent Process Automation (IPA) systems.

IPA-bots combine Robotic Process Automation (RPA) with Artificial Intelligence (AI) and thus can perform complex and more cognitively demanding tasks that require reasoning and language understanding. Hence, IPA-bots went beyond automating simple "click tasks" (as in RPA) and can accomplish jobs more intelligently – employing machine learning algorithms and advanced analytics. Such IPA-systems undertake time-consuming and routine tasks and thus enable smart workflows and free up skilled workers to accomplish higher-value activities.

Previously, IPA techniques were primarily employed within the computer vision domain, starting with the automation of simple display click-tasks and going towards sophisticated self-driving cars with their ability to interpret a stream of situational data obtained from sensors and cameras. However, in recent times, Natural Language Processing (NLP) became one of the potential applications for IPA as well due to its ability to understand and interpret human language. NLP methods are used to analyze large amounts of textual data and to respond to various inquiries. Furthermore, if the available data is unstructured and does not have any predefined format (i.e. customer emails), or if it is available in a variable format (i.e. invoices, legal documents), then Natural Language Processing (NLP) techniques are applied to extract the relevant information. This data can then be utilized to solve distinct problems (i.e. natural language understanding, predictive analytics) (Mohanty and Vyas 2018). For instance, in the case of predictive analytics, such a bot would make forecasts about the future using current and historical data and therethrough assist with sales or investments.

However, NLP in the context of Intelligent Process Automation (IPA) is not limited to the extraction of relevant data and insights from textual documents. Among the central applications of NLP within the IPA domain are conversational interfaces (e.g. chatbots) that are used to enable human-to-machine interaction. This type of software is commonly used for customer support or within recommendation and booking systems. Conversational agents gain enormous demand due to their ability to simultaneously support a large number of users while communicating in a natural language. In a conventional chatbot system, a user provides input in a natural language; the bot then processes this query to extract potentially useful information and responds with a relevant reply. This process comprises multiple stages and involves different types of NLP subtasks, starting with Natural Language Understanding (NLU) (e.g. intent recognition, named entity extraction) and going towards dialogue management (i.e. determining the next possible bot's action considering the dialogue history) and response generation (e.g. converting the semantic representation of the next bot's action to a natural language utterance). A typical dialogue system for Intelligent Process Automation (IPA) purposes usually undertakes shallow customer support requests (e.g. answering of FAQs), allowing human workers to focus on more sophisticated inquiries.

Natural Language Processing (NLP) techniques may also be used indirectly for IPA purposes. There are attempts to automatically identify and classify tasks extracted from textual process descriptions as manual, user, or automated. The goal of such an NLP application is to reduce the effort required to identify suitable candidates for further robotic process automation (Friedrich, Mendling and Puhlmann 2011; Leopold, Aa and Reijers 2018).

Intelligent Process Automation (IPA) is an emerging technology and evolves mainly due to the recognized potential benefit of combining Robotic Process Automation (RPA) and Artificial Intelligence (AI) techniques to solve industrial problems (Ivancic, Vugec and Vukšic 2019; Mohanty and Vyas 2018). Industries are typically interested in being responsive to customers and markets (Mohanty and Vyas 2018). Thus, most customer support applications need to be real-time, active and highly robust to achieve higher client satisfaction rates. The manual accomplishment of such tasks is often time-consuming, expensive and error-prone. Furthermore, due to the massive amounts of textual data available for analysis, it seems not to be feasible to process this data entirely by human means.


1.1 outline

This thesis covers two topics united by the broader theme, which is Intelligent Process Automation (IPA) utilizing statistical Natural Language Processing (NLP) methods.

the first block of the thesis (Chapter 2 – Chapter 4) addresses the challenge of implementing a conversational agent for Intelligent Process Automation (IPA) purposes within the e-learning field. As mentioned above, chatbots are one of the central applications of the IPA domain since they can effectively perform time-consuming tasks requiring natural language understanding, allowing human workers to concentrate on more valuable tasks. The IPA conversational bot that was realized within this thesis takes care of routine and time-consuming tasks performed by human tutors within an online mathematical course. This bot is deployed in a real-world setting and interacts with students within the OMB+ mathematical platform. Conducting experiments, we observed two possibilities to build the conversational agent in industrial settings – first, with purely rule-based methods, considering the missing training data and individual aspects of the target domain (i.e. e-learning). Second, we re-implemented two of the main system components (i.e. Natural Language Understanding (NLU) and Dialogue Manager (DM) units) using the current state-of-the-art deep-learning algorithm (i.e. Bidirectional Encoder Representations from Transformers (BERT)) and investigated their performance and potential use as a part of a hybrid model (i.e. containing both rule-based and machine learning methods).

the second block of the thesis (Chapter 5 – Chapter 6) considers an Intelligent Process Automation (IPA) problem within the predictive analytics domain and addresses the task of scientific trend detection. Predictive analytics makes forecasts about future outcomes based on historical and current data. Hence, organizations can benefit from the advanced analytics models by detecting trends and their behaviors and then employing this information while making crucial business decisions (i.e. investments). In this work, we deal with a subproblem of the trend detection task – specifically, we address the lack of publicly available benchmarks for the evaluation of trend detection algorithms. Therefore, we assembled a benchmark for both trend and downtrend (i.e. topics that become less frequent over time) detection. To the best of our knowledge, the task of downtrend detection has not been well addressed before. Furthermore, the resulting benchmark is based on a collection of more than one million documents, which is among the largest that have been used for trend detection before, and therefore offers a realistic setting for developing trend detection algorithms.


All main chapters contain their specific introduction, contribution list, related work section and summary.


Part I

E - L E A R N I N G C O N V E R S AT I O N A L A S S I S TA N T


2 F O U N D AT I O N S

In this block we address the challenge of designing and implementing a conversational assistant for Intelligent Process Automation (IPA) purposes within the e-learning domain. The system aims to support human tutors of the Online Mathematik Brückenkurs (OMB+) platform during the tutoring round by undertaking routine and time-consuming responsibilities. Specifically, the system intelligently automates the process of information accumulation and validation that precedes every conversation. Therethrough, the IPA-bot frees up tutors and allows them to concentrate on more complicated and cognitively demanding tasks.

In this chapter we introduce the fundamental concepts required by the topic of the first part of the thesis. We begin with the foundations of conversational assistants in Section 2.2, where we present the taxonomy of existing dialogue systems in Section 2.2.1, give an overview of the conventional dialogue architecture in Section 2.2.2 with its most essential components, and describe the existing approaches for building a conversational agent in Section 2.2.3. We finally introduce the target domain for our experiments in Section 2.3.

2.1 introduction

Conversational Assistants – a type of software that interacts with people via written or spoken natural language – have become widely popularized in recent times. Today's conversational assistants support a broad range of everyday skills, including but not limited to creating a reminder, answering factoid questions and controlling smart home devices.

Initially, the notion of an artificial companion capable of conversing in a natural language was anticipated by Alan Turing in 1949 (Turing 1949). Still, the first significant step in this direction was made in 1966 with the appearance of Weizenbaum's system ELIZA (Weizenbaum 1966). Its successor was ALICE (Artificial Linguistic Internet Computer Entity) – a natural language processing dialogue system that conversed with a human by applying heuristical pattern matching rules. Despite being remarkably popular and technologically advanced for their days, both systems were unable to pass the Turing test (Alan 1950). Following a long silent period, the next wave of growth of intelligent agents was caused by the advances in Natural Language Processing (NLP). This was enabled by access to low-cost memory, higher processing speed, vast amounts of available data and sophisticated machine learning algorithms. Hence, one of the most notable leaps in the history of conversational agents began in the year 2011 with intelligent personal assistants. In that time, Apple's Siri, followed by Microsoft's Cortana and Amazon's Alexa in 2014 and then Google Assistant in 2016, came to the scene. The main focus of these agents was not to mimic humans, as in ELIZA or ALICE, but to assist a human by answering simple factoid questions and setting notification tasks across a broad range of content. Another type of conversational agent, called Task-oriented Dialogue System (ToDS), became a focus of industries in the year 2016. For the most part, these bots aimed to support customer service (Cui et al. 2017) and respond to Frequently Asked Questions (Liao et al. 2018). Their conversational style provided more depth than those of personal assistants, but a narrower range, since they were patterned for task solving and not for open-domain discussions.

One of the central advantages of conversational agents, and the reason why they are attracting considerable interest, is their ability to give attention to several users simultaneously while supporting natural language communication. Due to this, conversational assistants gain momentum in numerous kinds of practical support applications (i.e. customer support) where a high number of users are involved at the same time. Classical engagement of human assistants is very cost-inefficient, and such support is usually not full-time accessible. Therefore, automating human assistance is desirable but also an ambitious task. Due to the lack of solid Natural Language Understanding (NLU), advancing beyond primary tasks is still challenging (Luger and Sellen 2016), and even the most prosperous dialogue systems often fail to meet users' expectations (Jain et al. 2018). Significant challenges involved in the design and implementation of a conversational agent vary from domain engineering to Natural Language Understanding (NLU) and the design of an adaptive domain-specific conversational flow. The quintessential problems and how-to questions while building conversational assistants include:

• How to deal with variability and flexibility of language?

• How to get high-quality in-domain data?

• How to build meaningful representations?

• How to integrate commonsense and domain knowledge?

• How to build a robust system?

Besides that, current Natural Language Processing (NLP) and Artificial Intelligence (AI) techniques have weaknesses when applied to domain-specific, task-oriented problems, especially if the underlying data comes from an industrial source. As machine learning techniques heavily rely on structured and cleaned training data to deliver consistent performance, typically noisy real-world data without access to additional knowledge databases negatively impacts the classification accuracy.


Despite that, conversational agents could be successfully employed to solve less cognitive but still highly relevant tasks. In this case, a dialogue agent could be operated as a part of the Intelligent Process Automation (IPA) system, where it takes care of straightforward but routine and time-consuming functions performed in a natural language, while allowing a human worker to focus on more cognitively demanding processes.

2.2 conversational assistants

Development and design of a particular dialogue system usually vary by its objectives; among the most fundamental are:

• Purpose: Should it be an open-ended dialogue system (e.g. chit-chat), a task-oriented system (e.g. customer support) or a personal assistant (e.g. Google Assistant)?

• Skills: What kind of behavior should a system demonstrate?

• Data: Which data is required to develop (resp. train) specific blocks, and how could it be assembled and curated?

• Deployment: Will the final system be integrated into a particular software platform? What are the challenges in choosing suitable tools? How will the deployment happen?

• Implementation: Should it be a rule-based, information-retrieval, machine-learning or a hybrid system?

In the following, we will take a look at the most common taxonomies of dialogue systems and different design and implementation strategies.

2.2.1 Taxonomy

According to their goals, conversational agents could be classified into two main categories: task-oriented systems (e.g. for customer support) and social bots (e.g. for chit-chat purposes). Furthermore, systems also vary regarding their implementation characteristics. Below we provide a detailed overview of the most common variations.

task-oriented dialogue systems are famous for their ability to lead a dialogue with the ultimate goal of solving a user's problem (see Figure 1). A customer support agent or a flight/restaurant booking system exemplify this class of conversational assistants. Such systems are more meaningful and useful than those with an entirely social goal; however, they usually involve more complications during the implementation stage. Since a Task-oriented Dialogue System (ToDS) requires a precise understanding of the user's intent (i.e. goal), the Natural Language Understanding (NLU) segment must perform flawlessly.


Besides that, if applied in an industrial setting, ToDS are usually built modularly, which means they mostly rely on handcrafted rules and predefined templates and are restricted in their abilities. Personal assistants (i.e. Google Assistant) are members of the task-oriented family as well, though compared to standard agents they involve more social conversation and could be used for entertainment purposes (e.g. "Ok Google, tell me a joke"). The latter is typically not available in conventional task-oriented systems.

social bots and chit-chat systems, in contrast, regularly do not hold any specific goal and are designed for small-talk and entertainment conversations (see Figure 1). Those agents are usually end-to-end trainable and highly data-driven, which means they require large amounts of training data. However, due to the entirely learnable training, such systems often produce inappropriate responses, making them extremely unreliable for practical applications. Like chit-chat systems, social bots are rich in social conversation; however, they possess an ability to solve uncomplicated user requests and conduct a more meaningful dialogue (Fang et al. 2018). Those requests are not the same as in task-oriented systems and are mostly pointed at answering simple factoid questions.

Figure 1: Taxonomy of dialogue systems according to the measure of their task accomplishment and social contribution. Figure originates from Fang et al. 2018.


The aforementioned system types could be decomposed further according to their implementation characteristics:

open domain & closed domain: A Task-oriented Dialogue System (ToDS) is typically built within a closed domain setting. The space of potential inputs and outputs is restricted since the system attempts to achieve a specific goal. Customer support or shopping assistants are cases of closed domain problems (Wen et al. 2016). These systems do not need to be able to talk about off-site topics, but they have to fulfill their particular tasks as efficiently as possible.

Chit-chat systems, in contrast, do not assume a well-defined goal or intention and thus could be built within an open domain. This means the conversation can progress in any direction and without any or with minimal functional purpose (Serban et al. 2016). However, the infinite number of various topics and the fact that a certain amount of commonsense is required to create reasonable responses make an open-domain setting a hard problem.

content-driven & user-centric: Social bots and chit-chat systems typically utilize a content-driven dialogue manner. That means that their responses are based on the large and dynamic content derived from daily web mining and knowledge graphs. This approach builds a dialogue based on trendy content from diverse sources, and thus it is an ideal way for social bots to catch users' attention and bring a user into the dialogue.

User-centric approaches, in turn, assume precise language understanding to detect the sentiment of users' statements. According to the sentiment, the dialogue manager learns users' personalities, handles rapid topic changes and tracks engagement. In this case, the dialogue builds on specific user interests.

long & short conversations: Models with a short conversation style attempt to create a single response to a single input (e.g. weather forecast). In the case of a long conversation, a system has to go through multiple turns and to keep track of what has been said before (i.e. dialogue history). The longer the conversation, the more challenging it is to train a system. Customer support conversations are typically long conversational threads with multiple questions.

response generation: One of the essential elements of a dialogue system is response generation. This could either be done in a retrieval-based manner or by using generative models (i.e. Natural Language Generation (NLG)).

The former usually utilizes predefined responses and heuristics to select a relevant response based on the input and context. The heuristic could be a rule-based expression match (i.e. Regular Expressions (RegEx)) or an ensemble of machine learning classifiers (e.g. entity extraction, prediction of next action). Such systems do not generate any original text; instead, they select a response from a fixed set of predefined candidates. These models are usually used in an industrial setting where the system must show high robustness, performance and interpretability of the outcomes.

The latter, in contrast, do not rely on predefined replies. Since such models are typically based on (deep) machine learning techniques, they attempt to generate new responses from scratch. This is, however, a complicated task and is mostly used for vanilla chit-chat systems or in a research domain where structured training data is available.

2.2.2 Conventional Architecture

In this and the following subsection we review the pipeline and methods for Task-oriented Dialogue Systems (ToDS). Such agents are typically composed of several components (Figure 2) addressing diverse tasks required to facilitate a natural language dialogue (Williams, Raux and Henderson 2016).

Figure 2: Example of a conventional dialogue architecture with its main components in the bounding box: Language Understanding (NLU), Dialogue Manager (DM), Response Generation (NLG). Figure originates from the NAACL 2018 Tutorial, Su et al. 2018.


A conventional dialogue architecture implements this with the following components:

natural language understanding unit has the following objectives: it parses the user input, classifies the domain (i.e. customer support, restaurant booking; not required for single-domain systems) and intent (i.e. find-flight, book-taxi), and extracts entities (i.e. Italian food, Berlin).

Generally, the NLU unit maps every user utterance into semantic frames that are predefined according to different scenarios. Table 1 illustrates an example of a sentence representation, where the departure-location (Berlin) and the arrival-location (Munich) are specified as slot values, and the domain (airline travel) and intent (find-flight) are specified as well. Typically, there are two kinds of possible representations. The first one is the utterance category, represented in the form of the user's intent. The second could be word-level information extraction in the form of Named Entity Recognition (NER) or Slot Filling tasks.

Sentence:        Find   flight   from   Berlin   to   Munich   today
Slots/Concepts:  O      O        O      B-dept   O    B-arr    B-date
Named Entity:    O      O        O      B-city   O    B-city   O
Intent:          Find_Flight
Domain:          Airline Travel

Table 1: An illustrative example of a sentence representation
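To make the semantic-frame notion more tangible, the NLU output for the sentence in Table 1 could be held in a simple structure like the following Python sketch; the field names are illustrative and are not taken from any particular framework.

    # Hypothetical container for the NLU output of Table 1;
    # keys and values mirror the table, names are illustrative only.
    semantic_frame = {
        "domain": "airline_travel",
        "intent": "find_flight",
        "slots": {
            "departure_city": "Berlin",   # B-dept
            "arrival_city": "Munich",     # B-arr
            "date": "today",              # B-date
        },
    }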

A dialog act classification sub-module, also known as intent classification, is dedicated to detecting the primary user's goal at every dialogue state. It labels each user's utterance with one of the predefined intents. Deep neural networks can be directly applied to this conventional classification problem (Khanpour, Guntakandla and Nielsen 2016; Lee and Dernoncourt 2016).

Slot filling is another sub-module for language understanding. Unlike intent detection (i.e. a classification problem), slot filling is defined as a sequence labeling problem, where words in the sequence are labeled with semantic tags. The input is a sequence of words, and the output is a sequence of slots, one for every word. Variations of Recurrent Neural Network (RNN) architectures (LSTM, BiLSTM) (Deoras et al. 2015) and (attention-based) encoder-decoder architectures (Kurata et al. 2016) have been employed for this task.
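As an illustration of such a sequence labeler, the following is a minimal PyTorch-style sketch of a BiLSTM tagger that emits one slot tag per input token; the class name, layer sizes and defaults are invented for the example and do not correspond to a specific published configuration.

    import torch.nn as nn

    class BiLSTMSlotTagger(nn.Module):
        """Minimal BiLSTM sequence labeler: one slot tag per input token."""
        def __init__(self, vocab_size, num_tags, emb_dim=100, hidden_dim=128):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, emb_dim)
            self.lstm = nn.LSTM(emb_dim, hidden_dim,
                                batch_first=True, bidirectional=True)
            self.out = nn.Linear(2 * hidden_dim, num_tags)

        def forward(self, token_ids):              # (batch, seq_len)
            states, _ = self.lstm(self.embed(token_ids))
            return self.out(states)                # (batch, seq_len, num_tags) slot scores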


dialogue manager unit is subdivided into two components – the first is a Dialogue State Tracker (DST) that holds the conversational state, and the second is a Policy Learning (PL) unit that defines the system's next action.

Dialogue State Tracker (DST) is the core component to guarantee a robust dialogue flow, since it manages the user's intent at every turn. The conventional methods, which are broadly applied in commercial implementations, often adopt handcrafted rules to pick the most probable candidate (Goddeau et al. 1996; Wang and Lemon 2013) among the predefined ones. Though, a variety of statistical approaches have emerged since the Dialogue State Tracking Challenge (DSTC)2 was arranged by Jason D. Williams in the year 2013. Among the deep-learning based approaches is the Belief Tracker proposed by Henderson, Thomson and Young (2013). It employs a sliding window to produce a sequence of probability distributions over an arbitrary number of possible values (Chen et al. 2017). In the work of Mrkšic et al. (2016), the authors introduced a Neural Belief Tracker to identify the slot-value pairs. The system used the user intents preceding the user input, the user utterance itself and a candidate slot-value pair which it needs to decide about as the input. Then the system iterated over all candidate slot-value pairs to determine which ones have just been expressed by the user (Chen et al. 2017). Finally, Vodolán, Kadlec and Kleindienst proposed a Hybrid Dialog State Tracker that achieved the state-of-the-art performance for the DSTC2 shared task. Their model used a separately learned decoder coupled with a rule-based system.

Based on the state representation from the Dialogue State Tracker (DST), the PL unit generates the next system action that is then decoded by the Natural Language Generation (NLG) module. Usually, a rule-based agent is used to initialize the system (Yan et al. 2017), and then supervised learning is applied to the actions generated by these rules. Alternatively, the dialogue policy can be trained end-to-end with reinforcement learning to manage the system making policies toward the final performance (Cuayáhuitl, Keizer and Lemon 2015).

natural language generation unit transforms the next action into a natural language response. Conventional approaches to Natural Language Generation (NLG) typically adopt a template-based style, where a set of rules is defined to map every next action to a predefined natural language response. Such approaches are simple, generally error-free and easy to control; however, their implementation is time-consuming, and the style is repetitive and hardly scalable.

Among the deep learning techniques is the work by Wen et al. (2015) that introduced neural network-based approaches to NLG with an LSTM-based structure. Later, Zhou and Huang (2016) adopted this model along with the question information, semantic slot values and dialogue act types to generate more precise answers. Finally, Wu, Socher and Xiong (2019) proposed Global to Local Memory Pointer Networks (GLMP) – an architecture that incorporates knowledge bases directly into a learning framework (due to the Memory Networks component) and is able to re-use information from the user input (due to the Pointer Networks component).

2 Currently: Dialog System Technology Challenge

In particular cases, the dialogue flow can include speech recognition and speech synthesis units (if managing a spoken natural language system) and a connection to third-party APIs to request information stored in external databases (e.g. weather forecast) (as depicted in Figure 2).

2.2.3 Existing Approaches

As stated above, a conventional task-oriented dialogue system is implemented in a pipeline manner. That means that once the system receives a user query, it interprets it and acts according to the dialogue state and the corresponding policy. Additionally, based on understanding, it may access the knowledge base to find the demanded information there. Finally, the system transforms a system's next action (and, if given, extracted information) into its surface form as a natural language response (Monostori et al. 1996).

Individual components (i.e. Natural Language Understanding (NLU), Dialogue Manager (DM), Natural Language Generation (NLG)) of a particular conversational system could be implemented using different approaches, starting with entirely rule- and template-based methods and going towards hybrid approaches (using learnable components along with handcrafted units) and end-to-end trainable machine learning methods.

rule-based approaches: Though many of the latest research approaches handle Natural Language Understanding (NLU) and Natural Language Generation (NLG) units by using statistical Natural Language Processing (NLP) models (Bocklisch et al. 2017; Burtsev et al. 2018; Honnibal and Montani 2017), most of the industrially deployed dialogue systems still utilise handcrafted features and rules for the state tracking, action prediction, intent recognition and slot filling tasks (Chen et al. 2017; Ultes et al. 2017). So, most of the PyDial3 framework modules offer rule-based implementations using Regular Expressions (RegEx), rule-based dialogue trackers (Henderson, Thomson and Williams 2014) and handcrafted policies. Also, in order to map the next action to text, Ultes et al. (2017) suggest rules along with a template-based response generation.

3 PyDial Framework: http://www.camdial.org/pydial


The rule-based approach ensures robustness and stable performance, which is crucial for industrial systems that communicate with a large number of users simultaneously. However, it is highly expensive and time-consuming to deploy a real dialogue system built in that manner. The major disadvantage is that the usage of handcrafted systems is restricted to a specific domain, and possible adaptation requires extensive manual engineering.

end-to-end learning approaches: Due to the recent advance of end-to-end neural generative models (Collobert et al. 2011), many efforts have been made to build an end-to-end trainable architecture for conversational agents. Rather than using the traditional pipeline (see Figure 2), the end-to-end model is conceived as a single module (Chen et al. 2017).

Wen et al. (2016), followed by Bordes, Boureau and Weston (2016), were among the pioneers who adapted an end-to-end trainable approach for conversational systems. Their approach was to treat dialogue learning as the problem of learning a mapping from dialogue histories to system responses and to apply an encoder-decoder model (Sutskever, Vinyals and Le 2014) to train the entire system. The proposed model still had its significant limitations (Glasmachers 2017), such as missing policies to learn a dialogue flow and the inability to incorporate additional knowledge that is especially crucial for training a task-oriented agent. To overcome the first problem, Dhingra et al. (2016) introduced an end-to-end reinforcement learning approach that jointly trains the dialogue state tracker and policy learning units. The second weakness was addressed by Eric and Manning (2017), who augmented existing recurrent network architectures with a differentiable attention-based key-value retrieval mechanism over the entries of a knowledge base. This approach was inspired by Key-Value Memory Networks (Miller et al. 2016). Later on, Wu, Socher and Xiong (2019) proposed Global to Local Memory Pointer Networks (GLMP), where the authors incorporated knowledge bases directly into a learning framework. In their model, a global memory encoder and a local memory decoder are employed to share external knowledge. The encoder encodes the dialogue history, modifies the global contextual representation and generates a global memory pointer. The decoder first produces a sketch response with empty slots. Next, it transfers the global memory pointer to filter the external knowledge for relevant information and then instantiates the slots via the local memory pointers.

Despite having better adaptability compared to any rule-based system and being simpler to train, end-to-end approaches remain unattainable for commercial conversational agents operating on real-world data. A well and carefully constructed task-oriented dialogue system in an observed domain, using handcrafted rules (in NLU and DST units) and predefined responses for NLG, still outperforms the end-to-end systems due to its robustness (Glasmachers 2017; Wu, Socher and Xiong 2019).

hybrid approaches: Though end-to-end learning is an attractive solution for conversational systems, current techniques are data-intensive and require large amounts of dialogues to learn simple behaviors. To overcome this barrier, Williams, Asadi and Zweig (2017) introduce Hybrid Code Networks (HCNs), which is an ensemble of retrieval and trainable units. The system utilizes a Recurrent Neural Network (RNN) for the Natural Language Understanding (NLU) block, domain-specific knowledge that is encoded as software (i.e. API calls), and custom action templates used for the Natural Language Generation (NLG) unit. Compared to existing end-to-end methods, the authors report that their approach considerably reduces the amount of data required for training (Williams, Asadi and Zweig 2017). According to the authors, HCNs achieve state-of-the-art performance on the bAbI dialog dataset (Bordes, Boureau and Weston 2016) and outperform two commercial customer dialogue systems4.

Along with this work, Wen et al. (2016) propose a neural network-based model that is end-to-end trainable but still modularly connected. In their work, the authors treat dialogue as a sequence to sequence mapping problem (Sutskever, Vinyals and Le 2014), augmented with the dialogue history and the current database search outcome (Wen et al. 2016). The authors explain the learning process as follows: At every turn, the system takes a user input represented as a sequence of tokens and converts it into two internal representations: a distributed representation of the intent and a probability distribution over slot-value pairs called the belief state (Young et al. 2013). The database operator then picks the most probable values in the belief state to form the search result. At the same time, the intent representation and the belief state are transformed and combined by a Policy Learning (PL) unit to form a single vector representing the next system action. This system action is then used to generate a sketch response token by token. The final system response is composed by substituting the actual values of the database entries into the sketch sentence structure (Wen et al. 2016).

Hybrid models appear to replace the established rule- and template-based approaches currently utilized in an industrial setting. The performance of most Natural Language Understanding (NLU) components (i.e. Named Entity Recognition (NER), domain and intent classification) is reasonably high to apply them for the development of commercial conversational agents, whereas the Natural Language Generation (NLG) unit could still be realized as a template-based solution by employing predefined response candidates and predicting the next system action employing deep learning systems.

4 Internal Microsoft Research datasets in customer support and troubleshooting domains were employed for training and evaluation.


2.3 target domain & task definition

MUMIE5 is an open-source e-learning platform for studying and teaching mathematics and computer science. Online Mathematik Brückenkurs (OMB+) is one of the projects powered by MUMIE. Specifically, Online Mathematik Brückenkurs (OMB+)6 is a German online learning platform that assists students who are preparing for an engineering or computer science study at a university.

The course's central purpose is to assist students in reviving their mathematical skills so that they can follow the upcoming university courses. This online course is thematically segmented into 13 parts and includes free mathematical classes with theoretical and practical content. Every chapter begins with an overview of its sections and consists of explanatory texts with built-in videos, examples and quick checks of understanding. Furthermore, various examination modes to practice the content (i.e. training, exercises) and to check the understanding of the theory (i.e. quiz) are available. Besides that, OMB+ provides a possibility to get assistance from a human tutor. Usually, the students and tutors interact via a chat interface in written form. The language of communication is German.

The current dilemma of the OMB+ platform is that the number of students grows significantly every year, but it is challenging to find qualified tutors – and it is expensive to hire more human tutors. This results in a longer waiting period for students until their problems can be considered and processed. In general, all student questions can be grouped into three main categories: organizational questions (i.e. course, certificate), contextual questions (i.e. content, theorem) and mathematical questions (exercises, solutions). To assist a student with a mathematical question, a tutor has to know the following regular information: what kind of topic (or subtopic) a student has a problem with; at which examination mode (quiz, chapter level training or exercise, section level training or exercise, or final examination) a student is working right now; and, finally, the exact question number and exact problem formulation. This means that a tutor has to ask the same onboarding questions every time a new conversation starts. This is, however, very time-consuming and could be successfully solved within an Intelligent Process Automation (IPA) system.

The work presented in Chapter 3 and Chapter 4 was enabled by the collaboration with the MUMIE and Online Mathematik Brückenkurs (OMB+) team and addresses the abovementioned problem. Within Chapter 3 we implemented an Intelligent Process Automation (IPA) dialogue system with rule- and retrieval-based algorithms that interacts with students and performs the onboarding. This system is multipurpose: on the one hand, it eases the assistance process for human tutors by reducing repetitive and time-consuming activities and therefore allows workers to focus on more challenging tasks. On the other hand, interacting with students, it augments the resources with structured and labeled training data. The latter, in turn, allowed us to re-train specific system components utilizing machine and deep learning methods. We describe this procedure in Chapter 4.

5 Mumie: https://www.mumie.net
6 https://www.ombplus.de

We report that the earlier collected human-to-human data from the OMB+ platform could not be used for training a machine learning system, since it is highly unstructured and contains no direct question-answer matching, which is crucial for trainable conversational algorithms. We analyzed these conversations to get insights from them that we used to define the dialogue flow.

We also do not consider the automatization of mathematical questions in this work, since this task would require not only precise language understanding and extensive commonsense knowledge but also the accurate perception of mathematical notations. The tutoring process at OMB+ differentiates a lot from the standard tutoring methods. Here, instructors attempt to navigate students by giving hints and tips instead of providing an out-of-the-box solution. Existing human-to-human dialogues revealed that such a task could be challenging even for human tutors, since it is not always directly perceptible what precisely the student does not understand or has difficulties with. Furthermore, dialogues containing mathematical explanations could last for a long time, and the topic (i.e. intent and final goal) can shift multiple times during the conversation.

To our knowledge, this task would not be feasible to sufficiently solve with current machine learning algorithms, especially considering the missing training data. A rule-based system would not be the right solution for this purpose either, due to the large number of complex rules which would need to be defined. Thus, in this work we focus on the automatization of less cognitively demanding tasks, which are still relevant and must be solved in the first instance to ease the tutoring process for instructors.

2.4 summary

Conversational agents are a highly demanded solution for industries due to their potential ability to support a large number of users in a multi-threaded fashion while performing a flawless natural language conversation. Many messenger platforms are created, and custom chatbots in various constellations emerge daily. Unfortunately, most of the current conversational agents are often disappointing to use due to the lack of substantial natural language understanding and reasoning. Furthermore, Natural Language Processing (NLP) and Artificial Intelligence (AI) techniques have limitations when applied to domain-specific and task-oriented problems, especially if no structured training data is available.

Despite that, conversational agents could be successfully applied to solve less cognitive but still highly relevant tasks – that is, within the Intelligent Process Automation (IPA) domain. Such IPA-bots would undertake tedious and time-consuming responsibilities to free up the knowledge worker for more complex and intricate tasks. IPA-bots could be seen as members of the goal-oriented dialogue family and are, for the most part, implemented either in a rule-based manner or through hybrid models when it comes to a real-world setting.


3 M U LT I P U R P O S E I PA V I A C O N V E R S AT I O N A L A S S I S TA N T

This chapter partially covers work already published at international peer-reviewed conferences. The relevant publication is Moiseeva et al. 2020. The research described in this chapter was carried out in its entirety by the author of this thesis. The other authors of the publication acted as advisers. At the time of writing this thesis, the system described in this work was deployed at the Online Mathematik Brückenkurs (OMB+) platform.

As we outlined in the introduction, Intelligent Process Automation (IPA) is an emerging technology with the primary goal of assisting the knowledge worker by taking care of repetitive, routine and low-cognitive tasks. Conversational assistants that interact with users in a natural language are a potential application for the IPA domain. Such smart virtual agents can support the user by responding to various inquiries and performing routine tasks carried out in a natural language (i.e. customer support).

In this work, we tackle the challenge of implementing an IPA conversational agent in a real-world industrial context and under conditions of missing training data. Our system has two meaningful benefits. First, it decreases monotonous and time-consuming tasks and hence lets workers concentrate on more intellectual processes. Second, interacting with users, it augments the resources with structured and labeled training data. The latter, in turn, can be used to retrain a system utilizing machine and deep learning methods.

3.1 introduction

Conversational dialogue systems can give attention to several users simultaneously while supporting natural communication. They are thus exceptionally needed for practical assistance applications where a high number of users are involved at the same time. Regularly, dialogue agents require a lot of natural language understanding and commonsense knowledge to hold a conversation reasonably and intelligently. Therefore, nowadays it is still challenging to substitute human assistance with an intelligent agent completely. However, a dialogue system could be successfully employed as a part of an Intelligent Process Automation (IPA) application. In this case, it would take care of simple but routine and time-consuming tasks performed in a natural language, while allowing a human worker to focus on more cognitively demanding processes.

Recent research in the dialogue generation domain is conducted by employing Artificial Intelligence (AI) techniques like machine and deep learning (Lowe et al. 2017; Wen et al. 2016). However, those methods require high-quality data for training that is often not directly available for domain-specific and industrial problems. Even though companies gather and store vast volumes of data, it is often of low quality with regards to machine learning approaches (Daniel 2015), especially if it concerns dialogue data, which has to be properly structured as well as annotated. Whereas, if trained on artificially created datasets (which is often the case in the research setting), the systems can solve only specific problems and are hardly adaptable to other domains. Hence, despite the popularity of deep learning end-to-end models, one still needs to rely on traditional pipelines in practical dialogue engineering, mainly while introducing a new domain.

3.1.1 Outline and Contributions

This work addresses the challenge of implementing a dialogue system for IPA purposes within the practical e-learning domain under conditions of missing training data. The chapter is structured as follows: In Section 3.2 we introduce our method and describe the design of the rule-based dialogue architecture with its main components, such as the Natural Language Understanding (NLU), Dialogue Manager (DM) and Natural Language Generation (NLG) units. In Section 3.3 we present the results of our dual-focused evaluation, and Section 3.4 describes the format of the collected structured data. Finally, we conclude in Section 3.5.

Our contributions within this work are as follows:

• We implemented a robust conversational system for IPA purposes within a practical e-learning domain and under the conditions of missing training (i.e. dialogue) data.

• The system has two main objectives:

– First, it reduces repetitive and time-consuming activities and allows workers of the e-learning platform to focus solely on mathematical inquiries.

– Second, by interacting with users, it augments the resources with structured and partially labeled training data for further implementation of trainable dialogue components.


3.2 model

The central purpose of the introduced system is to interact with students at the beginning of every chat conversation and gather information on the topic (and sub-topic), examination mode and level, question number and exact problem formulation. Such an onboarding process saves time for human tutors and allows them to handle mathematical questions solely. Furthermore, the system is realized in a way such that it accumulates labeled and structured dialogues in the background.

Figure 4 displays the complete conversation flow. After the system accepts user input, it analyzes it at different levels and extracts information if such is provided. If some of the information required for tutoring is missing, the system asks the student to provide it. When all the information points are collected, they are automatically validated and forwarded to a human tutor. The latter can then directly proceed with the assistance process. In the following, we explain the key components of the system.

3.2.1 OMB+ Design

Figure 3 illustrates the internal structure and design of the OMB+ platform. It has topics and sub-topics, as well as four various examination modes. Each topic (Figure 3, tag 1) corresponds to a chapter level and always has sub-topics (Figure 3, tag 2) that correspond to a section level. The examination modes training and exercise are ambiguous, because they correspond to either a chapter (Figure 3, tag 3) or a section (Figure 3, tag 5) level, and it is important to differentiate between them since they contain different types of content. The mode final examination (Figure 3, tag 4) always corresponds to a chapter level, whereas quiz (Figure 3, tag 5) can belong only to a section level. According to the design of the OMB+ platform, there are several ways in which a possible dialogue flow can proceed.
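Conceptually, this hierarchy can be captured in a small mapping that the bot can use to validate extracted entities; the topic and section names in the following sketch are invented placeholders, not the actual OMB+ chapter titles.

    # Invented excerpt of the platform hierarchy: every topic (chapter level)
    # lists its chapter-level modes and its sections with section-level modes.
    OMB_STRUCTURE = {
        "Topic A": {
            "modes": ["Training", "Exercise", "Final Examination"],
            "sections": {
                "Section A.1": ["Training", "Exercise", "Quiz"],
            },
        },
    }

    def is_valid(topic, section=None, mode=None):
        """Check whether an extracted topic/section/mode combination exists."""
        entry = OMB_STRUCTURE.get(topic)
        if entry is None:
            return False
        if section is None:
            return mode is None or mode in entry["modes"]
        section_modes = entry["sections"].get(section)
        return section_modes is not None and (mode is None or mode in section_modes)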

3.2.2 Preprocessing

In a natural language conversation, one may respond in various forms, making the extraction of data from user-generated text challenging due to potential misspellings or confusable spellings (i.e. Aufgabe 1a, Aufgabe 1 (a)). Hence, to facilitate a substantial recognition of intents and extraction of entities, we normalize (e.g. correct misspellings, detect synonyms) and preprocess every user input before moving forward to the Natural Language Understanding (NLU) module.


Figure 3: OMB+ Online Learning Platform, where 1 is the Topic (corresponds to a chapter level), 2 is a Sub-Topic (corresponds to a section level), 3 is the chapter level examination mode, 4 is the Final Examination Mode (available only for chapter level), 5 are the Examination Modes Exercise, Training (available at section levels) and Quiz (available only at section level), and 6 is the OMB+ Chat.

We perform both linguistic and domain-specific preprocessing, which includes the following steps (a minimal code sketch of these normalization rules follows the list):

• lowercasing and stemming of words in the input query;

• removal of stop words and punctuation;

• all mentions of X in mathematical formulations are removed to avoid confusion with the Roman numeral 10 ("X");

• in a combination of the type word "Kapitel"/"Aufgabe" + digit written as a word (i.e., "eins", "zweites", etc.), the word is replaced with a digit ("Kapitel eins" → "Kapitel 1", "im fünften Kapitel" → "im Kapitel 5"); Roman numerals are replaced with a digit as well ("Kapitel IV" → "Kapitel 4");

• detected ambiguities are normalized (e.g., "Trainingsaufgabe" → "Training");

• recognized misspellings and typing errors are corrected (e.g., "Difeernzialrechnung" → "Differentialrechnung");

• permalinks are parsed and analyzed; from each permalink it is possible to extract the topic, examination mode, and question number.
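The following is a minimal Python sketch of the domain-specific normalization step described above. The lookup tables, their contents, and the helper name normalize are illustrative assumptions; the actual rule set of the system is handcrafted and considerably larger, and stemming and stop-word removal are omitted here.

    import re

    # Illustrative lookup tables (assumed); the real system uses larger, curated lists.
    NUMBER_WORDS = {"eins": "1", "zwei": "2", "drei": "3", "vier": "4", "fuenf": "5"}
    ROMAN_NUMERALS = {"i": "1", "ii": "2", "iii": "3", "iv": "4", "v": "5"}
    SYNONYMS = {"trainingsaufgabe": "training"}                      # ambiguity normalization
    MISSPELLINGS = {"difeernzialrechnung": "differentialrechnung"}   # common typing errors

    def normalize(query: str) -> str:
        """Domain-specific normalization applied before the NLU module."""
        text = query.lower()
        # Normalize detected ambiguities and known misspellings.
        for table in (SYNONYMS, MISSPELLINGS):
            for wrong, right in table.items():
                text = text.replace(wrong, right)
        # Replace number words and Roman numerals following "kapitel"/"aufgabe" with digits.
        def to_digit(match):
            word = match.group(2)
            digit = NUMBER_WORDS.get(word) or ROMAN_NUMERALS.get(word) or word
            return f"{match.group(1)} {digit}"
        text = re.sub(r"\b(kapitel|aufgabe)\s+(\w+)", to_digit, text)
        # Drop bare "x" tokens to avoid confusion with the Roman numeral 10.
        text = re.sub(r"\bx\b", " ", text)
        return re.sub(r"\s+", " ", text).strip()

    print(normalize("Kapitel eins, Aufgabe IV"))   # -> "kapitel 1, aufgabe 4"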


3.2.3 Natural Language Understanding

We implement the Natural Language Understanding (NLU) unit utilizing handcrafted rules, Regular Expressions (RegEx), and the Elasticsearch1 API.

intent classification According to the design of the OMB+ platform, all student questions can be classified into three categories: organizational questions (i.e., course, certificate), contextual questions (i.e., content, theorem), and mathematical questions (i.e., exercises, solutions). To classify the input query by its intent (i.e., category), we employ weighted keyword information in the form of handcrafted rules. We assume that specific words are explicitly associated with a corresponding intent (e.g., theorem or root denote a mathematical question; feedback or certificate denote an organizational inquiry). If no intent can be assigned, it is assumed that the NLU unit was not capable of understanding, and the intent is unknown. In this case, the virtual agent requests the user to provide an intent manually by picking one from the mentioned three options. The queries from the organizational and theoretical categories are directly handed over to a human tutor, while mathematical questions are analyzed by the automated agent for further information extraction.
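A minimal sketch of such keyword-based intent scoring is shown below. The keyword lists, weights, and the threshold are purely illustrative assumptions; the handcrafted rules of the actual system are more extensive.

    # Illustrative keyword weights (assumed); the real rule set is handcrafted and larger.
    INTENT_KEYWORDS = {
        "mathematical":   {"theorem": 2.0, "root": 1.5, "aufgabe": 1.0, "quiz": 1.0},
        "organizational": {"zertifikat": 2.0, "certificate": 2.0, "feedback": 1.5},
        "contextual":     {"definition": 1.5, "regel": 1.0, "text": 1.0},
    }

    def classify_intent(normalized_query: str, threshold: float = 1.0) -> str:
        """Return the best-scoring intent, or 'unknown' if no keyword evidence is found."""
        tokens = normalized_query.split()
        scores = {
            intent: sum(w for tok in tokens for kw, w in kws.items() if kw in tok)
            for intent, kws in INTENT_KEYWORDS.items()
        }
        best_intent, best_score = max(scores.items(), key=lambda item: item[1])
        return best_intent if best_score >= threshold else "unknown"

    classify_intent("wie bekomme ich ein zertifikat")   # -> "organizational"
    classify_intent("hallo")                            # -> "unknown" (manual intent choice follows)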

entity extraction In the next step, the system retrieves the entities from a user message. We specify the five following prerequisite entities, which have to be collected before the dialogue can be handed over to a human tutor: topic, sub-topic, examination mode and level, and question number. This part is implemented using ElasticSearch (ES) and RegEx. To facilitate the use of ES, we indexed the OMB+ web page into an internal database. Besides indexing the titles of topics and sub-topics, we also provided supplementary information on possible synonyms and writing styles. We additionally filed OMB+ permalinks, which direct to the site pages. To query the resulting database, we employ the internal Elasticsearch multi_match function and set the minimum should match parameter to 20. This parameter specifies the number of terms that must match for a document to be considered relevant. Besides that, we adopted fuzziness with the maximum edit distance set to 2 characters. The fuzzy query uses similarity based on the Levenshtein edit distance (Levenshtein, 1966). Finally, the system produces a ranked list of potential matching entries found in the database within the predefined relevance threshold (we set it to θ = 1.5). We then pick the most probable entry as the right one and select the corresponding entity from the user input.
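Below is a minimal retrieval sketch assuming the elasticsearch Python client; the index name, field names, and score threshold are assumptions made for illustration, not the exact configuration used in the thesis.

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")   # assumed local instance

    def extract_entity(query: str, threshold: float = 1.5):
        """Fuzzy-match the user query against the indexed OMB+ content and return the best hit."""
        body = {
            "query": {
                "multi_match": {
                    "query": query,
                    # Hypothetical field names of the indexed OMB+ pages.
                    "fields": ["topic", "subtopic", "synonyms", "permalink"],
                    "fuzziness": 2,                  # Levenshtein edit distance
                    "minimum_should_match": "20%",   # share of query terms that must match (assumed)
                }
            }
        }
        hits = es.search(index="ombplus", body=body, size=3)["hits"]["hits"]
        if hits and hits[0]["_score"] >= threshold:
            return hits[0]["_source"]                # most probable matching entry
        return None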

Overall, the Natural Language Understanding (NLU) module receives the user input as preprocessed text and examines it across all predefined RegEx statements and for a match in the ElasticSearch (ES) database.

1 https://www.elastic.co/products/elasticsearch


We use ES to retrieve entities for the topic, sub-topic, and examination mode, whereas RegEx is used to extract the information on a question number. Every time an entity is extracted, it is filled into the Informational Dictionary (ID). The ID has the following six slots to be filled in: topic, sub-topic, examination level, examination mode, question number, and exact problem formulation (this information is requested in further steps).

3.2.4 Dialogue Manager

A conventional Dialogue Manager (DM) consists of two interacting components. The first component is the Dialogue State Tracker (DST), which maintains a representation of the current conversational state, and the second is the Policy Learning (PL), which defines the next system action (i.e., response).

In our model, every agent's next action is determined by the state of the previously obtained information accumulated in the Informational Dictionary (ID). For instance, if the system recognizes that the student works on the final examination, it also knows (defined by hand-coded logic in the predefined rules) that there is no necessity to ask for a sub-topic, because the final examination always corresponds to a chapter level (due to the layout of the OMB+ platform). If the system recognizes that the user struggles with solving a quiz, it has to ask for both the corresponding topic and sub-topic, because the quiz always relates to a section level.

To discover all of the potential conversational flows, we implement Mutually Exclusive Rules (MER), which indicate that two events e1 and e2 are mutually exclusive or disjoint if they cannot both occur at the same time (i.e., the intersection of these events is empty: P(e1 ∩ e2) = 0). Additionally, we defined transition and mapping rules. Hence, only particular entities can co-occur, while others must be excluded from the specific rule combination. Assume a topic t = Geometry. According to MER, this topic can co-occur with one of the pertaining sub-topics defined at the OMB+ (i.e., s = Angle). Thus, the level of the examination would be l = Section (but l ≠ Chapter), because the student already reported a sub-section he or she is working on. In this specific case, the examination mode could be e ∈ [Training, Exercise, Quiz], but not e = Final_Examination. This is because e = Final_Examination can co-occur only with a topic (chapter-level), but not with a sub-topic (section-level). The question number q could be arbitrary and has to co-occur with every other combination.


This allows a formal solution to be found. Assume a list of all theoretically possible dialogue states S = [topic, sub-topic, training, exercise, chapter level, section level, quiz, final examination, question number], and for each element sn in S it holds that

sn = 1 if the information is given, 0 otherwise.    (1)

This gives all general (resp. possible) dialogue states without reference to the design of the OMB+ platform. However, to make the dialogue states fully suitable for the OMB+, from the general states we select only those which are valid. To define the validity of a state, we specify the following five MERs:

R1: Topic

¬T
T

Table 2: Rule 1 – Admissible topic configurations.

Rule (R1) in Table 2 denotes the admissible configurations for the topic and means that the topic could be either given (T) or not (¬T).

R2: Examination Mode

¬TR ¬E ¬Q ¬FE
TR ¬E ¬Q ¬FE
¬TR E ¬Q ¬FE
¬TR ¬E Q ¬FE
¬TR ¬E ¬Q FE

Table 3: Rule 2 – Admissible examination mode configurations.

Rule (R2) in Table 3 denotes that either no information on the examination mode is given, or the examination mode is Training (TR), Exercise (E), Quiz (Q), or Final Examination (FE), but not more than one mode at the same time.

R3: Level

¬CL ¬SL
CL ¬SL
¬CL SL

Table 4: Rule 3 – Admissible level configurations.

Rule (R3) in Table 4 indicates that either no level information is provided, or the level corresponds to the chapter level (CL) or to the section level (SL), but not to both at the same time.


R4: Examination & Level

¬TR ¬E ¬SL ¬CL
TR ¬E SL ¬CL
TR ¬E ¬SL CL
TR ¬E ¬SL ¬CL
¬TR E SL ¬CL
¬TR E ¬SL CL
¬TR E ¬SL ¬CL

Table 5: Rule 4 – Admissible examination mode (only for Training and Exercise) and corresponding level configurations.

Rule (R4) in Table 5 means that the Training (TR) and Exercise (E) examination modes can belong either to the chapter level (CL) or to the section level (SL), but not to both at the same time.

R5: Topic & Sub-Topic

¬T ¬ST
T ¬ST
¬T ST
T ST

Table 6: Rule 5 – Admissible topic and corresponding sub-topic configurations.

Rule (R5) in Table 6 symbolizes that we could be given either only a topic (T), or the combination of a topic and a sub-topic (ST) at the same time, or only a sub-topic, or no information on this point at all.

We then define a valid dialogue state as a dialogue state that meets all requirements of the abovementioned rules:

∀s (State(s) ∧ R1−5(s)) → Valid(s)    (2)

Once we have the valid states for dialogues, we perform a mapping from each valid dialogue state to the next possible system action. For that, we first define five transition rules. The order of the transition rules is important.

T1 = ¬T    (3)

means that no topic (T) is found in the ID (i.e., it could not be extracted from the user input);

T2 = ¬EM    (4)

indicates that no examination mode (EM) is found in the ID;


T3 = EM, where EM ∈ [TR, E]    (5)

denotes that the extracted examination mode (EM) is either Training (TR) or Exercise (E);

T4 = { (T, ¬ST, TR, SL), (T, ¬ST, E, SL), (T, ¬ST, Q) }    (6)

means that no sub-topic (ST) is provided by a user, but the ID either already contains the combination of topic (T), training (TR), and section level (SL), or the combination of topic, exercise (E), and section level, or the combination of topic and quiz (Q);

T5 = ¬QNR    (7)

indicates that no question number (QNR) was provided by a student (or it could not be successfully extracted).

Finally, we assumed the following list of possible next actions for the system:

A = [Ask for Topic, Ask for Examination Mode, Ask for Level, Ask for Sub-Topic, Ask for Question Nr]    (8)

Following the transition rules, we map each valid dialogue state to the possible next action am ∈ A:

∃a ∀s (Valid(s) ∧ T1(s)) → a(s, T)    (9)

in the case where we do not have any topic provided, the next action is to ask for the topic (T);

∃a ∀s (Valid(s) ∧ T2(s)) → a(s, EM)    (10)

if no examination mode is provided by a user (or it could not be successfully extracted from the user query), the next action is defined as ask for examination mode (EM);

∃a ∀s (Valid(s) ∧ T3(s)) → a(s, L)    (11)

in the case where we know the examination mode EM ∈ [Training, Exercise], we have to ask about the level (i.e., training at chapter level or training at section level); thus the next action is ask for level (L);


∃a ∀s (Valid(s) ∧ T4(s)) → a(s, ST)    (12)

if no sub-topic is provided, but the examination mode EM ∈ [Training, Exercise] is at section level, the next action is defined as ask for sub-topic (ST);

∃a ∀s (Valid(s) ∧ T5(s)) → a(s, QNR)    (13)

if no question number is provided by a user, then the next action is ask for question number (QNR).

Following this scheme, we generate 56 state transitions which specify the next system actions. Being in a new conversation state, we compare the extracted (or updated) data in the Informational Dictionary (ID) with the valid dialogue states and select the mapped action as the next system's action.
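A simplified Python sketch of this rule machinery is given below. It enumerates binary slot configurations, filters them with an approximation of the MERs, and maps each state to a next action via the ordered transition rules; the slot names and the exact conditions are simplified assumptions and do not reproduce the full set of 56 transitions.

    from itertools import product

    SLOTS = ["topic", "subtopic", "training", "exercise", "chapter_level",
             "section_level", "quiz", "final_exam", "question_nr"]

    def is_valid(s: dict) -> bool:
        """Rough approximation of the Mutually Exclusive Rules R1-R5."""
        if sum([s["training"], s["exercise"], s["quiz"], s["final_exam"]]) > 1:
            return False                                  # at most one examination mode (R2)
        if s["chapter_level"] and s["section_level"]:
            return False                                  # at most one level (R3)
        if s["final_exam"] and s["section_level"]:
            return False                                  # final examination is chapter-level only
        return True

    def next_action(s: dict) -> str:
        """Ordered transition rules T1-T5 mapped to the next system action."""
        if not s["topic"]:
            return "Ask for Topic"                        # T1
        if not (s["training"] or s["exercise"] or s["quiz"] or s["final_exam"]):
            return "Ask for Examination Mode"             # T2
        if (s["training"] or s["exercise"]) and not (s["chapter_level"] or s["section_level"]):
            return "Ask for Level"                        # T3
        if not s["subtopic"] and (s["quiz"] or ((s["training"] or s["exercise"]) and s["section_level"])):
            return "Ask for Sub-Topic"                    # T4
        if not s["question_nr"]:
            return "Ask for Question Nr"                  # T5
        return "Final Request"

    valid_states = [dict(zip(SLOTS, bits)) for bits in product([0, 1], repeat=len(SLOTS))
                    if is_valid(dict(zip(SLOTS, bits)))]
    transitions = {tuple(s.values()): next_action(s) for s in valid_states}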

flexibility of dst The DST transitions outlined above are meant to maintain the current configuration of the OMB+ learning platform. Still, further MERs could be effortlessly generated to create new transitions. To examine this, we performed tests with the previous design of the platform, where all the topics except for the first one had sub-topics, whereas the first topic contained both sub-topics and sub-sub-topics. We could produce the missing transitions with the described approach. The number of permissible transitions in this case increased from 56 to 117.

3.2.5 Meta Policy

Since the proposed system is expected to operate in a real-world setting, we had to develop additional policies that manage the dialogue flow and verify the system's accuracy. We explain these policies below.

completeness of informational dictionary The model automatically verifies the completeness of the Informational Dictionary (ID), which is determined by the number of essential slots filled in the informational dictionary. There are six discrete cases in which the ID is deemed to be complete. For example, if a user works on the final examination mode, the agent does not have to request a sub-topic or examination level. Hence, the ID has to be filled only with data for a topic, examination type, and question number. Whereas if the user works on training mode, the system has to collect information about the topic, sub-topic, examination level, examination mode, and question number. Once the ID is complete, it is provided to the verification step. Otherwise, the agent proceeds according to the next action.


The system extracts the information in each dialogue state, and therefore, if the user gives updated information on any subject later in the dialogue history, the corresponding slot will be updated in the ID.

Below are examples of two final cases (out of six) where the Informational Dictionary (ID) is considered to be complete.

Case1 = { intent = Math, topic ≠ None, exam mode = Final Examination, level = None, question nr ≠ None }    (14)

Table 7: Case 1 – Any topic; examination mode is final examination; examination level does not matter; any question number.

Case2 = { intent = Math, topic ≠ None, sub-topic ≠ None, exam mode ∈ [Training, Exercise], level = Section, question nr ≠ None }    (15)

Table 8: Case 2 – Any topic; any related sub-topic; examination mode is either training or exercise; examination level is section; any question number.

verification step After the system has collected all the essential data (i.e., the ID is complete), it continues to the final verification step. Here, the obtained data is presented to the current student in the dialogue session. The student is asked to review the correctness of the data and, if any entries are faulty, to correct them. The Informational Dictionary (ID) is, where necessary, updated with the user-provided data. This procedure repeats until the user confirms the correctness of the compiled data.

fallback policy In the cases where the system fails to retrieve information from a student's query, it re-asks the user and attempts to retrieve the information from a follow-up response. The maximum number of re-ask trials is a parameter and is set to three (r = 3). If the system is still incapable of extracting the information, the user input is considered as the ground truth and filled into the appropriate slot of the ID. An exception to this rule applies where the user has to specify the intent manually: after three unclassified trials, the dialogue session is directly handed over to a human tutor.


human request In every dialogue state, a user can shift to a human tutor. For this, a user can enter the "human" (or "Mensch" in German) keyword. Therefore, every user message is investigated for the presence of this keyword.

3.2.6 Response Generation

The semantic representation of the system's next action is transformed into a natural language utterance. Hence, each possible action is mapped to precisely one response that is predefined in the template. Some of the responses are fixed (i.e., "Welches Kapitel bearbeitest du gerade?")2. Others have placeholders for custom values. In the latter case, the utterance can be formulated depending on the Informational Dictionary (ID).

Assume the predefined response: "Deine Angaben waren wie folgt: a) Kapitel K, b) Abschnitt A, c) Aufgabentyp M, d) Aufgabe Q, e) Ebene E. Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein."3 The slots [K, A, M, Q, E] are the placeholders for the information stored in the ID. During the conversation, the system selects the template according to the next action and fills the slots with extracted values if required. After the extracted information is filled into the corresponding placeholders in the utterance, the final response could be as follows: "Deine Angaben waren wie folgt: a) Kapitel I Elementares Rechnen, b) Abschnitt Rechenregel für Potenzen, c) Aufgabentyp Quiz, d) Aufgabe 1(a), e) Ebene Abschnitt. Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein."4
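A minimal sketch of this template selection and slot filling is shown below; the template store and the Informational Dictionary keys are assumptions used only for illustration.

    # Hypothetical template store; in the real system each action maps to exactly one predefined response.
    TEMPLATES = {
        "Ask for Topic": "Welches Kapitel bearbeitest du gerade?",
        "Final Request": ("Deine Angaben waren wie folgt: a) Kapitel {K}, b) Abschnitt {A}, "
                          "c) Aufgabentyp {M}, d) Aufgabe {Q}, e) Ebene {E}. "
                          "Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein."),
    }

    def generate_response(next_action: str, info_dict: dict) -> str:
        """Select the template for the next action and fill its placeholders from the ID."""
        template = TEMPLATES[next_action]
        if "{" not in template:
            return template                               # fixed response, no placeholders
        return template.format(K=info_dict.get("topic", ""), A=info_dict.get("subtopic", ""),
                               M=info_dict.get("exam_mode", ""), Q=info_dict.get("question_nr", ""),
                               E=info_dict.get("level", ""))

    generate_response("Final Request", {"topic": "I Elementares Rechnen",
                                        "subtopic": "Rechenregel für Potenzen",
                                        "exam_mode": "Quiz", "question_nr": "1(a)",
                                        "level": "Abschnitt"})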

Below we demonstrate five sample dialogues in the original German language with variable flows:

• a short dialogue where certain information is already provided in the first student's input,

• an organizational question case,

• a contextual question case,

• an example with manual intent classification,

• an example with fallback and human policies,

• a long flow with entry correction in the final verification phase.

2 Translates to: "Which chapter are you currently working on?"

3 Translates to: "Your details were as follows: a) Chapter, b) Section, c) Examination Mode, d) Question Number, e) Level. Did I understand you correctly? Please answer yes or no."

4 Translates to: "Your details were as follows: a) Elementary Calculus, b) Power Laws, c) Quiz, d) Question 1 (a), e) Section Level. Did I understand you correctly? Please answer yes or no."


Student: Hallo, ich habe Schwierigkeiten beim Kapitel I Elementares Rechnen mit der Übungsaufgabe 1 a.

Next-Action: [Ask for a Level: Chapter or Section]
Bot: Geht es um eine Aufgabe auf Kapitel-Ebene (z.B. um eine Trainingsaufgabe im Kapitel "Geometrie") oder um eine Aufgabe auf der Ebene eines Abschnitts in einem Kapitel (z.B. um eine Aufgabe zum Abschnitt "Winkel" im Kapitel "Geometrie")? Antworte bitte mit KAP für Kapitel und SEK für Abschnitt.

Student: Kapitel

Next-Action: [Final-Statement]
Bot: Deine Angaben waren wie folgt:
a) Kapitel I Elementares Rechnen
b) Aufgabentyp Exercise
c) Aufgabe 1a
d) Ebene Kapitel
Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein.

Student: yep

Next-Action: [Human Hand-Over]
Bot: Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir.

Table 9: Showcase 1 – Short flow; most of the information was extracted from the initial user input. In bold are the extracted entities.

Student: Hallo, ich habe eine Frage bzgl. eines Zertifikates, und zwar wie kann ich es austellen lassen?

Next-Action: [Organisational Question]
Bot: Es scheint sich wohl um eine organisatorische Frage zu handeln. Bitte fasse deine Frage in einem kurzen Text zusammen, damit ich sie an meinen menschlichen Kollegen weiterleiten kann.

Student: Wie bekomme ich ein Zertifikat?

Next-Action: [Human Hand-Over]
Bot: Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir.

Table 10: Showcase 2 – Organisational question. Underlined are the keywords which point to the organisational intent.


Student: hallo

Next-Action: [UNK]
Bot: Hast du eine Frage zu einer Aufgabe (MATH), zu einem Text im Kurs (TEXT) oder eine organisatorische Frage (ORG)?

Student: ma

Next-Action: [UNK]
Bot: Hast du eine Frage zu einer Aufgabe (MATH), zu einem Text im Kurs (TEXT) oder eine organisatorische Frage (ORG)?

Student: math

Next-Action: [Ask for Topic]
Bot: Welches Kapitel bearbeitest du gerade?

Student: Lineare Gleichungssysteme

Next-Action: [Ask for Examination Mode]
Bot: Bearbeitest du eine Übungsaufgabe, eine Trainingsaufgabe, einen Quiz oder eine Schlussprüfung?

Student: human

Next-Action: [Human Hand-Over]
Bot: Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir.

Table 11: Showcase 3 – Manual intent classification, Fallback and Human Request policies (in bold type).

Student: Halli-Hallo, könnte mir bitte jemand die Regel 72 (vertikale streckung und stauchung des graphen) erklären? Danke im Voraus.

Next-Action: [Contextual Question]
Bot: Es scheint sich wohl um eine inhaltliche Frage zu handeln. Bitte fasse deine Frage in einem kurzen Text zusammen, damit ich sie an meinen menschlichen Kollegen weiterleiten kann.

Student: Ich habe Schwierigkeiten beim Verstehen der Regel 72. Was ist genau die vertikale Streckung des Graphen?

Next-Action: [Human Hand-Over]
Bot: Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir.

Table 12: Showcase 4 – Contextual question. Underlined are the keywords which point to the contextual intent.


Student: Hallo, ich kann beim Kapitel 1 nicht weiterkommen. Trainingsaufgabe 1 (a).

Next-Action: [Ask for Level]
Bot: Geht es um eine Aufgabe auf Kapitel-Ebene (z.B. um eine Trainingsaufgabe im Kapitel Geometrie) oder um eine Aufgabe auf der Ebene eines Abschnitts in einem Kapitel (z.B. um eine Aufgabe zum Abschnitt Winkel im Kapitel Geometrie)? Antworte bitte mit KAP für Kapitel und SEK für Abschnitt.

Student: ich glaube es ist eine Sektion

Next-Action: [Ask for Sub-Topic]
Bot: Wie heißt der Abschnitt, in dem sich die Aufgabe befindet, auf die sich deine Frage bezieht?

Student: ich bearbeite rechenregeln und potenzen

Next-Action: [Final Statement]
Bot: Deine Angaben waren wie folgt:
a) Kapitel I Elementares Rechnen
b) Abschnitt Rechenregel für Potenzen
c) Aufgabentyp Quiz
d) Aufgabe 1 (a)
e) Ebene Abschnitt
Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein.

Student: nope

Next-Action: [Ask_For]
Bot: Welche der folgenden Punkte habe ich falsch erkannt? Schreibe bitte, welche der folgenden Punkte es sind: a, b, c, d, e.

Student: c

State-Transition: [Correct-Fields]
Bot: Gib bitte die korrekte Information für c) ein.

Student: Training

State-Transition: [Final Statement]
Bot: Deine Angaben waren wie folgt:
a) Kapitel I Elementares Rechnen
b) Abschnitt Rechenregel für Potenzen
c) Aufgabentyp Training
d) Aufgabe 1 (a)
e) Ebene Abschnitt
Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein.

Student: yes

State-Transition: [Human Hand-Over]
Bot: Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir.

Table 13: Showcase 5 – Long flow; correction of entries. In bold type are the extracted entities.


Figure 4: Dialogue Flow. Abbreviations: UNK – unknown, ID – informational dictionary, RegEx – regular expressions.


3.3 evaluation

In order to get feedback on the quality, functionality, and usefulness of the proposed model, we evaluated it in two steps: first, with an automated method (Section 3.3.1), utilizing 130 manually annotated conversations to verify the robustness of the system; second, with the help of human tutors (Section 3.3.2) from OMB+, to examine the user experience. We describe the details as well as the most common errors below.

3.3.1 Automated Evaluation

To manage an automated evaluation, we manually assembled a dataset with 130 conversational flows. We predefined viable initial user questions, user answers, as well as gold system responses. These flows cover the most frequent dialogues that we previously observed in human-to-human dialogue data. We examined our system by implementing a self-chatting evaluation bot (a minimal sketch follows the list below). The evaluation cycle is as follows:

• The system accepts the first question from a predefined dialogue via an API request, preprocesses and analyzes it to determine the intent and extract entities.

• Then it estimates the appropriate next action and responds accordingly.

• This response is then compared to the system's gold answer. If the predicted result is correct, the system receives the next predefined user input from the templated dialogue and responds again as defined above. This procedure proceeds until the dialogue terminates (i.e., the ID is complete). Otherwise, the system reports an unsuccessful case number.
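The sketch below illustrates one possible implementation of this cycle, assuming a hypothetical HTTP endpoint for the deployed bot and dialogues stored as lists of (user input, gold response) pairs; neither detail is specified in the thesis.

    import requests

    def evaluate(dialogues, endpoint="http://localhost:5000/chat"):
        """Replay predefined dialogues against the system and compare each reply to the gold answer."""
        failed_cases = []
        for case_nr, dialogue in enumerate(dialogues):
            for user_input, gold_response in dialogue:
                reply = requests.post(endpoint, json={"text": user_input}).json()["response"]
                if reply != gold_response:
                    failed_cases.append(case_nr)          # report the unsuccessful case number
                    break
        return failed_cases                               # empty list: all flows passed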

The system successfully passed this evaluation for all 130 cases.

3.3.2 Human Evaluation & Error Analysis

To evaluate the system on irregular cases, we carried out experiments with human tutors from the OMB+ platform. The tutors are experienced regarding rare or complex issues, ambiguous responses, misspellings, and other infrequent but still relevant problems which occur during a natural language dialogue. In the following, we review some common errors and describe additional observations.

misspellings and confusable spellings frequently occur in user-generated text. Since we attempt to let the conversation remain natural to provide a better user experience, we do not instruct students to write formally. Therefore, we have to account for various writing issues. One of the most prevalent problems is misspellings.


German words are commonly long and can be complex, and because users often type quickly, this can lead to the wrong order of characters within a given word. To tackle this challenge, we utilized fuzzy matching within ES. However, the maximum allowed edit distance in ElasticSearch (ES) is set to 2 characters, which means that all misspellings beyond this threshold could not be correctly identified by ES (e.g., Differenzialrechnung vs. Differnetialrechnung). Another representative example would be the formulation of the section or question number. The equivalent information can be communicated in several discrete ways, which has to be considered by the RegEx unit (e.g., Aufgabe 5 a, Aufgabe V a, Aufgabe 5 (a)). A similar problem occurs with confusable spellings (i.e., Differenzialrechnung vs. Differenzialgleichung).

We analyzed the cases stated above and improved our model by adding some of the most common misspellings to the ElasticSearch (ES) database or by handling them with RegEx during the preprocessing step.

elasticsearch threshold We observed both cases where the system failed to extract information although the user provided it, and examples where ES extracted information not mentioned in a user query. According to our understanding, these errors occur due to the relevancy scoring algorithm of ElasticSearch (ES), where a document's score is a combination of textual similarity and other metadata-based scores. Our examination revealed that ES mostly fails to extract the information if the user message is rather short (e.g., about 5 tokens). To overcome this difficulty, we coupled the current input u_t with the dialogue history (h = u_t,...,t−2). That eliminated the problem and improved the retrieval quality.

Solving the opposite problem, where ES extracts false information (or information that was not mentioned in a query), was more challenging. We learned that the problem comes from short words or sub-words (i.e., suffixes, prefixes) which ES locates in the database and considers credible enough. The ES documentation suggests getting rid of stop words to eliminate this behavior; however, this did not improve the search. Also, fine-tuning ES parameters such as the relevance threshold, the prefix length5, and the minimum should match6 parameter did not result in notable improvements. To cope with this problem, we implemented a verification step where a user can edit the erroneously retrieved data.

5 The amount of fuzzified characters. This parameter helps to decrease the number of terms which must be reviewed.

6 Indicates the number of terms that must match for a document to be considered relevant.


3.4 structured dialogue acquisition

As we stated, the implemented system not only supports the human tutor by assisting students, but also collects structured and labeled training data in the background. In a trial run of the rule-based system, we accumulated a toy dataset with dialogues. The assembled dataset includes the following:

• Plain dialogues with unique dialogue indexes.

• Plain ID information collected for the whole conversation.

• Pairs of questions (i.e., user requests) and responses (i.e., bot replies) with unique dialogue and turn indexes.

• Triples in the form of (Question, Next Action, Response). Information on the next system's action could be employed to train a DM unit with (deep) machine learning algorithms.

• For every conversational state, we keep the entities that the system was able to extract, along with their position in the utterance. This information could be used to train a custom domain-specific NER model.

3.5 summary

In this work, we realized a conversational agent for IPA purposes that addresses two problems. First, it decreases repetitive and time-consuming activities, which allows workers of the e-learning platform to focus solely on mathematical and hence cognitively demanding questions. Second, by interacting with users, it augments the resources with structured and labeled training data. The latter can be utilized for the further implementation of learnable dialogue components.

The realization of such a system was connected with multiple challenges. Among others were missing structured conversational data, ambiguous or erroneous user-generated text, and the necessity to deal with existing corporate tools and their design.

The proposed conversational agent enabled us to accumulate structured, labeled data without any special efforts from the human (i.e., tutors') side (e.g., manual annotation and post-processing of existing data, changes to the conversational structure within the tutoring process). Once we had collected structured dialogues, we were able to re-train particular segments of the system with deep learning methods and achieved consistent performance for both of the proposed tasks. We report on the re-implemented elements in Chapter 4.

At the time of writing this thesis, the system described in this work was deployed on the OMB+ platform.


3.6 future work

Possible extensions of our work are:

• In this work, we implemented a rule-based dialogue model from scratch and adjusted it for the specific domain (i.e., e-learning). The core of the model is a dialogue manager that determines the current state of the conversation and the possible next action. Rule-based systems are generally considered to be hardly adaptable to new domains; however, our dialogue manager proved to be flexible to slight modifications in a workflow. One of the possible directions of future work would thus be the investigation of the general adaptability of the dialogue manager core to other scenarios and domains (e.g., a different course).

• A further possible extension would be the introduction of different languages. The current model is implemented for the German language, but the OMB+ platform provides this course also in English and Chinese. Therefore, it would be interesting to investigate whether it is possible to adjust the existing system to other languages without drastic changes in the model.


4 deep learning re-implementation of units

This chapter partially covers work already published at international peer-reviewed conferences. The relevant publication is Moiseeva et al., 2020. The research described in this chapter was carried out in its entirety by the author of this thesis. The other authors of the publication acted as advisors.

In this chapter, we address the challenge of re-implementing two central components of our IPA dialogue system described in Chapter 3 by employing deep learning techniques. Since the data for training was collected in a (short) trial run of the rule-based system, it is not sufficient to train a neural network from scratch. Therefore, we utilize Transfer Learning (TL) methods and employ the previously pre-trained Bidirectional Encoder Representations from Transformers (BERT) model, which we fine-tune for two of our custom tasks.

4.1 introduction

Data mining and machine learning technologies have gained notable advancement in many research fields, including but not limited to computer vision, robotics, advanced analytics, and computational linguistics (Pan and Yang, 2009; Trautmann et al., 2019; Werner et al., 2015). Besides that, in recent years, deep neural networks, with their ability to accurately map from inputs to outputs (i.e., labels, images, sentences) while analyzing patterns in data, became a potential solution for many industrial automatization tasks (e.g., fake news detection, sentiment analysis).

Nevertheless, conventional supervised models still have limitations when applied to real-world data and industrial tasks. The first challenge here is the training phase, since a robust model requires an immense amount of structured and labeled data. Still, there are just a few cases where such data is publicly available (e.g., research datasets), but for most tasks, domains, and languages, the data has to be gathered over a long time. The second hurdle is that, if trained on existing data from a distinct domain, a supervised model cannot generalize to conditions that are different from those faced in the training dataset. Most machine learning methods perform correctly only under a general assumption: the training and test data are mapped to the same feature space and the same distribution (Pan and Yang, 2009). When the distribution changes, most statistical models need to be retrained from scratch using newly obtained training samples.


This is especially crucial when applying such approaches to real-world data, which is usually very noisy and contains a large number of scenarios. In this case, a model would be ill-trained to make decisions about novel patterns. Furthermore, for most real-world applications, it is expensive or even not feasible at all to recollect the required training data to retrain the model. Hence, the possibility to transfer the knowledge obtained during training on existing data to a new, unseen domain is one of the possible solutions for industrial problems. Such a technique is called Transfer Learning (TL), and it allows dealing with the abovementioned scenarios by leveraging existing labeled data of some related tasks or domains.

4.1.1 Outline and Contributions

This work addresses the challenge of re-implementing two out of three central components of our rule-based IPA conversational assistant with deep learning methods. As we discussed in Chapter 2, hybrid dialogue models (i.e., consisting of both rule-based and trainable components) appear to replace the established rule- and template-based methods, which are for the most part employed in the industrial setting. To investigate this assumption and examine current state-of-the-art deep learning techniques, we conducted experiments on our custom domain-specific data. Since the available data were collected in a trial run and the obtained dataset was relatively small to train a conventional machine learning model from scratch, we utilized the Transfer Learning (TL) approach and fine-tuned the existing pre-trained model for our target domain.

For the experiments, we defined two tasks:

• First, we considered the NER problem in a custom domain setting. We defined a sequence labeling task and employed Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al., 2018) as one of the transfer learning methods widely utilized in the NLP area. We applied the model to our dataset and fine-tuned it for seven (7) domain-specific (i.e., e-learning) entities.

• Second, we examined the effectiveness of transfer learning for the dialogue manager core. The DM is one of the fundamental parts of the conversational agent and requires compelling learning techniques to hold the conversation intelligently. For that experiment, we defined a classification task and applied the BERT model to predict the system's next action for every given user input in a conversation. We then computed the macro F1-score for 14 possible classes (i.e., actions) and an average dialogue accuracy.


The main characteristics of both settings are:

• the set of named entities and labels (i.e., actions) between the source and target domain is different,

• for both settings, we utilized the BERT model in German and multilingual versions with additional parameters,

• the dataset used to train the target model is small and consists of highly domain-specific labels; the latter often appears in industrial scenarios.

We finally verified that the selected method performed reasonably well on both tasks. We reached a performance of 0.93 macro F1 points for Named Entity Recognition (NER) and 0.75 macro F1 points for the Next Action Prediction (NAP) task. Therefore, we conclude that both NER and NAP components could be employed to substitute or extend the existing rule-based modules.

4.2 related work

As discussed in the previous section, the main benefit of Transfer Learning (TL) is that it allows the domains and tasks used during the training and testing phases to be different. Initial steps in the formulation of transfer learning were made after the NeurIPS-95 workshop1 that was focused on the demand for lifelong machine-learning methods that preserve and reuse previously learned knowledge (Chen et al., 2018). Research on transfer learning has attracted more and more attention since then, especially in the computer vision domain, and in particular for image recognition tasks (Donahue et al., 2014; Sharif Razavian et al., 2014).

Recent studies have shown that TL can also be successfully applied within the Natural Language Processing (NLP) domain, especially to semantically equivalent tasks (Dai et al., 2019; Devlin et al., 2018; Mou et al., 2016). Several research works were carried out specifically on the Named Entity Recognition (NER) task and explored the capabilities of transfer learning when applied to different named entity categories (i.e., different output spaces). For instance, Qu et al. (2016) pre-trained a linear-chain Conditional Random Field (CRF) on a large amount of labeled data in the source domain. Then they introduced a two-linear-layer neural network that learns the difference between the source and target label distributions. Finally, the authors initialized another CRF with the parameters learned in the linear layers to predict the labels of the target domain. Kim et al. (2015) conducted experiments with transferring features and model parameters between related domains where the label classes are different but may have a semantic similarity. Their primary approach was to construct label embeddings to automatically map the source and target label classes to improve the transfer (Chen et al., 2018).

1 Conference on Neural Information Processing Systems; formerly NIPS.


However, there are also models trained in a manner such that they can be applied to multiple NLP tasks simultaneously. Among such architectures are ULMFit (Howard and Ruder, 2018), BERT (Devlin et al., 2018), GPT-2 (Radford et al., 2019), and Transformer-XL (Dai et al., 2019) – powerful models that were recently proposed for transfer learning within the NLP domain. These architectures are called language models since they attempt to learn language structure from large textual collections in an unsupervised manner and then predict the next word in a sequence based on previously observed context words. The Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al., 2018) model is one of the most popular architectures among the abovementioned, and it is also one of the first that showed close to human-level performance on the General Language Understanding Evaluation (GLUE) benchmark2 (Wang et al., 2018), which includes tasks on sentiment analysis, question answering, semantic similarity, and language inference. The architecture of BERT consists of stacked transformer blocks that are pre-trained on a large corpus consisting of 800M words from modern English books (i.e., BooksCorpus by Zhu et al.) and 2,500M words of text from pre-processed (i.e., without markup) English Wikipedia articles. Overall, BERT advanced the state-of-the-art performance for eleven (11) NLP tasks. We describe this approach in detail in the subsequent section.

4.3 bert: bidirectional encoder representations from transformers

The Bidirectional Encoder Representations from Transformers (BERT) architecture is divided into two steps: pre-training and fine-tuning. The pre-training step was enabled through the two following tasks:

• Masked Language Model – the idea behind this approach is to train a deep bidirectional representation, which is different from conventional language models that are either trained left-to-right or right-to-left. To train a bidirectional model, the authors mask out a certain number of tokens in a sequence. The model is then asked to predict the original words for the masked tokens. It is essential that, in contrast to denoising auto-encoders (Vincent et al., 2008), the model does not need to reconstruct the entire input; instead, it attempts to predict the masked words. Since the model does not know which words exactly it will be asked about, it learns a representation for every token in a sequence (Devlin et al., 2018).

• Next Sentence Prediction – aims to capture relationships between two sentences A and B. In order to train the model, the authors pre-trained a binarized next sentence prediction task, where pairs of sequences were labeled according to the criteria: A and B either follow each other directly in the corpus (label IsNext) or are both taken from random places (label NotNext).

2 https://gluebenchmark.com


The model is then asked to predict whether the former or the latter is the case. The authors demonstrated that pre-training towards this task is extremely useful for both question answering and natural language inference tasks (Devlin et al., 2018).

Furthermore, BERT uses WordPiece tokenization for embeddings (Wu et al., 2016), which segments words into subword-level tokens. This approach allows the model to make additional inferences based on word structure (i.e., the -ing ending denotes similar grammatical categories).

The fine-tuning step follows the pre-training process. For that, the BERT model is initialized with the parameters obtained during the pre-training step, and a task-specific fully-connected layer is used after the transformer blocks, which maps the general representations to custom labels (employing a softmax operation).

self-attention The key operation of the BERT architecture is self-attention, which is a sequence-to-sequence operation with a conventional encoder-decoder structure (Vaswani et al., 2017). The encoder maps an input sequence (x1, x2, ..., xn) to a sequence of continuous representations z = (z1, z2, ..., zn). Given z, the decoder then generates an output sequence (y1, y2, ..., ym) of symbols, one element at a time (Vaswani et al., 2017). To produce the output vector yi, the self-attention operation takes a weighted average over all input vectors:

y_i = Σ_j w_ij x_j    (16)

where j is an index over the entire sequence, and the weights sum to one (1) over all j. The weight w_ij is derived from a dot-product function over x_i and x_j (Bloem, 2019; Vaswani et al., 2017):

w′_ij = x_i^T x_j    (17)

The dot product returns a value between negative and positive infinity, and a softmax maps these values to [0, 1] to ensure that they sum to one (1) over the entire sequence (Bloem, 2019; Vaswani et al., 2017):

w_ij = exp(w′_ij) / Σ_j exp(w′_ij)    (18)

Self-attention is not the only operation in transformers, but it is the essential one, since it propagates information between vectors. The other operations in the transformer are applied to every vector in the input sequence without interactions between vectors (Bloem, 2019; Vaswani et al., 2017).
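The following NumPy sketch mirrors this simplified formulation of Equations 16-18 (a single head, no learned projections and no scaling, unlike a full transformer layer).

    import numpy as np

    def self_attention(x: np.ndarray) -> np.ndarray:
        """Plain self-attention over a sequence x of shape (n, d), following Eqs. 16-18."""
        w_raw = x @ x.T                                   # Eq. 17: w'_ij = x_i^T x_j
        w_raw -= w_raw.max(axis=1, keepdims=True)         # for numerical stability
        w = np.exp(w_raw)
        w /= w.sum(axis=1, keepdims=True)                 # Eq. 18: row-wise softmax
        return w @ x                                      # Eq. 16: y_i = sum_j w_ij x_j

    x = np.random.randn(5, 8)        # five input vectors of dimension eight
    y = self_attention(x)            # five output vectors of the same shape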


4.4 data & descriptive statistics

The benchmark that we accumulated through the trial run includes 300 structured and partially labeled dialogues, with the average length of a dialogue being six (6) utterances. All the conversations were held in the German language. Detailed general statistics can be found in Table 14.

                                        Value
Max Len Dialogue (in utterances)        15
Avg Len Dialogue (in utterances)        6
Max Len Utterance (in tokens)           100
Avg Len Utterance (in tokens)           9
Overall Unique Action Labels            13
Overall Unique Entity Labels            7
Train – Dialogues (# Utterances)        200 (1161)
Eval – Dialogues (# Utterances)         50 (279)
Test – Dialogues (# Utterances)         50 (300)

Table 14: General statistics for the conversational dataset.

The dataset covers 14 unique next actions and seven (7) unique named entities. Detailed information on the next actions and entities can be found in Table 15 and Table 16. Below we list the possible actions with the corresponding explanation and predefined mapped statements.

unk – requests to define the intent of the question: mathematical, organizational, or contextual (i.e., fallback policy in case the automated intent recognition unit fails) → "Hast du eine Frage zu einer Aufgabe (MATH), zu einem Text im Kurs (TEXT) oder eine organisatorische Frage (ORG)?"

org – hands the discussion over to a human tutor in case of an organizational question → "Es scheint sich wohl um eine organisatorische Frage zu handeln. Bitte fasse deine Frage in einem kurzen Text zusammen, damit ich sie an meinen menschlichen Kollegen weiterleiten kann."

text – hands the discussion over to a human tutor in case of a contextual question → "Es scheint sich wohl um eine inhaltliche Frage zu handeln. Bitte fasse deine Frage in einem kurzen Text zusammen, damit ich sie an meinen menschlichen Kollegen weiterleiten kann."


topic – requests information on the topic → "Welches Kapitel bearbeitest du gerade? Antworte bitte mit der Kapitelnummer, z.B. IA, IB, II, oder mit dem Kapitelnamen, z.B. Geometrie."

examination – requests information about the examination mode → "Bearbeitest du eine Übungsaufgabe, eine Trainingsaufgabe, einen Quiz oder eine Schlussprüfung?"

subtopic – requests information about the subtopic in case of quiz or section-level training/exercise modes → "Wie heißt der Abschnitt, in dem sich die Aufgabe befindet, auf die sich deine Frage bezieht?"

level – requests information about the level of the examination mode, chapter or section → "Geht es um eine Aufgabe auf Kapitel-Ebene (z.B. um eine Trainingsaufgabe im Kapitel Geometrie) oder um eine Aufgabe auf der Ebene eines Abschnitts in einem Kapitel (z.B. um eine Aufgabe zum Abschnitt Winkel im Kapitel Geometrie)? Antworte bitte mit KAP für Kapitel und SEK für Abschnitt."

question number – requests information about the question number → "Welche Aufgabe bearbeitest du gerade? Wenn du z.B. bei Teilaufgabe c in der zweiten Trainingsaufgabe (oder der zweiten Übungsaufgabe) bist, antworte mit 2c. Bearbeitest du gerade einen Quiz oder eine Schlussprüfung, gib bitte nur die Teilaufgabe an."

final request – considers the Informational Dictionary (ID) to be complete and initiates the verification step → "Deine Angaben waren wie folgt: Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein."

verify request – requests the student to verify the assembled information → "Welche der folgenden Punkte habe ich falsch erkannt? Schreibe bitte, welche der folgenden Punkte es sind: [a, b, c, ...]3"

correct request – requests the student to provide the correct information if there is an error in the assembled data → "Gib bitte die korrekte Information für [a]4 ein."

exact question – requests the student to provide a detailed explanation of the problem → "Bitte fasse deine Frage in einem kurzen Text zusammen, damit ich sie an meinen menschlichen Kollegen weiterleiten kann."

human handover – finalizes the conversation → "Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir."

3 Variables that depend on the state of the Informational Dictionary (ID).
4 The variable that was selected as erroneous and needs to be corrected.


Action             Counts    Action             Counts
Final Request      321       Unk                80
Human Handover     300       Subtopic           55
Exact Question     286       Correct Request    40
Question Number    176       Verify Request     34
Examination        175       Org                17
Topic              137       Text               13
Level              130

Table 15: Detailed statistics on the possible system actions. The Counts columns denote the number of occurrences of each action in the entire dataset.

Entity         Counts
Question Nr    317
Chapter        311
Examination    303
Subtopic       198
Level          80
Intent         70

Table 16: Detailed statistics on the possible named entities. The Counts column denotes the number of occurrences of each entity in the entire dataset.

4.5 named entity recognition

We defined a sequence labeling task to extract custom entities from user input. We considered seven (7) viable entities (see Table 16) to be recognized by the model: topic, subtopic, examination mode and level, question number, intent, as well as the entity other for the remaining words in the utterance. Since the data collected from the rule-based system already includes information on the entities, we are able to train a domain-specific NER unit. However, since the original user input was informal, the same information could be provided in different writing styles. That means that a single entity could have different surface forms (e.g., synonyms, writing styles). Therefore, entities that we extracted from the rule-based system were transformed into a universal standard (e.g., official chapter names). To consider all of the variable entity forms while post-labeling the original dataset, we determined generic entity names (e.g., chapter, question nr) and mapped variations of entities from the user input (e.g., Chapter = [Elementary Calculus, Chapter I, ...]) to them. The overall dataset consists of 300 labeled dialogues, where 200 (with 1161 utterances) were employed for training and 100 for the evaluation and test sets (50 dialogues with ca. 300 utterances for each set, respectively).
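A minimal sketch of this surface-form canonicalization is shown below; the mapping entries and entity-type names are illustrative assumptions, not the actual tables used for post-labeling.

    # Illustrative mapping from observed surface forms to canonical entity values (assumed).
    CANONICAL = {
        "chapter": {"elementares rechnen": "Kapitel I", "kapitel 1": "Kapitel I", "kapitel i": "Kapitel I"},
        "exam_mode": {"uebungsaufgabe": "Exercise", "trainingsaufgabe": "Training",
                      "schlusspruefung": "Final Examination"},
    }

    def canonicalize(entity_type: str, surface_form: str) -> str:
        """Map a user-provided surface form to its generic entity value (fallback: keep the raw form)."""
        return CANONICAL.get(entity_type, {}).get(surface_form.lower().strip(), surface_form)

    canonicalize("chapter", "Elementares Rechnen")   # -> "Kapitel I"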


4.5.1 Model Settings

We conducted experiments with German and multilingual BERT implementations5. The capitalization of words is significant for the German language; therefore, we ran the experiments on capitalized data while preserving the original punctuation. We employed the available base model for both the multilingual and German BERT implementations in the cased version. We initiated the learning rate for both models to 1e−4, and the maximum length of the tokenized input was set to 128 tokens. We conducted the experiments multiple times with different seeds for a maximum of 50 epochs, with the training batch size set to 32. We utilized AdamW as the optimizer and applied early stopping if the performance did not improve significantly after 5 epochs.
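A minimal fine-tuning sketch for this setting, assuming the Hugging Face transformers library and the publicly available bert-base-german-cased checkpoint, is given below; the label names, the label-alignment details, and the absence of batching are simplifications, not the exact training loop used in the thesis.

    import torch
    from transformers import AutoTokenizer, AutoModelForTokenClassification

    LABELS = ["O", "TOPIC", "SUBTOPIC", "EXAM_MODE", "LEVEL", "QUESTION_NR", "INTENT"]  # assumed names

    tokenizer = AutoTokenizer.from_pretrained("bert-base-german-cased")
    model = AutoModelForTokenClassification.from_pretrained("bert-base-german-cased",
                                                            num_labels=len(LABELS))
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    def training_step(words, word_labels):
        """One gradient step on a single pre-tokenized utterance with word-level labels."""
        enc = tokenizer(words, is_split_into_words=True, truncation=True,
                        max_length=128, return_tensors="pt")
        # Align word-level labels to word pieces; special tokens and continuation pieces get -100.
        labels, previous = [], None
        for word_id in enc.word_ids():
            labels.append(-100 if word_id is None or word_id == previous
                          else LABELS.index(word_labels[word_id]))
            previous = word_id
        out = model(**enc, labels=torch.tensor([labels]))
        out.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        return out.loss.item()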

4.6 next action prediction

We defined a classification problem where we aim to predict the system's next action for the given user input. We assumed 14 custom actions (see Table 15) that we considered to be our classes. While collecting the toy dataset, every input was automatically labeled by the rule-based system with the corresponding next action and dialogue-id. Thus, no additional post-labeling was required in this step.

We investigated two settings for this task:

• Default Setting: utilizing a user input and the corresponding class (i.e., next action) without any supplementary context. By default, we conduct all of our experiments in this setting.

• Extended Setting: employing a user input, the corresponding next action, and the previous system action as a source of supplementary context. For this setting, we experimented with the best performing model from the default setting.

The overall dataset consists of 300 labeled dialogues, where 200 (with 1161 utterances) were employed for training and 100 for the evaluation and test sets (50 dialogues with ca. 300 utterances for each set, respectively).

4.6.1 Model Settings

For the NAP task, we carried out experiments with German and multilingual BERT implementations as well. We examined the performance of both capitalized and lower-cased inputs, as well as plain and preprocessed data. For the multilingual BERT, we used the base model in both cased and uncased modifications. For the German BERT, we utilized the base model in the cased modification only6.

5 https://github.com/huggingface/transformers


For both models, we initiated the learning rate to 4e−5, and the maximum length of the tokenized input was set to 128 tokens. We conducted the experiments multiple times with different seeds for a maximum of 300 epochs, with the training batch size set to 32. We utilized AdamW as the optimizer and applied early stopping if the performance did not improve significantly after 15 epochs.
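A minimal sketch of this classification setup is given below, again assuming the Hugging Face transformers library. The action label names are assumptions, and encoding the previous system action as a second text segment is only one plausible way of realizing the extended setting; the thesis does not prescribe this exact encoding.

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    ACTIONS = ["Unk", "Org", "Text", "Topic", "Subtopic", "Examination", "Level",
               "Question Number", "Final Request", "Verify Request", "Correct Request",
               "Exact Question", "Human Handover", "Other"]        # assumed 14-way label set

    tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
    model = AutoModelForSequenceClassification.from_pretrained("bert-base-multilingual-cased",
                                                               num_labels=len(ACTIONS))
    optimizer = torch.optim.AdamW(model.parameters(), lr=4e-5)

    def training_step(user_input: str, next_action: str, previous_action: str = None):
        """Default setting: user input only; extended setting: previous action as a second segment."""
        if previous_action is None:
            enc = tokenizer(user_input, truncation=True, max_length=128, return_tensors="pt")
        else:
            enc = tokenizer(user_input, previous_action, truncation=True,
                            max_length=128, return_tensors="pt")
        loss = model(**enc, labels=torch.tensor([ACTIONS.index(next_action)])).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
        return loss.item()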

4.7 evaluation and results

To evaluate the models, we computed the word-level macro F1 score for the Named Entity Recognition (NER) task and the utterance-level macro F1 score for the Next Action Prediction (NAP) task. The word-level F1 is estimated as the average of the F1 scores per class, each computed from all words in the evaluation and test sets. The outcomes for the NER task are represented in Table 17. For the utterance-level F1, a single class label (i.e., next action) is taken for the whole utterance. The results for the NAP task are shown in Table 18.

We additionally computed the average dialogue accuracy for the best performing NAP models. This score indicates how well the predicted next actions compose the conversational flow. The average dialogue accuracy was estimated for the 50 conversations in the evaluation and test sets, respectively. The results are displayed in Table 19.
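A small sketch of these two scores is shown below. The exact definition of the average dialogue accuracy is not spelled out in full; here it is assumed to be the per-dialogue share of correctly predicted next actions, averaged over all dialogues.

    from collections import defaultdict
    from sklearn.metrics import f1_score

    def utterance_macro_f1(gold_actions, predicted_actions):
        """Utterance-level macro F1 over next-action labels."""
        return f1_score(gold_actions, predicted_actions, average="macro")

    def average_dialogue_accuracy(dialogue_ids, gold_actions, predicted_actions):
        """Assumed definition: mean per-dialogue fraction of correctly predicted next actions."""
        per_dialogue = defaultdict(list)
        for dlg_id, gold, pred in zip(dialogue_ids, gold_actions, predicted_actions):
            per_dialogue[dlg_id].append(gold == pred)
        return sum(sum(hits) / len(hits) for hits in per_dialogue.values()) / len(per_dialogue)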

The obtained results for the NER task revealed that German BERT performed significantly better than the multilingual BERT model. The performance of the custom NER unit is at 0.93 macro F1 points for all possible named entities (see Table 17). In contrast, for the NAP task, the multilingual BERT model obtained better performance than the German BERT model. The best performing method in the default setting gained a macro F1 of 0.677 points for 14 possible classes, whereas the model in the extended setting performed better – its best macro F1 score is 0.752 for the same number of classes (see Table 18).

Regarding the dialogue accuracy, the extended system fine-tuned with multilingual BERT obtained better outcomes than the default one (see Table 19). The overall conclusion for the NAP is that the capitalized setting increased the performance of the model, whereas the inclusion of punctuation did not influence the results.

6 An uncased pre-trained modification of the model was not available.


Task    Model    Cased    Punct    F1 (Eval)    F1 (Test)
NER     GER      ✓        ✓        0.971        0.930
NER     Mult     ✓        ✓        0.926        0.905

Table 17: Word-level F1 for the Named Entity Recognition (NER) task. In bold: best performance for the evaluation and test sets.

Task    Model    Cased    Punct    Extended    F1 (Eval)    F1 (Test)
NAP     GER      ✓        ✗        ✗           0.711        0.673
NAP     GER      ✓        ✓        ✗           0.701        0.606
NAP     Mult     ✓        ✓        ✗           0.688        0.625
NAP     Mult     ✓        ✗        ✗           0.769        0.677
NAP     Mult     ✓        ✗        ✓           0.810        0.752
NAP     Mult     ✗        ✓        ✗           0.664        0.596
NAP     Mult     ✗        ✗        ✗           0.742        0.502

Table 18: Utterance-level F1 for the Next Action Prediction (NAP) task. Underlined: best performance for the evaluation and test sets in the default setting (without previous-action context). In bold: best performance for the evaluation and test sets in the extended setting (with previous-action context).

Model           Accuracy (Eval)    Accuracy (Test)
NAP default     0.765              0.724
NAP extended    0.813              0.801

Table 19: Average dialogue accuracy computed for the Next Action Prediction (NAP) task for the best performing models. In bold: best performance for the evaluation and test sets.


Figure 5: Macro F1 evaluation, NAP – default model. Comparison between German and multilingual BERT.


Figure 6: Macro F1 test, NAP – default model. Comparison between German and multilingual BERT.


Figure 7: Macro F1 evaluation, NAP – extended model.


Figure 8: Macro F1 test, NAP – extended model.


Figure 9: Macro F1 evaluation, NER – Comparison between German and multilingual BERT.


Figure 10: Macro F1 test, NER – Comparison between German and multilingual BERT.


4.8 error analysis

We then investigated the cases where the models failed to predict the correct action or labeled the named entity span erroneously. Below we outline the most common errors.

next action prediction One of the prevalent flaws in the default model was the mismatch between two consecutive actions – the actions question number and subtopic. That is due to the position of these actions in the conversational flow. The occurrence of both actions in the dialogue is not strict and largely depends on the previous action. To explain this, we illustrate the following possible conversational scenarios:

bull The system can either directly proceed with the request forthe question number if the examination mode is the finalexamination

bull If the examination mode is training or exercise the systemhas to ask about the level first (ie chapter or section)

bull If it is a chapter-level the system can again directly pro-ceed with the request for the question number

bull In the case of section-level the system has to ask aboutthe subsection first (if this information was not providedbefore)

The examination of the extended model showed that the inclusion of supplementary context (i.e., the previous action) improved the performance of the system by about 6%.

named entity recognition The cases where the system failed cover mismatches between the labels chapter and other, and between the labels question number and other. This failure arose due to the imperfectly labeled span of a multi-word named entity (e.g., "Elementary Calculus Proportionality Percentage"). The first or last words in the named entity were excluded from the span and erroneously labeled with the other label.

4.9 summary

In this chapter, we re-implemented two of the three fundamental units of our IPA conversational agent – Named Entity Recognition (NER) and Dialogue Manager (DM) – using the methods of Transfer Learning (TL), specifically the Bidirectional Encoder Representations from Transformers (BERT) model.

We believe that the obtained results are reasonably high considering the comparatively small amount of training data we employed to fine-tune the models. Therefore, we conclude that both NAP and NER segments could be used to replace or extend the existing rule-based modules.


Rule-based components as well as ElasticSearch (ES) are limited in their capacities and not flexible to novel patterns, whereas the trainable units generalize better and could possibly decrease the amount of inaccurate predictions in case of accidental dialogue behavior. Furthermore, to increase the overall robustness, rule-based and trainable components could be used synchronously: when one system fails (e.g., the ElasticSearch (ES) module cannot extract an entity), the conversation continues on the prediction obtained from the other model.
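A minimal sketch of such a synchronous fallback is given below. The extractor names are hypothetical placeholders for the rule-based/ES module and the trainable NER unit; the original system's interfaces may differ.

    def extract_entities(utterance, es_extract_entities, bert_extract_entities):
        """Combine a rule-based/ES extractor with a trainable NER unit.

        Both arguments are hypothetical callables returning a list of
        (span, label) tuples. If the primary extractor finds nothing,
        the conversation continues on the prediction of the other model.
        """
        entities = es_extract_entities(utterance)
        if not entities:                      # rule-based module failed
            entities = bert_extract_entities(utterance)
        return entities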

4.10 future work

There are several possible directions for future work:

• At the time of writing of this thesis, both units were trained and investigated on their own – that is, without being incorporated into the initial IPA conversational system. Thus, one of the potential directions of future work would be the integration of these units into the original system and the examination of possible failure cases. Furthermore, the resulting hybrid system could then impose additional meta policies to prevent unexpected system behavior.

• Further investigation could be directed towards multi-language support for the re-implemented units. Since the Online Mathematik Brückenkurs (OMB+) platform also supports English and Chinese, it would be interesting to examine whether a simple translation from the target language (i.e., English, Chinese) to the source language (i.e., German) would be sufficient to employ the already-assembled dataset and pre-trained units. Presumably, it should be achievable in the case of the Next Action Prediction (NAP) unit, but it could lead to a noticeable performance drop for the Named Entity Recognition (NER) task, since there could be a considerable gap between translated and original entity spans.

• Last but not least, one of the further extensions for the IPA system would be its eventual applicability to more cognitively demanding tasks. For instance, it is of interest to extend the model to more complex domains like the handling of mathematical questions. Would it be possible to automate the human assistance at least for less complicated mathematical inquiries, or is it still unachievable with currently existing NLP techniques and machine learning methods?


Part II

CLUSTERING-BASED TREND ANALYZER


5 FOUNDATIONS

As discussed in Chapter 1, predictive analytics is one of the potential Intelligent Process Automation (IPA) tasks due to its capability to forecast future events by using current and historical data and thereby assist knowledge workers with, for instance, sales or investments. Modern machine learning algorithms allow an intelligent and fast analysis of large amounts of data. However, the evaluation of such algorithms and models remains one of the grave problems in this domain. This is mostly due to the lack of publicly available benchmarks for the evaluation of topic modeling and trend detection algorithms. Hence, in this part of the thesis, we are addressing this challenge and assembling a benchmark for detecting both trends and downtrends (i.e., topics that become less frequent over time). To the best of our knowledge, the task of downtrend detection has not been well addressed before.

In this chapter, we introduce the concepts required by the second part of the thesis. We begin with an introduction to the topic modeling task and define the notion of an emerging trend in Section 5.1. Next, we examine the related work, existing approaches to topic modeling and trend detection, and their limitations in Section 5.2. In Chapter 6, we introduce our TRENDNERT benchmark for trend and downtrend detection and describe the method we employed for its assembly.

5.1 motivation and challenges

Science changes and evolves rapidly: novel research areas emerge while others fade away. Keeping pace with these changes is challenging. Therefore, recognizing and forecasting emerging research trends is of significant importance for researchers and academic publishers, as well as for funding agencies, stakeholders, and innovation companies.

Previously, the task of trend detection was mostly solved by domain experts who used specialized and cumbersome tools to investigate the data and get useful insights from it (e.g., through data visualization or statistics). However, manual analysis of large amounts of data can be time-consuming, and hiring domain experts is very costly. Furthermore, the overall increase of research data (i.e., Google Scholar, PubMed, DBLP) in the past decade makes the approach based on human domain experts less and less scalable. In contrast, automated approaches become a more desirable solution for the task of emerging trend identification (Kontostathis et al., 2004; Salatino, 2015).


Due to the large number of electronic documents appearing on the web daily, several machine learning techniques were developed to find patterns of word occurrences in those documents using hierarchical probabilistic models. These approaches are called topic models, and they allow the analysis of extensive textual collections and the extraction of valuable insights from them. The main advantage of topic models is their ability to discover hidden patterns of word use and to connect documents containing similar patterns. In other words, the idea of a topic model is that documents are a mixture of topics, where a topic is defined as a probability distribution over words (Alghamdi and Alfalqi, 2015).

Topics have the property of being able to evolve; thus, modeling topics without considering time may confound precise topic identification. Modeling topics while considering time is called topic evolution modeling, and it can reveal important hidden information in the document collection, allowing the recognition of topics as they appear over time. This approach also provides the ability to check the evolution of topics during a given time window and to examine whether a given topic is gaining interest and popularity over time (i.e., becoming an emerging trend) or whether its development is stable. Considering the latter approach, we turn to the task of emerging trend identification.

How is an emerging trend defined? An emerging trend is a topic that is gaining interest and utility over time, and it has two attributes: novelty and growth (Small, Boyack, and Klavans, 2014; Tu and Seng, 2012). Rotolo, Hicks, and Martin (2015) explain the correlation of these two attributes in the following way. Up to the point of emergence, a research topic is characterized by a high level of novelty. At that time, it does not attract any considerable attention from the scientific community, and due to the limited impact, its growth is nearly flat. After some turning points (i.e., the appearance of significant scientific publications), the research topic starts to grow faster and may become a trend if the growth is fast; however, the level of novelty decreases continuously once the emergence becomes clear. Then, at the final phase, the topic becomes well-established while its novelty levels out (He and Chen, 2018).

To detect trends in textual data, one employs computational analysis techniques and Natural Language Processing (NLP). In the subsequent section, we give an overview of the existing approaches to topic modeling and trend detection.

5.2 detection of emerging research trends

The task of trend detection is closely associated with the topic modeling task, since in the first instance it detects the latent topics in a substantial collection of data. It then employs techniques to investigate the evolution of a topic and whether it has the potential to become a trend.


Topic modeling has an extensive history of applications in the scientific domain, including but not limited to studies of temporal trends (Griffiths and Steyvers, 2004; Wang and McCallum, 2006) and investigations of impact prediction (Yogatama et al., 2011). Below, we give a general overview of the most significant methods used in this domain, explain their importance, and, where applicable, their potential limitations.

5.2.1 Methods for Topic Modeling

In order to extract topics from textual documents (i.e., research papers, blog posts), the precise definition of the model for topic representation is crucial.

latent semantic analysis, formerly called Latent Semantic Indexing, is one of the fundamental methods in the topic modeling domain, proposed by Deerwester et al. in 1990. Its primary goal is to generate vector-based representations for documents and terms and then employ these representations to compute the similarity between documents (resp. terms) in order to select the most related ones. In this regard, Latent Semantic Analysis (LSA) takes a matrix of documents and terms and decomposes it into a separate document-topic matrix and a topic-term matrix. Before turning to the central task of finding latent topics that capture the relationships among the words and documents, it also performs dimensionality reduction utilizing truncated Singular Value Decomposition. This technique is usually applied to reduce the sparseness, noisiness, and redundancy across dimensions in the document-topic matrix. Finally, using the resulting document and term vectors, one applies measures such as cosine similarity to evaluate the similarity of different documents, words, or queries.
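As an illustration of this pipeline (not part of the original work), a minimal sketch with scikit-learn, assuming a small list of raw document strings and an illustrative number of latent topics:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import TruncatedSVD
    from sklearn.metrics.pairwise import cosine_similarity

    docs = ["gene expression microarray analysis",
            "genome sequencing and gene expression",
            "fuzzy sets and fuzzy logic"]

    # Document-term matrix, then truncated SVD to obtain low-dimensional
    # document-topic representations (the core of LSA).
    tfidf = TfidfVectorizer(stop_words="english")
    doc_term = tfidf.fit_transform(docs)
    lsa = TruncatedSVD(n_components=2)          # number of latent topics is illustrative
    doc_topic = lsa.fit_transform(doc_term)

    # Cosine similarity between documents in the latent topic space.
    similarities = cosine_similarity(doc_topic)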

One of the extensions to traditional LSA is probabilistic Latent Semantic Analysis (pLSA), which was introduced by Hofmann in 1999 to fix several shortcomings. The foremost advantage of the follow-up method was that it could successfully automate document indexing, which in turn improved on LSA in a probabilistic sense by using a generative model (Alghamdi and Alfalqi, 2015). Furthermore, pLSA can differentiate between diverse contexts of word usage without utilizing dictionaries or a thesaurus. This results in better polysemy disambiguation and discloses typical similarities by grouping words that share a similar context. Among the works that employ pLSA is the approach proposed by Gohr et al. (2009), where the authors apply pLSA in a window that slides across the stream of documents to analyze the evolution of topics. Also, Mei et al. (2008) use pLSA in order to create a network of topics.


latent dirichlet allocation was developed by Blei, Ng, and Jordan (2003) to overcome the limitations of both the LSA and pLSA models. Nowadays, Latent Dirichlet Allocation (LDA) is considered to be one of the most utilized and robust probabilistic topic models. The fundamental concept of LDA is the assumption that every document is represented as a mixture of topics, where each topic is a discrete probability distribution that defines the probability of each word to appear in a given topic. These topic probabilities provide a compressed representation of a document. In other words, LDA models each of the d documents in the collection as a mixture over n latent topics, where each of the topics describes a multinomial distribution over the w words in the vocabulary (Alghamdi and Alfalqi, 2015).

LDA fixes such weaknesses of pLSA as the incapacity to assign a probability to previously unseen documents (because pLSA learns the topic mixtures only for documents seen in the training phase) and the overfitting problem (i.e., the number of parameters in pLSA increases linearly with the size of the corpus) (Salatino, 2015).

An extension of the conventional LDA is the hierarchical LDA (Griffiths et al., 2004), where topics are grouped in hierarchies. Each node in the hierarchy is associated with a topic, and a topic is a distribution across words. Another comparable approach is the Relational Topic Model (Chang and Blei, 2010), which is a mixture of a topic model and a network model for groups of linked documents. The attribute of every document is its text (i.e., discrete observations taken from a fixed vocabulary), and the links between documents are hyperlinks or citations.

LDA is a prevalent method for discovering topics (Blei and Lafferty, 2006; Bolelli, Ertekin, and Giles, 2009; Bolelli et al., 2009; Hall, Jurafsky, and Manning, 2008; Wang and Blei, 2011; Wang and McCallum, 2006). However, it has been shown that training and fine-tuning a stable LDA model can be very challenging and time-consuming (Agrawal, Fu, and Menzies, 2018).
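For concreteness, a minimal LDA sketch with Gensim, assuming tokenized toy documents; the number of topics and passes are illustrative parameters, not settings used in this thesis:

    from gensim import corpora
    from gensim.models import LdaModel

    # Tokenized documents (illustrative toy data).
    texts = [["topic", "model", "document"],
             ["word", "distribution", "topic"],
             ["document", "word", "topic", "mixture"]]

    dictionary = corpora.Dictionary(texts)
    corpus = [dictionary.doc2bow(text) for text in texts]

    # Each document is modeled as a mixture of num_topics latent topics,
    # each topic being a distribution over the vocabulary.
    lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2, passes=10)
    print(lda.print_topics())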

correlated topic model One of the main drawbacks of LDA is its inability to capture dependencies among the topics (Alghamdi and Alfalqi, 2015; He et al., 2017). Therefore, Blei and Lafferty (2007) proposed the Correlated Topic Model as an extension of the LDA approach. In this model, the authors replaced the Dirichlet distribution with a logistic-normal distribution that models pairwise topic correlations with the Gaussian covariance matrix (He et al., 2017).

However, one of the shortcomings of this model is the computational cost. The number of parameters in the covariance matrix increases quadratically with the number of topics, and parameter estimation for the full-rank matrix can be inaccurate in high-dimensional space (He et al., 2017).


Therefore, He et al. (2017) proposed their method to overcome this problem. The authors introduced a model which learns dense topic embeddings and captures topic correlations through the similarity between the topic vectors. According to He et al., their method allows efficient inference in the low-dimensional embedding space and significantly reduces the previous cubic or quadratic time complexity to linear with respect to the number of topics.

5.2.2 Topic Evolution of Scientific Publications

Although the topic modeling task is essential, it has a significant offspring task, namely the modeling of topic evolution. Methods able to identify topics within the context of time are highly applicable in the scientific domain. They allow an understanding of topic lineage and reveal how research on one topic influences another. Generally, these methods can be classified according to the primary source of information they employ: text, citations, or key phrases.

text-based Extensive textual collections have dynamic co-occurrences of word patterns that change over time. Thus, models that track topic evolution over time should consider both word co-occurrence patterns and the timestamps (Blei and Lafferty, 2006).

In 2006, Wang and McCallum proposed a non-Markov continuous-time model for the detection of topical trends in a collection of NeurIPS publications (an overall time span of 17 years was considered). The authors made the following assumption: if a word co-occurrence pattern exists for a short time, the system creates a narrow-time-distribution topic, whereas for long-term patterns the system generates a broad-time-distribution topic. The principal characteristic of this approach is that it models topic evolution without discretizing time, i.e., the state at time t + t1 is independent of the state at time t (Wang and McCallum, 2006).

Another work, by Blei and Lafferty (2006), proposed Dynamic Topic Models. The authors assumed that the collection of documents is organized based on certain time spans and that the documents belonging to every time span are represented with a K-component model. Thereby, the topics associated with time span t emerge from the topics corresponding to time span t − 1. One of the main differences of this model compared to other existing approaches is that it utilizes a Gaussian distribution for the topic parameters instead of the conventional Dirichlet distribution, and therefore it can capture the evolution over various time spans.


citation-based analysis is the second popular direction that is considered to be effective for trend identification (He et al., 2009; Le, Ho, and Nakamori, 2005; Shibata, Kajikawa, and Takeda, 2009; Shibata et al., 2008; Small, 2006). This method assumes that citations in their various forms (i.e., bibliographic citations, co-citation networks, citation graphs) indicate a meaningful relationship between topics and uses citations to model topic evolution in the scientific domain. Nie and Sun (2017) utilize this approach along with Latent Dirichlet Allocation (LDA) and k-means clustering to identify research trends. The authors first use LDA to extract features and determine the optimal number of topics. Then they use k-means to obtain thematic clusters, and finally they compute citation functions to identify the changes of clusters over time.

However, the citation-based approach's main drawback is that a consistent collection of publications associated with a specific research topic is required for these techniques to detect the cluster and novelty of the particular topic (He and Chen, 2018). Furthermore, there are also many trend detection scenarios (e.g., research news articles or research blog posts) in which citations are not readily available. That makes this approach not applicable to this kind of data.

keyphrase-based Another standard solution is based on the use of keyphrase information extracted from research papers. In this case, every keyword is representative of a single research topic. Systems like Saffron (Monaghan et al., 2010), as well as Saffron-based models, use this approach. Saffron is a research toolkit that provides algorithms for keyphrase extraction, entity linking, and taxonomy extraction. Asooja et al. (2016) utilize Saffron to extract the keywords from LREC proceedings and then propose trend forecasting based on regression models as an extension to Saffron.

However, this method raises many questions, including the following: how to deal with the noisiness of keywords (i.e., irrelevant keywords) and how to handle keywords that have their own hierarchies (i.e., sub-areas). Other drawbacks of this approach include the ambiguity of keywords (e.g., "Java" as an island and "Java" as a programming language) or the necessity to deal with synonyms, which could be treated as different topics.

5.3 summary

An emerging trend detection algorithm aims to recognize topics that were earlier inconspicuous but are now gaining importance and interest within a specific domain. Knowledge of emerging trends is especially essential for stakeholders, researchers, academic publishers, and funding organizations. Assume a business analyst in a biotech company who is interested in the analysis of technical articles for recent trends that could potentially impact the company's investments.


A manual review of all the available information in a specific domain would be extremely time-consuming or even not feasible. However, this process could be automated through Intelligent Process Automation (IPA) algorithms. Such software would assist human experts who are tasked with identifying emerging trends through automatic analysis of extensive textual collections and visualization of occurrences and tendencies of a given topic.


6 BENCHMARK FOR SCIENTIFIC TREND DETECTION

This chapter covers work already published at international peer-reviewed conferences. The relevant publication is Moiseeva and Schütze, 2020. The research described in this chapter was carried out in its entirety by the author of the thesis. The other author of the publication acted as an advisor.

Computational analysis and modeling of the evolution of trends is a significant area of research because of its socio-economic impact. However, no large publicly available benchmark for trend detection currently exists, making a comparative evaluation of methods impossible. We remedy this situation by publishing the benchmark TRENDNERT, consisting of a set of gold trends (resp. downtrends) and document labels that is available as an unlimited download, and a large underlying document collection that can also be obtained for free. We propose Mean Average Precision (MAP) as an evaluation measure for trend detection and apply this measure in an investigation of several baselines.

6.1 motivation

What is an emerging research topic, and how is it defined in the literature? At the time of emergence, a research topic often does not attract much recognition from the scientific community and is exemplified by just a few publications. He and Chen (2018) associate the appearance of a trend with some triggering event, e.g., the publication of an article in a high-impact journal. Later, the research topic starts to grow faster and becomes a trend (He and Chen, 2018).

We adopt this as our key representation of a trend in this paper: a trend is a research topic with a strongly increasing number of publications in a particular time interval. In contrast to trends, there are also downtrends, which refer to topics that decline in demand or growth as they evolve. Therefore, we define a downtrend as the converse of a trend: a downtrend is a research topic that has a strongly decreasing number of publications in a particular time interval. Downtrends can be as important to detect as trends, e.g., for a funding agency planning its budget.

To detect both types of topics (i.e., trends and downtrends) in textual data, one employs computational analysis techniques and NLP. These methods enable the analysis of extensive textual collections and yield valuable insights into topical trendiness and evolution over time.


Despite the overall importance of trend analysis, there is, to our knowledge, no large publicly available benchmark for trend detection. This makes a comparative evaluation of methods and algorithms impossible. Most of the previous work has used proprietary or moderately sized datasets, or evaluation measures that were not directly bound to the objectives of trend detection (Gollapalli and Li, 2015).

We remedy this situation by publishing the benchmark TRENDNERT1. The benchmark consists of the following:

1. A set of gold trends compiled based on an extensive analysis of the meta-literature.

2. A set of labeled documents (based on more than one million scientific publications), where each document is assigned to a trend, a downtrend, or the class of flat topics, and supported by the topic title.

3. An evaluation script to compute Mean Average Precision (MAP) as the measure for the accuracy of trend detection.

We make the benchmark available as an unlimited download2. The underlying research corpus can be obtained for free from Semantic Scholar3 (Ammar et al., 2018).

6.1.1 Outline and Contributions

In this work, we address the challenge of benchmark creation for (down)trend detection. This chapter is structured as follows. Section 6.2 introduces our model for corpus creation and its fundamental components. In Section 6.3, we present the crowdsourcing procedure for the benchmark labeling. Section 6.5 outlines the proposed baseline for trend detection, our experimental setup, and outcomes. Finally, we give the details on the distributed benchmark in Section 6.6 and conclude in Section 6.7.

Our contributions within this work are as follows:

• TRENDNERT is the first publicly available benchmark for (down)trend detection.

• TRENDNERT is based on a collection of more than one million documents. It is among the largest that have been used for trend detection and therefore offers a realistic setting for developing trend detection algorithms.

• TRENDNERT addresses the task of detecting both trends and downtrends. To the best of our knowledge, the task of downtrend detection has not been addressed before.

1 The name is a concatenation of trend and its ananym dnert, because it supports the evaluation of both trends and downtrends.

2 https://doi.org/10.17605/OSF.IO/DHZCT
3 https://api.semanticscholar.org/corpus


6.2 corpus creation

In this section, we explain the methodology we used to create the TRENDNERT benchmark.

6.2.1 Underlying Data

We employ a corpus provided by Semantic Scholar4. It includes more than one million papers, published mostly between 2000 and 2015 in about 5,000 computer science journals and conference proceedings. Every record in the collection consists of a title, key phrases, abstract, full text, and metadata.

6.2.2 Stratification

The distribution of documents over time in the entire original dataset is skewed (see Figure 11). We discovered that clustering the entire collection, as well as a random sample, produces corrupt results because more weight is given to later years than to earlier years. Therefore, we first generated a stratified sample of the original document collection. To this end, we randomly pick 10,000 documents for each year between 2000 and 2016, for an overall sample size of 160,000.
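A minimal sketch of such stratified sampling is given below. The record field name ('year') is an assumption, and the year range is interpreted as 16 years of 10,000 documents each, consistent with the stated sample size of 160,000.

    import random

    def stratified_sample(documents, years=range(2000, 2016), per_year=10_000, seed=0):
        """Draw the same number of documents per year to counter the temporal skew.

        `documents` is assumed to be a list of records with a 'year' field.
        """
        rng = random.Random(seed)
        sample = []
        for year in years:
            pool = [d for d in documents if d["year"] == year]
            sample.extend(rng.sample(pool, min(per_year, len(pool))))
        return sample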

Figure 11: Overall distribution of papers in the entire dataset. Years 1975 to 2016.

4 https://www.semanticscholar.org


6.2.3 Document Representations & Clustering

As we mentioned, this work's primary focus was the creation of a benchmark, not the development of new algorithms or models for trend detection. Therefore, for our experiments we selected an algorithm based on k-means clustering as a simple baseline method for trend detection.

In conventional document clustering, documents are usually represented as bag-of-words (BOW) feature vectors (Manning, Raghavan, and Schütze, 2008). Those, however, have a major weakness: they neglect the semantics of words (Le and Mikolov, 2014). Still, recent work in the representation learning domain, and particularly doc2vec – the method proposed by Le and Mikolov (2014) – can provide representations for document clustering that overcome this weakness.

The doc2vec5 algorithm is inspired by techniques for learning word vectors (Mikolov et al., 2013) and is capable of capturing semantic regularities in textual collections. It is an unsupervised approach for learning continuous distributed vector representations for text (i.e., paragraphs, documents). The approach maps documents into a vector space such that semantically related documents are assigned similar vector representations (e.g., an article about "genomics" is closer to an article about "gene expression" than to an article about "fuzzy sets"). Formally, each paragraph is mapped to an individual vector represented by a column in matrix D, and every word is also mapped to an individual vector represented by a column in matrix W. The paragraph vector and word vectors are concatenated to predict the next word in a context (Le and Mikolov, 2014). Other work has already successfully employed this type of document representation for topic modeling, combining it with both Latent Dirichlet Allocation (LDA) and clustering approaches (Curiskis et al., 2019; Dieng, Ruiz, and Blei, 2019; Moody, 2016; Xie and Xing, 2013).

In our work, we run doc2vec on the stratified sample of 160,000 papers and represent each document as a length-normalized vector (i.e., a document embedding). These vectors are then clustered into k = 1000 clusters using the scikit-learn6 implementation of k-means (MacQueen, 1967) with default parameters. The combination of document representations with a clustering algorithm is conceptually simple and interpretable. Also, the comprehensive comparative evaluation of topic modeling methods utilizing document embeddings performed by Curiskis et al. (2019) showed that doc2vec feature representations with k-means clustering outperform several other methods7 on three evaluation measures (Normalized Mutual Information, Adjusted Mutual Information, and Adjusted Rand Index).

5 We use the implementation provided by Gensim: https://radimrehurek.com/gensim/models/doc2vec.html
6 https://scikit-learn.org/stable/
7 Hierarchical clustering, k-medoids, NMF, and LDA.
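A minimal sketch of the doc2vec-plus-k-means pipeline is shown below, assuming Gensim 4.x and a toy stand-in for the tokenized abstracts; the doc2vec hyperparameters are illustrative and not the settings used in the experiments above.

    from gensim.models.doc2vec import Doc2Vec, TaggedDocument
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import normalize

    # Toy stand-in for the stratified sample of 160,000 tokenized abstracts.
    abstracts = [["cloud", "computing", "services"],
                 ["virtual", "machines", "in", "the", "cloud"],
                 ["petri", "nets", "and", "formal", "verification"],
                 ["compiler", "optimization", "techniques"]]

    tagged = [TaggedDocument(words=tokens, tags=[i]) for i, tokens in enumerate(abstracts)]

    # Learn one embedding per document; hyperparameters here are illustrative only.
    model = Doc2Vec(tagged, vector_size=50, min_count=1, epochs=20)
    vectors = normalize([model.dv[i] for i in range(len(abstracts))])  # length-normalized

    # The thesis clusters the real sample into k = 1000 clusters with default k-means.
    k = min(1000, len(abstracts))
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(vectors)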


We run 10 trials of stratification and clustering, resulting in 10 different clusterings. We do this to protect against the variability of clustering and because we do not want to rely on a single clustering for proposing (down)trends for the benchmark.

6.2.4 Trend & Downtrend Estimation

Recall our definitions of trend and downtrend: a trend (resp. downtrend) is defined as a research topic that has a strongly increasing (resp. decreasing) number of publications in a particular time interval.

Linear trend estimation is a widely used technique in predictive (time-series) analysis that proves statements about tendencies in the data by correlating the measurements with the times at which they occurred (Hess, Iyer, and Malm, 2001). Considering the definition of trend (resp. downtrend) and the main concept of linear trend estimation models, we utilize linear regression to identify trend and downtrend candidates in the resulting clustering sets.

Specifically, we count the number of documents per year for a cluster and then estimate the parameters of the best fitting line (i.e., its slope). Clusters are then ranked according to the resulting slope, and the n = 150 clusters with the largest (resp. smallest) slope are selected as trend (resp. downtrend) candidates. Thus, our definition of a single-clustering (down)trend candidate is a cluster with an extreme ascending or descending slope. Figure 12 and Figure 13 give two examples of (down)trend candidates.
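The per-cluster slope estimation can be sketched as follows; the function arguments and the use of numpy's polyfit are assumptions made for illustration, not the released implementation.

    import numpy as np

    def cluster_slopes(cluster_labels, years, n_clusters, year_range=range(2000, 2016)):
        """Estimate a per-cluster publication trend as the slope of a fitted line.

        cluster_labels[i] and years[i] describe document i.
        """
        slopes = np.zeros(n_clusters)
        for c in range(n_clusters):
            counts = [sum(1 for l, y in zip(cluster_labels, years) if l == c and y == yr)
                      for yr in year_range]
            # Slope of the best-fitting line over the yearly document counts.
            slopes[c] = np.polyfit(list(year_range), counts, deg=1)[0]
        return slopes

    # Ranking: the 150 clusters with the largest slopes are trend candidates,
    # the 150 with the smallest slopes are downtrend candidates, e.g.:
    # trend_candidates = np.argsort(slopes)[-150:]
    # downtrend_candidates = np.argsort(slopes)[:150]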

6.2.5 (Down)trend Candidates Validation

As detailed in Section 6.2.3, we run ten series of clustering, resulting in ten discrete clustering sets. We then estimated the (down)trend candidates according to the procedure described in Section 6.2.4. Consequently, for each of the ten clustering sets we obtained 150 trend and 150 downtrend candidates.

Nonetheless, for the benchmark, among all the (down)trend candidates across the ten clustering sets, we want to keep only those that are consistent over all sets. To obtain consistent (down)trend candidates, we apply the Jaccard coefficient, a metric that is used for measuring the similarity and diversity of sample sets. We define an equivalence relation ∼R: two candidates (i.e., clusters) are equivalent if their Jaccard coefficient is > τsim, where τsim = 0.5. Following this notion, we compute the equivalence of (down)trend candidates across the ten sets.

Finally, we define an equivalence class as an equivalence class trend candidate (resp. equivalence class downtrend candidate) if it contains trend (resp. downtrend) candidates in at least half of the clusterings. This


Figure 12: A trend candidate (positive slope). Topic: Sentiment and Emotion Analysis (using Social Media Channels).

Figure 13: A downtrend candidate (negative slope). Topic: XML.


procedure gave us in total 110 equivalence class trend candidates and 107 equivalence class downtrend candidates that we then annotate for our benchmark8.
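The set-overlap test used above is simple to express in code; a minimal sketch, assuming clusters are given as sets of document IDs:

    def jaccard(cluster_a, cluster_b):
        """Jaccard coefficient between two clusters given as collections of document IDs."""
        a, b = set(cluster_a), set(cluster_b)
        return len(a & b) / len(a | b)

    def equivalent(cluster_a, cluster_b, tau_sim=0.5):
        # Two (down)trend candidates from different clusterings are treated as
        # the same candidate if their Jaccard coefficient exceeds tau_sim.
        return jaccard(cluster_a, cluster_b) > tau_sim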

6.3 benchmark annotation

To annotate the equivalence class (down)trend candidates obtained from our model, we used the Figure-Eight9 platform. It provides an integrated quality-control mechanism and a training phase before the actual annotation (Kolhatkar, Zinsmeister, and Hirst, 2013), which minimizes the rate of poorly qualified annotators.

We design our annotation task as follows. A worker is shown a description of a particular (down)trend candidate along with the list of possible (down)trend tags. Descriptions are mixed and presented in random order. The temporal presentation order of candidates is also arbitrary. The worker is asked to pick from the list of (down)trend tags the item that matches the cluster (resp. candidate) best. Even if there are several items that are a possible match, the worker must choose only one. If there is no matching item, the worker can select the option other. Three different workers evaluated each cluster, and the final label is the majority label. If there is no majority label, we present the cluster repeatedly to the workers until there is a majority.

The descriptors of candidates are collected through the following procedure:

• First, we review the documents assigned to a candidate cluster.

• Then, we extract keywords and titles of the papers attributed to the cluster. Semantic Scholar provides keywords and titles as part of the metadata.

• Finally, we compute the top 15 most frequent keywords and randomly select 15 titles. These keywords and titles are the descriptor of the cluster candidate presented to crowd-workers.

We create the (down)trend tags based on keywords, titles, and metadata such as the publication venue. The cluster candidates were manually labeled by graduate employees (i.e., domain experts in the computer science domain) considering the abovementioned items.

6.3.1 Inter-annotator Agreement

To ensure the quality of the obtained annotations, we computed the Inter-Annotator Agreement (IAA) – a measure of how well two (or more) annotators can make the same decision for a particular category. To do that, we used the following metrics.

8 Note that the overall number of 217 refers to the number of (down)trend candidates and not to the number of documents. Each candidate contains hundreds of documents.

9 Formerly called CrowdFlower.


krippendorff's α is a reliability coefficient developed to measure the agreement among observers or measuring instruments that draw distinctions among typically unstructured phenomena or assign computable values to them (Krippendorff, 2011). We choose Krippendorff's α as an inter-annotator agreement measure because it – unlike other specialized coefficients – is a generalization of several reliability criteria and applies to situations like ours, where we have more than two annotators and a given annotator only labels a subset of the data (i.e., we have to deal with missing values).

Krippendorff (2011) has proposed several metric variations, each of which is associated with a specific set of weights. We use Krippendorff's α considering any number of observers and missing data. In this case, Krippendorff's α for the overall number of 217 annotated documents is α = 0.798. According to Landis and Koch (1977), this value corresponds to a substantial level of agreement.
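For illustration only, such a computation can be sketched with the third-party `krippendorff` Python package (an assumption; the thesis does not state which implementation was used). Labels are coded as integers and missing annotations as NaN.

    import numpy as np
    import krippendorff  # third-party package: pip install krippendorff

    # Rows = annotators, columns = annotated (down)trend candidates (toy data).
    reliability_data = np.array([
        [0,      1, 2, np.nan, 1],
        [0,      1, 2, 2,      np.nan],
        [np.nan, 1, 0, 2,      1],
    ])

    alpha = krippendorff.alpha(reliability_data=reliability_data,
                               level_of_measurement="nominal")
    print(f"Krippendorff's alpha: {alpha:.3f}")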

figure-eight agreement rate Figure-Eight10 provides an agreement score c for each annotated unit u, which is based on the majority vote of the trusted workers. The score has been proven to perform well compared to the classic metrics (Kolhatkar, Zinsmeister, and Hirst, 2013). We computed the average C for the 217 annotated documents; C is 0.799.

Since both Krippendorff's α and the internal Figure-Eight IAA are rather high, we consider the obtained annotations for the benchmark to be of good quality.

6.3.2 Gold Trends

We compiled a list of computer science trends for the year 2016 based on the analysis of twelve survey publications (Ankerholz, 2016; Augustin, 2016; Brooks, 2016; Frot, 2016; Harriet, 2015; IEEE Computer Society, 2016; Markov, 2015; Meyerson and Mariette, 2016; Nessma, 2015; Reese, 2015; Rivera, 2016; Zaino, 2016). For this year, we identified a total of 31 trends; see Table 20.

Since there is variation in the name of a specific trend, we created a unique name for each of them. Out of the 31 trends, 28 (90.3%) are instantiated as trends by our model, which we confirmed in the crowdsourcing procedure11. We searched for the three missing trends using a variety of techniques (i.e., inspection of non-trends and downtrends, random browsing of documents not assigned to trend candidates, and keyword searches).

10 Former CrowdFlower platform.
11 The author verified this based on her judgment of equivalence of one of the 31 gold trend names in the literature with one of the trend tags.


Two of the missing trends do not seem to occur in the collection at all: "Human Augmentation" and "Food and Water Technology". The third missing trend, "Cryptocurrency and Cryptography", does occur in the collection, but the occurrence rate is very small (ca. 0.01). We do not consider these three trends in the rest of the paper.

Computer Science Trend                          | Computer Science Trend
Autonomous Agents and Systems                   | Natural Language Processing
Autonomous Vehicles                             | Open Source
Big Data Analytics                              | Privacy
Bioinformatics and (Bio) Neuroscience           | Quantum Computing
Biometrics and Personal Identification          | Recommender Systems and Social Networks
Cloud Computing and Software as a Service       | Reinforcement Learning
Cryptocurrency and Cryptography*                | Renewable Energy
Cyber Security                                  | Robotics
E-business                                      | Semantic Web
Food and Water Technology*                      | Sentiment and Emotion Analysis
Game-based Learning                             | Smart Cities
Games and (Virtual) Augmented Reality           | Supply Chains and RFIDs
Human Augmentation*                             | Technology for Climate Change
Machine/Deep Learning                           | Transportation and Energy
Medical Advances and DNA Computing              | Wearables
Mobile Computing                                |

Table 20: 31 areas identified as computer science trends for the year 2016 in the media. 28 of these trends are instantiated by our model as well and thus covered in our benchmark. Topics marked with an asterisk (*) were not found in our underlying data collection.

In summary, based on our analysis of the meta-literature, we identified 28 gold trends that also occur in our collection. We will use them as the gold standard that trend detection algorithms should aim to discover.

6.4 evaluation measure for trend detection

Whether something is a trend is a graded concept. For this reason, we adopt an evaluation measure based on a ranking of candidates, specifically Mean Average Precision (MAP). The score denotes the average of the precision values at the ranks of correctly identified and non-redundant trends.

We approach trend detection as computing a ranked list of sets of documents, where each set is a trend candidate. We refer to documents as trendy, downtrendy, and flat, depending on which class they are assigned to in our benchmark. We consider a trend candidate c as a correct recognition of gold trend t if it satisfies the following requirements:


• c is trendy,

• t is the largest trend in c,

• |t ∩ c| / |c| > ρ, where ρ is a coverage parameter,

• t was not earlier recognized in the list.

These criteria give us a true positive (i.e., recognition of a gold trend) or false positive (otherwise) for every position of the ranked list, and a precision score for each trend. Precision for a trend that was not observed is 0. We then compute MAP; see Table 21 for an example and the sketch following it.

Gold Trends          Gold Downtrends
Cloud Computing      Compilers
Bioinformatics       Petri Nets
Sentiment Analysis   Fuzzy Sets
Privacy              Routing Protocols

Trend Candidate            |t ∩ c|/|c|   tp/fp   P
c1   Cloud Computing       0.60          tp      1.00
c2   Privacy               0.20          fp      0.50
c3   Cloud Computing       0.70          fp      0.33
c4   Bioinformatics        0.55          tp      0.50
c5   Sentiment Analysis    0.95          tp      0.60
c6   Sentiment Analysis    0.59          fp      0.50
c7   Bioinformatics        0.41          fp      0.43
c8   Compilers             0.60          fp      0.38
c9   Petri Nets            0.80          fp      0.33
c10  Privacy               0.33          fp      0.30

Table 21: Example of the proposed MAP evaluation (with ρ = 0.5). MAP is the average of the precision values 1.0 (Cloud Computing), 0.5 (Bioinformatics), 0.6 (Sentiment Analysis), and 0.0 (Privacy), i.e., MAP = 2.1/4 = 0.525. True positive: tp. False positive: fp. Precision: P.
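A minimal sketch of this MAP computation is shown below; it is not the released evaluation script, and the input format (each ranked candidate reduced to its largest gold trend, its coverage, and a trendiness flag) is an assumption made for illustration.

    def mean_average_precision(ranked_candidates, gold_trends, rho=0.5):
        """ranked_candidates: list of (largest_gold_trend, coverage, is_trendy) tuples,
        ordered by predicted trendiness. A gold trend that is never correctly
        recognized contributes a precision of 0."""
        recognized = {}          # gold trend -> precision at the rank where it was found
        true_positives = 0
        for rank, (trend, coverage, is_trendy) in enumerate(ranked_candidates, start=1):
            correct = (is_trendy and coverage > rho and trend in gold_trends
                       and trend not in recognized)
            if correct:
                true_positives += 1
                recognized[trend] = true_positives / rank
        return sum(recognized.get(t, 0.0) for t in gold_trends) / len(gold_trends)

On the example of Table 21 (treating the downtrend candidates c8 and c9 as not trendy), this sketch yields the same value, 0.525.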

6.5 trend detection baselines

In this section, we investigate the impact of four different configuration choices on the performance of trend detection by way of ablation.


6.5.1 Configurations

abstracts vs full texts The underlying data collection contains both abstracts and full texts12. Our initial expectation was that the full text of a document comprises more information than the abstract and should therefore be a more reliable basis for trend detection. This is one of the configurations we test in the ablation. We employ our proposed method on the full-text collection (solely document texts, without abstracts) and the abstract collection (only abstract texts) separately and observe the results.

stratification Due to the uneven distribution of documents over the years, the last years (with an increased volume of publications) may get too much impact on the results compared to earlier years. To investigate this, we conduct an ablation experiment where we compare (i) 160k documents randomly sampled from the entire collection and (ii) stratified sampling. In stratified sampling, we select 10k documents for each of the years 2000–2015, resulting in a sampled collection of 160k.

length Lt of interval Clusters are ranked by trendiness, and the resulting ranked list is evaluated. To measure the trendiness or growth of topics over time, we fit a line to an interval (n_i | i_0 ≤ i < i_0 + L_t) by linear regression, where L_t is the length of the interval, i is one of the 16 years (2000, ..., 2015), and n_i is the number of documents that were assigned to cluster c in that year. As a simple default baseline, we apply the regression to half of the entire interval, i.e., L_t = 8. There are nine such intervals in our 16-year period. As the final measure of growth for the cluster, we take the maximum of the nine individual slopes. To determine how much this configuration choice affects our results, we also test a linear regression over the entire time span, i.e., L_t = 16. In this case there is a single interval.

clustering method We consider Gaussian Mixture Models (GMM) and k-means. We use GMM with the spherical type of covariance, where each component has the same variance. For both, we proceed as described in Section 6.2.3.
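For reference, the two clustering configurations can be sketched as follows; the random data is a stand-in for the matrix of length-normalized document embeddings, and the component count is reduced from the 1000 used in the experiments so the toy example runs.

    import numpy as np
    from sklearn.mixture import GaussianMixture
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    vectors = rng.normal(size=(200, 50))   # stand-in for doc2vec document embeddings

    n_components = 10                      # 1000 in the setting described above
    gmm_labels = GaussianMixture(n_components=n_components,
                                 covariance_type="spherical").fit_predict(vectors)
    kmeans_labels = KMeans(n_clusters=n_components, n_init=10).fit_predict(vectors)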

6.5.2 Experimental Setup and Results

We conducted experiments on the Semantic Scholar corpus and evaluated a ranked list of trend candidates against the benchmark, considering the configurations mentioned above. We adopted Mean Average Precision (MAP) as the principal evaluation measure. As a secondary evaluation measure, we computed recall at 50 (R50), which estimates the percentage of gold trends found in the list of the 50 highest-ranked trend candidates. We found that configuration (0) in Table 22 works best.

12 Semantic Scholar provided the underlying dataset at the beginning of our work in 2016.


We then conducted ablation experiments to discover the importance of the configuration choices.

                 (0)           (1)           (2)           (3)           (4)
document part    A             F             A             A             A
stratification   yes           yes           no            yes           yes
Lt               8             8             8             16            8
clustering       GMMsph        GMMsph        GMMsph        GMMsph        kµ
MAP              0.36 (0.03)   0.07 (0.19)   0.30 (0.03)   0.32 (0.04)   0.34 (0.21)
R50 avg          0.50 (0.012)  0.25 (0.018)  0.50 (0.016)  0.46 (0.018)  0.53 (0.014)
R50 max          0.61          0.37          0.53          0.61          0.61

Table 22: Ablation results. Standard deviations are in parentheses. GMM clustering performed with the spherical (sph) type of covariance. K-means clustering is denoted as kµ in the ablation table.

comparing (0) and (1)13 we observed that abstracts (A) are a more reliable representation for our trend detection baseline than full texts (F). A possible explanation for this observation is that an abstract is a summary of a scientific paper that covers only the central points and is semantically very concise and rich. In contrast, the full text contains numerous parts that are secondary (i.e., future work, related work) and thus may skew the document's overall meaning.

comparing (0) and (2) we recognized that stratification (yes) improves results compared to the setting with non-stratified (no), randomly sampled data. We would expect the effect of stratification to be even more substantial for collections in which the distribution of documents over time is more skewed than in ours.

comparing (0) and (3) the length of the interval is a vital configuration choice. As we assumed, the 16-year interval is rather long, and the 8-year interval could be a better choice, primarily if one aims to find short-term trends.

comparing (0) and (4) we recognized that the topics we obtain from k-means have a similar nature to those from GMM. However, GMMs, which take variance and covariance into account and estimate a soft assignment, still perform slightly better.

6.6 benchmark description

Below, we report the contents of the TRENDNERT benchmark14:

1. Two records containing information on documents: one file with the documents assigned to (down)trends and the other file with the documents assigned to flat topics.

13 Numbers (0), (1), (2), (3), (4) are the configuration choices in the ablation table.
14 https://doi.org/10.17605/OSF.IO/DHZCT


2. A script to produce document hash codes.

3. A file to map every hash code to internal IDs in the benchmark.

4. A script to compute MAP (default setting is ρ = 0.25).

5. README.md, a file with guidance notes.

Every document in the benchmark contains the following information:

• Paper ID: internal ID assigned to each document in the original Semantic Scholar collection.

• Cluster ID: unique ID of the meta-cluster where the document was observed.

• Label/Tag: name of the gold (down)trend assigned by crowd-workers (e.g., Cyber Security, Fuzzy Sets and Systems).

• Type of (down)trend candidate: trend (T), downtrend (D), or flat topic (F).

• Hash ID15 of each paper from the original Semantic Scholar collection (see the sketch below).
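Footnote 15 states that the Hash ID is an MD5 hash of First Author + Title + Year. A minimal sketch of such a hash is given below; the exact string concatenation and normalization used by the released script are assumptions.

    import hashlib

    def document_hash(first_author, title, year):
        """Illustrative Hash ID: MD5 over First Author + Title + Year."""
        key = f"{first_author}{title}{year}"
        return hashlib.md5(key.encode("utf-8")).hexdigest()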

6.7 summary

Emerging trend detection is a promising task for Intelligent Process Automation (IPA) that allows companies and funding agencies to automatically analyze extensive textual collections in order to detect emerging fields and forecast the future by employing machine learning algorithms. Thus, the availability of public benchmarks for trend detection is of significant importance, as only thereby can one evaluate the performance and robustness of the techniques employed.

Within this work, we release TRENDNERT – the first publicly available benchmark for (down)trend detection, which offers a realistic setting for developing trend detection algorithms. TRENDNERT also supports downtrend detection – a significant problem that was not addressed before. We also present several experimental findings on trend detection: stratification improves trend detection if the distribution of documents is skewed over time, and abstract-based trend detection performs better than full-text-based detection if straightforward models are used.

6.8 future work

There are several possible directions for future work:

• The presented approach to estimating the trendiness of a topic by means of linear regression is straightforward and can be replaced by more sophisticated versions like Kendall's τ correlation coefficient or the Least Squares Regression Smoother (LOESS) (Cleveland, 1979).

15 MD5 hash of First Author + Title + Year.


• It also would be challenging to replace the k-means clustering algorithm that we adopted within this work with the LDA model while retaining the document representations, as it was done in Dieng, Ruiz, and Blei, 2019. The case to investigate would then be whether topic modeling outperforms the simple clustering algorithm in this particular setting, and whether the resulting system would benefit from the topic modeling algorithm.

• Furthermore, it would be of interest to conduct more research on the following question: what kind of document representation can make effective use of the information in full texts? Recently proposed contextualized word embeddings could be a possible direction for these experiments.


7 DISCUSSION AND FUTURE WORK

As we observed in the two parts of this thesis, Intelligent Process Automation (IPA) is a challenging research area due to its multifacetedness and application in real-world scenarios. Its main objective is to automate time-consuming and routine tasks and free up the knowledge worker for more cognitively demanding tasks. However, this task involves many difficulties, such as a lack of resources for both training and evaluation of algorithms, as well as insufficient performance of machine learning techniques when applied to real-world data and practical problems. In this thesis, we investigated two IPA applications within the Natural Language Processing (NLP) domain – conversational agents and predictive analytics – and addressed some of the most common issues.

• In the first part of the thesis, we addressed the challenge of implementing a robust conversational system for IPA purposes within the practical e-learning domain under the conditions of missing training data. The primary purpose of the resulting system is to perform the onboarding process between tutors and students. This onboarding reduces repetitive and time-consuming activities while allowing tutors to focus solely on mathematical questions, which is the core task with the most impact on students. Besides that, by interacting with users, the system augments the resources with structured and labeled training data that we used to implement learnable dialogue components.

The realization of such a system was associated with several challenges. Among others were missing structured data and ambiguous user-generated content that requires precise language understanding. Also, a system that simultaneously communicates with a large number of users has to maintain additional meta policies, such as fallback and check-up policies, to hold the conversation intelligently.

• Moreover, we have shown the rule-based dialogue system's potential extension using transfer learning. We utilized a small dataset of structured dialogues obtained in a trial run of the rule-based system and applied the current state-of-the-art Bidirectional Encoder Representations from Transformers (BERT) architecture to solve the domain-specific tasks. Specifically, we retrained two of the three components in the conversational system – Named Entity Recognition (NER) and Dialogue Manager (DM) (i.e., the NAP task) – and confirmed the applicability of such algorithms in a real-world low-resource scenario by showing the reasonably high performance of the resulting models.


• Last but not least, in the second part of the thesis, we addressed the issue of missing publicly available benchmarks for the evaluation of machine learning algorithms for the trend detection task. To the best of our knowledge, we released the first open benchmark for this purpose, which offers a realistic setting for developing trend detection algorithms. Besides that, this benchmark supports downtrend detection – a significant problem that was not sufficiently addressed before. We additionally proposed an evaluation measure and presented several experimental findings on trend detection.

The ultimate objective of Intelligent Process Automation (IPA) should be the creation of robust algorithms in low-resource and domain-specific conditions. This is still a challenging task; however, some of the experiments presented in this thesis revealed that this goal could potentially be achieved by means of Transfer Learning (TL) techniques.


BIBLIOGRAPHY

Agrawal, Amritanshu, Wei Fu, and Tim Menzies (2018). "What is wrong with topic modeling? And how to fix it using search-based software engineering." In: Information and Software Technology 98, pp. 74–88 (cit. on p. 66).

Turing, Alan M. (1950). "Computing machinery and intelligence." In: Mind 59.236, pp. 433–460 (cit. on p. 7).

Alghamdi, Rubayyi and Khalid Alfalqi (2015). "A survey of topic modeling in text mining." In: Int. J. Adv. Comput. Sci. Appl. (IJACSA) 6.1 (cit. on pp. 64–66).

Ammar, Waleed, Dirk Groeneveld, Chandra Bhagavatula, Iz Beltagy, Miles Crawford, Doug Downey, Jason Dunkelberger, Ahmed Elgohary, Sergey Feldman, Vu Ha, et al. (2018). "Construction of the literature graph in semantic scholar." In: arXiv preprint arXiv:1805.02262 (cit. on p. 72).

Ankerholz, Amber (2016). 2016 Future of Open Source Survey Says Open Source Is The Modern Architecture. https://www.linux.com/news/2016-future-open-source-survey-says-open-source-modern-architecture (cit. on p. 78).

Asooja, Kartik, Georgeta Bordea, Gabriela Vulcu, and Paul Buitelaar (2016). "Forecasting emerging trends from scientific literature." In: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), pp. 417–420 (cit. on p. 68).

Augustin, Jason (2016). Emerging Science and Technology Trends: A Synthesis of Leading Forecasts. http://www.defenseinnovationmarketplace.mil/resources/2016_SciTechReport_16June2016.pdf (cit. on p. 78).

Blei, David M. and John D. Lafferty (2006). "Dynamic topic models." In: Proceedings of the 23rd international conference on Machine learning. ACM, pp. 113–120 (cit. on pp. 66, 67).

Blei, David M., John D. Lafferty, et al. (2007). "A correlated topic model of science." In: The Annals of Applied Statistics 1.1, pp. 17–35 (cit. on p. 66).

Blei, David M., Andrew Y. Ng, and Michael I. Jordan (2003). "Latent dirichlet allocation." In: Journal of machine Learning research 3.Jan, pp. 993–1022 (cit. on p. 66).

Bloem, Peter (2019). Transformers from Scratch. http://www.peterbloem.nl/blog/transformers (cit. on p. 45).

Bocklisch, Tom, Joey Faulkner, Nick Pawlowski, and Alan Nichol (2017). "Rasa: Open source language understanding and dialogue management." In: arXiv preprint arXiv:1712.05181 (cit. on p. 15).

Bolelli, Levent, Seyda Ertekin, and C. Giles (2009). "Topic and trend detection in text collections using latent dirichlet allocation." In: Advances in Information Retrieval, pp. 776–780 (cit. on p. 66).

Bolelli, Levent, Seyda Ertekin, Ding Zhou, and C. Lee Giles (2009). "Finding topic trends in digital libraries." In: Proceedings of the 9th ACM/IEEE-CS Joint Conference on Digital Libraries. ACM, pp. 69–72 (cit. on p. 66).
Bordes, Antoine, Y-Lan Boureau, and Jason Weston (2016). "Learning end-to-end goal-oriented dialog." In: arXiv preprint arXiv:1605.07683 (cit. on pp. 16, 17).
Brooks, Chuck (2016). 7 Top Tech Trends Impacting Innovators in 2016. http://innovationexcellence.com/blog/2015/12/26/7-top-tech-trends-impacting-innovators-in-2016 (cit. on p. 78).
Burtsev, Mikhail, Alexander Seliverstov, Rafael Airapetyan, Mikhail Arkhipov, Dilyara Baymurzina, Eugene Botvinovsky, Nickolay Bushkov, Olga Gureenkova, Andrey Kamenev, Vasily Konovalov, et al. (2018). "DeepPavlov: An Open Source Library for Conversational AI." In: (cit. on p. 15).
Chang, Jonathan, David M. Blei, et al. (2010). "Hierarchical relational models for document networks." In: The Annals of Applied Statistics 4.1, pp. 124–150 (cit. on p. 66).
Chen, Hongshen, Xiaorui Liu, Dawei Yin, and Jiliang Tang (2017). "A survey on dialogue systems: Recent advances and new frontiers." In: ACM SIGKDD Explorations Newsletter 19.2, pp. 25–35 (cit. on pp. 14–16).
Chen, Lingzhen, Alessandro Moschitti, Giuseppe Castellucci, Andrea Favalli, and Raniero Romagnoli (2018). "Transfer Learning for Industrial Applications of Named Entity Recognition." In: NL4AI@AI*IA, pp. 129–140 (cit. on p. 43).
Cleveland, William S. (1979). "Robust locally weighted regression and smoothing scatterplots." In: Journal of the American Statistical Association 74.368, pp. 829–836 (cit. on p. 84).
Collobert, Ronan, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa (2011). "Natural language processing (almost) from scratch." In: Journal of Machine Learning Research 12.Aug, pp. 2493–2537 (cit. on p. 16).
Cuayáhuitl, Heriberto, Simon Keizer, and Oliver Lemon (2015). "Strategic dialogue management via deep reinforcement learning." In: arXiv preprint arXiv:1511.08099 (cit. on p. 14).
Cui, Lei, Shaohan Huang, Furu Wei, Chuanqi Tan, Chaoqun Duan, and Ming Zhou (2017). "Superagent: A customer service chatbot for e-commerce websites." In: Proceedings of ACL 2017, System Demonstrations, pp. 97–102 (cit. on p. 8).
Curiskis, Stephan A., Barry Drake, Thomas R. Osborn, and Paul J. Kennedy (2019). "An evaluation of document clustering and topic modelling in two online social networks: Twitter and Reddit." In: Information Processing & Management (cit. on p. 74).
Dai, Zihang, Zhilin Yang, Yiming Yang, William W. Cohen, Jaime Carbonell, Quoc V. Le, and Ruslan Salakhutdinov (2019). "Transformer-xl: Attentive language models beyond a fixed-length context." In: arXiv preprint arXiv:1901.02860 (cit. on pp. 43, 44).


Daniel, Ben (2015). "Big Data and analytics in higher education: Opportunities and challenges." In: British Journal of Educational Technology 46.5, pp. 904–920 (cit. on p. 22).
Deerwester, Scott, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, and Richard Harshman (1990). "Indexing by latent semantic analysis." In: Journal of the American Society for Information Science 41.6, pp. 391–407 (cit. on p. 65).
Deoras, Anoop, Kaisheng Yao, Xiaodong He, Li Deng, Geoffrey Gerson Zweig, Ruhi Sarikaya, Dong Yu, Mei-Yuh Hwang, and Gregoire Mesnil (2015). Assignment of semantic labels to a sequence of words using neural network architectures. US Patent App. 14/016,186 (cit. on p. 13).
Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova (2018). "Bert: Pre-training of deep bidirectional transformers for language understanding." In: arXiv preprint arXiv:1810.04805 (cit. on pp. 42–45).
Dhingra, Bhuwan, Lihong Li, Xiujun Li, Jianfeng Gao, Yun-Nung Chen, Faisal Ahmed, and Li Deng (2016). "Towards end-to-end reinforcement learning of dialogue agents for information access." In: arXiv preprint arXiv:1609.00777 (cit. on p. 16).
Dieng, Adji B., Francisco J. R. Ruiz, and David M. Blei (2019). "Topic Modeling in Embedding Spaces." In: arXiv preprint arXiv:1907.04907 (cit. on pp. 74, 84).
Donahue, Jeff, Yangqing Jia, Oriol Vinyals, Judy Hoffman, Ning Zhang, Eric Tzeng, and Trevor Darrell (2014). "Decaf: A deep convolutional activation feature for generic visual recognition." In: International Conference on Machine Learning, pp. 647–655 (cit. on p. 43).
Eric, Mihail and Christopher D. Manning (2017). "Key-value retrieval networks for task-oriented dialogue." In: arXiv preprint arXiv:1705.05414 (cit. on p. 16).
Fang, Hao, Hao Cheng, Maarten Sap, Elizabeth Clark, Ari Holtzman, Yejin Choi, Noah A. Smith, and Mari Ostendorf (2018). "Sounding board: A user-centric and content-driven social chatbot." In: arXiv preprint arXiv:1804.10202 (cit. on p. 10).
Friedrich, Fabian, Jan Mendling, and Frank Puhlmann (2011). "Process model generation from natural language text." In: International Conference on Advanced Information Systems Engineering, pp. 482–496 (cit. on p. 2).
Frot, Mathilde (2016). 5 Trends in Computer Science Research. https://www.topuniversities.com/courses/computer-science-information-systems/5-trends-computer-science-research (cit. on p. 78).
Glasmachers, Tobias (2017). "Limits of end-to-end learning." In: arXiv preprint arXiv:1704.08305 (cit. on pp. 16, 17).
Goddeau, David, Helen Meng, Joseph Polifroni, Stephanie Seneff, and Senis Busayapongchai (1996). "A form-based dialogue manager for spoken language applications." In: Proceeding of Fourth International Conference on Spoken Language Processing, ICSLP '96, Vol. 2. IEEE, pp. 701–704 (cit. on p. 14).


Gohr, Andre, Alexander Hinneburg, Rene Schult, and Myra Spiliopoulou (2009). "Topic evolution in a stream of documents." In: Proceedings of the 2009 SIAM International Conference on Data Mining. SIAM, pp. 859–870 (cit. on p. 65).
Gollapalli, Sujatha Das and Xiaoli Li (2015). "EMNLP versus ACL: Analyzing NLP research over time." In: EMNLP, pp. 2002–2006 (cit. on p. 72).
Griffiths, Thomas L. and Mark Steyvers (2004). "Finding scientific topics." In: Proceedings of the National Academy of Sciences 101.suppl 1, pp. 5228–5235 (cit. on p. 65).
Griffiths, Thomas L., Michael I. Jordan, Joshua B. Tenenbaum, and David M. Blei (2004). "Hierarchical topic models and the nested Chinese restaurant process." In: Advances in Neural Information Processing Systems, pp. 17–24 (cit. on p. 66).
Hall, David, Daniel Jurafsky, and Christopher D. Manning (2008). "Studying the history of ideas using topic models." In: Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, pp. 363–371 (cit. on p. 66).
Harriet, Taylor (2015). Privacy will hit tipping point in 2016. http://www.cnbc.com/2015/11/09/privacy-will-hit-tipping-point-in-2016.html (cit. on p. 78).
He, Jiangen and Chaomei Chen (2018). "Predictive Effects of Novelty Measured by Temporal Embeddings on the Growth of Scientific Literature." In: Frontiers in Research Metrics and Analytics 3, p. 9 (cit. on pp. 64, 68, 71).
He, Junxian, Zhiting Hu, Taylor Berg-Kirkpatrick, Ying Huang, and Eric P. Xing (2017). "Efficient correlated topic modeling with topic embedding." In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, pp. 225–233 (cit. on pp. 66, 67).
He, Qi, Bi Chen, Jian Pei, Baojun Qiu, Prasenjit Mitra, and Lee Giles (2009). "Detecting topic evolution in scientific literature: How can citations help?" In: Proceedings of the 18th ACM Conference on Information and Knowledge Management. ACM, pp. 957–966 (cit. on p. 68).
Henderson, Matthew, Blaise Thomson, and Jason D. Williams (2014). "The second dialog state tracking challenge." In: Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL), pp. 263–272 (cit. on p. 15).
Henderson, Matthew, Blaise Thomson, and Steve Young (2013). "Deep neural network approach for the dialog state tracking challenge." In: Proceedings of the SIGDIAL 2013 Conference, pp. 467–471 (cit. on p. 14).
Hess, Ann, Hari Iyer, and William Malm (2001). "Linear trend analysis: a comparison of methods." In: Atmospheric Environment 35.30: Visibility, Aerosol and Atmospheric Optics, pp. 5211–5222. issn: 1352-2310. doi: https://doi.org/10.1016/S1352-2310(01)00342-9. url: http://www.sciencedirect.com/science/article/pii/S1352231001003429 (cit. on p. 75).


Hofmann, Thomas (1999). "Probabilistic latent semantic analysis." In: Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann Publishers Inc., pp. 289–296 (cit. on p. 65).
Honnibal, Matthew and Ines Montani (2017). "spacy 2: Natural language understanding with bloom embeddings, convolutional neural networks and incremental parsing." In: To appear 7 (cit. on p. 15).
Howard, Jeremy and Sebastian Ruder (2018). "Universal language model fine-tuning for text classification." In: arXiv preprint arXiv:1801.06146 (cit. on p. 44).
IEEE Computer Society (2016). Top 9 Computing Technology Trends for 2016. https://www.scientificcomputing.com/news/2016/01/top-9-computing-technology-trends-2016 (cit. on p. 78).
Ivancic, Lucija, Dalia Suša Vugec, and Vesna Bosilj Vukšic (2019). "Robotic Process Automation: Systematic Literature Review." In: International Conference on Business Process Management. Springer, pp. 280–295 (cit. on pp. 1, 2).
Jain, Mohit, Pratyush Kumar, Ramachandra Kota, and Shwetak N. Patel (2018). "Evaluating and informing the design of chatbots." In: Proceedings of the 2018 Designing Interactive Systems Conference. ACM, pp. 895–906 (cit. on p. 8).
Khanpour, Hamed, Nishitha Guntakandla, and Rodney Nielsen (2016). "Dialogue act classification in domain-independent conversations using a deep recurrent neural network." In: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pp. 2012–2021 (cit. on p. 13).
Kim, Young-Bum, Karl Stratos, Ruhi Sarikaya, and Minwoo Jeong (2015). "New transfer learning techniques for disparate label sets." In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp. 473–482 (cit. on p. 43).
Kolhatkar, Varada, Heike Zinsmeister, and Graeme Hirst (2013). "Annotating anaphoric shell nouns with their antecedents." In: Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse, pp. 112–121 (cit. on pp. 77, 78).
Kontostathis, April, Leon M. Galitsky, William M. Pottenger, Soma Roy, and Daniel J. Phelps (2004). "A survey of emerging trend detection in textual data mining." In: Survey of Text Mining. Springer, pp. 185–224 (cit. on p. 63).
Krippendorff, Klaus (2011). "Computing Krippendorff's alpha-reliability." In: Annenberg School for Communication (ASC), Departmental Papers 43 (cit. on p. 78).
Kurata, Gakuto, Bing Xiang, Bowen Zhou, and Mo Yu (2016). "Leveraging sentence-level information with encoder lstm for semantic slot filling." In: arXiv preprint arXiv:1601.01530 (cit. on p. 13).
Landis, J. Richard and Gary G. Koch (1977). "The measurement of observer agreement for categorical data." In: Biometrics, pp. 159–174 (cit. on p. 78).


Le, Minh-Hoang, Tu-Bao Ho, and Yoshiteru Nakamori (2005). "Detecting emerging trends from scientific corpora." In: International Journal of Knowledge and Systems Sciences 2.2, pp. 53–59 (cit. on p. 68).
Le, Quoc V. and Tomas Mikolov (2014). "Distributed Representations of Sentences and Documents." In: ICML, Vol. 14, pp. 1188–1196 (cit. on p. 74).
Lee, Ji Young and Franck Dernoncourt (2016). "Sequential short-text classification with recurrent and convolutional neural networks." In: arXiv preprint arXiv:1603.03827 (cit. on p. 13).
Leopold, Henrik, Han van der Aa, and Hajo A. Reijers (2018). "Identifying candidate tasks for robotic process automation in textual process descriptions." In: Enterprise, Business-Process and Information Systems Modeling. Springer, pp. 67–81 (cit. on p. 2).
Levenshtein, Vladimir I. (1966). "Binary codes capable of correcting deletions, insertions and reversals." In: Soviet Physics Doklady 8, pp. 707–710 (cit. on p. 25).
Liao, Vera, Masud Hussain, Praveen Chandar, Matthew Davis, Yasaman Khazaeni, Marco Patricio Crasso, Dakuo Wang, Michael Muller, N. Sadat Shami, Werner Geyer, et al. (2018). "All Work and No Play?" In: Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. ACM, p. 3 (cit. on p. 8).
Lowe, Ryan Thomas, Nissan Pow, Iulian Vlad Serban, Laurent Charlin, Chia-Wei Liu, and Joelle Pineau (2017). "Training end-to-end dialogue systems with the ubuntu dialogue corpus." In: Dialogue & Discourse 8.1, pp. 31–65 (cit. on p. 22).
Luger, Ewa and Abigail Sellen (2016). "Like having a really bad PA: the gulf between user expectation and experience of conversational agents." In: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems. ACM, pp. 5286–5297 (cit. on p. 8).
MacQueen, James (1967). "Some methods for classification and analysis of multivariate observations." In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1. Oakland, CA, USA, pp. 281–297 (cit. on p. 74).
Manning, Christopher D., Prabhakar Raghavan, and Hinrich Schütze (2008). Introduction to Information Retrieval. Web publication at informationretrieval.org. Cambridge University Press (cit. on p. 74).
Markov, Igor (2015). 13 Of 2015's Hottest Topics In Computer Science Research. https://www.forbes.com/sites/quora/2015/04/22/13-of-2015s-hottest-topics-in-computer-science-research/#fb0d46b1e88d (cit. on p. 78).
Martin, James H. and Daniel Jurafsky (2009). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Pearson/Prentice Hall, Upper Saddle River.
Mei, Qiaozhu, Deng Cai, Duo Zhang, and ChengXiang Zhai (2008). "Topic modeling with network regularization." In: Proceedings of the 17th International Conference on World Wide Web. ACM, pp. 101–110 (cit. on p. 65).


Meyerson, Bernard and Mariette DiChristina (2016). These are the top 10 emerging technologies of 2016. http://www3.weforum.org/docs/GAC16_Top10_Emerging_Technologies_2016_report.pdf (cit. on p. 78).
Mikolov, Tomas, Kai Chen, Greg Corrado, and Jeffrey Dean (2013). "Efficient estimation of word representations in vector space." In: arXiv preprint arXiv:1301.3781 (cit. on p. 74).
Miller, Alexander, Adam Fisch, Jesse Dodge, Amir-Hossein Karimi, Antoine Bordes, and Jason Weston (2016). "Key-value memory networks for directly reading documents." In: (cit. on p. 16).
Mohanty, Soumendra and Sachin Vyas (2018). "How to Compete in the Age of Artificial Intelligence." In: (cit. on pp. III, V, 1, 2).
Moiseeva, Alena and Hinrich Schütze (2020). "TRENDNERT: A Benchmark for Trend and Downtrend Detection in a Scientific Domain." In: Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (cit. on p. 71).
Moiseeva, Alena, Dietrich Trautmann, Michael Heimann, and Hinrich Schütze (2020). "Multipurpose Intelligent Process Automation via Conversational Assistant." In: arXiv preprint arXiv:2001.02284 (cit. on pp. 21, 41).
Monaghan, Fergal, Georgeta Bordea, Krystian Samp, and Paul Buitelaar (2010). "Exploring your research: Sprinkling some saffron on semantic web dog food." In: Semantic Web Challenge at the International Semantic Web Conference, Vol. 117. Citeseer, pp. 420–435 (cit. on p. 68).
Monostori, László, András Márkus, Hendrik Van Brussel, and E. Westkämpfer (1996). "Machine learning approaches to manufacturing." In: CIRP Annals 45.2, pp. 675–712 (cit. on p. 15).
Moody, Christopher E. (2016). "Mixing dirichlet topic models and word embeddings to make lda2vec." In: arXiv preprint arXiv:1605.02019 (cit. on p. 74).
Mou, Lili, Zhao Meng, Rui Yan, Ge Li, Yan Xu, Lu Zhang, and Zhi Jin (2016). "How transferable are neural networks in nlp applications?" In: arXiv preprint arXiv:1603.06111 (cit. on p. 43).
Mrkšic, Nikola, Diarmuid O Séaghdha, Tsung-Hsien Wen, Blaise Thomson, and Steve Young (2016). "Neural belief tracker: Data-driven dialogue state tracking." In: arXiv preprint arXiv:1606.03777 (cit. on p. 14).
Nessma, Joussef (2015). Top 10 Hottest Research Topics in Computer Science. http://www.pouted.com/top-10-hottest-research-topics-in-computer-science (cit. on p. 78).
Nie, Binling and Shouqian Sun (2017). "Using text mining techniques to identify research trends: A case study of design research." In: Applied Sciences 7.4, p. 401 (cit. on p. 68).
Pan, Sinno Jialin and Qiang Yang (2009). "A survey on transfer learning." In: IEEE Transactions on Knowledge and Data Engineering 22.10, pp. 1345–1359 (cit. on p. 41).
Qu, Lizhen, Gabriela Ferraro, Liyuan Zhou, Weiwei Hou, and Timothy Baldwin (2016). "Named entity recognition for novel types by transfer learning." In: arXiv preprint arXiv:1610.09914 (cit. on p. 43).

Radford, Alec, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever (2019). "Language models are unsupervised multitask learners." In: OpenAI Blog 1.8 (cit. on p. 44).
Reese, Hope (2015). 7 trends for artificial intelligence in 2016: 'Like 2015 on steroids'. http://www.techrepublic.com/article/7-trends-for-artificial-intelligence-in-2016-like-2015-on-steroids (cit. on p. 78).
Rivera, Maricel (2016). Is Digital Game-Based Learning The Future Of Learning? https://elearningindustry.com/digital-game-based-learning-future (cit. on p. 78).
Rotolo, Daniele, Diana Hicks, and Ben R. Martin (2015). "What is an emerging technology?" In: Research Policy 44.10, pp. 1827–1843 (cit. on p. 64).
Salatino, Angelo (2015). "Early detection and forecasting of research trends." In: (cit. on pp. 63, 66).
Serban, Iulian V., Alessandro Sordoni, Yoshua Bengio, Aaron Courville, and Joelle Pineau (2016). "Building end-to-end dialogue systems using generative hierarchical neural network models." In: Thirtieth AAAI Conference on Artificial Intelligence (cit. on p. 11).
Sharif Razavian, Ali, Hossein Azizpour, Josephine Sullivan, and Stefan Carlsson (2014). "CNN features off-the-shelf: an astounding baseline for recognition." In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 806–813 (cit. on p. 43).
Shibata, Naoki, Yuya Kajikawa, and Yoshiyuki Takeda (2009). "Comparative study on methods of detecting research fronts using different types of citation." In: Journal of the American Society for Information Science and Technology 60.3, pp. 571–580 (cit. on p. 68).
Shibata, Naoki, Yuya Kajikawa, Yoshiyuki Takeda, and Katsumori Matsushima (2008). "Detecting emerging research fronts based on topological measures in citation networks of scientific publications." In: Technovation 28.11, pp. 758–775 (cit. on p. 68).
Small, Henry (2006). "Tracking and predicting growth areas in science." In: Scientometrics 68.3, pp. 595–610 (cit. on p. 68).
Small, Henry, Kevin W. Boyack, and Richard Klavans (2014). "Identifying emerging topics in science and technology." In: Research Policy 43.8, pp. 1450–1467 (cit. on p. 64).
Su, Pei-Hao, Nikola Mrkšic, Iñigo Casanueva, and Ivan Vulic (2018). "Deep Learning for Conversational AI." In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Tutorial Abstracts, pp. 27–32 (cit. on p. 12).
Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le (2014). "Sequence to sequence learning with neural networks." In: Advances in Neural Information Processing Systems, pp. 3104–3112 (cit. on pp. 16, 17).
Trautmann, Dietrich, Johannes Daxenberger, Christian Stab, Hinrich Schütze, and Iryna Gurevych (2019). "Robust Argument Unit Recognition and Classification." In: arXiv preprint arXiv:1904.09688 (cit. on p. 41).


Tu, Yi-Ning and Jia-Lang Seng (2012). "Indices of novelty for emerging topic detection." In: Information Processing & Management 48.2, pp. 303–325 (cit. on p. 64).
Turing, Alan (1949). London Times letter to the editor (cit. on p. 7).
Ultes, Stefan, Lina Barahona, Pei-Hao Su, David Vandyke, Dongho Kim, Inigo Casanueva, Paweł Budzianowski, Nikola Mrkšic, Tsung-Hsien Wen, et al. (2017). "Pydial: A multi-domain statistical dialogue system toolkit." In: Proceedings of ACL 2017, System Demonstrations, pp. 73–78 (cit. on p. 15).
Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin (2017). "Attention is all you need." In: Advances in Neural Information Processing Systems, pp. 5998–6008 (cit. on p. 45).
Vincent, Pascal, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol (2008). "Extracting and composing robust features with denoising autoencoders." In: Proceedings of the 25th International Conference on Machine Learning. ACM, pp. 1096–1103 (cit. on p. 44).
Vodolán, Miroslav, Rudolf Kadlec, and Jan Kleindienst (2015). "Hybrid dialog state tracker." In: arXiv preprint arXiv:1510.03710 (cit. on p. 14).
Wang, Alex, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman (2018). "Glue: A multi-task benchmark and analysis platform for natural language understanding." In: arXiv preprint arXiv:1804.07461 (cit. on p. 44).
Wang, Chong and David M. Blei (2011). "Collaborative topic modeling for recommending scientific articles." In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, pp. 448–456 (cit. on p. 66).
Wang, Xuerui and Andrew McCallum (2006). "Topics over time: a non-Markov continuous-time model of topical trends." In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, pp. 424–433 (cit. on pp. 65–67).
Wang, Zhuoran and Oliver Lemon (2013). "A simple and generic belief tracking mechanism for the dialog state tracking challenge: On the believability of observed information." In: Proceedings of the SIGDIAL 2013 Conference, pp. 423–432 (cit. on p. 14).
Weizenbaum, Joseph et al. (1966). "ELIZA—a computer program for the study of natural language communication between man and machine." In: Communications of the ACM 9.1, pp. 36–45 (cit. on p. 7).
Wen, Tsung-Hsien, Milica Gasic, Nikola Mrksic, Pei-Hao Su, David Vandyke, and Steve Young (2015). "Semantically conditioned lstm-based natural language generation for spoken dialogue systems." In: arXiv preprint arXiv:1508.01745 (cit. on p. 14).
Wen, Tsung-Hsien, David Vandyke, Nikola Mrksic, Milica Gasic, Lina M. Rojas-Barahona, Pei-Hao Su, Stefan Ultes, and Steve Young (2016). "A network-based end-to-end trainable task-oriented dialogue system." In: arXiv preprint arXiv:1604.04562 (cit. on pp. 11, 16, 17, 22).


Werner, Alexander, Dietrich Trautmann, Dongheui Lee, and Roberto Lampariello (2015). "Generalization of optimal motion trajectories for bipedal walking." In: pp. 1571–1577 (cit. on p. 41).
Williams, Jason D., Kavosh Asadi, and Geoffrey Zweig (2017). "Hybrid code networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning." In: arXiv preprint arXiv:1702.03274 (cit. on p. 17).
Williams, Jason, Antoine Raux, and Matthew Henderson (2016). "The dialog state tracking challenge series: A review." In: Dialogue & Discourse 7.3, pp. 4–33 (cit. on p. 12).
Wu, Chien-Sheng, Richard Socher, and Caiming Xiong (2019). "Global-to-local Memory Pointer Networks for Task-Oriented Dialogue." In: arXiv preprint arXiv:1901.04713 (cit. on pp. 15–17).
Wu, Yonghui, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, et al. (2016). "Google's neural machine translation system: Bridging the gap between human and machine translation." In: arXiv preprint arXiv:1609.08144 (cit. on p. 45).
Xie, Pengtao and Eric P. Xing (2013). "Integrating Document Clustering and Topic Modeling." In: Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence (cit. on p. 74).
Yan, Zhao, Nan Duan, Peng Chen, Ming Zhou, Jianshe Zhou, and Zhoujun Li (2017). "Building task-oriented dialogue systems for online shopping." In: Thirty-First AAAI Conference on Artificial Intelligence (cit. on p. 14).
Yogatama, Dani, Michael Heilman, Brendan O'Connor, Chris Dyer, Bryan R. Routledge, and Noah A. Smith (2011). "Predicting a scientific community's response to an article." In: Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, pp. 594–604 (cit. on p. 65).
Young, Steve, Milica Gašic, Blaise Thomson, and Jason D. Williams (2013). "Pomdp-based statistical spoken dialog systems: A review." In: Proceedings of the IEEE 101.5, pp. 1160–1179 (cit. on p. 17).
Zaino, Jennifer (2016). 2016 Trends for Semantic Web and Semantic Technologies. http://www.dataversity.net/2017-predictions-semantic-web-semantic-technologies (cit. on p. 78).
Zhou, Hao, Minlie Huang, et al. (2016). "Context-aware natural language generation for spoken dialogue systems." In: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pp. 2032–2041 (cit. on pp. 14, 15).
Zhu, Yukun, Ryan Kiros, Rich Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, and Sanja Fidler (2015). "Aligning books and movies: Towards story-like visual explanations by watching movies and reading books." In: Proceedings of the IEEE International Conference on Computer Vision, pp. 19–27 (cit. on p. 44).



ment (z.B. Festlegung der nächstmöglichen Bot-Aktion) bis hin zur Response-Generierung. Ein typisches Dialogsystem für IPA-Zwecke übernimmt in der Regel unkomplizierte Kundendienstanfragen (z.B. Beantwortung von FAQs), so dass sich die Mitarbeiter auf komplexere Anfragen konzentrieren können.

Diese Dissertation umfasst zwei Bereiche, die durch das breitere Thema vereint sind, nämlich die Intelligente Prozessautomatisierung (IPA) unter Verwendung statistischer Methoden der natürlichen Sprachverarbeitung (NLP).

Der erste Block dieser Arbeit (Kapitel 2 – Kapitel 4) befasst sich mit der Implementierung eines Konversationsagenten für IPA-Zwecke innerhalb der E-Learning-Domäne. Wie bereits erwähnt, sind Chatbots eine der zentralen Anwendungen für die IPA-Domäne, da sie zeitaufwändige Aufgaben in einer natürlichen Sprache effektiv ausführen können. Der IPA-Kommunikationsbot, der in dieser Arbeit realisiert wurde, kümmert sich ebenfalls um routinemäßige und zeitaufwändige Aufgaben, die sonst von Tutoren in einem Online-Mathematikkurs in deutscher Sprache durchgeführt werden. Dieser Bot ist in der täglichen Anwendung innerhalb der mathematischen Plattform OMB+ eingesetzt. Bei der Durchführung von Experimenten beobachteten wir zwei Möglichkeiten, den Konversationsagenten im industriellen Umfeld zu entwickeln – zunächst mit rein regelbasierten Methoden unter Bedingungen der fehlenden Trainingsdaten und besonderer Aspekte der Zieldomäne (d.h. E-Learning). Zweitens haben wir zwei der Hauptsystemkomponenten (Sprachverständnismodul, Dialog-Manager) mit dem derzeit fortschrittlichsten Deep-Learning-Algorithmus reimplementiert und die Performanz dieser Komponenten untersucht.

Der zweite Teil der Doktorarbeit (Kapitel 5 – Kapitel 6) betrachtet ein IPA-Problem innerhalb des Vorhersageanalytik-Bereichs. Vorhersageanalytik zielt darauf ab, Prognosen über zukünftige Ergebnisse auf der Grundlage von historischen und aktuellen Daten zu erstellen. Daher kann ein Unternehmen mit Hilfe der Vorhersagesysteme z.B. die Trends oder neu entstehende Themen zuverlässig bestimmen und diese Informationen dann bei wichtigen Geschäftsentscheidungen (z.B. Investitionen) einsetzen. In diesem Teil der Arbeit beschäftigen wir uns mit dem Teilproblem der Trendprognose – insbesondere mit dem Fehlen öffentlich zugänglicher Benchmarks für die Evaluierung von Trenderkennungsalgorithmen. Wir haben den Benchmark zusammengestellt und veröffentlicht, um sowohl Trends als auch Abwärtstrends zu erkennen. Nach unserem besten Wissen ist die Aufgabe der Abwärtstrenderkennung bisher nicht adressiert worden. Der resultierende Benchmark basiert auf einer Sammlung von mehr als einer Million Dokumente, der zu den größten gehört, die bisher für die Trenderkennung verwendet wurden, und somit einen realistischen Rahmen für die Entwicklung von Trenddetektionsalgorithmen bietet.


A C K N O W L E D G E M E N T S

This thesis was possible due to several people who contributed significantly to my work. I owe my gratitude to each of them for guidance, encouragement, and inspiration during my doctoral study.

First of all, I would like to thank my advisor Prof. Dr. Hinrich Schütze for his mentorship, support, and thoughtful guidance. My special gratitude goes to Prof. Dr. Ruedi Seiler and his fantastic team at Integral-Learning GmbH: I am thankful for your valuable input, interest in our collaborative work, and for your continuous support. I would also like to show gratitude to Prof. Dr. Alexander Fraser for co-advising, support, and helpful feedback.

Finally, none of this would be possible without my family's support; there are no words to express the depth of gratitude I feel for them. My earnest appreciation goes to my husband Dietrich: thank you for all the time you were by my side and for the encouragement you provided me through these years. I am very grateful to my parents for their understanding and backing at any moment. I appreciate the love of my grandparents; it is hard to realize how significant their care is for me. Besides that, I am also grateful to my sister Alexandra, elder brother Denis, and parents-in-law for merely being always there for me.


C O N T E N T S

List of Figures XI
List of Tables XII
Acronyms XIII
1 introduction 1
  1.1 Outline 3
i e-learning conversational assistant 5
2 foundations 7
  2.1 Introduction 7
  2.2 Conversational Assistants 9
    2.2.1 Taxonomy 9
    2.2.2 Conventional Architecture 12
    2.2.3 Existing Approaches 15
  2.3 Target Domain & Task Definition 18
  2.4 Summary 19
3 multipurpose ipa via conversational assistant 21
  3.1 Introduction 21
    3.1.1 Outline and Contributions 22
  3.2 Model 23
    3.2.1 OMB+ Design 23
    3.2.2 Preprocessing 23
    3.2.3 Natural Language Understanding 25
    3.2.4 Dialogue Manager 26
    3.2.5 Meta Policy 30
    3.2.6 Response Generation 32
  3.3 Evaluation 37
    3.3.1 Automated Evaluation 37
    3.3.2 Human Evaluation & Error Analysis 37
  3.4 Structured Dialogue Acquisition 39
  3.5 Summary 39
  3.6 Future Work 40
4 deep learning re-implementation of units 41
  4.1 Introduction 41
    4.1.1 Outline and Contributions 42
  4.2 Related Work 43
  4.3 BERT: Bidirectional Encoder Representations from Transformers 44
  4.4 Data & Descriptive Statistics 46
  4.5 Named Entity Recognition 48
    4.5.1 Model Settings 49
  4.6 Next Action Prediction 49
    4.6.1 Model Settings 49
  4.7 Evaluation and Results 50
  4.8 Error Analysis 58


  4.9 Summary 58
  4.10 Future Work 59
ii clustering-based trend analyzer 61
5 foundations 63
  5.1 Motivation and Challenges 63
  5.2 Detection of Emerging Research Trends 64
    5.2.1 Methods for Topic Modeling 65
    5.2.2 Topic Evolution of Scientific Publications 67
  5.3 Summary 68
6 benchmark for scientific trend detection 71
  6.1 Motivation 71
    6.1.1 Outline and Contributions 72
  6.2 Corpus Creation 73
    6.2.1 Underlying Data 73
    6.2.2 Stratification 73
    6.2.3 Document representations & Clustering 74
    6.2.4 Trend & Downtrend Estimation 75
    6.2.5 (Down)trend Candidates Validation 75
  6.3 Benchmark Annotation 77
    6.3.1 Inter-annotator Agreement 77
    6.3.2 Gold Trends 78
  6.4 Evaluation Measure for Trend Detection 79
  6.5 Trend Detection Baselines 80
    6.5.1 Configurations 81
    6.5.2 Experimental Setup and Results 81
  6.6 Benchmark Description 82
  6.7 Summary 83
  6.8 Future Work 83
7 discussion and future work 85
bibliography 87


L I S T O F F I G U R E S

Figure 1: Taxonomy of dialogue systems 10
Figure 2: Example of a conventional dialogue architecture 12
Figure 3: OMB+ online learning platform 24
Figure 4: Ruled-Based System: Dialogue flow 36
Figure 5: Macro F1 evaluation, NAP – default model. Comparison between German and multilingual BERT 52
Figure 6: Macro F1 test, NAP – default model. Comparison between German and multilingual BERT 53
Figure 7: Macro F1 evaluation, NAP – extended model 54
Figure 8: Macro F1 test, NAP – extended model 55
Figure 9: Macro F1 evaluation, NER – Comparison between German and multilingual BERT 56
Figure 10: Macro F1 test, NER – Comparison between German and multilingual BERT 57
Figure 11: (Down)trend detection: Overall distribution of papers in the entire dataset 73
Figure 12: A trend candidate (positive slope). Topic: Sentiment and Emotion Analysis (using Social Media Channels) 76
Figure 13: A downtrend candidate (negative slope). Topic: XML 76


L I S T O F T A B L E S

Table 1: Natural language representation of a sentence 13
Table 2: Rule 1: Admissible configurations 27
Table 3: Rule 2: Admissible configurations 27
Table 4: Rule 3: Admissible configurations 27
Table 5: Rule 4: Admissible configurations 28
Table 6: Rule 5: Admissible configurations 28
Table 7: Case 1: ID completeness 31
Table 8: Case 2: ID completeness 31
Table 9: Showcase 1: Short flow 33
Table 10: Showcase 2: Organisational question 33
Table 11: Showcase 3: Fallback and human-request policies 34
Table 12: Showcase 4: Contextual question 34
Table 13: Showcase 5: Long flow 35
Table 14: General statistics for conversational dataset 46
Table 15: Detailed statistics on possible system actions in conversational dataset 48
Table 16: Detailed statistics on possible named entities in conversational dataset 48
Table 17: F1-results for the Named Entity Recognition task 51
Table 18: F1-results for the Next Action Prediction task 51
Table 19: Average dialogue accuracy computed for the Next Action Prediction task 51
Table 20: Trend detection: List of gold trends 79
Table 21: Trend detection: Example of MAP evaluation 80
Table 22: Trend detection: Ablation results 82


A C R O N Y M S

AI Artificial Intelligence

API Application Programming Interface

BERT Bidirectional Encoder Representations from Transformers

BiLSTM Bidirectional Long Short Term Memory

CRF Conditional Random Fields

DM Dialogue Manager

DST Dialogue State Tracker

DSTC Dialogue State Tracking Challenge

EC Equivalence Classes

ES ElasticSearch

GM Gaussian Models

IAA Inter Annotator Agreement

ID Informational Dictionary

IPA Intelligent Process Automation

ITS Intelligent Tutoring System

LDA Latent Dirichlet Allocation

LSA Latent Semantic Analysis

LSTM Long Short Term Memory

MAP Mean Average Precision

MER Mutually Exclusive Rules

NAP Next Action Prediction

NER Named Entity Recognition

NLG Natural Language Generation

NLP Natural Language Processing

NLU Natural Language Understanding

OMB+ Online Mathematik Brückenkurs

PL Policy Learning

pLSA probabilistic Latent Semantic Analysis

REST Representational State Transfer

RNN Recurrent Neural Network

RPA Robotic Process Automation

RegEx Regular Expressions

TL Transfer Learning

ToDS Task-oriented Dialogue System


1 I N T R O D U C T I O N

Robotic Process Automation (RPA) is a type of software bot that imitates manual human activities like entering data into a system, logging into accounts, or executing simple but repetitive workflows (Mohanty and Vyas 2018). In other words, RPA-systems allow business applications to work without human involvement by replicating human worker activities. A further benefit of RPA-bots is that they are much cheaper compared to the employment of a human worker, and they are also able to work around the clock. Overall, the advantages of software bots are hugely appealing and include cost savings, accuracy and compliance while executing processes, improved responsiveness (i.e., bots are faster than human workers), and finally, such bots are usually agile and multi-skilled (Ivancic, Vugec, and Vukšic 2019).

However, one of the main benefits, and at the same time drawbacks, of RPA-systems is their ability to precisely fulfill the assigned task. On the one hand, the bot can carry out the task accurately and diligently. On the other hand, it is highly susceptible to changes in defined scenarios. Being designed for a particular task, the RPA-bot is often not adaptable to other domains or even simple changes in a workflow (Mohanty and Vyas 2018). This inability to adapt to changing conditions gave rise to a further area of improvement for RPA-bots: Intelligent Process Automation (IPA) systems.

IPA-bots combine Robotic Process Automation (RPA) with Artificial Intelligence (AI) and thus can perform complex and more cognitively demanding tasks that require reasoning and language understanding. Hence, IPA-bots go beyond automating simple "click tasks" (as in RPA) and can accomplish jobs more intelligently, employing machine learning algorithms and advanced analytics. Such IPA-systems undertake time-consuming and routine tasks and thus enable smart workflows and free up skilled workers to accomplish higher-value activities.

Previously, IPA techniques were primarily employed within the computer vision domain, starting with the automation of simple display click-tasks and going towards sophisticated self-driving cars with their ability to interpret a stream of situational data obtained from sensors and cameras. However, in recent times, Natural Language Processing (NLP) became one of the potential applications for IPA as well, due to its ability to understand and interpret human language. NLP methods are used to analyze large amounts of textual data and to respond to various inquiries.


Furthermore, if the available data is unstructured and does not have any predefined format (i.e., customer emails), or if it is available in a variable format (i.e., invoices, legal documents), then Natural Language Processing (NLP) techniques are applied to extract the relevant information. This data can then be utilized to solve distinct problems (i.e., natural language understanding, predictive analytics) (Mohanty and Vyas 2018). For instance, in the case of predictive analytics, such a bot would make forecasts about the future using current and historical data and therethrough assist with sales or investments.

However, NLP in the context of Intelligent Process Automation (IPA) is not limited to the extraction of relevant data and insights from textual documents.

One of the central applications of NLP within the IPA domain is conversational interfaces (e.g., chatbots) that are used to enable human-to-machine interaction. This type of software is commonly used for customer support or within recommendation and booking systems. Conversational agents gain enormous demand due to their ability to simultaneously support a large number of users while communicating in a natural language. In a conventional chatbot system, a user provides input in a natural language; the bot then processes this query to extract potentially useful information and responds with a relevant reply. This process comprises multiple stages and involves different types of NLP subtasks, starting with Natural Language Understanding (NLU) (e.g., intent recognition, named entity extraction) and going towards dialogue management (i.e., determining the next possible bot action considering the dialogue history) and response generation (e.g., converting the semantic representation of the next bot action to a natural language utterance). A typical dialogue system for Intelligent Process Automation (IPA) purposes usually undertakes shallow customer support requests (e.g., answering of FAQs), allowing human workers to focus on more sophisticated inquiries.
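To make this modular pipeline concrete, the following minimal Python sketch chains the three stages (language understanding, dialogue management, and response generation) for a single support turn. It is purely illustrative: the intents, entities, and templates are hypothetical and do not correspond to the system built in this thesis.

    import re

    # Natural Language Understanding: map an utterance to an intent and entities.
    def understand(utterance):
        intent = "faq_deadline" if re.search(r"\bdeadline\b", utterance, re.I) else "unknown"
        entities = {k.lower(): v for k, v in re.findall(r"\b(course)\s+(\w+)", utterance, re.I)}
        return {"intent": intent, "entities": entities}

    # Dialogue Manager: choose the next system action from the current state.
    def next_action(state):
        return "inform_deadline" if state["intent"] == "faq_deadline" else "handover_to_human"

    # Response Generation: fill a canned template for the chosen action.
    TEMPLATES = {
        "inform_deadline": "The deadline for {course} is listed on the course page.",
        "handover_to_human": "I will forward your question to a human tutor.",
    }

    def respond(utterance):
        state = understand(utterance)
        return TEMPLATES[next_action(state)].format(course=state["entities"].get("course", "your course"))

    print(respond("What is the deadline for course Mathematics?"))

Real systems replace each stage with statistical models (e.g., the BERT-based re-implementation discussed in Chapter 4), but the interface between the stages stays the same.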

Natural Language Processing (NLP) techniques may also be used indirectly for IPA purposes. There are attempts to automatically identify and classify tasks extracted from textual process descriptions as manual, user, or automated. The goal of such an NLP application is to reduce the effort required to identify suitable candidates for further robotic process automation (Friedrich, Mendling, and Puhlmann 2011; Leopold, Aa, and Reijers 2018).

Intelligent Process Automation (IPA) is an emerging technology and evolves mainly due to the recognized potential benefit of combining Robotic Process Automation (RPA) and Artificial Intelligence (AI) techniques to solve industrial problems (Ivancic, Vugec, and Vukšic 2019; Mohanty and Vyas 2018). Industries are typically interested in being responsive to customers and markets (Mohanty and Vyas 2018). Thus, most customer support applications need to be real-time, active, and highly robust to achieve higher client satisfaction rates. The manual accomplishment of such tasks is often time-consuming, expensive, and error-prone. Furthermore, due to the massive amounts of textual data available for analysis, it seems not to be feasible to process this data entirely by human means.


1.1 outline

This thesis covers two topics united by the broader theme, which is Intelligent Process Automation (IPA) utilizing statistical Natural Language Processing (NLP) methods.

the first block of the thesis (Chapter 2 – Chapter 4) addresses the challenge of implementing a conversational agent for Intelligent Process Automation (IPA) purposes within the e-learning field. As mentioned above, chatbots are one of the central applications of the IPA domain, since they can effectively perform time-consuming tasks requiring natural language understanding, allowing human workers to concentrate on more valuable tasks. The IPA conversational bot that was realized within this thesis takes care of routine and time-consuming tasks performed by human tutors within an online mathematical course. This bot is deployed in a real-world setting and interacts with students within the OMB+ mathematical platform. Conducting experiments, we observed two possibilities to build the conversational agent in industrial settings: first, with purely rule-based methods, considering the missing training data and individual aspects of the target domain (i.e., e-learning). Second, we re-implemented two of the main system components (i.e., the Natural Language Understanding (NLU) and Dialogue Manager (DM) units) using the current state-of-the-art deep learning algorithm (i.e., Bidirectional Encoder Representations from Transformers (BERT)) and investigated their performance and potential use as part of a hybrid model (i.e., containing both rule-based and machine learning methods).

the second block of the thesis (Chapter 5 – Chapter 6) considers an Intelligent Process Automation (IPA) problem within the predictive analytics domain and addresses the task of scientific trend detection. Predictive analytics makes forecasts about future outcomes based on historical and current data. Hence, organizations can benefit from advanced analytics models by detecting trends and their behaviors and then employing this information while making crucial business decisions (i.e., investments). In this work, we deal with a subproblem of the trend detection task: specifically, we address the lack of publicly available benchmarks for the evaluation of trend detection algorithms. Therefore, we assembled a benchmark for the detection of both trends and downtrends (i.e., topics that become less frequent over time). To the best of our knowledge, the task of downtrend detection has not been well addressed before. Furthermore, the resulting benchmark is based on a collection of more than one million documents, which is among the largest that have been used for trend detection before, and therefore offers a realistic setting for developing trend detection algorithms.
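As a rough illustration of the trend and downtrend notion used above (a topic whose yearly frequency grows is a trend candidate, one whose frequency shrinks is a downtrend candidate), the sketch below fits a least-squares slope to per-year document counts. It is a simplified stand-in with invented numbers, not the stratification and clustering pipeline described in Chapter 6.

    import numpy as np

    def trend_label(yearly_counts, eps=0.5):
        """Label a topic by the sign of the fitted slope of its per-year document counts."""
        years = np.arange(len(yearly_counts), dtype=float)
        slope, _intercept = np.polyfit(years, np.asarray(yearly_counts, dtype=float), deg=1)
        if slope > eps:
            return "trend candidate"
        if slope < -eps:
            return "downtrend candidate"
        return "flat"

    # Invented per-year counts for two hypothetical topics.
    print(trend_label([12, 18, 25, 40, 61]))   # rising counts  -> trend candidate
    print(trend_label([80, 64, 51, 33, 20]))   # falling counts -> downtrend candidate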


All main chapters contain their specific introduction, contribution list, related work section, and summary.


Part I

E - L E A R N I N G   C O N V E R S A T I O N A L   A S S I S T A N T


2 F O U N D A T I O N S

In this block, we address the challenge of designing and implementing a conversational assistant for Intelligent Process Automation (IPA) purposes within the e-learning domain. The system aims to support human tutors of the Online Mathematik Brückenkurs (OMB+) platform during the tutoring round by undertaking routine and time-consuming responsibilities. Specifically, the system intelligently automates the process of information accumulation and validation that precedes every conversation. Therethrough, the IPA-bot frees up tutors and allows them to concentrate on more complicated and cognitively demanding tasks.

In this chapter, we introduce the fundamental concepts required by the topic of the first part of the thesis. We begin with the foundations of conversational assistants in Section 2.2, where we present the taxonomy of existing dialogue systems in Section 2.2.1, give an overview of the conventional dialogue architecture in Section 2.2.2 with its most essential components, and describe the existing approaches for building a conversational agent in Section 2.2.3. We finally introduce the target domain for our experiments in Section 2.3.

2.1 introduction

Conversational assistants, a type of software that interacts with people via written or spoken natural language, have become widely popularized in recent times. Today's conversational assistants support a broad range of everyday skills, including but not limited to creating a reminder, answering factoid questions, and controlling smart home devices.

Initially, the notion of an artificial companion capable of conversing in a natural language was anticipated by Alan Turing in 1949 (Turing 1949). Still, the first significant step in this direction was made in 1966 with the appearance of Weizenbaum's system ELIZA (Weizenbaum 1966). Its successor was ALICE (Artificial Linguistic Internet Computer Entity), a natural language processing dialogue system that conversed with a human by applying heuristic pattern-matching rules. Despite being remarkably popular and technologically advanced for their days, both systems were unable to pass the Turing test (Alan 1950). Following a long silent period, the next wave of growth of intelligent agents was caused by the advances in Natural Language Processing (NLP).


This was enabled by access to low-cost memory, higher processing speed, vast amounts of available data, and sophisticated machine learning algorithms. Hence, one of the most notable leaps in the history of conversational agents began in the year 2011 with intelligent personal assistants. At that time, Apple's Siri, followed by Microsoft's Cortana and Amazon's Alexa in 2014 and then Google Assistant in 2016, came to the scene. The main focus of these agents was not to mimic humans, as in ELIZA or ALICE, but to assist a human by answering simple factoid questions and setting notification tasks across a broad range of content. Another type of conversational agent, called Task-oriented Dialogue System (ToDS), became a focus of industries in the year 2016. For the most part, these bots aimed to support customer service (Cui et al. 2017) and respond to Frequently Asked Questions (Liao et al. 2018). Their conversational style provided more depth than that of personal assistants, but a narrower range, since they were patterned for task solving and not for open-domain discussions.

One of the central advantages of conversational agents, and the reason why they are attracting considerable interest, is their ability to give attention to several users simultaneously while supporting natural language communication. Due to this, conversational assistants gain momentum in numerous kinds of practical support applications (i.e., customer support) where a high number of users are involved at the same time. Classical engagement of human assistants is very costly, and such support is usually not accessible full-time. Therefore, automating human assistance is desirable, but also an ambitious task. Due to the lack of solid Natural Language Understanding (NLU), advancing beyond primary tasks is still challenging (Luger and Sellen 2016), and even the most prosperous dialogue systems often fail to meet users' expectations (Jain et al. 2018). Significant challenges are involved in the design and implementation of a conversational agent, varying from domain engineering to Natural Language Understanding (NLU) and the design of an adaptive domain-specific conversational flow. The quintessential problems and how-to questions while building conversational assistants include:

• How to deal with variability and flexibility of language?

• How to get high-quality in-domain data?

• How to build meaningful representations?

• How to integrate commonsense and domain knowledge?

• How to build a robust system?

Besides that, current Natural Language Processing (NLP) and Artificial Intelligence (AI) techniques have weaknesses when applied to domain-specific, task-oriented problems, especially if the underlying data comes from an industrial source. As machine learning techniques heavily rely on structured and cleaned training data to deliver consistent performance, typically noisy real-world data without access to additional knowledge databases negatively impacts the classification accuracy.


Despite that, conversational agents could be successfully employed to solve less cognitive but still highly relevant tasks. In this case, a dialogue agent could be operated as a part of an Intelligent Process Automation (IPA) system, where it takes care of straightforward but routine and time-consuming functions performed in a natural language, while allowing a human worker to focus on more cognitively demanding processes.

2.2 conversational assistants

The development and design of a particular dialogue system usually vary with its objectives; among the most fundamental are:

• Purpose: Should it be an open-ended dialogue system (e.g., chit-chat), a task-oriented system (e.g., customer support), or a personal assistant (e.g., Google Assistant)?

• Skills: What kind of behavior should a system demonstrate?

• Data: Which data is required to develop (resp. train) specific blocks, and how could it be assembled and curated?

• Deployment: Will the final system be integrated into a particular software platform? What are the challenges in choosing suitable tools? How will the deployment happen?

• Implementation: Should it be a rule-based, information-retrieval, machine-learning, or a hybrid system?

In the following chapter, we will take a look at the most common taxonomies of dialogue systems and different design and implementation strategies.

2.2.1 Taxonomy

According to their goals, conversational agents can be classified into two main categories: task-oriented systems (e.g., for customer support) and social bots (e.g., for chit-chat purposes). Furthermore, systems also vary regarding their implementation characteristics. Below we provide a detailed overview of the most common variations.

task-oriented dialogue systems are famous for their ability to lead a dialogue with the ultimate goal of solving a user's problem (see Figure 1). A customer support agent or a flight/restaurant booking system exemplifies this class of conversational assistants. Such systems are more meaningful and useful than those with an entirely social goal; however, they usually entail more complications during the implementation stage. Since Task-oriented Dialogue Systems (ToDS) require a precise understanding of the user's intent (i.e., goal), the Natural Language Understanding (NLU) segment must perform flawlessly.


Besides that, if applied in an industrial setting, ToDS are usually built in a modular fashion, which means they mostly rely on handcrafted rules and predefined templates and are restricted in their abilities. Personal assistants (e.g., Google Assistant) are members of the task-oriented family as well. Though, compared to standard agents, they involve more social conversation and can be used for entertainment purposes (e.g., "Ok Google, tell me a joke"). The latter is typically not available in conventional task-oriented systems.

social bots and chit-chat systems, in contrast, regularly do not pursue any specific goal and are designed for small-talk and entertainment conversations (see Figure 1). Those agents are usually end-to-end trainable and highly data-driven, which means they require large amounts of training data. However, because they are trained entirely from data, such systems often produce inappropriate responses, making them unreliable for practical applications. Like chit-chat systems, social bots are rich in social conversation; however, they possess the ability to solve uncomplicated user requests and conduct a more meaningful dialogue (Fang et al., 2018). Those requests are not the same as in task-oriented systems and are mostly aimed at answering simple factoid questions.

Figure 1: Taxonomy of dialogue systems according to the measure of their task accomplishment and social contribution. Figure originates from Fang et al., 2018.


The aforementioned system types can be decomposed further according to their implementation characteristics:

open domain & closed domain: A Task-oriented Dialogue System (ToDS) is typically built within a closed-domain setting. The space of potential inputs and outputs is restricted, since the system attempts to achieve a specific goal. Customer support or shopping assistants are cases of closed-domain problems (Wen et al., 2016). These systems do not need to be able to talk about off-topic subjects, but they have to fulfill their particular tasks as efficiently as possible.

Chit-chat systems, in contrast, do not assume a well-defined goal or intention and thus can be built within an open domain. This means the conversation can progress in any direction, with minimal or no functional purpose (Serban et al., 2016). However, the infinite number of possible topics and the fact that a certain amount of commonsense is required to create reasonable responses make an open-domain setting a hard problem.

content-driven & user-centric: Social bots and chit-chat systems typically follow a content-driven dialogue manner. That means their responses are based on large and dynamic content derived from daily web mining and knowledge graphs. This approach builds a dialogue on trendy content from diverse sources, and thus it is an ideal way for social bots to catch users' attention and draw a user into the dialogue.

User-centric approaches, in turn, assume precise language understanding to detect the sentiment of users' statements. According to the sentiment, the dialogue manager learns users' personalities, handles rapid topic changes, and tracks engagement. In this case, the dialogue builds on specific user interests.

long & short conversations: Models with a short conversation style attempt to create a single response to a single input (e.g., weather forecast). In the case of a long conversation, a system has to go through multiple turns and keep track of what has been said before (i.e., the dialogue history). The longer the conversation, the more challenging it is to train a system. Customer support conversations are typically long conversational threads with multiple questions.

response generation: One of the essential elements of a dialogue system is response generation. This can be done either in a retrieval-based manner or by using generative models (i.e., Natural Language Generation (NLG)).

The former usually utilizes predefined responses and heuristics to select a relevant response based on the input and context. The heuristic could be a rule-based expression match (i.e., Regular Expressions (RegEx)) or an ensemble of machine learning classifiers (e.g., for entity extraction or prediction of the next action). Such systems do not generate any original text; instead, they select a response from a fixed set of predefined candidates. These models are usually used in an industrial setting, where the system must show high robustness, performance, and interpretability of the outcomes.

The latter, in contrast, do not rely on predefined replies. Since such models are typically based on (deep) machine learning techniques, they attempt to generate new responses from scratch. This is, however, a complicated task and is mostly used for vanilla chit-chat systems or in a research setting where structured training data is available.
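To make the retrieval-based variant above more concrete, here is a minimal Python sketch of selecting a canned response via rule-based pattern matching; the patterns and responses are invented for illustration only and are not taken from any system discussed here.

import re

# Hypothetical pattern -> response pairs for a retrieval-based agent.
RESPONSE_CANDIDATES = [
    (re.compile(r"opening hours|open", re.I), "We are open Monday to Friday, 9am to 5pm."),
    (re.compile(r"refund|money back", re.I), "You can request a refund within 14 days."),
]
FALLBACK = "Could you please rephrase your question?"

def select_response(utterance: str) -> str:
    # Return the first predefined response whose pattern matches the input.
    for pattern, response in RESPONSE_CANDIDATES:
        if pattern.search(utterance):
            return response
    return FALLBACK

print(select_response("When are you open?"))  # -> "We are open Monday to Friday, 9am to 5pm."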

2.2.2 Conventional Architecture

In this and the following subsection, we review the pipeline and methods for Task-oriented Dialogue Systems (ToDS). Such agents are typically composed of several components (Figure 2) addressing the diverse tasks required to facilitate a natural language dialogue (Williams, Raux, and Henderson, 2016).

Figure 2: Example of a conventional dialogue architecture with its main components in the bounding box: Language Understanding (NLU), Dialogue Manager (DM), Response Generation (NLG). Figure originates from the NAACL 2018 Tutorial, Su et al., 2018.


A conventional dialogue architecture implements this with the following components:

natural language understanding unit has the following objectives: it parses the user input, classifies the domain (i.e., customer support, restaurant booking; not required for single-domain systems) and the intent (i.e., find-flight, book-taxi), and extracts entities (i.e., Italian food, Berlin).

Generally, the NLU unit maps every user utterance into semantic frames that are predefined according to different scenarios. Table 1 illustrates an example of a sentence representation, where the departure location (Berlin) and the arrival location (Munich) are specified as slot values, and the domain (airline travel) and the intent (find-flight) are specified as well. Typically, there are two kinds of possible representations. The first one is the utterance category, represented in the form of the user's intent. The second is word-level information extraction in the form of Named Entity Recognition (NER) or Slot Filling tasks.

Sentence:        Find | flight | from | Berlin | to | Munich | today
Slots/Concepts:  O    | O      | O    | B-dept | O  | B-arr  | B-date
Named Entity:    O    | O      | O    | B-city | O  | B-city | O
Intent:          Find_Flight
Domain:          Airline Travel

Table 1: An illustrative example of a sentence representation.

A dialogue act classification sub-module, also known as intent classification, is dedicated to detecting the user's primary goal at every dialogue state. It labels each user utterance with one of the predefined intents. Deep neural networks can be directly applied to this conventional classification problem (Khanpour, Guntakandla, and Nielsen, 2016; Lee and Dernoncourt, 2016).

Slot filling is another sub-module for language understanding. Unlike intent detection (a classification problem), slot filling is defined as a sequence labeling problem, where the words in the sequence are labeled with semantic tags. The input is a sequence of words, and the output is a sequence of slots, one for every word. Variations of Recurrent Neural Network (RNN) architectures (LSTM, BiLSTM) (Deoras et al., 2015) and (attention-based) encoder-decoder architectures (Kurata et al., 2016) have been employed for this task.
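As an illustration of slot filling as sequence labeling, the following is a minimal PyTorch sketch of a BiLSTM tagger over the example sentence from Table 1; the toy vocabulary, tag set, and dimensions are chosen for illustration and do not reproduce any of the cited models.

import torch
import torch.nn as nn

# Toy vocabulary and slot tags taken from the Table 1 example (illustrative only).
WORDS = {"find": 0, "flight": 1, "from": 2, "berlin": 3, "to": 4, "munich": 5, "today": 6}
TAGS = ["O", "B-dept", "B-arr", "B-date"]

class BiLSTMSlotTagger(nn.Module):
    def __init__(self, vocab_size, tagset_size, emb_dim=32, hidden_dim=64):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, tagset_size)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) -> per-token tag scores (batch, seq_len, tagset_size)
        embedded = self.embedding(token_ids)
        hidden, _ = self.lstm(embedded)
        return self.classifier(hidden)

model = BiLSTMSlotTagger(len(WORDS), len(TAGS))
sentence = torch.tensor([[WORDS[w] for w in "find flight from berlin to munich today".split()]])
predictions = model(sentence).argmax(dim=-1)     # one slot tag per word (random, since untrained)
print([TAGS[i] for i in predictions[0].tolist()])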


dialogue manager unit is subdivided into two components: the first is a Dialogue State Tracker (DST) that holds the conversational state, and the second is a Policy Learning (PL) component that defines the system's next action.

The Dialogue State Tracker (DST) is the core component for guaranteeing a robust dialogue flow, since it manages the user's intent at every turn. The conventional methods, which are broadly applied in commercial implementations, often adopt handcrafted rules to pick the most probable candidate among the predefined ones (Goddeau et al., 1996; Wang and Lemon, 2013). However, a variety of statistical approaches have emerged since the Dialogue State Tracking Challenge (DSTC, nowadays the Dialog System Technology Challenge) was arranged by Jason D. Williams in 2013. Among the deep-learning based approaches is the Belief Tracker proposed by Henderson, Thomson, and Young (2013). It employs a sliding window to produce a sequence of probability distributions over an arbitrary number of possible values (Chen et al., 2017). In the work of Mrkšić et al. (2016), the authors introduced a Neural Belief Tracker to identify slot-value pairs. As input, the system used the user intents preceding the user input, the user utterance itself, and a candidate slot-value pair which it needs to decide about. Then the system iterated over all candidate slot-value pairs to determine which ones have just been expressed by the user (Chen et al., 2017). Finally, Vodolán, Kadlec, and Kleindienst proposed a Hybrid Dialog State Tracker that achieved state-of-the-art performance on the DSTC2 shared task. Their model used a separately learned decoder coupled with a rule-based system.

Based on the state representation from the Dialogue State Tracker (DST), the PL unit generates the next system action, which is then decoded by the Natural Language Generation (NLG) module. Usually, a rule-based agent is used to initialize the system (Yan et al., 2017), and then supervised learning is applied to the actions generated by these rules. Alternatively, the dialogue policy can be trained end-to-end with reinforcement learning to optimize the system's policies toward the final performance (Cuayáhuitl, Keizer, and Lemon, 2015).

natural language generation unit transforms the next action into a natural language response. Conventional approaches to Natural Language Generation (NLG) typically adopt a template-based style, where a set of rules is defined to map every next action to a predefined natural language response. Such approaches are simple, generally error-free, and easy to control; however, their implementation is time-consuming, and the style is repetitive and hardly scalable.

Among the deep learning techniques is the work by Wen et al. (2015), which introduced neural network-based approaches to NLG with an LSTM-based structure. Later, Zhou and Huang (2016) adopted this model along with the question information, semantic slot values, and dialogue act types to generate more precise answers. Finally, Wu, Socher, and Xiong (2019) proposed Global-to-Local Memory Pointer Networks (GLMP), an architecture that incorporates knowledge bases directly into a learning framework (due to the Memory Networks component) and is able to re-use information from the user input (due to the Pointer Networks component).

In particular cases, the dialogue flow can include speech recognition and speech synthesis units (when operating a spoken natural language system) and connections to third-party APIs to request information stored in external databases (e.g., weather forecast), as depicted in Figure 2.

2.2.3 Existing Approaches

As stated above, a conventional task-oriented dialogue system is implemented in a pipeline manner. That means that once the system receives a user query, it interprets it and acts according to the dialogue state and the corresponding policy. Additionally, based on its understanding, it may access the knowledge base to find the requested information. Finally, the system transforms the system's next action (and, if given, the extracted information) into its surface form as a natural language response (Monostori et al., 1996).

Individual components (i.e., Natural Language Understanding (NLU), Dialogue Manager (DM), Natural Language Generation (NLG)) of a particular conversational system can be implemented using different approaches, starting with entirely rule- and template-based methods and going towards hybrid approaches (using learnable components along with handcrafted units) and end-to-end trainable machine learning methods.

rule-based approaches: Though many of the latest research approaches handle the Natural Language Understanding (NLU) and Natural Language Generation (NLG) units by using statistical Natural Language Processing (NLP) models (Bocklisch et al., 2017; Burtsev et al., 2018; Honnibal and Montani, 2017), most of the industrially deployed dialogue systems still utilize handcrafted features and rules for the state tracking, action prediction, intent recognition, and slot filling tasks (Chen et al., 2017; Ultes et al., 2017). Thus, most of the modules of the PyDial framework (http://www.camdial.org/pydial) offer rule-based implementations using Regular Expressions (RegEx), rule-based dialogue trackers (Henderson, Thomson, and Williams, 2014), and handcrafted policies. Also, in order to map the next action to text, Ultes et al. (2017) suggest rules along with template-based response generation.

The rule-based approach ensures robustness and stable performance, which is crucial for industrial systems that communicate with a large number of users simultaneously. However, it is highly expensive and time-consuming to deploy a real dialogue system built in that manner. The major disadvantage is that the usage of handcrafted systems is restricted to a specific domain, and possible adaptation requires extensive manual engineering.

end-to-end learning approaches: Due to the recent advance of end-to-end neural generative models (Collobert et al., 2011), many efforts have been made to build an end-to-end trainable architecture for conversational agents. Rather than using the traditional pipeline (see Figure 2), the end-to-end model is conceived as a single module (Chen et al., 2017).

Wen et al. (2016), followed by Bordes, Boureau, and Weston (2016), were among the pioneers who adapted an end-to-end trainable approach for conversational systems. Their approach was to treat dialogue learning as the problem of learning a mapping from dialogue histories to system responses and to apply an encoder-decoder model (Sutskever, Vinyals, and Le, 2014) to train the entire system. The proposed model still had significant limitations (Glasmachers, 2017), such as missing policies to learn a dialogue flow and the inability to incorporate additional knowledge, which is especially crucial for training a task-oriented agent. To overcome the first problem, Dhingra et al. (2016) introduced an end-to-end reinforcement learning approach that jointly trains the dialogue state tracker and policy learning units. The second weakness was addressed by Eric and Manning (2017), who augmented existing recurrent network architectures with a differentiable attention-based key-value retrieval mechanism over the entries of a knowledge base. This approach was inspired by Key-Value Memory Networks (Miller et al., 2016). Later on, Wu, Socher, and Xiong (2019) proposed Global-to-Local Memory Pointer Networks (GLMP), where the authors incorporated knowledge bases directly into a learning framework. In their model, a global memory encoder and a local memory decoder are employed to share external knowledge. The encoder encodes the dialogue history, modifies the global contextual representation, and generates a global memory pointer. The decoder first produces a sketch response with empty slots. Next, it uses the global memory pointer to filter the external knowledge for relevant information and then instantiates the slots via the local memory pointers.

Despite having better adaptability compared to any rule-based system and being simpler to train, end-to-end approaches remain unattainable for commercial conversational agents operating on real-world data. A carefully constructed task-oriented dialogue system in an observed domain, using handcrafted rules (in the NLU and DST units) and predefined responses for NLG, still outperforms end-to-end systems due to its robustness (Glasmachers, 2017; Wu, Socher, and Xiong, 2019).

hybrid approaches: Though end-to-end learning is an attractive solution for conversational systems, current techniques are data-intensive and require large amounts of dialogues to learn simple behaviors. To overcome this barrier, Williams, Asadi, and Zweig (2017) introduce Hybrid Code Networks (HCNs), an ensemble of retrieval and trainable units. The system utilizes a Recurrent Neural Network (RNN) for the Natural Language Understanding (NLU) block, domain-specific knowledge that is encoded as software (i.e., API calls), and custom action templates used for the Natural Language Generation (NLG) unit. Compared to existing end-to-end methods, the authors report that their approach considerably reduces the amount of data required for training (Williams, Asadi, and Zweig, 2017). According to the authors, HCNs achieve state-of-the-art performance on the bAbI dialog dataset (Bordes, Boureau, and Weston, 2016) and outperform two commercial customer dialogue systems (internal Microsoft Research datasets in the customer support and troubleshooting domains were employed for training and evaluation).

Along with this work, Wen et al. (2016) propose a neural network-based model that is end-to-end trainable but still modularly connected. In their work, the authors treat dialogue as a sequence-to-sequence mapping problem (Sutskever, Vinyals, and Le, 2014), augmented with the dialogue history and the current database search outcome (Wen et al., 2016). The authors explain the learning process as follows: At every turn, the system takes a user input represented as a sequence of tokens and converts it into two internal representations: a distributed representation of the intent and a probability distribution over slot-value pairs called the belief state (Young et al., 2013). The database operator then picks the most probable values in the belief state to form the search result. At the same time, the intent representation and the belief state are transformed and combined by a Policy Learning (PL) unit to form a single vector representing the next system action. This system action is then used to generate a sketch response, token by token. The final system response is composed by substituting the actual values of the database entries into the sketch sentence structure (Wen et al., 2016).

Hybrid models appear poised to replace the established rule- and template-based approaches currently utilized in an industrial setting. The performance of most Natural Language Understanding (NLU) components (i.e., Named Entity Recognition (NER), domain and intent classification) is high enough to apply them for the development of commercial conversational agents, whereas the Natural Language Generation (NLG) unit can still be realized as a template-based solution by employing predefined response candidates and predicting the next system action with deep learning methods.


2.3 target domain & task definition

MUMIE (https://www.mumie.net) is an open-source e-learning platform for studying and teaching mathematics and computer science. The Online Mathematik Brückenkurs (OMB+, https://www.ombplus.de) is one of the projects powered by MUMIE. Specifically, OMB+ is a German online learning platform that assists students who are preparing for engineering or computer science studies at a university.

The course's central purpose is to assist students in refreshing their mathematical skills so that they can follow the upcoming university courses. The online course is thematically segmented into 13 parts and includes free mathematical classes with theoretical and practical content. Every chapter begins with an overview of its sections and consists of explanatory texts with built-in videos, examples, and quick checks of understanding. Furthermore, various examination modes to practice the content (i.e., training, exercises) and to check the understanding of the theory (i.e., quiz) are available. Besides that, OMB+ provides the possibility to get assistance from a human tutor. Usually, the students and tutors interact via a chat interface in written form. The language of communication is German.

The current dilemma of the OMB+ platform is that the number of students grows significantly every year, but it is challenging to find qualified tutors, and it is expensive to hire more human tutors. This results in a longer waiting period for students until their problems can be considered and processed. In general, all student questions can be grouped into three main categories: organizational questions (i.e., course, certificate), contextual questions (i.e., content, theorem), and mathematical questions (exercises, solutions). To assist a student with a mathematical question, a tutor has to know the following standard information: What kind of topic (or subtopic) does the student have a problem with? Which examination mode (quiz, chapter-level training or exercise, section-level training or exercise, or final examination) is the student working on right now? And finally, the exact question number and the exact problem formulation. This means that a tutor has to ask the same onboarding questions every time a new conversation starts. This is, however, very time-consuming and could be successfully solved within an Intelligent Process Automation (IPA) system.

The work presented in Chapter 3 and Chapter 4 was enabled by the collaboration with the MUMIE and Online Mathematik Brückenkurs (OMB+) team and addresses the abovementioned problem. Within Chapter 3, we implemented an Intelligent Process Automation (IPA) dialogue system with rule- and retrieval-based algorithms that interacts with students and performs the onboarding. This system is multipurpose: First, it eases the assistance process for human tutors by reducing repetitive and time-consuming activities and therefore allows workers to focus on more challenging tasks. Second, by interacting with students, it augments the resources with structured and labeled training data. The latter, in turn, allowed us to re-train specific system components utilizing machine and deep learning methods. We describe this procedure in Chapter 4.

We report that the earlier collected human-to-human data from the OMB+ platform could not be used for training a machine learning system, since it is highly unstructured and contains no direct question-answer matching, which is crucial for trainable conversational algorithms. We analyzed these conversations to gain insights from them, which we used to define the dialogue flow.

We also do not consider the automation of mathematical questions in this work, since this task would require not only precise language understanding and extensive commonsense knowledge but also the accurate perception of mathematical notation. The tutoring process at OMB+ differs considerably from standard tutoring methods: here, instructors attempt to guide students by giving hints and tips instead of providing an out-of-the-box solution. Existing human-to-human dialogues revealed that this kind of task can be challenging even for human tutors, since it is not always directly perceptible what precisely the student does not understand or has difficulties with. Furthermore, dialogues containing mathematical explanations can last a long time, and the topic (i.e., intent and final goal) can shift multiple times during the conversation.

To our knowledge, this task would not be feasible to solve sufficiently with current machine learning algorithms, especially considering the missing training data. A rule-based system would not be the right solution for this purpose either, due to the large number of complex rules which would need to be defined. Thus, in this work, we focus on the automation of less cognitively demanding tasks, which are still relevant and must be solved in the first instance to ease the tutoring process for instructors.

2.4 summary

Conversational agents are a highly demanded solution for industries due to their potential ability to support a large number of users in parallel while performing a flawless natural language conversation. Many messenger platforms are created, and custom chatbots in various constellations emerge daily. Unfortunately, most of the current conversational agents are often disappointing to use due to the lack of substantial natural language understanding and reasoning. Furthermore, Natural Language Processing (NLP) and Artificial Intelligence (AI) techniques have limitations when applied to domain-specific and task-oriented problems, especially if no structured training data is available.

Despite that, conversational agents can be successfully applied to solve less cognitive but still highly relevant tasks, that is, within the Intelligent Process Automation (IPA) domain. Such IPA-bots would undertake tedious and time-consuming responsibilities to free up the knowledge worker for more complex and intricate tasks. IPA-bots can be seen as members of the goal-oriented dialogue family and are, for the most part, implemented either in a rule-based manner or through hybrid models when it comes to a real-world setting.


3 multipurpose ipa via conversational assistant

This chapter partially covers work already published at international peer-reviewed conferences. The relevant publication is Moiseeva et al., 2020. The research described in this chapter was carried out in its entirety by the author of this thesis. The other authors of the publication acted as advisers. At the time of writing this thesis, the system described in this work was deployed at the Online Mathematik Brückenkurs (OMB+) platform.

As we outlined in the introduction, Intelligent Process Automation (IPA) is an emerging technology with the primary goal of assisting the knowledge worker by taking care of repetitive, routine, and low-cognitive tasks. Conversational assistants that interact with users in a natural language are a potential application for the IPA domain. Such smart virtual agents can support the user by responding to various inquiries and performing routine tasks carried out in a natural language (i.e., customer support).

In this work, we tackle the challenge of implementing an IPA conversational agent in a real-world industrial context under conditions of missing training data. Our system has two meaningful benefits: First, it takes over monotonous and time-consuming tasks and hence lets workers concentrate on more intellectual processes. Second, by interacting with users, it augments the resources with structured and labeled training data. The latter, in turn, can be used to retrain the system utilizing machine and deep learning methods.

3.1 introduction

Conversational dialogue systems can give attention to several users simultaneously while supporting natural communication. They are thus exceptionally needed for practical assistance applications where a high number of users are involved at the same time. Typically, dialogue agents require a great deal of natural language understanding and commonsense knowledge to hold a conversation reasonably and intelligently. Therefore, it is nowadays still challenging to completely substitute human assistance with an intelligent agent. However, a dialogue system can be successfully employed as a part of an Intelligent Process Automation (IPA) application. In this case, it would take care of simple but routine and time-consuming tasks performed in a natural language, while allowing a human worker to focus on more cognitively demanding processes.

Recent research in the dialogue generation domain is conducted by employing Artificial Intelligence (AI) techniques like machine and deep learning (Lowe et al., 2017; Wen et al., 2016). However, those methods require high-quality data for training, which is often not directly available for domain-specific and industrial problems. Even though companies gather and store vast volumes of data, it is often of low quality with regard to machine learning approaches (Daniel, 2015), especially if it concerns dialogue data, which has to be properly structured as well as annotated. If trained on artificially created datasets (which is often the case in a research setting), the systems can solve only specific problems and are hardly adaptable to other domains. Hence, despite the popularity of deep learning end-to-end models, one still needs to rely on traditional pipelines in practical dialogue engineering, mainly while introducing a new domain.

3.1.1 Outline and Contributions

This work addresses the challenge of implementing a dialogue system for IPA purposes within the practical e-learning domain under conditions of missing training data. The chapter is structured as follows: In Section 3.2, we introduce our method and describe the design of the rule-based dialogue architecture with its main components, such as the Natural Language Understanding (NLU), Dialogue Manager (DM), and Natural Language Generation (NLG) units. In Section 3.3, we present the results of our dual-focused evaluation, and Section 3.4 describes the format of the collected structured data. Finally, we conclude in Section 3.5.

Our contributions within this work are as follows:

• We implemented a robust conversational system for IPA purposes within a practical e-learning domain and under the conditions of missing training (i.e., dialogue) data.

• The system has two main objectives:

  – First, it reduces repetitive and time-consuming activities and allows workers of the e-learning platform to focus solely on mathematical inquiries.

  – Second, by interacting with users, it augments the resources with structured and partially labeled training data for further implementation of trainable dialogue components.


3.2 model

The central purpose of the introduced system is to interact with students at the beginning of every chat conversation and gather information on the topic (and sub-topic), examination mode and level, question number, and exact problem formulation. Such an onboarding process saves time for human tutors and allows them to handle mathematical questions exclusively. Furthermore, the system is realized in such a way that it accumulates labeled and structured dialogues in the background.

Figure 4 displays the complete conversation flow. After the system accepts user input, it analyzes it at different levels and extracts information if such is provided. If some of the information required for tutoring is missing, the system asks the student to provide it. When all the information points are collected, they are automatically validated and forwarded to a human tutor. The latter can then directly proceed with the assistance process. In the following, we explain the key components of the system.

3.2.1 OMB+ Design

Figure 3 illustrates the internal structure and design of the OMB+ platform. It has topics and sub-topics, as well as four different examination modes. Each topic (Figure 3, tag 1) corresponds to a chapter level and always has sub-topics (Figure 3, tag 2) that correspond to a section level. The examination modes training and exercise are ambiguous, because they correspond to either a chapter (Figure 3, tag 3) or a section (Figure 3, tag 5) level, and it is important to differentiate between them, since they contain different types of content. The mode final examination (Figure 3, tag 4) always corresponds to a chapter level, whereas quiz (Figure 3, tag 5) can belong only to a section level. According to the design of the OMB+ platform, there are several ways in which a possible dialogue flow can proceed.

3.2.2 Preprocessing

In a natural language conversation, one may respond in various forms, making the extraction of data from user-generated text challenging due to potential misspellings or confusable spellings (e.g., Aufgabe 1a, Aufgabe 1 (a)). Hence, to facilitate a reliable recognition of intents and extraction of entities, we normalize (e.g., correct misspellings, detect synonyms) and preprocess every user input before moving forward to the Natural Language Understanding (NLU) module.


Figure 3: OMB+ Online Learning Platform, where 1 is the Topic (corresponds to a chapter level), 2 is a Sub-Topic (corresponds to a section level), 3 is the chapter-level examination mode, 4 is the Final Examination Mode (available only at chapter level), 5 are the Examination Modes Exercise, Training (available at section level) and Quiz (available only at section level), and 6 is the OMB+ Chat.

We perform both linguistic and domain-specific preprocessing, which includes the following steps (a minimal sketch of these steps is given after the list):

• lowercasing and stemming of the words in the input query;

• removal of stop words and punctuation;

• all mentions of X in mathematical formulations are removed to avoid confusion with the Roman numeral 10 ("X");

• in a combination of the type word "Kapitel/Aufgabe" + digit written as a word (i.e., "eins", "zweites", etc.), the word is replaced with a digit ("Kapitel eins" → "Kapitel 1", "im fünften Kapitel" → "im Kapitel 5"); Roman numerals are replaced with a digit as well ("Kapitel IV" → "Kapitel 4");

• detected ambiguities are normalized (e.g., "Trainingsaufgabe" → "Training");

• recognized misspellings resp. typing errors are corrected (e.g., "Difeernzialrechnung" → "Differentialrechnung");

• permalinks are parsed and analyzed; from each permalink, it is possible to extract the topic, examination mode, and question number.
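The sketch below illustrates these preprocessing steps in Python; the replacement tables are invented, much smaller than the curated dictionaries of the deployed system, and stemming is omitted.

import re

NUMBER_WORDS = {"eins": "1", "zwei": "2", "fünften": "5"}             # digit words -> digits
ROMAN_NUMERALS = {"ii": "2", "iii": "3", "iv": "4", "v": "5"}          # Roman numerals -> digits
NORMALIZATIONS = {"trainingsaufgabe": "training",                      # ambiguities and misspellings
                  "difeernzialrechnung": "differentialrechnung"}

def preprocess(query: str) -> str:
    text = query.lower()
    text = re.sub(r"[^\w\s]", " ", text)          # drop punctuation (word characters incl. umlauts stay)
    tokens = []
    for token in text.split():
        token = NORMALIZATIONS.get(token, token)
        token = NUMBER_WORDS.get(token, token)
        token = ROMAN_NUMERALS.get(token, token)
        tokens.append(token)
    return " ".join(tokens)

print(preprocess("Im fünften Kapitel, Trainingsaufgabe IV"))  # -> "im 5 kapitel training 4"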


3.2.3 Natural Language Understanding

We implement the Natural Language Understanding (NLU) unit utilizing handcrafted rules, Regular Expressions (RegEx), and the Elasticsearch API (https://www.elastic.co/products/elasticsearch).

intent classification: According to the design of the OMB+ platform, all student questions can be classified into three categories: organizational questions (i.e., course certificate), contextual questions (i.e., content, theorem), and mathematical questions (exercises, solutions). To classify the input query by its intent (i.e., category), we employ weighted keyword information in the form of handcrafted rules. We assume that specific words are explicitly associated with a corresponding intent (e.g., theorem or root denote a mathematical question; feedback or certificate denote an organizational inquiry). If no intent can be assigned, then it is assumed that the NLU unit was not capable of understanding, and the intent is unknown. In this case, the virtual agent requests the user to provide an intent manually by picking one from the mentioned three options. The queries from the organizational and theoretical categories are directly handed over to a human tutor, while mathematical questions are analyzed by the automated agent for further information extraction.
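A minimal sketch of such a keyword-weighted intent classifier in Python could look as follows; the keyword lists, weights, and threshold are invented for illustration and do not reproduce the deployed rules.

# Hypothetical keyword weights per intent category.
INTENT_KEYWORDS = {
    "math": {"aufgabe": 1.0, "theorem": 0.8, "wurzel": 0.8, "quiz": 0.6},
    "organizational": {"zertifikat": 1.0, "feedback": 0.8, "anmeldung": 0.8},
    "contextual": {"definition": 0.8, "regel": 0.6, "beispiel": 0.5},
}

def classify_intent(tokens, threshold=0.5):
    # Sum the keyword weights per intent; return "unknown" if no score reaches the threshold.
    scores = {
        intent: sum(weight for keyword, weight in keywords.items() if keyword in tokens)
        for intent, keywords in INTENT_KEYWORDS.items()
    }
    best_intent, best_score = max(scores.items(), key=lambda item: item[1])
    return best_intent if best_score >= threshold else "unknown"

print(classify_intent("wie bekomme ich ein zertifikat".split()))  # -> "organizational"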

entity extraction: In the next step, the system retrieves the entities from a user message. We specify the following five prerequisite entities, which have to be collected before the dialogue can be handed over to a human tutor: topic, sub-topic, examination mode and level, and question number. This part is implemented using ElasticSearch (ES) and RegEx. To facilitate the use of ES, we indexed the OMB+ web page into an internal database. Besides indexing the titles of topics and sub-topics, we also provided supplementary information on possible synonyms and writing styles. We additionally filed OMB+ permalinks, which point to the site pages. To query the resulting database, we employ the internal Elasticsearch multi_match function and set the minimum_should_match parameter to 20. This parameter specifies the number of terms that must match for a document to be considered relevant. Besides that, we adopted fuzziness with the maximum edit distance set to 2 characters. The fuzzy query uses similarity based on the Levenshtein edit distance (Levenshtein, 1966). Finally, the system produces a ranked list of potential matching entries found in the database within the predefined relevance threshold (we set it to θ = 15). We then pick the most probable entry as the right one and select the corresponding entity from the user input.
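As an illustration, a query against such an index using the Python Elasticsearch client could be sketched roughly as follows; the index name, field names, and parameter values are assumptions for illustration and not the exact configuration of the deployed system.

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
RELEVANCE_THRESHOLD = 1.5  # assumed value; stands in for the relevance threshold theta

def find_topic_entity(user_text, dialogue_history=""):
    # Fuzzy multi_match over (hypothetical) title and synonym fields of the indexed OMB+ pages.
    query = {
        "multi_match": {
            "query": f"{dialogue_history} {user_text}".strip(),
            "fields": ["title", "synonyms"],
            "fuzziness": 2,                     # maximum Levenshtein edit distance
            "minimum_should_match": "20%",      # assumed setting of the parameter
        }
    }
    result = es.search(index="omb_topics", body={"query": query})  # 7.x-style client call
    hits = result["hits"]["hits"]
    if hits and hits[0]["_score"] >= RELEVANCE_THRESHOLD:
        return hits[0]["_source"]["title"]      # most probable matching entry
    return None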

Overall, the Natural Language Understanding (NLU) module receives the user input as preprocessed text and examines it against all predefined RegEx statements and for a match in the ElasticSearch (ES) database. We use ES to retrieve the entities for the topic, sub-topic, and examination mode, whereas RegEx is used to extract the information on the question number. Every time an entity is extracted, it is filled into the Informational Dictionary (ID). The ID has the following six slots to be filled in: topic, sub-topic, examination level, examination mode, question number, and exact problem formulation (this last piece of information is requested in further steps).

3.2.4 Dialogue Manager

A conventional Dialogue Manager (DM) consists of two interacting components: the first component is the Dialogue State Tracker (DST), which maintains a representation of the current conversational state, and the second is the Policy Learning (PL), which defines the next system action (i.e., response).

In our model, every next action of the agent is determined by the state of the previously obtained information accumulated in the Informational Dictionary (ID). For instance, if the system recognizes that the student works on the final examination, it also knows (defined by hand-coded logic in the predefined rules) that there is no necessity to ask for a sub-topic, because the final examination always corresponds to a chapter level (due to the layout of the OMB+ platform). If the system recognizes that the user struggles with solving a quiz, it has to ask for both the corresponding topic and sub-topic, because a quiz always relates to a section level.

To discover all of the potential conversational flows, we implement Mutually Exclusive Rules (MER), which indicate that two events e1 and e2 are mutually exclusive or disjoint, since they cannot both occur at the same time (i.e., the intersection of these events is empty: P(e1 ∩ e2) = 0). Additionally, we defined transition and mapping rules. Hence, only particular entities can co-occur, while others must be excluded from the specific rule combination. Assume a topic t = Geometry. According to the MER, this topic can co-occur with one of the pertaining sub-topics defined at the OMB+ (e.g., s = Angle). Thus, the level of the examination would be l = Section (but l ≠ Chapter), because the student already reported a sub-section he or she is working on. In this specific case, the examination mode could be e ∈ [Training, Exercise, Quiz], but not e = Final_Examination. This is because e = Final_Examination can co-occur only with a topic (chapter level) but not with a sub-topic (section level). The question number q can be arbitrary and has to co-occur with every other combination.


This allows a formal solution to be found. Assume a list of all theoretically possible dialogue states S = [topic, sub-topic, training, exercise, chapter level, section level, quiz, final examination, question number], where for each element s_n in S it holds that:

s_n = 1 if the information is given, and s_n = 0 otherwise.   (1)

This gives all general (resp. possible) dialogue states without reference to the design of the OMB+ platform. However, to make the dialogue states fully suitable for the OMB+, we select from the general states only those which are valid. To define the validity of a state, we specify the following five MERs:

R1: Topic
¬T
T

Table 2: Rule 1 – Admissible topic configurations.

Rule (R1) in Table 2 denotes the admissible configurations for the topic and means that the topic can either be given (T) or not (¬T).

R2: Examination Mode
¬TR  ¬E  ¬Q  ¬FE
 TR  ¬E  ¬Q  ¬FE
¬TR   E  ¬Q  ¬FE
¬TR  ¬E   Q  ¬FE
¬TR  ¬E  ¬Q   FE

Table 3: Rule 2 – Admissible examination mode configurations.

Rule (R2) in Table 3 denotes that either no information on the examination mode is given, or the examination mode is Training (TR), Exercise (E), Quiz (Q), or Final Examination (FE), but not more than one mode at the same time.

R3: Level
¬CL  ¬SL
 CL  ¬SL
¬CL   SL

Table 4: Rule 3 – Admissible level configurations.

Rule (R3) in Table 4 indicates that either no level information is provided, or the level corresponds to the chapter level (CL) or to the section level (SL), but not to both at the same time.


R4: Examination & Level
¬TR  ¬E  ¬SL  ¬CL
 TR  ¬E   SL  ¬CL
 TR  ¬E  ¬SL   CL
 TR  ¬E  ¬SL  ¬CL
¬TR   E   SL  ¬CL
¬TR   E  ¬SL   CL
¬TR   E  ¬SL  ¬CL

Table 5: Rule 4 – Admissible examination mode (only for Training and Exercise) and corresponding level configurations.

Rule (R4) in Table 5 means that the Training (TR) and Exercise (E) examination modes can either belong to the chapter level (CL) or to the section level (SL), but not to both at the same time.

R5: Topic & Sub-Topic
¬T  ¬ST
 T  ¬ST
¬T   ST
 T   ST

Table 6: Rule 5 – Admissible topic and corresponding sub-topic configurations.

Rule (R5) in Table 6 symbolizes that we can be given either only a topic (T), or the combination of a topic and a sub-topic (ST) at the same time, or only a sub-topic, or no information on this point at all.

We then define a valid dialogue state as a dialogue state that meets all requirements of the abovementioned rules:

∀s (State(s) ∧ R1−5(s)) → Valid(s)   (2)

Once we have the valid dialogue states, we perform a mapping from each valid dialogue state to the next possible system action. For that, we first define five transition rules. The order of the transition rules is important:

T1 = ¬T   (3)

means that no topic (T) is found in the ID (i.e., it could not be extracted from the user input).

T2 = ¬EM   (4)

indicates that no examination mode (EM) is found in the ID.

T3 = EM, where EM ∈ [TR, E]   (5)

denotes that the extracted examination mode (EM) is either Training (TR) or Exercise (E).

T4 = {(T, ¬ST, TR, SL), (T, ¬ST, E, SL), (T, ¬ST, Q)}   (6)

means that no sub-topic (ST) is provided by the user, but the ID either already contains the combination of topic (T), training (TR), and section level (SL), or the combination of topic, exercise (E), and section level, or the combination of topic and quiz (Q).

T5 = ¬QNR   (7)

indicates that no question number (QNR) was provided by the student (or it could not be successfully extracted).

Finally, we assume the following list of possible next actions for the system:

A = {Ask for Topic, Ask for Examination Mode, Ask for Level, Ask for Sub-Topic, Ask for Question Nr}   (8)

Following the transition rules, we map each valid dialogue state to the possible next action a_m in A:

∃a ∀s (Valid(s) ∧ T1(s)) → a(s, T)   (9)

in the case where no topic is provided, the next action is to ask for the topic (T).

∃a ∀s (Valid(s) ∧ T2(s)) → a(s, EM)   (10)

if no examination mode is provided by the user (or it could not be successfully extracted from the user query), the next action is defined as ask for examination mode (EM).

∃a ∀s (Valid(s) ∧ T3(s)) → a(s, L)   (11)

in the case where we know that the examination mode EM ∈ [Training, Exercise], we have to ask about the level (i.e., training at chapter level or training at section level); thus the next action is ask for level (L).

∃a ∀s (Valid(s) ∧ T4(s)) → a(s, ST)   (12)

if no sub-topic is provided, but the examination mode EM ∈ [Training, Exercise] at section level, the next action is defined as ask for sub-topic (ST).

∃a ∀s (Valid(s) ∧ T5(s)) → a(s, QNR)   (13)

if no question number is provided by the user, then the next action is ask for question number (QNR).

Following this scheme, we generate 56 state transitions, which specify the next system actions. Upon entering a new conversation state, we compare the extracted (or updated) data in the Informational Dictionary (ID) with the valid dialogue states and select the mapped action as the system's next action.
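A minimal Python sketch of this state-to-action mapping, assuming the ID is a dictionary with None for missing slots, could look roughly as follows; it mirrors the ordering of the transition rules T1 to T5 but is not the exact rule table of the deployed system.

def next_action(info_dict):
    # info_dict holds the ID slots; None marks missing information.
    topic = info_dict.get("topic")
    sub_topic = info_dict.get("sub_topic")
    exam_mode = info_dict.get("exam_mode")
    level = info_dict.get("level")
    question_nr = info_dict.get("question_nr")

    if topic is None:                                              # T1: no topic given
        return "Ask for Topic"
    if exam_mode is None:                                          # T2: no examination mode
        return "Ask for Examination Mode"
    if exam_mode in ("Training", "Exercise") and level is None:    # T3: Training/Exercise, level unknown
        return "Ask for Level"
    needs_sub_topic = exam_mode == "Quiz" or (
        exam_mode in ("Training", "Exercise") and level == "Section")
    if needs_sub_topic and sub_topic is None:                      # T4: section-level mode, no sub-topic
        return "Ask for Sub-Topic"
    if question_nr is None:                                        # T5: no question number
        return "Ask for Question Nr"
    return None  # ID complete; proceed to the verification step

state = {"topic": "Geometrie", "exam_mode": "Quiz", "sub_topic": None, "question_nr": None}
print(next_action(state))  # -> "Ask for Sub-Topic"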

flexibility of dst: The DST transitions outlined above are meant to maintain the current configuration of the OMB+ learning platform. Still, further MERs could be effortlessly generated to create new transitions. To examine this, we performed tests with the previous design of the platform, where all the topics except for the first one had sub-topics, whereas the first topic contained both sub-topics and sub-sub-topics. We could produce the missing transitions with the described approach. The number of permissible transitions in this case increased from 56 to 117.

3.2.5 Meta Policy

Since the proposed system is expected to operate in a real-world setting, we had to develop additional policies that manage the dialogue flow and verify the system's accuracy. We explain these policies below.

completeness of informational dictionary: The model automatically verifies the completeness of the Informational Dictionary (ID), which is determined by the number of essential slots filled in the informational dictionary. There are six discrete cases in which the ID is deemed to be complete. For example, if a user works on the final examination mode, the agent does not have to request a sub-topic or examination level. Hence, the ID has to be filled only with data for the topic, examination type, and question number. Whereas, if the user works on a training mode, the system has to collect information about the topic, sub-topic, examination level, examination mode, and the question number. Once the ID is complete, it is passed on to the verification step; otherwise, the agent proceeds according to the next action. The system extracts information in each dialogue state, and therefore, if the user gives updated information on any subject later in the dialogue history, the corresponding slot will be updated in the ID.

Below are examples of two of the six cases in which the Informational Dictionary (ID) is considered to be complete; a minimal sketch of the corresponding check follows Table 8.

Case1 = {intent = Math, topic ≠ None, exam mode = Final Examination, level = None, question nr ≠ None}   (14)

Table 7: Case 1 – Any topic; examination mode is final examination; examination level does not matter; any question number.

Case2 = {intent = Math, topic ≠ None, sub-topic ≠ None, exam mode ∈ [Training, Exercise], level = Section, question nr ≠ None}   (15)

Table 8: Case 2 – Any topic; any related sub-topic; examination mode is either training or exercise; examination level is section; any question number.
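A minimal Python sketch of the completeness check for the two example cases above might look as follows; the slot names are assumptions for illustration.

def id_is_complete(info_dict):
    # Covers only the two cases from Tables 7 and 8; the real system defines six such cases.
    if info_dict.get("intent") != "Math":
        return False
    has_topic = info_dict.get("topic") is not None
    has_question = info_dict.get("question_nr") is not None

    # Case 1: final examination -> topic and question number suffice.
    if info_dict.get("exam_mode") == "Final Examination":
        return has_topic and has_question

    # Case 2: training or exercise at section level -> a sub-topic is required as well.
    if info_dict.get("exam_mode") in ("Training", "Exercise") and info_dict.get("level") == "Section":
        return has_topic and has_question and info_dict.get("sub_topic") is not None

    return False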

verification step: After the system has collected all the essential data (i.e., the ID is complete), it continues to the final verification step. Here, the obtained data is presented to the current student in the dialogue session. The student is asked to review the correctness of the data and, if any entries are faulty, to correct them. The Informational Dictionary (ID) is, where necessary, updated with the user-provided data. This procedure repeats until the user confirms the correctness of the compiled data.

fallback policy: In cases where the system fails to retrieve information from a student's query, it re-asks the user and attempts to retrieve the information from a follow-up response. The maximum number of re-ask trials is a parameter and is set to three (r = 3). If the system is still incapable of extracting the information, the user input is considered as the ground truth and filled into the appropriate slot in the ID. An exception to this rule applies where the user has to specify the intent manually: after three unclassified trials, the dialogue session is directly handed over to a human tutor. (A minimal sketch of this re-ask loop is given at the end of this list.)


human request: In every dialogue state, a user can switch to a human tutor. For this, the user can enter the "human" (or "Mensch" in German) keyword. Therefore, every user message is checked for the presence of this keyword.
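The following is a minimal sketch of the fallback re-ask loop described above, assuming hypothetical ask_user() and extract() helpers; it is meant only to illustrate the policy, not the deployed implementation.

MAX_RETRIES = 3  # maximum number of re-ask trials (r = 3)

def fill_slot_with_fallback(slot, ask_user, extract):
    # ask_user(slot) prompts the student and returns the raw reply;
    # extract(slot, reply) tries to parse the slot value and returns None on failure.
    last_reply = ""
    for _ in range(MAX_RETRIES):
        last_reply = ask_user(slot)
        value = extract(slot, last_reply)
        if value is not None:
            return value
    # After three failed trials, the raw user input is taken as the ground truth for this slot.
    return last_reply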

3.2.6 Response Generation

The semantic representation of the system's next action is transformed into a natural language utterance. Hence, each possible action is mapped to precisely one response that is predefined in a template. Some of the responses are fixed (i.e., "Welches Kapitel bearbeitest du gerade?"2). Others have placeholders for custom values. In the latter case, the utterance can be formulated depending on the Informational Dictionary (ID).

Assume the predefined response "Deine Angaben waren wie folgt: a) Kapitel: K, b) Abschnitt: A, c) Aufgabentyp: M, d) Aufgabe: Q, e) Ebene: E. Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein."3 The slots [K, A, M, Q, E] are the placeholders for the information stored in the ID. During the conversation, the system selects the template according to the next action and fills the slots with the extracted values, if required. After the extracted information is filled into the corresponding placeholders in the utterance, the final response could be as follows: "Deine Angaben waren wie folgt: a) Kapitel: I Elementares Rechnen, b) Abschnitt: Rechenregel für Potenzen, c) Aufgabentyp: Quiz, d) Aufgabe: 1(a), e) Ebene: Abschnitt. Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein."4
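A minimal Python sketch of this template-based response generation, with templates that approximate the German examples above, could be:

# Action -> template mapping; placeholders are filled with values from the ID.
TEMPLATES = {
    "Ask for Topic": "Welches Kapitel bearbeitest du gerade?",
    "Final Statement": (
        "Deine Angaben waren wie folgt: a) Kapitel: {topic}, b) Abschnitt: {sub_topic}, "
        "c) Aufgabentyp: {exam_mode}, d) Aufgabe: {question_nr}, e) Ebene: {level}. "
        "Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein."
    ),
}

def generate_response(action, info_dict):
    # Select the template for the next action and substitute the ID values.
    return TEMPLATES[action].format(**info_dict)

info = {"topic": "I Elementares Rechnen", "sub_topic": "Rechenregel für Potenzen",
        "exam_mode": "Quiz", "question_nr": "1(a)", "level": "Abschnitt"}
print(generate_response("Final Statement", info))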

Below, we demonstrate five sample dialogues in the original German language with varying flows:

• a short dialogue where certain information is already provided in the first student input;

• an organizational question case;

• a contextual question case;

• an example with manual intent classification;

• an example with the fallback and human request policies;

• a long flow with entry correction at the final verification phase.

2 Translates to: "Which chapter are you currently working on?"
3 Translates to: "Your details were as follows: a) Chapter, b) Section, c) Examination Mode, d) Question Number, e) Level. Did I understand you correctly? Please answer yes or no."
4 Translates to: "Your details were as follows: a) Elementary Calculus, b) Power Laws, c) Quiz, d) Question 1 (a), e) Section Level. Did I understand you correctly? Please answer yes or no."


Student: Hallo, ich habe Schwierigkeiten beim Kapitel I Elementares Rechnen mit der Übungsaufgabe 1 a.

Next-Action: [Ask for a Level: Chapter or Section]
Bot: Geht es um eine Aufgabe auf Kapitel-Ebene (z.B. um eine Trainingsaufgabe im Kapitel "Geometrie") oder um eine Aufgabe auf der Ebene eines Abschnitts in einem Kapitel (z.B. um eine Aufgabe zum Abschnitt "Winkel" im Kapitel "Geometrie")? Antworte bitte mit KAP für Kapitel und SEK für Abschnitt.

Student: Kapitel

Next-Action: [Final Statement]
Bot: Deine Angaben waren wie folgt:
a) Kapitel: I Elementares Rechnen
b) Aufgabentyp: Exercise
c) Aufgabe: 1a
d) Ebene: Kapitel
Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein.

Student: yep

Next-Action: [Human Hand-Over]
Bot: Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir.

Table 9: Showcase 1 – Short flow; most of the information was extracted from the initial user input. In bold are the extracted entities.

Student: Hallo, ich habe eine Frage bzgl. eines Zertifikates, und zwar wie kann ich es ausstellen lassen?

Next-Action: [Organisational Question]
Bot: Es scheint sich wohl um eine organisatorische Frage zu handeln. Bitte fasse deine Frage in einem kurzen Text zusammen, damit ich sie an meinen menschlichen Kollegen weiterleiten kann.

Student: Wie bekomme ich ein Zertifikat?

Next-Action: [Human Hand-Over]
Bot: Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir.

Table 10: Showcase 2 – Organisational Question. Underlined are the keywords which point to the organisational intent.


Student: hallo

Next-Action: [UNK]
Bot: Hast du eine Frage zu einer Aufgabe (MATH), zu einem Text im Kurs (TEXT) oder eine organisatorische Frage (ORG)?

Student: ma

Next-Action: [UNK]
Bot: Hast du eine Frage zu einer Aufgabe (MATH), zu einem Text im Kurs (TEXT) oder eine organisatorische Frage (ORG)?

Student: math

Next-Action: [Ask for Topic]
Bot: Welches Kapitel bearbeitest du gerade?

Student: Lineare Gleichungssysteme

Next-Action: [Ask for Examination Mode]
Bot: Bearbeitest du eine Übungsaufgabe, eine Trainingsaufgabe, einen Quiz oder eine Schlussprüfung?

Student: human

Next-Action: [Human Hand-Over]
Bot: Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir.

Table 11: Showcase 3 – Manual intent classification, Fallback and Human Request policies (in bold type).

Student: Halli-Hallo, könnte mir bitte jemand die Regel 72 (vertikale Streckung und Stauchung des Graphen) erklären? Danke im Voraus.

Next-Action: [Contextual Question]
Bot: Es scheint sich wohl um eine inhaltliche Frage zu handeln. Bitte fasse deine Frage in einem kurzen Text zusammen, damit ich sie an meinen menschlichen Kollegen weiterleiten kann.

Student: Ich habe Schwierigkeiten beim Verstehen der Regel 72. Was ist genau die vertikale Streckung des Graphen?

Next-Action: [Human Hand-Over]
Bot: Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir.

Table 12: Showcase 4 – Contextual Question. Underlined are the keywords which point to the contextual intent.


Student: Hallo, ich kann beim Kapitel 1 nicht weiterkommen, Trainingsaufgabe 1 (a).

Next-Action: [Ask for Level]
Bot: Geht es um eine Aufgabe auf Kapitel-Ebene (z.B. um eine Trainingsaufgabe im Kapitel Geometrie) oder um eine Aufgabe auf der Ebene eines Abschnitts in einem Kapitel (z.B. um eine Aufgabe zum Abschnitt Winkel im Kapitel Geometrie)? Antworte bitte mit KAP für Kapitel und SEK für Abschnitt.

Student: ich glaube es ist eine Sektion

Next-Action: [Ask for Sub-Topic]
Bot: Wie heißt der Abschnitt, in dem sich die Aufgabe befindet, auf die sich deine Frage bezieht?

Student: ich bearbeite rechenregeln und potenzen

Next-Action: [Final Statement]
Bot: Deine Angaben waren wie folgt:
a) Kapitel: I Elementares Rechnen
b) Abschnitt: Rechenregel für Potenzen
c) Aufgabentyp: Quiz
d) Aufgabe: 1 (a)
e) Ebene: Abschnitt
Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein.

Student: nope

Next-Action: [Ask_For]
Bot: Welche der folgenden Punkte habe ich falsch erkannt? Schreibe bitte, welche der folgenden Punkte es sind: a, b, c, d, e.

Student: c

State-Transition: [Correct-Fields]
Bot: Gib bitte die korrekte Information für c) ein.

Student: Training

State-Transition: [Final Statement]
Bot: Deine Angaben waren wie folgt:
a) Kapitel: I Elementares Rechnen
b) Abschnitt: Rechenregel für Potenzen
c) Aufgabentyp: Training
d) Aufgabe: 1 (a)
e) Ebene: Abschnitt
Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein.

Student: yes

State-Transition: [Human Hand-Over]
Bot: Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir.

Table 13: Showcase 5 – Long flow; correction of entries. In bold type are the extracted entities.


Figure 4: Dialogue Flow. Abbreviations: UNK – unknown, ID – informational dictionary, RegEx – regular expressions.


3.3 evaluation

In order to obtain feedback on the quality, functionality, and usefulness of the proposed model, we evaluated it in two steps: first, with an automated method (Section 3.3.1), utilizing 130 manually annotated conversations to verify the robustness of the system; second, with the help of human tutors (Section 3.3.2) from OMB+, to examine the user experience. We describe the details as well as the most common errors below.

3.3.1 Automated Evaluation

To conduct an automated evaluation, we manually assembled a dataset with 130 conversational flows. We predefined viable initial user questions and user answers, as well as gold system responses. These flows cover the most frequent dialogues that we previously observed in the human-to-human dialogue data. We examined our system by implementing a self-chatting evaluation bot. The evaluation cycle is as follows:

• The system accepts the first question from a predefined dialogue via an API request, preprocesses and analyzes it to determine the intent and extract entities.

• Then it estimates the appropriate next action and responds accordingly.

• This response is then compared to the system's gold answer. If the predicted result is correct, the system receives the next predefined user input from the templated dialogue and responds again as defined above. This procedure continues until the dialogue terminates (i.e., the ID is complete). Otherwise, the system reports the number of the unsuccessful case.

The system successfully passed this evaluation for all 130 cases.
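The self-chatting procedure described above amounts to a simple replay loop. The following is a minimal sketch under the assumption that the bot is reachable via an HTTP endpoint; the endpoint URL, the payload field "text", and the response field "response" are illustrative placeholders, not the deployed interface.

```python
import requests

def evaluate_flows(flows, endpoint="http://localhost:8000/ipa_bot"):  # hypothetical endpoint
    """Replay predefined dialogues against the bot and report mismatching cases."""
    failed_cases = []
    for case_id, flow in enumerate(flows):
        # each flow is a list of (user_utterance, gold_bot_response) pairs
        for user_utterance, gold_response in flow:
            reply = requests.post(endpoint, json={"text": user_utterance}).json()
            if reply["response"] != gold_response:
                failed_cases.append(case_id)
                break  # stop replaying this dialogue after the first mismatch
    print(f"passed {len(flows) - len(failed_cases)} / {len(flows)} dialogues")
    return failed_cases
```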

3.3.2 Human Evaluation & Error Analysis

To evaluate the system on irregular cases, we carried out experiments with human tutors from the OMB+ platform. The tutors are experienced regarding rare or complex issues, ambiguous responses, misspellings, and other infrequent but still relevant problems which occur during a natural language dialogue. In the following, we review some common errors and describe additional observations.

misspellings and confusable spelling frequently occur in user-generated text. Since we attempt to let the conversation remain natural to provide a better user experience, we do not instruct students to write formally. Therefore, we have to account for various writing issues. One of the prevalent problems is misspellings: German words are commonly long and can be complex, and because users often type quickly, this can lead to a wrong order of characters within a given word. To tackle this challenge, we utilized fuzzy matching within ES. However, the maximum allowed edit distance in ElasticSearch (ES) is set to 2 characters, which means that all misspellings beyond this threshold cannot be correctly identified by ES (e.g., Differenzialrechnung vs. Differnetialrechnung). Another representative example is the formulation of the section or question number: the equivalent information can be communicated in several distinct ways, which has to be considered by the RegEx unit (e.g., Aufgabe 5 a, Aufgabe V a, Aufgabe 5 (a)). A similar problem occurs with confusable spelling (i.e., Differenzialrechnung vs. Differenzialgleichung).

We analyzed the cases stated above and improved our model by adding some of the most common misspellings to the ElasticSearch (ES) database or handling them with RegEx during the preprocessing step.

elasticsearch threshold We observed both cases where the system failed to extract information although the user provided it, and examples where ES extracted information that was not mentioned in the user query. According to our understanding, these errors occur due to the relevancy scoring algorithm of ElasticSearch (ES), where a document's score is a combination of textual similarity and other metadata-based scores. Our examination revealed that ES mostly fails to extract the information if the user message is rather short (i.e., about 5 tokens). To overcome this difficulty, we coupled the current input u_t with the dialogue history (h = u_t, ..., u_{t-2}). That eliminated the problem and improved the retrieval quality.

Solving the opposite problem, where ES extracts false information (or information that was not mentioned in a query), was more challenging. We learned that the problem comes from short words or sub-words (i.e., suffixes, prefixes) which ES locates in the database and considers credible enough. The ES documentation suggests getting rid of stop words to eliminate this behavior; however, this did not improve the search. Also, fine-tuning of ES parameters such as the relevance threshold, prefix length [5], and minimum should match [6] did not result in notable improvements. To cope with this problem, we implemented a verification step where a user can edit the erroneously retrieved data.

5 The amount of fuzzified characters. This parameter helps to decrease the number of terms which must be reviewed.

6 Indicates the number of terms that must match for a document to be considered relevant.
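For illustration, the kind of fuzzy query and parameters discussed above can be expressed with the official Python Elasticsearch client roughly as follows. This is a hedged sketch: the index and field names are invented, the parameter values are examples, and the exact search() signature depends on the client version (older clients take a body={"query": ...} argument instead).

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

def search_entity(user_text, index="omb_sections"):  # index/field names are illustrative
    """Fuzzy full-text lookup of a candidate entity (e.g., a section title)."""
    query = {
        "match": {
            "title": {
                "query": user_text,
                "fuzziness": 2,              # maximum edit distance supported by ES
                "prefix_length": 1,          # leave the first character un-fuzzified
                "minimum_should_match": "75%",
            }
        }
    }
    hits = es.search(index=index, query=query)["hits"]["hits"]
    return hits[0]["_source"]["title"] if hits else None
```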


3.4 structured dialogue acquisition

As we stated, the implemented system not only supports the human tutor by assisting students but also collects structured and labeled training data in the background. In a trial run of the rule-based system, we accumulated a toy dataset with dialogues. The assembled dataset includes the following:

• Plain dialogues with unique dialogue indexes.

• Plain ID information collected for the whole conversation.

• Pairs of questions (i.e., user requests) and responses (i.e., bot replies) with unique dialogue and turn indexes.

• Triples in the form of (Question, Next Action, Response). Information on the next system action could be employed to train a DM unit with (deep) machine learning algorithms.

• For every conversational state, we keep the entities that the system was able to extract, along with their positions in the utterance. This information could be used to train a custom domain-specific NER model (an illustrative record is sketched after this list).
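As a concrete illustration, one collected turn could be serialized roughly as follows. The field names and values are hypothetical and only show the kind of information (indexes, next action, entity spans, ID state) the system stores; they are not the exact production schema.

```python
# One stored turn of a collected dialogue (hypothetical schema).
example_record = {
    "dialogue_id": 17,
    "turn_id": 3,
    "user_request": "ich bearbeite rechenregeln und potenzen",
    "next_action": "QUESTION_NUMBER",          # label used for the DM / next-action task
    "bot_response": "Welche Aufgabe bearbeitest du gerade? ...",
    "entities": [
        {"label": "SUBTOPIC", "text": "rechenregeln und potenzen", "start": 14, "end": 39},
    ],
    "informational_dictionary": {
        "chapter": "I Elementares Rechnen",
        "subtopic": "Rechenregel für Potenzen",
    },
}
```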

3.5 summary

In this work, we realized a conversational agent for IPA purposes that addresses two problems. First, it decreases repetitive and time-consuming activities, which allows workers of the e-learning platform to focus solely on mathematical, and hence cognitively demanding, questions. Second, by interacting with users, it augments the resources with structured and labeled training data. The latter can be utilized for further implementation of learnable dialogue components.

The realization of such a system was connected with multiple challenges. Among others were missing structured conversational data, ambiguous or erroneous user-generated text, and the necessity to deal with existing corporate tools and their design.

The proposed conversational agent enabled us to accumulate structured, labeled data without any special efforts from the human (i.e., tutors') side (e.g., manual annotation and post-processing of existing data, changes to the conversational structure within the tutoring process). Once we had collected structured dialogues, we were able to re-train particular segments of the system with deep learning methods and achieved consistent performance for both of the proposed tasks. We report the re-implemented elements in Chapter 4.

At the time of writing this thesis, the system described in this work was deployed at the OMB+ platform.


3.6 future work

Possible extensions of our work are:

• In this work, we implemented a rule-based dialogue model from scratch and adjusted it for the specific domain (i.e., e-learning). The core of the model is a dialogue manager that determines the current state of the conversation and the possible next action. Rule-based systems are generally considered to be hardly adaptable to new domains; however, our dialogue manager proved to be flexible to slight modifications in a workflow. One possible direction of future work would thus be the investigation of the general adaptability of the dialogue manager core to other scenarios and domains (e.g., a different course).

• A further possible extension would be the introduction of different languages. The current model is implemented for the German language, but the OMB+ platform provides this course also in English and Chinese. Therefore, it would be interesting to investigate whether it is possible to adjust the existing system to other languages without drastic changes in the model.


4 DEEP LEARNING RE-IMPLEMENTATION OF UNITS

This chapter partially covers work already published at international peer-reviewed conferences. The relevant publication is Moiseeva et al., 2020. The research described in this chapter was carried out in its entirety by the author of this thesis. The other authors of the publication acted as advisors.

In this chapter, we address the challenge of re-implementing two central components of our IPA dialogue system described in Chapter 3 by employing deep learning techniques. Since the data for training was collected in a (short) trial run of the rule-based system, it is not sufficient to train a neural network from scratch. Therefore, we utilize Transfer Learning (TL) methods and employ the previously pre-trained Bidirectional Encoder Representations from Transformers (BERT) model, which we fine-tune for two of our custom tasks.

4.1 introduction

Data mining and machine learning technologies have gained notable advancement in many research fields, including but not limited to computer vision, robotics, advanced analytics, and computational linguistics (Pan and Yang, 2009; Trautmann et al., 2019; Werner et al., 2015). Besides that, in recent years, deep neural networks, with their ability to accurately map from inputs to outputs (i.e., labels, images, sentences) while analyzing patterns in data, became a potential solution for many industrial automatization tasks (e.g., fake news detection, sentiment analysis).

Nevertheless, conventional supervised models still have limitations when applied to real-world data and industrial tasks. The first challenge is the training phase, since a robust model requires an immense amount of structured and labeled data. There are just a few cases where such data is publicly available (e.g., research datasets); for most tasks, domains, and languages the data has to be gathered over a long time. The second hurdle is that, if trained on existing data from a distinct domain, a supervised model cannot generalize to conditions that are different from those faced in the training dataset. Most machine learning methods perform correctly only under a general assumption: the training and test data are mapped to the same feature space and the same distribution (Pan and Yang, 2009). When the distribution changes, most statistical models need to be retrained from scratch using newly obtained training samples. This is especially crucial when applying such approaches to real-world data, which is usually very noisy and contains a large number of scenarios. In this case, a model would be ill-trained to make decisions about novel patterns. Furthermore, for most real-world applications it is expensive or even not feasible at all to recollect the required training data and retrain the model. Hence, the possibility to transfer the knowledge obtained during training on existing data to a new, unseen domain is one of the possible solutions for industrial problems. Such a technique is called Transfer Learning (TL), and it allows dealing with the abovementioned scenarios by leveraging existing labeled data of related tasks or domains.

4.1.1 Outline and Contributions

This work addresses the challenge of re-implementing two out of three central components of our rule-based IPA conversational assistant with deep learning methods. As we discussed in Chapter 2, hybrid dialogue models (i.e., consisting of both rule-based and trainable components) appear to replace the established rule- and template-based methods, which are for the most part employed in industrial settings. To investigate this assumption and examine current state-of-the-art deep learning techniques, we conducted experiments on our custom domain-specific data. Since the available data were collected in a trial run and the obtained dataset was too small to train a conventional machine learning model from scratch, we utilized the Transfer Learning (TL) approach and fine-tuned an existing pre-trained model for our target domain.

For the experiments, we defined two tasks:

• First, we considered the NER problem in a custom domain setting. We defined a sequence labeling task and employed Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al., 2018) as one of the transfer learning methods widely utilized in the NLP area. We applied the model to our dataset and fine-tuned it for seven (7) domain-specific (i.e., e-learning) entities.

• Second, we examined the effectiveness of transfer learning for the dialogue manager core. The DM is one of the fundamental parts of the conversational agent and requires compelling learning techniques to hold the conversation intelligently. For that experiment, we defined a classification task and applied the BERT model to predict the system's next action for every given user input in a conversation. We then computed the macro F1-score for 14 possible classes (i.e., actions) and an average dialogue accuracy.


The main characteristics of both settings are:

• the set of named entities and labels (i.e., actions) differs between source and target domain;

• for both settings, we utilized the BERT model in German and multilingual versions with additional parameters;

• the dataset used to train the target model is small and consists of highly domain-specific labels. The latter often appears in industrial scenarios.

We finally verified that the selected method performed reasonably well on both tasks. We reached a performance of 0.93 macro F1 points for Named Entity Recognition (NER) and 0.75 macro F1 points for the Next Action Prediction (NAP) task. Therefore, we conclude that both NER and NAP components could be employed to substitute or extend the existing rule-based modules.

4.2 related work

As discussed in the previous section, the main benefit of Transfer Learning (TL) is that it allows the domains and tasks used during the training and testing phases to be different. Initial steps in the formulation of transfer learning were made after the NeurIPS-95 [1] workshop that was focused on the demand for lifelong machine-learning methods that preserve and reuse previously learned knowledge (Chen et al., 2018). Research on transfer learning has attracted more and more attention since then, especially in the computer vision domain and in particular for image recognition tasks (Donahue et al., 2014; Sharif Razavian et al., 2014).

Recent studies have shown that TL can also be successfully applied within the Natural Language Processing (NLP) domain, especially on semantically equivalent tasks (Dai et al., 2019; Devlin et al., 2018; Mou et al., 2016). Several research works were carried out specifically on the Named Entity Recognition (NER) task and explored the capabilities of transfer learning when applied to different named entity categories (i.e., different output spaces). For instance, Qu et al. (2016) pre-trained a linear-chain Conditional Random Field (CRF) on a large amount of labeled data in the source domain. Then they introduced a two-layer linear neural network that learns the difference between the source and target label distributions. Finally, the authors initialized another CRF with the parameters learned in the linear layers to predict the labels of the target domain. Kim et al. (2015) conducted experiments with transferring features and model parameters between related domains where the label classes are different but may have a semantic similarity. Their primary approach was to construct label embeddings to automatically map the source and target label classes to improve the transfer (Chen et al., 2018).

1 Conference on Neural Information Processing Systems. Formerly NIPS.


However, there are also models trained in a manner such that they can be applied to multiple NLP tasks simultaneously. Among such architectures are ULMFit (Howard and Ruder, 2018), BERT (Devlin et al., 2018), GPT-2 (Radford et al., 2019), and Transformer-XL (Dai et al., 2019): powerful models that were recently proposed for transfer learning within the NLP domain. These architectures are called language models since they attempt to learn language structure from large textual collections in an unsupervised manner and then predict the next word in a sequence based on previously observed context words. The Bidirectional Encoder Representations from Transformers (BERT) model (Devlin et al., 2018) is one of the most popular architectures among the abovementioned, and it is also one of the first that showed close to human-level performance on the General Language Understanding Evaluation (GLUE) benchmark [2] (Wang et al., 2018), which includes tasks on sentiment analysis, question answering, semantic similarity, and language inference. The architecture of BERT consists of stacked transformer blocks that are pre-trained on a large corpus consisting of 800M words from modern English books (i.e., the BooksCorpus by Zhu et al.) and 2,500M words of text from pre-processed (i.e., without markup) English Wikipedia articles. Overall, BERT advanced the state-of-the-art performance for eleven (11) NLP tasks. We describe this approach in detail in the subsequent section.

4.3 bert: bidirectional encoder representations from transformers

The Bidirectional Encoder Representations from Transformers (BERT) architecture is divided into two steps: pre-training and fine-tuning. The pre-training step was enabled through the two following tasks:

• Masked Language Model: the idea behind this approach is to train a deep bidirectional representation, which is different from conventional language models that are either trained left-to-right or right-to-left. To train a bidirectional model, the authors mask out a certain number of tokens in a sequence. The model is then asked to predict the original words for the masked tokens. It is essential that, in contrast to denoising auto-encoders (Vincent et al., 2008), the model does not need to reconstruct the entire input; instead, it attempts to predict the masked words. Since the model does not know which words exactly it will be asked about, it learns a representation for every token in a sequence (Devlin et al., 2018).

• Next Sentence Prediction: aims to capture relationships between two sentences A and B. In order to train the model, the authors pre-trained a binarized next sentence prediction task where pairs of sequences were labeled according to the following criteria: A and B either follow each other directly in the corpus (label IsNext) or are both taken from random places (label NotNext). The model is then asked to predict whether the former or the latter is the case. The authors demonstrated that pre-training towards this task is extremely useful for both question answering and natural language inference tasks (Devlin et al., 2018).

2 https://gluebenchmark.com

Furthermore, BERT uses WordPiece tokenization for embeddings (Wu et al., 2016), which segments words into subword-level tokens. This approach allows the model to make additional inferences based on word structure (e.g., an -ing ending denotes similar grammatical categories).

The fine-tuning step follows the pre-training process. For that, the BERT model is initialized with the parameters obtained during the pre-training step, and a task-specific fully-connected layer is used after the transformer blocks, which maps the general representations to custom labels (employing a softmax operation).

self-attention The key operation of the BERT architecture is self-attention, which is a sequence-to-sequence operation with a conventional encoder-decoder structure (Vaswani et al., 2017). The encoder maps an input sequence (x_1, x_2, ..., x_n) to a sequence of continuous representations z = (z_1, z_2, ..., z_n). Given z, the decoder then generates an output sequence (y_1, y_2, ..., y_m) of symbols, one element at a time (Vaswani et al., 2017). To produce the output vector y_i, the self-attention operation takes a weighted average over all input vectors:

    y_i = \sum_j w_{ij} x_j    (16)

where j is an index over the entire sequence and the weights sum to one (1) over all j. The weight w_{ij} is derived from a dot product function over x_i and x_j (Bloem, 2019; Vaswani et al., 2017):

    w'_{ij} = x_i^T x_j    (17)

The dot product returns a value between negative and positive infinity, and a softmax maps these values to [0, 1] to ensure that they sum to one (1) over the entire sequence (Bloem, 2019; Vaswani et al., 2017):

    w_{ij} = \frac{\exp w'_{ij}}{\sum_j \exp w'_{ij}}    (18)

Self-attention is not the only operation in transformers, but it is the essential one since it propagates information between vectors. Other operations in the transformer are applied to every vector in the input sequence without interactions between vectors (Bloem, 2019; Vaswani et al., 2017).
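To make Equations 16-18 concrete, the following short NumPy sketch computes the basic, unparameterized self-attention weights and outputs for a toy sequence. It illustrates only the weighted-average mechanism above, not the full multi-head, learned-projection attention used in BERT; the sequence length and dimensionality are arbitrary.

```python
import numpy as np

# toy input: a sequence of n=3 vectors with dimensionality d=4
x = np.random.randn(3, 4)

# Eq. 17: raw attention weights as pairwise dot products, w'_{ij} = x_i^T x_j
w_raw = x @ x.T

# Eq. 18: softmax over j so that each row sums to one
w = np.exp(w_raw) / np.exp(w_raw).sum(axis=1, keepdims=True)

# Eq. 16: each output vector is a weighted average of all input vectors
y = w @ x

assert np.allclose(w.sum(axis=1), 1.0)
print(y.shape)  # (3, 4)
```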


4.4 data & descriptive statistics

The benchmark that we accumulated through the trial run includes 300 structured and partially labeled dialogues, with the average length of a dialogue being six (6) utterances. All the conversations were held in the German language. Detailed general statistics can be found in Table 14.

                                       Value
Max. Len. Dialogue (in utterances)     15
Avg. Len. Dialogue (in utterances)     6
Max. Len. Utterance (in tokens)        100
Avg. Len. Utterance (in tokens)        9
Overall Unique Action Labels           13
Overall Unique Entity Labels           7
Train - Dialogues (# Utterances)       200 (1161)
Eval - Dialogues (# Utterances)        50 (279)
Test - Dialogues (# Utterances)        50 (300)

Table 14: General statistics for the conversational dataset.

The dataset covers 14 unique next actions and seven (7) unique named entities. Detailed information on next actions and entities can be found in Table 15 and Table 16. Below we list the possible actions with the corresponding explanations and predefined mapped statements.

unk – requests to define the intent of the question: mathematical, organizational, or contextual (i.e., fallback policy in case the automated intent recognition unit fails) → "Hast du eine Frage zu einer Aufgabe (MATH), zu einem Text im Kurs (TEXT) oder eine organisatorische Frage (ORG)?"

org – hands the discussion over to a human tutor in case of an organizational question → "Es scheint sich wohl um eine organisatorische Frage zu handeln. Bitte fasse deine Frage in einem kurzen Text zusammen, damit ich sie an meinen menschlichen Kollegen weiterleiten kann."

text – hands the discussion over to a human tutor in case of a contextual question → "Es scheint sich wohl um eine inhaltliche Frage zu handeln. Bitte fasse deine Frage in einem kurzen Text zusammen, damit ich sie an meinen menschlichen Kollegen weiterleiten kann."


topic – requests information on the topic → "Welches Kapitel bearbeitest du gerade? Antworte bitte mit der Kapitelnummer, z.B. IA, IB, II, oder mit dem Kapitelnamen, z.B. Geometrie."

examination – requests information about the examination mode → "Bearbeitest du eine Übungsaufgabe, eine Trainingsaufgabe, einen Quiz oder eine Schlussprüfung?"

subtopic – requests information about the subtopic in case of quiz or section-level training/exercise modes → "Wie heißt der Abschnitt, in dem sich die Aufgabe befindet, auf die sich deine Frage bezieht?"

level – requests information about the level of the examination mode, chapter or section → "Geht es um eine Aufgabe auf Kapitel-Ebene (z.B. um eine Trainingsaufgabe im Kapitel Geometrie) oder um eine Aufgabe auf der Ebene eines Abschnitts in einem Kapitel (z.B. um eine Aufgabe zum Abschnitt Winkel im Kapitel Geometrie)? Antworte bitte mit KAP für Kapitel und SEK für Abschnitt."

question number – requests information about the question number → "Welche Aufgabe bearbeitest du gerade? Wenn du z.B. bei Teilaufgabe c in der zweiten Trainingsaufgabe (oder der zweiten Übungsaufgabe) bist, antworte mit 2c. Bearbeitest du gerade einen Quiz oder eine Schlussprüfung, gib bitte nur die Teilaufgabe an."

final request – considers the Informational Dictionary (ID) to be complete and initiates the verification step → "Deine Angaben waren wie folgt: ... Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein."

verify request – requests the student to verify the assembled information → "Welche der folgenden Punkte habe ich falsch erkannt? Schreibe bitte, welche der folgenden Punkte es sind: [a, b, c, ...] [3]"

correct request – requests the student to provide the correct information if there is an error in the assembled data → "Gib bitte die korrekte Information für [a] [4] ein."

exact question – requests the student to provide a detailed explanation of the problem → "Bitte fasse deine Frage in einem kurzen Text zusammen, damit ich sie an meinen menschlichen Kollegen weiterleiten kann."

human handover – finalizes the conversation → "Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir."

3 Variables that depend on the state of the Informational Dictionary (ID).
4 Variable that was selected as erroneous and needs to be corrected.


Action             Counts    Action             Counts
Final Request      321       Unk                80
Human Handover     300       Subtopic           55
Exact Question     286       Correct Request    40
Question Number    176       Verify Request     34
Examination        175       Org                17
Topic              137       Text               13
Level              130

Table 15: Detailed statistics on possible system actions. The Counts columns denote the number of occurrences of each action in the entire dataset.

Entity          Counts
Question Nr.    317
Chapter         311
Examination     303
Subtopic        198
Level           80
Intent          70

Table 16: Detailed statistics on possible named entities. The Counts column denotes the number of occurrences of each entity in the entire dataset.

4.5 named entity recognition

We defined a sequence labeling task to extract custom entities from user input. We considered seven (7) viable entities (see Table 16) to be recognized by the model: topic, subtopic, examination mode and level, question number, intent, as well as the entity other for the remaining words in the utterance. Since the data collected from the rule-based system already includes information on the entities, we were able to train a domain-specific NER unit. However, since the original user input was informal, the same information could be provided in different writing styles. That means that a single entity could have different surface forms (e.g., synonyms, writing styles). Therefore, entities that we extracted from the rule-based system were transformed into a universal standard (e.g., official chapter names). To consider all of the variable entity forms while post-labeling the original dataset, we determined generic entity names (e.g., chapter, question nr.) and mapped variations of entities from the user input (e.g., Chapter = [Elementary Calculus, Chapter I, ...]) to them. The overall dataset consists of 300 labeled dialogues, where 200 (with 1161 utterances) were employed for training and 100 for the evaluation and test sets (50 dialogues with ca. 300 utterances for each set, respectively).
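The normalization described above can be thought of as a simple surface-form lookup. Below is a minimal sketch under the assumption that such a hand-crafted mapping exists; the concrete surface forms and canonical names are illustrative only, not the actual mapping used for post-labeling.

```python
# Hypothetical mapping from observed surface forms to a canonical chapter name.
CHAPTER_ALIASES = {
    "elementares rechnen": "I Elementares Rechnen",
    "kapitel 1": "I Elementares Rechnen",
    "chapter i": "I Elementares Rechnen",
}

def normalize_chapter(surface_form: str):
    """Map a user-provided chapter mention to its canonical name (None if unknown)."""
    return CHAPTER_ALIASES.get(surface_form.strip().lower())
```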


4.5.1 Model Settings

We conducted experiments with the German and multilingual BERT implementations [5]. The capitalization of words is significant for the German language; therefore, we ran the experiments on the capitalized data while preserving the original punctuation. We employed the available base model for both the multilingual and German BERT implementations in the cased version. We initialized the learning rate for both models to 1e-4, and the maximum length of the tokenized input was set to 128 tokens. We conducted the experiments multiple times with different seeds for a maximum of 50 epochs, with the training batch size set to 32. We utilized AdamW as the optimizer and applied early stopping if the performance did not improve significantly after 5 epochs.
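A setup like the one above can be expressed with the Hugging Face transformers library. The following is a condensed, hedged sketch of token-classification fine-tuning: the checkpoint names, the label count, and the omitted dataset preparation (tokenization to max_length=128 and label alignment to WordPiece tokens) are assumptions for illustration; the hyperparameters mirror the values quoted above.

```python
from transformers import (AutoTokenizer, AutoModelForTokenClassification,
                          TrainingArguments, Trainer, EarlyStoppingCallback)

MODEL_NAME = "bert-base-german-cased"   # or "bert-base-multilingual-cased" for the multilingual run
NUM_LABELS = 8                          # 7 domain-specific entities + "other" (assumed label scheme)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForTokenClassification.from_pretrained(MODEL_NAME, num_labels=NUM_LABELS)

def build_trainer(train_ds, eval_ds):
    """train_ds / eval_ds: pre-tokenized datasets with word-level labels aligned to WordPiece tokens."""
    args = TrainingArguments(
        output_dir="ner_out",
        learning_rate=1e-4,
        per_device_train_batch_size=32,
        num_train_epochs=50,
        evaluation_strategy="epoch",
        save_strategy="epoch",
        load_best_model_at_end=True,
    )
    return Trainer(
        model=model,
        args=args,
        train_dataset=train_ds,
        eval_dataset=eval_ds,
        callbacks=[EarlyStoppingCallback(early_stopping_patience=5)],
    )

# trainer = build_trainer(train_ds, eval_ds); trainer.train()
```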

4.6 next action prediction

We defined a classification problem where we aim to predict the system's next action for a given user input. We assumed 14 custom actions (see Table 15) that we considered to be our classes. While collecting the toy dataset, every input was automatically labeled by the rule-based system with the corresponding next action and the dialogue-id. Thus, no additional post-labeling was required at this step.

We investigated two settings for this task:

• Default Setting: utilizing a user input and the corresponding class (i.e., next action) without any supplementary context. By default, we conduct all of our experiments in this setting.

• Extended Setting: employing a user input, the corresponding next action, and the previous system action as a source of supplementary context. For this setting, we experimented with the best performing model from the default setting (see the input-construction sketch below).

The overall dataset consists of 300 labeled dialogues, where 200 (with 1161 utterances) were employed for training and 100 for the evaluation and test sets (50 dialogues with ca. 300 utterances for each set, respectively).
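One straightforward way to inject the previous system action as supplementary context is to feed it to BERT as the second segment of a sentence pair. The snippet below is only an assumed illustration of that idea (the exact encoding used in the experiments is not specified here); the tokenizer checkpoint and the example strings are placeholders.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")

def encode_turn(user_utterance, previous_action=None, max_length=128):
    """Encode one turn for NAP; the extended setting adds the previous action as segment B."""
    if previous_action is None:                      # default setting
        return tokenizer(user_utterance, truncation=True,
                         max_length=max_length, return_tensors="pt")
    # extended setting: [CLS] utterance [SEP] previous action [SEP]
    return tokenizer(user_utterance, previous_action,
                     truncation=True, max_length=max_length, return_tensors="pt")

batch = encode_turn("ich bearbeite rechenregeln und potenzen", previous_action="Ask for Sub-Topic")
```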

4.6.1 Model Settings

For the NAP task, we carried out experiments with the German and multilingual BERT implementations as well. We examined the performance of both capitalized and lower-cased inputs, as well as plain and preprocessed data. For the multilingual BERT, we used the base model in both cased and uncased modifications. For the German BERT, we utilized the base model in the cased modification only [6]. For both models, we initialized the learning rate to 4e-5, and the maximum length of the tokenized input was set to 128 tokens. We conducted the experiments multiple times with different seeds for a maximum of 300 epochs, with the training batch size set to 32. We utilized AdamW as the optimizer and applied early stopping if the performance did not improve significantly after 15 epochs.

5 https://github.com/huggingface/transformers

4.7 evaluation and results

To evaluate the models, we computed the word-level macro F1 score for the Named Entity Recognition (NER) task and the utterance-level macro F1 score for the Next Action Prediction (NAP) task. The word-level F1 is estimated as the average of the F1 scores per class, each computed from all words in the evaluation and test sets. The outcomes for the NER task are presented in Table 17. For the utterance-level F1, a single class label (i.e., next action) is taken for the whole utterance. The results for the NAP task are shown in Table 18.

We additionally computed the average dialogue accuracy for the best performing NAP models. This score indicates how well the predicted next actions compose the conversational flow. The average dialogue accuracy was estimated for 50 conversations in the evaluation and test sets, respectively. The results are displayed in Table 19.
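The text does not spell out the exact formula for this score, so the following sketch assumes the natural reading: per dialogue, the fraction of turns whose predicted next action matches the gold action, averaged over all dialogues. The example values are invented.

```python
def average_dialogue_accuracy(dialogues):
    """dialogues: list of conversations, each a list of (predicted_action, gold_action) pairs."""
    per_dialogue = []
    for turns in dialogues:
        correct = sum(1 for predicted, gold in turns if predicted == gold)
        per_dialogue.append(correct / len(turns))
    return sum(per_dialogue) / len(per_dialogue)

# toy example with two short dialogues
print(average_dialogue_accuracy([
    [("Ask for Level", "Ask for Level"), ("Final Request", "Ask for Sub-Topic")],
    [("Human Handover", "Human Handover")],
]))  # 0.75
```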

The obtained results for the NER task revealed that the German BERT performed significantly better than the multilingual BERT model. The performance of the custom NER unit is at 0.93 macro F1 points for all possible named entities (see Table 17). In contrast, for the NAP task, the multilingual BERT model obtained better performance than the German BERT model. The best performing method in the default setting gained a macro F1 of 0.677 points for 14 possible classes, whereas the model in the extended setting performed better: its best macro F1 score is 0.752 for the same number of classes (see Table 18).

Regarding the dialogue accuracy, the extended system fine-tuned with multilingual BERT obtained better outcomes than the default one (see Table 19). The overall conclusion for the NAP task is that the capitalized setting increased the performance of the model, whereas the inclusion of punctuation did not influence the results.

6 An uncased pre-trained modification of the model was not available.


Task    Model    Cased    Punct    F1 Eval    F1 Test
NER     GER      yes      yes      0.971      0.930
NER     Mult     yes      yes      0.926      0.905

Table 17: Word-level F1 for the Named Entity Recognition (NER) task. In bold: best performance for the evaluation and test sets.

Task    Model    Cased    Punct    Extended    F1 Eval    F1 Test
NAP     GER      yes      no       no          0.711      0.673
NAP     GER      yes      yes      no          0.701      0.606
NAP     Mult     yes      yes      no          0.688      0.625
NAP     Mult     yes      no       no          0.769      0.677
NAP     Mult     yes      no       yes         0.810      0.752
NAP     Mult     no       yes      no          0.664      0.596
NAP     Mult     no       no       no          0.742      0.502

Table 18: Utterance-level F1 for the Next Action Prediction (NAP) task. Underlined: best performance for the evaluation and test sets in the default setting (without previous-action context). In bold: best performance for the evaluation and test sets in the extended setting (with previous-action context).

Model           Accuracy Eval    Accuracy Test
NAP default     0.765            0.724
NAP extended    0.813            0.801

Table 19: Average dialogue accuracy computed for the Next Action Prediction (NAP) task for the best performing models. In bold: best performance for the evaluation and test sets.

[Figure 5: Macro F1, evaluation, NAP - default model. Comparison between German and multilingual BERT.]

[Figure 6: Macro F1, test, NAP - default model. Comparison between German and multilingual BERT.]

[Figure 7: Macro F1, evaluation, NAP - extended model.]

[Figure 8: Macro F1, test, NAP - extended model.]

[Figure 9: Macro F1, evaluation, NER - Comparison between German and multilingual BERT.]

[Figure 10: Macro F1, test, NER - Comparison between German and multilingual BERT.]

4.8 error analysis

We then investigated the cases where the models failed to predict the correct action or labeled the named entity span erroneously. Below we outline the most common errors.

next action prediction One of the prevalent flaws in the default model was the mismatch between two consecutive actions: question number and subtopic. That is due to the position of these actions in the conversational flow. The occurrence of both actions in the dialogue is not strict and largely depends on the previous action. To explain this, we illustrate the following possible conversational scenarios:

• The system can either directly proceed with the request for the question number if the examination mode is the final examination.

• If the examination mode is training or exercise, the system has to ask about the level first (i.e., chapter or section).

• If it is chapter-level, the system can again directly proceed with the request for the question number.

• In the case of section-level, the system has to ask about the subsection first (if this information was not provided before).

The examination of the extended model showed that the introduction of supplementary context (i.e., the previous action) improved the performance of the system by about 6.0%.

named entity recognition The cases where the system failed cover mismatches between the labels chapter and other and between the labels question number and other. This failure arose due to imperfectly labeled spans of multi-word named entities (e.g., "Elementary Calculus: Proportionality, Percentage"): the first or last words of the named entity were excluded from the span and erroneously labeled with the other label.

4.9 summary

In this chapter, we re-implemented two of the three fundamental units of our IPA conversational agent, Named Entity Recognition (NER) and Dialogue Manager (DM), using methods of Transfer Learning (TL), specifically the Bidirectional Encoder Representations from Transformers (BERT) model.

We believe that the obtained results are reasonably high considering the comparatively small amount of training data we employed to fine-tune the models. Therefore, we conclude that both the NAP and NER segments could be used to replace or extend the existing rule-based modules. Rule-based components, as well as ElasticSearch (ES), are limited in their capacities and not flexible to novel patterns, whereas the trainable units generalize better and could possibly decrease the number of inaccurate predictions in case of accidental dialogue behavior. Furthermore, to increase the overall robustness, rule-based and trainable components could be used synchronously: when one system fails (e.g., the ElasticSearch (ES) module cannot extract an entity), the conversation continues with the prediction obtained from the other model.

4.10 future work

There are several possible directions for future work:

• At the time of writing this thesis, both units were trained and investigated on their own, that is, without being incorporated into the initial IPA conversational system. Thus, one of the potential directions of future work would be the embodiment of these units into the original system and the examination of possible failure cases. Furthermore, the resulting hybrid system could then imply the constraint of additional meta policies to prevent unexpected system behavior.

• A further investigation could be towards the multi-language modality of the re-implemented units. Since the Online Mathematik Brückenkurs (OMB+) platform also supports English and Chinese, it would be interesting to examine whether a simple translation from the target language (i.e., English, Chinese) to the source language (i.e., German) would be sufficient to employ the already-assembled dataset and pre-trained units. Presumably, it should be achievable in the case of the Next Action Prediction (NAP) unit, but it could lead to a noticeable performance drop for the Named Entity Recognition (NER) task, since there could be a considerable gap between translated and original entity spans.

• Last but not least, one of the further extensions of the IPA would be the eventual applicability to more cognitively demanding tasks. For instance, it is of interest to extend the model to more complex domains, such as the handling of mathematical questions. Would it be possible to automate the human assistance at least for less complicated mathematical inquiries, or is it still unachievable with currently existing NLP techniques and machine learning methods?


Part II

CLUSTERING-BASED TREND ANALYZER


5 FOUNDATIONS

As discussed in Chapter 1, predictive analytics is one of the potential Intelligent Process Automation (IPA) tasks due to its capability to forecast future events by using current and historical data and therethrough assist knowledge workers with, for instance, sales or investments. Modern machine learning algorithms allow an intelligent and fast analysis of large amounts of data. However, the evaluation of such algorithms and models remains one of the grave problems in this domain. This is mostly due to the lack of publicly available benchmarks for the evaluation of topic modeling and trend detection algorithms. Hence, in this part of the thesis, we address this challenge and assemble a benchmark for detecting both trends and downtrends (i.e., topics that become less frequent over time). To the best of our knowledge, the task of downtrend detection has not been well addressed before.

In this chapter, we introduce the concepts required by the second part of the thesis. We begin with an introduction to the topic modeling task and define the emerging trend in Section 5.1. Next, we examine the related work, existing approaches to topic modeling and trend detection, and their limitations in Section 5.2. In Chapter 6, we introduce our TRENDNERT benchmark for trend and downtrend detection and describe the method we employed for its assembly.

5.1 motivation and challenges

Science changes and evolves rapidly: novel research areas emerge while others fade away. Keeping pace with these changes is challenging. Therefore, recognizing and forecasting emerging research trends is of significant importance for researchers and academic publishers, as well as for funding agencies, stakeholders, and innovation companies.

Previously, the task of trend detection was mostly solved by domain experts who used specialized and cumbersome tools to investigate the data and extract useful insights from it (e.g., through data visualization or statistics). However, manual analysis of large amounts of data can be time-consuming, and hiring domain experts is very costly. Furthermore, the overall increase of research data (i.e., Google Scholar, PubMed, DBLP) in the past decade makes the approach based on human domain experts less and less scalable. In contrast, automated approaches become a more desirable solution for the task of emerging trend identification (Kontostathis et al., 2004; Salatino, 2015). Due to the large number of electronic documents appearing on the web daily, several machine learning techniques were developed to find patterns of word occurrences in those documents using hierarchical probabilistic models. These approaches are called topic models, and they allow the analysis of extensive textual collections and the extraction of valuable insights from them. The main advantage of topic models is their ability to discover hidden patterns of word use and to connect documents containing similar patterns. In other words, the idea of a topic model is that documents are a mixture of topics, where a topic is defined as a probability distribution over words (Alghamdi and Alfalqi, 2015).

Topics have the property of being able to evolve; thus, modeling topics without considering time may confuse precise topic identification. Modeling topics while considering time is called topic evolution modeling, and it can reveal important hidden information in the document collection, allowing the recognition of topics along the dimension of time. This approach also provides the ability to check the evolution of topics during a given time window and to examine whether a given topic is gaining interest and popularity over time (i.e., becoming an emerging trend) or whether its development is stable. Considering the latter approach, we turn to the task of emerging trend identification.

How is an emerging trend defined? An emerging trend is a topic that is gaining interest and utility over time, and it has two attributes: novelty and growth (Small, Boyack, and Klavans, 2014; Tu and Seng, 2012). Rotolo, Hicks, and Martin (2015) explain the correlation of these two attributes in the following way. Up to the point of emergence, a research topic is characterized by a high level of novelty. At that time, it does not attract any considerable attention from the scientific community, and due to the limited impact, its growth is nearly flat. After some turning points (i.e., the appearance of significant scientific publications), the research topic starts to grow faster and may become a trend if the growth is fast; however, the level of novelty decreases continuously once the emergence becomes clear. Then, at the final phase, the topic becomes well-established while its novelty levels out (He and Chen, 2018).

To detect trends in textual data, one employs computational analysis techniques and Natural Language Processing (NLP). In the subsequent section, we give an overview of the existing approaches for topic modeling and trend detection.

5.2 detection of emerging research trends

The task of trend detection is closely associated with the topic modeling task, since in the first instance it detects the latent topics in a substantial collection of data. It secondarily employs techniques to investigate the evolution of a topic and whether it has the potential to become a trend.


Topic modeling has an extensive history of applications in the scientific domain, including but not limited to studies of temporal trends (Griffiths and Steyvers, 2004; Wang and McCallum, 2006) and investigation of impact prediction (Yogatama et al., 2011). Below we give a general overview of the most significant methods used in this domain, explain their importance, and, where applicable, their potential limitations.

5.2.1 Methods for Topic Modeling

In order to extract a topic from textual documents (i.e., research papers, blog posts), the precise definition of the model for topic representation is crucial.

latent semantic analysis or, as it was formerly called, Latent Semantic Indexing, is one of the fundamental methods in the topic modeling domain, proposed by Deerwester et al. in 1990. Its primary goal is to generate vector-based representations for documents and terms and then employ these representations to compute the similarity between documents (resp. terms) in order to select the most related ones. In this regard, Latent Semantic Analysis (LSA) takes a matrix of documents and terms and decomposes it into a separate document-topic matrix and a topic-term matrix. Before turning to the central task of finding latent topics that capture the relationships among the words and documents, it also performs dimensionality reduction utilizing truncated Singular Value Decomposition. This technique is usually applied to reduce the sparseness, noisiness, and redundancy across dimensions in the document-topic matrix. Finally, using the resulting document and term vectors, one applies measures such as cosine similarity to evaluate the similarity of different documents, words, or queries. A compact sketch of this pipeline is given below.

One of the extensions to traditional LSA is probabilistic Latent Semantic Analysis (pLSA), which was introduced by Hofmann in 1999 to fix several shortcomings. The foremost advantage of the follow-up method was that it could successfully automate document indexing, which in turn improved LSA in a probabilistic sense by using a generative model (Alghamdi and Alfalqi, 2015). Furthermore, pLSA can differentiate between diverse contexts of word usage without utilizing dictionaries or a thesaurus. That results in better polysemy disambiguation and discloses typical similarities by grouping words that share a similar context. Among the works that employ pLSA is the approach proposed by Gohr et al. (2009), where the authors apply pLSA in a window that slides across the stream of documents to analyze the evolution of topics. Also, Mei et al. (2008) use pLSA in order to create a network of topics.
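The LSA pipeline sketched above (term-document matrix, truncated SVD, cosine similarity) can be reproduced in a few lines with scikit-learn. The toy corpus and the choice of two latent components are illustrative only and have nothing to do with the benchmark introduced later.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "neural networks for language modeling",
    "transfer learning with pretrained language models",
    "citation networks reveal emerging research topics",
]

# document-term matrix, then truncated SVD to obtain low-dimensional document vectors
tfidf = TfidfVectorizer().fit_transform(corpus)
doc_vectors = TruncatedSVD(n_components=2, random_state=0).fit_transform(tfidf)

# cosine similarity between documents in the latent (topic) space
print(cosine_similarity(doc_vectors))
```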


latent dirichlet allocation was developed by Blei, Ng, and Jordan (2003) to overcome the limitations of both the LSA and pLSA models. Nowadays, Latent Dirichlet Allocation (LDA) is considered to be one of the most utilized and robust probabilistic topic models. The fundamental concept of LDA is the assumption that every document is represented as a mixture of topics, where each topic is a discrete probability distribution that defines the probability of each word to appear in a given topic. These topic probabilities provide a compressed representation of a document. In other words, LDA models all of the d documents in the collection as a mixture over n latent topics, where each of the topics describes a multinomial distribution over the words w in the vocabulary (Alghamdi and Alfalqi, 2015).

LDA fixes such weaknesses of pLSA as the incapacity to assign a probability to previously unseen documents (because pLSA learns the topic mixtures only for documents seen in the training phase) and the overfitting problem (i.e., the number of parameters in pLSA increases linearly with the size of the corpus) (Salatino, 2015).

An extension of the conventional LDA is the hierarchical LDA (Griffiths et al., 2004), where topics are grouped in hierarchies. Each node in the hierarchy is associated with a topic, and a topic is a distribution across words. Another comparable approach is the Relational Topic Model (Chang and Blei, 2010), which is a mixture of a topic model and a network model for groups of linked documents. The attribute of every document is its text (i.e., discrete observations taken from a fixed vocabulary), and the links between documents are hyperlinks or citations.

LDA is a prevalent method for discovering topics (Blei and Lafferty, 2006; Bolelli, Ertekin, and Giles, 2009; Bolelli et al., 2009; Hall, Jurafsky, and Manning, 2008; Wang and Blei, 2011; Wang and McCallum, 2006). However, it has been found that training and fine-tuning a stable LDA model can be very challenging and time-consuming (Agrawal, Fu, and Menzies, 2018).
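As a minimal illustration of fitting an LDA topic model over a bag-of-words matrix, the following scikit-learn sketch trains a small model and prints the top words per topic. The corpus, topic count, and vocabulary are toy choices, not the settings used anywhere in this thesis.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

corpus = [
    "deep learning for image recognition",
    "convolutional networks and image classification",
    "topic models for large text collections",
    "probabilistic topic modeling of scientific text",
]

vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(corpus)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

vocab = vectorizer.get_feature_names_out()
for topic_idx, word_weights in enumerate(lda.components_):
    top_words = [vocab[i] for i in word_weights.argsort()[::-1][:5]]
    print(f"topic {topic_idx}: {top_words}")
```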

correlated topic model One of the main drawbacks of LDA is its incapability of capturing dependencies among the topics (Alghamdi and Alfalqi, 2015; He et al., 2017). Therefore, Blei and Lafferty (2007) proposed the Correlated Topic Model as an extension of the LDA approach. In this model, the authors replaced the Dirichlet distribution with a logistic-normal distribution that models pairwise topic correlations with a Gaussian covariance matrix (He et al., 2017).

However, one of the shortcomings of this model is the computational cost: the number of parameters in the covariance matrix increases quadratically with the number of topics, and parameter estimation for the full-rank matrix can be inaccurate in high-dimensional space (He et al., 2017). Therefore, He et al. (2017) proposed their own method to overcome this problem. The authors introduced a model which learns dense topic embeddings and captures topic correlations through the similarity between the topic vectors. According to He et al., their method allows efficient inference in the low-dimensional embedding space and significantly reduces the previous cubic or quadratic time complexity to linear with respect to the number of topics.

5.2.2 Topic Evolution of Scientific Publications

Although the topic modeling task is essential, it has a significant offspring task, namely the modeling of topic evolution. Methods able to identify topics within the context of time are highly applicable in the scientific domain. They allow an understanding of topic lineage and reveal how research on one topic influences another. Generally, these methods can be classified according to the primary sources of information they employ: text, citations, or key phrases.

text-based Extensive textual collections have dynamic co-occurrences of word patterns that change over time. Thus, models that track topic evolution over time should consider both the word co-occurrence patterns and the timestamps (Blei and Lafferty, 2006).

In 2006, Wang and McCallum proposed a Non-Markov Continuous Time model for the detection of topical trends in a collection of NeurIPS publications (an overall time span of 17 years was considered). The authors made the following assumption: if a pattern of word co-occurrence exists for a short time, then the system creates a narrow-time-distribution topic, whereas for long-term patterns the system generates a broad-time-distribution topic. The principal property of this approach is that it models topic evolution without discretizing time, i.e., the state at time t + t1 is independent of the state at time t (Wang and McCallum, 2006).

Another work, by Blei and Lafferty (2006), proposed the Dynamic Topic Models. The authors assumed that the collection of documents is organized based on certain time spans, and the documents belonging to every time span are represented with a K-component model. Thereby, the topics are associated with time span t and emerge from topics corresponding to time span t-1. One of the main differences of this model compared to other existing approaches is that it utilizes a Gaussian distribution for the topic parameters instead of the conventional Dirichlet distribution and can therefore capture the evolution over various time spans.


citation-based analysis is the second popular direction that is considered to be effective for trend identification (He et al., 2009; Le, Ho, and Nakamori, 2005; Shibata, Kajikawa, and Takeda, 2009; Shibata et al., 2008; Small, 2006). This method assumes that citations in their various forms (i.e., bibliographic citations, co-citation networks, citation graphs) indicate meaningful relationships between topics and uses citations to model topic evolution in the scientific domain. Nie and Sun (2017) utilize this approach along with Latent Dirichlet Allocation (LDA) and k-means clustering to identify research trends. The authors first use LDA to extract features and determine the optimal number of topics. Then they use k-means to obtain thematic clusters, and finally they compute citation functions to identify the changes of clusters over time.

However, the citation-based approach's main drawback is that a consistent collection of publications associated with a specific research topic is required for these techniques to detect the cluster and novelty of the particular topic (He and Chen, 2018). Furthermore, there are also many trend detection scenarios (e.g., research news articles or research blog posts) in which citations are not readily available. That makes this approach not applicable to this kind of data.

keyphrase-based Another standard solution is based on the use of keyphrase information extracted from research papers. In this case, every keyword is representative of a single research topic. Systems like Saffron (Monaghan et al., 2010), as well as Saffron-based models, use this approach. Saffron is a research toolkit that provides algorithms for keyphrase extraction, entity linking, and taxonomy extraction. Asooja et al. (2016) utilize Saffron to extract the keywords from LREC proceedings and then propose a trend forecast based on regression models as an extension to Saffron.

However, this method raises many questions, including the following: how to deal with the noisiness of keywords (i.e., irrelevant keywords), and how to handle keywords that have their own hierarchies (i.e., sub-areas). Other drawbacks of this approach include the ambiguity of keywords (i.e., "Java" as an island and "Java" as a programming language) and the necessity to deal with synonyms, which could be treated as different topics.

5.3 summary

An emerging trend detection algorithm aims to recognize topics that were earlier inappreciable but are now gaining importance and interest within a specific domain. Knowledge of emerging trends is especially essential for stakeholders, researchers, academic publishers, and funding organizations. Assume a business analyst in a biotech company who is interested in analyzing technical articles for recent trends that could potentially impact the company's investments. A manual review of all the available information in a specific domain would be extremely time-consuming or even not feasible. However, this process could be solved through Intelligent Process Automation (IPA) algorithms. Such software would assist human experts who are tasked with identifying emerging trends through automatic analysis of extensive textual collections and visualization of occurrences and tendencies of a given topic.


6 BENCHMARK FOR SCIENTIFIC TREND DETECTION

This chapter covers work already published at international peer-reviewed conferences. The relevant publication is Moiseeva and Schütze, 2020. The research described in this chapter was carried out in its entirety by the author of the thesis. The other author of the publication acted as an advisor.

Computational analysis and modeling of the evolution of trends is a significant area of research because of its socio-economic impact. However, no large publicly available benchmark for trend detection currently exists, making a comparative evaluation of methods impossible. We remedy this situation by publishing the benchmark TRENDNERT, consisting of a set of gold trends (resp. downtrends) and document labels that is available as an unlimited download, and a large underlying document collection that can also be obtained for free. We propose Mean Average Precision (MAP) as an evaluation measure for trend detection and apply this measure in an investigation of several baselines.

61 motivation

What is an emerging research topic and how is it defined in the litera-ture At the time of emergence a research topic often does not attractmuch recognition from the scientific society and is exemplified by justa few publications He and Chen (2018) associate the appearance ofa trend to some triggering event eg the publication of an article ina high-impact journal Later the research topic starts to grow fasterand becomes a trend (He and Chen 2018)

We adopt this as our key representation of trend in this paper atrend is a research topic with a strongly increasing number of publi-cations in a particular time interval In opposite to trends there arealso downtrends which refer to the topics that move lower in theirdemand or growth as they evolve Therefore we define a downtrendas the converse to a trend a downtrend is a research topic that has astrongly decreasing number of publications in a particular time inter-val Downtrends can be as important to detect as trends eg for afunding agency planning its budget

To detect both types of topics (i.e., trends and downtrends) in textual data, one employs computational analysis techniques and NLP. These methods enable the analysis of extensive textual collections and provide valuable insights into topical trendiness and evolution over time.


Despite the overall importance of trend analysis, there is, to our knowledge, no large publicly available benchmark for trend detection. This makes a comparative evaluation of methods and algorithms impossible. Most of the previous work has used proprietary or moderately sized datasets, or evaluation measures that were not directly bound to the objectives of trend detection (Gollapalli and Li 2015).

We remedy this situation by publishing the benchmark TRENDNERT.¹ The benchmark consists of the following:

1. A set of gold trends, compiled based on an extensive analysis of the meta-literature.

2. A set of labeled documents (based on more than one million scientific publications), where each document is assigned to a trend, a downtrend, or the class of flat topics, and supported by the topic title.

3. An evaluation script to compute Mean Average Precision (MAP) as the measure for the accuracy of trend detection.

We make the benchmark available as an unlimited download.² The underlying research corpus can be obtained for free from Semantic Scholar³ (Ammar et al. 2018).

6.1.1 Outline and Contributions

In this work, we address the challenge of benchmark creation for (down)trend detection. This chapter is structured as follows: Section 6.2 introduces our model for corpus creation and its fundamental components. In Section 6.3, we present the crowdsourcing procedure for the benchmark labeling. Section 6.5 outlines the proposed baseline for trend detection, our experimental setup, and the outcomes. Finally, we describe the details of the distributed benchmark in Section 6.6 and conclude in Section 6.7.

Our contributions within this work are as follows:

• TRENDNERT is the first publicly available benchmark for (down)trend detection.

• TRENDNERT is based on a collection of more than one million documents. It is among the largest collections that have been used for trend detection and therefore offers a realistic setting for developing trend detection algorithms.

• TRENDNERT addresses the task of detecting both trends and downtrends. To the best of our knowledge, the task of downtrend detection has not been addressed before.

1. The name is a concatenation of trend and its ananym dnert, because it supports the evaluation of both trends and downtrends.

2. https://doi.org/10.17605/OSF.IO/DHZCT
3. https://api.semanticscholar.org/corpus


6.2 corpus creation

In this section, we explain the methodology we used to create the TRENDNERT benchmark.

6.2.1 Underlying Data

We employ a corpus provided by Semantic Scholar.⁴ It includes more than one million papers published mostly between 2000 and 2015 in about 5,000 computer science journals and conference proceedings. Every record in the collection consists of a title, key phrases, abstract, full text, and metadata.

6.2.2 Stratification

The distribution of documents over time in the entire original dataset is skewed (see Figure 11). We discovered that clustering the entire collection as well as a random sample produces corrupt results because more weight is given to later years than to earlier years. Therefore, we first generated a stratified sample of the original document collection. To this end, we randomly pick 10,000 documents for each year between 2000 and 2016, for an overall sample size of 160,000.
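A minimal sketch of this stratification step is shown below. The field name "year", the per-year sample size, and the seed are assumptions for illustration; the thesis does not publish its sampling script.

import random
from collections import defaultdict

def stratified_sample(documents, start_year=2000, end_year=2015, per_year=10_000, seed=0):
    """Randomly pick `per_year` documents for every year in [start_year, end_year]."""
    random.seed(seed)
    by_year = defaultdict(list)
    for doc in documents:
        by_year[doc["year"]].append(doc)
    sample = []
    for year in range(start_year, end_year + 1):
        pool = by_year[year]
        # If a year has fewer than per_year documents, take all of them.
        sample.extend(random.sample(pool, min(per_year, len(pool))))
    return sample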

Figure 11: Overall distribution of papers in the entire dataset. Years 1975 to 2016.

4. https://www.semanticscholar.org


6.2.3 Document Representations & Clustering

As we mentioned, this work's primary focus was the creation of a benchmark, not the development of new algorithms or models for trend detection. Therefore, for our experiments, we selected an algorithm based on k-means clustering as a simple baseline method for trend detection.

In conventional document clustering, documents are usually represented as bag-of-words (BOW) feature vectors (Manning, Raghavan, and Schütze 2008). Those, however, have a major weakness: they neglect the semantics of words (Le and Mikolov 2014). Still, recent work in the representation learning domain, and particularly doc2vec, the method proposed by Le and Mikolov (2014), can provide representations for document clustering that overcome this weakness.

The doc2vec⁵ algorithm is inspired by techniques for learning word vectors (Mikolov et al. 2013) and is capable of capturing semantic regularities in textual collections. It is an unsupervised approach for learning continuous distributed vector representations for text (i.e., paragraphs, documents). This approach maps documents into a vector space such that semantically related documents are assigned similar vector representations (e.g., an article about "genomics" is closer to an article about "gene expression" than to an article about "fuzzy sets"). Formally, each paragraph is mapped to an individual vector represented by a column in matrix D, and every word is also mapped to an individual vector represented by a column in matrix W. The paragraph vector and word vectors are concatenated to predict the next word in a context (Le and Mikolov 2014). Other work has already successfully employed this type of document representation for topic modeling, combining it with both Latent Dirichlet Allocation (LDA) and clustering approaches (Curiskis et al. 2019; Dieng, Ruiz, and Blei 2019; Moody 2016; Xie and Xing 2013).

In our work, we run doc2vec on the stratified sample of 160,000 papers and represent each document as a length-normalized vector (i.e., a document embedding). These vectors are then clustered into k = 1000 clusters using the scikit-learn⁶ implementation of k-means (MacQueen 1967) with default parameters. The combination of document representations with a clustering algorithm is conceptually simple and interpretable. Also, the comprehensive comparative evaluation of topic modeling methods utilizing document embeddings performed by Curiskis et al. (2019) showed that doc2vec feature representations with k-means clustering outperform several other methods⁷ on three evaluation measures (Normalized Mutual Information, Adjusted Mutual Information, and Adjusted Rand Index).

5. We use the implementation provided by Gensim: https://radimrehurek.com/gensim/models/doc2vec.html
6. https://scikit-learn.org/stable
7. Hierarchical clustering, k-medoids, NMF, and LDA.
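A minimal sketch of this baseline pipeline (doc2vec representations followed by k-means), assuming the Gensim and scikit-learn libraries mentioned above; the hyperparameters are illustrative, not the exact values used for the benchmark.

import numpy as np
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.cluster import KMeans

def cluster_documents(texts, k=1000, dim=300, epochs=20):
    """texts: list of token lists (e.g., tokenized abstracts)."""
    tagged = [TaggedDocument(words=toks, tags=[i]) for i, toks in enumerate(texts)]
    model = Doc2Vec(tagged, vector_size=dim, min_count=5, workers=4, epochs=epochs)

    # Length-normalize the learned document embeddings
    # (model.dv in Gensim 4.x; model.docvecs in older versions).
    vectors = np.array([model.dv[i] for i in range(len(texts))])
    vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)

    labels = KMeans(n_clusters=k).fit_predict(vectors)
    return vectors, labels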


We run 10 trials of stratification and clustering, resulting in 10 different clusterings. We do this to protect against the variability of clustering and because we do not want to rely on a single clustering for proposing (down)trends for the benchmark.

6.2.4 Trend & Downtrend Estimation

Recall our definitions of trend and downtrend: a trend (resp. downtrend) is defined as a research topic that has a strongly increasing (resp. decreasing) number of publications in a particular time interval.

Linear trend estimation is a widely used technique in predictive (time-series) analysis that supports statements about tendencies in the data by correlating the measurements with the times at which they occurred (Hess, Iyer, and Malm 2001). Considering the definition of trend (resp. downtrend) and the main concept of linear trend estimation models, we utilize linear regression to identify trend and downtrend candidates in the resulting clustering sets.

Specifically, we count the number of documents per year for a cluster and then estimate the parameters of the best-fitting line (i.e., its slope). Clusters are then ranked according to the resulting slope, and the n = 150 clusters with the largest (resp. smallest) slope are selected as trend (resp. downtrend) candidates. Thus, our definition of a single-clustering (down)trend candidate is a cluster with an extremely ascending or descending slope. Figure 12 and Figure 13 give two examples of (down)trend candidates.
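A minimal sketch of this candidate selection step: count documents per year for each cluster, fit a line to the counts, and rank clusters by slope. Variable names and the year range are illustrative assumptions.

import numpy as np
from collections import Counter

def rank_clusters_by_slope(years, labels, year_range=(2000, 2015), n_top=150):
    """years[i] and labels[i] are the publication year and cluster id of document i."""
    span = np.arange(year_range[0], year_range[1] + 1)
    slopes = {}
    for cluster in set(labels):
        counts_by_year = Counter(y for y, c in zip(years, labels) if c == cluster)
        counts = np.array([counts_by_year.get(y, 0) for y in span])
        slope, _intercept = np.polyfit(span, counts, deg=1)  # best-fitting line
        slopes[cluster] = slope
    ranked = sorted(slopes, key=slopes.get, reverse=True)
    trend_candidates = ranked[:n_top]        # largest slopes
    downtrend_candidates = ranked[-n_top:]   # smallest slopes
    return trend_candidates, downtrend_candidates, slopes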

6.2.5 (Down)trend Candidates Validation

As detailed in Section 6.2.3, we run ten series of clustering, resulting in ten discrete clustering sets. We then estimated the (down)trend candidates according to the procedure described in Section 6.2.4. Consequently, for each of the ten clustering sets, we obtained 150 trend and 150 downtrend candidates.

Nonetheless, for the benchmark, among all the (down)trend candidates across the ten clustering sets, we want to keep only those that are consistent over all sets. To obtain consistent (down)trend candidates, we apply the Jaccard coefficient, a metric used for measuring the similarity and diversity of sample sets. We define an equivalence relation ∼: two candidates (i.e., clusters) are equivalent if their Jaccard coefficient is > τ_sim, where τ_sim = 0.5. Following this notion, we compute the equivalence of (down)trend candidates across the ten sets.

Finally, we define an equivalence class as an equivalence-class trend candidate (resp. equivalence-class downtrend candidate) if it contains trend (resp. downtrend) candidates in at least half of the clusterings.


Figure 12: A trend candidate (positive slope). Topic: Sentiment and Emotion Analysis (using Social Media Channels).

Figure 13: A downtrend candidate (negative slope). Topic: XML.


This procedure gave us in total 110 equivalence-class trend candidates and 107 equivalence-class downtrend candidates that we then annotate for our benchmark.⁸
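A minimal sketch of this validation step under the stated threshold τ_sim = 0.5: candidates are document-id sets, and the grouping across the ten clustering runs is simplified here to a greedy merge for illustration.

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if (a or b) else 0.0

def group_equivalent(candidates, tau_sim=0.5):
    """candidates: list of (run_id, set_of_document_ids). Returns equivalence groups."""
    groups = []  # each group is a list of (run_id, doc_set)
    for run_id, docs in candidates:
        for group in groups:
            if any(jaccard(docs, other) > tau_sim for _, other in group):
                group.append((run_id, docs))
                break
        else:
            groups.append([(run_id, docs)])
    # keep groups that contain candidates from at least half of the runs
    n_runs = len({run_id for run_id, _ in candidates})
    return [g for g in groups if len({r for r, _ in g}) >= n_runs / 2]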

6.3 benchmark annotation

To annotate the equivalence-class (down)trend candidates obtained from our model, we used the Figure Eight⁹ platform. It provides an integrated quality-control mechanism and a training phase before the actual annotation (Kolhatkar, Zinsmeister, and Hirst 2013), which minimizes the rate of poorly qualified annotators.

We design our annotation task as follows. A worker is shown a description of a particular (down)trend candidate along with the list of possible (down)trend tags. Descriptions are mixed and presented in random order. The temporal presentation order of candidates is also arbitrary. The worker is asked to pick from the list of (down)trend tags the item that matches the cluster (resp. candidate) best. Even if several items are a possible match, the worker must choose only one. If there is no matching item, the worker can select the option other. Three different workers evaluated each cluster, and the final label is the majority label. If there is no majority label, we present the cluster repeatedly to the workers until there is a majority.

The descriptors of candidates are collected through the following procedure:

• First, we review the documents assigned to a candidate cluster.

• Then, we extract keywords and titles of the papers attributed to the cluster. Semantic Scholar provides keywords and titles as part of the metadata.

• Finally, we compute the top 15 most frequent keywords and randomly select 15 titles. These keywords and titles form the descriptor of the cluster candidate presented to crowd-workers (a sketch of this step follows the list).
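A minimal sketch of how a cluster descriptor could be assembled from the metadata fields named above; the field names "keyphrases" and "title" are assumptions for illustration.

import random
from collections import Counter

def build_descriptor(cluster_docs, n_keywords=15, n_titles=15, seed=0):
    """cluster_docs: list of metadata dicts for the documents in one candidate cluster."""
    random.seed(seed)
    keyword_counts = Counter(kw for doc in cluster_docs for kw in doc["keyphrases"])
    top_keywords = [kw for kw, _ in keyword_counts.most_common(n_keywords)]
    titles = [doc["title"] for doc in cluster_docs]
    sampled_titles = random.sample(titles, min(n_titles, len(titles)))
    return {"keywords": top_keywords, "titles": sampled_titles}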

We create the (down)trend tags based on keywords, titles, and metadata such as the publication venue. The cluster candidates were manually labeled by graduate employees (i.e., domain experts in the computer science domain), considering the abovementioned items.

6.3.1 Inter-annotator Agreement

To ensure the quality of the obtained annotations, we computed the Inter-Annotator Agreement (IAA), a measure of how well two (or more) annotators can make the same decision for a particular category.

8. Note that the overall number of 217 refers to the number of (down)trend candidates and not to the number of documents. Each candidate contains hundreds of documents.

9. Formerly called CrowdFlower.


To do that, we used the following metrics:

krippendorff's α is a reliability coefficient developed to measure the agreement among observers or measuring instruments that draw distinctions among typically unstructured phenomena or assign computable values to them (Krippendorff 2011). We choose Krippendorff's α as an inter-annotator agreement measure because it, unlike other specialized coefficients, is a generalization of several reliability criteria and applies to situations like ours, where we have more than two annotators and a given annotator only labels a subset of the data (i.e., we have to deal with missing values).

Krippendorff (2011) has proposed several metric variations, each of which is associated with a specific set of weights. We use Krippendorff's α considering any number of observers and missing data. In this case, Krippendorff's α for the overall number of 217 annotated documents is α = 0.798. According to Landis and Koch (1977), this value corresponds to a substantial level of agreement.

figure eight agreement rate Figure Eight¹⁰ provides an agreement score c for each annotated unit u, which is based on the majority vote of the trusted workers. The score has been shown to perform well compared to classic metrics (Kolhatkar, Zinsmeister, and Hirst 2013). We computed the average C for the 217 annotated documents; C is 0.799.

Since both Krippendorff's α and the internal Figure Eight IAA are rather high, we consider the obtained annotations for the benchmark to be of good quality.
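To illustrate the first measure, a minimal sketch of computing Krippendorff's α for nominal labels with missing values, using the open-source krippendorff Python package; the package, the toy label matrix, and the integer encoding are assumptions for illustration and not what was used for the thesis.

import numpy as np
import krippendorff  # pip install krippendorff

# Rows: annotators; columns: units; np.nan marks units an annotator did not label.
reliability_data = np.array([
    [0,      1, 1, np.nan, 2],   # annotator 1 (labels encoded as integers)
    [0,      1, 2, 2,      2],   # annotator 2
    [np.nan, 1, 1, 2,      2],   # annotator 3
])
alpha = krippendorff.alpha(reliability_data=reliability_data,
                           level_of_measurement="nominal")
print(f"Krippendorff's alpha: {alpha:.3f}")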

6.3.2 Gold Trends

We compiled a list of computer science trends for the year 2016 based on the analysis of twelve survey publications (Ankerholz 2016; Augustin 2016; Brooks 2016; Frot 2016; Harriet 2015; IEEE Computer Society 2016; Markov 2015; Meyerson and Mariette 2016; Nessma 2015; Reese 2015; Rivera 2016; Zaino 2016). For this year, we identified a total of 31 trends; see Table 20.

Since there is variation in the naming of a specific trend, we created a unique name for each of them. Out of the 31 trends, 28 (90.3%) are instantiated as trends by our model, which we confirmed in the crowdsourcing procedure.¹¹ We searched for the three missing trends using a variety of techniques (i.e., inspection of non-trends and downtrends, random browsing of documents not assigned to trend candidates, and keyword searches).

10. Formerly the CrowdFlower platform.
11. The author verified this based on her judgment of the equivalence of one of the 31 gold trend names in the literature with one of the trend tags.


Two of the missing trends do not seem to occur in the collection at all: "Human Augmentation" and "Food and Water Technology". The third missing trend, "Cryptocurrency and Cryptography", does occur in the collection, but the occurrence rate is very small (ca. 0.01). We do not consider these three trends in the rest of the chapter.

Computer Science Trend                          Computer Science Trend

Autonomous Agents and Systems                   Natural Language Processing
Autonomous Vehicles                             Open Source
Big Data Analytics                              Privacy
Bioinformatics and (Bio) Neuroscience           Quantum Computing
Biometrics and Personal Identification          Recommender Systems and Social Networks
Cloud Computing and Software as a Service       Reinforcement Learning
Cryptocurrency and Cryptography*                Renewable Energy
Cyber Security                                  Robotics
E-business                                      Semantic Web
Food and Water Technology*                      Sentiment and Emotion Analysis
Game-based Learning                             Smart Cities
Games and (Virtual) Augmented Reality           Supply Chains and RFIDs
Human Augmentation*                             Technology for Climate Change
Machine/Deep Learning                           Transportation and Energy
Medical Advances and DNA Computing              Wearables
Mobile Computing

Table 20: 31 areas identified as computer science trends for the year 2016 in the media. 28 of these trends are instantiated by our model as well and are thus covered in our benchmark. Topics marked with an asterisk (*) were not found in our underlying data collection.

In summary, based on our analysis of the meta-literature, we identified 28 gold trends that also occur in our collection. We will use them as the gold standard that trend detection algorithms should aim to discover.

6.4 evaluation measure for trend detection

Whether something is a trend is a graded concept. For this reason, we adopt an evaluation measure based on a ranking of candidates, specifically Mean Average Precision (MAP). The score denotes the average of the precision values at the ranks of correctly identified and non-redundant trends.

We approach trend detection as computing a ranked list of sets of documents, where each set is a trend candidate. We refer to documents as trendy, downtrendy, and flat depending on which class they are assigned to in our benchmark. We consider a trend candidate


c as a correct recognition of gold trend t if it satisfies the following requirements:

• c is trendy,

• t is the largest trend in c,

• |t ∩ c| / |c| > ρ, where ρ is a coverage parameter,

• t was not recognized earlier in the list.

These criteria give us a true positive (i.e., recognition of a gold trend) or a false positive (otherwise) for every position of the ranked list, and a precision score for each trend. Precision for a trend that was not observed is 0. We then compute MAP; see Table 21 for an example.

Gold Trends            Gold Downtrends

Cloud Computing        Compilers
Bioinformatics         Petri Nets
Sentiment Analysis     Fuzzy Sets
Privacy                Routing Protocols

Trend Candidate            |t ∩ c| / |c|    tp/fp    P

c1   Cloud Computing       0.60             tp       1.00
c2   Privacy               0.20             fp       0.50
c3   Cloud Computing       0.70             fp       0.33
c4   Bioinformatics        0.55             tp       0.50
c5   Sentiment Analysis    0.95             tp       0.60
c6   Sentiment Analysis    0.59             fp       0.50
c7   Bioinformatics        0.41             fp       0.43
c8   Compilers             0.60             fp       0.38
c9   Petri Nets            0.80             fp       0.33
c10  Privacy               0.33             fp       0.30

Table 21: Example of the proposed MAP evaluation (with ρ = 0.5). MAP is the average of the precision values 1.0 (Cloud Computing), 0.5 (Bioinformatics), 0.6 (Sentiment Analysis), and 0.0 (Privacy), i.e., MAP = 2.1/4 = 0.525. True positive: tp. False positive: fp. Precision: P.
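A minimal sketch that reproduces the MAP example of Table 21 with ρ = 0.5. Each candidate is represented here only by its proposed gold topic and its coverage |t ∩ c| / |c|; for simplicity, "c is trendy and t is the largest trend in c" is modelled by checking membership in the gold-trend list, which is sufficient for this toy example but not a full implementation of the benchmark script.

def mean_average_precision(ranked_candidates, gold_trends, rho=0.5):
    precisions = {t: 0.0 for t in gold_trends}  # 0 for trends never recognized
    recognized, tp = set(), 0
    for rank, (topic, coverage) in enumerate(ranked_candidates, start=1):
        correct = (topic in gold_trends            # candidate matches a gold trend
                   and coverage > rho              # sufficient coverage
                   and topic not in recognized)    # not recognized earlier in the list
        if correct:
            tp += 1
            recognized.add(topic)
            precisions[topic] = tp / rank          # precision at this rank
    return sum(precisions.values()) / len(gold_trends)

gold = ["Cloud Computing", "Bioinformatics", "Sentiment Analysis", "Privacy"]
candidates = [("Cloud Computing", 0.60), ("Privacy", 0.20), ("Cloud Computing", 0.70),
              ("Bioinformatics", 0.55), ("Sentiment Analysis", 0.95),
              ("Sentiment Analysis", 0.59), ("Bioinformatics", 0.41),
              ("Compilers", 0.60), ("Petri Nets", 0.80), ("Privacy", 0.33)]
print(mean_average_precision(candidates, gold))  # 0.525, as in Table 21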

6.5 trend detection baselines

In this section, we investigate the impact of four different configuration choices on the performance of trend detection by way of ablation.


6.5.1 Configurations

abstracts vs. full texts The underlying data collection contains both abstracts and full texts.¹² Our initial expectation was that the full text of a document comprises more information than the abstract, and that it should therefore be a more reliable basis for trend detection. This is one of the configurations we test in the ablation. We apply our proposed method to the full-text collection (solely document texts, without abstracts) and the abstract collection (only abstract texts) separately and observe the results.

stratification Due to the uneven distribution of documents over the years, the later years (with an increased volume of publications) may have too much impact on the results compared to earlier years. To investigate this, we conduct an ablation experiment where we compare (i) 160k documents randomly sampled from the entire collection and (ii) stratified sampling. In stratified sampling, we select 10k documents for each of the years 2000 – 2015, resulting in a sampled collection of 160k.

length L_t of interval Clusters are ranked by trendiness, and the resulting ranked list is evaluated. To measure the trendiness or growth of topics over time, we fit a line to an interval {(i, n_i) | i_0 ≤ i < i_0 + L_t} by linear regression, where L_t is the length of the interval, i is one of the 16 years (2000, ..., 2015), and n_i is the number of documents that were assigned to cluster c in that year. As a simple default baseline, we apply the regression to half of the entire interval, i.e., L_t = 8. There are nine such intervals in our 16-year period. As the final measure of growth for the cluster, we take the maximum of the nine individual slopes. To determine how much this configuration choice affects our results, we also test a linear regression over the entire time span, i.e., L_t = 16. In this case, there is a single interval (a sketch of this computation follows the list).

clustering method We consider Gaussian Mixture Models (GMM) and k-means. We use GMM with a spherical type of covariance, where each component has the same variance. For both, we proceed as described in Section 6.2.3.
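A minimal sketch of the interval configuration described above: fit a regression line to every window of length L_t within the 16-year span and take the maximum slope as the cluster's growth score (L_t = 8 yields nine windows, L_t = 16 a single one). The per-cluster counts are assumed to be precomputed.

import numpy as np

def growth_score(counts_per_year, lt=8):
    """counts_per_year: document counts for the years 2000..2015 (length 16)."""
    years = np.arange(len(counts_per_year))
    slopes = []
    for start in range(len(counts_per_year) - lt + 1):
        window_years = years[start:start + lt]
        window_counts = counts_per_year[start:start + lt]
        slope, _ = np.polyfit(window_years, window_counts, deg=1)
        slopes.append(slope)
    return max(slopes)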

6.5.2 Experimental Setup and Results

We conducted experiments on the Semantic Scholar corpus and evaluated a ranked list of trend candidates against the benchmark, considering the configurations mentioned above. We adopted Mean Average Precision (MAP) as the principal evaluation measure. As a secondary evaluation measure, we computed recall at 50 (R50), which estimates the percentage of gold trends found in the list of the 50 highest-ranked trend candidates. We found that configuration (0) in Table 22 works best.

12. Semantic Scholar provided the underlying dataset at the beginning of our work in 2016.


We then conducted ablation experiments to discover the importance of the configuration choices.

                    (0)         (1)         (2)         (3)         (4)

document part       A           F           A           A           A
stratification      yes         yes         no          yes         yes
Lt                  8           8           8           16          8
clustering          GMMsph      GMMsph      GMMsph      GMMsph      kμ
MAP                 36 (03)     07 (19)     30 (03)     32 (04)     34 (21)
R50 avg             50 (012)    25 (018)    50 (016)    46 (018)    53 (014)
R50 max             61          37          53          61          61

Table 22: Ablation results. Standard deviations are in parentheses. GMM clustering is performed with the spherical (sph) type of covariance. K-means clustering is denoted as kμ in the ablation table.

comparing (0) and (1)¹³ We observed that abstracts (A) are a more reliable representation for our trend detection baseline than full texts (F). A possible explanation for this observation is that an abstract is a summary of a scientific paper that covers only the central points and is semantically very concise and rich. In contrast, the full text contains numerous parts that are secondary (e.g., future work, related work) and thus may skew the document's overall meaning.

comparing (0) and (2) We recognized that stratification (yes) improves results compared to the setting with non-stratified (no), randomly sampled data. We would expect the effect of stratification to be even more substantial for collections in which the distribution of documents over time is more skewed than in ours.

comparing (0) and (3) The length of the interval is a vital configuration choice. As we assumed, the 16-year interval is rather long, and an 8-year interval could be a better choice, primarily if one aims to find short-term trends.

comparing (0) and (4) We recognized that the topics we obtain from k-means have a similar nature to those from GMM. However, GMMs, which take variance and covariance into account and estimate a soft assignment, still perform slightly better.

6.6 benchmark description

Below, we report the contents of the TRENDNERT benchmark.¹⁴

1. Two records containing information on documents: one file with the documents assigned to (down)trends and the other file with documents assigned to flat topics.

13. Numbers (0), (1), (2), (3), (4) are the configuration choices in the ablation table.
14. https://doi.org/10.17605/OSF.IO/DHZCT


2. A script to produce document hash codes.

3. A file to map every hash code to internal IDs in the benchmark.

4. A script to compute MAP (the default setting is ρ = 0.25).

5. README.md, a file with guidance notes.

Every document in the benchmark contains the following information:

• Paper ID: internal ID assigned to each document in the original Semantic Scholar collection.

• Cluster ID: unique ID of the meta-cluster where the document was observed.

• Label/Tag: name of the gold (down)trend assigned by crowd-workers (e.g., Cyber Security, Fuzzy Sets and Systems).

• Type of (down)trend candidate: trend (T), downtrend (D), or flat topic (F).

• Hash ID¹⁵ of each paper from the original Semantic Scholar collection (a sketch of the hashing follows the list).
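A minimal sketch of how such a hash ID could be derived from the metadata named in footnote 15 (an MD5 hash of first author, title, and year); the exact string normalization used for the released benchmark is not specified here, so the concatenation below is an assumption for illustration.

import hashlib

def document_hash(first_author: str, title: str, year: int) -> str:
    key = f"{first_author}{title}{year}"  # assumed concatenation of the three fields
    return hashlib.md5(key.encode("utf-8")).hexdigest()

print(document_hash("Le", "Distributed Representations of Sentences and Documents", 2014))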

6.7 summary

Emerging trend detection is a promising task for Intelligent Process Automation (IPA) that allows companies and funding agencies to automatically analyze extensive textual collections in order to detect emerging fields and forecast the future by employing machine learning algorithms. Thus, the availability of public benchmarks for trend detection is of significant importance, as only thereby can one evaluate the performance and robustness of the techniques employed.

Within this work, we release TRENDNERT, the first publicly available benchmark for (down)trend detection, which offers a realistic setting for developing trend detection algorithms. TRENDNERT also supports downtrend detection, a significant problem that was not addressed before. We also present several experimental findings on trend detection. First, stratification improves trend detection if the distribution of documents is skewed over time. Second, abstract-based trend detection performs better than full-text-based detection if straightforward models are used.

6.8 future work

There are several possible directions for future work:

• The presented approach to estimating the trendiness of a topic by means of linear regression is straightforward and could be replaced by more sophisticated alternatives like Kendall's τ correlation coefficient or the locally weighted regression smoother (LOESS) (Cleveland 1979).

15. MD5 hash of First Author + Title + Year.



• It would also be challenging to replace the k-means clustering algorithm that we adopted within this work with an LDA model while retaining the document representations, as was done in Dieng, Ruiz, and Blei (2019). The question to investigate would then be whether topic modeling outperforms the simple clustering algorithm in this particular setting, and whether the resulting system would benefit from the topic modeling algorithm.

• Furthermore, it would be of interest to conduct more research on the following question: what kind of document representation can make effective use of the information in full texts? Recently proposed contextualized word embeddings could be a possible direction for these experiments.


7 DISCUSSION AND FUTURE WORK

As we observed in the two parts of this thesis, Intelligent Process Automation (IPA) is a challenging research area due to its multifacetedness and its application in real-world scenarios. Its main objective is to automate time-consuming and routine tasks and free up the knowledge worker for more cognitively demanding tasks. However, this task involves many difficulties, such as a lack of resources for both training and evaluation of algorithms, as well as insufficient performance of machine learning techniques when applied to real-world data and practical problems. In this thesis, we investigated two IPA

applications within the Natural Language Processing (NLP) domain, conversational agents and predictive analytics, and addressed some of the most common issues.

• In the first part of the thesis, we addressed the challenge of implementing a robust conversational system for IPA purposes within the practical e-learning domain, considering the conditions of missing training data. The primary purpose of the resulting system is to perform the onboarding process between tutors and students. This onboarding reduces repetitive and time-consuming activities while allowing tutors to focus solely on mathematical questions, which is the core task with the most impact on students. Besides that, by interacting with users, the system augments the resources with structured and labeled training data that we used to implement learnable dialogue components.

The realization of such a system was associated with several challenges. Among others were missing structured data and ambiguous user-generated content that requires precise language understanding. Also, a system that simultaneously communicates with a large number of users has to maintain additional meta policies, such as fallback and check-up policies, to hold the conversation intelligently.

• Moreover, we have shown the rule-based dialogue system's potential extension using transfer learning. We utilized a small dataset of structured dialogues obtained in a trial run of the rule-based system and applied the current state-of-the-art Bidirectional Encoder Representations from Transformers (BERT) architecture to solve the domain-specific tasks. Specifically, we retrained two of the three components in the conversational system, Named Entity Recognition (NER) and the Dialogue Manager (DM) (i.e., the NAP task), and confirmed the applicability of such


algorithms in a real-world, low-resource scenario by showing the reasonably high performance of the resulting models.

• Last but not least, in the second part of the thesis, we addressed the issue of missing publicly available benchmarks for the evaluation of machine learning algorithms on the trend detection task. To the best of our knowledge, we released the first open benchmark for this purpose, which offers a realistic setting for developing trend detection algorithms. Besides that, this benchmark supports downtrend detection, a significant problem that was not sufficiently addressed before. We additionally proposed an evaluation measure and presented several experimental findings on trend detection.

The ultimate objective of Intelligent Process Automation (IPA) should be the creation of robust algorithms in low-resource and domain-specific conditions. This is still a challenging task; however, some of the experiments presented in this thesis revealed that this goal could potentially be achieved by means of Transfer Learning (TL) techniques.


BIBLIOGRAPHY

Agrawal, Amritanshu, Wei Fu, and Tim Menzies (2018). "What is wrong with topic modeling? And how to fix it using search-based software engineering." In: Information and Software Technology 98, pp. 74–88 (cit. on p. 66).
Alan, M. (1950). "Turing." In: Computing machinery and intelligence. Mind 59.236, pp. 433–460 (cit. on p. 7).
Alghamdi, Rubayyi and Khalid Alfalqi (2015). "A survey of topic modeling in text mining." In: Int. J. Adv. Comput. Sci. Appl. (IJACSA) 6.1 (cit. on pp. 64–66).
Ammar, Waleed, Dirk Groeneveld, Chandra Bhagavatula, Iz Beltagy, Miles Crawford, Doug Downey, Jason Dunkelberger, Ahmed Elgohary, Sergey Feldman, Vu Ha, et al. (2018). "Construction of the literature graph in semantic scholar." In: arXiv preprint arXiv:1805.02262 (cit. on p. 72).
Ankerholz, Amber (2016). 2016 Future of Open Source Survey Says Open Source Is The Modern Architecture. https://www.linux.com/news/2016-future-open-source-survey-says-open-source-modern-architecture (cit. on p. 78).
Asooja, Kartik, Georgeta Bordea, Gabriela Vulcu, and Paul Buitelaar (2016). "Forecasting emerging trends from scientific literature." In: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), pp. 417–420 (cit. on p. 68).
Augustin, Jason (2016). Emerging Science and Technology Trends: A Synthesis of Leading Forecasts. http://www.defenseinnovationmarketplace.mil/resources/2016_SciTechReport_16June2016.pdf (cit. on p. 78).
Blei, David M. and John D. Lafferty (2006). "Dynamic topic models." In: Proceedings of the 23rd international conference on Machine learning. ACM, pp. 113–120 (cit. on pp. 66, 67).
Blei, David M., John D. Lafferty, et al. (2007). "A correlated topic model of science." In: The Annals of Applied Statistics 1.1, pp. 17–35 (cit. on p. 66).
Blei, David M., Andrew Y. Ng, and Michael I. Jordan (2003). "Latent dirichlet allocation." In: Journal of machine Learning research 3.Jan, pp. 993–1022 (cit. on p. 66).
Bloem, Peter (2019). Transformers from Scratch. http://www.peterbloem.nl/blog/transformers (cit. on p. 45).
Bocklisch, Tom, Joey Faulkner, Nick Pawlowski, and Alan Nichol (2017). "Rasa: Open source language understanding and dialogue management." In: arXiv preprint arXiv:1712.05181 (cit. on p. 15).
Bolelli, Levent, Seyda Ertekin, and C. Giles (2009). "Topic and trend detection in text collections using latent dirichlet allocation." In: Advances in Information Retrieval, pp. 776–780 (cit. on p. 66).


Bolelli, Levent, Seyda Ertekin, Ding Zhou, and C. Lee Giles (2009). "Finding topic trends in digital libraries." In: Proceedings of the 9th ACM/IEEE-CS joint conference on Digital libraries. ACM, pp. 69–72 (cit. on p. 66).
Bordes, Antoine, Y-Lan Boureau, and Jason Weston (2016). "Learning end-to-end goal-oriented dialog." In: arXiv preprint arXiv:1605.07683 (cit. on pp. 16, 17).
Brooks, Chuck (2016). 7 Top Tech Trends Impacting Innovators in 2016. http://innovationexcellence.com/blog/2015/12/26/7-top-tech-trends-impacting-innovators-in-2016 (cit. on p. 78).
Burtsev, Mikhail, Alexander Seliverstov, Rafael Airapetyan, Mikhail Arkhipov, Dilyara Baymurzina, Eugene Botvinovsky, Nickolay Bushkov, Olga Gureenkova, Andrey Kamenev, Vasily Konovalov, et al. (2018). "DeepPavlov: An Open Source Library for Conversational AI." In: (cit. on p. 15).
Chang, Jonathan, David M. Blei, et al. (2010). "Hierarchical relational models for document networks." In: The Annals of Applied Statistics 4.1, pp. 124–150 (cit. on p. 66).
Chen, Hongshen, Xiaorui Liu, Dawei Yin, and Jiliang Tang (2017). "A survey on dialogue systems: Recent advances and new frontiers." In: Acm Sigkdd Explorations Newsletter 19.2, pp. 25–35 (cit. on pp. 14–16).
Chen, Lingzhen, Alessandro Moschitti, Giuseppe Castellucci, Andrea Favalli, and Raniero Romagnoli (2018). "Transfer Learning for Industrial Applications of Named Entity Recognition." In: NL4AI@AI*IA, pp. 129–140 (cit. on p. 43).
Cleveland, William S. (1979). "Robust locally weighted regression and smoothing scatterplots." In: Journal of the American statistical association 74.368, pp. 829–836 (cit. on p. 84).
Collobert, Ronan, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa (2011). "Natural language processing (almost) from scratch." In: Journal of machine learning research 12.Aug, pp. 2493–2537 (cit. on p. 16).
Cuayáhuitl, Heriberto, Simon Keizer, and Oliver Lemon (2015). "Strategic dialogue management via deep reinforcement learning." In: arXiv preprint arXiv:1511.08099 (cit. on p. 14).
Cui, Lei, Shaohan Huang, Furu Wei, Chuanqi Tan, Chaoqun Duan, and Ming Zhou (2017). "Superagent: A customer service chatbot for e-commerce websites." In: Proceedings of ACL 2017, System Demonstrations, pp. 97–102 (cit. on p. 8).
Curiskis, Stephan A., Barry Drake, Thomas R. Osborn, and Paul J. Kennedy (2019). "An evaluation of document clustering and topic modelling in two online social networks: Twitter and Reddit." In: Information Processing & Management (cit. on p. 74).
Dai, Zihang, Zhilin Yang, Yiming Yang, William W. Cohen, Jaime Carbonell, Quoc V. Le, and Ruslan Salakhutdinov (2019). "Transformer-xl: Attentive language models beyond a fixed-length context." In: arXiv preprint arXiv:1901.02860 (cit. on pp. 43, 44).


Daniel, Ben (2015). "Big Data and analytics in higher education: Opportunities and challenges." In: British journal of educational technology 46.5, pp. 904–920 (cit. on p. 22).
Deerwester, Scott, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, and Richard Harshman (1990). "Indexing by latent semantic analysis." In: Journal of the American society for information science 41.6, pp. 391–407 (cit. on p. 65).
Deoras, Anoop, Kaisheng Yao, Xiaodong He, Li Deng, Geoffrey Gerson Zweig, Ruhi Sarikaya, Dong Yu, Mei-Yuh Hwang, and Gregoire Mesnil (2015). Assignment of semantic labels to a sequence of words using neural network architectures. US Patent App. 14/016,186 (cit. on p. 13).
Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova (2018). "Bert: Pre-training of deep bidirectional transformers for language understanding." In: arXiv preprint arXiv:1810.04805 (cit. on pp. 42–45).
Dhingra, Bhuwan, Lihong Li, Xiujun Li, Jianfeng Gao, Yun-Nung Chen, Faisal Ahmed, and Li Deng (2016). "Towards end-to-end reinforcement learning of dialogue agents for information access." In: arXiv preprint arXiv:1609.00777 (cit. on p. 16).
Dieng, Adji B., Francisco J. R. Ruiz, and David M. Blei (2019). "Topic Modeling in Embedding Spaces." In: arXiv preprint arXiv:1907.04907 (cit. on pp. 74, 84).
Donahue, Jeff, Yangqing Jia, Oriol Vinyals, Judy Hoffman, Ning Zhang, Eric Tzeng, and Trevor Darrell (2014). "Decaf: A deep convolutional activation feature for generic visual recognition." In: International conference on machine learning, pp. 647–655 (cit. on p. 43).
Eric, Mihail and Christopher D. Manning (2017). "Key-value retrieval networks for task-oriented dialogue." In: arXiv preprint arXiv:1705.05414 (cit. on p. 16).
Fang, Hao, Hao Cheng, Maarten Sap, Elizabeth Clark, Ari Holtzman, Yejin Choi, Noah A. Smith, and Mari Ostendorf (2018). "Sounding board: A user-centric and content-driven social chatbot." In: arXiv preprint arXiv:1804.10202 (cit. on p. 10).
Friedrich, Fabian, Jan Mendling, and Frank Puhlmann (2011). "Process model generation from natural language text." In: International Conference on Advanced Information Systems Engineering, pp. 482–496 (cit. on p. 2).
Frot, Mathilde (2016). 5 Trends in Computer Science Research. https://www.topuniversities.com/courses/computer-science-information-systems/5-trends-computer-science-research (cit. on p. 78).
Glasmachers, Tobias (2017). "Limits of end-to-end learning." In: arXiv preprint arXiv:1704.08305 (cit. on pp. 16, 17).
Goddeau, David, Helen Meng, Joseph Polifroni, Stephanie Seneff, and Senis Busayapongchai (1996). "A form-based dialogue manager for spoken language applications." In: Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96. Vol. 2. IEEE, pp. 701–704 (cit. on p. 14).


Gohr, Andre, Alexander Hinneburg, Rene Schult, and Myra Spiliopoulou (2009). "Topic evolution in a stream of documents." In: Proceedings of the 2009 SIAM International Conference on Data Mining. SIAM, pp. 859–870 (cit. on p. 65).
Gollapalli, Sujatha Das and Xiaoli Li (2015). "EMNLP versus ACL: Analyzing NLP research over time." In: EMNLP, pp. 2002–2006 (cit. on p. 72).
Griffiths, Thomas L. and Mark Steyvers (2004). "Finding scientific topics." In: Proceedings of the National academy of Sciences 101.suppl 1, pp. 5228–5235 (cit. on p. 65).
Griffiths, Thomas L., Michael I. Jordan, Joshua B. Tenenbaum, and David M. Blei (2004). "Hierarchical topic models and the nested Chinese restaurant process." In: Advances in neural information processing systems, pp. 17–24 (cit. on p. 66).
Hall, David, Daniel Jurafsky, and Christopher D. Manning (2008). "Studying the history of ideas using topic models." In: Proceedings of the conference on empirical methods in natural language processing. Association for Computational Linguistics, pp. 363–371 (cit. on p. 66).
Harriet, Taylor (2015). Privacy will hit tipping point in 2016. http://www.cnbc.com/2015/11/09/privacy-will-hit-tipping-point-in-2016.html (cit. on p. 78).
He, Jiangen and Chaomei Chen (2018). "Predictive Effects of Novelty Measured by Temporal Embeddings on the Growth of Scientific Literature." In: Frontiers in Research Metrics and Analytics 3, p. 9 (cit. on pp. 64, 68, 71).
He, Junxian, Zhiting Hu, Taylor Berg-Kirkpatrick, Ying Huang, and Eric P. Xing (2017). "Efficient correlated topic modeling with topic embedding." In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, pp. 225–233 (cit. on pp. 66, 67).
He, Qi, Bi Chen, Jian Pei, Baojun Qiu, Prasenjit Mitra, and Lee Giles (2009). "Detecting topic evolution in scientific literature: How can citations help?" In: Proceedings of the 18th ACM conference on Information and knowledge management. ACM, pp. 957–966 (cit. on p. 68).
Henderson, Matthew, Blaise Thomson, and Jason D. Williams (2014). "The second dialog state tracking challenge." In: Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL), pp. 263–272 (cit. on p. 15).
Henderson, Matthew, Blaise Thomson, and Steve Young (2013). "Deep neural network approach for the dialog state tracking challenge." In: Proceedings of the SIGDIAL 2013 Conference, pp. 467–471 (cit. on p. 14).
Hess, Ann, Hari Iyer, and William Malm (2001). "Linear trend analysis: a comparison of methods." In: Atmospheric Environment 35.30: Visibility, Aerosol, and Atmospheric Optics, pp. 5211–5222. issn: 1352-2310. doi: https://doi.org/10.1016/S1352-2310(01)00342-9. url: http://www.sciencedirect.com/science/article/pii/S1352231001003429 (cit. on p. 75).


Hofmann, Thomas (1999). "Probabilistic latent semantic analysis." In: Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence. Morgan Kaufmann Publishers Inc., pp. 289–296 (cit. on p. 65).
Honnibal, Matthew and Ines Montani (2017). "spacy 2: Natural language understanding with bloom embeddings, convolutional neural networks and incremental parsing." In: To appear 7 (cit. on p. 15).
Howard, Jeremy and Sebastian Ruder (2018). "Universal language model fine-tuning for text classification." In: arXiv preprint arXiv:1801.06146 (cit. on p. 44).
IEEE Computer Society (2016). Top 9 Computing Technology Trends for 2016. https://www.scientificcomputing.com/news/2016/01/top-9-computing-technology-trends-2016 (cit. on p. 78).
Ivancic, Lucija, Dalia Suša Vugec, and Vesna Bosilj Vukšic (2019). "Robotic Process Automation: Systematic Literature Review." In: International Conference on Business Process Management. Springer, pp. 280–295 (cit. on pp. 1, 2).
Jain, Mohit, Pratyush Kumar, Ramachandra Kota, and Shwetak N. Patel (2018). "Evaluating and informing the design of chatbots." In: Proceedings of the 2018 Designing Interactive Systems Conference. ACM, pp. 895–906 (cit. on p. 8).
Khanpour, Hamed, Nishitha Guntakandla, and Rodney Nielsen (2016). "Dialogue act classification in domain-independent conversations using a deep recurrent neural network." In: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pp. 2012–2021 (cit. on p. 13).
Kim, Young-Bum, Karl Stratos, Ruhi Sarikaya, and Minwoo Jeong (2015). "New transfer learning techniques for disparate label sets." In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp. 473–482 (cit. on p. 43).
Kolhatkar, Varada, Heike Zinsmeister, and Graeme Hirst (2013). "Annotating anaphoric shell nouns with their antecedents." In: Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse, pp. 112–121 (cit. on pp. 77, 78).
Kontostathis, April, Leon M. Galitsky, William M. Pottenger, Soma Roy, and Daniel J. Phelps (2004). "A survey of emerging trend detection in textual data mining." In: Survey of text mining. Springer, pp. 185–224 (cit. on p. 63).
Krippendorff, Klaus (2011). "Computing Krippendorff's alpha-reliability." In: Annenberg School for Communication (ASC) Departmental Papers 43 (cit. on p. 78).
Kurata, Gakuto, Bing Xiang, Bowen Zhou, and Mo Yu (2016). "Leveraging sentence-level information with encoder lstm for semantic slot filling." In: arXiv preprint arXiv:1601.01530 (cit. on p. 13).
Landis, J. Richard and Gary G. Koch (1977). "The measurement of observer agreement for categorical data." In: biometrics, pp. 159–174 (cit. on p. 78).


Le, Minh-Hoang, Tu-Bao Ho, and Yoshiteru Nakamori (2005). "Detecting emerging trends from scientific corpora." In: International Journal of Knowledge and Systems Sciences 2.2, pp. 53–59 (cit. on p. 68).
Le, Quoc V. and Tomas Mikolov (2014). "Distributed Representations of Sentences and Documents." In: ICML. Vol. 14, pp. 1188–1196 (cit. on p. 74).
Lee, Ji Young and Franck Dernoncourt (2016). "Sequential short-text classification with recurrent and convolutional neural networks." In: arXiv preprint arXiv:1603.03827 (cit. on p. 13).
Leopold, Henrik, Han van der Aa, and Hajo A. Reijers (2018). "Identifying candidate tasks for robotic process automation in textual process descriptions." In: Enterprise, Business-Process and Information Systems Modeling. Springer, pp. 67–81 (cit. on p. 2).
Levenshtein, Vladimir I. (1966). "Binary codes capable of correcting deletions, insertions, and reversals." In: Soviet physics doklady 8, pp. 707–710 (cit. on p. 25).
Liao, Vera, Masud Hussain, Praveen Chandar, Matthew Davis, Yasaman Khazaeni, Marco Patricio Crasso, Dakuo Wang, Michael Muller, N. Sadat Shami, Werner Geyer, et al. (2018). "All Work and No Play?" In: Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. ACM, p. 3 (cit. on p. 8).
Lowe, Ryan Thomas, Nissan Pow, Iulian Vlad Serban, Laurent Charlin, Chia-Wei Liu, and Joelle Pineau (2017). "Training end-to-end dialogue systems with the ubuntu dialogue corpus." In: Dialogue & Discourse 8.1, pp. 31–65 (cit. on p. 22).
Luger, Ewa and Abigail Sellen (2016). "Like having a really bad PA: the gulf between user expectation and experience of conversational agents." In: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems. ACM, pp. 5286–5297 (cit. on p. 8).
MacQueen, James (1967). "Some methods for classification and analysis of multivariate observations." In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability. Vol. 1. Oakland, CA, USA, pp. 281–297 (cit. on p. 74).
Manning, Christopher D., Prabhakar Raghavan, and Hinrich Schütze (2008). Introduction to Information Retrieval. Web publication at informationretrieval.org. Cambridge University Press (cit. on p. 74).
Markov, Igor (2015). 13 Of 2015's Hottest Topics In Computer Science Research. https://www.forbes.com/sites/quora/2015/04/22/13-of-2015s-hottest-topics-in-computer-science-research/#fb0d46b1e88d (cit. on p. 78).
Martin, James H. and Daniel Jurafsky (2009). Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition. Pearson/Prentice Hall, Upper Saddle River.
Mei, Qiaozhu, Deng Cai, Duo Zhang, and ChengXiang Zhai (2008). "Topic modeling with network regularization." In: Proceedings of the 17th international conference on World Wide Web. ACM, pp. 101–110 (cit. on p. 65).


Meyerson, Bernard and DiChristina Mariette (2016). These are the top 10 emerging technologies of 2016. http://www3.weforum.org/docs/GAC16_Top10_Emerging_Technologies_2016_report.pdf (cit. on p. 78).
Mikolov, Tomas, Kai Chen, Greg Corrado, and Jeffrey Dean (2013). "Efficient estimation of word representations in vector space." In: arXiv preprint arXiv:1301.3781 (cit. on p. 74).
Miller, Alexander, Adam Fisch, Jesse Dodge, Amir-Hossein Karimi, Antoine Bordes, and Jason Weston (2016). "Key-value memory networks for directly reading documents." In: (cit. on p. 16).
Mohanty, Soumendra and Sachin Vyas (2018). "How to Compete in the Age of Artificial Intelligence." In: (cit. on pp. III, V, 1, 2).
Moiseeva, Alena and Hinrich Schütze (2020). "TRENDNERT: A Benchmark for Trend and Downtrend Detection in a Scientific Domain." In: Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (cit. on p. 71).
Moiseeva, Alena, Dietrich Trautmann, Michael Heimann, and Hinrich Schütze (2020). "Multipurpose Intelligent Process Automation via Conversational Assistant." In: arXiv preprint arXiv:2001.02284 (cit. on pp. 21, 41).
Monaghan, Fergal, Georgeta Bordea, Krystian Samp, and Paul Buitelaar (2010). "Exploring your research: Sprinkling some saffron on semantic web dog food." In: Semantic Web Challenge at the International Semantic Web Conference. Vol. 117. Citeseer, pp. 420–435 (cit. on p. 68).
Monostori, László, András Márkus, Hendrik Van Brussel, and E. Westkämper (1996). "Machine learning approaches to manufacturing." In: CIRP annals 45.2, pp. 675–712 (cit. on p. 15).
Moody, Christopher E. (2016). "Mixing dirichlet topic models and word embeddings to make lda2vec." In: arXiv preprint arXiv:1605.02019 (cit. on p. 74).
Mou, Lili, Zhao Meng, Rui Yan, Ge Li, Yan Xu, Lu Zhang, and Zhi Jin (2016). "How transferable are neural networks in nlp applications?" In: arXiv preprint arXiv:1603.06111 (cit. on p. 43).
Mrkšic, Nikola, Diarmuid O Séaghdha, Tsung-Hsien Wen, Blaise Thomson, and Steve Young (2016). "Neural belief tracker: Data-driven dialogue state tracking." In: arXiv preprint arXiv:1606.03777 (cit. on p. 14).
Nessma, Joussef (2015). Top 10 Hottest Research Topics in Computer Science. http://www.pouted.com/top-10-hottest-research-topics-in-computer-science (cit. on p. 78).
Nie, Binling and Shouqian Sun (2017). "Using text mining techniques to identify research trends: A case study of design research." In: Applied Sciences 7.4, p. 401 (cit. on p. 68).
Pan, Sinno Jialin and Qiang Yang (2009). "A survey on transfer learning." In: IEEE Transactions on knowledge and data engineering 22.10, pp. 1345–1359 (cit. on p. 41).
Qu, Lizhen, Gabriela Ferraro, Liyuan Zhou, Weiwei Hou, and Timothy Baldwin (2016). "Named entity recognition for novel types by transfer learning." In: arXiv preprint arXiv:1610.09914 (cit. on p. 43).
Radford, Alec, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever (2019). "Language models are unsupervised multitask learners." In: OpenAI Blog 1.8 (cit. on p. 44).
Reese, Hope (2015). 7 trends for artificial intelligence in 2016: 'Like 2015 on steroids'. http://www.techrepublic.com/article/7-trends-for-artificial-intelligence-in-2016-like-2015-on-steroids (cit. on p. 78).
Rivera, Maricel (2016). Is Digital Game-Based Learning The Future Of Learning? https://elearningindustry.com/digital-game-based-learning-future (cit. on p. 78).
Rotolo, Daniele, Diana Hicks, and Ben R. Martin (2015). "What is an emerging technology?" In: Research Policy 44.10, pp. 1827–1843 (cit. on p. 64).
Salatino, Angelo (2015). "Early detection and forecasting of research trends." In: (cit. on pp. 63, 66).
Serban, Iulian V., Alessandro Sordoni, Yoshua Bengio, Aaron Courville, and Joelle Pineau (2016). "Building end-to-end dialogue systems using generative hierarchical neural network models." In: Thirtieth AAAI Conference on Artificial Intelligence (cit. on p. 11).
Sharif Razavian, Ali, Hossein Azizpour, Josephine Sullivan, and Stefan Carlsson (2014). "CNN features off-the-shelf: an astounding baseline for recognition." In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp. 806–813 (cit. on p. 43).
Shibata, Naoki, Yuya Kajikawa, and Yoshiyuki Takeda (2009). "Comparative study on methods of detecting research fronts using different types of citation." In: Journal of the American Society for information Science and Technology 60.3, pp. 571–580 (cit. on p. 68).
Shibata, Naoki, Yuya Kajikawa, Yoshiyuki Takeda, and Katsumori Matsushima (2008). "Detecting emerging research fronts based on topological measures in citation networks of scientific publications." In: Technovation 28.11, pp. 758–775 (cit. on p. 68).
Small, Henry (2006). "Tracking and predicting growth areas in science." In: Scientometrics 68.3, pp. 595–610 (cit. on p. 68).
Small, Henry, Kevin W. Boyack, and Richard Klavans (2014). "Identifying emerging topics in science and technology." In: Research Policy 43.8, pp. 1450–1467 (cit. on p. 64).
Su, Pei-Hao, Nikola Mrkšic, Iñigo Casanueva, and Ivan Vulic (2018). "Deep Learning for Conversational AI." In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Tutorial Abstracts, pp. 27–32 (cit. on p. 12).
Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le (2014). "Sequence to sequence learning with neural networks." In: Advances in neural information processing systems, pp. 3104–3112 (cit. on pp. 16, 17).
Trautmann, Dietrich, Johannes Daxenberger, Christian Stab, Hinrich Schütze, and Iryna Gurevych (2019). "Robust Argument Unit Recognition and Classification." In: arXiv preprint arXiv:1904.09688 (cit. on p. 41).


Tu, Yi-Ning and Jia-Lang Seng (2012). "Indices of novelty for emerging topic detection." In: Information processing & management 48.2, pp. 303–325 (cit. on p. 64).
Turing, Alan (1949). London Times letter to the editor (cit. on p. 7).
Ultes, Stefan, Lina Barahona, Pei-Hao Su, David Vandyke, Dongho Kim, Inigo Casanueva, Paweł Budzianowski, Nikola Mrkšic, Tsung-Hsien Wen, et al. (2017). "Pydial: A multi-domain statistical dialogue system toolkit." In: Proceedings of ACL 2017, System Demonstrations, pp. 73–78 (cit. on p. 15).
Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin (2017). "Attention is all you need." In: Advances in neural information processing systems, pp. 5998–6008 (cit. on p. 45).
Vincent, Pascal, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol (2008). "Extracting and composing robust features with denoising autoencoders." In: Proceedings of the 25th international conference on Machine learning. ACM, pp. 1096–1103 (cit. on p. 44).
Vodolán, Miroslav, Rudolf Kadlec, and Jan Kleindienst (2015). "Hybrid dialog state tracker." In: arXiv preprint arXiv:1510.03710 (cit. on p. 14).
Wang, Alex, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman (2018). "Glue: A multi-task benchmark and analysis platform for natural language understanding." In: arXiv preprint arXiv:1804.07461 (cit. on p. 44).
Wang, Chong and David M. Blei (2011). "Collaborative topic modeling for recommending scientific articles." In: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp. 448–456 (cit. on p. 66).
Wang, Xuerui and Andrew McCallum (2006). "Topics over time: a non-Markov continuous-time model of topical trends." In: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp. 424–433 (cit. on pp. 65–67).
Wang, Zhuoran and Oliver Lemon (2013). "A simple and generic belief tracking mechanism for the dialog state tracking challenge: On the believability of observed information." In: Proceedings of the SIGDIAL 2013 Conference, pp. 423–432 (cit. on p. 14).
Weizenbaum, Joseph, et al. (1966). "ELIZA – a computer program for the study of natural language communication between man and machine." In: Communications of the ACM 9.1, pp. 36–45 (cit. on p. 7).
Wen, Tsung-Hsien, Milica Gasic, Nikola Mrksic, Pei-Hao Su, David Vandyke, and Steve Young (2015). "Semantically conditioned lstm-based natural language generation for spoken dialogue systems." In: arXiv preprint arXiv:1508.01745 (cit. on p. 14).
Wen, Tsung-Hsien, David Vandyke, Nikola Mrksic, Milica Gasic, Lina M. Rojas-Barahona, Pei-Hao Su, Stefan Ultes, and Steve Young (2016). "A network-based end-to-end trainable task-oriented dialogue system." In: arXiv preprint arXiv:1604.04562 (cit. on pp. 11, 16, 17, 22).


Werner, Alexander, Dietrich Trautmann, Dongheui Lee, and Roberto Lampariello (2015). "Generalization of optimal motion trajectories for bipedal walking". In: pp. 1571–1577 (cit. on p. 41).

Williams, Jason D., Kavosh Asadi, and Geoffrey Zweig (2017). "Hybrid code networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning". In: arXiv preprint arXiv:1702.03274 (cit. on p. 17).

Williams, Jason, Antoine Raux, and Matthew Henderson (2016). "The dialog state tracking challenge series: A review". In: Dialogue & Discourse 7.3, pp. 4–33 (cit. on p. 12).

Wu, Chien-Sheng, Richard Socher, and Caiming Xiong (2019). "Global-to-local Memory Pointer Networks for Task-Oriented Dialogue". In: arXiv preprint arXiv:1901.04713 (cit. on pp. 15–17).

Wu, Yonghui, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, et al. (2016). "Google's neural machine translation system: Bridging the gap between human and machine translation". In: arXiv preprint arXiv:1609.08144 (cit. on p. 45).

Xie, Pengtao and Eric P. Xing (2013). "Integrating Document Clustering and Topic Modeling". In: Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence (cit. on p. 74).

Yan, Zhao, Nan Duan, Peng Chen, Ming Zhou, Jianshe Zhou, and Zhoujun Li (2017). "Building task-oriented dialogue systems for online shopping". In: Thirty-First AAAI Conference on Artificial Intelligence (cit. on p. 14).

Yogatama, Dani, Michael Heilman, Brendan O'Connor, Chris Dyer, Bryan R. Routledge, and Noah A. Smith (2011). "Predicting a scientific community's response to an article". In: Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, pp. 594–604 (cit. on p. 65).

Young, Steve, Milica Gašic, Blaise Thomson, and Jason D. Williams (2013). "POMDP-based statistical spoken dialog systems: A review". In: Proceedings of the IEEE 101.5, pp. 1160–1179 (cit. on p. 17).

Zaino, Jennifer (2016). 2016 Trends for Semantic Web and Semantic Technologies. http://www.dataversity.net/2017-predictions-semantic-web-semantic-technologies (cit. on p. 78).

Zhou, Hao, Minlie Huang, et al. (2016). "Context-aware natural language generation for spoken dialogue systems". In: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pp. 2032–2041 (cit. on pp. 14, 15).

Zhu, Yukun, Ryan Kiros, Rich Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, and Sanja Fidler (2015). "Aligning books and movies: Towards story-like visual explanations by watching movies and reading books". In: Proceedings of the IEEE International Conference on Computer Vision, pp. 19–27 (cit. on p. 44).


A C K N O W L E D G E M E N T S

This thesis was possible due to several people who contributed significantly to my work. I owe my gratitude to each of them for guidance, encouragement, and inspiration during my doctoral study.

First of all, I would like to thank my advisor, Prof. Dr. Hinrich Schütze, for his mentorship, support, and thoughtful guidance. My special gratitude goes to Prof. Dr. Ruedi Seiler and his fantastic team at Integral-Learning GmbH. I am thankful for your valuable input, interest in our collaborative work, and for your continuous support. I would also like to show gratitude to Prof. Dr. Alexander Fraser for co-advising, support, and helpful feedback.

Finally, none of this would be possible without my family's support; there are no words to express the depth of gratitude I feel for them. My earnest appreciation goes to my husband Dietrich: thank you for all the time you were by my side and for the encouragement you provided me through these years. I am very grateful to my parents for their understanding and backing at any moment. I appreciate the love of my grandparents; it is hard to realize how significant their care is for me. Besides that, I am also grateful to my sister Alexandra, elder brother Denis, and parents-in-law for merely being always there for me.


C O N T E N T S

List of Figures XI
List of Tables XII
Acronyms XIII
1 introduction 1
  1.1 Outline 3
i e-learning conversational assistant 5
2 foundations 7
  2.1 Introduction 7
  2.2 Conversational Assistants 9
    2.2.1 Taxonomy 9
    2.2.2 Conventional Architecture 12
    2.2.3 Existing Approaches 15
  2.3 Target Domain & Task Definition 18
  2.4 Summary 19
3 multipurpose ipa via conversational assistant 21
  3.1 Introduction 21
    3.1.1 Outline and Contributions 22
  3.2 Model 23
    3.2.1 OMB+ Design 23
    3.2.2 Preprocessing 23
    3.2.3 Natural Language Understanding 25
    3.2.4 Dialogue Manager 26
    3.2.5 Meta Policy 30
    3.2.6 Response Generation 32
  3.3 Evaluation 37
    3.3.1 Automated Evaluation 37
    3.3.2 Human Evaluation & Error Analysis 37
  3.4 Structured Dialogue Acquisition 39
  3.5 Summary 39
  3.6 Future Work 40
4 deep learning re-implementation of units 41
  4.1 Introduction 41
    4.1.1 Outline and Contributions 42
  4.2 Related Work 43
  4.3 BERT: Bidirectional Encoder Representations from Transformers 44
  4.4 Data & Descriptive Statistics 46
  4.5 Named Entity Recognition 48
    4.5.1 Model Settings 49
  4.6 Next Action Prediction 49
    4.6.1 Model Settings 49
  4.7 Evaluation and Results 50
  4.8 Error Analysis 58
  4.9 Summary 58
  4.10 Future Work 59
ii clustering-based trend analyzer 61
5 foundations 63
  5.1 Motivation and Challenges 63
  5.2 Detection of Emerging Research Trends 64
    5.2.1 Methods for Topic Modeling 65
    5.2.2 Topic Evolution of Scientific Publications 67
  5.3 Summary 68
6 benchmark for scientific trend detection 71
  6.1 Motivation 71
    6.1.1 Outline and Contributions 72
  6.2 Corpus Creation 73
    6.2.1 Underlying Data 73
    6.2.2 Stratification 73
    6.2.3 Document Representations & Clustering 74
    6.2.4 Trend & Downtrend Estimation 75
    6.2.5 (Down)trend Candidates Validation 75
  6.3 Benchmark Annotation 77
    6.3.1 Inter-annotator Agreement 77
    6.3.2 Gold Trends 78
  6.4 Evaluation Measure for Trend Detection 79
  6.5 Trend Detection Baselines 80
    6.5.1 Configurations 81
    6.5.2 Experimental Setup and Results 81
  6.6 Benchmark Description 82
  6.7 Summary 83
  6.8 Future Work 83
7 discussion and future work 85
bibliography 87


L I S T O F F I G U R E S

Figure 1 Taxonomy of dialogue systems 10

Figure 2 Example of a conventional dialogue architecture 12

Figure 3 OMB+ online learning platform 24

Figure 4 Rule-Based System: Dialogue flow 36

Figure 5 Macro F1, evaluation, NAP – default model. Comparison between German and multilingual BERT 52

Figure 6 Macro F1, test, NAP – default model. Comparison between German and multilingual BERT 53

Figure 7 Macro F1, evaluation, NAP – extended model 54

Figure 8 Macro F1, test, NAP – extended model 55

Figure 9 Macro F1, evaluation, NER – Comparison between German and multilingual BERT 56

Figure 10 Macro F1, test, NER – Comparison between German and multilingual BERT 57

Figure 11 (Down)trend detection: Overall distribution of papers in the entire dataset 73

Figure 12 A trend candidate (positive slope). Topic: Sentiment and Emotion Analysis (using Social Media Channels) 76

Figure 13 A downtrend candidate (negative slope). Topic: XML 76


L I S T O F T A B L E S

Table 1 Natural language representation of a sentence 13

Table 2 Rule 1 Admissible configurations 27

Table 3 Rule 2 Admissible configurations 27

Table 4 Rule 3 Admissible configurations 27

Table 5 Rule 4 Admissible configurations 28

Table 6 Rule 5 Admissible configurations 28

Table 7 Case 1 ID completeness 31

Table 8 Case 2 ID completeness 31

Table 9 Showcase 1 Short flow 33

Table 10 Showcase 2 Organisational question 33

Table 11 Showcase 3: Fallback and human-request policies 34

Table 12 Showcase 4 Contextual question 34

Table 13 Showcase 5 Long flow 35

Table 14 General statistics for conversational dataset 46

Table 15 Detailed statistics on possible system actions in conversational dataset 48

Table 16 Detailed statistics on possible named entities in conversational dataset 48

Table 17 F1-results for the Named Entity Recognition task 51

Table 18 F1-results for the Next Action Prediction task 51

Table 19 Average dialogue accuracy computed for the Next Action Prediction task 51

Table 20 Trend detection List of gold trends 79

Table 21 Trend detection Example of MAP evaluation 80

Table 22 Trend detection Ablation results 82


A C R O N Y M S

AI Artificial Intelligence

API Application Programming Interface

BERT Bidirectional Encoder Representations from Transformers

BiLSTM Bidirectional Long Short Term Memory

CRF Conditional Random Fields

DM Dialogue Manager

DST Dialogue State Tracker

DSTC Dialogue State Tracking Challenge

EC Equivalence Classes

ES ElasticSearch

GM Gaussian Models

IAA Inter Annotator Agreement

ID Informational Dictionary

IPA Intelligent Process Automation

ITS Intelligent Tutoring System

LDA Latent Dirichlet Allocation

LSA Latent Semantic Analysis

LSTM Long Short Term Memory

MAP Mean Average Precision

MER Mutually Exclusive Rules

NAP Next Action Prediction

NER Named Entity Recognition

NLG Natural Language Generation

NLP Natural Language Processing

NLU Natural Language Understanding

OMB+ Online Mathematik Brückenkurs

PL Policy Learning

pLSA probabilistic Latent Semantic Analysis

REST Representational State Transfer

RNN Recurrent Neural Network

RPA Robotic Process Automation

RegEx Regular Expressions

TL Transfer Learning

ToDS Task-oriented Dialogue System


1 I N T R O D U C T I O N

Robotic Process Automation (RPA) is a type of software bots that imitates manual human activities like entering data into a system, logging into accounts, or executing simple but repetitive workflows (Mohanty and Vyas 2018). In other words, RPA-systems allow business applications to work without human involvement by replicating human worker activities. A further benefit of RPA-bots is that they are much cheaper compared to the employment of a human worker, and they are also able to work around-the-clock. Overall, the advantages of software bots are hugely appealing and include cost savings, accuracy and compliance while executing processes, improved responsiveness (i.e., bots are faster than human workers), and, finally, such bots are usually agile and multi-skilled (Ivancic, Vugec, and Vukšic 2019).

However, one of the main benefits and, at the same time, drawbacks of RPA-systems is their ability to precisely fulfill the assigned task. On the one hand, the bot can carry out the task accurately and diligently. On the other hand, it is highly susceptible to changes in defined scenarios. Being designed for a particular task, the RPA-bot is often not adaptable to other domains or even simple changes in a workflow (Mohanty and Vyas 2018). This inability to adapt to changing conditions gave rise to a further area of improvement for RPA-bots: Intelligent Process Automation (IPA) systems.

IPA-bots combine Robotic Process Automation (RPA) with Artificial Intelligence (AI) and thus can perform complex and more cognitively demanding tasks that require reasoning and language understanding. Hence, IPA-bots go beyond automating simple "click tasks" (as in RPA) and can accomplish jobs more intelligently, employing machine learning algorithms and advanced analytics. Such IPA-systems undertake time-consuming and routine tasks and thus enable smart workflows and free up skilled workers to accomplish higher-value activities.

Previously, IPA techniques were primarily employed within the computer vision domain, starting with the automation of simple display click-tasks and going towards sophisticated self-driving cars with their ability to interpret a stream of situational data obtained from sensors and cameras. However, in recent times, Natural Language Processing (NLP) became one of the potential applications for IPA as well, due to its ability to understand and interpret human language. NLP methods are used to analyze large amounts of textual data and to respond to various inquiries. Furthermore, if the available data is unstructured and does not have any predefined format (i.e., customer emails), or if it is available in a variable format (i.e., invoices, legal documents), then Natural Language Processing (NLP) techniques are applied to extract the relevant information. This data can then be utilized to solve distinct problems (i.e., natural language understanding, predictive analytics) (Mohanty and Vyas 2018). For instance, in the case of predictive analytics, such a bot would make forecasts about the future using current and historical data and therethrough assist with sales or investments.

However, NLP in the context of Intelligent Process Automation (IPA) is not limited to the extraction of relevant data and insights from textual documents. Among the central applications of NLP within the IPA domain are conversational interfaces (e.g., chatbots) that are used to enable human-to-machine interaction. This type of software is commonly used for customer support or within recommendation and booking systems. Conversational agents gain enormous demand due to their ability to simultaneously support a large number of users while communicating in a natural language. In a conventional chatbot system, a user provides input in a natural language; the bot then processes this query to extract potentially useful information and responds with a relevant reply. This process comprises multiple stages and involves different types of NLP subtasks, starting with Natural Language Understanding (NLU) (e.g., intent recognition, named entity extraction) and going towards dialogue management (i.e., determining the next possible bot's action considering the dialogue history) and response generation (e.g., converting the semantic representation of the next bot's action to a natural language utterance). A typical dialogue system for Intelligent Process Automation (IPA) purposes usually undertakes shallow customer support requests (e.g., answering of FAQs), allowing human workers to focus on more sophisticated inquiries.

Natural Language Processing (NLP) techniques may also be used indirectly for IPA purposes. There are attempts to automatically identify and classify tasks extracted from textual process descriptions as manual, user, or automated. The goal of such an NLP application is to reduce the effort required to identify suitable candidates for further robotic process automation (Friedrich, Mendling, and Puhlmann 2011; Leopold, Aa, and Reijers 2018).

Intelligent Process Automation (IPA) is an emerging technology and evolves mainly due to the recognized potential benefit of combining Robotic Process Automation (RPA) and Artificial Intelligence (AI) techniques to solve industrial problems (Ivancic, Vugec, and Vukšic 2019; Mohanty and Vyas 2018). Industries are typically interested in being responsive to customers and markets (Mohanty and Vyas 2018). Thus, most customer support applications need to be real-time, active, and highly robust to achieve higher client satisfaction rates. The manual accomplishment of such tasks is often time-consuming, expensive, and error-prone. Furthermore, due to the massive amounts of textual data available for analysis, it seems not feasible to process this data entirely by human means.


1.1 outline

This thesis covers two topics united by a broader theme, which is Intelligent Process Automation (IPA) utilizing statistical Natural Language Processing (NLP) methods.

the first block of the thesis (Chapter 2 – Chapter 4) addresses the challenge of implementing a conversational agent for Intelligent Process Automation (IPA) purposes within the e-learning field. As mentioned above, chatbots are one of the central applications of the IPA domain, since they can effectively perform time-consuming tasks requiring natural language understanding, allowing human workers to concentrate on more valuable tasks. The IPA conversational bot that was realized within this thesis takes care of routine and time-consuming tasks performed by human tutors within an online mathematical course. This bot is deployed in a real-world setting and interacts with students within the OMB+ mathematical platform. Conducting experiments, we observed two possibilities to build the conversational agent in industrial settings: first, with purely rule-based methods, considering the missing training data and individual aspects of the target domain (i.e., e-learning). Second, we re-implemented two of the main system components (i.e., Natural Language Understanding (NLU) and Dialogue Manager (DM) units) using the current state-of-the-art deep-learning algorithm (i.e., Bidirectional Encoder Representations from Transformers (BERT)) and investigated their performance and potential use as a part of a hybrid model (i.e., containing both rule-based and machine learning methods).

the second block of the thesis (Chapter 5 – Chapter 6) considers an Intelligent Process Automation (IPA) problem within the predictive analytics domain and addresses the task of scientific trend detection. Predictive analytics makes forecasts about future outcomes based on historical and current data. Hence, organizations can benefit from advanced analytics models by detecting trends and their behaviors and then employing this information while making crucial business decisions (i.e., investments). In this work, we deal with a subproblem of the trend detection task: specifically, we address the lack of publicly available benchmarks for the evaluation of trend detection algorithms. Therefore, we assembled a benchmark for both trend and downtrend (i.e., topics that become less frequent over time) detection. To the best of our knowledge, the task of downtrend detection has not been well addressed before. Furthermore, the resulting benchmark is based on a collection of more than one million documents, which is among the largest that have been used for trend detection before, and therefore offers a realistic setting for developing trend detection algorithms.


All main chapters contain their specific introduction, contribution list, related work section, and summary.


Part I

E - L E A R N I N G C O N V E R S A T I O N A L A S S I S T A N T


2 F O U N D A T I O N S

In this block, we address the challenge of designing and implementing a conversational assistant for Intelligent Process Automation (IPA) purposes within the e-learning domain. The system aims to support human tutors of the Online Mathematik Brückenkurs (OMB+) platform during the tutoring round by undertaking routine and time-consuming responsibilities. Specifically, the system intelligently automates the process of information accumulation and validation that precedes every conversation. Therethrough, the IPA-bot frees up tutors and allows them to concentrate on more complicated and cognitively demanding tasks.

In this chapter, we introduce the fundamental concepts required by the topic of the first part of the thesis. We begin with the foundations of conversational assistants in Section 2.2, where we present the taxonomy of existing dialogue systems in Section 2.2.1, give an overview of the conventional dialogue architecture in Section 2.2.2 with its most essential components, and describe the existing approaches for building a conversational agent in Section 2.2.3. We finally introduce the target domain for our experiments in Section 2.3.

2.1 introduction

Conversational assistants, a type of software that interacts with people via written or spoken natural language, have become widely popularized in recent times. Today's conversational assistants support a broad range of everyday skills, including but not limited to creating a reminder, answering factoid questions, and controlling smart home devices.

Initially, the notion of an artificial companion capable of conversing in a natural language was anticipated by Alan Turing in 1949 (Turing 1949). Still, the first significant step in this direction was made in 1966 with the appearance of Weizenbaum's system ELIZA (Weizenbaum 1966). Its successor was ALICE (Artificial Linguistic Internet Computer Entity), a natural language processing dialogue system that conversed with a human by applying heuristic pattern matching rules. Despite being remarkably popular and technologically advanced for their days, both systems were unable to pass the Turing test (Alan 1950). Following a long silent period, the next wave of growth of intelligent agents was caused by the advances in Natural Language Processing (NLP). This was enabled by access to low-cost memory, higher processing speed, vast amounts of available data, and sophisticated machine learning algorithms. Hence, one of the most notable leaps in the history of conversational agents began in the year 2011 with intelligent personal assistants. At that time, Apple's Siri, followed by Microsoft's Cortana and Amazon's Alexa in 2014, and then Google Assistant in 2016, came to the scene. The main focus of these agents was not to mimic humans, as in ELIZA or ALICE, but to assist a human by answering simple factoid questions and setting notification tasks across a broad range of content. Another type of conversational agent, called Task-oriented Dialogue System (ToDS), became a focus of industries in the year 2016. For the most part, these bots aimed to support customer service (Cui et al. 2017) and respond to Frequently Asked Questions (Liao et al. 2018). Their conversational style provided more depth than those of personal assistants but a narrower range, since they were patterned for task solving and not for open-domain discussions.

One of the central advantages of conversational agents, and the reason why they are attracting considerable interest, is their ability to give attention to several users simultaneously while supporting natural language communication. Due to this, conversational assistants gain momentum in numerous kinds of practical support applications (i.e., customer support), where a high number of users are involved at the same time. Classical engagement of human assistants is not very cost-efficient, and such support is usually not accessible full-time. Therefore, automating human assistance is desirable but also an ambitious task. Due to the lack of solid Natural Language Understanding (NLU), advancing beyond primary tasks is still challenging (Luger and Sellen 2016), and even the most prosperous dialogue systems often fail to meet users' expectations (Jain et al. 2018). Significant challenges involved in the design and implementation of a conversational agent vary from domain engineering to Natural Language Understanding (NLU) and the design of an adaptive domain-specific conversational flow. The quintessential problems and how-to questions while building conversational assistants include:

• How to deal with variability and flexibility of language?

• How to get high-quality in-domain data?

• How to build meaningful representations?

• How to integrate commonsense and domain knowledge?

• How to build a robust system?

Besides that, current Natural Language Processing (NLP) and Artificial Intelligence (AI) techniques have weaknesses when applied to domain-specific, task-oriented problems, especially if the underlying data comes from an industrial source. As machine learning techniques heavily rely on structured and cleaned training data to deliver consistent performance, typically noisy real-world data without access to additional knowledge databases negatively impacts the classification accuracy.


Despite that, conversational agents can be successfully employed to solve less cognitive but still highly relevant tasks. In this case, a dialogue agent can be operated as a part of an Intelligent Process Automation (IPA) system, where it takes care of straightforward but routine and time-consuming functions performed in a natural language, while allowing a human worker to focus on more cognitively demanding processes.

2.2 conversational assistants

The development and design of a particular dialogue system usually vary by its objectives; among the most fundamental are:

• Purpose: Should it be an open-ended dialogue system (e.g., chit-chat), a task-oriented system (e.g., customer support), or a personal assistant (e.g., Google Assistant)?

• Skills: What kind of behavior should a system demonstrate?

• Data: Which data is required to develop (resp. train) specific blocks, and how could it be assembled and curated?

• Deployment: Will the final system be integrated into a particular software platform? What are the challenges in choosing suitable tools? How will the deployment happen?

• Implementation: Should it be a rule-based, information-retrieval, machine-learning, or a hybrid system?

In the following, we take a look at the most common taxonomies of dialogue systems and different design and implementation strategies.

2.2.1 Taxonomy

According to their goals, conversational agents can be classified into two main categories: task-oriented systems (e.g., for customer support) and social bots (e.g., for chit-chat purposes). Furthermore, systems also vary regarding their implementation characteristics. Below, we provide a detailed overview of the most common variations.

task-oriented dialogue systems are famous for their ability to lead a dialogue with the ultimate goal of solving a user's problem (see Figure 1). A customer support agent or a flight/restaurant booking system exemplifies this class of conversational assistants. Such systems are more meaningful and useful than those with an entirely social goal; however, they usually cause more complications during the implementation stage. Since Task-oriented Dialogue Systems (ToDS) require a precise understanding of the user's intent (i.e., goal), the Natural Language Understanding (NLU) segment must perform flawlessly.

Besides that, if applied in an industrial setting, ToDS are usually built modularly, which means they mostly rely on handcrafted rules and predefined templates and are restricted in their abilities. Personal assistants (i.e., Google Assistant) are members of the task-oriented family as well, though, compared to standard agents, they involve more social conversation and can be used for entertainment purposes (e.g., "Ok Google, tell me a joke"). The latter is typically not available in conventional task-oriented systems.

social bots and chit-chat systems, in contrast, regularly do not hold any specific goal and are designed for small-talk and entertainment conversations (see Figure 1). Those agents are usually end-to-end trainable and highly data-driven, which means they require large amounts of training data. However, due to entirely learnable training, such systems often produce inappropriate responses, making them extremely unreliable for practical applications. Like chit-chat systems, social bots are rich in social conversation; however, they also possess an ability to solve uncomplicated user requests and conduct a more meaningful dialogue (Fang et al. 2018). Those requests are not the same as in task-oriented systems and are mostly pointed at answering simple factoid questions.

Figure 1: Taxonomy of dialogue systems according to the measure of their task accomplishment and social contribution. Figure originates from Fang et al. 2018.


The aforementioned system types can be decomposed further according to the implementation characteristics:

open domain & closed domain  A Task-oriented Dialogue System (ToDS) is typically built within a closed-domain setting. The space of potential inputs and outputs is restricted, since the system attempts to achieve a specific goal. Customer support or shopping assistants are cases of closed-domain problems (Wen et al. 2016). These systems do not need to be able to talk about off-site topics, but they have to fulfill their particular tasks as efficiently as possible.

Chit-chat systems, in contrast, do not assume a well-defined goal or intention and thus can be built within an open domain. This means the conversation can progress in any direction and without any or with only minimal functional purpose (Serban et al. 2016). However, the infinite number of possible topics and the fact that a certain amount of commonsense is required to create reasonable responses make an open-domain setting a hard problem.

content-driven & user-centric  Social bots and chit-chat systems typically utilize a content-driven dialogue manner. That means that their responses are based on large and dynamic content derived from daily web mining and knowledge graphs. This approach builds a dialogue based on trendy content from diverse sources, and thus it is an ideal way for social bots to catch users' attention and bring a user into the dialogue.

User-centric approaches, in turn, assume precise language understanding to detect the sentiment of users' statements. According to the sentiment, the dialogue manager learns users' personalities, handles rapid topic changes, and tracks engagement. In this case, the dialogue builds on specific user interests.

long & short conversations  Models with a short conversation style attempt to create a single response to a single input (e.g., weather forecast). In the case of a long conversation, a system has to go through multiple turns and to keep track of what has been said before (i.e., the dialogue history). The longer the conversation, the more challenging it is to train a system. Customer support conversations are typically long conversational threads with multiple questions.

response generation  One of the essential elements of a dialogue system is response generation. This can be done either in a retrieval-based manner or by using generative models (i.e., Natural Language Generation (NLG)).

The former usually utilizes predefined responses and heuristics to select a relevant response based on the input and context. The heuristic could be a rule-based expression match (i.e., Regular Expressions (RegEx)) or an ensemble of machine learning classifiers (e.g., entity extraction, prediction of the next action). Such systems do not generate any original text; instead, they select a response from a fixed set of predefined candidates. These models are usually used in an industrial setting, where the system must show high robustness, performance, and interpretability of the outcomes.
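To make the retrieval-based idea concrete, the following minimal Python sketch selects one of a fixed set of predefined responses via rule-based pattern matching. All intent names, patterns, and responses are invented for illustration and do not correspond to any system discussed in this thesis.

import re

# Hypothetical intent patterns and canned responses for a booking assistant.
RESPONSE_CANDIDATES = {
    "greet":       "Hello! How can I help you today?",
    "book_flight": "Sure, I can help with flights. Where would you like to go?",
    "fallback":    "Sorry, I did not understand that. Could you rephrase?",
}

INTENT_PATTERNS = {
    "greet":       re.compile(r"\b(hello|hi|good (morning|evening))\b", re.I),
    "book_flight": re.compile(r"\b(flight|fly|plane ticket)\b", re.I),
}

def select_response(utterance: str) -> str:
    """Pick a predefined response via rule-based pattern matching."""
    for intent, pattern in INTENT_PATTERNS.items():
        if pattern.search(utterance):
            return RESPONSE_CANDIDATES[intent]
    return RESPONSE_CANDIDATES["fallback"]

print(select_response("I want to book a flight to Munich"))

Since no text is generated, every possible output is known in advance, which is exactly the robustness and interpretability property valued in industrial settings.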

The latter, in contrast, do not rely on predefined replies. Since such models are typically based on (deep) machine learning techniques, they attempt to generate new responses from scratch. This is, however, a complicated task and is mostly used for vanilla chit-chat systems or in a research domain where structured training data is available.

2.2.2 Conventional Architecture

In this and the following subsection, we review the pipeline and methods for Task-oriented Dialogue Systems (ToDS). Such agents are typically composed of several components (Figure 2) addressing diverse tasks required to facilitate a natural language dialogue (Williams, Raux, and Henderson 2016).

Figure 2: Example of a conventional dialogue architecture with its main components in the bounding box: Language Understanding (NLU), Dialogue Manager (DM), Response Generation (NLG). Figure originates from the NAACL 2018 Tutorial, Su et al. 2018.


A conventional dialogue architecture implements this with the following components:

natural language understanding unit  has the following objectives: it parses the user input, classifies the domain (i.e., customer support, restaurant booking; not required for single-domain systems) and intent (i.e., find-flight, book-taxi), and extracts entities (i.e., Italian food, Berlin).

Generally, the NLU unit maps every user utterance into semantic frames that are predefined according to different scenarios. Table 1 illustrates an example of a sentence representation, where the departure location (Berlin) and the arrival location (Munich) are specified as slot values, and the domain (airline travel) and intent (find-flight) are specified as well. Typically, there are two kinds of possible representations. The first one is the utterance category, represented in the form of the user's intent. The second could be word-level information extraction in the form of Named Entity Recognition (NER) or Slot Filling tasks.

Sentence Find flight from Berlin to Munich today

Slots/Concepts O O O B-dept O B-arr B-date

Named Entity O O O B-city O B-city O

Intent Find_Flight

Domain Airline Travel

Table 1: An illustrative example of a sentence representation.
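For illustration, the semantic frame from Table 1 can be written down as a simple data structure; the field names in this Python sketch are our own choice and are not prescribed by any particular framework.

# Frame-style representation of the example sentence from Table 1.
frame = {
    "utterance": "Find flight from Berlin to Munich today",
    "domain": "Airline Travel",
    "intent": "Find_Flight",
    "slots": {
        "departure_city": "Berlin",
        "arrival_city": "Munich",
        "date": "today",
    },
    # Token-level BIO tags produced by the slot-filling component.
    "bio_tags": ["O", "O", "O", "B-dept", "O", "B-arr", "B-date"],
}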

A dialogue act classification sub-module, also known as intent classification, is dedicated to detecting the primary user's goal at every dialogue state. It labels each user's utterance with one of the predefined intents. Deep neural networks can be directly applied to this conventional classification problem (Khanpour, Guntakandla, and Nielsen 2016; Lee and Dernoncourt 2016).

Slot filling is another sub-module for language understanding. Unlike intent detection (i.e., a classification problem), slot filling is defined as a sequence labeling problem, where words in the sequence are labeled with semantic tags. The input is a sequence of words, and the output is a sequence of slots, one for every word. Variations of Recurrent Neural Network (RNN) architectures (LSTM, BiLSTM) (Deoras et al. 2015) and (attention-based) encoder-decoder architectures (Kurata et al. 2016) have been employed for this task.
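A minimal sketch of such a setup, assuming PyTorch and a joint model that predicts one intent per utterance and one slot tag per token; the layer sizes and vocabulary numbers are illustrative and do not reproduce any of the architectures cited above.

import torch
import torch.nn as nn

class JointNLUModel(nn.Module):
    """BiLSTM encoder with a slot-tagging head and an intent head."""

    def __init__(self, vocab_size, num_slots, num_intents,
                 emb_dim=100, hidden_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden_dim,
                               batch_first=True, bidirectional=True)
        self.slot_head = nn.Linear(2 * hidden_dim, num_slots)
        self.intent_head = nn.Linear(2 * hidden_dim, num_intents)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)       # (batch, seq_len, emb_dim)
        states, _ = self.encoder(embedded)         # (batch, seq_len, 2*hidden_dim)
        slot_logits = self.slot_head(states)       # one tag distribution per token
        intent_logits = self.intent_head(states.mean(dim=1))  # pooled utterance vector
        return slot_logits, intent_logits

# Toy forward pass: a batch with one utterance of seven word indices.
model = JointNLUModel(vocab_size=1000, num_slots=7, num_intents=5)
slot_logits, intent_logits = model(torch.randint(0, 1000, (1, 7)))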


dialogue manager unit  is subdivided into two components: first, a Dialogue State Tracker (DST) that holds the conversational state, and second, a Policy Learning (PL) unit that defines the system's next action.

The Dialogue State Tracker (DST) is the core component to guarantee a robust dialogue flow, since it manages the user's intent at every turn. The conventional methods, which are broadly applied in commercial implementations, often adopt handcrafted rules to pick the most probable candidate (Goddeau et al. 1996; Wang and Lemon 2013) among the predefined ones. Though, a variety of statistical approaches have emerged since the Dialogue State Tracking Challenge (DSTC) was arranged by Jason D. Williams in the year 2013. Among the deep-learning based approaches is the Belief Tracker proposed by Henderson, Thomson, and Young (2013). It employs a sliding window to produce a sequence of probability distributions over an arbitrary number of possible values (Chen et al. 2017). In the work of Mrkšic et al. (2016), the authors introduced a Neural Belief Tracker to identify the slot-value pairs. The system used the user intents, the preceding user input, the user utterance itself, and a candidate slot-value pair, which it needs to decide about, as the input. Then the system iterated over all candidate slot-value pairs to determine which ones have just been expressed by the user (Chen et al. 2017). Finally, Vodolán, Kadlec, and Kleindienst proposed a Hybrid Dialog State Tracker that achieved the state-of-the-art performance on the DSTC2 shared task. Their model used a separately learned decoder coupled with a rule-based system.

Based on the state representation from the Dialogue State Tracker (DST), the PL unit generates the next system action, which is then decoded by the Natural Language Generation (NLG) module. Usually, a rule-based agent is used to initialize the system (Yan et al. 2017), and then supervised learning is applied to the actions generated by these rules. Alternatively, the dialogue policy can be trained end-to-end with reinforcement learning to guide the system in making policies toward the final performance (Cuayáhuitl, Keizer, and Lemon 2015).

natural language generation unit  transforms the next action into a natural language response. Conventional approaches to Natural Language Generation (NLG) typically adopt a template-based style, where a set of rules is defined to map every next action to a predefined natural language response. Such approaches are simple, generally error-free, and easy to control; however, their implementation is time-consuming, and the style is repetitive and hardly scalable.
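A minimal sketch of such a template-based NLG mapping, with action names and slot placeholders invented purely for illustration:

# Each system action maps to a fixed template; slot values fill the placeholders.
TEMPLATES = {
    "request_departure": "From which city would you like to depart?",
    "inform_flight": "I found a flight from {departure} to {arrival} on {date}.",
    "confirm_booking": "Should I book the flight to {arrival} for you?",
}

def generate_response(action: str, slots: dict) -> str:
    """Turn the predicted next action plus slot values into a surface response."""
    return TEMPLATES[action].format(**slots)

print(generate_response(
    "inform_flight",
    {"departure": "Berlin", "arrival": "Munich", "date": "today"},
))

Every new action requires a new hand-written template, which is exactly the scalability limitation mentioned above.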

Among the deep learning techniques is the work by Wen et al. (2015) that introduced neural network-based approaches to NLG with an LSTM-based structure. Later, Zhou and Huang (2016) adopted this model along with the question information, semantic slot values, and dialogue act types to generate more precise answers. Finally, Wu, Socher, and Xiong (2019) proposed Global-to-Local Memory Pointer Networks (GLMP), an architecture that incorporates knowledge bases directly into a learning framework (due to the Memory Networks component) and is able to re-use information from the user input (due to the Pointer Networks component).

2 Currently the Dialog System Technology Challenge.

In particular cases, the dialogue flow can include speech recognition and speech synthesis units (if managing a spoken natural language system) and a connection to third-party APIs to request information stored in external databases (e.g., weather forecast), as depicted in Figure 2.
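Putting the components together, one dialogue turn in the conventional pipeline can be read as a chain of functions. The sketch below is a schematic Python illustration with placeholder implementations; the function bodies, action names, and slot names are our own assumptions and not the modules of any specific system.

def nlu(utterance: str) -> dict:
    # Placeholder NLU: a real system would classify the intent and extract slots.
    return {"intent": "book_flight", "slots": {"arrival": "Munich"}}

def update_state(state: dict, frame: dict) -> dict:
    # Placeholder DST: merge newly observed slot values into the dialogue state.
    state.setdefault("slots", {}).update(frame["slots"])
    state["intent"] = frame["intent"]
    return state

def select_action(state: dict) -> str:
    # Placeholder policy: ask for missing information, otherwise confirm.
    return "confirm_booking" if "departure" in state["slots"] else "request_departure"

def respond(action: str, slots: dict) -> str:
    # Placeholder NLG: a template- or model-based generator in a real system.
    responses = {
        "request_departure": "From which city would you like to depart?",
        "confirm_booking": "Should I book your flight to {arrival}?".format(**slots),
    }
    return responses[action]

def dialogue_turn(utterance: str, state: dict) -> str:
    """One pass through the conventional NLU -> DST -> Policy -> NLG pipeline."""
    frame = nlu(utterance)
    state = update_state(state, frame)
    action = select_action(state)
    return respond(action, state["slots"])

print(dialogue_turn("I need a flight to Munich", {}))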

2.2.3 Existing Approaches

As stated above, a conventional task-oriented dialogue system is implemented in a pipeline manner. That means that once the system receives a user query, it interprets it and acts according to the dialogue state and the corresponding policy. Additionally, based on its understanding, it may access the knowledge base to find the demanded information there. Finally, the system transforms its next action (and, if given, the extracted information) into its surface form as a natural language response (Monostori et al. 1996).

Individual components (i.e., Natural Language Understanding (NLU), Dialogue Manager (DM), Natural Language Generation (NLG)) of a particular conversational system can be implemented using different approaches, starting with entirely rule- and template-based methods and going towards hybrid approaches (using learnable components along with handcrafted units) and end-to-end trainable machine learning methods.

rule-based approaches  Though many of the latest research approaches handle Natural Language Understanding (NLU) and Natural Language Generation (NLG) units by using statistical Natural Language Processing (NLP) models (Bocklisch et al. 2017; Burtsev et al. 2018; Honnibal and Montani 2017), most of the industrially deployed dialogue systems still utilize handcrafted features and rules for the state tracking, action prediction, intent recognition, and slot filling tasks (Chen et al. 2017; Ultes et al. 2017). For instance, most of the PyDial framework modules offer rule-based implementations using Regular Expressions (RegEx), rule-based dialogue trackers (Henderson, Thomson, and Williams 2014), and handcrafted policies. Also, in order to map the next action to text, Ultes et al. (2017) suggest rules along with a template-based response generation.

3 PyDial Framework: http://www.camdial.org/pydial


The rule-based approach ensures robustness and stable performance, which is crucial for industrial systems that communicate with a large number of users simultaneously. However, it is highly expensive and time-consuming to deploy a real dialogue system built in that manner. The major disadvantage is that the usage of handcrafted systems is restricted to a specific domain, and possible adaptation requires extensive manual engineering.

end-to-end learning approaches  Due to the recent advance of end-to-end neural generative models (Collobert et al. 2011), many efforts have been made to build an end-to-end trainable architecture for conversational agents. Rather than using the traditional pipeline (see Figure 2), the end-to-end model is conceived as a single module (Chen et al. 2017).

Wen et al. (2016), followed by Bordes, Boureau, and Weston (2016), were among the pioneers who adapted an end-to-end trainable approach for conversational systems. Their approach was to treat dialogue learning as the problem of learning a mapping from dialogue histories to system responses and to apply an encoder-decoder model (Sutskever, Vinyals, and Le 2014) to train the entire system. The proposed model still had significant limitations (Glasmachers 2017), such as missing policies to learn a dialogue flow and the inability to incorporate additional knowledge, which is especially crucial for training a task-oriented agent. To overcome the first problem, Dhingra et al. (2016) introduced an end-to-end reinforcement learning approach that jointly trains the dialogue state tracker and policy learning units. The second weakness was addressed by Eric and Manning (2017), who augmented existing recurrent network architectures with a differentiable attention-based key-value retrieval mechanism over the entries of a knowledge base. This approach was inspired by Key-Value Memory Networks (Miller et al. 2016). Later on, Wu, Socher, and Xiong (2019) proposed Global-to-Local Memory Pointer Networks (GLMP), where the authors incorporated knowledge bases directly into a learning framework. In their model, a global memory encoder and a local memory decoder are employed to share external knowledge. The encoder encodes the dialogue history, modifies the global contextual representation, and generates a global memory pointer. The decoder first produces a sketch response with empty slots. Next, it transfers the global memory pointer to filter the external knowledge for relevant information and then instantiates the slots via the local memory pointers.

Despite having better adaptability compared to any rule-based system and being simpler to train, end-to-end approaches remain unattainable for commercial conversational agents that operate on real-world data. A well and carefully constructed task-oriented dialogue system in an observed domain, using handcrafted rules (in NLU and DST units) and predefined responses for NLG, still outperforms the end-to-end systems due to its robustness (Glasmachers 2017; Wu, Socher, and Xiong 2019).

hybrid approaches  Though end-to-end learning is an attractive solution for conversational systems, current techniques are data-intensive and require large amounts of dialogues to learn simple behaviors. To overcome this barrier, Williams, Asadi, and Zweig (2017) introduce Hybrid Code Networks (HCNs), which are an ensemble of retrieval and trainable units. The system utilizes a Recurrent Neural Network (RNN) for the Natural Language Understanding (NLU) block, domain-specific knowledge that is encoded as software (i.e., API calls), and custom action templates used for the Natural Language Generation (NLG) unit. Compared to existing end-to-end methods, the authors report that their approach considerably reduces the amount of data required for training (Williams, Asadi, and Zweig 2017). According to the authors, HCNs achieve state-of-the-art performance on the bAbI dialog dataset (Bordes, Boureau, and Weston 2016) and outperform two commercial customer dialogue systems.

Along with this work, Wen et al. (2016) propose a neural network-based model that is end-to-end trainable but still modularly connected. In their work, the authors treat dialogue as a sequence-to-sequence mapping problem (Sutskever, Vinyals, and Le 2014), augmented with the dialogue history and the current database search outcome (Wen et al. 2016). The authors explain the learning process as follows: At every turn, the system takes a user input represented as a sequence of tokens and converts it into two internal representations: a distributed representation of the intent and a probability distribution over slot-value pairs called the belief state (Young et al. 2013). The database operator then picks the most probable values in the belief state to form the search result. At the same time, the intent representation and the belief state are transformed and combined by a Policy Learning (PL) unit to form a single vector representing the next system action. This system action is then used to generate a sketch response token by token. The final system response is composed by substituting the actual values of the database entries into the sketch sentence structure (Wen et al. 2016).

Hybrid models appear to replace the established rule- and template-based approaches currently utilized in an industrial setting. The performance of most Natural Language Understanding (NLU) components (i.e., Named Entity Recognition (NER), domain and intent classification) is reasonably high to apply them for the development of commercial conversational agents, whereas the Natural Language Generation (NLG) unit can still be realized as a template-based solution by employing predefined response candidates and predicting the next system action with deep learning systems.

4 Internal Microsoft Research datasets in customer support and troubleshooting domains were employed for training and evaluation.


2.3 target domain & task definition

MUMIE is an open-source e-learning platform for studying and teaching mathematics and computer science. Online Mathematik Brückenkurs (OMB+) is one of the projects powered by MUMIE. Specifically, Online Mathematik Brückenkurs (OMB+) is a German online learning platform that assists students who are preparing for an engineering or computer science study at a university.

The course's central purpose is to assist students in reviving their mathematical skills so that they can follow the upcoming university courses. This online course is thematically segmented into 13 parts and includes free mathematical classes with theoretical and practical content. Every chapter begins with an overview of its sections and consists of explanatory texts with built-in videos, examples, and quick checks of understanding. Furthermore, various examination modes to practice the content (i.e., training exercises) and to check the understanding of the theory (i.e., quiz) are available. Besides that, OMB+ provides a possibility to get assistance from a human tutor. Usually, the students and tutors interact via a chat interface in written form. The language of communication is German.

The current dilemma of the OMB+ platform is that the number of students grows significantly every year, but it is challenging to find qualified tutors, and it is expensive to hire more human tutors. This results in a longer waiting period for students until their problems can be considered and processed. In general, all student questions can be grouped into three main categories: organizational questions (i.e., course, certificate), contextual questions (i.e., content, theorem), and mathematical questions (i.e., exercises, solutions). To assist a student with a mathematical question, a tutor has to know the following regular information: What kind of topic (or subtopic) does the student have a problem with? At which examination mode (quiz, chapter level training or exercise, section level training or exercise, or final examination) is the student working right now? And, finally, the exact question number and exact problem formulation. This means that a tutor has to ask the same onboarding questions every time a new conversation starts. This is, however, very time-consuming and could be successfully solved within an Intelligent Process Automation (IPA) system.
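The onboarding information described above can be thought of as a small set of slots the assistant has to fill before a tutor takes over. The following Python sketch is our own illustration of such a structure; the field names and value labels are hypothetical and not the exact format used by the platform.

# Slots an onboarding dialogue would collect from a student before a tutor takes over.
onboarding_slots = {
    "question_type": None,    # e.g. "organizational" | "contextual" | "mathematical"
    "topic": None,            # e.g. the chapter the student is working on
    "subtopic": None,
    "examination_mode": None, # e.g. "quiz" | "training" | "exercise" | "final examination"
    "question_number": None,
    "problem_statement": None,
}

def missing_slots(slots: dict) -> list:
    """Return the onboarding fields the dialogue still has to ask about."""
    return [name for name, value in slots.items() if value is None]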

The work presented in Chapter 3 and Chapter 4 was enabled by the collaboration with the MUMIE and Online Mathematik Brückenkurs (OMB+) team and addresses the abovementioned problem. Within Chapter 3, we implemented an Intelligent Process Automation (IPA) dialogue system with rule- and retrieval-based algorithms that interacts with students and performs the onboarding. This system is multipurpose: first, it eases the assistance process for human tutors by reducing repetitive and time-consuming activities and therefore allows workers to focus on more challenging tasks. Second, interacting with students, it augments the resources with structured and labeled training data. The latter, in turn, allowed us to re-train specific system components utilizing machine and deep learning methods. We describe this procedure in Chapter 4.

5 MUMIE: https://www.mumie.net
6 https://www.ombplus.de

We report that the earlier collected human-to-human data from the OMB+ platform could not be used for training a machine learning system, since it is highly unstructured and contains no direct question-answer matching, which is crucial for trainable conversational algorithms. We analyzed these conversations to get insights from them, which we then used to define the dialogue flow.

We also do not consider the automatization of mathematical questions in this work, since this task would require not only precise language understanding and extensive commonsense knowledge but also the accurate perception of mathematical notation. The tutoring process at OMB+ differs considerably from standard tutoring methods. Here, instructors attempt to guide students by giving hints and tips instead of providing an out-of-the-box solution. Existing human-to-human dialogues revealed that this kind of task can be challenging even for human tutors, since it is not always directly perceptible what precisely the student does not understand or has difficulties with. Furthermore, dialogues containing mathematical explanations can last for a long time, and the topic (i.e., intent and final goal) can shift multiple times during the conversation.

To our knowledge, this task would not be feasible to solve sufficiently with current machine learning algorithms, especially considering the missing training data. A rule-based system would not be the right solution for this purpose either, due to the large number of complex rules which would need to be defined. Thus, in this work, we focus on the automatization of less cognitively demanding tasks, which are still relevant and must be solved in the first instance to ease the tutoring process for instructors.

2.4 summary

Conversational agents are a solution in high demand for industries due to their potential ability to support a large number of users in a multi-threaded fashion while performing a flawless natural language conversation. Many messenger platforms are created, and custom chatbots in various constellations emerge daily. Unfortunately, most of the current conversational agents are often disappointing to use due to the lack of substantial natural language understanding and reasoning. Furthermore, Natural Language Processing (NLP) and Artificial Intelligence (AI) techniques have limitations when applied to domain-specific and task-oriented problems, especially if no structured training data is available.

Despite that, conversational agents can be successfully applied to solve less cognitive but still highly relevant tasks, that is, within the Intelligent Process Automation (IPA) domain. Such IPA-bots would undertake tedious and time-consuming responsibilities to free up the knowledge worker for more complex and intricate tasks. IPA-bots can be seen as members of the goal-oriented dialogue family and are, for the most part, implemented either in a rule-based manner or through hybrid models when it comes to a real-world setting.


3 M U L T I P U R P O S E I P A V I A C O N V E R S A T I O N A L A S S I S T A N T

This chapter partially covers work already published at international peer-reviewed conferences. The relevant publication is Moiseeva et al. 2020. The research described in this chapter was carried out in its entirety by the author of this thesis. The other authors of the publication acted as advisers. At the time of writing this thesis, the system described in this work was deployed at the Online Mathematik Brückenkurs (OMB+) platform.

As we outlined in the introduction Intelligent Process Automa-tion (IPA) is an emerging technology with the primary goal of assist-ing the knowledge worker by taking care of repetitive routine andlow-cognitive tasks Conversational assistants that interact with usersin a natural language are a potential application for the IPA domainSuch smart virtual agents can support the user by responding to var-ious inquiries and performing routine tasks carried out in a naturallanguage (ie customer support)

In this work we tackle the challenge of implementing an IPA con-versational agent in a real-world industrial context and conditions ofmissing training data Our system has two meaningful benefits Firstit decreases monotonous and time-consuming tasks and hence letsworkers concentrate on more intellectual processes Second interact-ing with users it augments the resources with structured and labeledtraining data The latter in turn can be used to retrain a systemutilizing machine and deep learning methods

3.1 introduction

Conversational dialogue systems can give attention to several users simultaneously while supporting natural communication. They are thus exceptionally useful for practical assistance applications where a high number of users are involved at the same time. Generally, dialogue agents require extensive natural language understanding and commonsense knowledge to hold a conversation reasonably and intelligently. Therefore, it is still challenging nowadays to substitute human assistance completely with an intelligent agent. However, a dialogue system can be successfully employed as part of an Intelligent Process Automation (IPA) application. In this case, it takes care of simple but routine and time-consuming tasks performed in natural language, while allowing a human worker to focus on more cognitively demanding processes.

Recent research in the dialogue generation domain is conducted by employing Artificial Intelligence (AI) techniques like machine and deep learning (Lowe et al., 2017; Wen et al., 2016). However, those methods require high-quality data for training, which is often not directly available for domain-specific and industrial problems. Even though companies gather and store vast volumes of data, it is often of low quality with regard to machine learning approaches (Daniel, 2015), especially dialogue data, which has to be properly structured as well as annotated. Conversely, if trained on artificially created datasets (which is often the case in a research setting), such systems can solve only specific problems and are hardly adaptable to other domains. Hence, despite the popularity of deep learning end-to-end models, one still needs to rely on traditional pipelines in practical dialogue engineering, mainly while introducing a new domain.

3.1.1 Outline and Contributions

This work addresses the challenge of implementing a dialogue system for IPA purposes within the practical e-learning domain under conditions of missing training data. The chapter is structured as follows: in Section 3.2 we introduce our method and describe the design of the rule-based dialogue architecture with its main components, such as the Natural Language Understanding (NLU), Dialogue Manager (DM), and Natural Language Generation (NLG) units. In Section 3.3 we present the results of our dual-focused evaluation, and Section 3.4 describes the format of the collected structured data. Finally, we conclude in Section 3.5.

Our contributions within this work are as follows:

• We implemented a robust conversational system for IPA purposes within a practical e-learning domain and under the conditions of missing training (i.e., dialogue) data.

• The system has two main objectives:

  – First, it reduces repetitive and time-consuming activities and allows workers of the e-learning platform to focus solely on mathematical inquiries.

  – Second, by interacting with users, it augments the resources with structured and partially labeled training data for the further implementation of trainable dialogue components.


3.2 model

The central purpose of the introduced system is to interact with students at the beginning of every chat conversation and gather information on the topic (and sub-topic), examination mode and level, question number, and exact problem formulation. Such an onboarding process saves time for human tutors and allows them to handle mathematical questions exclusively. Furthermore, the system is realized in a way such that it accumulates labeled and structured dialogues in the background.

Figure 4 displays the complete conversation flow. After the system accepts user input, it analyzes it at different levels and extracts information if such is provided. If some of the information required for tutoring is missing, the system asks the student to provide it. When all the information points are collected, they are automatically validated and forwarded to a human tutor. The latter can then directly proceed with the assistance process. In the following, we explain the key components of the system.

3.2.1 OMB+ Design

Figure 3 illustrates the internal structure and design of the OMB+ platform. It has topics and sub-topics, as well as four different examination modes. Each topic (Figure 3, tag 1) corresponds to a chapter level and always has sub-topics (Figure 3, tag 2) that correspond to a section level. The examination modes training and exercise are ambiguous, because they correspond to either a chapter (Figure 3, tag 3) or a section (Figure 3, tag 5) level, and it is important to differentiate between them, since they contain different types of content. The mode final examination (Figure 3, tag 4) always corresponds to a chapter level, whereas quiz (Figure 3, tag 5) can belong only to a section level. According to the design of the OMB+ platform, there are several ways in which a possible dialogue flow can proceed.

3.2.2 Preprocessing

In a natural language conversation, one may respond in various forms, making the extraction of data from user-generated text challenging due to potential misspellings or confusable spellings (i.e., Aufgabe 1a, Aufgabe 1 (a)). Hence, to facilitate reliable recognition of intents and extraction of entities, we normalize (e.g., correct misspellings, detect synonyms) and preprocess every user input before moving forward to the Natural Language Understanding (NLU) module.


Figure 3: OMB+ Online Learning Platform, where 1 is the Topic (corresponds to a chapter level), 2 is a Sub-Topic (corresponds to a section level), 3 is the chapter-level examination mode, 4 is the Final Examination mode (available only at chapter level), 5 are the examination modes Exercise, Training (available at section level) and Quiz (available only at section level), and 6 is the OMB+ Chat.

We perform both linguistic and domain-specific preprocessing, which includes the following steps (a minimal sketch of the domain-specific normalization follows the list):

• lowercasing and stemming of words in the input query;

• removal of stop words and punctuation;

• all mentions of x in mathematical formulations are removed to avoid confusion with the Roman numeral 10 ("X");

• in a combination of the type word "Kapitel"/"Aufgabe" + digit written as a word (i.e., "eins", "zweites", etc.), the word is replaced with a digit ("Kapitel eins" → "Kapitel 1", "im fünften Kapitel" → "im Kapitel 5"); Roman numerals are replaced with a digit as well ("Kapitel IV" → "Kapitel 4");

• detected ambiguities are normalized (e.g., "Trainingsaufgabe" → "Training");

• recognized misspellings resp. typing errors are corrected (e.g., "Difeernzialrechnung" → "Differentialrechnung");

• permalinks are parsed and analyzed; from each permalink it is possible to extract the topic, examination mode, and question number.
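The following sketch illustrates the domain-specific part of this normalization. It is only an assumption about how such rules can be coded: the lookup tables are hypothetical and deliberately incomplete, and stemming and stop-word removal are omitted.

```python
import re

# Hypothetical lookup tables for German number words and Roman numerals that
# appear in chapter/exercise references (illustrative, not exhaustive).
NUMBER_WORDS = {"eins": "1", "erste": "1", "zwei": "2", "zweites": "2", "drei": "3"}
ROMAN = {"i": "1", "ii": "2", "iii": "3", "iv": "4", "v": "5"}
SYNONYMS = {"trainingsaufgabe": "training"}  # normalization of ambiguous mentions

def normalize(text: str) -> str:
    text = text.lower()
    # Drop isolated mentions of the variable "x" to avoid confusion with Roman 10.
    text = re.sub(r"\bx\b", " ", text)

    # "kapitel eins" -> "kapitel 1", "kapitel iv" -> "kapitel 4"
    def replace_number(match):
        word = match.group(2)
        digit = NUMBER_WORDS.get(word) or ROMAN.get(word) or word
        return f"{match.group(1)} {digit}"

    text = re.sub(r"\b(kapitel|aufgabe)\s+([a-z]+)\b", replace_number, text)
    # Normalize ambiguous examination-mode mentions.
    for variant, canonical in SYNONYMS.items():
        text = text.replace(variant, canonical)
    return re.sub(r"\s+", " ", text).strip()

print(normalize("Hilfe bei Kapitel IV, Aufgabe zwei"))  # "hilfe bei kapitel 4, aufgabe 2"
```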


3.2.3 Natural Language Understanding

We implement the Natural Language Understanding (NLU) unit utilizing handcrafted rules, Regular Expressions (RegEx), and the Elasticsearch1 API.

intent classification: According to the design of the OMB+ platform, all student questions can be classified into three categories: organizational questions (i.e., course, certificate), contextual questions (i.e., content, theorem), and mathematical questions (i.e., exercises, solutions). To classify the input query by its intent (i.e., category), we employ weighted keyword information in the form of handcrafted rules. We assume that specific words are explicitly associated with a corresponding intent (e.g., theorem or root denote a mathematical question; feedback or certificate denote an organizational inquiry). If no intent can be assigned, it is assumed that the NLU unit was not capable of understanding, and the intent is unknown. In this case, the virtual agent requests the user to provide the intent manually by picking one of the three options mentioned. Queries from the organizational and contextual categories are directly handed over to a human tutor, while mathematical questions are analyzed by the automated agent for further information extraction.

entity extraction: In the next step, the system retrieves entities from the user message. We specify the following five prerequisite entities, which have to be collected before the dialogue can be handed over to a human tutor: topic, sub-topic, examination mode and level, and question number. This part is implemented using ElasticSearch (ES) and RegEx. To facilitate the use of ES, we indexed the OMB+ web page into an internal database. Besides indexing the titles of topics and sub-topics, we also provided supplementary information on possible synonyms and writing styles. We additionally filed the OMB+ permalinks, which point to the site pages. To query the resulting database, we employ the internal Elasticsearch multi_match function and set the minimum_should_match parameter to 20. This parameter specifies the number of terms that must match for a document to be considered relevant. Besides that, we adopted fuzziness with the maximum edit distance set to 2 characters; the fuzzy query uses similarity based on the Levenshtein edit distance (Levenshtein, 1966). Finally, the system produces a ranked list of potential matching entries found in the database within a predefined relevance threshold (we set it to θ = 15). We then pick the most probable entry as the right one and select the corresponding entity from the user input. A minimal sketch of the keyword matching and of such an ES query is given below.
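The following sketch roughly illustrates both NLU steps. The keyword weights, index name, field names, and the local Elasticsearch instance are invented for the example, and a recent elasticsearch-py client (8.x) is assumed; this is not the deployed implementation.

```python
from elasticsearch import Elasticsearch

# Hand-crafted, weighted keyword rules for the three intent categories (illustrative).
INTENT_KEYWORDS = {
    "mathematical": {"theorem": 1.0, "wurzel": 1.0, "aufgabe": 0.5},
    "organizational": {"zertifikat": 1.0, "feedback": 1.0},
    "contextual": {"text": 0.5, "regel": 0.5},
}

def classify_intent(text: str) -> str:
    scores = {intent: sum(w for kw, w in kws.items() if kw in text.lower())
              for intent, kws in INTENT_KEYWORDS.items()}
    best, score = max(scores.items(), key=lambda kv: kv[1])
    return best if score > 0 else "unknown"  # unknown -> ask the user to pick manually

es = Elasticsearch("http://localhost:9200")  # assumed local instance

def lookup_entity(text: str):
    # Fuzzy lookup against the indexed OMB+ topics/sub-topics and their synonyms.
    query = {"multi_match": {
        "query": text,
        "fields": ["topic", "subtopic", "synonyms"],  # assumed index fields
        "fuzziness": 2,                 # maximum edit distance of 2 characters
        "minimum_should_match": 20,
    }}
    hits = es.search(index="omb", query=query)["hits"]["hits"]
    # A relevance threshold on the ES score (theta in the text) is applied to the hits.
    return hits[0]["_source"] if hits else None
```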

Overall, the Natural Language Understanding (NLU) module receives the user input as preprocessed text and examines it against all predefined RegEx statements and for a match in the ElasticSearch (ES) database.

1 https://www.elastic.co/products/elasticsearch


We use ES to retrieve entities for the topic, sub-topic, and examination mode, whereas RegEx is used to extract the information on the question number. Every time an entity is extracted, it is filled into the Informational Dictionary (ID). The ID has the following six slots to be filled in: topic, sub-topic, examination level, examination mode, question number, and exact problem formulation (the last is requested in further steps).
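For illustration, the ID can be thought of as a simple record with optional slots; the sketch below is only an assumption about its shape, not the deployed data structure.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class InformationalDictionary:
    """Slots collected before a dialogue is handed over to a human tutor."""
    topic: Optional[str] = None
    sub_topic: Optional[str] = None
    examination_level: Optional[str] = None   # "chapter" or "section"
    examination_mode: Optional[str] = None    # training, exercise, quiz, final examination
    question_number: Optional[str] = None
    problem_formulation: Optional[str] = None  # requested in a later step

    def filled_slots(self) -> dict:
        # Return only the slots that have been extracted so far.
        return {k: v for k, v in self.__dict__.items() if v is not None}
```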

3.2.4 Dialogue Manager

A conventional Dialogue Manager (DM) consists of two interacting components: the first component is the Dialogue State Tracker (DST), which maintains a representation of the current conversational state, and the second is Policy Learning (PL), which defines the next system action (i.e., response).

In our model, every next action of the agent is determined by the state of the previously obtained information accumulated in the Informational Dictionary (ID). For instance, if the system recognizes that the student works on the final examination, it also knows (defined by hand-coded logic in the predefined rules) that there is no necessity to ask for a sub-topic, because the final examination always corresponds to a chapter level (due to the layout of the OMB+ platform). If the system recognizes that the user struggles with solving a quiz, it has to ask for both the corresponding topic and sub-topic, because a quiz always relates to a section level.

To discover all of the potential conversational flows, we implement Mutually Exclusive Rules (MER), which indicate that two events e1 and e2 are mutually exclusive or disjoint, since they cannot both occur at the same time (i.e., the intersection of these events is empty: P(A ∩ B) = 0). Additionally, we defined transition and mapping rules. Hence, only particular entities can co-occur, while others must be excluded from a specific rule combination. Assume a topic t = Geometry. According to the MER, this topic can co-occur with one of the pertaining sub-topics defined at OMB+ (e.g., s = Angle). Thus, the level of the examination would be l = Section (but l ≠ Chapter), because the student already reported a sub-section he or she is working on. In this specific case, the examination mode can be e ∈ [Training, Exercise, Quiz], but not e = Final_Examination. This is because e = Final_Examination can co-occur only with a topic (chapter level), but not with a sub-topic (section level). The question number q can be arbitrary and has to co-occur with every other combination.


This allows a formal solution to be found. Assume a list of all theoretically possible dialogue states S = [topic, sub-topic, training, exercise, chapter level, section level, quiz, final examination, question number], and for each element s_n in S it holds that

s_n = \begin{cases} 1 & \text{if information is given} \\ 0 & \text{otherwise} \end{cases} \qquad (1)

This gives all general (resp. possible) dialogue states without reference to the design of the OMB+ platform. However, to make the dialogue states fully suitable for OMB+, from the general states we select only those which are valid. To define the validity of a state, we specify the following five MERs:

R1: Topic

¬T
T

Table 2: Rule 1 – Admissible topic configurations

Rule (R1) in Table 2 denotes admissible configurations for the topic and means that the topic can either be given (T) or not (¬T).

R2: Examination Mode

¬TR  ¬E  ¬Q  ¬FE
 TR  ¬E  ¬Q  ¬FE
¬TR   E  ¬Q  ¬FE
¬TR  ¬E   Q  ¬FE
¬TR  ¬E  ¬Q   FE

Table 3: Rule 2 – Admissible examination mode configurations

Rule (R2) in Table 3 denotes that either no information on the examination mode is given, or the examination mode is Training (TR), Exercise (E), Quiz (Q), or Final Examination (FE), but not more than one mode at the same time.

R3: Level

¬CL  ¬SL
 CL  ¬SL
¬CL   SL

Table 4: Rule 3 – Admissible level configurations

Rule (R3) in Table 4 indicates that either no level information is provided, or the level corresponds to the chapter level (CL) or to the section level (SL), but not to both at the same time.


R4: Examination & Level

¬TR  ¬E  ¬SL  ¬CL
 TR  ¬E   SL  ¬CL
 TR  ¬E  ¬SL   CL
 TR  ¬E  ¬SL  ¬CL
¬TR   E   SL  ¬CL
¬TR   E  ¬SL   CL
¬TR   E  ¬SL  ¬CL

Table 5: Rule 4 – Admissible examination mode (only for Training and Exercise) and corresponding level configurations

Rule (R4) in Table 5 means that the Training (TR) and Exercise (E) examination modes can belong either to the chapter level (CL) or to the section level (SL), but not to both at the same time.

R5: Topic & Sub-Topic

¬T  ¬ST
 T  ¬ST
¬T   ST
 T   ST

Table 6: Rule 5 – Admissible topic and corresponding sub-topic configurations

Rule (R5) in Table 6 signifies that we can be given either only a topic (T), or the combination of a topic and a sub-topic (ST) at the same time, or only a sub-topic, or no information on this point at all.

We then define a valid dialogue state as a dialogue state that meets all requirements of the abovementioned rules:

\forall s\, \big(State(s) \land R_{1-5}(s)\big) \rightarrow Valid(s) \qquad (2)

Once we have the valid states for dialogues, we perform a mapping from each valid dialogue state to the next possible system action. For that, we first define five transition rules; the order of the transition rules is important.

T_1 = \lnot T \qquad (3)

means that no topic (T) is found in the ID (i.e., it could not be extracted from the user input).

T_2 = \lnot EM \qquad (4)

indicates that no examination mode (EM) is found in the ID.


T_3 = EM, \text{ where } EM \in [TR, E] \qquad (5)

denotes that the extracted examination mode (EM) is either Training (TR) or Exercise (E).

T_4 = \begin{cases} T,\ \lnot ST,\ TR,\ SL \\ T,\ \lnot ST,\ E,\ SL \\ T,\ \lnot ST,\ Q \end{cases} \qquad (6)

means that no sub-topic (ST) is provided by the user, but the ID already contains either the combination of topic (T), training (TR), and section level (SL), or the combination of topic, exercise (E), and section level, or the combination of topic and quiz (Q).

T_5 = \lnot QNR \qquad (7)

indicates that no question number (QNR) was provided by the student (or it could not be successfully extracted).

Finally, we assumed the following list of possible next actions for the system:

A = \begin{cases} \text{Ask for Topic} \\ \text{Ask for Examination Mode} \\ \text{Ask for Level} \\ \text{Ask for Sub-Topic} \\ \text{Ask for Question Nr} \end{cases} \qquad (8)

Following the transition rules, we map each valid dialogue state to the possible next action a_m in A:

\exists a\, \forall s\, \big(Valid(s) \land T_1(s)\big) \rightarrow a(s, T) \qquad (9)

in the case where we do not have any topic provided, the next action is to ask for the topic (T);

\exists a\, \forall s\, \big(Valid(s) \land T_2(s)\big) \rightarrow a(s, EM) \qquad (10)

if no examination mode is provided by the user (or it could not be successfully extracted from the user query), the next action is defined as ask for examination mode (EM);

\exists a\, \forall s\, \big(Valid(s) \land T_3(s)\big) \rightarrow a(s, L) \qquad (11)

in the case where we know the examination mode EM \in [Training, Exercise], we have to ask about the level (i.e., training at chapter level or training at section level); thus, the next action is ask for level (L);


\exists a\, \forall s\, \big(Valid(s) \land T_4(s)\big) \rightarrow a(s, ST) \qquad (12)

if no sub-topic is provided, but the examination mode EM \in [Training, Exercise] at section level, the next action is defined as ask for sub-topic (ST);

\exists a\, \forall s\, \big(Valid(s) \land T_5(s)\big) \rightarrow a(s, QNR) \qquad (13)

if no question number is provided by the user, then the next action is ask for question number (QNR).

Following this scheme, we generate 56 state transitions, which specify the next system actions. In a new conversation state, we compare the extracted (or updated) data in the Informational Dictionary (ID) with the valid dialogue states and select the mapped action as the next system action (a compact sketch of this state-to-action mapping is given below).
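The following sketch illustrates how such a rule-driven policy can be realized in code. The slot names and the ordering of the checks mirror the transition rules above, but the concrete data structures are assumptions rather than the deployed implementation.

```python
from typing import Optional

# Ordered transition rules: the first rule that fires determines the next action.
def next_action(id_slots: dict) -> Optional[str]:
    topic = id_slots.get("topic")
    sub_topic = id_slots.get("sub_topic")
    mode = id_slots.get("examination_mode")    # training, exercise, quiz, final examination
    level = id_slots.get("examination_level")  # chapter or section
    question = id_slots.get("question_number")

    if topic is None:                                          # T1
        return "Ask for Topic"
    if mode is None:                                           # T2
        return "Ask for Examination Mode"
    if mode in ("training", "exercise") and level is None:     # T3
        return "Ask for Level"
    if sub_topic is None and (                                 # T4
        (mode in ("training", "exercise") and level == "section") or mode == "quiz"
    ):
        return "Ask for Sub-Topic"
    if question is None:                                       # T5
        return "Ask for Question Nr"
    return None  # ID is complete; proceed to the verification step


# Example: a final examination only requires topic and question number.
print(next_action({"topic": "Geometry", "examination_mode": "final examination"}))
# -> "Ask for Question Nr"
```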

flexibility of dst: The DST transitions outlined above are meant to maintain the current configuration of the OMB+ learning platform. Still, further MERs can be effortlessly generated to create new transitions. To examine this, we performed tests with the previous design of the platform, where all topics except for the first one had sub-topics, whereas the first topic contained both sub-topics and sub-sub-topics. We could produce the missing transitions with the described approach; the number of permissible transitions in this case increased from 56 to 117.

3.2.5 Meta Policy

Since the proposed system is expected to operate in a real-world setting, we had to develop additional policies that manage the dialogue flow and verify the system's accuracy. We explain these policies below.

completeness of informational dictionary: The model automatically verifies the completeness of the Informational Dictionary (ID), which is determined by the number of essential slots filled in the informational dictionary. There are six discrete cases in which the ID is deemed to be complete. For example, if a user works in the final examination mode, the agent does not have to request a sub-topic or an examination level; hence, the ID has to be filled only with data for the topic, examination type, and question number. Whereas, if the user works in the training mode, the system has to collect information about the topic, sub-topic, examination level, examination mode, and question number. Once the ID is complete, it is passed on to the verification step; otherwise, the agent proceeds according to the next action. The system extracts information in each dialogue state, and therefore, if the user gives updated information on any subject later in the dialogue history, the corresponding slot is updated in the ID.

Below are examples of two final cases (out of six) in which the Informational Dictionary (ID) is considered to be complete.

Case_1 = \begin{cases} \text{intent} = \text{Math} \\ \text{topic} \neq \text{None} \\ \text{exam mode} = \text{Final Examination} \\ \text{level} = \text{None} \\ \text{question nr} \neq \text{None} \end{cases} \qquad (14)

Table 7: Case 1 – Any topic; examination mode is final examination; examination level does not matter; any question number.

Case_2 = \begin{cases} \text{intent} = \text{Math} \\ \text{topic} \neq \text{None} \\ \text{sub-topic} \neq \text{None} \\ \text{exam mode} \in [\text{Training},\ \text{Exercise}] \\ \text{level} = \text{Section} \\ \text{question nr} \neq \text{None} \end{cases} \qquad (15)

Table 8: Case 2 – Any topic; any related sub-topic; examination mode is either training or exercise; examination level is section; any question number.

verification step: After the system has collected all the essential data (i.e., the ID is complete), it continues to the final verification step. Here, the obtained data is presented to the current student in the dialogue session. The student is asked to review the correctness of the data and, if any entries are faulty, to correct them. The Informational Dictionary (ID) is, where necessary, updated with the user-provided data. This procedure repeats until the user confirms the correctness of the compiled data.

fallback policy: In cases where the system fails to retrieve information from a student's query, it re-asks the user and attempts to retrieve the information from a follow-up response. The maximum number of re-ask trials is a parameter and is set to three (r = 3). If the system is still incapable of extracting the information, the user input is considered as the ground truth and filled into the appropriate slot of the ID. An exception to this rule applies where the user has to specify the intent manually: after three unclassified trials, the dialogue session is directly handed over to a human tutor.


human request: In every dialogue state, a user can switch to a human tutor. For this, the user can enter the "human" (or "Mensch" in German) keyword. Therefore, every user message is checked for the presence of this keyword. A compact sketch combining these meta policies is given below.
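The sketch below is a minimal assumption about how these policies can be combined in code: the completeness conditions follow Cases 1 and 2 above, while the remaining four completeness cases and the exact keyword handling are simplified.

```python
MAX_RETRIES = 3  # fallback policy: re-ask at most three times

def is_complete(id_slots: dict) -> bool:
    """Two of the six completeness cases (final examination vs. section-level training/exercise)."""
    mode = id_slots.get("examination_mode")
    if mode == "final examination":
        return all(id_slots.get(k) for k in ("topic", "question_number"))
    if mode in ("training", "exercise") and id_slots.get("examination_level") == "section":
        return all(id_slots.get(k) for k in ("topic", "sub_topic", "question_number"))
    return False

def handle_turn(user_input: str, id_slots: dict, retries: int) -> str:
    # Human request: a keyword switch to the tutor is possible in every state.
    if "human" in user_input.lower() or "mensch" in user_input.lower():
        return "Human Hand-Over"
    if retries >= MAX_RETRIES:
        return "Human Hand-Over"  # e.g., manual intent case after three unclassified trials
    return "Verification Step" if is_complete(id_slots) else "Ask Next Question"
```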

3.2.6 Response Generation

The semantic representation of the system's next action is transformed into a natural language utterance. Hence, each possible action is mapped to exactly one response that is predefined in a template. Some of the responses are fixed (i.e., "Welches Kapitel bearbeitest du gerade?"),2 others have placeholders for custom values. In the latter case, the utterance is formulated depending on the Informational Dictionary (ID).

Assume the predefined response "Deine Angaben waren wie folgt: a) Kapitel: K, b) Abschnitt: A, c) Aufgabentyp: M, d) Aufgabe: Q, e) Ebene: E. Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein."3 The slots [K, A, M, Q, E] are the placeholders for the information stored in the ID. During the conversation, the system selects the template according to the next action and fills the slots with the extracted values, if required. After the extracted information is filled into the corresponding placeholders in the utterance, the final response could be as follows: "Deine Angaben waren wie folgt: a) Kapitel: I Elementares Rechnen, b) Abschnitt: Rechenregel für Potenzen, c) Aufgabentyp: Quiz, d) Aufgabe: 1(a), e) Ebene: Abschnitt. Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein."4
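A minimal sketch of this template-filling step is shown below; it assumes a single template per action and invented dictionary keys, whereas the real system holds one predefined response for every possible action.

```python
TEMPLATES = {
    # One predefined response per possible next action; placeholders in curly braces.
    "Final Statement": (
        "Deine Angaben waren wie folgt: a) Kapitel: {topic}, b) Abschnitt: {sub_topic}, "
        "c) Aufgabentyp: {examination_mode}, d) Aufgabe: {question_number}, "
        "e) Ebene: {examination_level}. Habe ich dich richtig verstanden? "
        "Bitte antworte mit Ja oder Nein."
    ),
    "Ask for Topic": "Welches Kapitel bearbeitest du gerade?",
}

def generate_response(action: str, id_slots: dict) -> str:
    # Fixed responses are returned as-is; parameterized ones are filled from the ID.
    return TEMPLATES[action].format(**{
        "topic": id_slots.get("topic", "-"),
        "sub_topic": id_slots.get("sub_topic", "-"),
        "examination_mode": id_slots.get("examination_mode", "-"),
        "question_number": id_slots.get("question_number", "-"),
        "examination_level": id_slots.get("examination_level", "-"),
    })
```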

Below, we demonstrate five sample dialogues with variable flows, in the original German language:

• a short dialogue where certain information is already provided in the first student input;

• an organizational question case;

• a contextual question case;

• an example with manual intent classification;

• an example with the fallback and human policies;

• a long flow with entry correction in the final verification phase.

2 Translates to: "Which chapter are you currently working on?"

3 Translates to: "Your details were as follows: a) Chapter, b) Section, c) Examination Mode, d) Question Number, e) Level. Did I understand you correctly? Please answer yes or no."

4 Translates to: "Your details were as follows: a) Elementary Calculus, b) Power Laws, c) Quiz, d) Question 1 (a), e) Section Level. Did I understand you correctly? Please answer yes or no."


Student: Hallo, ich habe Schwierigkeiten beim Kapitel I Elementares Rechnen mit der Übungsaufgabe 1 a.

Next-Action: [Ask for a Level: Chapter or Section]
Bot: Geht es um eine Aufgabe auf Kapitel-Ebene (z.B. um eine Trainingsaufgabe im Kapitel "Geometrie") oder um eine Aufgabe auf der Ebene eines Abschnitts in einem Kapitel (z.B. um eine Aufgabe zum Abschnitt "Winkel" im Kapitel "Geometrie")? Antworte bitte mit KAP für Kapitel und SEK für Abschnitt.

Student: Kapitel

Next-Action: [Final Statement]
Bot: Deine Angaben waren wie folgt:
a) Kapitel: I Elementares Rechnen
b) Aufgabentyp: Exercise
c) Aufgabe: 1a
d) Ebene: Kapitel
Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein.

Student: yep

Next-Action: [Human Hand-Over]
Bot: Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir.

Table 9: Showcase 1 – Short flow; most of the information was extracted from the initial user input. In bold are the extracted entities.

Student: Hallo, ich habe eine Frage bzgl. eines Zertifikates, und zwar wie kann ich es ausstellen lassen?

Next-Action: [Organisational Question]
Bot: Es scheint sich wohl um eine organisatorische Frage zu handeln. Bitte fasse deine Frage in einem kurzen Text zusammen, damit ich sie an meinen menschlichen Kollegen weiterleiten kann.

Student: Wie bekomme ich ein Zertifikat?

Next-Action: [Human Hand-Over]
Bot: Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir.

Table 10: Showcase 2 – Organisational question. Underlined are the keywords which point to the organisational intent.


Student: hallo

Next-Action: [UNK]
Bot: Hast du eine Frage zu einer Aufgabe (MATH), zu einem Text im Kurs (TEXT) oder eine organisatorische Frage (ORG)?

Student: ma

Next-Action: [UNK]
Bot: Hast du eine Frage zu einer Aufgabe (MATH), zu einem Text im Kurs (TEXT) oder eine organisatorische Frage (ORG)?

Student: math

Next-Action: [Ask for Topic]
Bot: Welches Kapitel bearbeitest du gerade?

Student: Lineare Gleichungssysteme

Next-Action: [Ask for Examination Mode]
Bot: Bearbeitest du eine Übungsaufgabe, eine Trainingsaufgabe, einen Quiz oder eine Schlussprüfung?

Student: human

Next-Action: [Human Hand-Over]
Bot: Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir.

Table 11: Showcase 3 – Manual intent classification, Fallback and Human Request policies (in bold type).

Student: Halli-Hallo, könnte mir bitte jemand die Regel 72 (vertikale Streckung und Stauchung des Graphen) erklären? Danke im Voraus.

Next-Action: [Contextual Question]
Bot: Es scheint sich wohl um eine inhaltliche Frage zu handeln. Bitte fasse deine Frage in einem kurzen Text zusammen, damit ich sie an meinen menschlichen Kollegen weiterleiten kann.

Student: Ich habe Schwierigkeiten beim Verstehen der Regel 72. Was ist genau die vertikale Streckung des Graphen?

Next-Action: [Human Hand-Over]
Bot: Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir.

Table 12: Showcase 4 – Contextual question. Underlined are the keywords which point to the contextual intent.


Student: Hallo, ich kann beim Kapitel 1 nicht weiterkommen, Trainingsaufgabe 1 (a).

Next-Action: [Ask for Level]
Bot: Geht es um eine Aufgabe auf Kapitel-Ebene (z.B. um eine Trainingsaufgabe im Kapitel Geometrie) oder um eine Aufgabe auf der Ebene eines Abschnitts in einem Kapitel (z.B. um eine Aufgabe zum Abschnitt Winkel im Kapitel Geometrie)? Antworte bitte mit KAP für Kapitel und SEK für Abschnitt.

Student: ich glaube es ist eine Sektion

Next-Action: [Ask for Sub-Topic]
Bot: Wie heißt der Abschnitt, in dem sich die Aufgabe befindet, auf die sich deine Frage bezieht?

Student: ich bearbeite rechenregeln und potenzen

Next-Action: [Final Statement]
Bot: Deine Angaben waren wie folgt:
a) Kapitel: I Elementares Rechnen
b) Abschnitt: Rechenregel für Potenzen
c) Aufgabentyp: Quiz
d) Aufgabe: 1 (a)
e) Ebene: Abschnitt
Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein.

Student: nope

Next-Action: [Ask_For]
Bot: Welche der folgenden Punkte habe ich falsch erkannt? Schreibe bitte, welche der folgenden Punkte es sind: a, b, c, d, e.

Student: c

State-Transition: [Correct-Fields]
Bot: Gib bitte die korrekte Information für c) ein.

Student: Training

State-Transition: [Final Statement]
Bot: Deine Angaben waren wie folgt:
a) Kapitel: I Elementares Rechnen
b) Abschnitt: Rechenregel für Potenzen
c) Aufgabentyp: Training
d) Aufgabe: 1 (a)
e) Ebene: Abschnitt
Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein.

Student: yes

State-Transition: [Human Hand-Over]
Bot: Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir.

Table 13: Showcase 5 – Long flow; correction of entries. In bold type are the extracted entities.


Figure 4: Dialogue Flow. Abbreviations: UNK – unknown, ID – informational dictionary, RegEx – regular expressions.


3.3 evaluation

In order to obtain feedback on the quality, functionality, and usefulness of the proposed model, we evaluated it in two steps: first, with an automated method (Section 3.3.1), utilizing 130 manually annotated conversations to verify the robustness of the system; second, with the help of human tutors (Section 3.3.2) from OMB+, to examine the user experience. We describe the details as well as the most common errors below.

3.3.1 Automated Evaluation

To perform an automated evaluation, we manually assembled a dataset with 130 conversational flows. We predefined viable initial user questions and user answers, as well as gold system responses. These flows cover the most frequent dialogues that we previously observed in the human-to-human dialogue data. We examined our system by implementing a self-chatting evaluation bot. The evaluation cycle is as follows (a minimal sketch of the loop is given after the list):

• The system accepts the first question from a predefined dialogue via an API request, preprocesses and analyzes it to determine the intent and extract entities.

• Then it estimates the appropriate next action and responds accordingly.

• This response is then compared to the gold answer for the system. If the predicted result is correct, the system receives the next predefined user input from the templated dialogue and responds again as defined above. This procedure continues until the dialogue terminates (i.e., the ID is complete). Otherwise, the system reports an unsuccessful case number.

The system successfully passed this evaluation for all 130 cases.
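A minimal sketch of such a self-chatting evaluation loop is shown below; the respond() callable is an assumed wrapper around the system's API, and the dialogue format is an illustrative simplification.

```python
def evaluate(dialogues, respond):
    """dialogues: list of flows, each a list of (user_utterance, gold_bot_response) pairs.
    respond: callable that sends one utterance to the system (e.g., via its API)
    and returns the generated reply."""
    failed = []
    for case_nr, flow in enumerate(dialogues, start=1):
        for user_utterance, gold_response in flow:
            prediction = respond(user_utterance)
            if prediction != gold_response:
                failed.append(case_nr)  # report the unsuccessful case number
                break                   # stop this flow and move on to the next one
    return failed

# Usage idea: evaluate(annotated_flows, api_respond) == []  <=> all 130 cases pass.
```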

3.3.2 Human Evaluation & Error Analysis

To evaluate the system on irregular cases, we carried out experiments with human tutors from the OMB+ platform. The tutors are experienced regarding rare or complex issues, ambiguous responses, misspellings, and other infrequent but still relevant problems which occur during a natural language dialogue. In the following, we review some common errors and describe additional observations.

misspellings and confusable spellings frequently occur in user-generated text. Since we attempt to let the conversation remain natural to provide a better user experience, we do not instruct students to write formally. Therefore, we have to account for various writing issues. One of the prevalent problems are misspellings: German words are commonly long and can be complex, and because users often type quickly, this can lead to a wrong order of characters within a given word. To tackle this challenge, we utilized fuzzy matching within ES. However, the maximum allowed edit distance in ElasticSearch (ES) is set to 2 characters, which means that all misspellings beyond this threshold cannot be correctly identified by ES (e.g., Differenzialrechnung vs. Differnetialrechnung). Another representative example is the formulation of the section or question number: the same information can be communicated in several distinct ways, which has to be considered by the RegEx unit (e.g., Aufgabe 5 a, Aufgabe V a, Aufgabe 5 (a)). A similar problem occurs with confusable spellings (i.e., Differenzialrechnung vs. Differenzialgleichung).

We analyzed the cases stated above and improved our model by adding some of the most common misspellings to the ElasticSearch (ES) database or by handling them with RegEx during the preprocessing step.

elasticsearch threshold: We observed both cases where the system failed to extract information although the user provided it, and examples where ES extracted information that was not mentioned in a user query. According to our understanding, these errors occur due to the relevancy scoring algorithm of ElasticSearch (ES), where a document's score is a combination of textual similarity and other metadata-based scores. Our examination revealed that ES mostly fails to extract the information if the user message is rather short (e.g., about 5 tokens). To overcome this difficulty, we coupled the current input u_t with the dialogue history (h = u_t, ..., u_{t-2}). That eliminated the problem and improved the retrieval quality.

Solving the opposite problem, where ES extracts false information (or information that was not mentioned in a query), was more challenging. We learned that the problem comes from short words or sub-words (i.e., suffixes, prefixes) which ES locates in the database and considers credible enough. The ES documentation suggests getting rid of stop words to eliminate this behavior; however, this did not improve the search. Also, fine-tuning of ES parameters such as the relevance threshold, prefix_length,5 and the minimum_should_match6 parameter did not result in notable improvements. To cope with this problem, we implemented a verification step where a user can edit the erroneously retrieved data.

5 The number of fuzzified characters. This parameter helps to decrease the number of terms which must be reviewed.

6 Indicates the number of terms that must match for a document to be considered relevant.


3.4 structured dialogue acquisition

As stated above, the implemented system not only supports the human tutor by assisting students, but also collects structured and labeled training data in the background. In a trial run of the rule-based system, we accumulated a toy dataset with dialogues. The assembled dataset includes the following:

• plain dialogues with unique dialogue indexes;

• plain ID information collected for the whole conversation;

• pairs of questions (i.e., user requests) and responses (i.e., bot replies) with unique dialogue and turn indexes;

• triples in the form (Question, Next Action, Response); information on the next system action could be employed to train a DM unit with (deep) machine learning algorithms;

• for every conversational state, we keep the entities that the system was able to extract, along with their positions in the utterance. This information could be used to train a custom domain-specific NER model.

3.5 summary

In this work, we realized a conversational agent for IPA purposes that addresses two problems. First, it decreases repetitive and time-consuming activities, which allows workers of the e-learning platform to focus solely on mathematical and hence cognitively demanding questions. Second, by interacting with users, it augments the resources with structured and labeled training data. The latter can be utilized for the further implementation of learnable dialogue components.

The realization of such a system was connected with multiple challenges. Among others were missing structured conversational data, ambiguous or erroneous user-generated text, and the necessity to deal with existing corporate tools and their design.

The proposed conversational agent enabled us to accumulate structured, labeled data without any special effort from the human (i.e., tutor) side (e.g., manual annotation and post-processing of existing data, or changes to the conversational structure within the tutoring process). Once we had collected structured dialogues, we were able to retrain particular segments of the system with deep learning methods and achieved consistent performance for both of the proposed tasks. We report on the re-implemented elements in Chapter 4.

At the time of writing this thesis, the system described in this work was deployed at the OMB+ platform.


3.6 future work

Possible extensions of our work are:

• In this work, we implemented a rule-based dialogue model from scratch and adjusted it for a specific domain (i.e., e-learning). The core of the model is a dialogue manager that determines the current state of the conversation and the possible next action. Rule-based systems are generally considered to be hardly adaptable to new domains; however, our dialogue manager proved to be flexible to slight modifications in the workflow. One possible direction of future work would thus be the investigation of the general adaptability of the dialogue manager core to other scenarios and domains (e.g., a different course).

• A further possible extension would be the introduction of different languages. The current model is implemented for the German language, but the OMB+ platform provides this course also in English and Chinese. Therefore, it would be interesting to investigate whether it is possible to adjust the existing system to other languages without drastic changes to the model.


4 DEEP LEARNING RE-IMPLEMENTATION OF UNITS

This chapter partially covers work already published at international peer-reviewed conferences. The relevant publication is Moiseeva et al. (2020). The research described in this chapter was carried out in its entirety by the author of this thesis. The other authors of the publication acted as advisors.

In this chapter, we address the challenge of re-implementing two central components of our IPA dialogue system described in Chapter 3 by employing deep learning techniques. Since the data for training was collected in a (short) trial run of the rule-based system, it is not sufficient to train a neural network from scratch. Therefore, we utilize Transfer Learning (TL) methods and employ the previously pre-trained Bidirectional Encoder Representations from Transformers (BERT) model, which we fine-tune for two of our custom tasks.

4.1 introduction

Data mining and machine learning technologies have achieved notable advancements in many research fields, including but not limited to computer vision, robotics, advanced analytics, and computational linguistics (Pan and Yang, 2009; Trautmann et al., 2019; Werner et al., 2015). Besides that, in recent years, deep neural networks, with their ability to accurately map inputs to outputs (i.e., labels, images, sentences) while analyzing patterns in data, became a potential solution for many industrial automation tasks (e.g., fake news detection, sentiment analysis).

Nevertheless, conventional supervised models still have limitations when applied to real-world data and industrial tasks. The first challenge is the training phase, since a robust model requires an immense amount of structured and labeled data. Still, there are just a few cases where such data is publicly available (e.g., research datasets), but for most tasks, domains, and languages, the data has to be gathered over a long time. The second hurdle is that, if trained on existing data from a distinct domain, a supervised model cannot generalize to conditions that differ from those faced in the training dataset. Most machine learning methods perform correctly only under a general assumption: the training and test data are mapped to the same feature space and follow the same distribution (Pan and Yang, 2009). When the distribution changes, most statistical models need to be retrained from scratch using newly obtained training samples. This is especially crucial when applying such approaches to real-world data, which is usually very noisy and contains a large number of scenarios; in this case, a model would be ill-trained to make decisions about novel patterns. Furthermore, for most real-world applications, it is expensive or even not feasible at all to recollect the required training data and retrain the model. Hence, the possibility to transfer the knowledge obtained during training on existing data to a new, unseen domain is one of the possible solutions for industrial problems. Such a technique is called Transfer Learning (TL), and it allows dealing with the abovementioned scenarios by leveraging existing labeled data of some related tasks or domains.

4.1.1 Outline and Contributions

This work addresses the challenge of re-implementing two out of three central components of our rule-based IPA conversational assistant with deep learning methods. As we discussed in Chapter 2, hybrid dialogue models (i.e., consisting of both rule-based and trainable components) appear to replace the established rule- and template-based methods, which are for the most part employed in the industrial setting. To investigate this assumption and examine current state-of-the-art deep learning techniques, we conducted experiments on our custom domain-specific data. Since the available data was collected in a trial run and the obtained dataset was too small to train a conventional machine learning model from scratch, we utilized the Transfer Learning (TL) approach and fine-tuned an existing pre-trained model for our target domain.

For the experiments, we defined two tasks:

• First, we considered the NER problem in a custom domain setting. We defined a sequence labeling task and employed Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al., 2018) as one of the transfer learning methods widely utilized in the NLP area. We applied the model to our dataset and fine-tuned it for seven (7) domain-specific (i.e., e-learning) entities.

• Second, we examined the effectiveness of transfer learning for the dialogue manager core. The DM is one of the fundamental parts of a conversational agent and requires compelling learning techniques to hold the conversation intelligently. For that experiment, we defined a classification task and applied the BERT model to predict the system's next action for every given user input in a conversation. We then computed the macro F1-score for 14 possible classes (i.e., actions) and an average dialogue accuracy.


The main characteristics of both settings are:

• the set of named entities and labels (i.e., actions) differs between the source and target domains;

• for both settings, we utilized the BERT model in German and multilingual versions with additional parameters;

• the dataset used to train the target model is small and consists of highly domain-specific labels; the latter often appears in industrial scenarios.

We finally verified that the selected method performed reasonably well on both tasks. We reached a performance of 0.93 macro F1 points for Named Entity Recognition (NER) and 0.75 macro F1 points for the Next Action Prediction (NAP) task. Therefore, we conclude that both the NER and NAP components could be employed to substitute or extend the existing rule-based modules.

4.2 related work

As discussed in the previous section, the main benefit of Transfer Learning (TL) is that it allows the domains and tasks used during the training and testing phases to be different. Initial steps in the formulation of transfer learning were made after the NeurIPS-951 workshop, which focused on the demand for lifelong machine-learning methods that preserve and reuse previously learned knowledge (Chen et al., 2018). Research on transfer learning has attracted more and more attention since then, especially in the computer vision domain, and in particular for image recognition tasks (Donahue et al., 2014; Sharif Razavian et al., 2014).

Recent studies have shown that TL can also be successfully applied within the Natural Language Processing (NLP) domain, especially to semantically equivalent tasks (Dai et al., 2019; Devlin et al., 2018; Mou et al., 2016). Several research works were carried out specifically on the Named Entity Recognition (NER) task and explored the capabilities of transfer learning when applied to different named entity categories (i.e., different output spaces). For instance, Qu et al. (2016) pre-trained a linear-chain Conditional Random Field (CRF) on a large amount of labeled data in the source domain. Then they introduced a two-linear-layer neural network that learns the difference between the source and target label distributions. Finally, the authors initialized another CRF with the parameters learned in the linear layers to predict the labels of the target domain. Kim et al. (2015) conducted experiments with transferring features and model parameters between related domains where the label classes are different but may have a semantic similarity. Their primary approach was to construct label embeddings to automatically map the source and target label classes to improve the transfer (Chen et al., 2018).

1 Conference on Neural Information Processing Systems, formerly NIPS.


However, there are also models trained in a manner such that they can be applied to multiple NLP tasks simultaneously. Among such architectures are ULMFiT (Howard and Ruder, 2018), BERT (Devlin et al., 2018), GPT-2 (Radford et al., 2019), and Transformer-XL (Dai et al., 2019) – powerful models that were recently proposed for transfer learning within the NLP domain. These architectures are called language models, since they attempt to learn language structure from large textual collections in an unsupervised manner and then predict the next word in a sequence based on previously observed context words. The Bidirectional Encoder Representations from Transformers (BERT) model (Devlin et al., 2018) is one of the most popular architectures among the abovementioned, and it is also one of the first that showed close-to-human-level performance on the General Language Understanding Evaluation (GLUE) benchmark2 (Wang et al., 2018), which includes tasks on sentiment analysis, question answering, semantic similarity, and language inference. The architecture of BERT consists of stacked transformer blocks that are pre-trained on a large corpus consisting of 800M words from modern English books (i.e., the BooksCorpus by Zhu et al.) and 2,500M words of text from pre-processed (i.e., without markup) English Wikipedia articles. Overall, BERT advanced the state-of-the-art performance for eleven (11) NLP tasks. We describe this approach in detail in the subsequent section.

4.3 bert: bidirectional encoder representations from transformers

The Bidirectional Encoder Representations from Transformers (BERT) architecture is divided into two steps: pre-training and fine-tuning. The pre-training step was enabled through the two following tasks:

• Masked Language Model – the idea behind this approach is to train a deep bidirectional representation, which differs from conventional language models that are trained either left-to-right or right-to-left. To train a bidirectional model, the authors mask out a certain number of tokens in a sequence. The model is then asked to predict the original words for the masked tokens. It is essential that, in contrast to denoising auto-encoders (Vincent et al., 2008), the model does not need to reconstruct the entire input; instead, it attempts to predict the masked words. Since the model does not know which words exactly it will be asked about, it learns a representation for every token in a sequence (Devlin et al., 2018).

• Next Sentence Prediction – aims to capture relationships between two sentences A and B. In order to train the model, the authors pre-trained a binarized next sentence prediction task, where pairs of sequences were labeled according to the following criterion: A and B either follow each other directly in the corpus (label IsNext) or are both taken from random places (label NotNext). The model is then asked to predict whether the former or the latter is the case. The authors demonstrated that pre-training towards this task is extremely useful for both question answering and natural language inference tasks (Devlin et al., 2018).

Furthermore, BERT uses WordPiece tokenization for embeddings (Wu et al., 2016), which segments words into subword-level tokens. This approach allows the model to make additional inferences based on word structure (i.e., the -ing ending denotes similar grammatical categories).

The fine-tuning step follows the pre-training process. For that, the BERT model is initialized with the parameters obtained during the pre-training step, and a task-specific fully-connected layer is used after the transformer blocks, which maps the general representations to custom labels (employing a softmax operation).

2 https://gluebenchmark.com

self-attention: The key operation of the BERT architecture is self-attention, which is a sequence-to-sequence operation with a conventional encoder-decoder structure (Vaswani et al., 2017). The encoder maps an input sequence (x_1, x_2, ..., x_n) to a sequence of continuous representations z = (z_1, z_2, ..., z_n). Given z, the decoder then generates an output sequence (y_1, y_2, ..., y_m) of symbols, one element at a time (Vaswani et al., 2017). To produce an output vector y_i, the self-attention operation takes a weighted average over all input vectors:

y_i = \sum_j w_{ij} x_j \qquad (16)

where j is an index over the entire sequence and the weights sum to one (1) over all j. The weight w_{ij} is derived from a dot-product function over x_i and x_j (Bloem, 2019; Vaswani et al., 2017):

w'_{ij} = x_i^T x_j \qquad (17)

The dot product returns a value between negative and positive infinity, and a softmax maps these values to [0, 1] to ensure that they sum to one (1) over the entire sequence (Bloem, 2019; Vaswani et al., 2017):

w_{ij} = \frac{\exp w'_{ij}}{\sum_j \exp w'_{ij}} \qquad (18)

Self-attention is not the only operation in transformers, but it is the essential one, since it propagates information between vectors. The other operations in the transformer are applied to every vector in the input sequence without interactions between vectors (Bloem, 2019; Vaswani et al., 2017).
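For illustration, a minimal NumPy sketch of the basic (unscaled) self-attention operation from Equations 16–18 is given below; it deliberately omits the learned query/key/value projections and the multi-head structure of the full transformer.

```python
import numpy as np

def self_attention(x: np.ndarray) -> np.ndarray:
    """x: input sequence of shape (n, d), one d-dimensional vector per token."""
    # Eq. 17: raw weights as dot products between all pairs of input vectors.
    w_raw = x @ x.T                                    # shape (n, n)
    # Eq. 18: softmax over each row so the weights sum to one.
    w = np.exp(w_raw - w_raw.max(axis=1, keepdims=True))
    w = w / w.sum(axis=1, keepdims=True)
    # Eq. 16: each output vector is a weighted average of all input vectors.
    return w @ x                                       # shape (n, d)

y = self_attention(np.random.randn(5, 8))  # 5 tokens, 8-dimensional embeddings
print(y.shape)  # (5, 8)
```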


4.4 data & descriptive statistics

The benchmark that we accumulated through the trial run includes 300 structured and partially labeled dialogues, with the average length of a dialogue being six (6) utterances. All the conversations were held in the German language. Detailed general statistics can be found in Table 14.

Max. Len. Dialogue (in utterances): 15
Avg. Len. Dialogue (in utterances): 6
Max. Len. Utterance (in tokens): 100
Avg. Len. Utterance (in tokens): 9
Overall Unique Action Labels: 13
Overall Unique Entity Labels: 7
Train – Dialogues (# Utterances): 200 (1161)
Eval – Dialogues (# Utterances): 50 (279)
Test – Dialogues (# Utterances): 50 (300)

Table 14: General statistics for the conversational dataset.

The dataset covers 14 unique next actions and seven (7) unique named entities. Detailed information on the next actions and entities can be found in Table 15 and Table 16. Below, we list the possible actions with the corresponding explanations and predefined mapped statements.

unk – requests to define the intent of the question: mathematical, organizational, or contextual (i.e., the fallback policy in case the automated intent recognition unit fails) → "Hast du eine Frage zu einer Aufgabe (MATH), zu einem Text im Kurs (TEXT) oder eine organisatorische Frage (ORG)?"

org – hands the discussion over to a human tutor in case of an organizational question → "Es scheint sich wohl um eine organisatorische Frage zu handeln. Bitte fasse deine Frage in einem kurzen Text zusammen, damit ich sie an meinen menschlichen Kollegen weiterleiten kann."

text – hands the discussion over to a human tutor in case of a contextual question → "Es scheint sich wohl um eine inhaltliche Frage zu handeln. Bitte fasse deine Frage in einem kurzen Text zusammen, damit ich sie an meinen menschlichen Kollegen weiterleiten kann."


topic – requests information on the topic → "Welches Kapitel bearbeitest du gerade? Antworte bitte mit der Kapitelnummer, z.B. IA, IB, II, oder mit dem Kapitelnamen, z.B. Geometrie."

examination – requests information about the examination mode → "Bearbeitest du eine Übungsaufgabe, eine Trainingsaufgabe, einen Quiz oder eine Schlussprüfung?"

subtopic – requests information about the subtopic in the case of quiz or section-level training/exercise modes → "Wie heißt der Abschnitt, in dem sich die Aufgabe befindet, auf die sich deine Frage bezieht?"

level – requests information about the level of the examination mode: chapter or section → "Geht es um eine Aufgabe auf Kapitel-Ebene (z.B. um eine Trainingsaufgabe im Kapitel Geometrie) oder um eine Aufgabe auf der Ebene eines Abschnitts in einem Kapitel (z.B. um eine Aufgabe zum Abschnitt Winkel im Kapitel Geometrie)? Antworte bitte mit KAP für Kapitel und SEK für Abschnitt."

question number – requests information about the question number → "Welche Aufgabe bearbeitest du gerade? Wenn du z.B. bei Teilaufgabe c in der zweiten Trainingsaufgabe (oder der zweiten Übungsaufgabe) bist, antworte mit 2c. Bearbeitest du gerade einen Quiz oder eine Schlussprüfung, gib bitte nur die Teilaufgabe an."

final request – considers the Informational Dictionary (ID) to be complete and initiates the verification step → "Deine Angaben waren wie folgt: [...] Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein."

verify request – requests the student to verify the assembled information → "Welche der folgenden Punkte habe ich falsch erkannt? Schreibe bitte, welche der folgenden Punkte es sind: [a, b, c, ...]3"

correct request – requests the student to provide the correct information if there is an error in the assembled data → "Gib bitte die korrekte Information für [a]4 ein."

exact question – requests the student to provide a detailed explanation of the problem → "Bitte fasse deine Frage in einem kurzen Text zusammen, damit ich sie an meinen menschlichen Kollegen weiterleiten kann."

human handover – finalizes the conversation → "Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir."

3 Variables that depend on the state of the Informational Dictionary (ID).
4 Variable that was selected as erroneous and needs to be corrected.


Action            Counts     Action             Counts
Final Request     321        Unk                80
Human Handover    300        Subtopic           55
Exact Question    286        Correct Request    40
Question Number   176        Verify Request     34
Examination       175        Org                17
Topic             137        Text               13
Level             130

Table 15: Detailed statistics on possible system actions. The Counts columns denote the number of occurrences of each action in the entire dataset.

Entity          Counts
Question Nr     317
Chapter         311
Examination     303
Subtopic        198
Level           80
Intent          70

Table 16: Detailed statistics on possible named entities. The Counts column denotes the number of occurrences of each entity in the entire dataset.

4.5 named entity recognition

We defined a sequence labeling task to extract custom entities from user input. We considered seven (7) viable entities (see Table 16) to be recognized by the model: topic, subtopic, examination mode and level, question number, intent, as well as the entity other for the remaining words in an utterance. Since the data collected from the rule-based system already includes information on the entities, we were able to train a domain-specific NER unit. However, since the original user input was informal, the same information could be provided in different writing styles; that is, a single entity can have different surface forms (e.g., synonyms, writing styles). Therefore, entities that we extracted from the rule-based system were transformed into a universal standard (e.g., official chapter names). To consider all of the variable entity forms while post-labeling the original dataset, we determined generic entity names (e.g., chapter, question nr) and mapped variations of entities from the user input (e.g., Chapter = [Elementary Calculus, Chapter I, ...]) to them. The overall dataset consists of 300 labeled dialogues, where 200 (with 1161 utterances) were employed for training and 100 for the evaluation and test sets (50 dialogues with ca. 300 utterances for each set, respectively).


4.5.1 Model Settings

We conducted experiments with German and multilingual BERT implementations5. The capitalization of words is significant for the German language; therefore, we ran the experiments on the capitalized data while preserving the original punctuation. We employed the available base model for both the multilingual and German BERT implementations in the cased version. We initialized the learning rate for both models to 1e-4, and the maximum length of the tokenized input was set to 128 tokens. We conducted the experiments multiple times with different seeds for a maximum of 50 epochs, with the training batch size set to 32. We utilized AdamW as the optimizer and applied early stopping if the performance did not improve significantly after 5 epochs.
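A minimal sketch of this configuration, assuming the Hugging Face transformers library and the publicly available bert-base-german-cased checkpoint, is shown below; it illustrates the hyper-parameters stated above and is not the exact training code used in our experiments.

```python
# Sketch of the NER fine-tuning configuration described above
# (hypothetical model name and label count; the training loop is omitted).
import torch
from transformers import BertTokenizerFast, BertForTokenClassification

MODEL_NAME = "bert-base-german-cased"   # or "bert-base-multilingual-cased"
NUM_LABELS = 7                          # topic, subtopic, examination, level,
                                        # question number, intent, other

tokenizer = BertTokenizerFast.from_pretrained(MODEL_NAME)
model = BertForTokenClassification.from_pretrained(MODEL_NAME, num_labels=NUM_LABELS)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Cased input, original punctuation preserved, padded/truncated to 128 tokens.
batch = tokenizer(["Ich bin bei Aufgabe 2c im Kapitel Geometrie"],
                  padding="max_length", truncation=True, max_length=128,
                  return_tensors="pt")
outputs = model(**batch)                # logits of shape (1, 128, NUM_LABELS)
```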

4.6 next action prediction

We defined a classification problem where we aim to predict the system's next action according to the given user input. We assumed 14 custom actions (see Table 15) that we considered to be our classes. While collecting the toy dataset, every input was automatically labeled by the rule-based system with the corresponding next action and the dialogue ID. Thus, no additional post-labeling was required in this step.

We investigated two settings for this task:

• Default Setting: Utilizing a user input and the corresponding class (i.e., next action) without any supplementary context. By default, we conduct all of our experiments in this setting.

• Extended Setting: Employing a user input, the corresponding next action, and the previous system action as a source of supplementary context. For this setting, we experimented with the best performing model from the default setting (a sketch of the input construction is given below).
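As a rough illustration of the difference between the two settings, the extended input can be built by prepending the previous system action to the user utterance before classification; the separator token and function name below are assumptions for illustration only.

```python
# Hypothetical sketch: building the classifier input for both NAP settings.
def build_nap_input(user_utterance: str, previous_action: str = None) -> str:
    if previous_action is None:
        return user_utterance                           # default setting
    # extended setting: the previous system action serves as extra context
    return f"{previous_action} [SEP] {user_utterance}"

print(build_nap_input("Ich bin im Kapitel Geometrie"))
print(build_nap_input("Ich bin im Kapitel Geometrie", previous_action="topic"))
```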

The overall dataset consists of 300 labeled dialogues, where 200 (with 1,161 utterances) were employed for training and 100 for the evaluation and test sets (50 dialogues with ca. 300 utterances for each set, respectively).

4.6.1 Model Settings

For the NAP task, we carried out experiments with German and multilingual BERT implementations as well. We examined the performance of both capitalized and lower-cased inputs, as well as plain and preprocessed data. For the multilingual BERT, we used the base model in both cased and uncased modifications. For the German BERT, we utilized the base model in the cased modification only6.

5 https://github.com/huggingface/transformers


For both models, we initialized the learning rate to 4e-5, and the maximum length of the tokenized input was set to 128 tokens. We conducted the experiments multiple times with different seeds for a maximum of 300 epochs, with the training batch size set to 32. We utilized AdamW as the optimizer and applied early stopping if the performance did not improve significantly after 15 epochs.

4.7 evaluation and results

To evaluate the models, we computed the word-level macro F1 score for the Named Entity Recognition (NER) task and the utterance-level macro F1 score for the Next Action Prediction (NAP) task. The word-level F1 is estimated as the average of the F1 scores per class, each computed from all words in the evaluation and test sets. The outcomes for the NER task are presented in Table 17. For the utterance-level F1, a single class label (i.e., next action) is taken for the whole utterance. The results for the NAP task are shown in Table 18.

We additionally computed the average dialogue accuracy for the best performing NAP models. This score indicates how well the predicted next actions compose the conversational flow. The average dialogue accuracy was estimated for the 50 conversations in the evaluation and test sets, respectively. The results are displayed in Table 19.
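A small sketch of the two evaluation levels, using scikit-learn and toy labels (the data and helper name are illustrative), could look as follows.

```python
# Sketch: word-level macro F1 (NER), utterance-level macro F1 (NAP),
# and dialogue accuracy (share of correctly predicted actions per dialogue).
from sklearn.metrics import f1_score

ner_gold = ["chapter", "other", "question nr.", "other"]   # one label per word
ner_pred = ["chapter", "other", "other", "other"]
print(f1_score(ner_gold, ner_pred, average="macro"))

nap_gold = ["level", "question number", "final request"]   # one label per utterance
nap_pred = ["level", "subtopic", "final request"]
print(f1_score(nap_gold, nap_pred, average="macro"))

def dialogue_accuracy(gold_actions, predicted_actions):
    correct = sum(g == p for g, p in zip(gold_actions, predicted_actions))
    return correct / len(gold_actions)

print(dialogue_accuracy(nap_gold, nap_pred))                # 2/3 for this toy dialogue
```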

The obtained results for the NER task revealed that German BERT performed significantly better than the multilingual BERT model. The performance of the custom NER unit is at 0.93 macro F1 points for all possible named entities (see Table 17). In contrast, for the NAP task, the multilingual BERT model obtained better performance than the German BERT model. The best performing method in the default setting gained a macro F1 of 0.677 points for 14 possible classes, whereas the model in the extended setting performed better – its best macro F1 score is 0.752 for the equal number of classes (see Table 18).

Regarding the dialogue accuracy, the extended system fine-tuned with multilingual BERT obtained better outcomes than the default one (see Table 19). The overall conclusion for the NAP is that the capitalized setting increased the performance of the model, whereas the inclusion of punctuation did not influence the results.

6 An uncased pre-trained modification of the model was not available.


Task   Model   Cased   Punct   F1 (Eval)   F1 (Test)
NER    GER     ✓       ✓       0.971       0.930
NER    Mult    ✓       ✓       0.926       0.905

Table 17: Word-level F1 for the Named Entity Recognition (NER) task. In bold: best performance for evaluation and test sets.

Task   Model   Cased   Punct   Extended   F1 (Eval)   F1 (Test)
NAP    GER     ✓       ✗       ✗          0.711       0.673
NAP    GER     ✓       ✓       ✗          0.701       0.606
NAP    Mult    ✓       ✓       ✗          0.688       0.625
NAP    Mult    ✓       ✗       ✗          0.769       0.677
NAP    Mult    ✓       ✗       ✓          0.810       0.752
NAP    Mult    ✗       ✓       ✗          0.664       0.596
NAP    Mult    ✗       ✗       ✗          0.742       0.502

Table 18: Utterance-level F1 for the Next Action Prediction (NAP) task. Underlined: best performance for evaluation and test sets in the default setting (without previous action context). In bold: best performance for evaluation and test sets in the extended setting (with previous action context).

Model          Accuracy (Eval)   Accuracy (Test)
NAP default    0.765             0.724
NAP extended   0.813             0.801

Table 19: Average dialogue accuracy computed for the Next Action Prediction (NAP) task for the best performing models. In bold: best performance for evaluation and test sets.


Figure 5: Macro F1, evaluation, NAP – default model. Comparison between German and multilingual BERT.

Figure 6: Macro F1, test, NAP – default model. Comparison between German and multilingual BERT.

Figure 7: Macro F1, evaluation, NAP – extended model.

Figure 8: Macro F1, test, NAP – extended model.

Figure 9: Macro F1, evaluation, NER – Comparison between German and multilingual BERT.

Figure 10: Macro F1, test, NER – Comparison between German and multilingual BERT.


4.8 error analysis

We then investigated the cases where the models failed to predict the correct action or labeled the named entity span erroneously. Below we outline the most common errors.

next action prediction One of the prevalent flaws in the default model was the mismatch between two consecutive actions – the actions question number and subtopic. That is due to the position of these actions in the conversational flow. The occurrence of both actions in the dialogue is not strict and largely depends on the previous action. To explain this, we illustrate the following possible conversational scenarios (a short sketch of this branching follows the list):

• The system can directly proceed with the request for the question number if the examination mode is the final examination.

• If the examination mode is training or exercise, the system has to ask about the level first (i.e., chapter or section).

• If it is chapter-level, the system can again directly proceed with the request for the question number.

• In the case of section-level, the system has to ask about the subsection first (if this information was not provided before).
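The following minimal sketch (with hypothetical function and field names) summarizes this branching logic; it is an illustration of the rules above, not the actual implementation.

```python
# Hypothetical sketch of the rule-based branching that decides between the
# actions "question number" and "subtopic".
def next_action(info: dict) -> str:
    if info.get("examination") == "final examination":
        return "question number"
    # training or exercise mode: the level must be known first
    if "level" not in info:
        return "level"
    if info["level"] == "chapter":
        return "question number"
    # section level: ask for the subsection unless it was already provided
    return "question number" if "subtopic" in info else "subtopic"
```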

The examination of the extended model showed that the induction of supplementary context (i.e., the previous action) improved the performance of the system by about 6%.

named entity recognition The cases where the system failed cover mismatches between the labels chapter and other, and between the labels question number and other. This failure arose due to the imperfectly labeled span of a multi-word named entity (e.g., "Elementary Calculus: Proportionality, Percentage"). The first or last words in the named entity were excluded from the span and erroneously labeled with the other label.

4.9 summary

In this chapter, we re-implemented two of the three fundamental units of our IPA conversational agent – Named Entity Recognition (NER) and Dialogue Manager (DM) – using the methods of Transfer Learning (TL), specifically the Bidirectional Encoder Representations from Transformers (BERT) model.

We believe that the obtained results are reasonably high considering the comparatively small amount of training data we employed to fine-tune the models. Therefore, we conclude that both the NAP and NER segments could be used to replace or extend the existing rule-based modules. Rule-based components, as well as ElasticSearch (ES), are limited in their capacities and not flexible to novel patterns, whereas the trainable units generalize better and could possibly decrease the number of inaccurate predictions in case of accidental dialogue behavior. Furthermore, to increase the overall robustness, rule-based and trainable components could be used synchronously: when one system fails (e.g., the ElasticSearch (ES) module cannot extract an entity), the conversation continues on the prediction obtained from the other model.

4.10 future work

There are possible directions for future work:

• At the time of writing this thesis, both units were trained and investigated on their own – that means without being incorporated into the initial IPA conversational system. Thus, one of the potential directions of future work would be the embodiment of these units into the original system and the examination of possible failure cases. Furthermore, the resulting hybrid system could then employ additional meta policies to prevent unexpected system behavior.

• Further investigation could go towards multi-language modality for the re-implemented units. Since the Online Mathematik Brückenkurs (OMB+) platform also supports English and Chinese, it would be interesting to examine whether a simple translation from the target language (i.e., English, Chinese) to the source language (i.e., German) would be sufficient to employ the already-assembled dataset and pre-trained units. Presumably, this should be achievable in the case of the Next Action Prediction (NAP) unit, but it could lead to a noticeable performance drop for the Named Entity Recognition (NER) task, since there could be a considerable gap between translated and original entity spans.

• Last but not least, one of the further extensions of the IPA system would be its eventual applicability to more cognitively demanding tasks. For instance, it is of interest to extend the model to more complex domains like the handling of mathematical questions. Would it be possible to automate the human assistance at least for less complicated mathematical inquiries, or is it still unachievable with currently existing NLP techniques and machine learning methods?


Part II

CLUSTERING-BASED TREND ANALYZER


5 FOUNDATIONS

As discussed in Chapter 1, predictive analytics is one of the potential Intelligent Process Automation (IPA) tasks due to its capability to forecast future events by using current and historical data and therethrough assist knowledge workers with, for instance, sales or investments. Modern machine learning algorithms allow an intelligent and fast analysis of large amounts of data. However, the evaluation of such algorithms and models remains one of the grave problems in this domain. This is mostly due to the lack of publicly available benchmarks for the evaluation of topic modeling and trend detection algorithms. Hence, in this part of the thesis, we address this challenge and assemble a benchmark for detecting both trends and downtrends (i.e., topics that become less frequent over time). To the best of our knowledge, the task of downtrend detection has not been well addressed before.

In this chapter, we introduce the concepts required by the second part of the thesis. We begin with an introduction to the topic modeling task and define the notion of an emerging trend in Section 5.1. Next, we examine the related work, existing approaches to topic modeling and trend detection, and their limitations in Section 5.2. In Chapter 6, we introduce our TRENDNERT benchmark for trend and downtrend detection and describe the method we employed for its assembly.

5.1 motivation and challenges

Science changes and evolves rapidly: novel research areas emerge while others fade away. Keeping pace with these changes is challenging. Therefore, recognizing and forecasting emerging research trends is of significant importance for researchers and academic publishers, as well as for funding agencies, stakeholders, and innovation companies.

Previously, the task of trend detection was mostly solved by domain experts who used specialized and cumbersome tools to investigate the data and get useful insights from it (e.g., through data visualization or statistics). However, manual analysis of large amounts of data can be time-consuming, and hiring domain experts is very costly. Furthermore, the overall increase of research data (i.e., Google Scholar, PubMed, DBLP) in the past decade makes the approach based on human domain experts less and less scalable. In contrast, automated approaches become a more desirable solution for the task of emerging trend identification (Kontostathis et al., 2004; Salatino, 2015). Due to the large number of electronic documents appearing on the web daily, several machine learning techniques were developed to find patterns of word occurrences in those documents using hierarchical probabilistic models. These approaches are called topic models, and they allow the analysis of extensive textual collections and the extraction of valuable insights from them. The main advantage of topic models is their ability to discover hidden patterns of word use and to connect documents containing similar patterns. In other words, the idea of a topic model is that documents are a mixture of topics, where a topic is defined as a probability distribution over words (Alghamdi and Alfalqi, 2015).

Topics have the property of being able to evolve; thus, modeling topics without considering time may confuse the precise topic identification. Modeling topics while considering time is called topic evolution modeling, and it can reveal important hidden information in the document collection, allowing one to recognize topics together with their time of appearance. This approach also provides the ability to check the evolution of topics during a given time window and to examine whether a given topic is gaining interest and popularity over time (i.e., becoming an emerging trend) or whether its development is stable. Considering the latter approach, we turn to the task of emerging trend identification.

How is an emerging trend defined? An emerging trend is a topic that is gaining interest and utility over time, and it has two attributes: novelty and growth (Small, Boyack, and Klavans, 2014; Tu and Seng, 2012). Rotolo, Hicks, and Martin (2015) explain the correlation of these two attributes in the following way. Up to the point of emergence, a research topic is characterized by a high level of novelty. At that time, it does not attract any considerable attention from the scientific community, and due to the limited impact, its growth is nearly flat. After some turning points (i.e., the appearance of significant scientific publications), the research topic starts to grow faster and may become a trend if the growth is fast; however, the level of novelty decreases continuously once the emergence becomes clear. Then, at the final phase, the topic becomes well-established while its novelty levels out (He and Chen, 2018).

To detect trends in textual data, one employs computational analysis techniques and Natural Language Processing (NLP). In the subsequent section, we give an overview of the existing approaches for topic modeling and trend detection.

5.2 detection of emerging research trends

The task of trend detection is closely associated with the topic modeling task since, in the first instance, it detects the latent topics in a substantial collection of data. It secondarily employs techniques to investigate the evolution of a topic and whether it has the potential to become a trend.


Topic modeling has an extensive history of applications in the scientific domain, including but not limited to studies of temporal trends (Griffiths and Steyvers, 2004; Wang and McCallum, 2006) and the investigation of impact prediction (Yogatama et al., 2011). Below we give a general overview of the most significant methods used in this domain, explain their importance and, if any, their potential limitations.

5.2.1 Methods for Topic Modeling

In order to extract a topic from textual documents (i.e., research papers, blog posts), the precise definition of the model for topic representation is crucial.

latent semantic analysis, formerly called Latent Semantic Indexing, is one of the fundamental methods in the topic modeling domain, proposed by Deerwester et al. in 1990. Its primary goal is to generate vector-based representations for documents and terms and then employ these representations to compute the similarity between documents (resp. terms) in order to select the most related ones. In this regard, Latent Semantic Analysis (LSA) takes a matrix of documents and terms and decomposes it into a separate document-topic matrix and a topic-term matrix. Before turning to the central task of finding latent topics that capture the relationships among the words and documents, it also performs dimensionality reduction utilizing truncated Singular Value Decomposition. This technique is usually applied to reduce the sparseness, noisiness, and redundancy across dimensions in the document-topic matrix. Finally, using the resulting document and term vectors, one applies measures such as cosine similarity to evaluate the similarity of different documents, words, or queries.

One of the extensions to traditional LSA is probabilistic Latent Semantic Analysis (pLSA), which was introduced by Hofmann in 1999 to fix several shortcomings. The foremost advantage of the follow-up method was that it could successfully automate document indexing, which in turn improved LSA in a probabilistic sense by using a generative model (Alghamdi and Alfalqi, 2015). Furthermore, pLSA can differentiate between diverse contexts of word usage without utilizing dictionaries or a thesaurus. That results in better polysemy disambiguation and discloses typical similarities by grouping words that share a similar context. Among the works that employ pLSA is the approach proposed by Gohr et al. (2009), where the authors apply pLSA in a window that slides across the stream of documents to analyze the evolution of topics. Also, Mei et al. (2008) use pLSA in order to create a network of topics.


latent dirichlet allocation was developed by Blei, Ng, and Jordan (2003) to overcome the limitations of both the LSA and pLSA models. Nowadays, Latent Dirichlet Allocation (LDA) is considered one of the most utilized and robust probabilistic topic models. The fundamental concept of LDA is the assumption that every document is represented as a mixture of topics, where each topic is a discrete probability distribution that defines the probability of each word to appear in a given topic. These topic probabilities provide a compressed representation of a document. In other words, LDA models each of the d documents in the collection as a mixture over n latent topics, where each of the topics describes a multinomial distribution over the w words in the vocabulary (Alghamdi and Alfalqi, 2015).

LDA fixes such weaknesses of pLSA as the incapacity to assign a probability to previously unseen documents (because pLSA learns the topic mixtures only for documents seen in the training phase) and the overfitting problem (i.e., the number of parameters in pLSA increases linearly with the size of the corpus) (Salatino, 2015).

An extension of the conventional LDA is the hierarchical LDA (Griffiths et al., 2004), where topics are grouped in hierarchies. Each node in the hierarchy is associated with a topic, and a topic is a distribution across words. Another comparable approach is the Relational Topic Model (Chang and Blei, 2010), which is a mixture of a topic model and a network model for groups of linked documents. The attribute of every document is its text (i.e., discrete observations taken from a fixed vocabulary), and the links between documents are hyperlinks or citations.

LDA is a prevalent method for discovering topics (Blei and Lafferty, 2006; Bolelli, Ertekin, and Giles, 2009; Bolelli et al., 2009; Hall, Jurafsky, and Manning, 2008; Wang and Blei, 2011; Wang and McCallum, 2006). However, it has been shown that training and fine-tuning a stable LDA model can be very challenging and time-consuming (Agrawal, Fu, and Menzies, 2018).

correlated topic model One of the main drawbacks of LDA is its incapability of capturing dependencies among the topics (Alghamdi and Alfalqi, 2015; He et al., 2017). Therefore, Blei and Lafferty (2007) proposed the Correlated Topic Model as an extension of the LDA approach. In this model, the authors replaced the Dirichlet distribution with a logistic-normal distribution that models pairwise topic correlations with the Gaussian covariance matrix (He et al., 2017).

However, one of the shortcomings of this model is the computational cost. The number of parameters in the covariance matrix increases quadratically with the number of topics, and parameter estimation for the full-rank matrix can be inaccurate in high-dimensional space (He et al., 2017). Therefore, He et al. (2017) proposed their own method to overcome this problem. The authors introduced a model which learns dense topic embeddings and captures topic correlations through the similarity between the topic vectors. According to He et al., their method allows efficient inference in the low-dimensional embedding space and significantly reduces the previous cubic or quadratic time complexity to linear with respect to the topic number.

5.2.2 Topic Evolution of Scientific Publications

Although the topic modeling task is essential, it has a significant offspring task, namely the modeling of topic evolution. Methods able to identify topics within the context of time are highly applicable in the scientific domain. They allow understanding of topic lineage and reveal how research on one topic influences another. Generally, these methods can be classified according to the primary sources of information they employ: text, citations, or key phrases.

text-based Extensive textual collections have dynamic co-occurrences of word patterns that change over time. Thus, models that track topic evolution over time should consider both the word co-occurrence patterns and the timestamps (Blei and Lafferty, 2006).

In 2006, Wang and McCallum proposed a Non-Markov Continuous Time model for the detection of topical trends in the collection of NeurIPS publications (an overall timestamp of 17 years was considered). The authors made the following assumption: if a pattern of word co-occurrence exists for a short time, the system creates a narrow-time-distribution topic, whereas for long-term patterns the system generates a broad-time-distribution topic. The principal objective of this approach is that it models topic evolution without discretizing the time, i.e., the state at time t + Δt is independent of the state at time t (Wang and McCallum, 2006).

Another work, by Blei and Lafferty (2006), proposed the Dynamic Topic Models. The authors assumed that the collection of documents is organized based on certain time spans, and the documents belonging to every time span are represented with a K-component model. Thereby, the topics are associated with time span t and emerge from the topics corresponding to time span t − 1. One of the main differences of this model compared to other existing approaches is that it utilizes a Gaussian distribution for the topic parameters instead of the conventional Dirichlet distribution and can therefore capture the evolution over various time spans.


citation-based analysis is the second popular direction that is considered to be effective for trend identification (He et al., 2009; Le, Ho, and Nakamori, 2005; Shibata, Kajikawa, and Takeda, 2009; Shibata et al., 2008; Small, 2006). This method assumes that citations in their various forms (i.e., bibliographic citations, co-citation networks, citation graphs) indicate a meaningful relationship between topics, and uses citations to model the topic evolution in the scientific domain. Nie and Sun (2017) utilize this approach along with Latent Dirichlet Allocation (LDA) and k-means clustering to identify research trends. The authors first use LDA to extract features and determine the optimal number of topics. Then they use k-means to obtain thematic clusters, and finally, they compute citation functions to identify the changes of clusters over time.

However, the citation-based approach's main drawback is that a consistent collection of publications associated with a specific research topic is required for these techniques to detect the cluster and novelty of the particular topic (He and Chen, 2018). Furthermore, there are also many trend detection scenarios (e.g., research news articles or research blog posts) in which citations are not readily available. That makes this approach not applicable to this kind of data.

keyphrase-based Another standard solution is based on the use of keyphrase information extracted from research papers. In this case, every keyword is representative of a single research topic. Systems like Saffron (Monaghan et al., 2010), as well as Saffron-based models, use this approach. Saffron is a research toolkit that provides algorithms for keyphrase extraction, entity linking, and taxonomy extraction. Asooja et al. (2016) utilize Saffron to extract the keywords from LREC proceedings and then propose a trend forecast based on regression models as an extension to Saffron.

However, this method raises many questions, including the following: how to deal with the noisiness of keywords (i.e., non-relevant keywords) and how to handle keywords that have their own hierarchies (i.e., sub-areas). Other drawbacks of this approach include the ambiguity of keywords (i.e., "Java" as an island and "Java" as a programming language) or the necessity to deal with synonyms, which could be treated as different topics.

5.3 summary

An emerging trend detection algorithm aims to recognize topics that were earlier inappreciable but are now gaining importance and interest within a specific domain. Knowledge of emerging trends is especially essential for stakeholders, researchers, academic publishers, and funding organizations. Assume a business analyst in a biotech company who is interested in the analysis of technical articles for recent trends that could potentially impact the company's investments. A manual review of all the available information in a specific domain would be extremely time-consuming or even not feasible. However, this process could be solved through Intelligent Process Automation (IPA) algorithms. Such software would assist human experts who are tasked with identifying emerging trends through automatic analysis of extensive textual collections and visualization of occurrences and tendencies of a given topic.


6 BENCHMARK FOR SCIENTIFIC TREND DETECTION

This chapter covers work already published at international peer-reviewed conferences. The relevant publication is Moiseeva and Schütze, 2020. The research described in this chapter was carried out in its entirety by the author of the thesis. The other author of the publication acted as an advisor.

Computational analysis and modeling of the evolution of trends is a significant area of research because of its socio-economic impact. However, no large publicly available benchmark for trend detection currently exists, making a comparative evaluation of methods impossible. We remedy this situation by publishing the benchmark TRENDNERT, consisting of a set of gold trends (resp. downtrends) and document labels that is available as an unlimited download, and a large underlying document collection that can also be obtained for free. We propose Mean Average Precision (MAP) as an evaluation measure for trend detection and apply this measure in an investigation of several baselines.

6.1 motivation

What is an emerging research topic, and how is it defined in the literature? At the time of emergence, a research topic often does not attract much recognition from the scientific society and is exemplified by just a few publications. He and Chen (2018) associate the appearance of a trend with some triggering event, e.g., the publication of an article in a high-impact journal. Later, the research topic starts to grow faster and becomes a trend (He and Chen, 2018).

We adopt this as our key representation of a trend in this work: a trend is a research topic with a strongly increasing number of publications in a particular time interval. In contrast to trends, there are also downtrends, which refer to topics that move lower in their demand or growth as they evolve. Therefore, we define a downtrend as the converse of a trend: a downtrend is a research topic that has a strongly decreasing number of publications in a particular time interval. Downtrends can be as important to detect as trends, e.g., for a funding agency planning its budget.

To detect both types of topics (i.e., trends and downtrends) in textual data, one employs computational analysis techniques and NLP. These methods enable the analysis of extensive textual collections and provide valuable insights into topical trendiness and evolution over time. Despite the overall importance of trend analysis, there is, to our knowledge, no large publicly available benchmark for trend detection. This makes a comparative evaluation of methods and algorithms impossible. Most of the previous work has used proprietary or moderate-sized datasets, or the evaluation measures were not directly bound to the objectives of trend detection (Gollapalli and Li, 2015).

We remedy this situation by publishing the benchmark TRENDNERT1. The benchmark consists of the following:

1. A set of gold trends compiled based on an extensive analysis of the meta-literature.

2. A set of labeled documents (based on more than one million scientific publications) where each document is assigned to a trend, a downtrend, or the class of flat topics, and supported by the topic title.

3. An evaluation script to compute Mean Average Precision (MAP) as the measure for the accuracy of trend detection.

We make the benchmark available as unlimited downloads2. The underlying research corpus can be obtained for free from Semantic Scholar3 (Ammar et al., 2018).

6.1.1 Outline and Contributions

In this work, we address the challenge of benchmark creation for (down)trend detection. This chapter is structured as follows: Section 6.2 introduces our model for corpus creation and its fundamental components. In Section 6.3, we present the crowdsourcing procedure for the benchmark labeling. Section 6.5 outlines the proposed baselines for trend detection, our experimental setup, and the outcomes. Finally, we illustrate the details of the distributed benchmark in Section 6.6 and conclude in Section 6.7.

Our contributions within this work are as follows:

• TRENDNERT is the first publicly available benchmark for (down)trend detection.

• TRENDNERT is based on a collection of more than one million documents. It is among the largest that have been used for trend detection and therefore offers a realistic setting for developing trend detection algorithms.

• TRENDNERT addresses the task of detecting both trends and downtrends. To the best of our knowledge, the task of downtrend detection has not been addressed before.

1 The name is a concatenation of trend and its ananym dnert, because it supports the evaluation of both trends and downtrends.
2 https://doi.org/10.17605/OSF.IO/DHZCT
3 https://api.semanticscholar.org/corpus


6.2 corpus creation

In this section, we explain the methodology we used to create the TRENDNERT benchmark.

6.2.1 Underlying Data

We employ a corpus provided by Semantic Scholar4. It includes more than one million papers published mostly between 2000 and 2015 in about 5,000 computer science journals and conference proceedings. Every record in the collection consists of a title, key phrases, abstract, full text, and metadata.

6.2.2 Stratification

The distribution of documents over time in the entire original dataset is skewed (see Figure 11). We discovered that clustering the entire collection, as well as a random sample, produces corrupt results because more weight is given to later years than to earlier years. Therefore, we first generated a stratified sample of the original document collection. To this end, we randomly pick 10,000 documents for each year between 2000 and 2016, for an overall sample size of 160,000.
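As a rough sketch, such a stratified sample can be drawn, for instance, with pandas; the column name "year" and the loader are assumptions about the metadata layout, not our actual preprocessing code.

```python
# Sketch: draw 10,000 documents per publication year (16 years -> 160k sample).
import pandas as pd

def stratified_sample(papers: pd.DataFrame, per_year: int = 10_000,
                      years=range(2000, 2016), seed: int = 0) -> pd.DataFrame:
    parts = [papers[papers["year"] == y].sample(n=per_year, random_state=seed)
             for y in years]
    return pd.concat(parts, ignore_index=True)
```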

Figure 11: Overall distribution of papers in the entire dataset. Years 1975 to 2016.

4 https://www.semanticscholar.org


6.2.3 Document Representations & Clustering

As we mentioned, this work's primary focus was the creation of a benchmark, not the development of new algorithms or models for trend detection. Therefore, for our experiments, we have selected an algorithm based on k-means clustering as a simple baseline method for trend detection.

In conventional document clustering, documents are usually represented as bag-of-words (BOW) feature vectors (Manning, Raghavan, and Schütze, 2008). Those, however, have a major weakness: they neglect the semantics of words (Le and Mikolov, 2014). Still, recent work in the representation learning domain, and particularly doc2vec – the method proposed by Le and Mikolov (2014) – can provide representations for document clustering that overcome this weakness.

The doc2vec5 algorithm is inspired by techniques for learning word vectors (Mikolov et al., 2013) and is capable of capturing semantic regularities in textual collections. It is an unsupervised approach for learning continuous distributed vector representations for text (i.e., paragraphs, documents). This approach maps documents into a vector space such that semantically related documents are assigned similar vector representations (e.g., an article about "genomics" is closer to an article about "gene expression" than to an article about "fuzzy sets"). Formally, each paragraph is mapped to an individual vector represented by a column in matrix D, and every word is also mapped to an individual vector represented by a column in matrix W. The paragraph vector and word vectors are concatenated to predict the next word in a context (Le and Mikolov, 2014). Other work has already successfully employed this type of document representation for topic modeling, combining it with both Latent Dirichlet Allocation (LDA) and clustering approaches (Curiskis et al., 2019; Dieng, Ruiz, and Blei, 2019; Moody, 2016; Xie and Xing, 2013).

In our work, we run doc2vec on the stratified sample of 160,000 papers and represent each document as a length-normalized vector (i.e., document embedding). These vectors are then clustered into k = 1000 clusters using the scikit-learn6 implementation of k-means (MacQueen, 1967) with default parameters. The combination of document representations with a clustering algorithm is conceptually simple and interpretable. Also, the comprehensive comparative evaluation of topic modeling methods utilizing document embeddings performed by Curiskis et al. (2019) showed that doc2vec feature representations with k-means clustering outperform several other methods7 on three evaluation measures (Normalized Mutual Information, Adjusted Mutual Information, and Adjusted Rand Index).

5 We use the implementation provided by Gensim: https://radimrehurek.com/gensim/models/doc2vec.html
6 https://scikit-learn.org/stable
7 Hierarchical clustering, k-medoids, NMF, and LDA.
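A condensed sketch of this pipeline with Gensim and scikit-learn could look as follows; the corpus loader and the doc2vec hyper-parameters other than k = 1000 are placeholders, not the exact settings of our experiments.

```python
# Sketch: doc2vec document embeddings, length-normalized, clustered with k-means.
import numpy as np
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.cluster import KMeans

texts = load_abstracts()                       # hypothetical loader for the 160k sample
corpus = [TaggedDocument(words=t.split(), tags=[i]) for i, t in enumerate(texts)]

d2v = Doc2Vec(corpus, vector_size=300, epochs=10, workers=4)   # placeholder settings
vectors = np.vstack([d2v.dv[i] for i in range(len(texts))])
vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)      # length normalization

clusters = KMeans(n_clusters=1000).fit_predict(vectors)        # default parameters
```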


We run 10 trials of stratification and clustering, resulting in 10 different clusterings. We do this to protect against the variability of clustering and because we do not want to rely on a single clustering for proposing (down)trends for the benchmark.

6.2.4 Trend & Downtrend Estimation

Recalling our definitions of trend and downtrend: a trend (resp. downtrend) is defined as a research topic that has a strongly increasing (resp. decreasing) number of publications in a particular time interval.

Linear trend estimation is a widely used technique in predictive (time-series) analysis that proves statements about tendencies in the data by correlating the measurements with the times at which they occurred (Hess, Iyer, and Malm, 2001). Considering the definition of trend (resp. downtrend) and the main concept of linear trend estimation models, we utilize linear regression to identify trend and downtrend candidates in the resulting clustering sets.

Specifically, we count the number of documents per year for a cluster and then estimate the parameters of the best fitting line (i.e., the slope). Clusters are then ranked according to the resulting slope, and the n = 150 clusters with the largest (resp. smallest) slope are selected as trend (resp. downtrend) candidates. Thus, our definition of a single-clustering (down)trend candidate is a cluster with an extreme ascending or descending slope. Figure 12 and Figure 13 give two examples of (down)trend candidates.
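The slope-based ranking can be sketched as follows; the data structures and function names are illustrative, not the exact implementation.

```python
# Sketch: fit a line to (year, #documents) per cluster and rank clusters by slope.
import numpy as np
from collections import Counter

YEARS = np.arange(2000, 2016)

def cluster_slopes(years, clusters):
    """years[i] and clusters[i] describe document i; returns {cluster: slope}."""
    slopes = {}
    for c in set(clusters):
        counts = Counter(y for y, cl in zip(years, clusters) if cl == c)
        ys = np.array([counts.get(y, 0) for y in YEARS])
        slope, _intercept = np.polyfit(YEARS, ys, deg=1)   # least-squares line
        slopes[c] = slope
    return slopes

# Trend candidates: n = 150 clusters with the largest slope (smallest for downtrends).
def top_candidates(slopes: dict, n: int = 150, downtrend: bool = False):
    return sorted(slopes, key=slopes.get, reverse=not downtrend)[:n]
```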

6.2.5 (Down)trend Candidates Validation

As detailed in Section 6.2.3, we run ten series of clustering, resulting in ten discrete clustering sets. We then estimated the (down)trend candidates according to the procedure described in Section 6.2.4. Consequently, for each of the ten clustering sets, we obtained 150 trend and 150 downtrend candidates.

Nonetheless, for the benchmark, among all the (down)trend candidates across the ten clustering sets, we want to sort out only those that are consistent over all sets. To obtain consistent (down)trend candidates, we apply the Jaccard coefficient, a metric that is used for measuring the similarity and diversity of sample sets. We define an equivalence relation ∼R: two candidates (i.e., clusters) are equivalent if their Jaccard coefficient is > τsim, where τsim = 0.5. Following this notion, we compute the equivalence of (down)trend candidates across the ten sets.

Finally, we define an equivalence class as an equivalence class trend candidate (resp. equivalence class downtrend candidate) if it contains trend (resp. downtrend) candidates in at least half of the clusterings.


Figure 12: A trend candidate (positive slope). Topic: Sentiment and Emotion Analysis (using Social Media Channels).

Figure 13: A downtrend candidate (negative slope). Topic: XML.


This procedure gave us in total 110 equivalence class trend candidates and 107 equivalence class downtrend candidates that we then annotate for our benchmark8.
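The consistency check described above can be sketched as a pairwise Jaccard comparison between candidate clusters, represented as sets of document IDs; the threshold follows the definition of τsim given before, while the function names are illustrative.

```python
# Sketch: two (down)trend candidates are equivalent if their Jaccard coefficient
# exceeds tau_sim = 0.5; each candidate is a set of document IDs.
def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if (a | b) else 0.0

TAU_SIM = 0.5

def equivalent(candidate_a: set, candidate_b: set) -> bool:
    return jaccard(candidate_a, candidate_b) > TAU_SIM
```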

6.3 benchmark annotation

To annotate the equivalence class (down)trend candidates obtained from our model, we used the Figure-Eight9 platform. It provides an integrated quality-control mechanism and a training phase before the actual annotation (Kolhatkar, Zinsmeister, and Hirst, 2013), which minimizes the rate of poorly qualified annotators.

We design our annotation task as follows: A worker is shown a description of a particular (down)trend candidate along with a list of possible (down)trend tags. Descriptions are mixed and presented in random order. The temporal presentation order of candidates is also arbitrary. The worker is asked to pick from the list of (down)trend tags the item that matches the cluster (resp. candidate) best. Even if there are several items that are a possible match, the worker must choose only one. If there is no matching item, the worker can select the option other. Three different workers evaluated each cluster, and the final label is the majority label. If there is no majority label, we present the cluster repeatedly to the workers until there is a majority.

The descriptors of candidates are collected through the following procedure:

• First, we review the documents assigned to a candidate cluster.

• Then, we extract keywords and titles of the papers attributed to the cluster. Semantic Scholar provides keywords and titles as a part of the metadata.

• Finally, we compute the top 15 most frequent keywords and randomly select 15 titles. These keywords and titles are the descriptor of the cluster candidate presented to crowd-workers.

We create the (down)trend tags based on keywords, titles, and metadata such as the publication venue. The cluster candidates were manually labeled by graduate employees (i.e., domain experts in the computer science domain) considering the abovementioned items.

6.3.1 Inter-annotator Agreement

To ensure the quality of the obtained annotations, we computed the Inter-Annotator Agreement (IAA), a measure of how well two (or more) annotators can make the same decision for a particular category.

8 Note that the overall number of 217 refers to the number of (down)trend candidates and not to the number of documents. Each candidate contains hundreds of documents.
9 Formerly called CrowdFlower.


To do that, we used the following metrics:

krippendorff's α is a reliability coefficient developed to measure the agreement among observers or measuring instruments drawing distinctions among typically unstructured phenomena or assigning computable values to them (Krippendorff, 2011). We choose Krippendorff's α as an inter-annotator agreement measure because it – unlike other specialized coefficients – is a generalization of several reliability criteria and applies to situations like ours, where we have more than two annotators and a given annotator only labels a subset of the data (i.e., we have to deal with missing values).

Krippendorff (2011) has proposed several metric variations, each of which is associated with a specific set of weights. We use Krippendorff's α considering any number of observers and missing data. In this case, Krippendorff's α for the overall number of 217 annotated documents is α = 0.798. According to Landis and Koch (1977), this value corresponds to a substantial level of agreement.

figure-eight agreement rate Figure-Eight10 provides an agreement score c for each annotated unit u, which is based on the majority vote of the trusted workers. The score has been proven to perform well compared to the classic metrics (Kolhatkar, Zinsmeister, and Hirst, 2013). We computed the average C for the 217 annotated documents; C is 0.799.

Since both Krippendorff's α and the internal Figure-Eight IAA are rather high, we consider the obtained annotations for the benchmark to be of good quality.
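For reference, Krippendorff's α for nominal labels with missing values, as used above, can be computed, for example, with the krippendorff Python package; the toy reliability matrix below is purely illustrative.

```python
# Sketch: Krippendorff's alpha for three annotators and nominal labels;
# np.nan marks units an annotator did not label.
import numpy as np
import krippendorff  # pip install krippendorff

# rows = annotators, columns = annotated units (integer-coded labels, toy data)
reliability_data = np.array([
    [0, 1,      2, np.nan, 1],
    [0, 1,      2, 2,      1],
    [0, np.nan, 2, 2,      0],
])
alpha = krippendorff.alpha(reliability_data=reliability_data,
                           level_of_measurement="nominal")
print(round(alpha, 3))
```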

6.3.2 Gold Trends

We compiled a list of computer science trends for the year 2016 based on the analysis of twelve survey publications (Ankerholz, 2016; Augustin, 2016; Brooks, 2016; Frot, 2016; Harriet, 2015; IEEE Computer Society, 2016; Markov, 2015; Meyerson and Mariette, 2016; Nessma, 2015; Reese, 2015; Rivera, 2016; Zaino, 2016). For this year, we identified a total of 31 trends, see Table 20.

Since there is variation as to the name of a specific trend, we created a unique name for each of them. Out of the 31 trends, 28 (90.3%) are instantiated as trends by our model, which we confirmed in the crowdsourcing procedure11.

10 Former CrowdFlower platform.
11 The author verified this based on her judgment of equivalence of one of the 31 gold trend names in the literature with one of the trend tags.


We searched for the three missing trends using a variety of techniques (i.e., inspection of non-trends and downtrends, random browsing of documents not assigned to trend candidates, and keyword searches). Two of the missing trends do not seem to occur in the collection at all: "Human Augmentation" and "Food and Water Technology". The third missing trend, "Cryptocurrency and Cryptography", does occur in the collection, but the occurrence rate is very small (ca. 0.01%). We do not consider these three trends in the rest of the chapter.

Computer Science Trend                        Computer Science Trend
Autonomous Agents and Systems                 Natural Language Processing
Autonomous Vehicles                           Open Source
Big Data Analytics                            Privacy
Bioinformatics and (Bio) Neuroscience         Quantum Computing
Biometrics and Personal Identification        Recommender Systems and Social Networks
Cloud Computing and Software as a Service     Reinforcement Learning
Cryptocurrency and Cryptography*              Renewable Energy
Cyber Security                                Robotics
E-business                                    Semantic Web
Food and Water Technology*                    Sentiment and Emotion Analysis
Game-based Learning                           Smart Cities
Games and (Virtual) Augmented Reality         Supply Chains and RFIDs
Human Augmentation*                           Technology for Climate Change
Machine/Deep Learning                         Transportation and Energy
Medical Advances and DNA Computing            Wearables
Mobile Computing

Table 20: 31 areas identified as computer science trends for the year 2016 in the media. 28 of these trends are instantiated by our model as well and thus covered in our benchmark. Topics marked with an asterisk (*) were not found in our underlying data collection.

In summary, based on our analysis of the meta-literature, we identified 28 gold trends that also occur in our collection. We will use them as the gold standard that trend detection algorithms should aim to discover.

6.4 evaluation measure for trend detection

Whether something is a trend is a graded concept. For this reason, we adopt an evaluation measure based on a ranking of candidates, specifically Mean Average Precision (MAP). The score denotes the average of the precision values of the ranks of correctly identified and non-redundant trends.

We approach trend detection as computing a ranked list of sets of documents, where each set is a trend candidate. We refer to documents as trendy, downtrendy, and flat, depending on which class they are assigned to in our benchmark.


We consider a trend candidate c as a correct recognition of gold trend t if it satisfies the following requirements:

• c is trendy,

• t is the largest trend in c,

• |t ∩ c| / |c| > ρ, where ρ is a coverage parameter,

• t was not earlier recognized in the list.

These criteria give us a true positive (i.e., recognition of a gold trend) or a false positive (otherwise) for every position of the ranked list, and a precision score for each trend. Precision for a trend that was not observed is 0. We then compute MAP; see Table 21 for an example.

Gold Trends          Gold Downtrends
Cloud Computing      Compilers
Bioinformatics       Petri Nets
Sentiment Analysis   Fuzzy Sets
Privacy              Routing Protocols

Trend Candidate             |t ∩ c| / |c|   tp/fp   P
c1    Cloud Computing       0.60            tp      1.00
c2    Privacy               0.20            fp      0.50
c3    Cloud Computing       0.70            fp      0.33
c4    Bioinformatics        0.55            tp      0.50
c5    Sentiment Analysis    0.95            tp      0.60
c6    Sentiment Analysis    0.59            fp      0.50
c7    Bioinformatics        0.41            fp      0.43
c8    Compilers             0.60            fp      0.38
c9    Petri Nets            0.80            fp      0.33
c10   Privacy               0.33            fp      0.30

Table 21: Example of the proposed MAP evaluation (with ρ = 0.5). MAP is the average of the precision values 1.0 (Cloud Computing), 0.5 (Bioinformatics), 0.6 (Sentiment Analysis), and 0.0 (Privacy), i.e., MAP = 2.1/4 = 0.525. True positive: tp. False positive: fp. Precision: P.
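Following these requirements and the example in Table 21, a simplified sketch of the MAP computation (with illustrative data structures, not the released evaluation script) is given below.

```python
# Sketch of the proposed MAP evaluation: each candidate is represented as
# (largest gold trend in c, coverage |t ∩ c| / |c|, is_trendy), in rank order.
def mean_average_precision(ranked_candidates, gold_trends, rho=0.5):
    precisions = {t: 0.0 for t in gold_trends}
    recognized, true_positives = set(), 0
    for rank, (trend, coverage, is_trendy) in enumerate(ranked_candidates, start=1):
        correct = (is_trendy and trend in gold_trends
                   and coverage > rho and trend not in recognized)
        if correct:
            true_positives += 1
            recognized.add(trend)
            precisions[trend] = true_positives / rank
    return sum(precisions.values()) / len(gold_trends)

# Example from Table 21 (rho = 0.5): (1.0 + 0.5 + 0.6 + 0.0) / 4 = 0.525
gold = {"Cloud Computing", "Bioinformatics", "Sentiment Analysis", "Privacy"}
ranked = [("Cloud Computing", 0.60, True), ("Privacy", 0.20, True),
          ("Cloud Computing", 0.70, True), ("Bioinformatics", 0.55, True),
          ("Sentiment Analysis", 0.95, True), ("Sentiment Analysis", 0.59, True),
          ("Bioinformatics", 0.41, True), ("Compilers", 0.60, True),
          ("Petri Nets", 0.80, True), ("Privacy", 0.33, True)]
print(mean_average_precision(ranked, gold))   # 0.525
```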

6.5 trend detection baselines

In this section, we investigate the impact of four different configuration choices on the performance of trend detection by way of ablation.


6.5.1 Configurations

abstracts vs full texts The underlying data collection contains both abstracts and full texts12. Our initial expectation was that the full text of a document comprises more information than the abstract, and that it should be a more reliable basis for trend detection. This is one of the configurations we test in the ablation. We employ our proposed method on the full-text collection (solely document texts without abstracts) and the abstract collection (only abstract texts) separately and observe the results.

stratification Due to the uneven distribution of documents over the years, the last years (with an increased volume of publications) may get too much impact on the results compared to earlier years. To investigate this, we conduct an ablation experiment where we compare (i) 160k documents randomly sampled from the entire collection and (ii) stratified sampling. In stratified sampling, we select 10k documents for each of the years 2000–2015, resulting in a sampled collection of 160k.

length Lt of interval Clusters are ranked by trendiness, and the resulting ranked list is evaluated. To measure the trendiness or growth of topics over time, we fit a line to an interval (i, n_i), i_0 ≤ i < i_0 + L_t, by linear regression, where L_t is the length of the interval, i is one of the 16 years (2000, ..., 2015), and n_i is the number of documents that were assigned to cluster c in that year. As a simple default baseline, we apply the regression to half of the entire interval, i.e., L_t = 8. There are nine such intervals in our 16-year period. As the final measure of growth for the cluster, we take the maximum of the nine individual slopes. To determine how much this configuration choice affects our results, we also test a linear regression over the entire time span, i.e., L_t = 16. In this case, there is a single interval.

clustering method We consider Gaussian Mixture Models (GMM) and k-means. We use GMM with a spherical type of covariance, where each component has the same variance. For both, we proceed as described in Section 6.2.3.

6.5.2 Experimental Setup and Results

We conducted experiments on the Semantic Scholar corpus and evaluated a ranked list of trend candidates against the benchmark, considering the configurations mentioned above. We adopted Mean Average Precision (MAP) as the principal evaluation measure. As a secondary evaluation measure, we computed recall at 50 (R50), which estimates the percentage of gold trends found in the list of the 50 highest-ranked trend candidates.

12 Semantic Scholar provided the underlying dataset at the beginning of our work in 2016.


We found that configuration (0) in Table 22 works best. We then conducted ablation experiments to discover the importance of the configuration choices.

                  (0)          (1)          (2)          (3)          (4)
document part     A            F            A            A            A
stratification    yes          yes          no           yes          yes
Lt                8            8            8            16           8
clustering        GMM sph      GMM sph      GMM sph      GMM sph      kµ
MAP               .36 (.03)    .07 (.19)    .30 (.03)    .32 (.04)    .34 (.21)
R50 avg           .50 (.012)   .25 (.018)   .50 (.016)   .46 (.018)   .53 (.014)
R50 max           .61          .37          .53          .61          .61

Table 22: Ablation results. Standard deviations are in parentheses. GMM clustering is performed with the spherical (sph) type of covariance. K-means clustering is denoted as kµ in the ablation table.

comparing (0) and (1)13, we observed that abstracts (A) are a more reliable representation for our trend detection baseline than full texts (F). A possible explanation for this observation is that an abstract is a summary of a scientific paper that covers only the central points and is semantically very concise and rich. In contrast, the full text contains numerous parts that are secondary (i.e., future work, related work) and thus may skew the document's overall meaning.

comparing (0) and (2), we recognized that stratification (yes) improves results compared to the setting with non-stratified (no), randomly sampled data. We would expect the effect of stratification to be even more substantial for collections in which the distribution of documents over time is more skewed than in ours.

comparing (0) and (3), the length of the interval is a vital configuration choice. As we assumed, the 16-year interval is rather long, and an 8-year interval could be a better choice, primarily if one aims to find short-term trends.

comparing (0) and (4), we recognized that the topics we obtain from k-means have a similar nature to those from GMM. However, GMM, which takes variance and covariance into account and estimates a soft assignment, still performs slightly better.

6.6 benchmark description

Below we report the contents of the TRENDNERT benchmark14:

1. Two records containing information on documents: one file with the documents assigned to (down)trends and the other file with documents assigned to flat topics.

13 Numbers (0), (1), (2), (3), (4) are the configuration choices in the ablation table.
14 https://doi.org/10.17605/OSF.IO/DHZCT


2. A script to produce document hash codes.

3. A file to map every hash code to internal IDs in the benchmark.

4. A script to compute MAP (default setting is ρ = 0.25).

5. README.md, a file with guidance notes.

Every document in the benchmark contains the following information:

• Paper ID: internal ID assigned to each document in the original Semantic Scholar collection.

• Cluster ID: unique ID of the meta-cluster where the document was observed.

• Label/Tag: name of the gold (down)trend assigned by crowd-workers (e.g., Cyber Security, Fuzzy Sets and Systems).

• Type of (down)trend candidate: trend (T), downtrend (D), or flat topic (F).

• Hash ID15 of each paper from the original Semantic Scholar collection.

15 MD5-hash of First Author + Title + Year.

6.7 summary

Emerging trend detection is a promising task for Intelligent Process Automation (IPA) that allows companies and funding agencies to automatically analyze extensive textual collections in order to detect emerging fields and forecast the future employing machine learning algorithms. Thus, the availability of publicly available benchmarks for trend detection is of significant importance, as only thereby can one evaluate the performance and robustness of the techniques employed.

Within this work, we release TRENDNERT – the first publicly available benchmark for (down)trend detection, which offers a realistic setting for developing trend detection algorithms. TRENDNERT also supports downtrend detection – a significant problem that was not addressed before. We also present several experimental findings on trend detection. First, stratification improves trend detection if the distribution of documents is skewed over time. Second, abstract-based trend detection performs better than full-text-based detection if straightforward models are used.

6.8 future work

There are possible directions for future work:

• The presented approach to estimate the trendiness of a topic by means of linear regression is straightforward and can be replaced by more sophisticated versions like Kendall's τ correlation coefficient or the Least Squares Regression Smoother (LOESS) (Cleveland, 1979).

• It would also be challenging to replace the k-means clustering algorithm that we adopted within this work with the LDA model while retaining the document representations, as was done in Dieng, Ruiz, and Blei, 2019. The case to investigate would then be whether topic modeling outperforms the simple clustering algorithm in this particular setting, and whether the resulting system would benefit from the topic modeling algorithm.

• Furthermore, it would be of interest to conduct more research on the following: What kind of document representation can make effective use of the information in full texts? Recently proposed contextualized word embeddings could be a possible direction for these experiments.


7 discussion and future work

As we observed in the two parts of this thesis, Intelligent Process Automation (IPA) is a challenging research area due to its multifacetedness and its application in real-world scenarios. Its main objective is to automate time-consuming and routine tasks and to free up the knowledge worker for more cognitively demanding tasks. However, this task involves many difficulties, such as a lack of resources for both training and evaluation of algorithms, as well as an insufficient performance of machine learning techniques when applied to real-world data and practical problems. In this thesis, we investigated two IPA applications within the Natural Language Processing (NLP) domain – conversational agents and predictive analytics – and addressed some of the most common issues.

• In the first part of the thesis, we addressed the challenge of implementing a robust conversational system for IPA purposes within the practical e-learning domain under the condition of missing training data. The primary purpose of the resulting system is to perform the onboarding process between tutors and students. This onboarding reduces repetitive and time-consuming activities while allowing tutors to focus solely on mathematical questions, which is the core task with the most impact on students. Besides that, by interacting with users, the system augments the resources with structured and labeled training data that we used to implement learnable dialogue components.

The realization of such a system was associated with several challenges, among them missing structured data and ambiguous user-generated content that requires precise language understanding. Also, a system that simultaneously communicates with a large number of users has to maintain additional meta policies, such as fallback and check-up policies, to hold the conversation intelligently.

• Moreover, we have shown a potential extension of the rule-based dialogue system using transfer learning. We utilized a small dataset of structured dialogues obtained in a trial run of the rule-based system and applied the current state-of-the-art Bidirectional Encoder Representations from Transformers (BERT) architecture to solve the domain-specific tasks. Specifically, we retrained two of the three components in the conversational system – Named Entity Recognition (NER) and the Dialogue Manager (DM) (i.e., the NAP task) – and confirmed the applicability of such algorithms in a real-world low-resource scenario by showing the reasonably high performance of the resulting models (a minimal fine-tuning sketch is given after this list).

• Last but not least, in the second part of the thesis, we addressed the issue of missing publicly available benchmarks for the evaluation of machine learning algorithms on the trend detection task. To the best of our knowledge, we released the first open benchmark for this purpose, which offers a realistic setting for developing trend detection algorithms. Besides that, this benchmark supports downtrend detection – a significant problem that was not sufficiently addressed before. We additionally proposed an evaluation measure and presented several experimental findings on trend detection.
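
As a pointer to how such a re-implementation can be set up with off-the-shelf tooling, the following sketch loads a pretrained German BERT checkpoint for token classification; the checkpoint name and label set are assumptions for illustration and do not reproduce the models or the data of Chapter 4.

from transformers import AutoModelForTokenClassification, AutoTokenizer

# Assumed label inventory and checkpoint; the thesis compared German and
# multilingual BERT variants on its own dialogue data, which is not shown here.
labels = ["O", "B-NAME", "B-LOCATION", "B-COURSE"]
tokenizer = AutoTokenizer.from_pretrained("bert-base-german-cased")
model = AutoModelForTokenClassification.from_pretrained(
    "bert-base-german-cased", num_labels=len(labels)
)
# The model would then be fine-tuned on (token, tag) sequences collected in the
# trial run, e.g. with the transformers Trainer API, and evaluated with macro F1.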

The ultimate objective of Intelligent Process Automation (IPA) should be the creation of robust algorithms under low-resource and domain-specific conditions. This is still a challenging task; however, some of the experiments presented in this thesis revealed that this goal could potentially be achieved by means of Transfer Learning (TL) techniques.


bibliography

Agrawal, Amritanshu, Wei Fu, and Tim Menzies (2018). "What is wrong with topic modeling? And how to fix it using search-based software engineering." In: Information and Software Technology 98, pp. 74–88 (cit. on p. 66).
Alan, M. (1950). "Turing." In: Computing machinery and intelligence. Mind 59.236, pp. 433–460 (cit. on p. 7).
Alghamdi, Rubayyi and Khalid Alfalqi (2015). "A survey of topic modeling in text mining." In: Int. J. Adv. Comput. Sci. Appl. (IJACSA) 6.1 (cit. on pp. 64–66).
Ammar, Waleed, Dirk Groeneveld, Chandra Bhagavatula, Iz Beltagy, Miles Crawford, Doug Downey, Jason Dunkelberger, Ahmed Elgohary, Sergey Feldman, Vu Ha, et al. (2018). "Construction of the literature graph in semantic scholar." In: arXiv preprint arXiv:1805.02262 (cit. on p. 72).
Ankerholz, Amber (2016). 2016 Future of Open Source Survey Says Open Source Is The Modern Architecture. https://www.linux.com/news/2016-future-open-source-survey-says-open-source-modern-architecture (cit. on p. 78).
Asooja, Kartik, Georgeta Bordea, Gabriela Vulcu, and Paul Buitelaar (2016). "Forecasting emerging trends from scientific literature." In: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), pp. 417–420 (cit. on p. 68).
Augustin, Jason (2016). Emerging Science and Technology Trends: A Synthesis of Leading Forecasts. http://www.defenseinnovationmarketplace.mil/resources/2016_SciTechReport_16June2016.pdf (cit. on p. 78).
Blei, David M. and John D. Lafferty (2006). "Dynamic topic models." In: Proceedings of the 23rd international conference on Machine learning. ACM, pp. 113–120 (cit. on pp. 66, 67).
Blei, David M., John D. Lafferty, et al. (2007). "A correlated topic model of science." In: The Annals of Applied Statistics 1.1, pp. 17–35 (cit. on p. 66).
Blei, David M., Andrew Y. Ng, and Michael I. Jordan (2003). "Latent dirichlet allocation." In: Journal of machine Learning research 3.Jan, pp. 993–1022 (cit. on p. 66).
Bloem, Peter (2019). Transformers from Scratch. http://www.peterbloem.nl/blog/transformers (cit. on p. 45).
Bocklisch, Tom, Joey Faulkner, Nick Pawlowski, and Alan Nichol (2017). "Rasa: Open source language understanding and dialogue management." In: arXiv preprint arXiv:1712.05181 (cit. on p. 15).
Bolelli, Levent, Seyda Ertekin, and C. Giles (2009). "Topic and trend detection in text collections using latent dirichlet allocation." In: Advances in Information Retrieval, pp. 776–780 (cit. on p. 66).
Bolelli, Levent, Seyda Ertekin, Ding Zhou, and C. Lee Giles (2009). "Finding topic trends in digital libraries." In: Proceedings of the 9th ACM/IEEE-CS joint conference on Digital libraries. ACM, pp. 69–72 (cit. on p. 66).
Bordes, Antoine, Y-Lan Boureau, and Jason Weston (2016). "Learning end-to-end goal-oriented dialog." In: arXiv preprint arXiv:1605.07683 (cit. on pp. 16, 17).

Brooks, Chuck (2016). 7 Top Tech Trends Impacting Innovators in 2016. http://innovationexcellence.com/blog/2015/12/26/7-top-tech-trends-impacting-innovators-in-2016 (cit. on p. 78).
Burtsev, Mikhail, Alexander Seliverstov, Rafael Airapetyan, Mikhail Arkhipov, Dilyara Baymurzina, Eugene Botvinovsky, Nickolay Bushkov, Olga Gureenkova, Andrey Kamenev, Vasily Konovalov, et al. (2018). "DeepPavlov: An Open Source Library for Conversational AI." In: (cit. on p. 15).
Chang, Jonathan, David M. Blei, et al. (2010). "Hierarchical relational models for document networks." In: The Annals of Applied Statistics 4.1, pp. 124–150 (cit. on p. 66).
Chen, Hongshen, Xiaorui Liu, Dawei Yin, and Jiliang Tang (2017). "A survey on dialogue systems: Recent advances and new frontiers." In: Acm Sigkdd Explorations Newsletter 19.2, pp. 25–35 (cit. on pp. 14–16).
Chen, Lingzhen, Alessandro Moschitti, Giuseppe Castellucci, Andrea Favalli, and Raniero Romagnoli (2018). "Transfer Learning for Industrial Applications of Named Entity Recognition." In: NL4AI@AI*IA, pp. 129–140 (cit. on p. 43).
Cleveland, William S. (1979). "Robust locally weighted regression and smoothing scatterplots." In: Journal of the American statistical association 74.368, pp. 829–836 (cit. on p. 84).
Collobert, Ronan, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa (2011). "Natural language processing (almost) from scratch." In: Journal of machine learning research 12.Aug, pp. 2493–2537 (cit. on p. 16).
Cuayáhuitl, Heriberto, Simon Keizer, and Oliver Lemon (2015). "Strategic dialogue management via deep reinforcement learning." In: arXiv preprint arXiv:1511.08099 (cit. on p. 14).
Cui, Lei, Shaohan Huang, Furu Wei, Chuanqi Tan, Chaoqun Duan, and Ming Zhou (2017). "Superagent: A customer service chatbot for e-commerce websites." In: Proceedings of ACL 2017, System Demonstrations, pp. 97–102 (cit. on p. 8).
Curiskis, Stephan A., Barry Drake, Thomas R. Osborn, and Paul J. Kennedy (2019). "An evaluation of document clustering and topic modelling in two online social networks: Twitter and Reddit." In: Information Processing & Management (cit. on p. 74).
Dai, Zihang, Zhilin Yang, Yiming Yang, William W. Cohen, Jaime Carbonell, Quoc V. Le, and Ruslan Salakhutdinov (2019). "Transformer-xl: Attentive language models beyond a fixed-length context." In: arXiv preprint arXiv:1901.02860 (cit. on pp. 43, 44).

Daniel, Ben (2015). "Big Data and analytics in higher education: Opportunities and challenges." In: British journal of educational technology 46.5, pp. 904–920 (cit. on p. 22).
Deerwester, Scott, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, and Richard Harshman (1990). "Indexing by latent semantic analysis." In: Journal of the American society for information science 41.6, pp. 391–407 (cit. on p. 65).
Deoras, Anoop, Kaisheng Yao, Xiaodong He, Li Deng, Geoffrey Gerson Zweig, Ruhi Sarikaya, Dong Yu, Mei-Yuh Hwang, and Gregoire Mesnil (2015). Assignment of semantic labels to a sequence of words using neural network architectures. US Patent App. 14/016,186 (cit. on p. 13).
Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova (2018). "Bert: Pre-training of deep bidirectional transformers for language understanding." In: arXiv preprint arXiv:1810.04805 (cit. on pp. 42–45).
Dhingra, Bhuwan, Lihong Li, Xiujun Li, Jianfeng Gao, Yun-Nung Chen, Faisal Ahmed, and Li Deng (2016). "Towards end-to-end reinforcement learning of dialogue agents for information access." In: arXiv preprint arXiv:1609.00777 (cit. on p. 16).
Dieng, Adji B., Francisco J. R. Ruiz, and David M. Blei (2019). "Topic Modeling in Embedding Spaces." In: arXiv preprint arXiv:1907.04907 (cit. on pp. 74, 84).
Donahue, Jeff, Yangqing Jia, Oriol Vinyals, Judy Hoffman, Ning Zhang, Eric Tzeng, and Trevor Darrell (2014). "Decaf: A deep convolutional activation feature for generic visual recognition." In: International conference on machine learning, pp. 647–655 (cit. on p. 43).
Eric, Mihail and Christopher D. Manning (2017). "Key-value retrieval networks for task-oriented dialogue." In: arXiv preprint arXiv:1705.05414 (cit. on p. 16).
Fang, Hao, Hao Cheng, Maarten Sap, Elizabeth Clark, Ari Holtzman, Yejin Choi, Noah A. Smith, and Mari Ostendorf (2018). "Sounding board: A user-centric and content-driven social chatbot." In: arXiv preprint arXiv:1804.10202 (cit. on p. 10).
Friedrich, Fabian, Jan Mendling, and Frank Puhlmann (2011). "Process model generation from natural language text." In: International Conference on Advanced Information Systems Engineering, pp. 482–496 (cit. on p. 2).
Frot, Mathilde (2016). 5 Trends in Computer Science Research. https://www.topuniversities.com/courses/computer-science-information-systems/5-trends-computer-science-research (cit. on p. 78).
Glasmachers, Tobias (2017). "Limits of end-to-end learning." In: arXiv preprint arXiv:1704.08305 (cit. on pp. 16, 17).
Goddeau, David, Helen Meng, Joseph Polifroni, Stephanie Seneff, and Senis Busayapongchai (1996). "A form-based dialogue manager for spoken language applications." In: Proceeding of Fourth International Conference on Spoken Language Processing, ICSLP'96. Vol. 2. IEEE, pp. 701–704 (cit. on p. 14).

Gohr, Andre, Alexander Hinneburg, Rene Schult, and Myra Spiliopoulou (2009). "Topic evolution in a stream of documents." In: Proceedings of the 2009 SIAM International Conference on Data Mining. SIAM, pp. 859–870 (cit. on p. 65).
Gollapalli, Sujatha Das and Xiaoli Li (2015). "EMNLP versus ACL: Analyzing NLP research over time." In: EMNLP, pp. 2002–2006 (cit. on p. 72).
Griffiths, Thomas L. and Mark Steyvers (2004). "Finding scientific topics." In: Proceedings of the National academy of Sciences 101.suppl 1, pp. 5228–5235 (cit. on p. 65).
Griffiths, Thomas L., Michael I. Jordan, Joshua B. Tenenbaum, and David M. Blei (2004). "Hierarchical topic models and the nested Chinese restaurant process." In: Advances in neural information processing systems, pp. 17–24 (cit. on p. 66).
Hall, David, Daniel Jurafsky, and Christopher D. Manning (2008). "Studying the history of ideas using topic models." In: Proceedings of the conference on empirical methods in natural language processing. Association for Computational Linguistics, pp. 363–371 (cit. on p. 66).
Harriet, Taylor (2015). Privacy will hit tipping point in 2016. http://www.cnbc.com/2015/11/09/privacy-will-hit-tipping-point-in-2016.html (cit. on p. 78).
He, Jiangen and Chaomei Chen (2018). "Predictive Effects of Novelty Measured by Temporal Embeddings on the Growth of Scientific Literature." In: Frontiers in Research Metrics and Analytics 3, p. 9 (cit. on pp. 64, 68, 71).
He, Junxian, Zhiting Hu, Taylor Berg-Kirkpatrick, Ying Huang, and Eric P. Xing (2017). "Efficient correlated topic modeling with topic embedding." In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, pp. 225–233 (cit. on pp. 66, 67).
He, Qi, Bi Chen, Jian Pei, Baojun Qiu, Prasenjit Mitra, and Lee Giles (2009). "Detecting topic evolution in scientific literature: How can citations help?" In: Proceedings of the 18th ACM conference on Information and knowledge management. ACM, pp. 957–966 (cit. on p. 68).
Henderson, Matthew, Blaise Thomson, and Jason D. Williams (2014). "The second dialog state tracking challenge." In: Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL), pp. 263–272 (cit. on p. 15).
Henderson, Matthew, Blaise Thomson, and Steve Young (2013). "Deep neural network approach for the dialog state tracking challenge." In: Proceedings of the SIGDIAL 2013 Conference, pp. 467–471 (cit. on p. 14).
Hess, Ann, Hari Iyer, and William Malm (2001). "Linear trend analysis: a comparison of methods." In: Atmospheric Environment 35.30: Visibility, Aerosol and Atmospheric Optics, pp. 5211–5222. issn: 1352-2310. doi: https://doi.org/10.1016/S1352-2310(01)00342-9. url: http://www.sciencedirect.com/science/article/pii/S1352231001003429 (cit. on p. 75).

Hofmann, Thomas (1999). "Probabilistic latent semantic analysis." In: Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence. Morgan Kaufmann Publishers Inc., pp. 289–296 (cit. on p. 65).
Honnibal, Matthew and Ines Montani (2017). "spacy 2: Natural language understanding with bloom embeddings, convolutional neural networks and incremental parsing." In: To appear 7 (cit. on p. 15).
Howard, Jeremy and Sebastian Ruder (2018). "Universal language model fine-tuning for text classification." In: arXiv preprint arXiv:1801.06146 (cit. on p. 44).
IEEE Computer Society (2016). Top 9 Computing Technology Trends for 2016. https://www.scientificcomputing.com/news/2016/01/top-9-computing-technology-trends-2016 (cit. on p. 78).
Ivancic, Lucija, Dalia Suša Vugec, and Vesna Bosilj Vukšic (2019). "Robotic Process Automation: Systematic Literature Review." In: International Conference on Business Process Management. Springer, pp. 280–295 (cit. on pp. 1, 2).
Jain, Mohit, Pratyush Kumar, Ramachandra Kota, and Shwetak N. Patel (2018). "Evaluating and informing the design of chatbots." In: Proceedings of the 2018 Designing Interactive Systems Conference. ACM, pp. 895–906 (cit. on p. 8).
Khanpour, Hamed, Nishitha Guntakandla, and Rodney Nielsen (2016). "Dialogue act classification in domain-independent conversations using a deep recurrent neural network." In: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pp. 2012–2021 (cit. on p. 13).
Kim, Young-Bum, Karl Stratos, Ruhi Sarikaya, and Minwoo Jeong (2015). "New transfer learning techniques for disparate label sets." In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp. 473–482 (cit. on p. 43).
Kolhatkar, Varada, Heike Zinsmeister, and Graeme Hirst (2013). "Annotating anaphoric shell nouns with their antecedents." In: Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse, pp. 112–121 (cit. on pp. 77, 78).
Kontostathis, April, Leon M. Galitsky, William M. Pottenger, Soma Roy, and Daniel J. Phelps (2004). "A survey of emerging trend detection in textual data mining." In: Survey of text mining. Springer, pp. 185–224 (cit. on p. 63).
Krippendorff, Klaus (2011). "Computing Krippendorff's alpha-reliability." In: Annenberg School for Communication (ASC) Departmental Papers 43 (cit. on p. 78).
Kurata, Gakuto, Bing Xiang, Bowen Zhou, and Mo Yu (2016). "Leveraging sentence-level information with encoder lstm for semantic slot filling." In: arXiv preprint arXiv:1601.01530 (cit. on p. 13).
Landis, J. Richard and Gary G. Koch (1977). "The measurement of observer agreement for categorical data." In: biometrics, pp. 159–174 (cit. on p. 78).

Le, Minh-Hoang, Tu-Bao Ho, and Yoshiteru Nakamori (2005). "Detecting emerging trends from scientific corpora." In: International Journal of Knowledge and Systems Sciences 2.2, pp. 53–59 (cit. on p. 68).
Le, Quoc V. and Tomas Mikolov (2014). "Distributed Representations of Sentences and Documents." In: ICML. Vol. 14, pp. 1188–1196 (cit. on p. 74).
Lee, Ji Young and Franck Dernoncourt (2016). "Sequential short-text classification with recurrent and convolutional neural networks." In: arXiv preprint arXiv:1603.03827 (cit. on p. 13).
Leopold, Henrik, Han van der Aa, and Hajo A. Reijers (2018). "Identifying candidate tasks for robotic process automation in textual process descriptions." In: Enterprise, Business-Process and Information Systems Modeling. Springer, pp. 67–81 (cit. on p. 2).
Levenshtein, Vladimir I. (1966). "Binary codes capable of correcting deletions, insertions, and reversals." In: Soviet physics doklady 8, pp. 707–710 (cit. on p. 25).
Liao, Vera, Masud Hussain, Praveen Chandar, Matthew Davis, Yasaman Khazaeni, Marco Patricio Crasso, Dakuo Wang, Michael Muller, N. Sadat Shami, Werner Geyer, et al. (2018). "All Work and No Play?" In: Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. ACM, p. 3 (cit. on p. 8).
Lowe, Ryan Thomas, Nissan Pow, Iulian Vlad Serban, Laurent Charlin, Chia-Wei Liu, and Joelle Pineau (2017). "Training end-to-end dialogue systems with the ubuntu dialogue corpus." In: Dialogue & Discourse 8.1, pp. 31–65 (cit. on p. 22).
Luger, Ewa and Abigail Sellen (2016). "Like having a really bad PA: the gulf between user expectation and experience of conversational agents." In: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems. ACM, pp. 5286–5297 (cit. on p. 8).
MacQueen, James (1967). "Some methods for classification and analysis of multivariate observations." In: Proceedings of the fifth Berkeley symposium on mathematical statistics and probability. Vol. 1. Oakland, CA, USA, pp. 281–297 (cit. on p. 74).
Manning, Christopher D., Prabhakar Raghavan, and Hinrich Schütze (2008). Introduction to Information Retrieval. Web publication at informationretrieval.org. Cambridge University Press (cit. on p. 74).
Markov, Igor (2015). 13 Of 2015's Hottest Topics In Computer Science Research. https://www.forbes.com/sites/quora/2015/04/22/13-of-2015s-hottest-topics-in-computer-science-research/#fb0d46b1e88d (cit. on p. 78).
Martin, James H. and Daniel Jurafsky (2009). Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition. Pearson/Prentice Hall, Upper Saddle River.
Mei, Qiaozhu, Deng Cai, Duo Zhang, and ChengXiang Zhai (2008). "Topic modeling with network regularization." In: Proceedings of the 17th international conference on World Wide Web. ACM, pp. 101–110 (cit. on p. 65).

Meyerson, Bernard and Mariette DiChristina (2016). These are the top 10 emerging technologies of 2016. http://www3.weforum.org/docs/GAC16_Top10_Emerging_Technologies_2016_report.pdf (cit. on p. 78).
Mikolov, Tomas, Kai Chen, Greg Corrado, and Jeffrey Dean (2013). "Efficient estimation of word representations in vector space." In: arXiv preprint arXiv:1301.3781 (cit. on p. 74).
Miller, Alexander, Adam Fisch, Jesse Dodge, Amir-Hossein Karimi, Antoine Bordes, and Jason Weston (2016). "Key-value memory networks for directly reading documents." In: (cit. on p. 16).
Mohanty, Soumendra and Sachin Vyas (2018). "How to Compete in the Age of Artificial Intelligence." In: (cit. on pp. III, V, 1, 2).
Moiseeva, Alena and Hinrich Schütze (2020). "TRENDNERT: A Benchmark for Trend and Downtrend Detection in a Scientific Domain." In: Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (cit. on p. 71).
Moiseeva, Alena, Dietrich Trautmann, Michael Heimann, and Hinrich Schütze (2020). "Multipurpose Intelligent Process Automation via Conversational Assistant." In: arXiv preprint arXiv:2001.02284 (cit. on pp. 21, 41).
Monaghan, Fergal, Georgeta Bordea, Krystian Samp, and Paul Buitelaar (2010). "Exploring your research: Sprinkling some saffron on semantic web dog food." In: Semantic Web Challenge at the International Semantic Web Conference. Vol. 117. Citeseer, pp. 420–435 (cit. on p. 68).
Monostori, László, András Márkus, Hendrik Van Brussel, and E. Westkämpfer (1996). "Machine learning approaches to manufacturing." In: CIRP annals 45.2, pp. 675–712 (cit. on p. 15).
Moody, Christopher E. (2016). "Mixing dirichlet topic models and word embeddings to make lda2vec." In: arXiv preprint arXiv:1605.02019 (cit. on p. 74).
Mou, Lili, Zhao Meng, Rui Yan, Ge Li, Yan Xu, Lu Zhang, and Zhi Jin (2016). "How transferable are neural networks in nlp applications?" In: arXiv preprint arXiv:1603.06111 (cit. on p. 43).
Mrkšic, Nikola, Diarmuid O Séaghdha, Tsung-Hsien Wen, Blaise Thomson, and Steve Young (2016). "Neural belief tracker: Data-driven dialogue state tracking." In: arXiv preprint arXiv:1606.03777 (cit. on p. 14).
Nessma, Joussef (2015). Top 10 Hottest Research Topics in Computer Science. http://www.pouted.com/top-10-hottest-research-topics-in-computer-science (cit. on p. 78).
Nie, Binling and Shouqian Sun (2017). "Using text mining techniques to identify research trends: A case study of design research." In: Applied Sciences 7.4, p. 401 (cit. on p. 68).
Pan, Sinno Jialin and Qiang Yang (2009). "A survey on transfer learning." In: IEEE Transactions on knowledge and data engineering 22.10, pp. 1345–1359 (cit. on p. 41).
Qu, Lizhen, Gabriela Ferraro, Liyuan Zhou, Weiwei Hou, and Timothy Baldwin (2016). "Named entity recognition for novel types by transfer learning." In: arXiv preprint arXiv:1610.09914 (cit. on p. 43).

Radford, Alec, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever (2019). "Language models are unsupervised multitask learners." In: OpenAI Blog 1.8 (cit. on p. 44).
Reese, Hope (2015). 7 trends for artificial intelligence in 2016: 'Like 2015 on steroids'. http://www.techrepublic.com/article/7-trends-for-artificial-intelligence-in-2016-like-2015-on-steroids (cit. on p. 78).
Rivera, Maricel (2016). Is Digital Game-Based Learning The Future Of Learning? https://elearningindustry.com/digital-game-based-learning-future (cit. on p. 78).
Rotolo, Daniele, Diana Hicks, and Ben R. Martin (2015). "What is an emerging technology?" In: Research Policy 44.10, pp. 1827–1843 (cit. on p. 64).
Salatino, Angelo (2015). "Early detection and forecasting of research trends." In: (cit. on pp. 63, 66).
Serban, Iulian V., Alessandro Sordoni, Yoshua Bengio, Aaron Courville, and Joelle Pineau (2016). "Building end-to-end dialogue systems using generative hierarchical neural network models." In: Thirtieth AAAI Conference on Artificial Intelligence (cit. on p. 11).
Sharif Razavian, Ali, Hossein Azizpour, Josephine Sullivan, and Stefan Carlsson (2014). "CNN features off-the-shelf: an astounding baseline for recognition." In: Proceedings of the IEEE conference on computer vision and pattern recognition workshops, pp. 806–813 (cit. on p. 43).
Shibata, Naoki, Yuya Kajikawa, and Yoshiyuki Takeda (2009). "Comparative study on methods of detecting research fronts using different types of citation." In: Journal of the American Society for information Science and Technology 60.3, pp. 571–580 (cit. on p. 68).
Shibata, Naoki, Yuya Kajikawa, Yoshiyuki Takeda, and Katsumori Matsushima (2008). "Detecting emerging research fronts based on topological measures in citation networks of scientific publications." In: Technovation 28.11, pp. 758–775 (cit. on p. 68).
Small, Henry (2006). "Tracking and predicting growth areas in science." In: Scientometrics 68.3, pp. 595–610 (cit. on p. 68).
Small, Henry, Kevin W. Boyack, and Richard Klavans (2014). "Identifying emerging topics in science and technology." In: Research Policy 43.8, pp. 1450–1467 (cit. on p. 64).
Su, Pei-Hao, Nikola Mrkšic, Iñigo Casanueva, and Ivan Vulic (2018). "Deep Learning for Conversational AI." In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Tutorial Abstracts, pp. 27–32 (cit. on p. 12).
Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le (2014). "Sequence to sequence learning with neural networks." In: Advances in neural information processing systems, pp. 3104–3112 (cit. on pp. 16, 17).
Trautmann, Dietrich, Johannes Daxenberger, Christian Stab, Hinrich Schütze, and Iryna Gurevych (2019). "Robust Argument Unit Recognition and Classification." In: arXiv preprint arXiv:1904.09688 (cit. on p. 41).

Tu, Yi-Ning and Jia-Lang Seng (2012). "Indices of novelty for emerging topic detection." In: Information processing & management 48.2, pp. 303–325 (cit. on p. 64).
Turing, Alan (1949). London Times letter to the editor (cit. on p. 7).
Ultes, Stefan, Lina Barahona, Pei-Hao Su, David Vandyke, Dongho Kim, Inigo Casanueva, Paweł Budzianowski, Nikola Mrkšic, Tsung-Hsien Wen, et al. (2017). "Pydial: A multi-domain statistical dialogue system toolkit." In: Proceedings of ACL 2017, System Demonstrations, pp. 73–78 (cit. on p. 15).
Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin (2017). "Attention is all you need." In: Advances in neural information processing systems, pp. 5998–6008 (cit. on p. 45).
Vincent, Pascal, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol (2008). "Extracting and composing robust features with denoising autoencoders." In: Proceedings of the 25th international conference on Machine learning. ACM, pp. 1096–1103 (cit. on p. 44).
Vodolán, Miroslav, Rudolf Kadlec, and Jan Kleindienst (2015). "Hybrid dialog state tracker." In: arXiv preprint arXiv:1510.03710 (cit. on p. 14).
Wang, Alex, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman (2018). "Glue: A multi-task benchmark and analysis platform for natural language understanding." In: arXiv preprint arXiv:1804.07461 (cit. on p. 44).
Wang, Chong and David M. Blei (2011). "Collaborative topic modeling for recommending scientific articles." In: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp. 448–456 (cit. on p. 66).
Wang, Xuerui and Andrew McCallum (2006). "Topics over time: a non-Markov continuous-time model of topical trends." In: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp. 424–433 (cit. on pp. 65–67).
Wang, Zhuoran and Oliver Lemon (2013). "A simple and generic belief tracking mechanism for the dialog state tracking challenge: On the believability of observed information." In: Proceedings of the SIGDIAL 2013 Conference, pp. 423–432 (cit. on p. 14).
Weizenbaum, Joseph, et al. (1966). "ELIZA – a computer program for the study of natural language communication between man and machine." In: Communications of the ACM 9.1, pp. 36–45 (cit. on p. 7).
Wen, Tsung-Hsien, Milica Gasic, Nikola Mrksic, Pei-Hao Su, David Vandyke, and Steve Young (2015). "Semantically conditioned lstm-based natural language generation for spoken dialogue systems." In: arXiv preprint arXiv:1508.01745 (cit. on p. 14).
Wen, Tsung-Hsien, David Vandyke, Nikola Mrksic, Milica Gasic, Lina M. Rojas-Barahona, Pei-Hao Su, Stefan Ultes, and Steve Young (2016). "A network-based end-to-end trainable task-oriented dialogue system." In: arXiv preprint arXiv:1604.04562 (cit. on pp. 11, 16, 17, 22).

Werner, Alexander, Dietrich Trautmann, Dongheui Lee, and Roberto Lampariello (2015). "Generalization of optimal motion trajectories for bipedal walking." In: pp. 1571–1577 (cit. on p. 41).
Williams, Jason D., Kavosh Asadi, and Geoffrey Zweig (2017). "Hybrid code networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning." In: arXiv preprint arXiv:1702.03274 (cit. on p. 17).
Williams, Jason, Antoine Raux, and Matthew Henderson (2016). "The dialog state tracking challenge series: A review." In: Dialogue & Discourse 7.3, pp. 4–33 (cit. on p. 12).
Wu, Chien-Sheng, Richard Socher, and Caiming Xiong (2019). "Global-to-local Memory Pointer Networks for Task-Oriented Dialogue." In: arXiv preprint arXiv:1901.04713 (cit. on pp. 15–17).
Wu, Yonghui, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, et al. (2016). "Google's neural machine translation system: Bridging the gap between human and machine translation." In: arXiv preprint arXiv:1609.08144 (cit. on p. 45).
Xie, Pengtao and Eric P. Xing (2013). "Integrating Document Clustering and Topic Modeling." In: Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence (cit. on p. 74).
Yan, Zhao, Nan Duan, Peng Chen, Ming Zhou, Jianshe Zhou, and Zhoujun Li (2017). "Building task-oriented dialogue systems for online shopping." In: Thirty-First AAAI Conference on Artificial Intelligence (cit. on p. 14).
Yogatama, Dani, Michael Heilman, Brendan O'Connor, Chris Dyer, Bryan R. Routledge, and Noah A. Smith (2011). "Predicting a scientific community's response to an article." In: Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, pp. 594–604 (cit. on p. 65).
Young, Steve, Milica Gašic, Blaise Thomson, and Jason D. Williams (2013). "Pomdp-based statistical spoken dialog systems: A review." In: Proceedings of the IEEE 101.5, pp. 1160–1179 (cit. on p. 17).
Zaino, Jennifer (2016). 2016 Trends for Semantic Web and Semantic Technologies. http://www.dataversity.net/2017-predictions-semantic-web-semantic-technologies (cit. on p. 78).
Zhou, Hao, Minlie Huang, et al. (2016). "Context-aware natural language generation for spoken dialogue systems." In: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pp. 2032–2041 (cit. on pp. 14, 15).
Zhu, Yukun, Ryan Kiros, Rich Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, and Sanja Fidler (2015). "Aligning books and movies: Towards story-like visual explanations by watching movies and reading books." In: Proceedings of the IEEE international conference on computer vision, pp. 19–27 (cit. on p. 44).


contents

List of Figures XI
List of Tables XII
Acronyms XIII
1 introduction 1
1.1 Outline 3

i e-learning conversational assistant 5
2 foundations 7
2.1 Introduction 7
2.2 Conversational Assistants 9
2.2.1 Taxonomy 9
2.2.2 Conventional Architecture 12
2.2.3 Existing Approaches 15
2.3 Target Domain & Task Definition 18
2.4 Summary 19
3 multipurpose ipa via conversational assistant 21
3.1 Introduction 21
3.1.1 Outline and Contributions 22
3.2 Model 23
3.2.1 OMB+ Design 23
3.2.2 Preprocessing 23
3.2.3 Natural Language Understanding 25
3.2.4 Dialogue Manager 26
3.2.5 Meta Policy 30
3.2.6 Response Generation 32
3.3 Evaluation 37
3.3.1 Automated Evaluation 37
3.3.2 Human Evaluation & Error Analysis 37
3.4 Structured Dialogue Acquisition 39
3.5 Summary 39
3.6 Future Work 40
4 deep learning re-implementation of units 41
4.1 Introduction 41
4.1.1 Outline and Contributions 42
4.2 Related Work 43
4.3 BERT: Bidirectional Encoder Representations from Transformers 44
4.4 Data & Descriptive Statistics 46
4.5 Named Entity Recognition 48
4.5.1 Model Settings 49
4.6 Next Action Prediction 49
4.6.1 Model Settings 49
4.7 Evaluation and Results 50
4.8 Error Analysis 58
4.9 Summary 58
4.10 Future Work 59

ii clustering-based trend analyzer 61
5 foundations 63
5.1 Motivation and Challenges 63
5.2 Detection of Emerging Research Trends 64
5.2.1 Methods for Topic Modeling 65
5.2.2 Topic Evolution of Scientific Publications 67
5.3 Summary 68
6 benchmark for scientific trend detection 71
6.1 Motivation 71
6.1.1 Outline and Contributions 72
6.2 Corpus Creation 73
6.2.1 Underlying Data 73
6.2.2 Stratification 73
6.2.3 Document representations & Clustering 74
6.2.4 Trend & Downtrend Estimation 75
6.2.5 (Down)trend Candidates Validation 75
6.3 Benchmark Annotation 77
6.3.1 Inter-annotator Agreement 77
6.3.2 Gold Trends 78
6.4 Evaluation Measure for Trend Detection 79
6.5 Trend Detection Baselines 80
6.5.1 Configurations 81
6.5.2 Experimental Setup and Results 81
6.6 Benchmark Description 82
6.7 Summary 83
6.8 Future Work 83
7 discussion and future work 85
bibliography 87

list of figures

Figure 1 Taxonomy of dialogue systems 10
Figure 2 Example of a conventional dialogue architecture 12
Figure 3 OMB+ online learning platform 24
Figure 4 Rule-Based System: Dialogue flow 36
Figure 5 Macro F1 evaluation, NAP – default model. Comparison between German and multilingual BERT 52
Figure 6 Macro F1 test, NAP – default model. Comparison between German and multilingual BERT 53
Figure 7 Macro F1 evaluation, NAP – extended model 54
Figure 8 Macro F1 test, NAP – extended model 55
Figure 9 Macro F1 evaluation, NER – Comparison between German and multilingual BERT 56
Figure 10 Macro F1 test, NER – Comparison between German and multilingual BERT 57
Figure 11 (Down)trend detection: Overall distribution of papers in the entire dataset 73
Figure 12 A trend candidate (positive slope). Topic: Sentiment and Emotion Analysis (using Social Media Channels) 76
Figure 13 A downtrend candidate (negative slope). Topic: XML 76

list of tables

Table 1 Natural language representation of a sentence 13
Table 2 Rule 1: Admissible configurations 27
Table 3 Rule 2: Admissible configurations 27
Table 4 Rule 3: Admissible configurations 27
Table 5 Rule 4: Admissible configurations 28
Table 6 Rule 5: Admissible configurations 28
Table 7 Case 1: ID completeness 31
Table 8 Case 2: ID completeness 31
Table 9 Showcase 1: Short flow 33
Table 10 Showcase 2: Organisational question 33
Table 11 Showcase 3: Fallback and human-request policies 34
Table 12 Showcase 4: Contextual question 34
Table 13 Showcase 5: Long flow 35
Table 14 General statistics for conversational dataset 46
Table 15 Detailed statistics on possible system actions in the conversational dataset 48
Table 16 Detailed statistics on possible named entities in the conversational dataset 48
Table 17 F1-results for the Named Entity Recognition task 51
Table 18 F1-results for the Next Action Prediction task 51
Table 19 Average dialogue accuracy computed for the Next Action Prediction task 51
Table 20 Trend detection: List of gold trends 79
Table 21 Trend detection: Example of MAP evaluation 80
Table 22 Trend detection: Ablation results 82

acronyms

AI Artificial Intelligence

API Application Programming Interface

BERT Bidirectional Encoder Representations from Transformers

BiLSTM Bidirectional Long Short Term Memory

CRF Conditional Random Fields

DM Dialogue Manager

DST Dialogue State Tracker

DSTC Dialogue State Tracking Challenge

EC Equivalence Classes

ES ElasticSearch

GM Gaussian Models

IAA Inter Annotator Agreement

ID Informational Dictionary

IPA Intelligent Process Automation

ITS Intelligent Tutoring System

LDA Latent Dirichlet Allocation

LSA Latent Semantic Analysis

LSTM Long Short Term Memory

MAP Mean Average Precision

MER Mutually Exclusive Rules

NAP Next Action Prediction

NER Named Entity Recognition

NLG Natural Language Generation

NLP Natural Language Processing

NLU Natural Language Understanding

OMB+ Online Mathematik Brückenkurs

PL Policy Learning

pLSA probabilistic Latent Semantic Analysis

REST Representational State Transfer

RNN Recurrent Neural Network

RPA Robotic Process Automation

RegEx Regular Expressions

TL Transfer Learning

ToDS Task-oriented Dialogue System


1 introduction

Robotic Process Automation (RPA) is a type of software bot that imitates manual human activities like entering data into a system, logging into accounts, or executing simple but repetitive workflows (Mohanty and Vyas, 2018). In other words, RPA-systems allow business applications to work without human involvement by replicating human worker activities. A further benefit of RPA-bots is that they are much cheaper compared to the employment of a human worker, and they are also able to work around the clock. Overall, the advantages of software bots are hugely appealing and include cost savings, accuracy and compliance while executing processes, improved responsiveness (i.e., bots are faster than human workers), and, finally, such bots are usually agile and multi-skilled (Ivancic, Vugec, and Vukšic, 2019).

However, one of the main benefits and, at the same time, drawbacks of RPA-systems is their ability to precisely fulfill the assigned task. On the one hand, the bot can carry out the task accurately and diligently. On the other hand, it is highly susceptible to changes in defined scenarios. Being designed for a particular task, the RPA-bot is often not adaptable to other domains or even simple changes in a workflow (Mohanty and Vyas, 2018). This inability to adapt to changing conditions gave rise to a further area of improvement for RPA-bots – Intelligent Process Automation (IPA) systems.

IPA-bots combine Robotic Process Automation (RPA) with Artificial Intelligence (AI) and thus can perform complex and more cognitively demanding tasks that require reasoning and language understanding. Hence, IPA-bots go beyond automating simple "click tasks" (as in RPA) and can accomplish jobs more intelligently – employing machine learning algorithms and advanced analytics. Such IPA-systems undertake time-consuming and routine tasks and thus enable smart workflows and free up skilled workers to accomplish higher-value activities.

Previously, IPA techniques were primarily employed within the computer vision domain, starting with the automation of simple display click-tasks and going towards sophisticated self-driving cars with their ability to interpret a stream of situational data obtained from sensors and cameras. However, in recent times, Natural Language Processing (NLP) became one of the potential applications for IPA as well, due to its ability to understand and interpret human language. NLP methods are used to analyze large amounts of textual data and to respond to various inquiries. Furthermore, if the available data is unstructured and does not have any predefined format (i.e., customer emails), or if it is available in a variable format (i.e., invoices, legal documents), then Natural Language Processing (NLP) techniques are applied to extract the relevant information. This data can then be utilized to solve distinct problems (i.e., natural language understanding, predictive analytics) (Mohanty and Vyas, 2018). For instance, in the case of predictive analytics, such a bot would make forecasts about the future using current and historical data and thereby assist with sales or investments.

However, NLP in the context of Intelligent Process Automation (IPA) is not limited to the extraction of relevant data and insights from textual documents. One of the central applications of NLP within the IPA domain are conversational interfaces (e.g., chatbots) that are used to enable human-to-machine interaction. This type of software is commonly used for customer support or within recommendation and booking systems. Conversational agents gain enormous demand due to their ability to simultaneously support a large number of users while communicating in a natural language. In a conventional chatbot system, a user provides input in a natural language; the bot then processes this query to extract potentially useful information and responds with a relevant reply. This process comprises multiple stages and involves different types of NLP subtasks, starting with Natural Language Understanding (NLU) (e.g., intent recognition, named entity extraction) and going towards dialogue management (i.e., determining the next possible bot action considering the dialogue history) and response generation (e.g., converting the semantic representation of the next bot action to a natural language utterance). A typical dialogue system for Intelligent Process Automation (IPA) purposes usually undertakes shallow customer support requests (e.g., answering of FAQs), allowing human workers to focus on more sophisticated inquiries.
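
To make the staged pipeline concrete, the following minimal sketch strings together an NLU step, a dialogue manager, and template-based response generation. It is purely illustrative: the intent names, rules, and templates are invented and do not describe the system developed in this thesis.

def nlu(utterance: str) -> dict:
    # Hypothetical intent recognition and entity extraction.
    intent = "faq_opening_hours" if "open" in utterance.lower() else "unknown"
    return {"intent": intent, "entities": {}}

def dialogue_manager(state: dict, nlu_result: dict) -> str:
    # Chooses the next system action given the (here trivial) dialogue state.
    return "answer_opening_hours" if nlu_result["intent"] == "faq_opening_hours" else "fallback"

def generate(action: str) -> str:
    # Template-based response generation.
    templates = {
        "answer_opening_hours": "We are open Monday to Friday, 9 am to 5 pm.",
        "fallback": "Sorry, could you rephrase that?",
    }
    return templates[action]

state: dict = {}
reply = generate(dialogue_manager(state, nlu("When do you open?")))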

Natural Language Processing (NLP) techniques may also be used indirectly for IPA purposes. There are attempts to automatically identify and classify tasks extracted from textual process descriptions as manual, user, or automated. The goal of such an NLP application is to reduce the effort required to identify suitable candidates for further robotic process automation (Friedrich, Mendling, and Puhlmann, 2011; Leopold, Aa, and Reijers, 2018).

Intelligent Process Automation (IPA) is an emerging technology and evolves mainly due to the recognized potential benefit of combining Robotic Process Automation (RPA) and Artificial Intelligence (AI) techniques to solve industrial problems (Ivancic, Vugec, and Vukšic, 2019; Mohanty and Vyas, 2018). Industries are typically interested in being responsive to customers and markets (Mohanty and Vyas, 2018). Thus, most customer support applications need to be active in real time and highly robust to achieve higher client satisfaction rates. The manual accomplishment of such tasks is often time-consuming, expensive, and error-prone. Furthermore, due to the massive amounts of textual data available for analysis, it seems not feasible to process this data entirely by human means.


1.1 outline

This thesis covers two topics united by a broader theme, which is Intelligent Process Automation (IPA) utilizing statistical Natural Language Processing (NLP) methods.

the first block of the thesis (Chapter 2 – Chapter 4) addresses the challenge of implementing a conversational agent for Intelligent Process Automation (IPA) purposes within the e-learning field. As mentioned above, chatbots are one of the central applications of the IPA domain, since they can effectively perform time-consuming tasks requiring natural language understanding, allowing human workers to concentrate on more valuable tasks. The IPA conversational bot that was realized within this thesis takes care of routine and time-consuming tasks performed by human tutors within an online mathematical course. This bot is deployed in a real-world setting and interacts with students within the OMB+ mathematical platform. Conducting experiments, we observed two possibilities to build the conversational agent in industrial settings: first, with purely rule-based methods, considering the missing training data and individual aspects of the target domain (i.e., e-learning). Second, we re-implemented two of the main system components (i.e., the Natural Language Understanding (NLU) and Dialogue Manager (DM) units) using the current state-of-the-art deep learning algorithm (i.e., Bidirectional Encoder Representations from Transformers (BERT)) and investigated their performance and potential use as part of a hybrid model (i.e., containing both rule-based and machine learning methods).

the second block of the thesis (Chapter 5 – Chapter 6) considers an Intelligent Process Automation (IPA) problem within the predictive analytics domain and addresses the task of scientific trend detection. Predictive analytics makes forecasts about future outcomes based on historical and current data. Hence, organizations can benefit from advanced analytics models by detecting trends and their behaviors and then employing this information while making crucial business decisions (i.e., investments). In this work, we deal with a subproblem of the trend detection task – specifically, we address the lack of publicly available benchmarks for the evaluation of trend detection algorithms. Therefore, we assembled a benchmark for the detection of both trends and downtrends (i.e., topics that become less frequent over time). To the best of our knowledge, the task of downtrend detection has not been well addressed before. Furthermore, the resulting benchmark is based on a collection of more than one million documents, which is among the largest that have been used for trend detection before, and therefore offers a realistic setting for developing trend detection algorithms.


All main chapters contain their own introduction, contribution list, related work section, and summary.


Part I

e-learning conversational assistant


2 foundations

In this block, we address the challenge of designing and implementing a conversational assistant for Intelligent Process Automation (IPA) purposes within the e-learning domain. The system aims to support human tutors of the Online Mathematik Brückenkurs (OMB+) platform during the tutoring round by undertaking routine and time-consuming responsibilities. Specifically, the system intelligently automates the process of information accumulation and validation that precedes every conversation. Thereby, the IPA-bot frees up tutors and allows them to concentrate on more complicated and cognitively demanding tasks.

In this chapter, we introduce the fundamental concepts required for the topic of the first part of the thesis. We begin with the foundations of conversational assistants in Section 2.2, where we present the taxonomy of existing dialogue systems in Section 2.2.1, give an overview of the conventional dialogue architecture with its most essential components in Section 2.2.2, and describe the existing approaches for building a conversational agent in Section 2.2.3. We finally introduce the target domain for our experiments in Section 2.3.

2.1 introduction

Conversational assistants – a type of software that interacts with people via written or spoken natural language – have become widely popular in recent times. Today's conversational assistants support a broad range of everyday skills, including, but not limited to, creating a reminder, answering factoid questions, and controlling smart home devices.

Initially, the notion of an artificial companion capable of conversing in a natural language was anticipated by Alan Turing in 1949 (Turing, 1949). Still, the first significant step in this direction was made in 1966 with the appearance of Weizenbaum's system ELIZA (Weizenbaum, 1966). Its successor was A.L.I.C.E. (Artificial Linguistic Internet Computer Entity) – a natural language processing dialogue system that conversed with a human by applying heuristic pattern matching rules. Despite being remarkably popular and technologically advanced for their days, both systems were unable to pass the Turing test (Alan, 1950). Following a long silent period, the next wave of growth of intelligent agents was caused by the advances in Natural Language Processing (NLP). This was enabled by access to low-cost memory, higher processing speed, vast amounts of available data, and sophisticated machine learning algorithms. Hence, one of the most notable leaps in the history of conversational agents began in the year 2011 with intelligent personal assistants. At that time, Apple's Siri, followed by Microsoft's Cortana and Amazon's Alexa in 2014 and then Google Assistant in 2016, came to the scene. The main focus of these agents was not to mimic humans, as in ELIZA or A.L.I.C.E., but to assist a human by answering simple factoid questions and setting notification tasks across a broad range of content. Another type of conversational agent, called Task-oriented Dialogue System (ToDS), became a focus of industries in the year 2016. For the most part, these bots aimed to support customer service (Cui et al., 2017) and respond to Frequently Asked Questions (Liao et al., 2018). Their conversational style provided more depth than that of personal assistants but a narrower range, since they were patterned for task solving and not for open-domain discussions.

One of the central advantages of conversational agents, and the reason why they are attracting considerable interest, is their ability to give attention to several users simultaneously while supporting natural language communication. Due to this, conversational assistants gain momentum in numerous kinds of practical support applications (i.e., customer support) where a high number of users are involved at the same time. The classical engagement of human assistants is not very cost-efficient, and such support is usually not accessible full-time. Therefore, automating human assistance is desirable but also an ambitious task. Due to the lack of solid Natural Language Understanding (NLU), advancing beyond primary tasks is still challenging (Luger and Sellen, 2016), and even the most prosperous dialogue systems often fail to meet users' expectations (Jain et al., 2018). Significant challenges are involved in the design and implementation of a conversational agent, varying from domain engineering to Natural Language Understanding (NLU) and the design of an adaptive domain-specific conversational flow. The quintessential problems and how-to questions while building conversational assistants include:

• How to deal with variability and flexibility of language?

• How to get high-quality in-domain data?

• How to build meaningful representations?

• How to integrate commonsense and domain knowledge?

• How to build a robust system?

Besides that, current Natural Language Processing (NLP) and Artificial Intelligence (AI) techniques have weaknesses when applied to domain-specific task-oriented problems, especially if the underlying data comes from an industrial source. As machine learning techniques heavily rely on structured and cleaned training data to deliver consistent performance, typically noisy real-world data, without access to additional knowledge databases, negatively impacts the classification accuracy.


Despite that, conversational agents can be successfully employed to solve less cognitive but still highly relevant tasks. In this case, a dialogue agent can be operated as a part of an Intelligent Process Automation (IPA) system, where it takes care of straightforward but routine and time-consuming functions performed in a natural language, while allowing a human worker to focus on more cognitively demanding processes.

2.2 conversational assistants

Development and design of a particular dialogue system usually vary by its objectives, among the most fundamental of which are:

• Purpose: Should it be an open-ended dialogue system (e.g., chit-chat), a task-oriented system (e.g., customer support), or a personal assistant (e.g., Google Assistant)?

• Skills: What kind of behavior should a system demonstrate?

• Data: Which data is required to develop (resp. train) specific blocks, and how could it be assembled and curated?

• Deployment: Will the final system be integrated into a particular software platform? What are the challenges in choosing suitable tools? How will the deployment happen?

• Implementation: Should it be a rule-based, information-retrieval, machine-learning, or a hybrid system?

In the following, we take a look at the most common taxonomies of dialogue systems and at different design and implementation strategies.

2.2.1 Taxonomy

According to their goals, conversational agents can be classified into two main categories: task-oriented systems (e.g., for customer support) and social bots (e.g., for chit-chat purposes). Furthermore, systems also vary regarding their implementation characteristics. Below we provide a detailed overview of the most common variations.

task-oriented dialogue systems are known for their ability to lead a dialogue with the ultimate goal of solving a user's problem (see Figure 1). A customer support agent or a flight/restaurant booking system exemplify this class of conversational assistants. Such systems are more meaningful and useful than those with an entirely social goal; however, they usually entail more complications during the implementation stage. Since a Task-oriented Dialogue System (ToDS) requires a precise understanding of the user's intent (i.e., goal), the Natural Language Understanding (NLU) segment must perform flawlessly.


Besides that, if applied in an industrial setting, ToDS are usually built modularly, which means they mostly rely on handcrafted rules and predefined templates and are restricted in their abilities. Personal assistants (i.e., Google Assistant) are members of the task-oriented family as well. Though, compared to standard agents, they involve more social conversation and can be used for entertainment purposes (e.g., "Ok Google, tell me a joke"). The latter is typically not available in conventional task-oriented systems.

social bots and chit-chat systems, in contrast, regularly do not hold any specific goal and are designed for small-talk and entertainment conversations (see Figure 1). Those agents are usually end-to-end trainable and highly data-driven, which means they require large amounts of training data. However, due to entirely learnable training, such systems often produce inappropriate responses, making them extremely unreliable for practical applications. Like chit-chat systems, social bots are rich in social conversation; however, they possess the ability to solve uncomplicated user requests and conduct a more meaningful dialogue (Fang et al. 2018). Those requests are not the same as in task-oriented systems and are mostly pointed at answering simple factoid questions.

Figure 1: Taxonomy of dialogue systems according to the measure of their task accomplishment and social contribution. Figure originates from Fang et al. 2018.


The aforementioned system types can be decomposed further according to their implementation characteristics:

open domain & closed domain: A Task-oriented Dialogue System (ToDS) is typically built within a closed domain setting. The space of potential inputs and outputs is restricted, since the system attempts to achieve a specific goal. Customer support or shopping assistants are cases of closed domain problems (Wen et al. 2016). These systems do not need to be able to talk about off-site topics, but they have to fulfill their particular tasks as efficiently as possible.

Chit-chat systems, in contrast, do not assume a well-defined goal or intention and thus can be built within an open domain. This means the conversation can progress in any direction and without any, or with only a minimal, functional purpose (Serban et al. 2016). However, the infinite number of possible topics, and the fact that a certain amount of commonsense is required to create reasonable responses, make an open-domain setting a hard problem.

content-driven & user-centric: Social bots and chit-chat systems typically utilize a content-driven dialogue manner. That means that their responses are based on large and dynamic content derived from daily web mining and knowledge graphs. This approach builds a dialogue based on trendy content from diverse sources, and thus it is an ideal way for social bots to catch users' attention and bring a user into the dialogue.

User-centric approaches, in turn, assume precise language understanding to detect the sentiment of users' statements. According to the sentiment, the dialogue manager learns users' personalities, handles rapid topic changes, and tracks engagement. In this case, the dialogue builds on specific user interests.

long & short conversations: Models with a short conversation style attempt to create a single response to a single input (e.g., a weather forecast). In the case of a long conversation, a system has to go through multiple turns and keep track of what has been said before (i.e., the dialogue history). The longer the conversation, the more challenging it is to train a system. Customer support conversations are typically long conversational threads with multiple questions.

response generation: One of the essential elements of a dialogue system is response generation. This can either be done in a retrieval-based manner or by using generative models (i.e., Natural Language Generation (NLG)).

The former usually utilizes predefined responses and heuristics to select a relevant response based on the input and context. The heuristic could be a rule-based expression match (i.e., Regular Expressions (RegEx)) or an ensemble of machine learning classifiers (e.g., entity extraction, prediction of the next action).


Such systems do not generate any original text; instead, they select a response from a fixed set of predefined candidates. These models are usually used in an industrial setting, where the system must show high robustness, performance, and interpretability of the outcomes.

The latter, in contrast, do not rely on predefined replies. Since such models are typically based on (deep) machine learning techniques, they attempt to generate new responses from scratch. This is, however, a complicated task and is mostly used for vanilla chit-chat systems or in a research domain where structured training data is available.
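To make the retrieval-based manner concrete, the following is a minimal sketch (not a component of any system discussed in this thesis) that ranks a fixed set of predefined candidate responses by TF-IDF cosine similarity to the user input; the candidate texts are purely illustrative.

# Minimal retrieval-based response selection sketch: rank predefined candidate
# responses by TF-IDF cosine similarity to the user input (illustrative data).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

CANDIDATES = [
    "Our office hours are 9am to 5pm.",
    "You can reset your password from the account settings page.",
    "Please share the exact error message you are seeing.",
]

vectorizer = TfidfVectorizer()
candidate_matrix = vectorizer.fit_transform(CANDIDATES)

def select_response(user_input: str) -> str:
    query_vector = vectorizer.transform([user_input])
    scores = cosine_similarity(query_vector, candidate_matrix)[0]
    return CANDIDATES[scores.argmax()]  # highest-scoring predefined reply

print(select_response("How do I reset my password?"))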

2.2.2 Conventional Architecture

In this and the following subsection we review the pipeline and methods for a Task-oriented Dialogue System (ToDS). Such agents are typically composed of several components (Figure 2) addressing the diverse tasks required to facilitate a natural language dialogue (Williams, Raux, and Henderson 2016).

Figure 2: Example of a conventional dialogue architecture with its main components in the bounding box: Language Understanding (NLU), Dialogue Manager (DM), Response Generation (NLG). Figure originates from the NAACL 2018 Tutorial, Su et al. 2018.


A conventional dialogue architecture implements this with the following components:

natural language understanding unit has the following objectives: it parses the user input, classifies the domain (i.e., customer support, restaurant booking; not required for single-domain systems) and intent (i.e., find-flight, book-taxi), and extracts entities (i.e., Italian food, Berlin).

Generally, the NLU unit maps every user utterance into semantic frames that are predefined according to different scenarios. Table 1 illustrates an example of a sentence representation, where the departure-location (Berlin) and the arrival-location (Munich) are specified as slot values, and the domain (airline travel) and intent (find-flight) are specified as well. Typically, there are two kinds of possible representations. The first one is the utterance category, represented in the form of the user's intent. The second is word-level information extraction in the form of Named Entity Recognition (NER) or Slot Filling tasks.

Sentence         Find   flight   from   Berlin   to   Munich   today
Slots/Concepts   O      O        O      B-dept   O    B-arr    B-date
Named Entity     O      O        O      B-city   O    B-city   O
Intent           Find_Flight
Domain           Airline Travel

Table 1: An illustrative example of a sentence representation.1

A dialogue act classification sub-module, also known as intent classification, is dedicated to detecting the primary user's goal at every dialogue state. It labels each user's utterance with one of the predefined intents. Deep neural networks can be directly applied to this conventional classification problem (Khanpour, Guntakandla, and Nielsen 2016; Lee and Dernoncourt 2016).

Slot filling is another sub-module for language understanding. Unlike intent detection (i.e., a classification problem), slot filling is defined as a sequence labeling problem, where words in the sequence are labeled with semantic tags. The input is a sequence of words, and the output is a sequence of slots, one for every word. Variations of Recurrent Neural Network (RNN) architectures (LSTM, BiLSTM) (Deoras et al. 2015) and (attention-based) encoder-decoder architectures (Kurata et al. 2016) have been employed for this task.
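As a rough illustration of this sequence labeling formulation (not a model used in this thesis), the sketch below tags every input token with one slot label using a BiLSTM; vocabulary size, label set, and dimensions are assumed values.

# Minimal BiLSTM slot-filling sketch: one slot label per input token.
# Vocabulary size, number of slot labels, and dimensions are illustrative.
import torch
import torch.nn as nn

class BiLSTMSlotTagger(nn.Module):
    def __init__(self, vocab_size, num_slots, emb_dim=100, hidden_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_slots)

    def forward(self, token_ids):              # (batch, seq_len)
        embedded = self.embedding(token_ids)   # (batch, seq_len, emb_dim)
        hidden, _ = self.lstm(embedded)        # (batch, seq_len, 2 * hidden_dim)
        return self.classifier(hidden)         # per-token slot logits

# "Find flight from Berlin to Munich today" -> 7 tokens -> 7 slot predictions
model = BiLSTMSlotTagger(vocab_size=10000, num_slots=8)
dummy_input = torch.randint(0, 10000, (1, 7))
print(model(dummy_input).shape)                # torch.Size([1, 7, 8])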


dialogue manager unit is subdivided into two components: the first is a Dialogue State Tracker (DST) that holds the conversational state, and the second is Policy Learning (PL) that defines the system's next action.

The Dialogue State Tracker (DST) is the core component to guarantee a robust dialogue flow, since it manages the user's intent at every turn. The conventional methods, which are broadly applied in commercial implementations, often adopt handcrafted rules to pick the most probable candidate (Goddeau et al. 1996; Wang and Lemon 2013) among the predefined ones. Though, a variety of statistical approaches have emerged since the Dialogue State Tracking Challenge (DSTC)2 was arranged by Jason D. Williams in 2013. Among the deep-learning based approaches is the Belief Tracker proposed by Henderson, Thomson, and Young (2013). It employs a sliding window to produce a sequence of probability distributions over an arbitrary number of possible values (Chen et al. 2017). In the work of Mrkšić et al. (2016), the authors introduced a Neural Belief Tracker to identify the slot-value pairs. The system used, as input, the user intents, the preceding user input, the user utterance itself, and a candidate slot-value pair which it needs to decide about. Then the system iterated over all candidate slot-value pairs to determine which ones have just been expressed by the user (Chen et al. 2017). Finally, Vodolán, Kadlec, and Kleindienst proposed a Hybrid Dialog State Tracker that achieved state-of-the-art performance on the DSTC2 shared task. Their model used a separately-learned decoder coupled with a rule-based system.

Based on the state representation from the Dialogue State Tracker (DST), the PL unit generates the next system action, which is then decoded by the Natural Language Generation (NLG) module. Usually, a rule-based agent is used to initialize the system (Yan et al. 2017), and then supervised learning is applied to the actions generated by these rules. Alternatively, the dialogue policy can be trained end-to-end with reinforcement learning, steering the system's policies towards the final performance (Cuayáhuitl, Keizer, and Lemon 2015).

natural language generation unit transforms the next action into a natural language response. Conventional approaches to Natural Language Generation (NLG) typically adopt a template-based style, where a set of rules is defined to map every next action to a predefined natural language response. Such approaches are simple, generally error-free, and easy to control; however, their implementation is time-consuming, and the style is repetitive and hardly scalable.

Among the deep learning techniques is the work by Wen et al. (2015) that introduced neural network-based approaches to NLG with an LSTM-based structure.

2 Currently: Dialog System Technology Challenge.


Later, Zhou and Huang (2016) adopted this model along with the question information, semantic slot values, and dialogue act types to generate more precise answers. Finally, Wu, Socher, and Xiong (2019) proposed Global-to-Local Memory Pointer Networks (GLMP) – an architecture that incorporates knowledge bases directly into a learning framework (due to the Memory Networks component) and is able to re-use information from the user input (due to the Pointer Networks component).

In particular cases, the dialogue flow can include speech recognition and speech synthesis units (when managing a spoken natural language system) and a connection to third-party APIs to request information stored in external databases (e.g., a weather forecast), as depicted in Figure 2.

2.2.3 Existing Approaches

As stated above, a conventional task-oriented dialogue system is implemented in a pipeline manner. That means that once the system receives a user query, it interprets it and acts according to the dialogue state and the corresponding policy. Additionally, based on its understanding, it may access the knowledge base to find the demanded information there. Finally, the system transforms its next action (and, if given, the extracted information) into its surface form as a natural language response (Monostori et al. 1996).

Individual components (i.e., Natural Language Understanding (NLU), Dialogue Manager (DM), Natural Language Generation (NLG)) of a particular conversational system can be implemented using different approaches, starting with entirely rule- and template-based methods and going towards hybrid approaches (using learnable components along with handcrafted units) and end-to-end trainable machine learning methods.

rule-based approaches: Though many of the latest research approaches handle Natural Language Understanding (NLU) and Natural Language Generation (NLG) units by using statistical Natural Language Processing (NLP) models (Bocklisch et al. 2017; Burtsev et al. 2018; Honnibal and Montani 2017), most of the industrially deployed dialogue systems still utilize handcrafted features and rules for the state tracking, action prediction, intent recognition, and slot filling tasks (Chen et al. 2017; Ultes et al. 2017). So, most of the PyDial3 framework modules offer rule-based implementations using Regular Expressions (RegEx), rule-based dialogue trackers (Henderson, Thomson, and Williams 2014), and handcrafted policies. Also, in order to map the next action to text, Ultes et al. (2017) suggest rules along with template-based response generation.

3 PyDial Framework: http://www.camdial.org/pydial


The rule-based approach ensures robustness and stable performance, which is crucial for industrial systems that communicate with a large number of users simultaneously. However, it is highly expensive and time-consuming to deploy a real dialogue system built in that manner. The major disadvantage is that the usage of handcrafted systems is restricted to a specific domain, and possible adaptation requires extensive manual engineering.

end-to-end learning approaches: Due to the recent advance of end-to-end neural generative models (Collobert et al. 2011), many efforts have been made to build an end-to-end trainable architecture for conversational agents. Rather than using the traditional pipeline (see Figure 2), the end-to-end model is conceived as a single module (Chen et al. 2017).

Wen et al. (2016), followed by Bordes, Boureau, and Weston (2016), were among the pioneers who adapted an end-to-end trainable approach for conversational systems. Their approach was to treat dialogue learning as the problem of learning a mapping from dialogue histories to system responses, and to apply an encoder-decoder model (Sutskever, Vinyals, and Le 2014) to train the entire system. The proposed model still had significant limitations (Glasmachers 2017), such as missing policies to learn a dialogue flow and the inability to incorporate additional knowledge, which is especially crucial for training a task-oriented agent. To overcome the first problem, Dhingra et al. (2016) introduced an end-to-end reinforcement learning approach that jointly trains the dialogue state tracker and policy learning units. The second weakness was addressed by Eric and Manning (2017), who augmented existing recurrent network architectures with a differentiable attention-based key-value retrieval mechanism over the entries of a knowledge base. This approach was inspired by Key-Value Memory Networks (Miller et al. 2016). Later on, Wu, Socher, and Xiong (2019) proposed Global-to-Local Memory Pointer Networks (GLMP), where the authors incorporated knowledge bases directly into a learning framework. In their model, a global memory encoder and a local memory decoder are employed to share external knowledge. The encoder encodes the dialogue history, modifies the global contextual representation, and generates a global memory pointer. The decoder first produces a sketch response with empty slots. Next, it uses the global memory pointer to filter the external knowledge for relevant information and then instantiates the slots via the local memory pointers.

Despite having better adaptability compared to any rule-based system and being simpler to train, end-to-end approaches remain unattainable for commercial conversational agents that operate on real-world data. A well and carefully constructed task-oriented dialogue system in an observed domain, using handcrafted rules (in the NLU and DST units) and predefined responses for NLG, still outperforms


end-to-end systems due to its robustness (Glasmachers 2017; Wu, Socher, and Xiong 2019).

hybrid approaches: Though end-to-end learning is an attractive solution for conversational systems, current techniques are data-intensive and require large amounts of dialogues to learn simple behaviors. To overcome this barrier, Williams, Asadi, and Zweig (2017) introduce Hybrid Code Networks (HCNs), which are an ensemble of retrieval-based and trainable units. The system utilizes a Recurrent Neural Network (RNN) for the Natural Language Understanding (NLU) block, domain-specific knowledge that is encoded as software (i.e., API

calls), and custom action templates used for the Natural Language Generation (NLG) unit. Compared to existing end-to-end methods, the authors report that their approach considerably reduces the amount of data required for training (Williams, Asadi, and Zweig 2017). According to the authors, HCNs achieve state-of-the-art performance on the bAbI dialog dataset (Bordes, Boureau, and Weston 2016) and outperform two commercial customer dialogue systems.4

Along with this work, Wen et al. (2016) propose a neural network-based model that is end-to-end trainable but still modularly connected. In their work, the authors treat dialogue as a sequence-to-sequence mapping problem (Sutskever, Vinyals, and Le 2014), augmented with the dialogue history and the current database search outcome (Wen et al. 2016). The authors explain the learning process as follows: At every turn, the system takes a user input represented as a sequence of tokens and converts it into two internal representations: a distributed representation of the intent and a probability distribution over slot-value pairs called the belief state (Young et al. 2013). The database operator then picks the most probable values in the belief state to form the search result. At the same time, the intent representation and the belief state are transformed and combined by a Policy Learning (PL) unit to form a single vector representing the next system action. This system action is then used to generate a sketch response, token by token. The final system response is composed by substituting the actual values of the database entries into the sketch sentence structure (Wen et al. 2016).

Hybrid models appear set to replace the established rule- and template-based approaches currently utilized in an industrial setting. The performance of most Natural Language Understanding (NLU) components (i.e., Named Entity Recognition (NER), domain and intent classification) is reasonably high to apply them for the development of commercial conversational agents, whereas the Natural Language Generation (NLG) unit could still be realized as a template-based solution, employing predefined response candidates and predicting the next system action with deep learning systems.

4 Internal Microsoft Research datasets in the customer support and troubleshooting domains were employed for training and evaluation.


2.3 target domain & task definition

MUMIE5 is an open-source e-learning platform for studying and teaching mathematics and computer science. Online Mathematik Brückenkurs (OMB+) is one of the projects powered by MUMIE. Specifically, Online Mathematik Brückenkurs (OMB+)6 is a German online learning platform that assists students who are preparing for an engineering or computer science study at a university.

The course's central purpose is to assist students in reviving their mathematical skills so that they can follow the upcoming university courses. This online course is thematically segmented into 13 parts and includes free mathematical classes with theoretical and practical content. Every chapter begins with an overview of its sections and consists of explanatory texts with built-in videos, examples, and quick checks of understanding. Furthermore, various examination modes to practice the content (i.e., training exercises) and to check the understanding of the theory (i.e., quiz) are available. Besides that, OMB+ provides a possibility to get assistance from a human tutor. Usually, the students and tutors interact via a chat interface in written form. The language of communication is German.

The current dilemma of the OMB+ platform is that the number of students grows significantly every year, but it is challenging to find qualified tutors, and it is expensive to hire more human tutors. This results in a longer waiting period for students until their problems can be considered and processed. In general, all student questions can be grouped into three main categories: organizational questions (i.e., course, certificate), contextual questions (i.e., content, theorem), and mathematical questions (exercises, solutions). To assist a student with a mathematical question, a tutor has to know the following regular information: What kind of topic (or subtopic) does a student have a problem with? At which examination mode (quiz, chapter level training or exercise, section level training or exercise, or final examination) is the student working right now? And, finally, the exact question number and exact problem formulation. This means that a tutor has to ask the same onboarding questions every time a new conversation starts. This is, however, very time-consuming and could be successfully solved within an Intelligent Process Automation (IPA) system.

The work presented in Chapter 3 and Chapter 4 was enabled by the collaboration with the MUMIE and Online Mathematik Brückenkurs (OMB+) team and addresses the abovementioned problem. Within Chapter 3, we implemented an Intelligent Process Automation (IPA) dialogue system with rule- and retrieval-based algorithms that interacts with students and performs the onboarding. This system is multipurpose: first, it eases the assistance process for human tutors by reducing repetitive and time-consuming activities and therefore allows workers to focus on more challenging tasks. Second, interacting

5 MUMIE: https://www.mumie.net
6 https://www.ombplus.de


with students, it augments the resources with structured and labeled training data. The latter, in turn, allowed us to re-train specific system components utilizing machine and deep learning methods. We describe this procedure in Chapter 4.

We report that the earlier collected human-to-human data from the OMB+ platform could not be used for training a machine learning system, since it is highly unstructured and contains no direct question-answer matching, which is crucial for trainable conversational algorithms. We analyzed these conversations to gain insights from them that we used to define the dialogue flow.

We also do not consider the automatization of mathematical questions in this work, since this task would require not only precise language understanding and extensive commonsense knowledge, but also the accurate perception of mathematical notations. The tutoring process at OMB+ differs considerably from standard tutoring methods. Here, instructors attempt to navigate students by giving hints and tips instead of providing an out-of-the-box solution. Existing human-to-human dialogues revealed that such a task can be challenging even for human tutors, since it is not always directly perceptible what precisely the student does not understand or has difficulties with. Furthermore, dialogues containing mathematical explanations can last a long time, and the topic (i.e., intent and final goal) can shift multiple times during the conversation.

To our knowledge, this task would not be feasible to solve sufficiently with current machine learning algorithms, especially considering the missing training data. A rule-based system would not be the right solution for this purpose either, due to the large number of complex rules which would need to be defined. Thus, in this work, we focus on the automatization of less cognitively demanding tasks, which are still relevant and must be solved in the first instance to ease the tutoring process for instructors.

2.4 summary

Conversational agents are a highly demanded solution for industries due to their potential ability to support a large number of users in parallel while performing a flawless natural language conversation. Many messenger platforms are created, and custom chatbots in various constellations emerge daily. Unfortunately, most of the current conversational agents are often disappointing to use, due to the lack of substantial natural language understanding and reasoning. Furthermore, Natural Language Processing (NLP) and Artificial Intelligence (AI) techniques have limitations when applied to domain-specific and task-oriented problems, especially if no structured training data is available.

Despite that, conversational agents can be successfully applied to solve less cognitive but still highly relevant tasks, that is, within the


Intelligent Process Automation (IPA) domain. Such IPA-bots would undertake tedious and time-consuming responsibilities to free up the knowledge worker for more complex and intricate tasks. IPA-bots can be seen as members of the goal-oriented dialogue family and are, for the most part, implemented either in a rule-based manner or through hybrid models when it comes to a real-world setting.


3 multipurpose ipa via conversational assistant

This chapter partially covers work already published at international peer-reviewed conferences. The relevant publication is Moiseeva et al. 2020. The research described in this chapter was carried out in its entirety by the author of this thesis. The other authors of the publication acted as advisers. At the time of writing this thesis, the system described in this work was deployed at the Online Mathematik Brückenkurs (OMB+) platform.

As we outlined in the introduction, Intelligent Process Automation (IPA) is an emerging technology with the primary goal of assisting the knowledge worker by taking care of repetitive, routine, and low-cognitive tasks. Conversational assistants that interact with users in a natural language are a potential application for the IPA domain. Such smart virtual agents can support the user by responding to various inquiries and performing routine tasks carried out in a natural language (i.e., customer support).

In this work, we tackle the challenge of implementing an IPA conversational agent in a real-world industrial context and under conditions of missing training data. Our system has two meaningful benefits: First, it decreases monotonous and time-consuming tasks and hence lets workers concentrate on more intellectual processes. Second, by interacting with users, it augments the resources with structured and labeled training data. The latter, in turn, can be used to retrain the system utilizing machine and deep learning methods.

3.1 introduction

Conversational dialogue systems can give attention to several users simultaneously while supporting natural communication. They are thus exceptionally needed for practical assistance applications where a high number of users are involved at the same time. Regularly, dialogue agents require a lot of natural language understanding and commonsense knowledge to hold a conversation reasonably and intelligently. Therefore, nowadays it is still challenging to substitute human assistance with an intelligent agent completely. However, a dialogue system could be successfully employed as a part of an Intelligent Process Automation (IPA) application. In this case, it would take care of simple but routine and time-consuming tasks performed


in a natural language, while allowing a human worker to focus on more cognitively demanding processes.

Recent research in the dialogue generation domain is conducted by employing Artificial Intelligence (AI) techniques like machine and deep learning (Lowe et al. 2017; Wen et al. 2016). However, those methods require high-quality data for training, which is often not directly available for domain-specific and industrial problems. Even though companies gather and store vast volumes of data, it is often of low quality with regard to machine learning approaches (Daniel 2015), especially when it concerns dialogue data, which has to be properly structured as well as annotated. Whereas, if trained on artificially created datasets (which is often the case in a research setting), the systems can solve only specific problems and are hardly adaptable to other domains. Hence, despite the popularity of deep learning end-to-end models, one still needs to rely on traditional pipelines in practical dialogue engineering, mainly while introducing a new domain.

3.1.1 Outline and Contributions

This work addresses the challenge of implementing a dialogue system for IPA purposes within the practical e-learning domain under conditions of missing training data. The chapter is structured as follows: In Section 3.2 we introduce our method and describe the design of the rule-based dialogue architecture with its main components, such as the Natural Language Understanding (NLU), Dialogue Manager (DM), and Natural Language Generation (NLG) units. In Section 3.3 we present the results of our dual-focused evaluation, and Section 3.4 describes the format of the collected structured data. Finally, we conclude in Section 3.5.

Our contributions within this work are as follows:

• We implemented a robust conversational system for IPA purposes within a practical e-learning domain and under the conditions of missing training (i.e., dialogue) data.

• The system has two main objectives:

  – First, it reduces repetitive and time-consuming activities and allows workers of the e-learning platform to focus solely on mathematical inquiries.

  – Second, by interacting with users, it augments the resources with structured and partially labeled training data for further implementation of trainable dialogue components.


3.2 model

The central purpose of the introduced system is to interact with students at the beginning of every chat conversation and gather information on the topic (and sub-topic), examination mode and level, question number, and exact problem formulation. Such an onboarding process saves time for human tutors and allows them to handle mathematical questions solely. Furthermore, the system is realized in such a way that it accumulates labeled and structured dialogues in the background.

Figure 4 displays the complete conversation flow. After the system accepts user input, it analyzes it at different levels and extracts information if such is provided. If some of the information required for tutoring is missing, the system asks the student to provide it. When all the information points are collected, they are automatically validated and forwarded to a human tutor. The latter can then directly proceed with the assistance process. In the following, we explain the key components of the system.

3.2.1 OMB+ Design

Figure 3 illustrates the internal structure and design of the OMB+ platform. It has topics and sub-topics, as well as four various examination modes. Each topic (Figure 3, tag 1) corresponds to a chapter level and always has sub-topics (Figure 3, tag 2) that correspond to a section level. The examination modes training and exercise are ambiguous, because they correspond to either a chapter (Figure 3, tag 3) or a section (Figure 3, tag 5) level, and it is important to differentiate between them, since they contain different types of content. The mode final examination (Figure 3, tag 4) always corresponds to a chapter level, whereas quiz (Figure 3, tag 5) can belong only to a section level. According to the design of the OMB+ platform, there are several ways in which a possible dialogue flow can proceed.

3.2.2 Preprocessing

In a natural language conversation, one may respond in various forms, making the extraction of data from user-generated text challenging due to potential misspellings or confusable spellings (i.e., Aufgabe 1a, Aufgabe 1 (a)). Hence, to facilitate a substantial recognition of intents and extraction of entities, we normalize (e.g., correct misspellings, detect synonyms) and preprocess every user input before moving forward to the Natural Language Understanding (NLU) module.


Figure 3: OMB+ Online Learning Platform, where 1 is the Topic (corresponds to a chapter level), 2 is a Sub-Topic (corresponds to a section level), 3 is the chapter level examination mode, 4 is the Final Examination Mode (available only for chapter level), 5 are the Examination Modes Exercise, Training (available at section levels) and Quiz (available only at section level), and 6 is the OMB+ Chat.

We perform both linguistic and domain-specific preprocessing, which includes the following steps (a minimal sketch follows the list):

• lowercasing and stemming of words in the input query

• removal of stop words and punctuation

• all mentions of X in mathematical formulations are removed to avoid confusion with the Roman number 10 ("X")

• in a combination of the type word "Kapitel/Aufgabe" + digit written as a word (i.e., "eins", "zweites", etc.), the word is replaced with a digit ("Kapitel eins" → "Kapitel 1", "im fünften Kapitel" → "im Kapitel 5"); Roman numbers are replaced with a digit as well ("Kapitel IV" → "Kapitel 4")

• detected ambiguities are normalized (e.g., "Trainingsaufgabe" → "Training")

• recognized misspellings resp. type errors are corrected (e.g., "Difeernzialrechnung" → "Differentialrechnung")

• permalinks are parsed and analyzed; from each permalink it is possible to extract the topic, examination mode, and question number
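The sketch below illustrates a few of these domain-specific normalization steps (digit words, Roman numerals, known misspellings); the replacement dictionaries are illustrative assumptions, not the system's complete resources.

# Illustrative sketch of a few normalization steps; the dictionaries below are
# assumed toy examples, not the full lists used by the deployed system.
import re

WORD2DIGIT = {"eins": "1", "zwei": "2", "zweites": "2", "drei": "3", "fünften": "5"}
ROMAN2DIGIT = {"I": "1", "II": "2", "III": "3", "IV": "4", "V": "5"}
MISSPELLINGS = {"Difeernzialrechnung": "Differentialrechnung"}

def normalize(text):
    for wrong, right in MISSPELLINGS.items():
        text = text.replace(wrong, right)

    def to_digit(match):
        keyword, value = match.group(1), match.group(2)
        digit = WORD2DIGIT.get(value.lower()) or ROMAN2DIGIT.get(value.upper())
        return f"{keyword} {digit}" if digit else match.group(0)

    # "Kapitel eins" -> "Kapitel 1", "Kapitel IV" -> "Kapitel 4"
    return re.sub(r"(Kapitel|Aufgabe)\s+(\w+)", to_digit, text)

print(normalize("Kapitel IV, Aufgabe eins"))  # "Kapitel 4, Aufgabe 1"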


3.2.3 Natural Language Understanding

We implement the Natural Language Understanding (NLU) unit utilizing handcrafted rules, Regular Expressions (RegEx), and the Elasticsearch1 API.

intent classification: According to the design of the OMB+ platform, all student questions can be classified into three categories: organizational questions (i.e., course, certificate), contextual questions (i.e., content, theorem), and mathematical questions (exercises, solutions). To classify the input query by its intent (i.e., category), we employ weighted key-word information in the form of handcrafted rules. We assume that specific words are explicitly associated with a corresponding intent (e.g., theorem or root denote a mathematical question; feedback or certificate denote an organizational inquiry). If no intent could be assigned, then it is assumed that the NLU unit was not capable of understanding, and the intent is unknown. In this case, the virtual agent requests the user to provide an intent manually by picking one of the mentioned three options. The queries from the organizational and theoretical categories are directly handed over to a human tutor, while mathematical questions are analyzed by the automated agent for further information extraction.
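A minimal sketch of such weighted key-word intent scoring follows; the keyword lists, weights, and threshold are illustrative assumptions, not the rules actually deployed.

# Toy weighted key-word intent scoring; keywords, weights, and the threshold
# are assumptions made for illustration only.
INTENT_KEYWORDS = {
    "mathematical":   {"theorem": 2.0, "root": 1.5, "aufgabe": 1.0},
    "organizational": {"certificate": 2.0, "zertifikat": 2.0, "feedback": 1.5},
    "contextual":     {"definition": 1.5, "regel": 1.5},
}

def classify_intent(text, threshold=1.0):
    text = text.lower()
    scores = {
        intent: sum(weight for word, weight in keywords.items() if word in text)
        for intent, keywords in INTENT_KEYWORDS.items()
    }
    best_intent, best_score = max(scores.items(), key=lambda item: item[1])
    # Below the threshold the intent stays unknown and the agent asks the user
    # to pick one of the three categories manually.
    return best_intent if best_score >= threshold else "unknown"

print(classify_intent("Wie bekomme ich ein Zertifikat?"))  # organizational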

entity extraction: In the next step, the system retrieves the entities from a user message. We specify the five following prerequisite entities, which have to be collected before the dialogue can be handed over to a human tutor: topic, sub-topic, examination mode and level, and question number. This part is implemented using Elasticsearch (ES) and RegEx. To facilitate the use of ES, we indexed the OMB+ web page to an internal database. Besides indexing the titles of topics and sub-topics, we also provided supplementary information on possible synonyms and writing styles. We additionally filed OMB+ permalinks, which direct to the site pages. To query the resulting database, we employ the internal Elasticsearch multi_match function and set the minimum_should_match parameter to 20. This parameter specifies the number of terms that must match for a document to be considered relevant. Besides that, we adopted fuzziness with the maximum edit distance set to 2 characters. The fuzzy query uses similarity based on the Levenshtein edit distance (Levenshtein 1966). Finally, the system produces a ranked list of potential matching entries found in the database within the predefined relevance threshold (we set it to θ = 15). We then pick the most probable entry as the right one and select the corresponding entity from the user input.
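The following is a rough sketch of such a fuzzy multi_match lookup using the Python Elasticsearch client; the index name, field names, and the exact form of the minimum_should_match value (written here as a percentage) are assumptions, not the production configuration.

# Rough sketch of the fuzzy multi_match lookup; index and field names are
# assumed, and minimum_should_match is expressed as a percentage for clarity.
from elasticsearch import Elasticsearch  # written against the 7.x client API

es = Elasticsearch("http://localhost:9200")

def lookup_entity(user_text, threshold=15.0):
    response = es.search(
        index="omb_topics",
        body={
            "query": {
                "multi_match": {
                    "query": user_text,
                    "fields": ["title", "synonyms"],
                    "fuzziness": 2,                     # Levenshtein edit distance <= 2
                    "minimum_should_match": "20%",
                }
            }
        },
    )
    hits = response["hits"]["hits"]
    relevant = [hit for hit in hits if hit["_score"] >= threshold]  # relevance threshold
    return relevant[0]["_source"] if relevant else None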

Overall, the Natural Language Understanding (NLU) module receives the user input as preprocessed text and examines it across all

1 https://www.elastic.co/products/elasticsearch


predefined RegEx statements and for a match in the Elasticsearch (ES) database. We use ES to retrieve entities for the topic, sub-topic, and examination mode, whereas RegEx is used to extract the information on a question number. Every time an entity is extracted, it is filled into the Informational Dictionary (ID). The ID has the following six slots to be filled in: topic, sub-topic, examination level, examination mode, question number, and exact problem formulation (this information is requested in further steps).
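Conceptually, the Informational Dictionary can be thought of as a simple mapping with six slots, as in the illustrative sketch below (the slot names are paraphrased, not the system's internal identifiers).

# Illustrative Informational Dictionary (ID) with its six slots; None marks
# information that still has to be requested from the student.
informational_dictionary = {
    "topic": None,
    "sub_topic": None,
    "exam_level": None,
    "exam_mode": None,
    "question_nr": None,
    "problem_formulation": None,  # exact problem statement, asked for in later steps
}

def fill_slot(info, slot, value):
    # Called whenever the NLU unit extracts an entity (via ES or RegEx);
    # later user messages may overwrite a slot with updated information.
    info[slot] = value

fill_slot(informational_dictionary, "topic", "Elementares Rechnen")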

3.2.4 Dialogue Manager

A conventional Dialogue Manager (DM) consists of two interacting components. The first component is the Dialogue State Tracker (DST), which maintains a representation of the current conversational state, and the second is Policy Learning (PL), which defines the next system action (i.e., response).

In our model, every agent's next action is determined by the state of the previously obtained information accumulated in the Informational Dictionary (ID). For instance, if the system recognizes that the student works on the final examination, it also knows (defined by hand-coded logic in the predefined rules) that there is no necessity to ask for a sub-topic, because the final examination always corresponds to a chapter level (due to the layout of the OMB+ platform). If the system recognizes that the user struggles in solving a quiz, it has to ask for both the corresponding topic and sub-topic, because the quiz always relates to a section level.

To discover all of the potential conversational flows, we implement Mutually Exclusive Rules (MER), which indicate that two events e1 and e2 are mutually exclusive or disjoint, since they cannot both occur at the same time (i.e., the intersection of these events is empty: P(e1 ∩ e2) = 0). Additionally, we defined transition and mapping rules. Hence, only particular entities can co-occur, while others must be excluded from the specific rule combination. Assume a topic t =

Geometry. According to the MERs, this topic can co-occur with one of the pertaining sub-topics defined at the OMB+ (i.e., s = Angle). Thus, the level of the examination would be l = Section (but l ≠ Chapter), because the student already reported a sub-section he or she is working on. In this specific case, the examination mode could be e ∈ [Training, Exercise, Quiz], but not e = Final_Examination. This is because e = Final_Examination can co-occur only with a topic (chapter level) but not with a sub-topic (section level). The question number q can be arbitrary and has to co-occur with every other combination.


This allows a formal solution to be found. Assume a list of all theoretically possible dialogue states S = [topic, sub-topic, training, exercise, chapter level, section level, quiz, final examination, question number], and for each element s_n in S it holds that:

s_n = \begin{cases} 1 & \text{if information is given} \\ 0 & \text{otherwise} \end{cases} \quad (1)

This gives all general (resp. possible) dialogue states without reference to the design of the OMB+ platform. However, to make the dialogue states fully suitable for OMB+, from the general states we select only those which are valid. To define the validity of a state, we specify the following five MERs:

R1: Topic

¬T
 T

Table 2: Rule 1 – Admissible topic configurations.

Rule (R1) in Table 2 denotes admissible configurations for the topic and means that the topic can be either given (T) or not (¬T).

R2: Examination Mode

¬TR  ¬E  ¬Q  ¬FE
 TR  ¬E  ¬Q  ¬FE
¬TR   E  ¬Q  ¬FE
¬TR  ¬E   Q  ¬FE
¬TR  ¬E  ¬Q   FE

Table 3: Rule 2 – Admissible examination mode configurations.

Rule (R2) in Table 3 denotes that either no information on the examination mode is given, or the examination mode is Training (TR), Exercise (E), Quiz (Q), or Final Examination (FE), but not more than one mode at the same time.

R3: Level

¬CL  ¬SL
 CL  ¬SL
¬CL   SL

Table 4: Rule 3 – Admissible level configurations.

Rule (R3) in Table 4 indicates that either no level information is provided, or the level corresponds to the chapter level (CL) or to the section level (SL), but not to both at the same time.


R4: Examination & Level

¬TR  ¬E  ¬SL  ¬CL
 TR  ¬E   SL  ¬CL
 TR  ¬E  ¬SL   CL
 TR  ¬E  ¬SL  ¬CL
¬TR   E   SL  ¬CL
¬TR   E  ¬SL   CL
¬TR   E  ¬SL  ¬CL

Table 5: Rule 4 – Admissible examination mode (only for Training and Exercise) and corresponding level configurations.

Rule (R4) in Table 5 means that the Training (TR) and Exercise (E) examination modes can either belong to the chapter level (CL) or to the section level (SL), but not to both at the same time.

R5: Topic & Sub-Topic

¬T  ¬ST
 T  ¬ST
¬T   ST
 T   ST

Table 6: Rule 5 – Admissible topic and corresponding sub-topic configurations.

Rule (R5) in Table 6 symbolizes that we can be given either only a topic (T), or the combination of a topic and a sub-topic (ST) at the same time, or only a sub-topic, or no information on this point at all.

We then define a valid dialogue state as a dialogue state that meets all requirements of the abovementioned rules:

\forall s\, (State(s) \land R_{1\text{-}5}(s)) \rightarrow Valid(s) \quad (2)

Once we have the valid states for dialogues, we perform a mapping from each valid dialogue state to the next possible system action. For that, we first define five transition rules. The order of the transition rules is important.

T_1 = \lnot T \quad (3)

means that no topic (T) is found in the ID (i.e., it could not be extracted from the user input).

T_2 = \lnot EM \quad (4)

indicates that no examination mode (EM) is found in the ID.


T_3 = EM, \text{ where } EM \in [TR, E] \quad (5)

denotes that the extracted examination mode (EM) is either Training (TR) or Exercise (E).

T_4 = \begin{cases} T, \lnot ST, TR, SL \\ T, \lnot ST, E, SL \\ T, \lnot ST, Q \end{cases} \quad (6)

means that no sub-topic (ST) is provided by the user, but the ID either already contains the combination of topic (T), training (TR), and section level (SL), or the combination of topic, exercise (E), and section level, or the combination of topic and quiz (Q).

T_5 = \lnot QNR \quad (7)

indicates that no question number (QNR) was provided by the student (or it could not be successfully extracted).

Finally, we assume the list of possible next actions for the system:

A = \begin{cases} \text{Ask for Topic} \\ \text{Ask for Examination Mode} \\ \text{Ask for Level} \\ \text{Ask for Sub-Topic} \\ \text{Ask for Question Nr} \end{cases} \quad (8)

Following the transition rules, we map each valid dialogue state to the possible next action a_m \in A:

\exists a\, \forall s\, (Valid(s) \land T_1(s)) \rightarrow a(s, T) \quad (9)

In the case where no topic is provided, the next action is to ask for the topic (T).

\exists a\, \forall s\, (Valid(s) \land T_2(s)) \rightarrow a(s, EM) \quad (10)

If no examination mode is provided by the user (or it could not be successfully extracted from the user query), the next action is defined as ask for examination mode (EM).

\exists a\, \forall s\, (Valid(s) \land T_3(s)) \rightarrow a(s, L) \quad (11)

In the case where we know that the examination mode EM ∈ [Training, Exercise], we have to ask about the level (i.e., training at chapter level or training at section level); thus, the next action is ask for level (L).


\exists a\, \forall s\, (Valid(s) \land T_4(s)) \rightarrow a(s, ST) \quad (12)

If no sub-topic is provided, but the examination mode EM ∈ [Training, Exercise] is at section level, the next action is defined as ask for sub-topic (ST).

\exists a\, \forall s\, (Valid(s) \land T_5(s)) \rightarrow a(s, QNR) \quad (13)

If no question number is provided by the user, then the next action is ask for question number (QNR).

Following this scheme, we generate 56 state transitions, which specify the next system actions. Being in a new conversation state, we compare the extracted (or updated) data in the Informational Dictionary (ID) with the valid dialogue states and select the mapped action as the next system action.
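For illustration, the sketch below condenses the transition rules T1-T5 into a simple next-action function over the ID; it is a simplification (the deployed system instead looks up one of the 56 precomputed transitions), and the slot names are assumptions.

# Simplified next-action selection mirroring transition rules T1-T5;
# the deployed system instead looks up one of 56 precomputed transitions.
def next_action(info):
    if info.get("topic") is None:
        return "Ask for Topic"                                        # T1
    if info.get("exam_mode") is None:
        return "Ask for Examination Mode"                             # T2
    if info["exam_mode"] in ("Training", "Exercise") and info.get("exam_level") is None:
        return "Ask for Level"                                        # T3
    section_mode = info["exam_mode"] in ("Training", "Exercise") and info.get("exam_level") == "Section"
    if info.get("sub_topic") is None and (section_mode or info["exam_mode"] == "Quiz"):
        return "Ask for Sub-Topic"                                    # T4
    if info.get("question_nr") is None:
        return "Ask for Question Nr"                                  # T5
    return None  # ID complete -> proceed to the verification step

print(next_action({"topic": "Geometrie", "exam_mode": None}))  # Ask for Examination Mode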

flexibility of dst: The DST transitions outlined above are meant to maintain the current configuration of the OMB+ learning platform. Still, further MERs could be effortlessly generated to create new transitions. Examining this, we performed tests with the previous design of the platform, where all the topics except for the first one had sub-topics, whereas the first topic contained both sub-topics and sub-sub-topics. We could produce the missing transitions with the described approach. The number of permissible transitions in this case increased from 56 to 117.

3.2.5 Meta Policy

Since the proposed system is expected to operate in a real-world setting, we had to develop additional policies that manage the dialogue flow and verify the system's accuracy. We explain these policies below.

completeness of informational dictionary: The model automatically verifies the completeness of the Informational Dictionary (ID), which is determined by the number of essential slots filled in the informational dictionary. There are 6 discrete cases when the ID is deemed to be complete. For example, if a user works in the final examination mode, the agent does not have to request a sub-topic or examination level. Hence, the ID

has to be filled only with data for a topic, examination type, and question number. Whereas, if the user works in training mode, the system has to collect information about the topic, sub-topic, examination level, examination mode, and the question number. Once the ID is intact, it is provided to the verification step. Otherwise, the agent proceeds according to the next action. The system extracts the information in each dialogue state, and therefore, if the user gives updated information on any


subject later in the dialogue history, the corresponding slot will be updated in the ID.

Below are examples of two final cases (out of six) where the Informational Dictionary (ID) is considered to be complete.

Case_1 = \begin{cases} \text{intent} = \text{Math} \\ \text{topic} \neq \text{None} \\ \text{exam mode} = \text{Final Examination} \\ \text{level} = \text{None} \\ \text{question nr} \neq \text{None} \end{cases} \quad (14)

Table 7: Case 1 – Any topic, examination mode is final examination, examination level does not matter, any question number.

Case_2 = \begin{cases} \text{intent} = \text{Math} \\ \text{topic} \neq \text{None} \\ \text{sub-topic} \neq \text{None} \\ \text{exam mode} \in [\text{Training}, \text{Exercise}] \\ \text{level} = \text{Section} \\ \text{question nr} \neq \text{None} \end{cases} \quad (15)

Table 8: Case 2 – Any topic, any related sub-topic, examination mode is either training or exercise, examination level is section, any question number.

verification step: After the system has collected all the essential data (i.e., the ID is complete), it continues to the final verification step. Here, the obtained data is presented to the current student in the dialogue session. The student is asked to review the exactness of the data and, if any entries are faulty, to correct them. The Informational Dictionary (ID) is, where necessary, updated with the user-provided data. This procedure repeats until the user confirms the correctness of the compiled data.

fallback policy: In the cases where the system fails to retrieve information from a student's query, it re-asks the user and attempts to retrieve information from a follow-up response. The maximum number of re-ask trials is a parameter and is set to three (r = 3). If the system is still incapable of extracting information, the user input is considered as the ground truth and filled into the appropriate slot in the ID. An exception to this rule applies where the user has to specify the intent manually: after three unclassified trials, a dialogue session is directly handed over to a human tutor.


human request: In every dialogue state, a user can shift to a human tutor. For this, a user can enter the "human" (or "Mensch" in the German language) key-word. Therefore, every user message is investigated for the presence of this key-word.

3.2.6 Response Generation

The semantic representation of the system's next action is transformed into a natural language utterance. Hence, each possible action is mapped to precisely one response that is predefined in the template. Some of the responses are fixed (i.e., "Welches Kapitel bearbeitest du gerade?"),2 while others have placeholders for custom values. In the latter case, the utterance can be formulated depending on the Informational Dictionary (ID).

Assume the predefined response: "Deine Angaben waren wie folgt: a) Kapitel: K, b) Abschnitt: A, c) Aufgabentyp: M, d) Aufgabe: Q, e) Ebene: E. Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein."3 The slots [K, A, M, Q, E] are the placeholders for the information stored in the ID. During the conversation, the system selects the template according to the next action and fills the slots with extracted values if required. After the extracted information is filled into the corresponding placeholders in the utterance, the final response could be as follows: "Deine Angaben waren wie folgt: a) Kapitel: I Elementares Rechnen, b) Abschnitt: Rechenregel für Potenzen, c) Aufgabentyp: Quiz, d) Aufgabe: 1(a), e) Ebene: Abschnitt. Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein."4
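A minimal sketch of this placeholder substitution is given below; the slot keys and the string-formatting mechanism are illustrative, not the exact template engine of the system.

# Illustrative template filling for the verification response; the slot keys
# K, A, M, Q, E follow the placeholders described above.
TEMPLATE = (
    "Deine Angaben waren wie folgt: a) Kapitel: {K}, b) Abschnitt: {A}, "
    "c) Aufgabentyp: {M}, d) Aufgabe: {Q}, e) Ebene: {E}. "
    "Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein."
)

extracted_values = {
    "K": "I Elementares Rechnen",
    "A": "Rechenregel für Potenzen",
    "M": "Quiz",
    "Q": "1(a)",
    "E": "Abschnitt",
}

print(TEMPLATE.format(**extracted_values))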

Below we demonstrate five sample dialogues, in the original German language, with variable flows:

• a short dialogue where certain information is already provided in the first student's input

• an organizational question case

• a contextual question case

• an example with manual intent classification

• an example with the fallback and human policies

• a long flow with entry correction at the final verification phase

2 Translates to: "Which chapter are you currently working on?"
3 Translates to: "Your details were as follows: a) Chapter, b) Section, c) Examination Mode, d) Question Number, e) Level. Did I understand you correctly? Please answer yes or no."

4 Translates to: "Your details were as follows: a) Elementary Calculus, b) Power Laws, c) Quiz, d) Question 1 (a), e) Section Level. Did I understand you correctly? Please answer yes or no."


Student: Hallo, ich habe Schwierigkeiten beim Kapitel I Elementares Rechnen mit der Übungsaufgabe 1 a.

Next-Action: [Ask for a Level: Chapter or Section]
Bot: Geht es um eine Aufgabe auf Kapitel-Ebene (z.B. um eine Trainingsaufgabe im Kapitel "Geometrie") oder um eine Aufgabe auf der Ebene eines Abschnitts in einem Kapitel (z.B. um eine Aufgabe zum Abschnitt "Winkel" im Kapitel "Geometrie")? Antworte bitte mit KAP für Kapitel und SEK für Abschnitt.

Student: Kapitel

Next-Action: [Final-Statement]
Bot: Deine Angaben waren wie folgt:
a) Kapitel: I Elementares Rechnen
b) Aufgabentyp: Exercise
c) Aufgabe: 1a
d) Ebene: Kapitel
Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein.

Student: yep

Next-Action: [Human Hand-Over]
Bot: Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir.

Table 9: Showcase 1 – Short flow; most of the information was extracted from the initial user input. In bold are the extracted entities.

Student: Hallo, ich habe eine Frage bzgl. eines Zertifikates, und zwar wie kann ich es austellen lassen?

Next-Action: [Organisational Question]
Bot: Es scheint sich wohl um eine organisatorische Frage zu handeln. Bitte fasse deine Frage in einem kurzen Text zusammen, damit ich sie an meinen menschlichen Kollegen weiterleiten kann.

Student: Wie bekomme ich ein Zertifikat?

Next-Action: [Human Hand-Over]
Bot: Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir.

Table 10: Showcase 2 – Organisational Question. Underlined are the key-words which point to the organisational intent.


Student: hallo

Next-Action: [UNK]
Bot: Hast du eine Frage zu einer Aufgabe (MATH), zu einem Text im Kurs (TEXT) oder eine organisatorische Frage (ORG)?

Student: ma

Next-Action: [UNK]
Bot: Hast du eine Frage zu einer Aufgabe (MATH), zu einem Text im Kurs (TEXT) oder eine organisatorische Frage (ORG)?

Student: math

Next-Action: [Ask for Topic]
Bot: Welches Kapitel bearbeitest du gerade?

Student: Lineare Gleichungssysteme

Next-Action: [Ask for Examination Mode]
Bot: Bearbeitest du eine Übungsaufgabe, eine Trainingsaufgabe, einen Quiz oder eine Schlussprüfung?

Student: human

Next-Action: [Human Hand-Over]
Bot: Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir.

Table 11: Showcase 3 – Manual intent classification, Fallback and Human Request policies (in bold type).

Student: Halli-Hallo, könnte mir bitte jemand die Regel 72 (vertikale streckung und stauchung des graphen) erklären? Danke im Voraus.

Next-Action: [Contextual Question]
Bot: Es scheint sich wohl um eine inhaltliche Frage zu handeln. Bitte fasse deine Frage in einem kurzen Text zusammen, damit ich sie an meinen menschlichen Kollegen weiterleiten kann.

Student: Ich habe Schwierigkeiten beim Verstehen der Regel 72. Was ist genau die vertikale Streckung des Graphen?

Next-Action: [Human Hand-Over]
Bot: Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir.

Table 12: Showcase 4 – Contextual Question. Underlined are the key-words which point to the contextual intent.


Student: Hallo, ich kann beim Kapitel 1 nicht weiterkommen, Trainingsaufgabe 1 (a).

Next-Action: [Ask for Level]
Bot: Geht es um eine Aufgabe auf Kapitel-Ebene (z.B. um eine Trainingsaufgabe im Kapitel Geometrie) oder um eine Aufgabe auf der Ebene eines Abschnitts in einem Kapitel (z.B. um eine Aufgabe zum Abschnitt Winkel im Kapitel Geometrie)? Antworte bitte mit KAP für Kapitel und SEK für Abschnitt.

Student: ich glaube es ist eine Sektion

Next-Action: [Ask for Sub-Topic]
Bot: Wie heißt der Abschnitt, in dem sich die Aufgabe befindet, auf die sich deine Frage bezieht?

Student: ich bearbeite rechenregeln und potenzen

Next-Action: [Final Statement]
Bot: Deine Angaben waren wie folgt:
a) Kapitel: I Elementares Rechnen
b) Abschnitt: Rechenregel für Potenzen
c) Aufgabentyp: Quiz
d) Aufgabe: 1 (a)
e) Ebene: Abschnitt
Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein.

Student: nope

Next-Action: [Ask_For]
Bot: Welche der folgenden Punkte habe ich falsch erkannt? Schreibe bitte, welche der folgenden Punkte es sind: a, b, c, d, e.

Student: c

State-Transition: [Correct-Fields]
Bot: Gib bitte die korrekte Information für c) ein.

Student: Training

State-Transition: [Final Statement]
Bot: Deine Angaben waren wie folgt:
a) Kapitel: I Elementares Rechnen
b) Abschnitt: Rechenregel für Potenzen
c) Aufgabentyp: Training
d) Aufgabe: 1 (a)
e) Ebene: Abschnitt
Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein.

Student: yes

State-Transition: [Human Hand-Over]
Bot: Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir.

Table 13: Showcase 5: Long Flow, Correction of entries. In bold type are the extracted entities.


Figure 4: Dialogue Flow. Abbreviations: UNK - unknown, ID - informational dictionary, RegEx - regular expressions.


3.3 evaluation

In order to get feedback on the quality, functionality, and usefulness of the proposed model, we evaluated it in two steps: first, with an automated method (Section 3.3.1), utilizing 130 manually annotated conversations to verify the robustness of the system; second, with the help of human tutors (Section 3.3.2) from OMB+, to examine the user experience. We describe the details as well as the most common errors below.

3.3.1 Automated Evaluation

To manage an automated evaluation, we manually assembled a dataset with 130 conversational flows. We predefined viable initial user questions, user answers, as well as gold system responses. These flows cover the most frequent dialogues that we previously observed in human-to-human dialogue data. We examined our system by implementing a self-chatting evaluation bot. The evaluation cycle is as follows:

• The system accepts the first question from a predefined dialogue via an API request, preprocesses and analyzes it to determine the intent and extract entities.

• Then it estimates the appropriate next action and responds accordingly.

• This response is then compared to the system's gold answer. If the predicted result is correct, then the system receives the next predefined user input from the templated dialogue and responds again as defined above. This procedure proceeds until the dialogue terminates (i.e., the ID is complete). Otherwise, the system reports an unsuccessful case number.

The system successfully passed this evaluation for all 130 cases.
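A minimal sketch of such a self-chatting evaluation loop is given below; the dialogue_system client with a respond() method and the list-of-pairs template format are assumptions for illustration, not the actual interface of the deployed system.

def evaluate_flows(dialogue_system, template_dialogues):
    """Replay predefined flows and report which ones fail.

    template_dialogues is assumed to be a list of flows, each flow a list
    of (user_input, gold_response) pairs; dialogue_system.respond() is a
    hypothetical wrapper around the deployed bot's API.
    """
    failed_cases = []
    for case_id, flow in enumerate(template_dialogues):
        for user_input, gold_response in flow:
            predicted = dialogue_system.respond(user_input)
            if predicted != gold_response:
                failed_cases.append(case_id)   # report the unsuccessful case number
                break                          # stop replaying this flow
    passed = len(template_dialogues) - len(failed_cases)
    print(f"Passed {passed} of {len(template_dialogues)} flows")
    return failed_cases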

3.3.2 Human Evaluation & Error Analysis

To evaluate the system on irregular cases, we carried out experiments with human tutors from the OMB+ platform. The tutors are experienced regarding rare or complex issues, ambiguous responses, misspellings, and other infrequent but still relevant problems which occur during a natural language dialogue. In the following, we review some common errors and describe additional observations.

misspellings and confusable spelling frequently occur in user-generated text. Since we attempt to let the conversation remain natural to provide a better user experience, we do not instruct students to write formally. Therefore, we have to account for various writing issues. One of the prevalent problems


are misspellings. German words are commonly long and can be complex, and because users often type quickly, this can lead to a wrong order of characters within a given word. To tackle this challenge, we utilized fuzzy match within ES. However, the maximum allowed edit distance in ElasticSearch (ES) is set to 2 characters, which means that all misspellings beyond this threshold could not be correctly identified by ES (e.g., Differenzialrechnung vs. Differnetialrechnung). Another representative example is the formulation of the section or question number. The equivalent information can be communicated in several discrete ways, which has to be considered by the RegEx unit (e.g., Aufgabe 5 a, Aufgabe V a, Aufgabe 5 (a)). A similar problem occurs with confusable spelling (i.e., Differenzialrechnung vs. Differenzialgleichung).

We analyzed the cases stated above and improved our model by adding some of the most common misspellings to the ElasticSearch (ES) database or handling them with RegEx during the preprocessing step.
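As an illustration of the kind of pattern the RegEx unit has to cover, the question-number formulations above could be normalized along the following lines; the pattern and the normalization table are illustrative and not the exact rules used in the system.

import re

# Illustrative pattern for question-number variants such as
# "Aufgabe 5 a", "Aufgabe V a" or "Aufgabe 5 (a)".
QUESTION_NR = re.compile(
    r"Aufgabe\s+(?P<number>\d+|[IVX]+)\s*\(?\s*(?P<part>[a-z])?\s*\)?",
    re.IGNORECASE,
)
ROMAN = {"I": "1", "II": "2", "III": "3", "IV": "4", "V": "5"}

def extract_question_nr(text):
    """Return a normalized question number such as '5a', or None."""
    match = QUESTION_NR.search(text)
    if not match:
        return None
    number = ROMAN.get(match.group("number").upper(), match.group("number"))
    part = match.group("part") or ""
    return f"{number}{part}"   # all three surface forms above map to "5a"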

elasticsearch threshold We observed both cases where the system failed to extract information although the user provided it, and examples where ES extracted information not mentioned in a user query. According to our understanding, these errors occur due to the relevancy scoring algorithm of ElasticSearch (ES), where a document's score is a combination of textual similarity and other metadata-based scores. Our examination revealed that ES mostly fails to extract the information if the user message is rather short (e.g., about 5 tokens). To overcome this difficulty, we coupled the current input $u_t$ with the dialogue history $(h = u_{t \cdots t-2})$. That eliminated the problem and improved the retrieval quality.

Solving the opposite problem, where ES extracts false information (or information that was not mentioned in a query), was more challenging. We learned that the problem comes from short words or sub-words (i.e., suffixes, prefixes) which ES locates in a database and considers credible enough. The ES documentation suggests getting rid of stop words to eliminate this behavior; however, this did not improve the search. Also, fine-tuning of ES parameters such as the relevance threshold, prefix length5, and minimum should match6 did not result in notable improvements. To cope with this problem, we implemented a verification step where a user can edit the erroneously retrieved data.
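For reference, a fuzzy match query of the kind discussed here might look as follows; the index and field names are made up for illustration, while fuzziness, prefix_length, and minimum_should_match are the standard ElasticSearch parameters referred to above.

# Sketch of a fuzzy ElasticSearch match query; the index "omb_chapters"
# and the field "chapter_name" are illustrative names only.
fuzzy_query = {
    "query": {
        "match": {
            "chapter_name": {
                "query": "Differnetialrechnung",    # misspelled user input
                "fuzziness": 2,                     # maximum allowed edit distance
                "prefix_length": 1,                 # leading characters exempt from fuzziness
                "minimum_should_match": "75%",      # share of terms that must match
            }
        }
    }
}
# With the official Python client this would be sent as, e.g.:
# es.search(index="omb_chapters", body=fuzzy_query)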

5 The number of initial characters that are not fuzzified. This parameter helps to decrease the number of terms which must be reviewed.

6 Indicates the number of terms that must match for a document to be considered relevant.


3.4 structured dialogue acquisition

As we stated, the implemented system not only supports the human tutor by assisting students but also collects structured and labeled training data in the background. In a trial run of the rule-based system, we accumulated a toy dataset with dialogues. The assembled dataset includes the following (a sketch of a single collected record is shown after this list):

• Plain dialogues with unique dialogue-indexes.

• Plain ID information collected for the whole conversation.

• Pairs of questions (i.e., user requests) and responses (i.e., bot replies) with unique dialogue- and turn-indexes.

• Triples in the form of (Question, Next Action, Response). Information on the next system's action could be employed to train a DM unit with (deep) machine learning algorithms.

• For every conversational state, we keep the entities that the system was able to extract, along with their position in the utterance. This information could be used to train a custom domain-specific NER model.
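The following sketch shows how one such collected training example might be structured; the field names and values are hypothetical and do not reflect the exact export format of the system.

# Hypothetical structure of a single collected training example.
example_record = {
    "dialogue_id": 17,
    "turn_id": 3,
    "question": "ich bearbeite rechenregeln und potenzen",   # user request
    "next_action": "Final Statement",                        # label for the DM unit
    "response": "Deine Angaben waren wie folgt: ...",        # bot reply
    "entities": [                                            # labels for the NER unit
        {
            "label": "subtopic",
            "value": "Rechenregeln für Potenzen",
            "span": [14, 39],   # character offsets in the user request
        }
    ],
}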

3.5 summary

In this work, we realized a conversational agent for IPA purposes that addresses two problems. First, it decreases repetitive and time-consuming activities, which allows workers of the e-learning platform to focus solely on mathematical and hence cognitively demanding questions. Second, by interacting with users, it augments the resources with structured and labeled training data. The latter can be utilized for further implementation of learnable dialogue components.

The realization of such a system was connected with multiple challenges. Among others were missing structured conversational data, ambiguous or erroneous user-generated text, and the necessity to deal with existing corporate tools and their design.

The proposed conversational agent enabled us to accumulate structured, labeled data without any special efforts from the human (i.e., tutors') side (e.g., manual annotation and post-processing of existing data, changes of the conversational structure within the tutoring process). Once we collected structured dialogues, we were able to re-train particular segments of the system with deep learning methods and achieved consistent performance for both of the proposed tasks. We report the re-implemented elements in Chapter 4.

At the time of writing this thesis, the system described in this work was deployed at the OMB+ platform.


3.6 future work

Possible extensions of our work are:

• In this work, we implemented a rule-based dialogue model from scratch and adjusted it for the specific domain (i.e., e-learning). The core of the model is a dialogue manager that determines the current state of the conversation and the possible next action. Rule-based systems are generally considered to be hardly adaptable to new domains; however, our dialogue manager proved to be flexible to slight modifications in a workflow. One of the possible directions of future work would thus be the investigation of the general adaptability of the dialogue manager core to other scenarios and domains (e.g., a different course).

• A further possible extension would be the introduction of different languages. The current model is implemented for the German language, but the OMB+ platform provides this course also in English and Chinese. Therefore, it would be interesting to investigate whether it is possible to adjust the existing system to other languages without drastic changes in the model.


4 Deep Learning Re-Implementation of Units

This chapter partially covers work already published at international peer-reviewed conferences. The relevant publication is Moiseeva et al., 2020. The research described in this chapter was carried out in its entirety by the author of this thesis. The other authors of the publication acted as advisors.

In this chapter, we address the challenge of re-implementing two central components of our IPA dialogue system described in Chapter 3 by employing deep learning techniques. Since the data for training was collected in a (short) trial run of the rule-based system, it is not sufficient to train a neural network from scratch. Therefore, we utilize Transfer Learning (TL) methods and employ the previously pre-trained Bidirectional Encoder Representations from Transformers (BERT) model, which we fine-tune for two of our custom tasks.

4.1 introduction

Data mining and machine learning technologies have gained notable advancement in many research fields, including but not limited to computer vision, robotics, advanced analytics, and computational linguistics (Pan and Yang, 2009; Trautmann et al., 2019; Werner et al., 2015). Besides that, in recent years, deep neural networks, with their ability to accurately map from inputs to outputs (i.e., labels, images, sentences) while analyzing patterns in data, became a potential solution for many industrial automatization tasks (e.g., fake news detection, sentiment analysis).

Nevertheless, conventional supervised models still have limitations when applied to real-world data and industrial tasks. The first challenge here is the training phase, since a robust model requires an immense amount of structured and labeled data. Still, there are just a few cases where such data is publicly available (e.g., research datasets), but for most tasks, domains, and languages, the data has to be gathered over a long time. The second hurdle is that, if trained on existing data from a distinct domain, a supervised model cannot generalize to conditions that are different from those faced in the training dataset. Most machine learning methods perform correctly only under a general assumption: the training and test data are mapped to the same feature space and the same distribution (Pan and Yang, 2009). When the distribution changes, most statistical models need to be retrained from scratch using newly obtained training


samples. This is especially crucial when applying such approaches to real-world data, which is usually very noisy and contains a large number of scenarios. In this case, a model would be ill-trained to make decisions about novel patterns. Furthermore, for most real-world applications, it is expensive or even not feasible at all to recollect the required training data to retrain the model. Hence, the possibility to transfer the knowledge obtained during training on existing data to a new, unseen domain is one of the possible solutions for industrial problems. Such a technique is called Transfer Learning (TL), and it allows dealing with the abovementioned scenarios by leveraging existing labeled data of some related tasks or domains.

4.1.1 Outline and Contributions

This work addresses the challenge of re-implementing two out of three central components in our rule-based IPA conversational assistant with deep learning methods. As we discussed in Chapter 2, hybrid dialogue models (i.e., consisting of both rule-based and trainable components) appear to replace the established rule- and template-based methods, which are for the most part employed in the industrial setting. To investigate this assumption and examine current state-of-the-art deep learning techniques, we conducted experiments on our custom domain-specific data. Since the available data were collected in a trial run, and the obtained dataset was too small to train a conventional machine learning model from scratch, we utilized the Transfer Learning (TL) approach and fine-tuned an existing pre-trained model for our target domain.

For the experiments, we defined two tasks:

• First, we considered the NER problem in a custom domain setting. We defined a sequence labeling task and employed Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al., 2018), one of the transfer learning methods widely utilized in the NLP area. We applied the model to our dataset and fine-tuned it for seven (7) domain-specific (i.e., e-learning) entities.

• Second, we examined the effectiveness of transfer learning for the dialogue manager core. The DM is one of the fundamental parts of the conversational agent and requires compelling learning techniques to hold the conversation intelligently. For that experiment, we defined a classification task and applied the BERT model to predict the system's next action for every given user input in a conversation. We then computed the macro F1-score for 14 possible classes (i.e., actions) and an average dialogue accuracy.


The main characteristics of both settings are:

• the set of named entities and labels (i.e., actions) between source and target domain is different;

• for both settings, we utilized the BERT model in German and multilingual versions with additional parameters;

• the dataset used to train the target model is small and consists of highly domain-specific labels. The latter often appears in industrial scenarios.

We finally verified that the selected method performed reasonably well on both tasks. We reached a performance of 0.93 macro F1 points for Named Entity Recognition (NER) and 0.75 macro F1 points for the Next Action Prediction (NAP) task. Therefore, we conclude that both NER and NAP components could be employed to substitute or extend the existing rule-based modules.

4.2 related work

As discussed in the previous section, the main benefit of Transfer Learning (TL) is that it allows the domains and tasks used during the training and testing phases to be different. Initial steps in the formulation of transfer learning were made after the NeurIPS-95 workshop1 that was focused on the demand for lifelong machine-learning methods that preserve and reuse previously learned knowledge (Chen et al., 2018). Research on transfer learning has attracted more and more attention since then, especially in the computer vision domain and in particular for image recognition tasks (Donahue et al., 2014; Sharif Razavian et al., 2014).

Recent studies have shown that TL can also be successfully applied within the Natural Language Processing (NLP) domain, especially on semantically equivalent tasks (Dai et al., 2019; Devlin et al., 2018; Mou et al., 2016). Several research works were carried out specifically on the Named Entity Recognition (NER) task and explored the capabilities of transfer learning when applied to different named entity categories (i.e., different output spaces). For instance, Qu et al. (2016) pre-trained a linear-chain Conditional Random Field (CRF) on a large amount of labeled data in the source domain. Then they introduced a two-linear-layer neural network that learns the difference between the source and target label distributions. Finally, the authors initialized another CRF with the parameters learned in the linear layers to predict the labels of the target domain. Kim et al. (2015) conducted experiments with transferring features and model parameters between related domains where the label classes are different but may have a semantic similarity. Their primary approach was to construct label embeddings to automatically map the source and target label classes to improve the transfer (Chen et al., 2018).

1 Conference on Neural Information Processing Systems. Formerly NIPS.


However, there are also models trained in a manner such that they can be applied to multiple NLP tasks simultaneously. Among such architectures are ULMFit (Howard and Ruder, 2018), BERT (Devlin et al., 2018), GPT-2 (Radford et al., 2019), and Transformer-XL (Dai et al., 2019) – powerful models that were recently proposed for transfer learning within the NLP domain. These architectures are called language models since they attempt to learn language structure from large textual collections in an unsupervised manner and then predict the next word in a sequence based on previously observed context words. The Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al., 2018) model is one of the most popular architectures among the abovementioned, and it is also one of the first that showed close to human-level performance on the General Language Understanding Evaluation (GLUE) benchmark2 (Wang et al., 2018), which includes tasks on sentiment analysis, question answering, semantic similarity, and language inference. The architecture of BERT consists of stacked transformer blocks that are pre-trained on a large corpus consisting of 800M words from modern English books (i.e., the BooksCorpus by Zhu et al.) and 2,500M words of text from pre-processed (i.e., without markup) English Wikipedia articles. Overall, BERT advanced the state-of-the-art performance for eleven (11) NLP tasks. We describe this approach in detail in the subsequent section.

2 https://gluebenchmark.com

4.3 bert: bidirectional encoder representations from transformers

The Bidirectional Encoder Representations from Transformers (BERT) architecture is divided into two steps: pre-training and fine-tuning. The pre-training step was enabled through the two following tasks:

• Masked Language Model – the idea behind this approach is to train a deep bidirectional representation, which is different from conventional language models that are either trained left-to-right or right-to-left. To train a bidirectional model, the authors mask out a certain number of tokens in a sequence. The model is then asked to predict the original words for the masked tokens. It is essential that, in contrast to denoising auto-encoders (Vincent et al., 2008), the model does not need to reconstruct the entire input; instead, it attempts to predict the masked words. Since the model does not know which words exactly it will be asked about, it learns a representation for every token in a sequence (Devlin et al., 2018).

• Next Sentence Prediction – aims to capture relationships between two sentences A and B. In order to train the model, the authors pre-trained a binarized next sentence prediction task where pairs of sequences were labeled according to the criteria: A and B either follow each other directly in the corpus (label


IsNext) or are both taken from random places (label NotNext). The model is then asked to predict whether the former or the latter is the case. The authors demonstrated that pre-training towards this task is extremely useful for both question answering and natural language inference tasks (Devlin et al., 2018).

Furthermore, BERT uses WordPiece tokenization for embeddings (Wu et al., 2016), which segments words into subword-level tokens. This approach allows the model to make additional inferences based on word structure (i.e., an -ing ending denotes similar grammatical categories).

The fine-tuning step follows the pre-training process. For that, the BERT model is initialized with the parameters obtained during the pre-training step, and a task-specific fully-connected layer is used after the transformer blocks, which maps the general representations to custom labels (employing a softmax operation).

self-attention The key operation of the BERT architecture is self-attention, which is a sequence-to-sequence operation with a conventional encoder-decoder structure (Vaswani et al., 2017). The encoder maps an input sequence $(x_1, x_2, \dots, x_n)$ to a sequence of continuous representations $z = (z_1, z_2, \dots, z_n)$. Given $z$, the decoder then generates an output sequence $(y_1, y_2, \dots, y_m)$ of symbols, one element at a time (Vaswani et al., 2017). To produce the output vector $y_i$, the self-attention operation takes a weighted average over all input vectors:

$$y_i = \sum_j w_{ij} x_j \qquad (16)$$

where $j$ is an index over the entire sequence and the weights sum to one (1) over all $j$. The weight $w_{ij}$ is derived from a dot product function over $x_i$ and $x_j$ (Bloem, 2019; Vaswani et al., 2017):

$$w'_{ij} = x_i^{T} x_j \qquad (17)$$

The dot product returns a value between negative and positive infinity, and a softmax maps these values to $[0, 1]$ to ensure that they sum to one (1) over the entire sequence (Bloem, 2019; Vaswani et al., 2017):

$$w_{ij} = \frac{\exp w'_{ij}}{\sum_j \exp w'_{ij}} \qquad (18)$$

Self-attention is not the only operation in transformers, but it is the essential one, since it propagates information between vectors. Other operations in the transformer are applied to every vector in the input sequence without interactions between vectors (Bloem, 2019; Vaswani et al., 2017).
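A minimal NumPy sketch of this basic, unparameterized self-attention operation (Equations 16-18) is given below; the learned query, key, and value projections of the full transformer are deliberately omitted.

import numpy as np

def basic_self_attention(x):
    """Unparameterized self-attention over a sequence of input vectors.

    x has shape (n, d): n input vectors x_1 ... x_n of dimension d.
    Returns y of shape (n, d), where each y_i is a weighted average of
    all x_j as in Equations 16-18.
    """
    raw = x @ x.T                                            # w'_ij = x_i^T x_j  (Eq. 17)
    raw = raw - raw.max(axis=1, keepdims=True)               # numerical stability
    weights = np.exp(raw)
    weights = weights / weights.sum(axis=1, keepdims=True)   # softmax  (Eq. 18)
    return weights @ x                                       # y_i = sum_j w_ij x_j  (Eq. 16)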


4.4 data & descriptive statistics

The benchmark that we accumulated through the trial run includes 300 structured and partially labeled dialogues, with the average length of a dialogue being six (6) utterances. All the conversations were held in the German language. Detailed general statistics can be found in Table 14.

                                          Value
Max. Len. Dialogue (in utterances)        15
Avg. Len. Dialogue (in utterances)        6
Max. Len. Utterance (in tokens)           100
Avg. Len. Utterance (in tokens)           9
Overall Unique Action Labels              13
Overall Unique Entity Labels              7
Train – # Dialogues (# Utterances)        200 (1161)
Eval – # Dialogues (# Utterances)         50 (279)
Test – # Dialogues (# Utterances)         50 (300)

Table 14: General statistics for the conversational dataset.

The dataset covers 14 unique next actions and seven (7) unique named entities. Detailed information on next actions and entities can be found in Table 15 and Table 16. Below, we list the possible actions with the corresponding explanation and predefined mapped statements.

unk – requests to define the intent of the question: mathematical, organizational, or contextual (i.e., the fallback policy in case the automated intent recognition unit fails) → "Hast du eine Frage zu einer Aufgabe (MATH), zu einem Text im Kurs (TEXT) oder eine organisatorische Frage (ORG)?"

org – hands the discussion over to a human tutor in case of an organizational question → "Es scheint sich wohl um eine organisatorische Frage zu handeln. Bitte fasse deine Frage in einem kurzen Text zusammen, damit ich sie an meinen menschlichen Kollegen weiterleiten kann."

text – hands the discussion over to a human tutor in case of a contextual question → "Es scheint sich wohl um eine inhaltliche Frage zu handeln. Bitte fasse deine Frage in einem kurzen Text zusammen, damit ich sie an meinen menschlichen Kollegen weiterleiten kann."

topic – requests information on the topic → "Welches Kapitel bearbeitest du gerade? Antworte bitte mit der Kapitelnummer, z.B. IA, IB, II, oder mit dem Kapitelnamen, z.B. Geometrie."

examination – requests information about the examination mode → "Bearbeitest du eine Übungsaufgabe, eine Trainingsaufgabe, einen Quiz oder eine Schlussprüfung?"

subtopic – requests information about the subtopic in case of quiz or section-level training/exercise modes → "Wie heißt der Abschnitt, in dem sich die Aufgabe befindet, auf die sich deine Frage bezieht?"

level – requests information about the level of the examination mode, chapter or section → "Geht es um eine Aufgabe auf Kapitel-Ebene (z.B. um eine Trainingsaufgabe im Kapitel Geometrie) oder um eine Aufgabe auf der Ebene eines Abschnitts in einem Kapitel (z.B. um eine Aufgabe zum Abschnitt Winkel im Kapitel Geometrie)? Antworte bitte mit KAP für Kapitel und SEK für Abschnitt."

question number – requests information about the question number → "Welche Aufgabe bearbeitest du gerade? Wenn du z.B. bei Teilaufgabe c in der zweiten Trainingsaufgabe (oder der zweiten Übungsaufgabe) bist, antworte mit 2c. Bearbeitest du gerade einen Quiz oder eine Schlussprüfung, gib bitte nur die Teilaufgabe an."

final request – considers the Informational Dictionary (ID) to be complete and initiates the verification step → "Deine Angaben waren wie folgt: ... Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein."

verify request – requests the student to verify the assembled information → "Welche der folgenden Punkte habe ich falsch erkannt? Schreibe bitte, welche der folgenden Punkte es sind: [a, b, c, ...]3"

correct request – requests the student to provide the correct information if there is an error in the assembled data → "Gib bitte die korrekte Information für [a]4 ein."

exact question – requests the student to provide a detailed explanation of the problem → "Bitte fasse deine Frage in einem kurzen Text zusammen, damit ich sie an meinen menschlichen Kollegen weiterleiten kann."

human handover – finalizes the conversation → "Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir."

3 Variables that depend on the state of the Informational Dictionary (ID).
4 Variable that was selected as erroneous and needs to be corrected.


Action            Counts      Action            Counts
Final Request     321         Unk               80
Human Handover    300         Subtopic          55
Exact Question    286         Correct Request   40
Question Number   176         Verify Request    34
Examination       175         Org               17
Topic             137         Text              13
Level             130

Table 15: Detailed statistics on possible system actions. The columns "Counts" denote the number of occurrences of each action in the entire dataset.

Entity            Counts
Question Nr       317
Chapter           311
Examination       303
Subtopic          198
Level             80
Intent            70

Table 16: Detailed statistics on possible named entities. The column "Counts" denotes the number of occurrences of each entity in the entire dataset.

4.5 named entity recognition

We defined a sequence labeling task to extract custom entities from user input. We considered seven (7) viable entities (see Table 16) to be recognized by the model: topic, subtopic, examination mode and level, question number, intent, as well as the entity other for the remaining words in the utterance. Since the data collected from the rule-based system already includes information on the entities, we were able to train a domain-specific NER unit. However, since the original user input was informal, the same information could be provided in different writing styles. That means that a single entity could have different surface forms (e.g., synonyms, writing styles). Therefore, entities that we extracted from the rule-based system were transformed into a universal standard (e.g., official chapter names). To consider all of the variable entity forms while post-labeling the original dataset, we determined generic entity names (e.g., chapter, question nr) and mapped variations of entities from the user input (e.g., Chapter = [Elementary Calculus, Chapter I, ...]) to them. The overall dataset consists of 300 labeled dialogues, where 200 (with 1161 utterances) were employed for training and 100 for the evaluation and test sets (50 dialogues with ca. 300 utterances for each set, respectively).


4.5.1 Model Settings

We conducted experiments with German and multilingual BERT implementations5. The capitalization of words is significant for the German language; therefore, we ran the experiments on capitalized data while preserving the original punctuation. We employed the available base model for both the multilingual and German BERT implementations in the cased version. We initialized the learning rate for both models to 1e−4, and the maximum length of the tokenized input was set to 128 tokens. We conducted the experiments multiple times with different seeds for a maximum of 50 epochs, with the training batch size set to 32. We utilized AdamW as the optimizer and applied early stopping if the performance did not improve significantly after 5 epochs.
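A condensed sketch of such a fine-tuning setup with the Hugging Face transformers library is shown below; the label names, the example sentence, and the omitted training loop are illustrative assumptions rather than the exact configuration used.

import torch
from transformers import BertTokenizerFast, BertForTokenClassification

# Seven custom entities including "other"; the exact label names are assumptions.
LABELS = ["other", "topic", "subtopic", "examination", "level",
          "question_nr", "intent"]

tokenizer = BertTokenizerFast.from_pretrained("bert-base-german-cased")
model = BertForTokenClassification.from_pretrained(
    "bert-base-german-cased", num_labels=len(LABELS))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# One (cased, punctuated) utterance; gold word-level labels would be
# aligned to the WordPiece tokens before computing the loss.
encoding = tokenizer("Ich bearbeite Rechenregeln und Potenzen.",
                     truncation=True, max_length=128, return_tensors="pt")
logits = model(**encoding).logits      # shape: (1, sequence_length, len(LABELS))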

4.6 next action prediction

We defined a classification problem where we aim to predict the system's next action according to the given user input. We assumed 14 custom actions (see Table 15) that we considered to be our classes. While collecting the toy dataset, every input was automatically labeled by the rule-based system with the corresponding next action and the dialogue-id. Thus, no additional post-labeling was required in this step.

We investigated two settings for this task:

• Default Setting: Utilizing a user input and the corresponding class (i.e., next action) without any supplementary context. By default, we conduct all of our experiments in this setting.

• Extended Setting: Employing a user input, the corresponding next action, and the previous system's action as a source of supplementary context. For this setting, we experimented with the best performing model from the default setting.

The overall dataset consists of 300 labeled dialogues, where 200 (with 1161 utterances) were employed for training and 100 for the evaluation and test sets (50 dialogues with ca. 300 utterances for each set, respectively).
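One way to realize the extended setting is to encode the previous system action and the user utterance as a BERT sentence pair and classify over the 14 actions; the following sketch illustrates this idea under that assumption and is not necessarily the exact input construction used.

from transformers import BertTokenizerFast, BertForSequenceClassification

NUM_ACTIONS = 14   # possible next actions (classes)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-multilingual-cased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=NUM_ACTIONS)

# Extended setting: previous system action as supplementary context,
# encoded as the first segment of a sentence pair (an assumption).
previous_action = "Ask for Examination Mode"
user_input = "ich bearbeite rechenregeln und potenzen"
encoding = tokenizer(previous_action, user_input,
                     truncation=True, max_length=128, return_tensors="pt")
predicted_action = model(**encoding).logits.argmax(dim=-1)   # index of the next action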

4.6.1 Model Settings

For the NAP task, we carried out experiments with German and multilingual BERT implementations as well. We examined the performance of both capitalized and lower-cased inputs, as well as plain and preprocessed data. For the multilingual BERT, we used the base model in both cased and uncased modifications. For the German BERT, we


utilized the base model in the cased modification only6. For both models, we initialized the learning rate to 4e−5, and the maximum length of the tokenized input was set to 128 tokens. We conducted the experiments multiple times with different seeds for a maximum of 300 epochs, with the training batch size set to 32. We utilized AdamW as the optimizer and applied early stopping if the performance did not improve significantly after 15 epochs.

5 https://github.com/huggingface/transformers

4.7 evaluation and results

To evaluate the models, we computed the word-level macro F1 score for the Named Entity Recognition (NER) task and the utterance-level macro F1 score for the Next Action Prediction (NAP) task. The word-level F1 is estimated as the average of the F1 scores per class, each computed from all words in the evaluation and test sets. The outcomes for the NER task are presented in Table 17. For the utterance-level F1, a single class label (i.e., next action) is taken for the whole utterance. The results for the NAP task are shown in Table 18.

We additionally computed the average dialogue accuracy for the best performing NAP models. This score indicates how well the predicted next actions compose the conversational flow. The average dialogue accuracy was estimated over the 50 conversations in the evaluation and test sets, respectively. The results are displayed in Table 19.
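The two measures could be computed roughly as follows; reading the average dialogue accuracy as the per-dialogue share of correctly predicted actions, averaged over dialogues, is our assumption for this sketch.

from sklearn.metrics import f1_score

def utterance_level_macro_f1(gold_actions, predicted_actions):
    """Macro F1 over next-action labels, one label per utterance."""
    return f1_score(gold_actions, predicted_actions, average="macro")

def average_dialogue_accuracy(dialogues):
    """Average over dialogues of the share of correctly predicted actions.

    dialogues is a list of (gold_actions, predicted_actions) pairs,
    one pair per conversation.
    """
    per_dialogue = [
        sum(g == p for g, p in zip(gold, pred)) / len(gold)
        for gold, pred in dialogues
    ]
    return sum(per_dialogue) / len(per_dialogue)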

The obtained results for the NER task revealed that German BERT performed significantly better than the multilingual BERT model. The performance of the custom NER unit is at 0.93 macro F1 points for all possible named entities (see Table 17). In contrast, for the NAP task, the multilingual BERT model obtained better performance than the German BERT model. The best performing method in the default setting gained a macro F1 of 0.677 points for 14 possible classes, whereas the model in the extended setting performed better – its best macro F1 score is 0.752 for the equal number of classes (see Table 18).

Regarding the dialogue accuracy, the extended system fine-tuned with multilingual BERT obtained better outcomes than the default one (see Table 19). The overall conclusion for the NAP is that the capitalized setting increased the performance of the model, whereas the inclusion of punctuation did not influence the results.

6 An uncased pre-trained modification of the model was not available.


Task   Model   Cased   Punct   F1 Eval   F1 Test
NER    GER     yes     yes     0.971     0.930
NER    Mult    yes     yes     0.926     0.905

Table 17: Word-level F1 for the Named Entity Recognition (NER) task. In bold: best performance for evaluation and test sets.

Task   Model   Cased   Punct   Extended   F1 Eval   F1 Test
NAP    GER     yes     no      no         0.711     0.673
NAP    GER     yes     yes     no         0.701     0.606
NAP    Mult    yes     yes     no         0.688     0.625
NAP    Mult    yes     no      no         0.769     0.677
NAP    Mult    yes     no      yes        0.810     0.752
NAP    Mult    no      yes     no         0.664     0.596
NAP    Mult    no      no      no         0.742     0.502

Table 18: Utterance-level F1 for the Next Action Prediction (NAP) task. Underlined: best performance for evaluation and test sets in the default setting (without previous action context). In bold: best performance for evaluation and test sets in the extended setting (with previous action context).

Model          Accuracy Eval   Accuracy Test
NAP default    0.765           0.724
NAP extended   0.813           0.801

Table 19: Average dialogue accuracy computed for the Next Action Prediction (NAP) task for the best performing models. In bold: best performance for evaluation and test sets.

Figure 5: Macro F1 evaluation, NAP – default model. Comparison between German and multilingual BERT.

Figure 6: Macro F1 test, NAP – default model. Comparison between German and multilingual BERT.

Figure 7: Macro F1 evaluation, NAP – extended model.

Figure 8: Macro F1 test, NAP – extended model.

Figure 9: Macro F1 evaluation, NER – Comparison between German and multilingual BERT.

Figure 10: Macro F1 test, NER – Comparison between German and multilingual BERT.

4.8 error analysis

We then investigated the cases where the models failed to predict the correct action or labeled the named entity span erroneously. Below, we outline the most common errors.

next action prediction One of the prevalent flaws in the default model was the mismatch between two consecutive actions – the actions question number and subtopic. That is due to the position of these actions in the conversational flow. The occurrence of both actions in the dialogue is not strict and largely depends on the previous action. To explain this, we illustrate the following possible conversational scenarios:

• The system can directly proceed with the request for the question number if the examination mode is the final examination.

• If the examination mode is training or exercise, the system has to ask about the level first (i.e., chapter or section).

• If it is chapter-level, the system can again directly proceed with the request for the question number.

• In the case of section-level, the system has to ask about the subsection first (if this information was not provided before).

The examination of the extended model showed that the induction of supplementary context (i.e., the previous action) improved the performance of the system by about 6.0%.

named entity recognition The cases where the system failed cover mismatches between the labels chapter and other, and the labels question number and other. This failure arose due to the imperfectly labeled span of a multi-word named entity (e.g., "Elementary Calculus Proportionality Percentage"). The first or last words in the named entity were excluded from the span and erroneously labeled with the other label.

4.9 summary

In this chapter, we re-implemented two of the three fundamental units of our IPA conversational agent – Named Entity Recognition (NER) and Dialogue Manager (DM) – using the methods of Transfer Learning (TL), specifically the Bidirectional Encoder Representations from Transformers (BERT) model.

We believe that the obtained results are reasonably high considering the comparatively small amount of training data we employed to fine-tune the models. Therefore, we conclude that both the NAP and NER segments could be used to replace or extend the existing rule-based


modules. Rule-based components, as well as ElasticSearch (ES), are limited in their capacities and not flexible to novel patterns, whereas the trainable units generalize better and could possibly decrease the amount of inaccurate predictions in case of accidental dialogue behavior. Furthermore, to increase the overall robustness, rule-based and trainable components could be used synchronously: when one system fails (e.g., the ElasticSearch (ES) module cannot extract an entity), the conversation continues based on the prediction obtained from the other model.

4.10 future work

There are several possible directions for future work:

• At the time of writing this thesis, both units were trained and investigated on their own – that means without being incorporated into the initial IPA conversational system. Thus, one of the potential directions of future work would be the embodiment of these units into the original system and the examination of possible failure cases. Furthermore, the resulting hybrid system could then imply the constraint of additional meta policies to prevent unexpected system behavior.

• Further investigation could be directed towards multi-language modality for the re-implemented units. Since the Online Mathematik Brückenkurs (OMB+) platform also supports English and Chinese, it would be interesting to examine whether a simple translation from the target language (i.e., English, Chinese) to the source language (i.e., German) would be sufficient to employ the already-assembled dataset and pre-trained units. Presumably, it should be achievable in the case of the Next Action Prediction (NAP) unit, but it could lead to a noticeable performance drop for the Named Entity Recognition (NER) task, since there could be a considerable gap between translated and original entity spans.

• Last but not least, one of the further extensions of the IPA would be the eventual applicability to more cognitively demanding tasks. For instance, it is of interest to extend the model to more complex domains like the handling of mathematical questions. Would it be possible to automate human assistance at least for less complicated mathematical inquiries, or is it still unachievable with currently existing NLP techniques and machine learning methods?


Part II

Clustering-Based Trend Analyzer


5 Foundations

As discussed in Chapter 1, predictive analytics is one of the potential Intelligent Process Automation (IPA) tasks due to its capability to forecast future events by using current and historical data and thereby assist knowledge workers with, for instance, sales or investments. Modern machine learning algorithms allow an intelligent and fast analysis of large amounts of data. However, the evaluation of such algorithms and models remains one of the grave problems in this domain. This is mostly due to the lack of publicly available benchmarks for the evaluation of topic modeling and trend detection algorithms. Hence, in this part of the thesis, we address this challenge and assemble a benchmark for detecting both trends and downtrends (i.e., topics that become less frequent over time). To the best of our knowledge, the task of downtrend detection has not been well addressed before.

In this chapter, we introduce the concepts required by the second part of the thesis. We begin with an introduction to the topic modeling task and define the notion of an emerging trend in Section 5.1. Next, we examine the related work, existing approaches to topic modeling and trend detection, and their limitations in Section 5.2. In Chapter 6, we introduce our TRENDNERT benchmark for trend and downtrend detection and describe the method we employed for its assembly.

5.1 motivation and challenges

Science changes and evolves rapidly: novel research areas emerge while others fade away. Keeping pace with these changes is challenging. Therefore, recognizing and forecasting emerging research trends is of significant importance for researchers and academic publishers, as well as for funding agencies, stakeholders, and innovation companies.

Previously, the task of trend detection was mostly solved by domain experts who used specialized and cumbersome tools to investigate the data and get useful insights from it (e.g., through data visualization or statistics). However, manual analysis of large amounts of data can be time-consuming, and hiring domain experts is very costly. Furthermore, the overall increase of research data (i.e., Google Scholar, PubMed, DBLP) in the past decade makes the approach based on human domain experts less and less scalable. In contrast, automated approaches become a more desirable solution for the task of emerging trend identification (Kontostathis et al., 2004; Salatino, 2015). Due to the large number of electronic documents appearing on


the web daily, several machine learning techniques were developed to find patterns of word occurrences in those documents using hierarchical probabilistic models. These approaches are called topic models, and they allow the analysis of extensive textual collections and the extraction of valuable insights from them. The main advantage of topic models is their ability to discover hidden patterns of word use and to connect documents containing similar patterns. In other words, the idea of a topic model is that documents are a mixture of topics, where a topic is defined as a probability distribution over words (Alghamdi and Alfalqi, 2015).

Topics have the property of being able to evolve; thus, modeling topics without considering time may confound precise topic identification. Modeling topics while considering time is called topic evolution modeling, and it can reveal important hidden information in the document collection, allowing the recognition of topics along with their appearance in time. This approach also provides the ability to check the evolution of topics during a given time window and examine whether a given topic is gaining interest and popularity over time (i.e., becoming an emerging trend) or whether its development is stable. Considering the latter approach, we turn to the task of emerging trend identification.

How is an emerging trend defined? An emerging trend is a topic that is gaining interest and utility over time, and it has two attributes: novelty and growth (Small, Boyack, and Klavans, 2014; Tu and Seng, 2012). Rotolo, Hicks, and Martin (2015) explain the correlation of these two attributes in the following way. Up to the point of emergence, a research topic is characterized by a high level of novelty. At that time, it does not attract any considerable attention from the scientific community, and due to its limited impact, its growth is nearly flat. After some turning points (i.e., the appearance of significant scientific publications), the research topic starts to grow faster and may become a trend if the growth is fast; however, the level of novelty decreases continuously once the emergence becomes clear. Then, in the final phase, the topic becomes well-established while its novelty levels out (He and Chen, 2018).

To detect trends in textual data, one employs computational analysis techniques and Natural Language Processing (NLP). In the subsequent section, we give an overview of the existing approaches for topic modeling and trend detection.

5.2 detection of emerging research trends

The task of trend detection is closely associated with the topic modeling task, since in the first instance it detects the latent topics in a substantial collection of data. It secondarily employs techniques to investigate the evolution of a topic and whether it has the potential to become a trend.


Topic modeling has an extensive history of applications in the scientific domain, including but not limited to studies of temporal trends (Griffiths and Steyvers, 2004; Wang and McCallum, 2006) and the investigation of impact prediction (Yogatama et al., 2011). Below, we give a general overview of the most significant methods used in this domain and explain their importance and, if any, their potential limitations.

5.2.1 Methods for Topic Modeling

In order to extract a topic from textual documents (i.e., research papers, blog posts), the precise definition of the model for topic representation is crucial.

latent semantic analysis, formerly called Latent Semantic Indexing, is one of the fundamental methods in the topic modeling domain, proposed by Deerwester et al. in 1990. Its primary goal is to generate vector-based representations for documents and terms and then employ these representations to compute the similarity between documents (resp. terms) in order to select the most related ones. In this regard, Latent Semantic Analysis (LSA) takes a matrix of documents and terms and decomposes it into a separate document-topic matrix and a topic-term matrix. Before turning to the central task of finding latent topics that capture the relationships among the words and documents, it also performs dimensionality reduction utilizing truncated Singular Value Decomposition. This technique is usually applied to reduce the sparseness, noisiness, and redundancy across dimensions in the document-term matrix. Finally, using the resulting document and term vectors, one applies measures such as cosine similarity to evaluate the similarity of different documents, words, or queries.
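A small illustration of this pipeline with scikit-learn is shown below (the toy documents are made up); truncated SVD over a TF-IDF document-term matrix yields the latent space in which document similarities are computed.

from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "neural topic models for trend detection",
    "latent semantic indexing of research papers",
    "transformers for natural language understanding",
]

# Document-term matrix, then truncated SVD for the latent "topic" space.
doc_term = TfidfVectorizer().fit_transform(documents)
lsa = TruncatedSVD(n_components=2, random_state=0)
doc_topic = lsa.fit_transform(doc_term)    # document-topic matrix
term_topic = lsa.components_.T             # term-topic matrix

# Cosine similarity between documents in the latent space.
similarities = cosine_similarity(doc_topic)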

One of the extensions to traditional LSA is probabilistic Latent Semantic Analysis (pLSA), which was introduced by Hofmann in 1999 to fix several shortcomings. The foremost advantage of the follow-up method was that it could successfully automate document indexing, which in turn improved LSA in a probabilistic sense by using a generative model (Alghamdi and Alfalqi, 2015). Furthermore, pLSA can differentiate between diverse contexts of word usage without utilizing dictionaries or a thesaurus. That results in better polysemy disambiguation and discloses typical similarities by grouping words that share a similar context. Among the works that employ pLSA is the approach proposed by Gohr et al. (2009), where the authors apply pLSA in a window that slides across the stream of documents to analyze the evolution of topics. Also, Mei et al. (2008) use pLSA in order to create a network of topics.


latent dirichlet allocation was developed by Blei, Ng, and Jordan (2003) to overcome the limitations of both the LSA and pLSA models. Nowadays, Latent Dirichlet Allocation (LDA) is considered to be one of the most utilized and robust probabilistic topic models. The fundamental concept of LDA is the assumption that every document is represented as a mixture of topics, where each topic is a discrete probability distribution that defines the probability of each word to appear in a given topic. These topic probabilities provide a compressed representation of a document. In other words, LDA models all of the d documents in the collection as a mixture over n latent topics, where each of the topics describes a multinomial distribution over the words in the vocabulary (Alghamdi and Alfalqi, 2015).

LDA fixes such weaknesses of pLSA as the incapacity to assign a probability to previously unseen documents (because pLSA learns the topic mixtures only for documents seen in the training phase) and the overfitting problem (i.e., the number of parameters in pLSA increases linearly with the size of the corpus) (Salatino, 2015).
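For comparison with the LSA sketch above, a plain LDA run with scikit-learn could look as follows; the toy documents and the number of topics are again arbitrary choices for illustration.

from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

documents = [
    "topic models discover latent structure in text collections",
    "citation networks reveal emerging research trends",
    "word embeddings capture semantic similarity between terms",
]

counts = CountVectorizer().fit_transform(documents)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)   # per-document topic mixture
topic_words = lda.components_            # per-topic word weights (unnormalized)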

An extension of the conventional LDA is the hierarchical LDA (Griffiths et al., 2004), where topics are grouped in hierarchies. Each node in the hierarchy is associated with a topic, and a topic is a distribution across words. Another comparable approach is the Relational Topic Model (Chang and Blei, 2010), which is a mixture of a topic model and a network model for groups of linked documents. The attribute of every document is its text (i.e., discrete observations taken from a fixed vocabulary), and the links between documents are hyperlinks or citations.

LDA is a prevalent method for discovering topics (Blei and Lafferty, 2006; Bolelli, Ertekin, and Giles, 2009; Bolelli et al., 2009; Hall, Jurafsky, and Manning, 2008; Wang and Blei, 2011; Wang and McCallum, 2006). However, it has been shown that training and fine-tuning a stable LDA model can be very challenging and time-consuming (Agrawal, Fu, and Menzies, 2018).

correlated topic model One of the main drawbacks of LDA is its incapability of capturing dependencies among the topics (Alghamdi and Alfalqi, 2015; He et al., 2017). Therefore, Blei and Lafferty (2007) proposed the Correlated Topic Model as an extension of the LDA approach. In this model, the authors replaced the Dirichlet distribution with a logistic-normal distribution that models pairwise topic correlations with the Gaussian covariance matrix (He et al., 2017).

However, one of the shortcomings of this model is the computational cost. The number of parameters in the covariance matrix increases quadratically with the number of topics, and parameter estimation for the full-rank matrix can be inaccurate in high-dimensional space (He et al., 2017). Therefore, He et al. (2017)


proposed their method to overcome this problem. The authors introduced a model which learns dense topic embeddings and captures topic correlations through the similarity between the topic vectors. According to He et al., their method allows efficient inference in the low-dimensional embedding space and significantly reduces the previous cubic or quadratic time complexity to linear with respect to the number of topics.

5.2.2 Topic Evolution of Scientific Publications

Although the topic modeling task is essential, it has a significant offspring task, namely the modeling of topic evolution. Methods able to identify topics within the context of time are highly applicable in the scientific domain. They allow an understanding of topic lineage and reveal how research on one topic influences another. Generally, these methods can be classified according to the primary sources of information they employ: text, citations, or key phrases.

text-based Extensive textual collections have dynamic co-occurrences of word patterns that change over time. Thus, models that track topic evolution over time should consider both the word co-occurrence patterns and the timestamps (Blei and Lafferty, 2006).

In 2006, Wang and McCallum proposed a Non-Markov Continuous Time model for the detection of topical trends in the collection of NeurIPS publications (an overall time span of 17 years was considered). The authors made the following assumption: if a pattern of word co-occurrence exists for a short time, then the system creates a narrow-time-distribution topic, whereas for long-term patterns the system generates a broad-time-distribution topic. The principal objective of this approach is that it models topic evolution without discretizing time, i.e., the state at time t + t1 is independent of the state at time t (Wang and McCallum, 2006).

Another work by Blei and Lafferty (2006) proposed the Dynamic Topic Models. The authors assumed that the collection of documents is organized based on certain time spans and that the documents belonging to every time span are represented with a K-component model. Thereby, the topics are associated with time span t and emerge from topics corresponding to time span t − 1. One of the main differences of this model compared to other existing approaches is that it utilizes a Gaussian distribution for the topic parameters instead of the conventional Dirichlet distribution and therefore can capture the evolution over various time spans.


citation-based analysis is the second popular direction that isconsidered to be effective for trend identification He et al 2009Le Ho and Nakamori 2005 Shibata Kajikawa and Takeda2009 Shibata et al 2008 Small 2006 This method assumes thatcitations in their various forms (ie bibliographic citations co-citation networks citation graphs) indicate the meaningful rela-tionship between topics and uses citations to model the topicevolution in the scientific domain Nie and Sun (2017) utilizethis approach along with the Latent Dirichlet Allocation (LDA)and k-means clustering to identify research trends The authorsfirst use LDA to extract features and determine the optimal num-ber of topics Then they use k-means to obtain thematic clus-ters and finally they compute citation functions to identify thechanges of clusters over time

Though the citation-based approachrsquos main drawback is that aconsistent collection of publications associated with a specificresearch topic is required for these techniques to detect the clus-ter and novelty of the particular topic He and Chen 2018 Fur-thermore there are also many trend detection scenarios (egresearch news articles or research blog posts) in which citationsare not readily available That makes this approach not applica-ble to this kind of data

keyphrase-based Another standard solution is based on the useof keyphrase information extracted from research papers In thiscase every keyword is representative of a single research topicSystems like Saffron (Monaghan et al 2010) as well as Saffron-based models use this approach Saffron is a research toolkitthat provides algorithms for keyphrase extraction entity link-ing and taxonomy extraction Asooja et al (2016) utilize Saffronto extract the keywords from LREC proceedings and then pro-posed trend forecast based on regression models as an extensionto Saffron

However this method raises many questions including the fol-lowing how to deal with the noisiness of keywords (ie notrelevant keywords) and how to handle keywords that have theirhierarchies (ie sub-areas) Other drawbacks of this approachinclude the ambiguity of keywords (ie ldquoJavardquo as an island andldquoJavardquo as a programming language) or necessity to deal withsynonyms which could be treated as different topics

5.3 summary

An emerging trend detection algorithm aims to recognize topics that were previously barely noticeable but are now gaining importance and interest within a specific domain. Knowledge of emerging trends is especially essential for stakeholders: researchers, academic publishers, and funding organizations. Assume a business analyst in a biotech company who is interested in analyzing technical articles for recent trends that could potentially impact the company's investments. A manual review of all the available information in a specific domain would be extremely time-consuming or even not feasible. However, this process could be supported by Intelligent Process Automation (IPA) algorithms: such software would assist human experts who are tasked with identifying emerging trends through automatic analysis of extensive textual collections and visualization of occurrences and tendencies of a given topic.


6 BENCHMARK FOR SCIENTIFIC TREND DETECTION

This chapter covers work already published at international peer-reviewed conferences. The relevant publication is Moiseeva and Schütze (2020). The research described in this chapter was carried out in its entirety by the author of the thesis. The other author of the publication acted as an advisor.

Computational analysis and modeling of the evolution of trends is a significant area of research because of its socio-economic impact. However, no large publicly available benchmark for trend detection currently exists, making a comparative evaluation of methods impossible. We remedy this situation by publishing the benchmark TRENDNERT, consisting of a set of gold trends (resp. downtrends) and document labels that is available as an unlimited download, and a large underlying document collection that can also be obtained for free. We propose Mean Average Precision (MAP) as an evaluation measure for trend detection and apply this measure in an investigation of several baselines.

6.1 motivation

What is an emerging research topic and how is it defined in the literature? At the time of emergence, a research topic often does not attract much recognition from the scientific community and is exemplified by just a few publications. He and Chen (2018) associate the appearance of a trend with some triggering event, e.g., the publication of an article in a high-impact journal. Later, the research topic starts to grow faster and becomes a trend (He and Chen, 2018).

We adopt this as our key representation of trend in this work: a trend is a research topic with a strongly increasing number of publications in a particular time interval. In contrast to trends, there are also downtrends, which refer to topics that decline in demand or growth as they evolve. Therefore, we define a downtrend as the converse of a trend: a downtrend is a research topic that has a strongly decreasing number of publications in a particular time interval. Downtrends can be as important to detect as trends, e.g., for a funding agency planning its budget.

To detect both types of topics (i.e., trends and downtrends) in textual data, one employs computational analysis techniques and NLP. These methods enable the analysis of extensive textual collections and yield valuable insights into topical trendiness and evolution over time. Despite the overall importance of trend analysis, there is, to our knowledge, no large publicly available benchmark for trend detection. This makes a comparative evaluation of methods and algorithms impossible. Most of the previous work has used proprietary or moderately sized datasets, or evaluation measures were not directly bound to the objectives of trend detection (Gollapalli and Li, 2015).

We remedy this situation by publishing the benchmark TRENDNERT.1 The benchmark consists of the following:

1. A set of gold trends compiled based on an extensive analysis of the meta-literature.

2. A set of labeled documents (based on more than one million scientific publications), where each document is assigned to a trend, a downtrend, or the class of flat topics, and supported by the topic title.

3. An evaluation script to compute Mean Average Precision (MAP) as the measure for the accuracy of trend detection.

We make the benchmark available as an unlimited download.2 The underlying research corpus can be obtained for free from Semantic Scholar3 (Ammar et al., 2018).

6.1.1 Outline and Contributions

In this work, we address the challenge of benchmark creation for (down)trend detection. This chapter is structured as follows: Section 6.2 introduces our model for corpus creation and its fundamental components. In Section 6.3, we present the crowdsourcing procedure for the benchmark labeling. Section 6.5 outlines the proposed baselines for trend detection, our experimental setup, and the outcomes. Finally, we illustrate the details of the distributed benchmark in Section 6.6 and conclude in Section 6.7.

Our contributions within this work are as follows:

• TRENDNERT is the first publicly available benchmark for (down)trend detection.

• TRENDNERT is based on a collection of more than one million documents. It is among the largest that have been used for trend detection and therefore offers a realistic setting for developing trend detection algorithms.

• TRENDNERT addresses the task of detecting both trends and downtrends. To the best of our knowledge, the task of downtrend detection has not been addressed before.

1 The name is a concatenation of trend and its ananym dnert, because it supports the evaluation of both trends and downtrends.

2 https://doi.org/10.17605/OSF.IO/DHZCT
3 https://api.semanticscholar.org/corpus


6.2 corpus creation

In this section, we explain the methodology we used to create the TRENDNERT benchmark.

6.2.1 Underlying Data

We employ a corpus provided by Semantic Scholar.4 It includes more than one million papers, published mostly between 2000 and 2015 in about 5,000 computer science journals and conference proceedings. Every record in the collection consists of a title, key phrases, abstract, full text, and metadata.

6.2.2 Stratification

The distribution of documents over time in the entire original dataset is skewed (see Figure 11). We discovered that clustering the entire collection, as well as a random sample, produces corrupt results because more weight is given to later years than to earlier years. Therefore, we first generated a stratified sample of the original document collection. To this end, we randomly pick 10,000 documents for each year from 2000 to 2015, for an overall sample size of 160,000.
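A minimal sketch of such a stratified sample, assuming the corpus metadata is held in a pandas DataFrame with a year column (the variable and column names are illustrative and not part of the released benchmark):

import pandas as pd

def stratified_sample(metadata: pd.DataFrame, per_year: int = 10_000,
                      years=range(2000, 2016), seed: int = 0) -> pd.DataFrame:
    # Draw the same number of documents for every year so that no year dominates.
    parts = []
    for year in years:
        pool = metadata[metadata["year"] == year]
        # Sample without replacement; assumes every year has at least `per_year` papers.
        parts.append(pool.sample(n=per_year, random_state=seed))
    return pd.concat(parts, ignore_index=True)

# sample = stratified_sample(metadata_df)   # ~160,000 documents for 2000-2015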

Figure 11: Overall distribution of papers in the entire dataset. Years 1975 to 2016.

4 https://www.semanticscholar.org


6.2.3 Document Representations & Clustering

As we mentioned, this work's primary focus was the creation of a benchmark, not the development of new algorithms or models for trend detection. Therefore, for our experiments, we selected an algorithm based on k-means clustering as a simple baseline method for trend detection.

In conventional document clustering, documents are usually represented as bag-of-words (BOW) feature vectors (Manning, Raghavan, and Schütze, 2008). Those, however, have a major weakness: they neglect the semantics of words (Le and Mikolov, 2014). Still, recent work in the representation learning domain, and particularly doc2vec, the method proposed by Le and Mikolov (2014), can provide document representations for clustering that overcome this weakness.

The doc2vec5 algorithm is inspired by techniques for learning word vectors (Mikolov et al., 2013) and is capable of capturing semantic regularities in textual collections. It is an unsupervised approach for learning continuous distributed vector representations for text (i.e., paragraphs, documents). This approach maps documents into a vector space such that semantically related documents are assigned similar vector representations (e.g., an article about "genomics" is closer to an article about "gene expression" than to an article about "fuzzy sets"). Formally, each paragraph is mapped to an individual vector represented by a column in matrix D, and every word is also mapped to an individual vector represented by a column in matrix W. The paragraph vector and word vectors are concatenated to predict the next word in a context (Le and Mikolov, 2014). Other work has already successfully employed this type of document representation for topic modeling, combining it with both Latent Dirichlet Allocation (LDA) and clustering approaches (Curiskis et al., 2019; Dieng, Ruiz, and Blei, 2019; Moody, 2016; Xie and Xing, 2013).

In our work, we run doc2vec on the stratified sample of 160,000 papers and represent each document as a length-normalized vector (i.e., a document embedding). These vectors are then clustered into k = 1,000 clusters using the scikit-learn6 implementation of k-means (MacQueen, 1967) with default parameters. The combination of document representations with a clustering algorithm is conceptually simple and interpretable. Also, the comprehensive comparative evaluation of topic modeling methods utilizing document embeddings performed by Curiskis et al. (2019) showed that doc2vec feature representations with k-means clustering outperform several other methods7 on three evaluation measures (Normalized Mutual Information, Adjusted Mutual Information, and Adjusted Rand Index).
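A condensed sketch of this step with Gensim's doc2vec and scikit-learn's k-means; the embedding hyperparameters shown (vector size, epochs, min_count) are illustrative assumptions, since the text only fixes k = 1,000 and default k-means parameters:

import numpy as np
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from gensim.utils import simple_preprocess
from sklearn.cluster import KMeans

def embed_and_cluster(texts, k=1000, vector_size=300, epochs=10):
    # texts: list of raw document strings (e.g., abstracts of the stratified sample)
    corpus = [TaggedDocument(simple_preprocess(t), [i]) for i, t in enumerate(texts)]
    model = Doc2Vec(corpus, vector_size=vector_size, epochs=epochs,
                    min_count=5, workers=4)

    # length-normalized document embeddings
    vectors = np.stack([model.dv[i] for i in range(len(corpus))])
    vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)

    labels = KMeans(n_clusters=k).fit_predict(vectors)  # cluster id per document
    return vectors, labels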

5 We use the implementation provided by Gensim: https://radimrehurek.com/gensim/models/doc2vec.html
6 https://scikit-learn.org/stable/
7 Hierarchical clustering, k-medoids, NMF, and LDA.


We run 10 trials of stratification and clustering, resulting in 10 different clusterings. We do this to protect against the variability of clustering and because we do not want to rely on a single clustering for proposing (down)trends for the benchmark.

6.2.4 Trend & Downtrend Estimation

Recall our definitions of trend and downtrend: a trend (resp. downtrend) is defined as a research topic that has a strongly increasing (resp. decreasing) number of publications in a particular time interval.

Linear trend estimation is a widely used technique in predictive (time-series) analysis that supports statements about tendencies in the data by correlating the measurements with the times at which they occurred (Hess, Iyer, and Malm, 2001). Considering the definition of trend (resp. downtrend) and the main concept of linear trend estimation models, we utilize linear regression to identify trend and downtrend candidates in the resulting clustering sets.

Specifically, we count the number of documents per year for a cluster and then estimate the parameters of the best fitting line (i.e., its slope). Clusters are then ranked according to the resulting slope, and the n = 150 clusters with the largest (resp. smallest) slope are selected as trend (resp. downtrend) candidates. Thus, our definition of a single-clustering (down)trend candidate is a cluster with an extreme ascending or descending slope. Figure 12 and Figure 13 give two examples of (down)trend candidates.
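A sketch of this candidate selection, reusing the per-document cluster labels and publication years from the previous step (the helper names are illustrative):

import numpy as np
from collections import Counter

def cluster_slope(doc_years, year_range=range(2000, 2016)):
    # Fit a line to the per-year document counts of one cluster; return its slope.
    counts = Counter(doc_years)
    xs = np.array(list(year_range), dtype=float)
    ys = np.array([counts.get(y, 0) for y in year_range], dtype=float)
    slope, _intercept = np.polyfit(xs, ys, deg=1)
    return slope

def rank_candidates(labels, years, n=150):
    # labels, years: per-document cluster ids and publication years (numpy arrays)
    slopes = {c: cluster_slope(years[labels == c]) for c in np.unique(labels)}
    ranked = sorted(slopes, key=slopes.get, reverse=True)
    return ranked[:n], ranked[-n:]   # trend candidates, downtrend candidates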

6.2.5 (Down)trend Candidates Validation

As detailed in Section 6.2.3, we run ten series of clustering, resulting in ten discrete clustering sets. We then estimated the (down)trend candidates according to the procedure described in Section 6.2.4. Consequently, for each of the ten clustering sets, we obtained 150 trend and 150 downtrend candidates.

Nonetheless, for the benchmark, among all the (down)trend candidates across the ten clustering sets, we want to keep only those that are consistent across all sets. To obtain consistent (down)trend candidates, we apply the Jaccard coefficient, a metric used for measuring the similarity and diversity of sample sets. We define an equivalence relation ∼R: two candidates (i.e., clusters) are equivalent if their Jaccard coefficient is > τsim, where τsim = 0.5. Following this notion, we compute the equivalence of (down)trend candidates across the ten sets.

Finally, we define an equivalence class as an equivalence class trend candidate (resp. equivalence class downtrend candidate) if it contains trend (resp. downtrend) candidates in at least half of the clusterings.
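A sketch of this consistency filter; the greedy grouping below is one simple way to realize the equivalence relation and is not necessarily the exact procedure used for the benchmark:

def jaccard(a, b):
    # Jaccard coefficient of two document-ID sets
    return len(a & b) / len(a | b) if (a or b) else 0.0

def equivalence_classes(runs, tau_sim=0.5):
    # runs: one list of candidate clusters (each a set of document IDs) per clustering run.
    # Greedily group candidates whose Jaccard coefficient exceeds tau_sim.
    classes = []  # each class is a list of (run_index, candidate) pairs
    for run_idx, candidates in enumerate(runs):
        for cand in candidates:
            for cls in classes:
                if any(jaccard(cand, other) > tau_sim for _, other in cls):
                    cls.append((run_idx, cand))
                    break
            else:
                classes.append([(run_idx, cand)])
    return classes

# Keep only equivalence classes supported by at least half of the clustering runs:
# consistent = [c for c in equivalence_classes(runs)
#               if len({r for r, _ in c}) >= len(runs) / 2]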


Figure 12: A trend candidate (positive slope). Topic: Sentiment and Emotion Analysis (using Social Media Channels).

Figure 13: A downtrend candidate (negative slope). Topic: XML.


This procedure gave us in total 110 equivalence class trend candidates and 107 equivalence class downtrend candidates, which we then annotate for our benchmark.8

6.3 benchmark annotation

To annotate the equivalence class (down)trend candidates obtained from our model, we used the Figure-Eight9 platform. It provides an integrated quality-control mechanism and a training phase before the actual annotation (Kolhatkar, Zinsmeister, and Hirst, 2013), which minimizes the rate of poorly qualified annotators.

We design our annotation task as follows. A worker is shown a description of a particular (down)trend candidate along with the list of possible (down)trend tags. Descriptions are mixed and presented in random order; the temporal presentation order of candidates is also arbitrary. The worker is asked to pick from the list of (down)trend tags the item that matches the cluster (resp. candidate) best. Even if several items are a possible match, the worker must choose only one. If there is no matching item, the worker can select the option other. Three different workers evaluated each cluster, and the final label is the majority label. If there is no majority label, we present the cluster repeatedly to the workers until there is a majority.

The descriptors of candidates are collected through the following procedure (a sketch of this step follows the list):

• First, we review the documents assigned to a candidate cluster.

• Then, we extract keywords and titles of the papers attributed to the cluster. Semantic Scholar provides keywords and titles as part of the metadata.

• Finally, we compute the top 15 most frequent keywords and randomly select 15 titles. These keywords and titles form the descriptor of the cluster candidate presented to crowd-workers.
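A sketch of this descriptor construction; the field names keywords and title are illustrative stand-ins for the Semantic Scholar metadata:

import random
from collections import Counter

def cluster_descriptor(docs, n_keywords=15, n_titles=15, seed=0):
    # docs: list of metadata dicts, each with a 'keywords' list and a 'title' string
    keyword_counts = Counter(kw for doc in docs for kw in doc["keywords"])
    top_keywords = [kw for kw, _ in keyword_counts.most_common(n_keywords)]
    titles = random.Random(seed).sample([d["title"] for d in docs],
                                        k=min(n_titles, len(docs)))
    return {"keywords": top_keywords, "titles": titles}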

We create the (down)trend tags based on keywords, titles, and metadata such as the publication venue. The cluster candidates were manually labeled by graduate employees (i.e., domain experts in computer science) considering the items listed above.

6.3.1 Inter-annotator Agreement

To ensure the quality of the obtained annotations, we computed the Inter-Annotator Agreement (IAA), a measure of how well two (or more) annotators can make the same decision for a particular category.

8 Note that the overall number of 217 refers to the number of (down)trend candidates and not to the number of documents. Each candidate contains hundreds of documents.

9 Formerly called CrowdFlower.


To do that, we used the following metrics:

krippendorff's α is a reliability coefficient developed to measure the agreement among observers or measuring instruments that draw distinctions among typically unstructured phenomena or assign computable values to them (Krippendorff, 2011). We choose Krippendorff's α as an inter-annotator agreement measure because it, unlike other specialized coefficients, is a generalization of several reliability criteria and applies to situations like ours, where we have more than two annotators and a given annotator only labels a subset of the data (i.e., we have to deal with missing values).

Krippendorff (2011) has proposed several metric variations, each of which is associated with a specific set of weights. We use Krippendorff's α considering any number of observers and missing data. In this case, Krippendorff's α for the overall number of 217 annotated documents is α = 0.798. According to Landis and Koch (1977), this value corresponds to a substantial level of agreement.
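For reference, one way to compute this coefficient is the third-party krippendorff Python package, which accepts missing judgments as NaN; the tiny reliability matrix below is made up purely for illustration and is not the benchmark data:

import numpy as np
import krippendorff  # third-party package: pip install krippendorff

# Rows are annotators, columns are annotated units; np.nan marks skipped units.
reliability_data = np.array([
    [0,      1, 2, np.nan, 1],
    [0,      1, 2, 2,      np.nan],
    [np.nan, 1, 2, 2,      1],
], dtype=float)

alpha = krippendorff.alpha(reliability_data=reliability_data,
                           level_of_measurement="nominal")
print(round(alpha, 3))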

figure-eight agreement rate Figure-Eight10 provides an agreement score c for each annotated unit u, which is based on the majority vote of the trusted workers. The score has been shown to perform well compared to the classic metrics (Kolhatkar, Zinsmeister, and Hirst, 2013). We computed the average C over the 217 annotated documents; C is 0.799.

Since both Krippendorff's α and the internal Figure-Eight IAA are rather high, we consider the obtained annotations for the benchmark to be of good quality.

6.3.2 Gold Trends

We compiled a list of computer science trends for the year 2016 based on the analysis of twelve survey publications (Ankerholz, 2016; Augustin, 2016; Brooks, 2016; Frot, 2016; Harriet, 2015; IEEE Computer Society, 2016; Markov, 2015; Meyerson and Mariette, 2016; Nessma, 2015; Reese, 2015; Rivera, 2016; Zaino, 2016). For this year, we identified a total of 31 trends; see Table 20.

Since there is variation in the name of a specific trend, we created a unique name for each of them. Out of the 31 trends, 28 (90.3%) are instantiated as trends by our model, which we confirmed in the crowdsourcing procedure.11 We searched for the three missing trends using a variety of techniques (i.e., inspection of non-trends and downtrends, random browsing of documents not assigned to trend candidates, and keyword searches).

10 Former CrowdFlower platform.
11 The author verified this based on her judgment of equivalence of one of the 31 gold trend names in the literature with one of the trend tags.


Two of the missing trends do not seem to occur in the collection at all: "Human Augmentation" and "Food and Water Technology". The third missing trend, "Cryptocurrency and Cryptography", does occur in the collection, but the occurrence rate is very small (ca. 0.01). We do not consider these three trends in the rest of the chapter.

Computer Science Trend                        Computer Science Trend

Autonomous Agents and Systems                 Natural Language Processing
Autonomous Vehicles                           Open Source
Big Data Analytics                            Privacy
Bioinformatics and (Bio) Neuroscience         Quantum Computing
Biometrics and Personal Identification        Recommender Systems and Social Networks
Cloud Computing and Software as a Service     Reinforcement Learning
Cryptocurrency and Cryptography*              Renewable Energy
Cyber Security                                Robotics
E-business                                    Semantic Web
Food and Water Technology*                    Sentiment and Emotion Analysis
Game-based Learning                           Smart Cities
Games and (Virtual) Augmented Reality         Supply Chains and RFIDs
Human Augmentation*                           Technology for Climate Change
Machine/Deep Learning                         Transportation and Energy
Medical Advances and DNA Computing            Wearables
Mobile Computing

Table 20: 31 areas identified as computer science trends for the year 2016 in the media. 28 of these trends are instantiated by our model as well and thus covered in our benchmark. Topics marked with an asterisk (*) were not found in our underlying data collection.

In summary, based on our analysis of the meta-literature, we identified 28 gold trends that also occur in our collection. We will use them as the gold standard that trend detection algorithms should aim to discover.

6.4 evaluation measure for trend detection

Whether something is a trend is a graded concept. For this reason, we adopt an evaluation measure based on a ranking of candidates, specifically Mean Average Precision (MAP). The score denotes the average of the precision values at the ranks of correctly identified and non-redundant trends.

We approach trend detection as computing a ranked list of sets of documents, where each set is a trend candidate. We refer to documents as trendy, downtrendy, and flat, depending on which class they are assigned to in our benchmark. We consider a trend candidate c as a correct recognition of gold trend t if it satisfies the following requirements:

• c is trendy,

• t is the largest trend in c,

• |t ∩ c| / |c| > ρ, where ρ is a coverage parameter,

• t was not recognized earlier in the list.

These criteria give us a true positive (i.e., recognition of a gold trend) or a false positive (otherwise) for every position of the ranked list, and a precision score for each trend. Precision for a trend that was not observed is 0. We then compute MAP; see Table 21 for an example.

Gold Trends: Cloud Computing, Bioinformatics, Sentiment Analysis, Privacy
Gold Downtrends: Compilers, Petri Nets, Fuzzy Sets, Routing Protocols

Trend Candidate            |t ∩ c| / |c|    tp/fp    P
c1   Cloud Computing       0.60             tp       1.00
c2   Privacy               0.20             fp       0.50
c3   Cloud Computing       0.70             fp       0.33
c4   Bioinformatics        0.55             tp       0.50
c5   Sentiment Analysis    0.95             tp       0.60
c6   Sentiment Analysis    0.59             fp       0.50
c7   Bioinformatics        0.41             fp       0.43
c8   Compilers             0.60             fp       0.38
c9   Petri Nets            0.80             fp       0.33
c10  Privacy               0.33             fp       0.30

Table 21: Example of the proposed MAP evaluation (with ρ = 0.5). MAP is the average of the precision values 1.0 (Cloud Computing), 0.5 (Bioinformatics), 0.6 (Sentiment Analysis), and 0.0 (Privacy), i.e., MAP = 2.1/4 = 0.525. True positive: tp. False positive: fp. Precision: P.
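A sketch of this computation over a ranked candidate list, assuming each candidate has already been reduced to its predicted class, its best-matching gold trend, and that trend's coverage; feeding it the example above with ρ = 0.5 reproduces MAP = 0.525:

def mean_average_precision(ranked_candidates, gold_trends, rho=0.5):
    # ranked_candidates: list of dicts, each with
    #   'is_trendy'     - whether the candidate consists of trendy documents,
    #   'largest_trend' - the gold trend with the largest overlap in the candidate,
    #   'coverage'      - |t intersected with c| / |c| for that trend.
    # gold_trends: set of gold trend names.
    precision_at_hit = {}   # gold trend -> precision at the rank it was first recognized
    true_positives = 0
    for rank, cand in enumerate(ranked_candidates, start=1):
        correct = (cand["is_trendy"]
                   and cand["largest_trend"] in gold_trends
                   and cand["coverage"] > rho
                   and cand["largest_trend"] not in precision_at_hit)
        if correct:
            true_positives += 1
            precision_at_hit[cand["largest_trend"]] = true_positives / rank
    # A gold trend that is never recognized contributes a precision of 0.
    return sum(precision_at_hit.get(t, 0.0) for t in gold_trends) / len(gold_trends)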

6.5 trend detection baselines

In this section, we investigate the impact of four different configuration choices on the performance of trend detection by way of ablation.


6.5.1 Configurations

abstracts vs. full texts The underlying data collection contains both abstracts and full texts.12 Our initial expectation was that the full text of a document comprises more information than the abstract and should thus be a more reliable basis for trend detection. This is one of the configurations we test in the ablation: we apply our proposed method to the full-text collection (solely document texts, without abstracts) and to the abstract collection (only abstract texts) separately and observe the results.

stratification Due to the uneven distribution of documents over the years, the last years (with an increased volume of publications) may get too much impact on the results compared to earlier years. To investigate this, we conduct an ablation experiment where we compare (i) 160k documents randomly sampled from the entire collection and (ii) stratified sampling. In stratified sampling, we select 10k documents for each of the years 2000–2015, resulting in a sampled collection of 160k.

length Lt of the interval Clusters are ranked by trendiness, and the resulting ranked list is evaluated. To measure the trendiness or growth of topics over time, we fit a line to an interval (ni | i0 ≤ i < i0 + Lt) by linear regression, where Lt is the length of the interval, i is one of the 16 years (2000, ..., 2015), and ni is the number of documents that were assigned to cluster c in year i. As a simple default baseline, we apply the regression to half of the entire interval, i.e., Lt = 8. There are nine such intervals in our 16-year period. As the final measure of growth for the cluster, we take the maximum of the nine individual slopes. To determine how much this configuration choice affects our results, we also test a linear regression over the entire time span, i.e., Lt = 16. In this case, there is a single interval.
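A sketch of this windowed variant of the slope measure from Section 6.2.4 (Lt = 16 corresponds to a single window covering the whole period):

import numpy as np

def max_window_slope(yearly_counts, window=8):
    # yearly_counts: documents per year for one cluster, e.g. 16 values for 2000-2015.
    # Fit a regression line to every window of `window` consecutive years and
    # return the steepest (maximum) slope; window=8 gives nine intervals for 16 years.
    slopes = []
    for start in range(len(yearly_counts) - window + 1):
        xs = np.arange(window, dtype=float)
        ys = np.asarray(yearly_counts[start:start + window], dtype=float)
        slopes.append(np.polyfit(xs, ys, deg=1)[0])
    return max(slopes)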

clustering method We consider Gaussian Mixture Models (GMM) and k-means. We use GMM with the spherical type of covariance, where each component has the same variance. For both, we proceed as described in Section 6.2.3.

6.5.2 Experimental Setup and Results

We conducted experiments on the Semantic Scholar corpus and evaluated a ranked list of trend candidates against the benchmark, considering the configurations mentioned above. We adopted Mean Average Precision (MAP) as the principal evaluation measure. As a secondary evaluation measure, we computed recall at 50 (R50), which estimates the percentage of gold trends found in the list of the 50 highest-ranked trend candidates. We found that configuration (0) in Table 22 works best.

12 Semantic Scholar provided the underlying dataset at the beginning of our work in 2016.


We then conducted ablation experiments to discover the importance of the configuration choices.

                  (0)         (1)         (2)         (3)         (4)
document part     A           F           A           A           A
stratification    yes         yes         no          yes         yes
Lt                8           8           8           16          8
clustering        GMMsph      GMMsph      GMMsph      GMMsph      kμ
MAP               .36 (.03)   .07 (.19)   .30 (.03)   .32 (.04)   .34 (.21)
R50 avg           .50 (.012)  .25 (.018)  .50 (.016)  .46 (.018)  .53 (.014)
R50 max           .61         .37         .53         .61         .61

Table 22: Ablation results. Standard deviations are in parentheses. GMM clustering is performed with the spherical (sph) type of covariance. K-means clustering is denoted as kμ in the ablation table.

comparing (0) and (1),13 we observed that abstracts (A) are a more reliable representation for our trend detection baseline than full texts (F). A possible explanation for that observation is that an abstract is a summary of a scientific paper that covers only the central points and is semantically very concise and rich. In contrast, the full text contains numerous parts that are secondary (e.g., future work, related work) and thus may skew the document's overall meaning.

comparing (0) and (2), we recognized that stratification (yes) improves results compared to the setting with non-stratified (no), randomly sampled data. We would expect the effect of stratification to be even more substantial for collections in which the distribution of documents over time is more skewed than in our collection.

comparing (0) and (3), we see that the length of the interval is a vital configuration choice. As we assumed, the 16-year interval is rather long, and an 8-year interval can be a better choice, primarily if one aims to find short-term trends.

comparing (0) and (4), we recognized that the topics we obtain from k-means are similar in nature to those from GMM. However, GMMs, which take variance and covariance into account and estimate a soft assignment, still perform slightly better.

6.6 benchmark description

Below we report the contents of the TRENDNERT benchmark.14

1. Two records containing information on documents: one file with the documents assigned to (down)trends and the other file with documents assigned to flat topics.

13 Numbers (0), (1), (2), (3), (4) are the configuration choices in the ablation table.
14 https://doi.org/10.17605/OSF.IO/DHZCT


2. A script to produce document hash codes.

3. A file mapping every hash code to internal IDs in the benchmark.

4. A script to compute MAP (the default setting is ρ = 0.25).

5. README.md, a file with guidance notes.

Every document in the benchmark contains the following information:

• Paper ID: the internal ID assigned to each document in the original Semantic Scholar collection.

• Cluster ID: the unique ID of the meta-cluster where the document was observed.

• Label/Tag: the name of the gold (down)trend assigned by crowd-workers (e.g., Cyber Security, Fuzzy Sets and Systems).

• Type of (down)trend candidate: trend (T), downtrend (D), or flat topic (F).

• Hash ID15 of each paper from the original Semantic Scholar collection.

6.7 summary

Emerging trend detection is a promising task for Intelligent Process Automation (IPA) that allows companies and funding agencies to automatically analyze extensive textual collections in order to detect emerging fields and forecast the future by employing machine learning algorithms. Thus, the availability of public benchmarks for trend detection is of significant importance, as only thereby can one evaluate the performance and robustness of the techniques employed.

Within this work, we release TRENDNERT, the first publicly available benchmark for (down)trend detection, which offers a realistic setting for developing trend detection algorithms. TRENDNERT also supports downtrend detection, a significant problem that was not addressed before. We also present several experimental findings on trend detection. First, stratification improves trend detection if the distribution of documents is skewed over time. Second, abstract-based trend detection performs better than full-text-based detection if straightforward models are used.

6.8 future work

There are several possible directions for future work:

• The presented approach of estimating the trendiness of a topic by means of linear regression is straightforward and can be replaced by more sophisticated versions like Kendall's τ correlation coefficient or the Least Squares Regression Smoother (LOESS) (Cleveland, 1979).

15 MD5 hash of First Author + Title + Year.



• It would also be interesting to replace the k-means clustering algorithm that we adopted within this work with the LDA model while retaining the document representations, as was done in Dieng, Ruiz, and Blei (2019). The question to investigate would then be whether topic modeling outperforms the simple clustering algorithm in this particular setting, and whether the resulting system would benefit from the topic modeling algorithm.

• Furthermore, it would be of interest to conduct more research on the following question: what kind of document representation can make effective use of the information in full texts? Recently proposed contextualized word embeddings could be a possible direction for these experiments.


7 DISCUSSION AND FUTURE WORK

As we observed in the two parts of this thesis, Intelligent Process Automation (IPA) is a challenging research area due to its multifaceted nature and its application in real-world scenarios. Its main objective is to automate time-consuming and routine tasks and to free up the knowledge worker for more cognitively demanding tasks. However, this objective entails many difficulties, such as a lack of resources for both training and evaluation of algorithms, as well as insufficient performance of machine learning techniques when applied to real-world data and practical problems. In this thesis, we investigated two IPA applications within the Natural Language Processing (NLP) domain, conversational agents and predictive analytics, and addressed some of the most common issues.

• In the first part of the thesis, we addressed the challenge of implementing a robust conversational system for IPA purposes within the practical e-learning domain, considering the conditions of missing training data. The primary purpose of the resulting system is to perform the onboarding process between tutors and students. This onboarding reduces repetitive and time-consuming activities while allowing tutors to focus solely on mathematical questions, which is the core task with the most impact on students. Besides that, by interacting with users, the system augments the resources with structured and labeled training data that we used to implement learnable dialogue components.

The realization of such a system was associated with several challenges. Among others were missing structured data and ambiguous user-generated content that requires precise language understanding. Also, a system that simultaneously communicates with a large number of users has to maintain additional meta policies, such as fallback and check-up policies, to hold the conversation intelligently.

• Moreover, we have shown the rule-based dialogue system's potential extension using transfer learning. We utilized a small dataset of structured dialogues obtained in a trial run of the rule-based system and applied the current state-of-the-art Bidirectional Encoder Representations from Transformers (BERT) architecture to solve the domain-specific tasks. Specifically, we retrained two of the three components in the conversational system, Named Entity Recognition (NER) and the Dialogue Manager (DM) (i.e., the NAP task), and confirmed the applicability of such algorithms in a real-world low-resource scenario by showing the reasonably high performance of the resulting models.

• Last but not least, in the second part of the thesis, we addressed the issue of missing publicly available benchmarks for the evaluation of machine learning algorithms for the trend detection task. To the best of our knowledge, we released the first open benchmark for this purpose, which offers a realistic setting for developing trend detection algorithms. Besides that, this benchmark supports downtrend detection, a significant problem that was not sufficiently addressed before. We additionally proposed an evaluation measure and presented several experimental findings on trend detection.

The ultimate objective of Intelligent Process Automation (IPA) should be the creation of robust algorithms in low-resource and domain-specific conditions. This is still a challenging task; however, some of the experiments presented in this thesis revealed that this goal could potentially be achieved by means of Transfer Learning (TL) techniques.


BIBLIOGRAPHY

Agrawal Amritanshu Wei Fu and Tim Menzies (2018) bdquoWhat iswrong with topic modeling And how to fix it using search-basedsoftware engineeringldquo In Information and Software Technology 98pp 74ndash88 (cit on p 66)

Alan M (1950) bdquoTuringldquo In Computing machinery and intelligenceMind 59236 pp 433ndash460 (cit on p 7)

Alghamdi Rubayyi and Khalid Alfalqi (2015) bdquoA survey of topicmodeling in text miningldquo In Int J Adv Comput Sci Appl(IJACSA)61 (cit on pp 64ndash66)

Ammar Waleed Dirk Groeneveld Chandra Bhagavatula Iz BeltagyMiles Crawford Doug Downey Jason Dunkelberger Ahmed El-gohary Sergey Feldman Vu Ha et al (2018) bdquoConstruction ofthe literature graph in semantic scholarldquo In arXiv preprint arXiv1805 02262 (cit on p 72)

Ankerholz Amber (2016) 2016 Future of Open Source Survey Says OpenSource Is The Modern Architecture httpswwwlinuxcomnews2016-future-open-source-survey-says-open-source-modern-

architecture (cit on p 78)Asooja Kartik Georgeta Bordea Gabriela Vulcu and Paul Buitelaar

(2016) bdquoForecasting emerging trends from scientific literatureldquoIn Proceedings of the Tenth International Conference on Language Re-sources and Evaluation (LREC 2016) pp 417ndash420 (cit on p 68)

Augustin Jason (2016) Emergin Science and Technology Trends A Syn-thesis of Leading Forecast httpwwwdefenseinnovationmarket-placemilresources2016_SciTechReport_16June2016pdf (citon p 78)

Blei David M and John D Lafferty (2006) bdquoDynamic topic modelsldquoIn Proceedings of the 23rd international conference on Machine learn-ing ACM pp 113ndash120 (cit on pp 66 67)

Blei David M John D Lafferty et al (2007) bdquoA correlated topic modelof scienceldquo In The Annals of Applied Statistics 11 pp 17ndash35 (citon p 66)

Blei David M Andrew Y Ng and Michael I Jordan (2003) bdquoLatentdirichlet allocationldquo In Journal of machine Learning research 3Janpp 993ndash1022 (cit on p 66)

Bloem Peter (2019) Transformers from Scratch httpwwwpeterbloemnlblogtransformers (cit on p 45)

Bocklisch Tom Joey Faulkner Nick Pawlowski and Alan Nichol(2017) bdquoRasa Open source language understanding and dialoguemanagementldquo In arXiv preprint arXiv171205181 (cit on p 15)

Bolelli Levent Seyda Ertekin and C Giles (2009) bdquoTopic and trenddetection in text collections using latent dirichlet allocationldquo InAdvances in Information Retrieval pp 776ndash780 (cit on p 66)


Bolelli Levent Seyda Ertekin Ding Zhou and C Lee Giles (2009)bdquoFinding topic trends in digital librariesldquo In Proceedings of the 9thACMIEEE-CS joint conference on Digital libraries ACM pp 69ndash72

(cit on p 66)Bordes Antoine Y-Lan Boureau and Jason Weston (2016) bdquoLearning

end-to-end goal-oriented dialogldquo In arXiv preprint arXiv160507683(cit on pp 16 17)

Brooks Chuck (2016) 7 Top Tech Trends Impacting Innovators in 2016httpinnovationexcellencecomblog201512267-top-

tech-trends-impacting-innovators-in-2016 (cit on p 78)Burtsev Mikhail Alexander Seliverstov Rafael Airapetyan Mikhail

Arkhipov Dilyara Baymurzina Eugene Botvinovsky NickolayBushkov Olga Gureenkova Andrey Kamenev Vasily Konovalovet al (2018) bdquoDeepPavlov An Open Source Library for Conver-sational AIldquo In (cit on p 15)

Chang Jonathan David M Blei et al (2010) bdquoHierarchical relationalmodels for document networksldquo In The Annals of Applied Statis-tics 41 pp 124ndash150 (cit on p 66)

Chen Hongshen Xiaorui Liu Dawei Yin and Jiliang Tang (2017)bdquoA survey on dialogue systems Recent advances and new fron-tiersldquo In Acm Sigkdd Explorations Newsletter 192 pp 25ndash35 (citon pp 14ndash16)

Chen Lingzhen Alessandro Moschitti Giuseppe Castellucci AndreaFavalli and Raniero Romagnoli (2018) bdquoTransfer Learning for In-dustrial Applications of Named Entity Recognitionldquo In NL4AIAI IA pp 129ndash140 (cit on p 43)

Cleveland William S (1979) bdquoRobust locally weighted regression andsmoothing scatterplotsldquo In Journal of the American statistical asso-ciation 74368 pp 829ndash836 (cit on p 84)

Collobert Ronan Jason Weston Leacuteon Bottou Michael Karlen KorayKavukcuoglu and Pavel Kuksa (2011) bdquoNatural language pro-cessing (almost) from scratchldquo In Journal of machine learning re-search 12Aug pp 2493ndash2537 (cit on p 16)

Cuayaacutehuitl Heriberto Simon Keizer and Oliver Lemon (2015) bdquoStrate-gic dialogue management via deep reinforcement learningldquo InarXiv preprint arXiv151108099 (cit on p 14)

Cui Lei Shaohan Huang Furu Wei Chuanqi Tan Chaoqun Duanand Ming Zhou (2017) bdquoSuperagent A customer service chat-bot for e-commerce websitesldquo In Proceedings of ACL 2017 SystemDemonstrations pp 97ndash102 (cit on p 8)

Curiskis Stephan A Barry Drake Thomas R Osborn and Paul JKennedy (2019) bdquoAn evaluation of document clustering and topicmodelling in two online social networks Twitter and Redditldquo InInformation Processing amp Management (cit on p 74)

Dai Zihang Zhilin Yang Yiming Yang William W Cohen Jaime Car-bonell Quoc V Le and Ruslan Salakhutdinov (2019) bdquoTransformer-xl Attentive language models beyond a fixed-length contextldquo InarXiv preprint arXiv190102860 (cit on pp 43 44)


Daniel Ben (2015) bdquoBig Data and analytics in higher education Op-portunities and challengesldquo In British journal of educational tech-nology 465 pp 904ndash920 (cit on p 22)

Deerwester Scott Susan T Dumais George W Furnas Thomas K Lan-dauer and Richard Harshman (1990) bdquoIndexing by latent seman-tic analysisldquo In Journal of the American society for information sci-ence 416 pp 391ndash407 (cit on p 65)

Deoras Anoop Kaisheng Yao Xiaodong He Li Deng Geoffrey Ger-son Zweig Ruhi Sarikaya Dong Yu Mei-Yuh Hwang and Gre-goire Mesnil (2015) Assignment of semantic labels to a sequence ofwords using neural network architectures US Patent App 14016186

(cit on p 13)Devlin Jacob Ming-Wei Chang Kenton Lee and Kristina Toutanova

(2018) bdquoBert Pre-training of deep bidirectional transformers forlanguage understandingldquo In arXiv preprint arXiv181004805 (citon pp 42ndash45)

Dhingra Bhuwan Lihong Li Xiujun Li Jianfeng Gao Yun-NungChen Faisal Ahmed and Li Deng (2016) bdquoTowards end-to-end re-inforcement learning of dialogue agents for information accessldquoIn arXiv preprint arXiv160900777 (cit on p 16)

Dieng Adji B Francisco JR Ruiz and David M Blei (2019) bdquoTopicModeling in Embedding Spacesldquo In arXiv preprint arXiv190704907(cit on pp 74 84)

Donahue Jeff Yangqing Jia Oriol Vinyals Judy Hoffman Ning ZhangEric Tzeng and Trevor Darrell (2014) bdquoDecaf A deep convolu-tional activation feature for generic visual recognitionldquo In Inter-national conference on machine learning pp 647ndash655 (cit on p 43)

Eric Mihail and Christopher D Manning (2017) bdquoKey-value retrievalnetworks for task-oriented dialogueldquo In arXiv preprint arXiv 170505414 (cit on p 16)

Fang Hao Hao Cheng Maarten Sap Elizabeth Clark Ari HoltzmanYejin Choi Noah A Smith and Mari Ostendorf (2018) bdquoSoundingboard A user-centric and content-driven social chatbotldquo In arXivpreprint arXiv180410202 (cit on p 10)

Friedrich Fabian Jan Mendling and Frank Puhlmann (2011) bdquoPro-cess model generation from natural language textldquo In Interna-tional Conference on Advanced Information Systems Engineering pp 482

ndash496 (cit on p 2)Frot Mathilde (2016) 5 Trends in Computer Science Research https

wwwtopuniversitiescomcoursescomputer-science-info-

rmation-systems5-trends-computer-science-research (citon p 78)

Glasmachers Tobias (2017) bdquoLimits of end-to-end learningldquo In arXivpreprint arXiv170408305 (cit on pp 16 17)

Goddeau David Helen Meng Joseph Polifroni Stephanie Seneffand Senis Busayapongchai (1996) bdquoA form-based dialogue man-ager for spoken language applicationsldquo In Proceeding of FourthInternational Conference on Spoken Language Processing ICSLPrsquo96Vol 2 IEEE pp 701ndash704 (cit on p 14)


Gohr Andre Alexander Hinneburg Rene Schult and Myra Spiliopou-lou (2009) bdquoTopic evolution in a stream of documentsldquo In Pro-ceedings of the 2009 SIAM International Conference on Data MiningSIAM pp 859ndash870 (cit on p 65)

Gollapalli Sujatha Das and Xiaoli Li (2015) bdquoEMNLP versus ACLAnalyzing NLP research over timeldquo In EMNLP pp 2002ndash2006

(cit on p 72)Griffiths Thomas L and Mark Steyvers (2004) bdquoFinding scientific top-

icsldquo In Proceedings of the National academy of Sciences 101suppl 1pp 5228ndash5235 (cit on p 65)

Griffiths Thomas L Michael I Jordan Joshua B Tenenbaum andDavid M Blei (2004) bdquoHierarchical topic models and the nestedChinese restaurant processldquo In Advances in neural information pro-cessing systems pp 17ndash24 (cit on p 66)

Hall David Daniel Jurafsky and Christopher D Manning (2008) bdquoStu-dying the history of ideas using topic modelsldquo In Proceedingsof the conference on empirical methods in natural language processingAssociation for Computational Linguistics pp 363ndash371 (cit onp 66)

Harriet Taylor (2015) Privacy will hit tipping point in 2016 httpwwwcnbccom20151109privacy-will-hit-tipping-point-

in-2016html (cit on p 78)He Jiangen and Chaomei Chen (2018) bdquoPredictive Effects of Novelty

Measured by Temporal Embeddings on the Growth of ScientificLiteratureldquo In Frontiers in Research Metrics and Analytics 3 p 9

(cit on pp 64 68 71)He Junxian Zhiting Hu Taylor Berg-Kirkpatrick Ying Huang and

Eric P Xing (2017) bdquoEfficient correlated topic modeling with topicembeddingldquo In Proceedings of the 23rd ACM SIGKDD InternationalConference on Knowledge Discovery and Data Mining ACM pp 225ndash233 (cit on pp 66 67)

He Qi Bi Chen Jian Pei Baojun Qiu Prasenjit Mitra and Lee Giles(2009) bdquoDetecting topic evolution in scientific literature How cancitations helpldquo In Proceedings of the 18th ACM conference on In-formation and knowledge management ACM pp 957ndash966 (cit onp 68)

Henderson Matthew Blaise Thomson and Jason D Williams (2014)bdquoThe second dialog state tracking challengeldquo In Proceedings of the15th Annual Meeting of the Special Interest Group on Discourse andDialogue (SIGDIAL) pp 263ndash272 (cit on p 15)

Henderson Matthew Blaise Thomson and Steve Young (2013) bdquoDeepneural network approach for the dialog state tracking challengeldquoIn Proceedings of the SIGDIAL 2013 Conference pp 467ndash471 (cit onp 14)

Hess Ann Hari Iyer and William Malm (2001) bdquoLinear trend analy-sis a comparison of methodsldquo In Atmospheric Environment 3530Visibility Aerosol and Atmospheric Optics pp 5211 ndash5222 issn1352-2310 doi httpsdoiorg101016S1352- 2310(01)00342-9 url httpwwwsciencedirectcomsciencearticlepiiS1352231001003429 (cit on p 75)


Hofmann Thomas (1999) bdquoProbabilistic latent semantic analysisldquo InProceedings of the Fifteenth conference on Uncertainty in artificial in-telligence Morgan Kaufmann Publishers Inc pp 289ndash296 (cit onp 65)

Honnibal Matthew and Ines Montani (2017) bdquospacy 2 Natural lan-guage understanding with bloom embeddings convolutional neu-ral networks and incremental parsingldquo In To appear 7 (cit onp 15)

Howard Jeremy and Sebastian Ruder (2018) bdquoUniversal languagemodel fine-tuning for text classificationldquo In arXiv preprint arXiv180106146 (cit on p 44)

IEEE Computer Society (2016) Top 9 Computing Technology Trends for2016 httpswwwscientificcomputingcomnews201601top-9-computing-technology-trends-2016 (cit on p 78)

Ivancic Lucija Dalia Suša Vugec and Vesna Bosilj Vukšic (2019) bdquoRobo-tic Process Automation Systematic Literature Reviewldquo In In-ternational Conference on Business Process Management Springerpp 280ndash295 (cit on pp 1 2)

Jain Mohit Pratyush Kumar Ramachandra Kota and Shwetak NPatel (2018) bdquoEvaluating and informing the design of chatbotsldquoIn Proceedings of the 2018 Designing Interactive Systems ConferenceACM pp 895ndash906 (cit on p 8)

Khanpour Hamed Nishitha Guntakandla and Rodney Nielsen (2016)bdquoDialogue act classification in domain-independent conversationsusing a deep recurrent neural networkldquo In Proceedings of COL-ING 2016 the 26th International Conference on Computational Lin-guistics Technical Papers pp 2012ndash2021 (cit on p 13)

Kim Young-Bum Karl Stratos Ruhi Sarikaya and Minwoo Jeong(2015) bdquoNew transfer learning techniques for disparate label setsldquoIn Proceedings of the 53rd Annual Meeting of the Association for Com-putational Linguistics and the 7th International Joint Conference onNatural Language Processing (Volume 1 Long Papers) pp 473ndash482

(cit on p 43)Kolhatkar Varada Heike Zinsmeister and Graeme Hirst (2013) bdquoAn-

notating anaphoric shell nouns with their antecedentsldquo In Pro-ceedings of the 7th Linguistic Annotation Workshop and Interoperabil-ity with Discourse pp 112ndash121 (cit on pp 77 78)

Kontostathis April Leon M Galitsky William M Pottenger SomaRoy and Daniel J Phelps (2004) bdquoA survey of emerging trend de-tection in textual data miningldquo In Survey of text mining Springerpp 185ndash224 (cit on p 63)

Krippendorff Klaus (2011) bdquoComputing Krippendorffrsquos alpha- relia-bilityldquo In Annenberg School for Communication (ASC) DepartmentalPapers 43 (cit on p 78)

Kurata Gakuto Bing Xiang Bowen Zhou and Mo Yu (2016) bdquoLever-aging sentence-level information with encoder lstm for semanticslot fillingldquo In arXiv preprint arXiv160101530 (cit on p 13)

Landis J Richard and Gary G Koch (1977) bdquoThe measurement ofobserver agreement for categorical dataldquo In biometrics pp 159ndash174 (cit on p 78)


Le Minh-Hoang Tu-Bao Ho and Yoshiteru Nakamori (2005) bdquoDe-tecting emerging trends from scientific corporaldquo In InternationalJournal of Knowledge and Systems Sciences 22 pp 53ndash59 (cit onp 68)

Le Quoc V and Tomas Mikolov (2014) bdquoDistributed Representationsof Sentences and Documentsldquo In ICML Vol 14 pp 1188ndash1196

(cit on p 74)Lee Ji Young and Franck Dernoncourt (2016) bdquoSequential short-text

classification with recurrent and convolutional neural networksldquoIn arXiv preprint arXiv160303827 (cit on p 13)

Leopold Henrik Han van der Aa and Hajo A Reijers (2018) bdquoIden-tifying candidate tasks for robotic process automation in textualprocess descriptionsldquo In Enterprise Business-Process and Informa-tion Systems Modeling Springer pp 67ndash81 (cit on p 2)

Levenshtein Vladimir I (1966) bdquoBinary codes capable of correctingdeletions insertions and reversalsldquo In Soviet physics doklady 8pp 707ndash710 (cit on p 25)

Liao Vera Masud Hussain Praveen Chandar Matthew Davis Yasa-man Khazaeni Marco Patricio Crasso Dakuo Wang Michael Mu-ller N Sadat Shami Werner Geyer et al (2018) bdquoAll Work andNo Playldquo In Proceedings of the 2018 CHI Conference on HumanFactors in Computing Systems ACM p 3 (cit on p 8)

Lowe Ryan Thomas Nissan Pow Iulian Vlad Serban Laurent Char-lin Chia-Wei Liu and Joelle Pineau (2017) bdquoTraining end-to-enddialogue systems with the ubuntu dialogue corpusldquo In Dialogueamp Discourse 81 pp 31ndash65 (cit on p 22)

Luger Ewa and Abigail Sellen (2016) bdquoLike having a really bad PAthe gulf between user expectation and experience of conversa-tional agentsldquo In Proceedings of the 2016 CHI Conference on HumanFactors in Computing Systems ACM pp 5286ndash5297 (cit on p 8)

MacQueen James (1967) bdquoSome methods for classification and analy-sis of multivariate observationsldquo In Proceedings of the fifth Berkeleysymposium on mathematical statistics and probability Vol 1 OaklandCA USA pp 281ndash297 (cit on p 74)

Manning Christopher D Prabhakar Raghavan and Hinrich Schuumltze(2008) Introduction to Information Retrieval Web publication at in-formationretrievalorg Cambridge University Press (cit on p 74)

Markov Igor (2015) 13 Of 2015rsquos Hottest Topics In Computer ScienceResearch httpswwwforbescomsitesquora2015042213-of-2015s-hottest-topics-in-computer-science-research

fb0d46b1e88d (cit on p 78)Martin James H and Daniel Jurafsky (2009) Speech and language pro-

cessing An introduction to natural language processing computationallinguistics and speech recognition PearsonPrentice Hall Upper Sad-dle River

Mei Qiaozhu Deng Cai Duo Zhang and ChengXiang Zhai (2008)bdquoTopic modeling with network regularizationldquo In Proceedings ofthe 17th international conference on World Wide Web ACM pp 101ndash110 (cit on p 65)


Meyerson Bernard and DiChristina Mariette (2016) These are the top10 emerging technologies of 2016 httpwww3weforumorgdocsGAC16_Top10_Emerging_Technologies_2016_reportpdf (cit onp 78)

Mikolov Tomas Kai Chen Greg Corrado and Jeffrey Dean (2013)bdquoEfficient estimation of word representations in vector spaceldquo InarXiv preprint arXiv13013781 (cit on p 74)

Miller Alexander Adam Fisch Jesse Dodge Amir-Hossein KarimiAntoine Bordes and Jason Weston (2016) bdquoKey-value memorynetworks for directly reading documentsldquo In (cit on p 16)

Mohanty Soumendra and Sachin Vyas (2018) bdquoHow to Compete inthe Age of Artificial Intelligenceldquo In (cit on pp III V 1 2)

Moiseeva, Alena and Hinrich Schütze (2020). "TRENDNERT: A Benchmark for Trend and Downtrend Detection in a Scientific Domain." In: Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (cit. on p. 71).

Moiseeva, Alena, Dietrich Trautmann, Michael Heimann, and Hinrich Schütze (2020). "Multipurpose Intelligent Process Automation via Conversational Assistant." In: arXiv preprint arXiv:2001.02284 (cit. on pp. 21, 41).

Monaghan Fergal Georgeta Bordea Krystian Samp and Paul Buite-laar (2010) bdquoExploring your research Sprinkling some saffron onsemantic web dog foodldquo In Semantic Web Challenge at the Inter-national Semantic Web Conference Vol 117 Citeseer pp 420ndash435

(cit on p 68)Monostori Laacuteszloacute Andraacutes Maacuterkus Hendrik Van Brussel and E West-

kaumlmpfer (1996) bdquoMachine learning approaches to manufactur-ingldquo In CIRP annals 452 pp 675ndash712 (cit on p 15)

Moody Christopher E (2016) bdquoMixing dirichlet topic models andword embeddings to make lda2vecldquo In arXiv preprint arXiv160502019 (cit on p 74)

Mou Lili Zhao Meng Rui Yan Ge Li Yan Xu Lu Zhang and ZhiJin (2016) bdquoHow transferable are neural networks in nlp applica-tionsldquo In arXiv preprint arXiv160306111 (cit on p 43)

Mrkšic Nikola Diarmuid O Seacuteaghdha Tsung-Hsien Wen Blaise Thom-son and Steve Young (2016) bdquoNeural belief tracker Data-drivendialogue state trackingldquo In arXiv preprint arXiv160603777 (citon p 14)

Nessma Joussef (2015) Top 10 Hottest Research Topics in Computer Sci-ence http www pouted com top - 10 - hottest - research -

topics-in-computer-science (cit on p 78)Nie Binling and Shouqian Sun (2017) bdquoUsing text mining techniques

to identify research trends A case study of design researchldquo InApplied Sciences 74 p 401 (cit on p 68)

Pan Sinno Jialin and Qiang Yang (2009) bdquoA survey on transfer learn-ingldquo In IEEE Transactions on knowledge and data engineering 2210pp 1345ndash1359 (cit on p 41)

Qu Lizhen Gabriela Ferraro Liyuan Zhou Weiwei Hou and Tim-othy Baldwin (2016) bdquoNamed entity recognition for novel types


by transfer learningldquo In arXiv preprint arXiv161009914 (cit onp 43)

Radford Alec Jeffrey Wu Rewon Child David Luan Dario Amodeiand Ilya Sutskever (2019) bdquoLanguage models are unsupervisedmultitask learnersldquo In OpenAI Blog 18 (cit on p 44)

Reese Hope (2015) 7 trends for artificial intelligence in 2016 rsquoLike 2015on steroidsrsquo httpwwwtechrepubliccomarticle7-trends-for-artificial-intelligence-in-2016-like-2015-on-steroids

(cit on p 78)Rivera Maricel (2016) Is Digital Game-Based Learning The Future Of

Learning httpselearningindustrycomdigital-game-based-

learning-future (cit on p 78)Rotolo Daniele Diana Hicks and Ben R Martin (2015) bdquoWhat is an

emerging technologyldquo In Research Policy 4410 pp 1827ndash1843

(cit on p 64)Salatino Angelo (2015) bdquoEarly detection and forecasting of research

trendsldquo In (cit on pp 63 66)Serban Iulian V Alessandro Sordoni Yoshua Bengio Aaron Courville

and Joelle Pineau (2016) bdquoBuilding end-to-end dialogue systemsusing generative hierarchical neural network modelsldquo In Thirti-eth AAAI Conference on Artificial Intelligence (cit on p 11)

Sharif Razavian Ali Hossein Azizpour Josephine Sullivan and Ste-fan Carlsson (2014) bdquoCNN features off-the-shelf an astoundingbaseline for recognitionldquo In Proceedings of the IEEE conference oncomputer vision and pattern recognition workshops pp 806ndash813 (citon p 43)

Shibata Naoki Yuya Kajikawa and Yoshiyuki Takeda (2009) bdquoCom-parative study on methods of detecting research fronts using dif-ferent types of citationldquo In Journal of the American Society for in-formation Science and Technology 603 pp 571ndash580 (cit on p 68)

Shibata Naoki Yuya Kajikawa Yoshiyuki Takeda and KatsumoriMatsushima (2008) bdquoDetecting emerging research fronts basedon topological measures in citation networks of scientific publica-tionsldquo In Technovation 2811 pp 758ndash775 (cit on p 68)

Small Henry (2006) bdquoTracking and predicting growth areas in sci-enceldquo In Scientometrics 683 pp 595ndash610 (cit on p 68)

Small Henry Kevin W Boyack and Richard Klavans (2014) bdquoIden-tifying emerging topics in science and technologyldquo In ResearchPolicy 438 pp 1450ndash1467 (cit on p 64)

Su Pei-Hao Nikola Mrkšic Intildeigo Casanueva and Ivan Vulic (2018)bdquoDeep Learning for Conversational AIldquo In Proceedings of the 2018Conference of the North American Chapter of the Association for Com-putational Linguistics Tutorial Abstracts pp 27ndash32 (cit on p 12)

Sutskever Ilya Oriol Vinyals and Quoc V Le (2014) bdquoSequence tosequence learning with neural networksldquo In Advances in neuralinformation processing systems pp 3104ndash3112 (cit on pp 16 17)

Trautmann Dietrich Johannes Daxenberger Christian Stab HinrichSchuumltze and Iryna Gurevych (2019) bdquoRobust Argument Unit Recog-nition and Classificationldquo In arXiv preprint arXiv190409688 (citon p 41)


Tu, Yi-Ning and Jia-Lang Seng (2012). „Indices of novelty for emerging topic detection." In: Information Processing & Management 48.2, pp. 303–325 (cit. on p. 64).

Turing, Alan (1949). London Times letter to the editor (cit. on p. 7).

Ultes, Stefan, Lina Barahona, Pei-Hao Su, David Vandyke, Dongho Kim, Inigo Casanueva, Paweł Budzianowski, Nikola Mrkšić, Tsung-Hsien Wen, et al. (2017). „Pydial: A multi-domain statistical dialogue system toolkit." In: Proceedings of ACL 2017, System Demonstrations, pp. 73–78 (cit. on p. 15).

Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin (2017). „Attention is all you need." In: Advances in neural information processing systems, pp. 5998–6008 (cit. on p. 45).

Vincent, Pascal, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol (2008). „Extracting and composing robust features with denoising autoencoders." In: Proceedings of the 25th international conference on Machine learning. ACM, pp. 1096–1103 (cit. on p. 44).

Vodolán, Miroslav, Rudolf Kadlec, and Jan Kleindienst (2015). „Hybrid dialog state tracker." In: arXiv preprint arXiv:1510.03710 (cit. on p. 14).

Wang, Alex, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman (2018). „Glue: A multi-task benchmark and analysis platform for natural language understanding." In: arXiv preprint arXiv:1804.07461 (cit. on p. 44).

Wang, Chong and David M. Blei (2011). „Collaborative topic modeling for recommending scientific articles." In: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp. 448–456 (cit. on p. 66).

Wang, Xuerui and Andrew McCallum (2006). „Topics over time: a non-Markov continuous-time model of topical trends." In: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp. 424–433 (cit. on pp. 65–67).

Wang, Zhuoran and Oliver Lemon (2013). „A simple and generic belief tracking mechanism for the dialog state tracking challenge: On the believability of observed information." In: Proceedings of the SIGDIAL 2013 Conference, pp. 423–432 (cit. on p. 14).

Weizenbaum, Joseph, et al. (1966). „ELIZA—a computer program for the study of natural language communication between man and machine." In: Communications of the ACM 9.1, pp. 36–45 (cit. on p. 7).

Wen, Tsung-Hsien, Milica Gasic, Nikola Mrksic, Pei-Hao Su, David Vandyke, and Steve Young (2015). „Semantically conditioned lstm-based natural language generation for spoken dialogue systems." In: arXiv preprint arXiv:1508.01745 (cit. on p. 14).

Wen, Tsung-Hsien, David Vandyke, Nikola Mrksic, Milica Gasic, Lina M. Rojas-Barahona, Pei-Hao Su, Stefan Ultes, and Steve Young (2016). „A network-based end-to-end trainable task-oriented dialogue system." In: arXiv preprint arXiv:1604.04562 (cit. on pp. 11, 16, 17, 22).


Werner, Alexander, Dietrich Trautmann, Dongheui Lee, and Roberto Lampariello (2015). „Generalization of optimal motion trajectories for bipedal walking." In: pp. 1571–1577 (cit. on p. 41).

Williams, Jason D., Kavosh Asadi, and Geoffrey Zweig (2017). „Hybrid code networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning." In: arXiv preprint arXiv:1702.03274 (cit. on p. 17).

Williams, Jason, Antoine Raux, and Matthew Henderson (2016). „The dialog state tracking challenge series: A review." In: Dialogue & Discourse 7.3, pp. 4–33 (cit. on p. 12).

Wu, Chien-Sheng, Richard Socher, and Caiming Xiong (2019). „Global-to-local Memory Pointer Networks for Task-Oriented Dialogue." In: arXiv preprint arXiv:1901.04713 (cit. on pp. 15–17).

Wu, Yonghui, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, et al. (2016). „Google's neural machine translation system: Bridging the gap between human and machine translation." In: arXiv preprint arXiv:1609.08144 (cit. on p. 45).

Xie, Pengtao and Eric P. Xing (2013). „Integrating Document Clustering and Topic Modeling." In: Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence (cit. on p. 74).

Yan, Zhao, Nan Duan, Peng Chen, Ming Zhou, Jianshe Zhou, and Zhoujun Li (2017). „Building task-oriented dialogue systems for online shopping." In: Thirty-First AAAI Conference on Artificial Intelligence (cit. on p. 14).

Yogatama, Dani, Michael Heilman, Brendan O'Connor, Chris Dyer, Bryan R. Routledge, and Noah A. Smith (2011). „Predicting a scientific community's response to an article." In: Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, pp. 594–604 (cit. on p. 65).

Young, Steve, Milica Gašić, Blaise Thomson, and Jason D. Williams (2013). „Pomdp-based statistical spoken dialog systems: A review." In: Proceedings of the IEEE 101.5, pp. 1160–1179 (cit. on p. 17).

Zaino, Jennifer (2016). 2016 Trends for Semantic Web and Semantic Technologies. http://www.dataversity.net/2017-predictions-semantic-web-semantic-technologies (cit. on p. 78).

Zhou, Hao, Minlie Huang, et al. (2016). „Context-aware natural language generation for spoken dialogue systems." In: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pp. 2032–2041 (cit. on pp. 14, 15).

Zhu, Yukun, Ryan Kiros, Rich Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, and Sanja Fidler (2015). „Aligning books and movies: Towards story-like visual explanations by watching movies and reading books." In: Proceedings of the IEEE international conference on computer vision, pp. 19–27 (cit. on p. 44).



CONTENTS

List of Figures XI
List of Tables XII
Acronyms XIII

1 introduction 1
1.1 Outline 3

i e-learning conversational assistant 5
2 foundations 7
2.1 Introduction 7
2.2 Conversational Assistants 9
2.2.1 Taxonomy 9
2.2.2 Conventional Architecture 12
2.2.3 Existing Approaches 15
2.3 Target Domain & Task Definition 18
2.4 Summary 19
3 multipurpose ipa via conversational assistant 21
3.1 Introduction 21
3.1.1 Outline and Contributions 22
3.2 Model 23
3.2.1 OMB+ Design 23
3.2.2 Preprocessing 23
3.2.3 Natural Language Understanding 25
3.2.4 Dialogue Manager 26
3.2.5 Meta Policy 30
3.2.6 Response Generation 32
3.3 Evaluation 37
3.3.1 Automated Evaluation 37
3.3.2 Human Evaluation & Error Analysis 37
3.4 Structured Dialogue Acquisition 39
3.5 Summary 39
3.6 Future Work 40
4 deep learning re-implementation of units 41
4.1 Introduction 41
4.1.1 Outline and Contributions 42
4.2 Related Work 43
4.3 BERT: Bidirectional Encoder Representations from Transformers 44
4.4 Data & Descriptive Statistics 46
4.5 Named Entity Recognition 48
4.5.1 Model Settings 49
4.6 Next Action Prediction 49
4.6.1 Model Settings 49
4.7 Evaluation and Results 50
4.8 Error Analysis 58


4.9 Summary 58
4.10 Future Work 59

ii clustering-based trend analyzer 61
5 foundations 63
5.1 Motivation and Challenges 63
5.2 Detection of Emerging Research Trends 64
5.2.1 Methods for Topic Modeling 65
5.2.2 Topic Evolution of Scientific Publications 67
5.3 Summary 68
6 benchmark for scientific trend detection 71
6.1 Motivation 71
6.1.1 Outline and Contributions 72
6.2 Corpus Creation 73
6.2.1 Underlying Data 73
6.2.2 Stratification 73
6.2.3 Document representations & Clustering 74
6.2.4 Trend & Downtrend Estimation 75
6.2.5 (Down)trend Candidates Validation 75
6.3 Benchmark Annotation 77
6.3.1 Inter-annotator Agreement 77
6.3.2 Gold Trends 78
6.4 Evaluation Measure for Trend Detection 79
6.5 Trend Detection Baselines 80
6.5.1 Configurations 81
6.5.2 Experimental Setup and Results 81
6.6 Benchmark Description 82
6.7 Summary 83
6.8 Future Work 83

7 discussion and future work 85

bibliography 87


LIST OF FIGURES

Figure 1   Taxonomy of dialogue systems 10
Figure 2   Example of a conventional dialogue architecture 12
Figure 3   OMB+ online learning platform 24
Figure 4   Rule-Based System: Dialogue flow 36
Figure 5   Macro F1 evaluation: NAP – default model. Comparison between German and multilingual BERT 52
Figure 6   Macro F1 test: NAP – default model. Comparison between German and multilingual BERT 53
Figure 7   Macro F1 evaluation: NAP – extended model 54
Figure 8   Macro F1 test: NAP – extended model 55
Figure 9   Macro F1 evaluation: NER – Comparison between German and multilingual BERT 56
Figure 10  Macro F1 test: NER – Comparison between German and multilingual BERT 57
Figure 11  (Down)trend detection: Overall distribution of papers in the entire dataset 73
Figure 12  A trend candidate (positive slope). Topic: Sentiment and Emotion Analysis (using Social Media Channels) 76
Figure 13  A downtrend candidate (negative slope). Topic: XML 76


LIST OF TABLES

Table 1    Natural language representation of a sentence 13
Table 2    Rule 1: Admissible configurations 27
Table 3    Rule 2: Admissible configurations 27
Table 4    Rule 3: Admissible configurations 27
Table 5    Rule 4: Admissible configurations 28
Table 6    Rule 5: Admissible configurations 28
Table 7    Case 1: ID completeness 31
Table 8    Case 2: ID completeness 31
Table 9    Showcase 1: Short flow 33
Table 10   Showcase 2: Organisational question 33
Table 11   Showcase 3: Fallback and human-request policies 34
Table 12   Showcase 4: Contextual question 34
Table 13   Showcase 5: Long flow 35
Table 14   General statistics for conversational dataset 46
Table 15   Detailed statistics on possible system actions in conversational dataset 48
Table 16   Detailed statistics on possible named entities in conversational dataset 48
Table 17   F1-results for the Named Entity Recognition task 51
Table 18   F1-results for the Next Action Prediction task 51
Table 19   Average dialogue accuracy computed for the Next Action Prediction task 51
Table 20   Trend detection: List of gold trends 79
Table 21   Trend detection: Example of MAP evaluation 80
Table 22   Trend detection: Ablation results 82


ACRONYMS

AI Artificial Intelligence

API Application Programming Interface

BERT Bidirectional Encoder Representations from Transformers

BiLSTM Bidirectional Long Short Term Memory

CRF Conditional Random Fields

DM Dialogue Manager

DST Dialogue State Tracker

DSTC Dialogue State Tracking Challenge

EC Equivalence Classes

ES ElasticSearch

GM Gaussian Models

IAA Inter Annotator Agreement

ID Informational Dictionary

IPA Intelligent Process Automation

ITS Intelligent Tutoring System

LDA Latent Dirichlet Allocation

LSA Latent Semantic Analysis

LSTM Long Short Term Memory

MAP Mean Average Precision

MER Mutually Exclusive Rules

NAP Next Action Prediction

NER Named Entity Recognition

NLG Natural Language Generation

NLP Natural Language Processing

NLU Natural Language Understanding

OMB+ Online Mathematik Brückenkurs

PL Policy Learning

pLSA probabilistic Latent Semantic Analysis

REST Representational State Transfer

RNN Recurrent Neural Network

RPA Robotic Process Automation

RegEx Regular Expressions

TL Transfer Learning

ToDS Task-oriented Dialogue System


1 INTRODUCTION

Robotic Process Automation (RPA) is a type of software bot that imitates manual human activities like entering data into a system, logging into accounts, or executing simple but repetitive workflows (Mohanty and Vyas, 2018). In other words, RPA-systems allow business applications to work without human involvement by replicating human worker activities. A further benefit of RPA-bots is that they are much cheaper compared to the employment of a human worker, and they are also able to work around the clock. Overall, the advantages of software bots are hugely appealing and include cost savings, accuracy and compliance while executing processes, improved responsiveness (i.e., bots are faster than human workers), and finally, such bots are usually agile and multi-skilled (Ivancic, Vugec, and Vukšic, 2019).

However, one of the main benefits and, at the same time, drawbacks of RPA-systems is their ability to precisely fulfill the assigned task. On the one hand, the bot can carry out the task accurately and diligently. On the other hand, it is highly susceptible to changes in defined scenarios. Being designed for a particular task, the RPA-bot is often not adaptable to other domains or even simple changes in a workflow (Mohanty and Vyas, 2018). This inability to adapt to changing conditions gave rise to a further area of improvement for RPA-bots: Intelligent Process Automation (IPA) systems.

IPA-bots combine Robotic Process Automation (RPA) with Artificial Intelligence (AI) and thus can perform complex and more cognitively demanding tasks that require reasoning and language understanding. Hence, IPA-bots go beyond automating simple "click tasks" (as in RPA) and can accomplish jobs more intelligently by employing machine learning algorithms and advanced analytics. Such IPA-systems undertake time-consuming and routine tasks and thus enable smart workflows and free up skilled workers to accomplish higher-value activities.

Previously, IPA techniques were primarily employed within the computer vision domain, starting with the automation of simple display click-tasks and going towards sophisticated self-driving cars with their ability to interpret a stream of situational data obtained from sensors and cameras. However, in recent times, Natural Language Processing (NLP) became one of the potential applications for IPA as well, due to its ability to understand and interpret human language. NLP methods are used to analyze large amounts of textual data and to respond to various inquiries. Furthermore, if the available data is unstructured and does not have any predefined format (i.e., customer emails), or if it is available in a variable format (i.e., invoices,


legal documents), then Natural Language Processing (NLP) techniques are applied to extract the relevant information. This data can then be utilized to solve distinct problems (i.e., natural language understanding, predictive analytics) (Mohanty and Vyas, 2018). For instance, in the case of predictive analytics, such a bot would make forecasts about the future using current and historical data and thereby assist with sales or investments.

However, NLP in the context of Intelligent Process Automation (IPA) is not limited to the extraction of relevant data and insights from textual documents. One of the central applications of NLP within the IPA

domain are conversational interfaces (e.g., chatbots) that are used to enable human-to-machine interaction. This type of software is commonly used for customer support or within recommendation and booking systems. Conversational agents are in enormous demand due to their ability to simultaneously support a large number of users while communicating in a natural language. In a conventional chatbot system, a user provides input in a natural language; the bot then processes this query to extract potentially useful information and responds with a relevant reply. This process comprises multiple stages and involves different types of NLP subtasks, starting with Natural Language Understanding (NLU) (e.g., intent recognition, named entity extraction) and going towards dialogue management (i.e., determining the next possible bot's action considering the dialogue history) and response generation (e.g., converting the semantic representation of the next bot's action to a natural language utterance). A typical dialogue system for Intelligent Process Automation (IPA) purposes usually undertakes shallow customer support requests (e.g., answering FAQs), allowing human workers to focus on more sophisticated inquiries.

Natural Language Processing (NLP) techniques may also be used indirectly for IPA purposes. There are attempts to automatically identify and classify tasks extracted from textual process descriptions as manual, user, or automated. The goal of such an NLP application is to reduce the effort required to identify suitable candidates for further robotic process automation (Friedrich, Mendling, and Puhlmann, 2011; Leopold, Aa, and Reijers, 2018).

Intelligent Process Automation (IPA) is an emerging technology that evolves mainly due to the recognized potential benefit of combining Robotic Process Automation (RPA) and Artificial Intelligence (AI) techniques to solve industrial problems (Ivancic, Vugec, and Vukšic, 2019; Mohanty and Vyas, 2018). Industries are typically interested in being responsive to customers and markets (Mohanty and Vyas, 2018). Thus, most customer support applications need to be real-time, active, and highly robust to achieve higher client satisfaction rates. The manual accomplishment of such tasks is often time-consuming, expensive, and error-prone. Furthermore, due to the massive amounts of textual data available for analysis, it is not feasible to process this data entirely by human means.


1.1 outline

This thesis covers two topics united by the broader theme, which is Intelligent Process Automation (IPA) utilizing statistical Natural Language Processing (NLP) methods.

The first block of the thesis (Chapter 2 – Chapter 4) addresses the challenge of implementing a conversational agent for Intelligent Process Automation (IPA) purposes within the e-learning field. As mentioned above, chatbots are one of the central applications of the IPA domain since they can effectively perform time-consuming tasks requiring natural language understanding, allowing human workers to concentrate on more valuable tasks. The IPA conversational bot that was realized within this thesis takes care of routine and time-consuming tasks performed by human tutors within an online mathematical course. This bot is deployed in a real-world setting and interacts with students within the OMB+ mathematical platform. Conducting experiments, we observed two possibilities to build the conversational agent in industrial settings: first, with purely rule-based methods, considering the missing training data and individual aspects of the target domain (i.e., e-learning). Second, we re-implemented two of the main system components (i.e., Natural Language Understanding (NLU) and Dialogue Manager (DM) units) using a current state-of-the-art deep learning algorithm (i.e., Bidirectional Encoder Representations from Transformers (BERT)) and investigated their performance and potential use as part of a hybrid model (i.e., containing both rule-based and machine learning methods).

The second block of the thesis (Chapter 5 – Chapter 6) considers an Intelligent Process Automation (IPA) problem within the predictive analytics domain and addresses the task of scientific trend detection. Predictive analytics makes forecasts about future outcomes based on historical and current data. Hence, organizations can benefit from advanced analytics models by detecting trends and their behaviors and then employing this information while making crucial business decisions (i.e., investments). In this work, we deal with a subproblem of the trend detection task: specifically, we address the lack of publicly available benchmarks for the evaluation of trend detection algorithms. Therefore, we assembled a benchmark for both trend and downtrend (i.e., topics that become less frequent over time) detection. To the best of our knowledge, the task of downtrend detection has not been well addressed before. Furthermore, the resulting benchmark is based on a collection of more than one million documents, which is among the largest that have been used for trend detection before, and therefore offers a realistic setting for developing trend detection algorithms.


All main chapters contain their specific introduction, contribution list, related work section, and summary.


Part I

E-LEARNING CONVERSATIONAL ASSISTANT


2 FOUNDATIONS

In this block, we address the challenge of designing and implementing a conversational assistant for Intelligent Process Automation (IPA) purposes within the e-learning domain. The system aims to support human tutors of the Online Mathematik Brückenkurs (OMB+) platform during the tutoring round by undertaking routine and time-consuming responsibilities. Specifically, the system intelligently automates the process of information accumulation and validation that precedes every conversation. Thereby, the IPA-bot frees up tutors and allows them to concentrate on more complicated and cognitively demanding tasks.

In this chapter, we introduce the fundamental concepts required by the topic of the first part of the thesis. We begin with the foundations of conversational assistants in Section 2.2, where we present the taxonomy of existing dialogue systems in Section 2.2.1, give an overview of the conventional dialogue architecture in Section 2.2.2 with its most essential components, and describe the existing approaches for building a conversational agent in Section 2.2.3. We finally introduce the target domain for our experiments in Section 2.3.

2.1 introduction

Conversational Assistants, a type of software that interacts with people via written or spoken natural language, have become widely popular in recent times. Today's conversational assistants support a broad range of everyday skills, including but not limited to creating a reminder, answering factoid questions, and controlling smart home devices.

Initially, the notion of an artificial companion capable of conversing in a natural language was anticipated by Alan Turing in 1949 (Turing, 1949). Still, the first significant step in this direction was made in 1966 with the appearance of Weizenbaum's system ELIZA (Weizenbaum, 1966). Its successor was ALICE (Artificial Linguistic Internet Computer Entity), a natural language processing dialogue system that conversed with a human by applying heuristic pattern matching rules. Despite being remarkably popular and technologically advanced for their days, both systems were unable to pass the Turing test (Alan, 1950). Following a long silent period, the next wave of growth of intelligent agents was caused by advances in Natural Language Processing (NLP). This was enabled by access to low-cost memory, higher processing speed, vast amounts of available


data, and sophisticated machine learning algorithms. Hence, one of the most notable leaps in the history of conversational agents began in the year 2011 with intelligent personal assistants. At that time, Apple's Siri, followed by Microsoft's Cortana and Amazon's Alexa in 2014, and then Google Assistant in 2016, came to the scene. The main focus of these agents was not to mimic humans, as in ELIZA or ALICE, but to assist a human by answering simple factoid questions and setting notification tasks across a broad range of content. Another type of conversational agent, called Task-oriented Dialogue System (ToDS), became a focus of industries in the year 2016. For the most part, these bots aimed to support customer service (Cui et al., 2017) and respond to Frequently Asked Questions (Liao et al., 2018). Their conversational style provided more depth than those of personal assistants but a narrower range, since they were patterned for task solving and not for open-domain discussions.

One of the central advantages of conversational agents, and the reason why they are attracting considerable interest, is their ability to give attention to several users simultaneously while supporting natural language communication. Due to this, conversational assistants gain momentum in numerous kinds of practical support applications (i.e., customer support) where a high number of users are involved at the same time. Classical engagement of human assistants is not cost-efficient, and such support is usually not accessible full-time. Therefore, automating human assistance is desirable but also an ambitious task. Due to the lack of solid Natural Language Understanding (NLU), advancing beyond primary tasks is still challenging (Luger and Sellen, 2016), and even the most prosperous dialogue systems often fail to meet users' expectations (Jain et al., 2018). Significant challenges are involved in the design and implementation of a conversational agent, varying from domain engineering to Natural Language Understanding (NLU) and the design of an adaptive domain-specific conversational flow. The quintessential problems and how-to questions while building conversational assistants include:

• How to deal with the variability and flexibility of language?

• How to get high-quality in-domain data?

• How to build meaningful representations?

• How to integrate commonsense and domain knowledge?

• How to build a robust system?

Besides that, current Natural Language Processing (NLP) and Artificial Intelligence (AI) techniques have weaknesses when applied to domain-specific, task-oriented problems, especially if the underlying data comes from an industrial source. As machine learning techniques heavily rely on structured and cleaned training data to deliver consistent performance, typically noisy real-world data without access to additional knowledge databases negatively impacts the classification accuracy.


Despite that, conversational agents can be successfully employed to solve less cognitive but still highly relevant tasks. In this case, a dialogue agent can be operated as part of an Intelligent Process Automation (IPA) system, where it takes care of straightforward but routine and time-consuming functions performed in a natural language, while allowing a human worker to focus on more cognitively demanding processes.

2.2 conversational assistants

The development and design of a particular dialogue system usually vary by its objectives, among the most fundamental of which are:

• Purpose: Should it be an open-ended dialogue system (e.g., chit-chat), a task-oriented system (e.g., customer support), or a personal assistant (e.g., Google Assistant)?

• Skills: What kind of behavior should a system demonstrate?

• Data: Which data is required to develop (resp. train) specific blocks, and how can it be assembled and curated?

• Deployment: Will the final system be integrated into a particular software platform? What are the challenges in choosing suitable tools? How will the deployment happen?

• Implementation: Should it be a rule-based, information-retrieval, machine-learning, or a hybrid system?

In the following chapter, we will take a look at the most common taxonomies of dialogue systems and different design and implementation strategies.

2.2.1 Taxonomy

According to their goals, conversational agents can be classified into two main categories: task-oriented systems (e.g., for customer support) and social bots (e.g., for chit-chat purposes). Furthermore, systems also vary regarding their implementation characteristics. Below we provide a detailed overview of the most common variations.

Task-oriented dialogue systems are famous for their ability to lead a dialogue with the ultimate goal of solving a user's problem (see Figure 1). A customer support agent or a flight/restaurant booking system exemplifies this class of conversational assistants. Such systems are more meaningful and useful than those with an entirely social goal; however, they usually involve more complications during the implementation stage. Since Task-oriented Dialogue Systems (ToDS) require a precise understanding of the user's intent (i.e., goal), the Natural Language Understanding (NLU) segment must perform flawlessly.


Besides that, if applied in an industrial setting, ToDS are usually built modularly, which means they mostly rely on handcrafted rules and predefined templates and are restricted in their abilities. Personal assistants (i.e., Google Assistant) are members of the task-oriented family as well. Though, compared to standard agents, they involve more social conversation and can be used for entertainment purposes (e.g., "Ok Google, tell me a joke"). The latter is typically not available in conventional task-oriented systems.

Social bots and chit-chat systems, in contrast, regularly do not hold any specific goal and are designed for small-talk and entertainment conversations (see Figure 1). Those agents are usually end-to-end trainable and highly data-driven, which means they require large amounts of training data. However, due to the entirely learnable training, such systems often produce inappropriate responses, making them extremely unreliable for practical applications. Like chit-chat systems, social bots are rich in social conversation; however, they possess an ability to solve uncomplicated user requests and conduct a more meaningful dialogue (Fang et al., 2018). Those requests are not the same as in task-oriented systems and are mostly aimed at answering simple factoid questions.

Figure 1: Taxonomy of dialogue systems according to the measure of their task accomplishment and social contribution. Figure originates from Fang et al., 2018.


The aforementioned system types can be decomposed further according to their implementation characteristics:

Open domain & closed domain: A Task-oriented Dialogue System (ToDS) is typically built within a closed-domain setting. The space of potential inputs and outputs is restricted since the system attempts to achieve a specific goal. Customer support or shopping assistants are cases of closed-domain problems (Wen et al., 2016). These systems do not need to be able to talk about off-site topics, but they have to fulfill their particular tasks as efficiently as possible.

Chit-chat systems, in contrast, do not assume a well-defined goal or intention and thus can be built within an open domain. This means the conversation can progress in any direction and with minimal or no functional purpose (Serban et al., 2016). However, the infinite number of possible topics and the fact that a certain amount of commonsense is required to create reasonable responses make an open-domain setting a hard problem.

Content-driven & user-centric: Social bots and chit-chat systems typically utilize a content-driven dialogue manner. That means that their responses are based on large and dynamic content derived from daily web mining and knowledge graphs. This approach builds a dialogue based on trendy content from diverse sources, and thus it is an ideal way for social bots to catch users' attention and bring a user into the dialogue.

User-centric approaches, in turn, assume precise language understanding to detect the sentiment of users' statements. According to the sentiment, the dialogue manager learns users' personalities, handles rapid topic changes, and tracks engagement. In this case, the dialogue builds on specific user interests.

Long & short conversations: Models with a short conversation style attempt to create a single response to a single input (e.g., weather forecast). In the case of a long conversation, a system has to go through multiple turns and keep track of what has been said before (i.e., dialogue history). The longer the conversation, the more challenging it is to train a system. Customer support conversations are typically long conversational threads with multiple questions.

Response generation: One of the essential elements of a dialogue system is response generation. This can be done either in a retrieval-based manner or by using generative models (i.e., Natural Language Generation (NLG)).

The former usually utilizes predefined responses and heuristics to select a relevant response based on the input and context. The heuristic could be a rule-based expression match (i.e., Regular Expressions (RegEx)) or an ensemble of machine learning


classifiers (e.g., entity extraction, prediction of the next action). Such systems do not generate any original text; instead, they select a response from a fixed set of predefined candidates. These models are usually used in an industrial setting where the system must show high robustness, performance, and interpretability of the outcomes.

The latter, in contrast, do not rely on predefined replies. Since such models are typically based on (deep) machine learning techniques, they attempt to generate new responses from scratch. This is, however, a complicated task and is mostly used for vanilla chit-chat systems or in a research domain where structured training data is available.
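To make the retrieval-based manner concrete, the following minimal sketch ranks a fixed set of candidate responses against the user input with TF-IDF cosine similarity. The candidate texts and the scikit-learn setup are illustrative assumptions and not part of the system described later in this thesis.

# Minimal retrieval-based response selection: rank predefined candidates
# by TF-IDF cosine similarity to the user input (illustrative example only).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

candidates = [
    "You can download your course certificate from the profile page.",
    "Please tell me the chapter and exercise number you are working on.",
    "A human tutor will join the chat shortly.",
]

vectorizer = TfidfVectorizer()
candidate_matrix = vectorizer.fit_transform(candidates)

def select_response(user_input: str) -> str:
    """Return the predefined candidate most similar to the user input."""
    scores = cosine_similarity(vectorizer.transform([user_input]), candidate_matrix)
    return candidates[scores.argmax()]

print(select_response("Where can I get my certificate?"))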

2.2.2 Conventional Architecture

In this and the following subsection, we review the pipeline and methods for Task-oriented Dialogue Systems (ToDS). Such agents are typically composed of several components (Figure 2) addressing the diverse tasks required to facilitate a natural language dialogue (Williams, Raux, and Henderson, 2016).

Figure 2: Example of a conventional dialogue architecture with its main components in the bounding box: Language Understanding (NLU), Dialogue Manager (DM), Response Generation (NLG). Figure originates from the NAACL 2018 Tutorial, Su et al., 2018.


A conventional dialogue architecture implements this with the following components:

Natural Language Understanding unit has the following objectives: it parses the user input, classifies the domain (i.e., customer support, restaurant booking; not required for single-domain systems) and the intent (i.e., find-flight, book-taxi), and extracts entities (i.e., Italian food, Berlin).

Generally, the NLU unit maps every user utterance into semantic frames that are predefined according to different scenarios. Table 1 illustrates an example of a sentence representation, where the departure location (Berlin) and the arrival location (Munich) are specified as slot values, and the domain (airline travel) and intent (find-flight) are specified as well. Typically, there are two kinds of possible representations. The first one is the utterance category, represented in the form of the user's intent. The second could be word-level information extraction in the form of Named Entity Recognition (NER) or Slot Filling tasks.

Sentence        Find    flight  from    Berlin  to      Munich  today
Slots/Concepts  O       O       O       B-dept  O       B-arr   B-date
Named Entity    O       O       O       B-city  O       B-city  O
Intent          Find_Flight
Domain          Airline Travel

Table 1: An illustrative example of a sentence representation¹

A dialog act classification sub-module, also known as intent classification, is dedicated to detecting the primary user's goal at every dialogue state. It labels each user's utterance with one of the predefined intents. Deep neural networks can be directly applied to this conventional classification problem (Khanpour, Guntakandla, and Nielsen, 2016; Lee and Dernoncourt, 2016).
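As a minimal illustration of such an intent classifier (using a simple linear model instead of the deep neural networks cited above, with invented toy utterances and intent labels), the task reduces to ordinary text classification:

# Toy intent classification: map utterances to predefined intents
# (training data and labels are invented for illustration).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_utterances = [
    "find me a flight to munich",
    "i need a plane ticket tomorrow",
    "book a taxi to the airport",
    "order a cab for 8 am",
]
train_intents = ["find-flight", "find-flight", "book-taxi", "book-taxi"]

intent_clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
intent_clf.fit(train_utterances, train_intents)

print(intent_clf.predict(["please get me a taxi"]))  # expected: ['book-taxi']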

Slot filling is another sub-module for language understanding. Unlike intent detection (i.e., a classification problem), slot filling is defined as a sequence labeling problem where the words in the sequence are labeled with semantic tags. The input is a sequence of words, and the output is a sequence of slots, one for every word. Variations of Recurrent Neural Network (RNN) architectures (LSTM, BiLSTM) (Deoras et al., 2015) and (attention-based) encoder-decoder architectures (Kurata et al., 2016) have been employed for this task.
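The following sketch shows the core of such a sequence labeling model in PyTorch: a BiLSTM over word embeddings with a per-token projection onto the slot tags from Table 1. Vocabulary size, tag set, and hyperparameters are placeholders; training code is omitted.

# BiLSTM slot tagger sketch (PyTorch): one tag prediction per input token.
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    def __init__(self, vocab_size, num_tags, emb_dim=100, hidden_dim=128):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, bidirectional=True, batch_first=True)
        self.proj = nn.Linear(2 * hidden_dim, num_tags)  # forward + backward states

    def forward(self, token_ids):                 # (batch, seq_len)
        embedded = self.embedding(token_ids)      # (batch, seq_len, emb_dim)
        outputs, _ = self.lstm(embedded)          # (batch, seq_len, 2*hidden_dim)
        return self.proj(outputs)                 # (batch, seq_len, num_tags) logits

tags = ["O", "B-dept", "B-arr", "B-date"]         # slot tags as in Table 1
model = BiLSTMTagger(vocab_size=1000, num_tags=len(tags))
dummy_batch = torch.randint(0, 1000, (1, 7))      # e.g. "Find flight from Berlin to Munich today"
print(model(dummy_batch).shape)                   # torch.Size([1, 7, 4])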


Dialogue Manager unit is subdivided into two components: the first is a Dialogue State Tracker (DST) that holds the conversational state, and the second is a Policy Learning (PL) unit that defines the system's next action.

The Dialogue State Tracker (DST) is the core component to guarantee a robust dialogue flow since it manages the user's intent at every turn. The conventional methods, which are broadly applied in commercial implementations, often adopt handcrafted rules to pick the most probable candidate among the predefined ones (Goddeau et al., 1996; Wang and Lemon, 2013). However, a variety of statistical approaches have emerged since the Dialogue State Tracking Challenge (DSTC)² was arranged by Jason D. Williams in 2013. Among the deep-learning based approaches is the Belief Tracker proposed by Henderson, Thomson, and Young (2013). It employs a sliding window to produce a sequence of probability distributions over an arbitrary number of possible values (Chen et al., 2017). In the work of Mrkšić et al. (2016), the authors introduced a Neural Belief Tracker to identify the slot-value pairs. The system used as input the user intents preceding the user input, the user utterance itself, and a candidate slot-value pair which it needs to decide about. Then the system iterated over all candidate slot-value pairs to determine which ones have just been expressed by the user (Chen et al., 2017). Finally, Vodolán, Kadlec, and Kleindienst proposed a Hybrid Dialog State Tracker that achieved state-of-the-art performance on the DSTC2 shared task. Their model used a separately learned decoder coupled with a rule-based system.

Based on the state representation from the Dialogue State Tracker (DST), the PL unit generates the next system action, which is then decoded by the Natural Language Generation (NLG) module. Usually, a rule-based agent is used to initialize the system (Yan et al., 2017), and then supervised learning is applied to the actions generated by these rules. Alternatively, the dialogue policy can be trained end-to-end with reinforcement learning to manage the system's policy making toward the final performance (Cuayáhuitl, Keizer, and Lemon, 2015).
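A minimal rule-based version of these two components might look as follows: the tracker accumulates slot values over turns, and a handcrafted policy maps the resulting state to the next system action. The slot names, actions, and keyword rules are illustrative assumptions, far simpler than in a deployed system.

# Hand-crafted dialogue state tracker and policy (illustrative sketch).
class RuleBasedTracker:
    def __init__(self):
        self.state = {"food": None, "area": None}

    def update(self, user_utterance: str) -> dict:
        """Fill slots via simple keyword rules; real systems use richer patterns."""
        for value in ("italian", "japanese"):
            if value in user_utterance.lower():
                self.state["food"] = value
        for value in ("centre", "north"):
            if value in user_utterance.lower():
                self.state["area"] = value
        return self.state

def next_action(state: dict) -> str:
    """Hand-crafted policy: request missing slots, otherwise recommend."""
    for slot, value in state.items():
        if value is None:
            return f"request_{slot}"
    return "inform_restaurant"

tracker = RuleBasedTracker()
print(next_action(tracker.update("I want italian food")))      # -> request_area
print(next_action(tracker.update("somewhere in the centre")))  # -> inform_restaurant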

Natural Language Generation unit transforms the next action into a natural language response. Conventional approaches to Natural Language Generation (NLG) typically adopt a template-based style, where a set of rules is defined to map every next action to a predefined natural language response. Such approaches are simple, generally error-free, and easy to control; however, their implementation is time-consuming, and the style is repetitive and hardly scalable.
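Continuing the toy restaurant example above, a template-based NLG unit can be as simple as a lookup from the predicted action to a response pattern whose slots are filled from the dialogue state; the actions and templates below are invented for illustration.

# Template-based NLG: map a system action plus slot values to text (illustration).
TEMPLATES = {
    "request_area": "In which area would you like to eat?",
    "inform_restaurant": "{name} is a nice {food} restaurant in the {area}.",
}

def generate(action: str, slots: dict) -> str:
    """Fill the template for the given action with slot values."""
    return TEMPLATES[action].format(**slots)

print(generate("inform_restaurant",
               {"name": "Trattoria Roma", "food": "italian", "area": "centre"}))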

Among the deep learning techniques is the work by Wen et al. (2015) that introduced neural network-based approaches to NLG with an LSTM-based structure. Later, Zhou and Huang (2016) adopted this model along with the question information, semantic slot values, and dialogue act types to generate more precise answers. Finally, Wu, Socher, and Xiong (2019) proposed Global-to-local Memory Pointer Networks (GLMP): an architecture that incorporates knowledge bases directly into a learning framework (due to the Memory Networks component) and is able to re-use information from the user input (due to the Pointer Networks component).

² Currently the Dialog System Technology Challenge.

In particular cases, the dialogue flow can include speech recognition and speech synthesis units (if managing a spoken natural language system) and a connection to third-party APIs to request information stored in external databases (e.g., weather forecast), as depicted in Figure 2.

2.2.3 Existing Approaches

As stated above, a conventional task-oriented dialogue system is implemented in a pipeline manner. That means that once the system receives a user query, it interprets it and acts according to the dialogue state and the corresponding policy. Additionally, based on its understanding, it may access the knowledge base to find the demanded information there. Finally, the system transforms the system's next action (and, if given, extracted information) into its surface form as a natural language response (Monostori et al., 1996).

Individual components (i.e., Natural Language Understanding (NLU), Dialogue Manager (DM), Natural Language Generation (NLG)) of a particular conversational system can be implemented using different approaches, starting with entirely rule- and template-based methods and going towards hybrid approaches (using learnable components along with handcrafted units) and end-to-end trainable machine learning methods.

Rule-based approaches: Though many of the latest research approaches handle Natural Language Understanding (NLU) and Natural Language Generation (NLG) units by using statistical Natural Language Processing (NLP) models (Bocklisch et al., 2017; Burtsev et al., 2018; Honnibal and Montani, 2017), most of the industrially deployed dialogue systems still utilize handcrafted features and rules for the state tracking, action prediction, intent recognition, and slot filling tasks (Chen et al., 2017; Ultes et al., 2017). So, most of the PyDial³ framework modules offer rule-based implementations using Regular Expressions (RegEx), rule-based dialogue trackers (Henderson, Thomson, and Williams, 2014), and handcrafted policies. Also, in order to map the next action to text, Ultes et al. (2017) suggest rules along with template-based response generation.

³ PyDial Framework: http://www.camdial.org/pydial
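As an illustration of such handcrafted NLU rules (the patterns and intents are invented and far simpler than those used in a deployed framework), intent recognition and slot filling can be expressed with regular expressions:

# Regular-expression based intent recognition and slot filling (toy example).
import re

INTENT_PATTERNS = {
    "find-flight": re.compile(r"\b(flight|fly)\b", re.IGNORECASE),
    "book-taxi": re.compile(r"\b(taxi|cab)\b", re.IGNORECASE),
}
CITY_PATTERN = re.compile(r"\bfrom (\w+) to (\w+)\b", re.IGNORECASE)

def parse(utterance: str) -> dict:
    intent = next((name for name, pattern in INTENT_PATTERNS.items()
                   if pattern.search(utterance)), "unknown")
    slots = {}
    match = CITY_PATTERN.search(utterance)
    if match:
        slots["departure"], slots["arrival"] = match.group(1), match.group(2)
    return {"intent": intent, "slots": slots}

print(parse("Find a flight from Berlin to Munich today"))
# {'intent': 'find-flight', 'slots': {'departure': 'Berlin', 'arrival': 'Munich'}}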


The rule-based approach ensures robustness and stable performance, which is crucial for industrial systems that communicate with a large number of users simultaneously. However, it is highly expensive and time-consuming to deploy a real dialogue system built in that manner. The major disadvantage is that the usage of handcrafted systems is restricted to a specific domain, and possible adaptation requires extensive manual engineering.

End-to-end learning approaches: Due to the recent advance of end-to-end neural generative models (Collobert et al., 2011), many efforts have been made to build an end-to-end trainable architecture for conversational agents. Rather than using the traditional pipeline (see Figure 2), the end-to-end model is conceived as a single module (Chen et al., 2017).

Wen et al. (2016), followed by Bordes, Boureau, and Weston (2016), were among the pioneers who adopted an end-to-end trainable approach for conversational systems. Their approach was to treat dialogue learning as the problem of learning a mapping from dialogue histories to system responses and to apply an encoder-decoder model (Sutskever, Vinyals, and Le, 2014) to train the entire system. The proposed model still had significant limitations (Glasmachers, 2017), such as missing policies to learn a dialogue flow and the inability to incorporate additional knowledge, which is especially crucial for training a task-oriented agent. To overcome the first problem, Dhingra et al. (2016) introduced an end-to-end reinforcement learning approach that jointly trains the dialogue state tracker and policy learning units. The second weakness was addressed by Eric and Manning (2017), who augmented existing recurrent network architectures with a differentiable attention-based key-value retrieval mechanism over the entries of a knowledge base. This approach was inspired by Key-Value Memory Networks (Miller et al., 2016). Later on, Wu, Socher, and Xiong (2019) proposed Global-to-local Memory Pointer Networks (GLMP), where the authors incorporated knowledge bases directly into a learning framework. In their model, a global memory encoder and a local memory decoder are employed to share external knowledge. The encoder encodes the dialogue history, modifies the global contextual representation, and generates a global memory pointer. The decoder first produces a sketch response with empty slots. Next, it passes the global memory pointer to filter the external knowledge for relevant information and then instantiates the slots via the local memory pointers.

Despite having better adaptability compared to any rule-based system and being simpler to train, end-to-end approaches remain unattainable for commercial conversational agents operating on real-world data. A well and carefully constructed task-oriented dialogue system in an observed domain, using handcrafted rules (in the NLU and DST units) and predefined responses for NLG, still outperforms the


end-to-end systems due to its robustness (Glasmachers, 2017; Wu, Socher, and Xiong, 2019).

Hybrid approaches: Though end-to-end learning is an attractive solution for conversational systems, current techniques are data-intensive and require large amounts of dialogues to learn simple behaviors. To overcome this barrier, Williams, Asadi, and Zweig (2017) introduce Hybrid Code Networks (HCNs), which are an ensemble of retrieval and trainable units. The system utilizes a Recurrent Neural Network (RNN) for the Natural Language Understanding (NLU) block, domain-specific knowledge that is encoded as software (i.e., API

calls), and custom action templates used for the Natural Language Generation (NLG) unit. Compared to existing end-to-end methods, the authors report that their approach considerably reduces the amount of data required for training (Williams, Asadi, and Zweig, 2017). According to the authors, HCNs achieve state-of-the-art performance on the bAbI dialog dataset (Bordes, Boureau, and Weston, 2016) and outperform two commercial customer dialogue systems⁴.

Along with this work, Wen et al. (2016) propose a neural network-based model that is end-to-end trainable but still modularly connected. In their work, the authors treat dialogue as a sequence-to-sequence mapping problem (Sutskever, Vinyals, and Le, 2014), augmented with the dialogue history and the current database search outcome (Wen et al., 2016). The authors explain the learning process as follows: at every turn, the system takes a user input represented as a sequence of tokens and converts it into two internal representations: a distributed representation of the intent and a probability distribution over slot-value pairs, called the belief state (Young et al., 2013). The database operator then picks the most probable values in the belief state to form the search result. At the same time, the intent representation and the belief state are transformed and combined by a Policy Learning (PL) unit to form a single vector representing the next system action. This system action is then used to generate a sketch response token by token. The final system response is composed by substituting the actual values of the database entries into the sketch sentence structure (Wen et al., 2016).
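To illustrate the per-turn flow just described, the sketch below replays it with every neural component stubbed out as a plain Python function over a toy database; the function names, the keyword-based belief tracker, and the database are assumptions made only for illustration, not the original learned model of Wen et al. (2016).

# Runnable stub of the per-turn flow: belief state -> DB lookup -> policy -> sketch response.
TOY_DB = [{"name": "Trattoria Roma", "food": "italian", "area": "centre"},
          {"name": "Edo Sushi", "food": "japanese", "area": "north"}]

def belief_state(tokens):
    """Stub belief tracker: probability over slot values from keyword matches."""
    slots = {"food": {"italian": 0.0, "japanese": 0.0}}
    for value in slots["food"]:
        if value in tokens:
            slots["food"][value] = 1.0
    return slots

def db_lookup(slots):
    """Pick the most probable value per slot and query the toy database."""
    food = max(slots["food"], key=slots["food"].get)
    return [row for row in TOY_DB if row["food"] == food]

def policy(slots, matches):
    """Stub policy: choose the next system action from the state."""
    return "inform_name" if matches else "request_food"

def generate(action, matches):
    """Sketch response with the database value substituted in."""
    if action == "inform_name":
        return f"{matches[0]['name']} serves the food you asked for."
    return "What kind of food are you looking for?"

tokens = "i would like some italian food".split()
state = belief_state(tokens)
matches = db_lookup(state)
print(generate(policy(state, matches), matches))  # -> Trattoria Roma serves ...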

Hybrid models appear to replace the established rule- and template-based approaches currently utilized in an industrial setting. The performance of most Natural Language Understanding (NLU) components (i.e., Named Entity Recognition (NER), domain and intent classification) is sufficiently high to apply them for the development of commercial conversational agents, whereas the Natural Language Generation (NLG) unit could still be realized as a template-based solution by employing predefined response candidates and predicting the next system action with deep learning systems.

⁴ Internal Microsoft Research datasets in customer support and troubleshooting domains were employed for training and evaluation.


2.3 target domain & task definition

MUMIE⁵ is an open-source e-learning platform for studying and teaching mathematics and computer science. Online Mathematik Brückenkurs (OMB+) is one of the projects powered by MUMIE. Specifically, Online Mathematik Brückenkurs (OMB+)⁶ is a German online learning platform that assists students who are preparing for an engineering or computer science study at a university.

The course's central purpose is to assist students in reviving their mathematical skills so that they can follow the upcoming university courses. This online course is thematically segmented into 13 parts and includes free mathematical classes with theoretical and practical content. Every chapter begins with an overview of its sections and consists of explanatory texts with built-in videos, examples, and quick checks of understanding. Furthermore, various examination modes to practice the content (i.e., training, exercises) and to check the understanding of the theory (i.e., quiz) are available. Besides that, OMB+ provides a possibility to get assistance from a human tutor. Usually, the students and tutors interact via a chat interface in written form. The language of communication is German.

The current dilemma of the OMB+ platform is that the number of students grows significantly every year, but it is challenging to find qualified tutors, and it is expensive to hire more human tutors. This results in a longer waiting period for students until their problems can be considered and processed. In general, all student questions can be grouped into three main categories: organizational questions (i.e., course, certificate), contextual questions (i.e., content, theorem), and mathematical questions (exercises, solutions). To assist a student with a mathematical question, a tutor has to know the following regular information: what kind of topic (or subtopic) a student has a problem with; at which examination mode (quiz, chapter level training or exercise, section level training or exercise, or final examination) a student is working right now; and, finally, the exact question number and exact problem formulation. This means that a tutor has to ask the same onboarding questions every time a new conversation starts. This is, however, very time-consuming and could be successfully solved within an Intelligent Process Automation (IPA) system.
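To make this concrete, the information the bot has to collect before handing over to a tutor can be thought of as a small slot dictionary that must be completed during onboarding; the field names below are illustrative assumptions and not the exact structure of the deployed system (cf. the Informational Dictionary introduced in Chapter 3).

# Illustrative onboarding slots a tutor needs before answering a mathematical question.
REQUIRED_SLOTS = ["topic", "subtopic", "examination_mode", "question_number"]

def missing_slots(collected: dict) -> list:
    """Return the onboarding items that still have to be asked for."""
    return [slot for slot in REQUIRED_SLOTS if not collected.get(slot)]

student_info = {"topic": "Differential calculus", "examination_mode": "final examination"}
print(missing_slots(student_info))  # -> ['subtopic', 'question_number']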

The work presented in Chapter 3 and Chapter 4 was enabled by the collaboration with the MUMIE and Online Mathematik Brückenkurs (OMB+) team and addresses the abovementioned problem. Within Chapter 3, we implemented an Intelligent Process Automation (IPA) dialogue system with rule- and retrieval-based algorithms that interacts with students and performs the onboarding. This system is multipurpose: on the one hand, it eases the assistance process for human tutors by reducing repetitive and time-consuming activities and therefore allows workers to focus on more challenging tasks. On the other hand, interacting

⁵ Mumie: https://www.mumie.net
⁶ https://www.ombplus.de


with students, it augments the resources with structured and labeled training data. The latter, in turn, allowed us to re-train specific system components utilizing machine and deep learning methods. We describe this procedure in Chapter 4.

We report that the earlier collected human-to-human data from the OMB+ platform could not be used for training a machine learning system since it is highly unstructured and contains no direct question-answer matching, which is crucial for trainable conversational algorithms. We analyzed these conversations to get insights from them that we used to define the dialogue flow.

We also do not consider the automatization of mathematical questions in this work, since this task would require not only precise language understanding and extensive commonsense knowledge but also the accurate perception of mathematical notation. The tutoring process at OMB+ differs a lot from standard tutoring methods: here, instructors attempt to guide students by giving hints and tips instead of providing an out-of-the-box solution. Existing human-to-human dialogues revealed that such a task can be challenging even for human tutors, since it is not always directly perceptible what precisely the student does not understand or has difficulties with. Furthermore, dialogues containing mathematical explanations can last for a long time, and the topic (i.e., intent and final goal) can shift multiple times during the conversation.

To our knowledge, this task would not be feasible to solve sufficiently with current machine learning algorithms, especially considering the missing training data. A rule-based system would not be the right solution for this purpose either, due to the large number of complex rules which would need to be defined. Thus, in this work, we focus on the automatization of less cognitively demanding tasks, which are still relevant and must be solved in the first instance to ease the tutoring process for instructors.

2.4 summary

Conversational agents are a highly demanded solution for industries due to their potential ability to support a large number of users in parallel while performing a flawless natural language conversation. Many messenger platforms are created, and custom chatbots in various constellations emerge daily. Unfortunately, most of the current conversational agents are often disappointing to use due to the lack of substantial natural language understanding and reasoning. Furthermore, Natural Language Processing (NLP) and Artificial Intelligence (AI) techniques have limitations when applied to domain-specific and task-oriented problems, especially if no structured training data is available.

Despite that, conversational agents can be successfully applied to solve less cognitive but still highly relevant tasks, that is, within the


Intelligent Process Automation (IPA) domain. Such IPA-bots would undertake tedious and time-consuming responsibilities to free up the knowledge worker for more complex and intricate tasks. IPA-bots can be seen as members of the goal-oriented dialogue family and are, for the most part, realized either in a rule-based manner or through hybrid models when it comes to a real-world setting.


3 multipurpose ipa via conversational assistant

This chapter partially covers work already published at international peer-reviewed conferences. The relevant publication is Moiseeva et al., 2020. The research described in this chapter was carried out in its entirety by the author of this thesis. The other authors of the publication acted as advisers. At the time of writing this thesis, the system described in this work was deployed at the Online Mathematik Brückenkurs (OMB+) platform.

As we outlined in the introduction, Intelligent Process Automation (IPA) is an emerging technology with the primary goal of assisting the knowledge worker by taking care of repetitive, routine, and low-cognitive tasks. Conversational assistants that interact with users in a natural language are a potential application for the IPA domain. Such smart virtual agents can support the user by responding to various inquiries and performing routine tasks carried out in a natural language (i.e., customer support).

In this work, we tackle the challenge of implementing an IPA conversational agent in a real-world industrial context and under conditions of missing training data. Our system has two meaningful benefits. First, it decreases monotonous and time-consuming tasks and hence lets workers concentrate on more intellectual processes. Second, by interacting with users, it augments the resources with structured and labeled training data. The latter, in turn, can be used to retrain a system utilizing machine and deep learning methods.

3.1 introduction

Conversational dialogue systems can give attention to several users simultaneously while supporting natural communication. They are thus exceptionally needed for practical assistance applications where a high number of users are involved at the same time. Typically, dialogue agents require a lot of natural language understanding and commonsense knowledge to hold a conversation reasonably and intelligently. Therefore, nowadays, it is still challenging to substitute human assistance with an intelligent agent completely. However, a dialogue system could be successfully employed as a part of an Intelligent Process Automation (IPA) application. In this case, it would take care of simple but routine and time-consuming tasks performed


in a natural language, while allowing a human worker to focus on more cognitively demanding processes.

Recent research in the dialogue generation domain is conducted by employing Artificial Intelligence (AI) techniques like machine and deep learning (Lowe et al., 2017; Wen et al., 2016). However, those methods require high-quality data for training, which is often not directly available for domain-specific and industrial problems. Even though companies gather and store vast volumes of data, it is often of low quality with regard to machine learning approaches (Daniel, 2015), especially when it concerns dialogue data, which has to be properly structured as well as annotated. If trained on artificially created datasets (which is often the case in the research setting), the systems can solve only specific problems and are hardly adaptable to other domains. Hence, despite the popularity of deep learning end-to-end models, one still needs to rely on traditional pipelines in practical dialogue engineering, mainly while introducing a new domain.

3.1.1 Outline and Contributions

This work addresses the challenge of implementing a dialogue system for IPA purposes within the practical e-learning domain under conditions of missing training data. The chapter is structured as follows: in Section 3.2, we introduce our method and describe the design of the rule-based dialogue architecture with its main components, such as the Natural Language Understanding (NLU), Dialogue Manager (DM), and Natural Language Generation (NLG) units. In Section 3.3, we present the results of our dual-focused evaluation, and Section 3.4 describes the format of the collected structured data. Finally, we conclude in Section 3.5.

Our contributions within this work are as follows:

• We implemented a robust conversational system for IPA purposes within a practical e-learning domain and under the conditions of missing training (i.e., dialogue) data.

• The system has two main objectives:

  – First, it reduces repetitive and time-consuming activities and allows workers of the e-learning platform to focus solely on mathematical inquiries.

  – Second, by interacting with users, it augments the resources with structured and partially labeled training data for further implementation of trainable dialogue components.


3.2 model

The central purpose of the introduced system is to interact with students at the beginning of every chat conversation and gather information on the topic (and sub-topic), examination mode and level, question number, and exact problem formulation. Such an onboarding process saves time for human tutors and allows them to handle mathematical questions solely. Furthermore, the system is realized in a way such that it accumulates labeled and structured dialogues in the background.

Figure 4 displays the complete conversation flow. After the system accepts user input, it analyzes it at different levels and extracts information if such is provided. If some of the information required for tutoring is missing, the system asks the student to provide it. When all the information points are collected, they are automatically validated and forwarded to a human tutor. The latter can then directly proceed with the assistance process. In the following, we explain the key components of the system.

3.2.1 OMB+ Design

Figure 3 illustrates the internal structure and design of the OMB+ platform. It has topics and sub-topics, as well as four different examination modes. Each topic (Figure 3, tag 1) corresponds to a chapter level and always has sub-topics (Figure 3, tag 2) that correspond to a section level. The examination modes training and exercise are ambiguous, because they correspond to either a chapter (Figure 3, tag 3) or a section (Figure 3, tag 5) level, and it is important to differentiate between them since they contain different types of content. The mode final examination (Figure 3, tag 4) always corresponds to a chapter level, whereas a quiz (Figure 3, tag 5) can belong only to a section level. According to the design of the OMB+ platform, there are several ways in which a possible dialogue flow can proceed.

3.2.2 Preprocessing

In a natural language conversation, one may respond in various forms, making the extraction of data from user-generated text challenging due to potential misspellings or confusable spellings (e.g., Aufgabe 1a, Aufgabe 1 (a)). Hence, to facilitate a substantial recognition of intents and extraction of entities, we normalize (e.g., correct misspellings, detect synonyms) and preprocess every user input before moving forward to the Natural Language Understanding (NLU) module.


Figure 3: OMB+ Online Learning Platform, where 1 is the Topic (corresponds to a chapter level), 2 is a Sub-Topic (corresponds to a section level), 3 is the chapter-level examination mode, 4 is the Final Examination Mode (available only for the chapter level), 5 are the Examination Modes Exercise, Training (available at section levels), and Quiz (available only at the section level), and 6 is the OMB+ Chat.

We perform both linguistic and domain-specific preprocessing, which includes the following steps (a minimal sketch of these normalization steps is given after the list):

• lowercasing and stemming of words in the input query;

• removal of stop words and punctuation;

• all mentions of x in mathematical formulations are removed to avoid confusion with the Roman numeral 10 ("X");

• in a combination of the type word "Kapitel"/"Aufgabe" + digit written as a word (i.e., "eins", "zweites", etc.), the word is replaced with a digit ("Kapitel eins" → "Kapitel 1", "im fünften Kapitel" → "im Kapitel 5"); Roman numerals are replaced with a digit as well ("Kapitel IV" → "Kapitel 4");

• detected ambiguities are normalized (e.g., "Trainingsaufgabe" → "Training");

• recognized misspellings resp. typing errors are corrected (e.g., "Difeernzialrechnung" → "Differentialrechnung");

• permalinks are parsed and analyzed. From each permalink it is possible to extract the topic, examination mode, and question number.
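The sketch below illustrates how such rule-based normalization might be implemented in Python. The lookup tables (number words, Roman numerals, known misspellings, ambiguities) are abbreviated illustrative assumptions, not the exhaustive lists of the deployed system, and stemming as well as stop-word removal are omitted for brevity.

    import re

    # Hypothetical, abbreviated lookup tables; the deployed system uses larger ones.
    NUMBER_WORDS = {"eins": "1", "zwei": "2", "zweites": "2", "fünften": "5"}
    ROMAN_NUMERALS = {"i": "1", "ii": "2", "iii": "3", "iv": "4", "v": "5"}
    MISSPELLINGS = {"difeernzialrechnung": "differentialrechnung"}
    AMBIGUITIES = {"trainingsaufgabe": "training"}

    def preprocess(query: str) -> str:
        text = query.lower()
        # Correct known misspellings and normalize ambiguous mode names.
        for wrong, right in {**MISSPELLINGS, **AMBIGUITIES}.items():
            text = text.replace(wrong, right)
        # "kapitel eins" -> "kapitel 1", "kapitel iv" -> "kapitel 4"
        def _to_digit(match):
            word = match.group(2)
            digit = NUMBER_WORDS.get(word) or ROMAN_NUMERALS.get(word) or word
            return f"{match.group(1)} {digit}"
        text = re.sub(r"\b(kapitel|aufgabe)\s+([a-zäöü]+)\b", _to_digit, text)
        # Drop isolated "x" to avoid confusion with the Roman numeral 10.
        text = re.sub(r"\bx\b", " ", text)
        # Remove punctuation, keeping characters needed for question numbers like "1(a)".
        text = re.sub(r"[^\w()\s]", " ", text)
        return re.sub(r"\s+", " ", text).strip()

    print(preprocess("Hallo, ich habe eine Frage zu Kapitel IV, Aufgabe eins"))
    # -> "hallo ich habe eine frage zu kapitel 4 aufgabe 1"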


3.2.3 Natural Language Understanding

We implement the Natural Language Understanding (NLU) unit utilizing handcrafted rules, Regular Expressions (RegEx), and the Elasticsearch1 API.

intent classification According to the design of the OMB+ platform, all student questions can be classified into three categories: organizational questions (i.e., course, certificate), contextual questions (i.e., content, theorem), and mathematical questions (exercises, solutions). To classify the input query by its intent (i.e., category), we employ weighted keyword information in the form of handcrafted rules. We assume that specific words are explicitly associated with a corresponding intent (e.g., theorem or root denote a mathematical question; feedback or certificate denote an organizational inquiry). If no intent could be assigned, then it is assumed that the NLU unit was not capable of understanding, and the intent is unknown. In this case, the virtual agent requests the user to provide an intent manually by picking one from the mentioned three options. The queries from the organizational and theoretical categories are directly handed over to a human tutor, while mathematical questions are analyzed by the automated agent for further information extraction.

entity extraction In the next step, the system retrieves the entities from a user message. We specify the following five prerequisite entities, which have to be collected before the dialogue can be handed over to a human tutor: topic, sub-topic, examination mode and level, and question number. This part is implemented using ElasticSearch (ES) and RegEx. To facilitate the use of ES, we indexed the OMB+ web page into an internal database. Besides indexing the titles of topics and sub-topics, we also provided supplementary information on possible synonyms and writing styles. We additionally filed OMB+ permalinks, which direct to the site pages. To query the resulting database, we employ the internal Elasticsearch multi match function and set the minimum should match parameter to 20. This parameter specifies the number of terms that must match for a document to be considered relevant. Besides that, we adopted fuzziness with the maximum edit distance set to 2 characters. The fuzzy query uses similarity based on the Levenshtein edit distance (Levenshtein, 1966). Finally, the system produces a ranked list of potential matching entries found in the database within the predefined relevance threshold (we set it to θ = 1.5). We then pick the most probable entry as the right one and select the corresponding entity from the user input.
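A minimal sketch of such a fuzzy multi_match query, using the official elasticsearch Python client, is given below. The index name, field names, and the percentage form of minimum_should_match are illustrative assumptions; only the fuzziness of 2 and the relevance threshold come from the description above.

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")  # assumed local instance
    RELEVANCE_THRESHOLD = 1.5                    # theta from the text

    def extract_entity(user_message: str, index: str = "ombplus-topics"):
        """Return the best-matching indexed entry above the threshold, or None."""
        body = {
            "query": {
                "multi_match": {
                    "query": user_message,
                    "fields": ["title", "synonyms"],      # illustrative field names
                    "fuzziness": 2,                       # max. Levenshtein edit distance
                    "minimum_should_match": "20%",        # assumed percentage form
                }
            }
        }
        hits = es.search(index=index, body=body)["hits"]["hits"]
        if hits and hits[0]["_score"] >= RELEVANCE_THRESHOLD:
            return hits[0]["_source"]
        return None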

Overall, the Natural Language Understanding (NLU) module receives the user input as preprocessed text and examines it across all

1 https://www.elastic.co/products/elasticsearch


predefined RegEx statements and for a match in the ElasticSearch (ES) database. We use ES to retrieve entities for the topic, sub-topic, and examination mode, whereas RegEx is used to extract the information on a question number. Every time an entity is extracted, it is filled into the Informational Dictionary (ID). The ID has the following six slots to be filled in: topic, sub-topic, examination level, examination mode, question number, and exact problem formulation (this information is requested in further steps).
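A minimal sketch of such an Informational Dictionary as a plain Python data structure is given below; the field names are translations of the six slots described above and are assumptions about the implementation, not the deployed code.

    from dataclasses import dataclass, asdict
    from typing import Optional

    @dataclass
    class InformationalDictionary:
        # The six slots collected during onboarding; None marks a missing value.
        topic: Optional[str] = None
        sub_topic: Optional[str] = None
        examination_level: Optional[str] = None   # "Chapter" or "Section"
        examination_mode: Optional[str] = None    # Training, Exercise, Quiz, Final Examination
        question_number: Optional[str] = None
        problem_formulation: Optional[str] = None

        def missing_slots(self):
            return [name for name, value in asdict(self).items() if value is None]

    state = InformationalDictionary(topic="Elementares Rechnen", examination_mode="Quiz")
    print(state.missing_slots())  # slots the agent still has to ask for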

3.2.4 Dialogue Manager

A conventional Dialogue Manager (DM) consists of two interacting components. The first component is the Dialogue State Tracker (DST), which maintains a representation of the current conversational state, and the second is the Policy Learning (PL) component, which defines the next system action (i.e., response).

In our model, every agent's next action is determined by the state of the previously obtained information accumulated in the Informational Dictionary (ID). For instance, if the system recognizes that the student works on the final examination, it also knows (defined by hand-coded logic in the predefined rules) that there is no necessity to ask for a sub-topic, because the final examination always corresponds to a chapter level (due to the layout of the OMB+ platform). If the system recognizes that the user struggles in solving a quiz, it has to ask for both the corresponding topic and sub-topic, because a quiz always relates to a section level.

To discover all of the potential conversational flows, we implement Mutually Exclusive Rules (MER), which indicate that two events e1 and e2 are mutually exclusive or disjoint, since they cannot both occur at the same time (i.e., the intersection of these events is empty: P(A ∩ B) = 0). Additionally, we defined transition and mapping rules. Hence, only particular entities can co-occur, while others must be excluded from the specific rule combination. Assume a topic t = Geometry. According to MER, this topic can co-occur with one of the pertaining sub-topics defined at OMB+ (e.g., s = Angle). Thus, the level of the examination would be l = Section (but l ≠ Chapter), because the student already reported a sub-section he or she is working on. In this specific case, the examination mode could be e ∈ [Training, Exercise, Quiz], but not e = Final_Examination. This is because e = Final_Examination can co-occur only with a topic (chapter level), but not with a sub-topic (section level). The question number q could be arbitrary and has to co-occur with every other combination.


This allows a formal solution to be found. Assume a list of all theoretically possible dialogue states S = [topic, sub-topic, training, exercise, chapter level, section level, quiz, final examination, question number]; for each element s_n in S it holds that

s_n = \begin{cases} 1 & \text{if information is given} \\ 0 & \text{otherwise} \end{cases} \qquad (1)

This gives all general (resp. possible) dialogue states without reference to the design of the OMB+ platform. However, to make the dialogue states fully suitable for OMB+, from the general states we select only those which are valid. To define the validity of a state, we specify the following five MERs.

R1: Topic
  ¬T
   T

Table 2: Rule 1 – Admissible topic configurations.

Rule (R1) in Table 2 denotes the admissible configurations for the topic and means that the topic could be either given (T) or not (¬T).

R2: Examination Mode
  ¬TR  ¬E  ¬Q  ¬FE
   TR  ¬E  ¬Q  ¬FE
  ¬TR   E  ¬Q  ¬FE
  ¬TR  ¬E   Q  ¬FE
  ¬TR  ¬E  ¬Q   FE

Table 3: Rule 2 – Admissible examination mode configurations.

Rule (R2) in Table 3 denotes that either no information on the examination mode is given, or the examination mode is Training (TR), Exercise (E), Quiz (Q), or Final Examination (FE), but not more than one mode at the same time.

R3: Level
  ¬CL  ¬SL
   CL  ¬SL
  ¬CL   SL

Table 4: Rule 3 – Admissible level configurations.

Rule (R3) in Table 4 indicates that either no level information is provided, or the level corresponds to the chapter level (CL) or to the section level (SL), but not to both at the same time.


R4: Examination & Level
  ¬TR  ¬E  ¬SL  ¬CL
   TR  ¬E   SL  ¬CL
   TR  ¬E  ¬SL   CL
   TR  ¬E  ¬SL  ¬CL
  ¬TR   E   SL  ¬CL
  ¬TR   E  ¬SL   CL
  ¬TR   E  ¬SL  ¬CL

Table 5: Rule 4 – Admissible examination mode (only for Training and Exercise) and corresponding level configurations.

Rule (R4) in Table 5 means that the Training (TR) and Exercise (E) examination modes can belong either to the chapter level (CL) or to the section level (SL), but not to both at the same time.

R5: Topic & Sub-Topic
  ¬T  ¬ST
   T  ¬ST
  ¬T   ST
   T   ST

Table 6: Rule 5 – Admissible topic and corresponding sub-topic configurations.

Rule (R5) in Table 6 symbolizes that we could be given either only a topic (T), or the combination of a topic and a sub-topic (ST) at the same time, or only a sub-topic, or no information on this point at all.

We then define a valid dialogue state as a dialogue state that meets all requirements of the abovementioned rules:

\forall s \, (State(s) \land R_{1-5}(s)) \rightarrow Valid(s) \qquad (2)
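As an illustration, the rules can be read as Boolean filters over all binary state combinations. The sketch below is a simplified assumption of how such filtering might be coded (rule R4 is collapsed into a single check and the slot names are invented for the example); it is not the exact implementation.

    from itertools import product

    SLOTS = ["topic", "sub_topic", "training", "exercise",
             "chapter_level", "section_level", "quiz", "final_exam", "question_nr"]

    def is_valid(s: dict) -> bool:
        # R2: at most one examination mode at a time.
        if s["training"] + s["exercise"] + s["quiz"] + s["final_exam"] > 1:
            return False
        # R3: chapter level and section level are mutually exclusive.
        if s["chapter_level"] and s["section_level"]:
            return False
        # Simplified reading of R4: an explicit level flag may only
        # accompany training or exercise (quiz and final exam fix the level).
        if (s["quiz"] or s["final_exam"]) and (s["chapter_level"] or s["section_level"]):
            return False
        return True

    valid_states = [dict(zip(SLOTS, bits))
                    for bits in product([0, 1], repeat=len(SLOTS))
                    if is_valid(dict(zip(SLOTS, bits)))]
    print(len(valid_states))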

Once we have the valid states for dialogues, we perform a mapping from each valid dialogue state to the next possible system action. For that, we first define five transition rules; the order of the transition rules is important.

T_1 = \neg T \qquad (3)

means that no topic (T) is found in the ID (i.e., it could not be extracted from the user input);

T_2 = \neg EM \qquad (4)

indicates that no examination mode (EM) is found in the ID;


T_3 = EM, \text{ where } EM \in [TR, E] \qquad (5)

denotes that the extracted examination mode (EM) is either Training (TR) or Exercise (E);

T_4 = \begin{cases} T, \neg ST, TR, SL \\ T, \neg ST, E, SL \\ T, \neg ST, Q \end{cases} \qquad (6)

means that no sub-topic (ST) is provided by the user, but the ID already contains either the combination of topic (T), training (TR), and section level (SL), or the combination of topic, exercise (E), and section level, or the combination of topic and quiz (Q);

T_5 = \neg QNR \qquad (7)

indicates that no question number (QNR) was provided by the student (or it could not be successfully extracted).

Finally, we assumed the following list of possible next actions for the system:

A = \{ \text{Ask for Topic},\; \text{Ask for Examination Mode},\; \text{Ask for Level},\; \text{Ask for Sub-Topic},\; \text{Ask for Question Nr} \} \qquad (8)

Following the transition rules, we map each valid dialogue state to the possible next action a_m in A:

\exists a \, \forall s \, (Valid(s) \land T_1(s)) \rightarrow a(s, T) \qquad (9)

in the case where no topic is provided, the next action is to ask for the topic (T);

\exists a \, \forall s \, (Valid(s) \land T_2(s)) \rightarrow a(s, EM) \qquad (10)

if no examination mode is provided by the user (or it could not be successfully extracted from the user query), the next action is defined as ask for examination mode (EM);

\exists a \, \forall s \, (Valid(s) \land T_3(s)) \rightarrow a(s, L) \qquad (11)

in the case where we know the examination mode EM ∈ [Training, Exercise], we have to ask about the level (i.e., training at chapter level or training at section level); thus, the next action is ask for level (L);


\exists a \, \forall s \, (Valid(s) \land T_4(s)) \rightarrow a(s, ST) \qquad (12)

if no sub-topic is provided, but the examination mode EM ∈ [Training, Exercise] is at section level, the next action is defined as ask for sub-topic (ST);

\exists a \, \forall s \, (Valid(s) \land T_5(s)) \rightarrow a(s, QNR) \qquad (13)

if no question number is provided by the user, then the next action is ask for question number (QNR).

Following this scheme, we generate 56 state transitions, which specify the next system actions. Being in a new conversation state, we compare the extracted (or updated) data in the Informational Dictionary (ID) with the valid dialogue states and select the mapped action as the next system action.
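Because the transition rules are ordered, the policy can be read as a priority list over the ID slots. The sketch below illustrates this lookup, reusing the slot names of the InformationalDictionary sketch above; it is an assumption about how the mapping might look in code, not the deployed implementation.

    def next_action(id_state) -> str:
        """Return the next system action for a (valid) dialogue state.

        Mirrors transition rules T1-T5; the order of the checks matters.
        Expects an object with the slots of the InformationalDictionary sketch.
        """
        if id_state.topic is None:                                    # T1
            return "Ask for Topic"
        if id_state.examination_mode is None:                         # T2
            return "Ask for Examination Mode"
        if id_state.examination_mode in ("Training", "Exercise") \
                and id_state.examination_level is None:               # T3
            return "Ask for Level"
        if id_state.sub_topic is None and (
                id_state.examination_mode == "Quiz"
                or id_state.examination_level == "Section"):          # T4
            return "Ask for Sub-Topic"
        if id_state.question_number is None:                          # T5
            return "Ask for Question Nr"
        return "Final Request"  # ID complete -> verification step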

flexibility of dst The DST transitions outlined above are meant to maintain the current configuration of the OMB+ learning platform. Still, further MERs could be effortlessly generated to create new transitions. To examine this, we performed tests with the previous design of the platform, where all topics except for the first one had sub-topics, whereas the first topic contained both sub-topics and sub-sub-topics. We could produce the missing transitions with the described approach. The number of permissible transitions in this case increased from 56 to 117.

3.2.5 Meta Policy

Since the proposed system is expected to operate in a real-world setting, we had to develop additional policies that manage the dialogue flow and verify the system's accuracy. We explain these policies below.

completeness of informational dictionary The model automatically verifies the completeness of the Informational Dictionary (ID), which is determined by the number of essential slots filled in the informational dictionary. There are six discrete cases in which the ID is deemed to be complete. For example, if a user works on the final examination mode, the agent does not have to request a sub-topic or examination level. Hence, the ID has to be filled only with data for the topic, examination type, and question number. If, in contrast, the user works on the training mode, the system has to collect information about the topic, sub-topic, examination level, examination mode, and the question number. Once the ID is complete, it is provided to the verification step; otherwise, the agent proceeds according to the next action. The system extracts information in each dialogue state, and therefore, if the user gives updated information on any subject later in the dialogue history, the corresponding slot is updated in the ID.

Below are examples of two final cases (out of six) in which the Informational Dictionary (ID) is considered to be complete.

Case_1 = \{\, intent = Math,\; topic \neq None,\; exam\ mode = Final\ Examination,\; level = None,\; question\ nr \neq None \,\} \qquad (14)

Table 7: Case 1 – Any topic, examination mode is final examination, examination level does not matter, any question number.

Case_2 = \{\, intent = Math,\; topic \neq None,\; sub\text{-}topic \neq None,\; exam\ mode \in [Training, Exercise],\; level = Section,\; question\ nr \neq None \,\} \qquad (15)

Table 8: Case 2 – Any topic, any related sub-topic, examination mode is either training or exercise, examination level is section, any question number.

verification step After the system has collected all the essential data (i.e., the ID is complete), it continues to the final verification step. Here, the obtained data is presented to the current student in the dialogue session. The student is asked to review the exactness of the data and, if any entries are faulty, to correct them. The Informational Dictionary (ID) is, where necessary, updated with the user-provided data. This procedure repeats until the user confirms the correctness of the compiled data.

fallback policy In the cases where the system fails to retrieve information from a student's query, it re-asks the user and attempts to retrieve information from a follow-up response. The maximum number of re-ask trials is a parameter and is set to three (r = 3). If the system is still incapable of extracting information, the user input is considered as the ground truth and filled into the appropriate slot in the ID. An exception to this rule applies where the user has to specify the intent manually: after three unclassified trials, a dialogue session is directly handed over to a human tutor.


human request In every dialogue state, a user can shift to a human tutor. For this, the user can enter the "human" keyword (or "Mensch" in German). Therefore, every user message is investigated for the presence of this keyword.

3.2.6 Response Generation

The semantic representation of the system's next action is transformed into a natural language utterance. Hence, each possible action is mapped to precisely one response that is predefined in a template. Some of the responses are fixed (i.e., "Welches Kapitel bearbeitest du gerade?"),2 while others have placeholders for custom values. In the latter case, the utterance can be formulated depending on the Informational Dictionary (ID).

Assume the predefined response "Deine Angaben waren wie folgt: a) Kapitel: K, b) Abschnitt: A, c) Aufgabentyp: M, d) Aufgabe: Q, e) Ebene: E. Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein."3 The slots [K, A, M, Q, E] are the placeholders for the information stored in the ID. During the conversation, the system selects the template according to the next action and fills the slots with extracted values if required. After the extracted information is filled into the corresponding placeholders in the utterance, the final response could be as follows: "Deine Angaben waren wie folgt: a) Kapitel: I Elementares Rechnen, b) Abschnitt: Rechenregel für Potenzen, c) Aufgabentyp: Quiz, d) Aufgabe: 1(a), e) Ebene: Abschnitt. Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein."4
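A minimal sketch of such template-based response generation is shown below. The template strings are taken from the example above; the action names and slot keys are assumptions made for the illustration.

    TEMPLATES = {
        "Ask for Topic": "Welches Kapitel bearbeitest du gerade?",
        "Final Request": ("Deine Angaben waren wie folgt: a) Kapitel: {K}, b) Abschnitt: {A}, "
                          "c) Aufgabentyp: {M}, d) Aufgabe: {Q}, e) Ebene: {E}. "
                          "Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein."),
    }

    def generate_response(action: str, id_state: dict) -> str:
        # Select the template mapped to the next action and fill its placeholders
        # with the values currently stored in the Informational Dictionary.
        return TEMPLATES[action].format(
            K=id_state.get("topic", ""), A=id_state.get("sub_topic", ""),
            M=id_state.get("examination_mode", ""), Q=id_state.get("question_number", ""),
            E=id_state.get("examination_level", ""),
        )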

Below, we demonstrate five sample dialogues with variable flows in the original German:

• a short dialogue where certain information is already provided in the first student input;

• an organizational question case;

• a contextual question case;

• an example with manual intent classification;

• an example with fallback and human policies;

• a long flow with entry correction at the final verification phase.

2 Translates to: "Which chapter are you currently working on?"
3 Translates to: "Your details were as follows: a) Chapter, b) Section, c) Examination Mode, d) Question Number, e) Level. Did I understand you correctly? Please answer yes or no."
4 Translates to: "Your details were as follows: a) Elementary Calculus, b) Power Laws, c) Quiz, d) Question 1 (a), e) Section Level. Did I understand you correctly? Please answer yes or no."


Student: Hallo, ich habe Schwierigkeiten beim Kapitel I Elementares Rechnen mit der Übungsaufgabe 1 a.

Next-Action: [Ask for a Level: Chapter or Section]
Bot: Geht es um eine Aufgabe auf Kapitel-Ebene (z.B. um eine Trainingsaufgabe im Kapitel "Geometrie") oder um eine Aufgabe auf der Ebene eines Abschnitts in einem Kapitel (z.B. um eine Aufgabe zum Abschnitt "Winkel" im Kapitel "Geometrie")? Antworte bitte mit KAP für Kapitel und SEK für Abschnitt.

Student: Kapitel

Next-Action: [Final-Statement]
Bot: Deine Angaben waren wie folgt: a) Kapitel: I Elementares Rechnen, b) Aufgabentyp: Exercise, c) Aufgabe: 1a, d) Ebene: Kapitel. Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein.

Student: yep

Next-Action: [Human Hand-Over]
Bot: Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir.

Table 9: Showcase 1 – Short flow: most of the information was extracted from the initial user input. In bold are the extracted entities.

Student: Hallo, ich habe eine Frage bzgl. eines Zertifikates, und zwar wie kann ich es ausstellen lassen?

Next-Action: [Organisational Question]
Bot: Es scheint sich wohl um eine organisatorische Frage zu handeln. Bitte fasse deine Frage in einem kurzen Text zusammen, damit ich sie an meinen menschlichen Kollegen weiterleiten kann.

Student: Wie bekomme ich ein Zertifikat?

Next-Action: [Human Hand-Over]
Bot: Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir.

Table 10: Showcase 2 – Organisational question. Underlined are the keywords which point to the organisational intent.


Student: hallo

Next-Action: [UNK]
Bot: Hast du eine Frage zu einer Aufgabe (MATH), zu einem Text im Kurs (TEXT) oder eine organisatorische Frage (ORG)?

Student: ma

Next-Action: [UNK]
Bot: Hast du eine Frage zu einer Aufgabe (MATH), zu einem Text im Kurs (TEXT) oder eine organisatorische Frage (ORG)?

Student: math

Next-Action: [Ask for Topic]
Bot: Welches Kapitel bearbeitest du gerade?

Student: Lineare Gleichungssysteme

Next-Action: [Ask for Examination Mode]
Bot: Bearbeitest du eine Übungsaufgabe, eine Trainingsaufgabe, einen Quiz oder eine Schlussprüfung?

Student: human

Next-Action: [Human Hand-Over]
Bot: Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir.

Table 11: Showcase 3 – Manual intent classification, Fallback and Human Request policies (in bold type).

Student: Halli-Hallo, könnte mir bitte jemand die Regel 72 (vertikale Streckung und Stauchung des Graphen) erklären? Danke im Voraus.

Next-Action: [Contextual Question]
Bot: Es scheint sich wohl um eine inhaltliche Frage zu handeln. Bitte fasse deine Frage in einem kurzen Text zusammen, damit ich sie an meinen menschlichen Kollegen weiterleiten kann.

Student: Ich habe Schwierigkeiten beim Verstehen der Regel 72. Was ist genau die vertikale Streckung des Graphen?

Next-Action: [Human Hand-Over]
Bot: Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir.

Table 12: Showcase 4 – Contextual question. Underlined are the keywords which point to the contextual intent.


Student: Hallo, ich kann beim Kapitel 1 nicht weiterkommen, Trainingsaufgabe 1 (a).

Next-Action: [Ask for Level]
Bot: Geht es um eine Aufgabe auf Kapitel-Ebene (z.B. um eine Trainingsaufgabe im Kapitel Geometrie) oder um eine Aufgabe auf der Ebene eines Abschnitts in einem Kapitel (z.B. um eine Aufgabe zum Abschnitt Winkel im Kapitel Geometrie)? Antworte bitte mit KAP für Kapitel und SEK für Abschnitt.

Student: ich glaube es ist eine Sektion

Next-Action: [Ask for Sub-Topic]
Bot: Wie heißt der Abschnitt, in dem sich die Aufgabe befindet, auf die sich deine Frage bezieht?

Student: ich bearbeite rechenregeln und potenzen

Next-Action: [Final Statement]
Bot: Deine Angaben waren wie folgt: a) Kapitel: I Elementares Rechnen, b) Abschnitt: Rechenregel für Potenzen, c) Aufgabentyp: Quiz, d) Aufgabe: 1 (a), e) Ebene: Abschnitt. Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein.

Student: nope

Next-Action: [Ask_For]
Bot: Welche der folgenden Punkte habe ich falsch erkannt? Schreibe bitte, welche der folgenden Punkte es sind: a, b, c, d, e.

Student: c

State-Transition: [Correct-Fields]
Bot: Gib bitte die korrekte Information für c) ein.

Student: Training

State-Transition: [Final Statement]
Bot: Deine Angaben waren wie folgt: a) Kapitel: I Elementares Rechnen, b) Abschnitt: Rechenregel für Potenzen, c) Aufgabentyp: Training, d) Aufgabe: 1 (a), e) Ebene: Abschnitt. Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein.

Student: yes

State-Transition: [Human Hand-Over]
Bot: Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir.

Table 13: Showcase 5 – Long flow, correction of entries. In bold type are the extracted entities.


Figure 4: Dialogue Flow. Abbreviations: UNK – unknown, ID – informational dictionary, RegEx – regular expressions.


3.3 evaluation

In order to get feedback on the quality, functionality, and usefulness of the proposed model, we evaluated it in two steps: first, with an automated method (Section 3.3.1), utilizing 130 manually annotated conversations to verify the robustness of the system; second, with the help of human tutors (Section 3.3.2) from OMB+, to examine the user experience. We describe the details as well as the most common errors below.

3.3.1 Automated Evaluation

To manage an automated evaluation, we manually assembled a dataset with 130 conversational flows. We predefined viable initial user questions and user answers, as well as gold system responses. These flows cover the most frequent dialogues that we previously observed in the human-to-human dialogue data. We examined our system by implementing a self-chatting evaluation bot. The evaluation cycle is as follows:

• The system accepts the first question from a predefined dialogue via an API request, preprocesses and analyzes it to determine the intent and extract entities.

• Then it estimates the appropriate next action and responds accordingly.

• This response is then compared to the system's gold answer. If the predicted result is correct, then the system receives the next predefined user input from the templated dialogue and responds again as defined above. This procedure proceeds until the dialogue terminates (i.e., the ID is complete). Otherwise, the system reports an unsuccessful case number.

The system successfully passed this evaluation for all 130 cases
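A minimal sketch of the self-chatting evaluation loop described above is given below; the names of the helper objects (dialogue, system, respond) are illustrative assumptions about the test harness, not the actual code.

    def run_self_chat_evaluation(dialogue, system):
        """Replay one predefined dialogue flow against the system.

        `dialogue` is assumed to be a list of (user_input, gold_response) pairs;
        `system` is assumed to expose a respond(user_input) call via the bot's API wrapper.
        """
        for turn, (user_input, gold_response) in enumerate(dialogue):
            predicted = system.respond(user_input)
            if predicted != gold_response:
                return False, turn        # report the unsuccessful case and turn index
        return True, len(dialogue)        # dialogue terminated successfully

    # Usage over all 130 predefined flows (names are illustrative):
    # results = [run_self_chat_evaluation(flow, bot) for flow in evaluation_flows]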

3.3.2 Human Evaluation & Error Analysis

To evaluate the system on irregular cases, we carried out experiments with human tutors from the OMB+ platform. The tutors are experienced regarding rare or complex issues, ambiguous responses, misspellings, and other infrequent but still relevant problems which occur during a natural language dialogue. In the following, we review some common errors and describe additional observations.

misspellings and confusable spelling frequently occur in user-generated text. Since we attempt to let the conversation remain natural to provide a better user experience, we do not instruct students to write formally. Therefore, we have to account for various writing issues. One of the prevalent problems


are misspellings: German words are commonly long and can be complex, and because users often type quickly, this can lead to the wrong order of characters within a given word. To tackle this challenge, we utilized fuzzy matching within ES. However, the maximum allowed edit distance in ElasticSearch (ES) is set to 2 characters, which means that all misspellings beyond this threshold could not be correctly identified by ES (e.g., Differenzialrechnung vs. Differnetialrechnung). Another representative example is the formulation of the section or question number: the equivalent information can be communicated in several distinct ways, which has to be considered by the RegEx unit (e.g., Aufgabe 5 a, Aufgabe V a, Aufgabe 5 (a)). A similar problem occurs with confusable spellings (i.e., Differenzialrechnung vs. Differenzialgleichung).

We analyzed the cases stated above and improved our model by adding some of the most common misspellings to the ElasticSearch (ES) database or handling them with RegEx during the preprocessing step.

elasticsearch threshold We observed both cases where the system failed to extract information although the user provided it, and examples where ES extracted information not mentioned in a user query. According to our understanding, these errors occur due to the relevancy scoring algorithm of ElasticSearch (ES), where a document's score is a combination of textual similarity and other metadata-based scores. Our examination revealed that ES mostly fails to extract the information if the user message is rather short (e.g., about 5 tokens). To overcome this difficulty, we coupled the current input u_t with the dialogue history (h = u_t, ..., u_{t-2}). That eliminated the problem and improved the retrieval quality.

Solving the opposite problem, where ES extracts false information (or information that was not mentioned in a query), was more challenging. We learned that the problem comes from short words or sub-words (i.e., suffixes, prefixes), which ES locates in the database and considers credible enough. The ES documentation suggests getting rid of stop words to eliminate this behavior; however, this did not improve the search. Also, fine-tuning ES parameters such as the relevance threshold, prefix length,5 and minimum should match6 parameter did not result in notable improvements. To cope with this problem, we implemented a verification step where a user can edit the erroneously retrieved data.

5 The amount of fuzzified characters. This parameter helps to decrease the number of terms which must be reviewed.

6 Indicates the number of terms that must match for a document to be considered relevant.


3.4 structured dialogue acquisition

As we stated, the implemented system not only supports the human tutors by assisting students, but also collects structured and labeled training data in the background. In a trial run of the rule-based system, we accumulated a toy dataset with dialogues. The assembled dataset includes the following (an illustrative example of the collected format is given after the list):

• plain dialogues with unique dialogue indexes;

• plain ID information collected for the whole conversation;

• pairs of questions (i.e., user requests) and responses (i.e., bot replies) with unique dialogue and turn indexes;

• triples in the form of (Question, Next Action, Response). Information on the next system action could be employed to train a DM unit with (deep) machine learning algorithms;

• for every conversational state, we keep the entities that the system was able to extract, along with their position in the utterance. This information could be used to train a custom domain-specific NER model.
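For illustration, a single collected turn might be represented as the following Python dictionary; the field names and values are hypothetical and only sketch the structure described in the list above.

    collected_turn = {
        "dialogue_id": 42,
        "turn_id": 3,
        "question": "ich bearbeite rechenregeln und potenzen",
        "next_action": "Final Request",
        "response": "Deine Angaben waren wie folgt: ...",
        "entities": [
            # normalized entity value plus its character span in the raw utterance
            {"label": "subtopic", "value": "Rechenregel für Potenzen", "start": 14, "end": 39},
        ],
    }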

3.5 summary

In this work, we realized a conversational agent for IPA purposes that addresses two problems. First, it decreases repetitive and time-consuming activities, which allows workers of the e-learning platform to focus solely on mathematical, and hence cognitively demanding, questions. Second, by interacting with users, it augments the resources with structured and labeled training data. The latter can be utilized for the further implementation of learnable dialogue components.

The realization of such a system was connected with multiple challenges. Among others were missing structured conversational data, ambiguous or erroneous user-generated text, and the necessity to deal with existing corporate tools and their design.

The proposed conversational agent enabled us to accumulate structured, labeled data without any special effort from the human (i.e., tutor) side (e.g., manual annotation and post-processing of existing data, or changes to the conversational structure within the tutoring process). Once we collected structured dialogues, we were able to re-train particular segments of the system with deep learning methods and achieved consistent performance for both of the proposed tasks. We report on the re-implemented elements in Chapter 4.

At the time of writing this thesis, the system described in this work was deployed at the OMB+ platform.


3.6 future work

Possible extensions of our work are:

• In this work, we implemented a rule-based dialogue model from scratch and adjusted it for the specific domain (i.e., e-learning). The core of the model is a dialogue manager that determines the current state of the conversation and the possible next action. Rule-based systems are generally considered to be hardly adaptable to new domains; however, our dialogue manager proved to be flexible to slight modifications in a workflow. One possible direction of future work would thus be the investigation of the general adaptability of the dialogue manager core to other scenarios and domains (e.g., a different course).

• A further possible extension would be the introduction of different languages. The current model is implemented for the German language, but the OMB+ platform provides this course also in English and Chinese. Therefore, it would be interesting to investigate whether it is possible to adjust the existing system to other languages without drastic changes in the model.


4 deep learning re-implementation of units

This chapter partially covers work already published at international peer-reviewed conferences. The relevant publication is Moiseeva et al., 2020. The research described in this chapter was carried out in its entirety by the author of this thesis. The other authors of the publication acted as advisors.

In this chapter, we address the challenge of re-implementing two central components of our IPA dialogue system described in Chapter 3 by employing deep learning techniques. Since the data for training was collected in a (short) trial run of the rule-based system, it is not sufficient to train a neural network from scratch. Therefore, we utilize Transfer Learning (TL) methods and employ the previously pre-trained Bidirectional Encoder Representations from Transformers (BERT) model, which we fine-tune for two of our custom tasks.

4.1 introduction

Data mining and machine learning technologies have gained notable advancement in many research fields, including but not limited to computer vision, robotics, advanced analytics, and computational linguistics (Pan and Yang, 2009; Trautmann et al., 2019; Werner et al., 2015). Besides that, in recent years, deep neural networks, with their ability to accurately map from inputs to outputs (i.e., labels, images, sentences) while analyzing patterns in data, became a potential solution for many industrial automatization tasks (e.g., fake news detection, sentiment analysis).

Nevertheless, conventional supervised models still have limitations when applied to real-world data and industrial tasks. The first challenge here is the training phase, since a robust model requires an immense amount of structured and labeled data. Still, there are just a few cases where such data is publicly available (e.g., research datasets), but for most tasks, domains, and languages, the data has to be gathered over a long time. The second hurdle is that, if trained on existing data from a distinct domain, a supervised model cannot generalize to conditions that are different from those faced in the training dataset. Most machine learning methods perform correctly only under a general assumption: the training and test data are mapped to the same feature space and the same distribution (Pan and Yang, 2009). When the distribution changes, most statistical models need to be retrained from scratch using newly obtained training


samples. This is especially crucial when applying such approaches to real-world data, which is usually very noisy and contains a large number of scenarios; in this case, a model would be ill-trained to make decisions about novel patterns. Furthermore, for most real-world applications, it is expensive or even not feasible at all to recollect the required training data and retrain the model. Hence, the possibility to transfer the knowledge obtained during training on existing data to a new, unseen domain is one of the possible solutions for industrial problems. Such a technique is called Transfer Learning (TL), and it allows dealing with the abovementioned scenarios by leveraging existing labeled data of some related tasks or domains.

4.1.1 Outline and Contributions

This work addresses the challenge of re-implementing two out of three central components of our rule-based IPA conversational assistant with deep learning methods. As we discussed in Chapter 2, hybrid dialogue models (i.e., consisting of both rule-based and trainable components) appear to replace the established rule- and template-based methods, which are for the most part employed in the industrial setting. To investigate this assumption and examine current state-of-the-art deep learning techniques, we conducted experiments on our custom domain-specific data. Since the available data was collected in a trial run and the obtained dataset was too small to train a conventional machine learning model from scratch, we utilized the Transfer Learning (TL) approach and fine-tuned an existing pre-trained model for our target domain.

For the experiments, we defined two tasks:

• First, we considered the NER problem in a custom domain setting. We defined a sequence labeling task and employed Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al., 2018) as one of the transfer learning methods widely utilized in the NLP area. We applied the model to our dataset and fine-tuned it for seven (7) domain-specific (i.e., e-learning) entities.

• Second, we examined the effectiveness of transfer learning for the dialogue manager core. The DM is one of the fundamental parts of the conversational agent and requires compelling learning techniques to hold the conversation intelligently. For that experiment, we defined a classification task and applied the BERT model to predict the system's next action for every given user input in a conversation. We then computed the macro F1-score for 14 possible classes (i.e., actions) and an average dialogue accuracy.


The main characteristics of both settings are:

• the set of named entities and labels (i.e., actions) differs between the source and target domain;

• for both settings, we utilized the BERT model in German and multilingual versions with additional parameters;

• the dataset used to train the target model is small and consists of highly domain-specific labels. The latter often appears in industrial scenarios.

We finally verified that the selected method performed reasonably well on both tasks. We reached a performance of 0.93 macro F1 points for Named Entity Recognition (NER) and 0.75 macro F1 points for the Next Action Prediction (NAP) task. Therefore, we conclude that both NER and NAP components could be employed to substitute or extend the existing rule-based modules.

4.2 related work

As discussed in the previous section, the main benefit of Transfer Learning (TL) is that it allows the domains and tasks used during the training and testing phases to be different. Initial steps in the formulation of transfer learning were made after the NeurIPS-951 workshop, which was focused on the demand for lifelong machine-learning methods that preserve and reuse previously learned knowledge (Chen et al., 2018). Research on transfer learning has attracted more and more attention since then, especially in the computer vision domain and in particular for image recognition tasks (Donahue et al., 2014; Sharif Razavian et al., 2014).

Recent studies have shown that TL can also be successfully applied within the Natural Language Processing (NLP) domain, especially on semantically equivalent tasks (Dai et al., 2019; Devlin et al., 2018; Mou et al., 2016). Several research works were carried out specifically on the Named Entity Recognition (NER) task and explored the capabilities of transfer learning when applied to different named entity categories (i.e., different output spaces). For instance, Qu et al. (2016) pre-trained a linear-chain Conditional Random Field (CRF) on a large amount of labeled data in the source domain. Then they introduced a neural network with two linear layers that learns the difference between the source and target label distributions. Finally, the authors initialized another CRF with parameters learned in the linear layers to predict the labels of the target domain. Kim et al. (2015) conducted experiments with transferring features and model parameters between related domains where the label classes are different but may have a semantic similarity. Their primary approach was to construct label embeddings to automatically map the source and target label classes to improve the transfer (Chen et al., 2018).

1 Conference on Neural Information Processing Systems; formerly NIPS.


However, there are also models trained in a manner such that they can be applied to multiple NLP tasks simultaneously. Among such architectures are ULMFiT (Howard and Ruder, 2018), BERT (Devlin et al., 2018), GPT-2 (Radford et al., 2019), and Transformer-XL (Dai et al., 2019), powerful models that were recently proposed for transfer learning within the NLP domain. These architectures are called language models, since they attempt to learn language structure from large textual collections in an unsupervised manner and then predict the next word in a sequence based on previously observed context words. The Bidirectional Encoder Representations from Transformers (BERT) model (Devlin et al., 2018) is one of the most popular architectures among the abovementioned, and it is also one of the first that showed close to human-level performance on the General Language Understanding Evaluation (GLUE) benchmark2 (Wang et al., 2018), which includes tasks on sentiment analysis, question answering, semantic similarity, and language inference. The architecture of BERT consists of stacked transformer blocks that are pre-trained on a large corpus consisting of 800M words from modern English books (i.e., BooksCorpus by Zhu et al.) and 2,500M words of text from pre-processed (i.e., without markup) English Wikipedia articles. Overall, BERT advanced the state-of-the-art performance for eleven (11) NLP tasks. We describe this approach in detail in the subsequent section.

4.3 bert: bidirectional encoder representations from transformers

The Bidirectional Encoder Representations from Transformers (BERT) architecture is divided into two steps: pre-training and fine-tuning. The pre-training step is enabled through the two following tasks:

• Masked Language Model – the idea behind this approach is to train a deep bidirectional representation, which is different from conventional language models that are trained either left-to-right or right-to-left. To train a bidirectional model, the authors mask out a certain number of tokens in a sequence. The model is then asked to predict the original words for the masked tokens. It is essential that, in contrast to denoising auto-encoders (Vincent et al., 2008), the model does not need to reconstruct the entire input; instead, it attempts to predict the masked words. Since the model does not know which words exactly it will be asked about, it learns a representation for every token in a sequence (Devlin et al., 2018).

• Next Sentence Prediction – aims to capture relationships between two sentences, A and B. In order to train the model, the authors pre-trained a binarized next sentence prediction task, where pairs of sequences were labeled according to the following criterion: A and B either follow each other directly in the corpus (label IsNext) or are both taken from random places (label NotNext). The model is then asked to predict whether the former or the latter is the case. The authors demonstrated that pre-training towards this task is extremely useful for both question answering and natural language inference tasks (Devlin et al., 2018).

2 https://gluebenchmark.com

Furthermore, BERT uses WordPiece tokenization for embeddings (Wu et al., 2016), which segments words into subword-level tokens. This approach allows the model to make additional inferences based on word structure (e.g., the -ing ending denotes similar grammatical categories).

The fine-tuning step follows the pre-training process. For that, the BERT model is initialized with the parameters obtained during the pre-training step, and a task-specific fully-connected layer is used after the transformer blocks, which maps the general representations to custom labels (employing a softmax operation).

self-attention The key operation of the BERT architecture is self-attention, which is a sequence-to-sequence operation with a conventional encoder-decoder structure (Vaswani et al., 2017). The encoder maps an input sequence (x_1, x_2, ..., x_n) to a sequence of continuous representations z = (z_1, z_2, ..., z_n). Given z, the decoder then generates an output sequence (y_1, y_2, ..., y_m) of symbols, one element at a time (Vaswani et al., 2017). To produce the output vector y_i, the self-attention operation takes a weighted average over all input vectors:

y_i = \sum_j w_{ij} x_j \qquad (16)

where j is an index over the entire sequence and the weights sum to one (1) over all j. The weight w_{ij} is derived from a dot product function over x_i and x_j (Bloem, 2019; Vaswani et al., 2017):

w'_{ij} = x_i^T x_j \qquad (17)

The dot product returns a value between negative and positive infinity, and a softmax maps these values to [0, 1] to ensure that they sum to one (1) over the entire sequence (Bloem, 2019; Vaswani et al., 2017):

w_{ij} = \frac{\exp w'_{ij}}{\sum_j \exp w'_{ij}} \qquad (18)

Self-attention is not the only operation in transformers, but it is the essential one, since it propagates information between vectors. The other operations in the transformer are applied to every vector in the input sequence without interactions between vectors (Bloem, 2019; Vaswani et al., 2017).
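A minimal numpy sketch of this basic (unscaled, single-head) self-attention operation, following Equations (16)-(18), is given below. BERT itself additionally uses learned query/key/value projections, scaling, and multiple heads, which are omitted here.

    import numpy as np

    def basic_self_attention(x: np.ndarray) -> np.ndarray:
        """x has shape (sequence_length, embedding_dim); returns the same shape."""
        raw_weights = x @ x.T                                    # w'_ij = x_i^T x_j, Eq. (17)
        raw_weights -= raw_weights.max(axis=1, keepdims=True)    # for numerical stability
        weights = np.exp(raw_weights)
        weights /= weights.sum(axis=1, keepdims=True)            # softmax over j, Eq. (18)
        return weights @ x                                       # y_i = sum_j w_ij x_j, Eq. (16)

    y = basic_self_attention(np.random.randn(9, 16))  # e.g., a 9-token utterance
    print(y.shape)  # (9, 16)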


4.4 data & descriptive statistics

The benchmark that we accumulated through the trial run includes 300 structured and partially labeled dialogues, with the average length of a dialogue being six (6) utterances. All conversations were held in the German language. Detailed general statistics can be found in Table 14.

                                            Value
Max. Len. Dialogue (in utterances)          15
Avg. Len. Dialogue (in utterances)          6
Max. Len. Utterance (in tokens)             100
Avg. Len. Utterance (in tokens)             9
Overall Unique Action Labels                13
Overall Unique Entity Labels                7
Train – Dialogues (Utterances)              200 (1161)
Eval – Dialogues (Utterances)               50 (279)
Test – Dialogues (Utterances)               50 (300)

Table 14: General statistics for the conversational dataset.

The dataset covers 14 unique next actions and seven (7) unique named entities. Detailed information on the next actions and entities can be found in Table 15 and Table 16. Below, we list the possible actions with the corresponding explanations and predefined mapped statements.

unk – requests to define the intent of the question: mathematical, organizational, or contextual (i.e., fallback policy in case the automated intent recognition unit fails) → "Hast du eine Frage zu einer Aufgabe (MATH), zu einem Text im Kurs (TEXT) oder eine organisatorische Frage (ORG)?"

org – hands the discussion over to a human tutor in case of an organizational question → "Es scheint sich wohl um eine organisatorische Frage zu handeln. Bitte fasse deine Frage in einem kurzen Text zusammen, damit ich sie an meinen menschlichen Kollegen weiterleiten kann."

text – hands the discussion over to a human tutor in case of a contextual question → "Es scheint sich wohl um eine inhaltliche Frage zu handeln. Bitte fasse deine Frage in einem kurzen Text zusammen, damit ich sie an meinen menschlichen Kollegen weiterleiten kann."


topic – requests information on the topic → "Welches Kapitel bearbeitest du gerade? Antworte bitte mit der Kapitelnummer, z.B. IA, IB, II, oder mit dem Kapitelnamen, z.B. Geometrie."

examination – requests information about the examination mode → "Bearbeitest du eine Übungsaufgabe, eine Trainingsaufgabe, einen Quiz oder eine Schlussprüfung?"

subtopic – requests information about the subtopic in case of quiz or section-level training/exercise modes → "Wie heißt der Abschnitt, in dem sich die Aufgabe befindet, auf die sich deine Frage bezieht?"

level – requests information about the level of the examination mode, chapter or section → "Geht es um eine Aufgabe auf Kapitel-Ebene (z.B. um eine Trainingsaufgabe im Kapitel Geometrie) oder um eine Aufgabe auf der Ebene eines Abschnitts in einem Kapitel (z.B. um eine Aufgabe zum Abschnitt Winkel im Kapitel Geometrie)? Antworte bitte mit KAP für Kapitel und SEK für Abschnitt."

question number – requests information about the question number → "Welche Aufgabe bearbeitest du gerade? Wenn du z.B. bei Teilaufgabe c in der zweiten Trainingsaufgabe (oder der zweiten Übungsaufgabe) bist, antworte mit 2c. Bearbeitest du gerade einen Quiz oder eine Schlussprüfung, gib bitte nur die Teilaufgabe an."

final request – considers the Informational Dictionary (ID) to be complete and initiates the verification step → "Deine Angaben waren wie folgt: ... Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein."

verify request – requests the student to verify the assembled information → "Welche der folgenden Punkte habe ich falsch erkannt? Schreibe bitte, welche der folgenden Punkte es sind: [a, b, c, ...]3"

correct request – requests the student to provide the correct information if there is an error in the assembled data → "Gib bitte die korrekte Information für [a]4 ein."

exact question – requests the student to provide a detailed explanation of the problem → "Bitte fasse deine Frage in einem kurzen Text zusammen, damit ich sie an meinen menschlichen Kollegen weiterleiten kann."

human handover – finalizes the conversation → "Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir."

3 Variables that depend on the state of the Informational Dictionary (ID).
4 Variable that was selected as erroneous and needs to be corrected.


Action            Counts     Action             Counts

Final Request     321        Unk                80
Human Handover    300        Subtopic           55
Exact Question    286        Correct Request    40
Question Number   176        Verify Request     34
Examination       175        Org                17
Topic             137        Text               13
Level             130

Table 15: Detailed statistics on possible system actions. The Counts columns denote the number of occurrences of each action in the entire dataset.

Entity         Counts

Question Nr    317
Chapter        311
Examination    303
Subtopic       198
Level          80
Intent         70

Table 16: Detailed statistics on possible named entities. The Counts column denotes the number of occurrences of each entity in the entire dataset.

4.5 named entity recognition

We defined a sequence labeling task to extract custom entities from user input. We considered seven (7) viable entities (see Table 16) to be recognized by the model: topic, subtopic, examination mode and level, question number, intent, as well as the entity other for the remaining words in an utterance. Since the data collected from the rule-based system already includes information on the entities, we were able to train a domain-specific NER unit. However, since the original user input was informal, the same information could be provided in different writing styles. That means that a single entity could have different surface forms (e.g., synonyms, writing styles). Therefore, entities that we extracted from the rule-based system were transformed into a universal standard (e.g., official chapter names). To account for all of the variable entity forms while post-labeling the original dataset, we determined generic entity names (e.g., chapter, question nr) and mapped variations of entities from the user input (e.g., Chapter = [Elementary Calculus, Chapter I, ...]) to them. The overall dataset consists of 300 labeled dialogues, where 200 (with 1161 utterances) were employed for training and 100 for the evaluation and test sets (50 dialogues with ca. 300 utterances for each set, respectively).


4.5.1 Model Settings

We conducted experiments with German and multilingual BERT implementations⁵. The capitalization of words is significant for the German language; therefore, we ran the experiments on the capitalized data while preserving the original punctuation. We employed the available base model for both the multilingual and German BERT implementations in the cased version. We initialized the learning rate for both models to 1e-4, and the maximum length of the tokenized input was set to 128 tokens. We conducted the experiments multiple times with different seeds for a maximum of 50 epochs, with the training batch size set to 32. We utilized AdamW as the optimizer and applied early stopping if the performance did not improve significantly after 5 epochs.

5 https://github.com/huggingface/transformers
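As an illustration, a minimal sketch of such a token-classification fine-tuning setup with the HuggingFace transformers library might look as follows; the model name bert-base-german-cased, the label list, and the toy input are assumptions for illustration, and data loading as well as the full training loop are omitted.

    import torch
    from torch.optim import AdamW
    from transformers import BertTokenizerFast, BertForTokenClassification

    ENTITY_LABELS = ["other", "chapter", "subtopic", "examination",
                     "level", "question_nr", "intent"]          # 7 labels (cf. Table 16)

    tokenizer = BertTokenizerFast.from_pretrained("bert-base-german-cased")
    model = BertForTokenClassification.from_pretrained(
        "bert-base-german-cased", num_labels=len(ENTITY_LABELS))
    optimizer = AdamW(model.parameters(), lr=1e-4)              # learning rate as above

    # Cased input with original punctuation, truncated/padded to 128 tokens.
    batch = tokenizer("Ich bearbeite Aufgabe 2c im Kapitel Geometrie.",
                      max_length=128, padding="max_length",
                      truncation=True, return_tensors="pt")
    labels = torch.zeros(batch["input_ids"].shape, dtype=torch.long)  # dummy gold labels

    loss = model(**batch, labels=labels).loss                   # one training step
    loss.backward()
    optimizer.step()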

4.6 next action prediction

We defined a classification problem where we aim to predict the system's next action given the user input. We assumed 14 custom actions (see Table 15) that we considered to be our classes. While collecting the toy dataset, every input was automatically labeled by the rule-based system with the corresponding next action and the dialogue-id. Thus, no additional post-labeling was required in this step.

We investigated two settings for this task:

• Default Setting: utilizing a user input and the corresponding class (i.e., next action) without any supplementary context. By default, we conduct all of our experiments in this setting.

• Extended Setting: employing a user input, the corresponding next action, and the previous system action as a source of supplementary context. For this setting, we experimented with the best performing model from the default setting.

The overall dataset consists of 300 labeled dialogues, where 200 (with 1161 utterances) were employed for training and 100 for the evaluation and test sets (50 dialogues with ca. 300 utterances for each set, respectively).

4.6.1 Model Settings

For the NAP task, we carried out experiments with German and multilingual BERT implementations as well. We examined the performance of both capitalized and lower-cased inputs, as well as of plain and preprocessed data. For the multilingual BERT, we used the base model in both cased and uncased modifications. For the German BERT, we utilized the base model in the cased modification only⁶. For both models, we initialized the learning rate to 4e-5, and the maximum length of the tokenized input was set to 128 tokens. We conducted the experiments multiple times with different seeds for a maximum of 300 epochs, with the training batch size set to 32. We utilized AdamW as the optimizer and applied early stopping if the performance did not improve significantly after 15 epochs.
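Continuing the sketch from Section 4.5.1, the default and extended inputs might be assembled for a sequence-classification head as follows; the exact concatenation scheme for the previous system action is an assumption, and the model name and example utterance are illustrative only.

    from transformers import BertTokenizerFast, BertForSequenceClassification

    ACTION_LABELS = ["unk", "org", "text", "topic", "examination", "subtopic",
                     "level", "question number", "final request", "verify request",
                     "correct request", "exact question", "human handover"]

    tokenizer = BertTokenizerFast.from_pretrained("bert-base-multilingual-cased")
    model = BertForSequenceClassification.from_pretrained(
        "bert-base-multilingual-cased", num_labels=len(ACTION_LABELS))

    user_input = "Es geht um die Trainingsaufgabe im Kapitel Geometrie."
    previous_action = "examination"

    # Default setting: the user utterance alone.
    default_batch = tokenizer(user_input, max_length=128, truncation=True,
                              padding="max_length", return_tensors="pt")

    # Extended setting: the previous system action as an additional text segment.
    extended_batch = tokenizer(previous_action, user_input, max_length=128,
                               truncation=True, padding="max_length",
                               return_tensors="pt")

    logits = model(**extended_batch).logits
    print(ACTION_LABELS[int(logits.argmax(dim=-1))])            # predicted next action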

4.7 evaluation and results

To evaluate the models, we computed the word-level macro F1 score for the Named Entity Recognition (NER) task and the utterance-level macro F1 score for the Next Action Prediction (NAP) task. The word-level F1 is estimated as the average of the F1 scores per class, each computed from all words in the evaluation and test sets. The outcomes for the NER task are presented in Table 17. For the utterance-level F1, a single class label (i.e., next action) is taken for the whole utterance. The results for the NAP task are shown in Table 18.
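For reference, the macro-averaged F1 is the unweighted mean of the per-class F1 scores over the C classes (the standard definition):

    \text{macro-}F_1 = \frac{1}{C}\sum_{c=1}^{C} F_{1,c}, \qquad F_{1,c} = \frac{2\, P_c R_c}{P_c + R_c}

where P_c and R_c denote the precision and recall of class c.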

We additionally computed the average dialogue accuracy for the best performing NAP models. This score indicates how well the predicted next actions compose the conversational flow. The average dialogue accuracy was estimated for the 50 conversations in the evaluation and test sets, respectively. The results are displayed in Table 19.
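The exact formula is not spelled out here; one plausible reading, sketched below, scores each dialogue by the fraction of utterances whose predicted next action matches the gold action and then averages these scores over dialogues. The function and data layout are assumptions for illustration.

    # One plausible implementation of average dialogue accuracy (assumption:
    # the per-dialogue score is the fraction of correctly predicted next actions).
    def average_dialogue_accuracy(dialogues):
        """dialogues: list of dialogues, each a list of (predicted, gold) action pairs."""
        per_dialogue = [sum(p == g for p, g in turns) / len(turns)
                        for turns in dialogues if turns]
        return sum(per_dialogue) / len(per_dialogue)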

The obtained results for the NER task revealed that German BERT performed significantly better than the multilingual BERT model. The performance of the custom NER unit is at 0.93 macro F1 points for all possible named entities (see Table 17). In contrast, for the NAP task, the multilingual BERT model obtained better performance than the German BERT model. The best performing method in the default setting gained a macro F1 of 0.677 points for 14 possible classes, whereas the model in the extended setting performed better – its best macro F1 score is 0.752 for the equal number of classes (see Table 18).

Regarding the dialogue accuracy, the extended system fine-tuned with multilingual BERT obtained better outcomes than the default one (see Table 19). The overall conclusion for the NAP is that the capitalized setting increased the performance of the model, whereas the inclusion of punctuation did not influence the results.

6 An uncased pre-trained modification of the model was not available.


Task   Model   Cased   Punct   F1 Eval   F1 Test

NER    GER     ✓       ✓       0.971     0.930
NER    Mult    ✓       ✓       0.926     0.905

Table 17: Word-level F1 for the Named Entity Recognition (NER) task. In bold: best performance for the evaluation and test sets.

Task   Model   Cased   Punct   Extended   F1 Eval   F1 Test

NAP    GER     ✓       ✗       ✗          0.711     0.673
NAP    GER     ✓       ✓       ✗          0.701     0.606
NAP    Mult    ✓       ✓       ✗          0.688     0.625
NAP    Mult    ✓       ✗       ✗          0.769     0.677
NAP    Mult    ✓       ✗       ✓          0.810     0.752
NAP    Mult    ✗       ✓       ✗          0.664     0.596
NAP    Mult    ✗       ✗       ✗          0.742     0.502

Table 18: Utterance-level F1 for the Next Action Prediction (NAP) task. Underlined: best performance for the evaluation and test sets in the default setting (without previous action context). In bold: best performance for the evaluation and test sets in the extended setting (with previous action context).

Model          Accuracy Eval   Accuracy Test

NAP default    0.765           0.724
NAP extended   0.813           0.801

Table 19: Average dialogue accuracy computed for the Next Action Prediction (NAP) task for the best performing models. In bold: best performance for the evaluation and test sets.


Figure 5: Macro F1 evaluation, NAP – default model. Comparison between German and multilingual BERT.

Figure 6: Macro F1 test, NAP – default model. Comparison between German and multilingual BERT.

Figure 7: Macro F1 evaluation, NAP – extended model.

Figure 8: Macro F1 test, NAP – extended model.

Figure 9: Macro F1 evaluation, NER – Comparison between German and multilingual BERT.

Figure 10: Macro F1 test, NER – Comparison between German and multilingual BERT.


4.8 error analysis

We then investigated the cases where the models failed to predict the correct action or labeled the named entity span erroneously. Below we outline the most common errors.

next action prediction  One of the prevalent flaws of the default model was the mismatch between two consecutive actions – the actions question number and subtopic. That is due to the position of these actions in the conversational flow: the occurrence of both actions in a dialogue is not strict and largely depends on the previous action. To explain this, we illustrate the following possible conversational scenarios:

• The system can directly proceed with the request for the question number if the examination mode is the final examination.

• If the examination mode is training or exercise, the system has to ask about the level first (i.e., chapter or section).

• If it is the chapter level, the system can again directly proceed with the request for the question number.

• In the case of the section level, the system has to ask about the subsection first (if this information was not provided before).

The examination of the extended model showed that the induction of supplementary context (i.e., the previous action) improved the performance of the system by about 6.0%.

named entity recognition  The cases where the system failed cover mismatches between the labels chapter and other and between the labels question number and other. This failure arose due to the imperfectly labeled span of a multi-word named entity (e.g., "Elementary Calculus, Proportionality, Percentage"). The first or last words of the named entity were excluded from the span and erroneously labeled with the other label.

4.9 summary

In this chapter, we re-implemented two of the three fundamental units of our IPA conversational agent – Named Entity Recognition (NER) and Dialogue Manager (DM) – using methods of Transfer Learning (TL), specifically the Bidirectional Encoder Representations from Transformers (BERT) model.

We believe that the obtained results are reasonably high considering the comparatively small amount of training data we employed to fine-tune the models. Therefore, we conclude that both the NAP and NER segments could be used to replace or extend the existing rule-based modules. Rule-based components, as well as ElasticSearch (ES), are limited in their capacities and not flexible with respect to novel patterns, whereas the trainable units generalize better and could possibly decrease the number of inaccurate predictions in case of accidental dialogue behavior. Furthermore, to increase the overall robustness, rule-based and trainable components could be used synchronously: when one system fails (e.g., the ElasticSearch (ES) module cannot extract an entity), the conversation continues with the prediction obtained from the other model.
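A minimal sketch of this synchronous fallback is given below; the extractor functions are hypothetical placeholders, not the actual system interfaces.

    # Fallback between the rule-based/ElasticSearch unit and the trainable NER unit
    # (function names are hypothetical placeholders).
    def extract_entities(utterance, es_extract, bert_ner_extract):
        entities = es_extract(utterance)             # rule-based / ES attempt first
        if not entities:                             # ES could not extract an entity
            entities = bert_ner_extract(utterance)   # fall back to the trained model
        return entities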

4.10 future work

There are several possible directions for future work:

• At the time of writing of this thesis, both units were trained and investigated on their own – that is, without being incorporated into the initial IPA conversational system. Thus, one potential direction of future work would be the embodiment of these units into the original system and the examination of possible failure cases. Furthermore, the resulting hybrid system could then impose additional meta policies as constraints to prevent unexpected system behavior.

• Further investigation could be directed towards multi-language modality for the re-implemented units. Since the Online Mathematik Brückenkurs (OMB+) platform also supports English and Chinese, it would be interesting to examine whether a simple translation from the target language (i.e., English, Chinese) to the source language (i.e., German) would be sufficient to employ the already-assembled dataset and pre-trained units. Presumably, this should be achievable in the case of the Next Action Prediction (NAP) unit but could lead to a noticeable performance drop for the Named Entity Recognition (NER) task, since there could be a considerable gap between translated and original entity spans.

• Last but not least, one of the further extensions of the IPA system would be its eventual applicability to more cognitively demanding tasks. For instance, it is of interest to extend the model to more complex domains like the handling of mathematical questions: would it be possible to automate human assistance at least for less complicated mathematical inquiries, or is this still unachievable with currently existing NLP techniques and machine learning methods?


Part II

CLUSTERING-BASED TREND ANALYZER


5 FOUNDATIONS

As discussed in Chapter 1, predictive analytics is one of the potential Intelligent Process Automation (IPA) tasks due to its capability to forecast future events by using current and historical data and therethrough assist knowledge workers with, for instance, sales or investments. Modern machine learning algorithms allow an intelligent and fast analysis of large amounts of data. However, the evaluation of such algorithms and models remains one of the grave problems in this domain. This is mostly due to the lack of publicly available benchmarks for the evaluation of topic modeling and trend detection algorithms. Hence, in this part of the thesis, we address this challenge and assemble a benchmark for detecting both trends and downtrends (i.e., topics that become less frequent over time). To the best of our knowledge, the task of downtrend detection has not been well addressed before.

In this chapter, we introduce the concepts required by the second part of the thesis. We begin with an introduction to the topic modeling task and define the emerging trend in Section 5.1. Next, we examine the related work, existing approaches to topic modeling and trend detection, and their limitations in Section 5.2. In Chapter 6, we introduce our TRENDNERT benchmark for trend and downtrend detection and describe the method we employed for its assembly.

5.1 motivation and challenges

Science changes and evolves rapidly: novel research areas emerge while others fade away. Keeping pace with these changes is challenging. Therefore, recognizing and forecasting emerging research trends is of significant importance for researchers and academic publishers, as well as for funding agencies, stakeholders, and innovation companies.

Previously, the task of trend detection was mostly solved by domain experts who used specialized and cumbersome tools to investigate the data and get useful insights from it (e.g., through data visualization or statistics). However, manual analysis of large amounts of data can be time-consuming, and hiring domain experts is very costly. Furthermore, the overall increase of research data (i.e., Google Scholar, PubMed, DBLP) in the past decade makes the approach based on human domain experts less and less scalable. In contrast, automated approaches have become a more desirable solution for the task of emerging trend identification (Kontostathis et al., 2004; Salatino, 2015). Due to the large number of electronic documents appearing on


the web daily, several machine learning techniques were developed to find patterns of word occurrences in those documents using hierarchical probabilistic models. These approaches are called topic models, and they allow the analysis of extensive textual collections and the extraction of valuable insights from them. The main advantage of topic models is their ability to discover hidden patterns of word use and to connect documents containing similar patterns. In other words, the idea of a topic model is that documents are a mixture of topics, where a topic is defined as a probability distribution over words (Alghamdi and Alfalqi, 2015).

Topics have the property of being able to evolve; thus, modeling topics without considering time may confound precise topic identification. Modeling topics while considering time is called topic evolution modeling, and it can reveal important hidden information in a document collection, allowing the recognition of topics along the dimension of time. This approach also provides the ability to check the evolution of topics during a given time window and to examine whether a given topic is gaining interest and popularity over time (i.e., becoming an emerging trend) or whether its development is stable. Considering the latter approach, we turn to the task of emerging trend identification.

How is an emerging trend defined? An emerging trend is a topic that is gaining interest and utility over time, and it has two attributes: novelty and growth (Small, Boyack, and Klavans, 2014; Tu and Seng, 2012). Rotolo, Hicks, and Martin (2015) explain the correlation of these two attributes in the following way: up to the point of emergence, a research topic is characterized by a high level of novelty. At that time, it does not attract any considerable attention from the scientific community, and due to its limited impact, its growth is nearly flat. After some turning points (i.e., the appearance of significant scientific publications), the research topic starts to grow faster and may become a trend if the growth is fast; however, the level of novelty decreases continuously once the emergence becomes clear. Then, in the final phase, the topic becomes well-established while its novelty levels out (He and Chen, 2018).

To detect trends in textual data, one employs computational analysis techniques and Natural Language Processing (NLP). In the subsequent section, we give an overview of the existing approaches for topic modeling and trend detection.

5.2 detection of emerging research trends

The task of trend detection is closely associated with the topic modeling task since, in the first instance, it detects the latent topics in a substantial collection of data. It secondarily employs techniques to investigate the evolution of a topic and whether it has the potential to become a trend.


Topic modeling has an extensive history of applications in the scientific domain, including but not limited to studies of temporal trends (Griffiths and Steyvers, 2004; Wang and McCallum, 2006) and the investigation of impact prediction (Yogatama et al., 2011). Below we give a general overview of the most significant methods used in this domain, explain their importance and, where applicable, their potential limitations.

5.2.1 Methods for Topic Modeling

In order to extract a topic from textual documents (i.e., research papers, blog posts), the precise definition of the model for topic representation is crucial.

latent semantic analysis  or, formerly, Latent Semantic Indexing, is one of the fundamental methods in the topic modeling domain, proposed by Deerwester et al. in 1990. Its primary goal is to generate vector-based representations for documents and terms and then employ these representations to compute the similarity between documents (resp. terms) in order to select the most related ones. In this regard, Latent Semantic Analysis (LSA) takes a matrix of documents and terms and decomposes it into a separate document-topic matrix and a topic-term matrix. Before turning to the central task of finding latent topics that capture the relationships among the words and documents, it also performs dimensionality reduction utilizing truncated Singular Value Decomposition. This technique is usually applied to reduce the sparseness, noisiness, and redundancy across dimensions in the document-topic matrix. Finally, using the resulting document and term vectors, one applies measures such as cosine similarity to evaluate the similarity of different documents, words, or queries.

One of the extensions to traditional LSA is probabilistic Latent Semantic Analysis (pLSA), which was introduced by Hofmann in 1999 to fix several shortcomings. The foremost advantage of the follow-up method was that it could successfully automate document indexing, which in turn improved LSA in a probabilistic sense by using a generative model (Alghamdi and Alfalqi, 2015). Furthermore, pLSA can differentiate between diverse contexts of word usage without utilizing dictionaries or a thesaurus. That leads to better polysemy disambiguation and discloses typical similarities by grouping words that share a similar context. Among the works that employ pLSA is the approach proposed by Gohr et al. (2009), where the authors apply pLSA in a window that slides across the stream of documents to analyze the evolution of topics. Also, Mei et al. (2008) use pLSA in order to create a network of topics.
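As an illustration of the classical LSA pipeline just described, the following minimal sketch builds a term-document matrix (here with TF-IDF weighting, a common choice), reduces it with truncated SVD, and compares documents by cosine similarity; the toy corpus is invented for illustration.

    # Minimal LSA sketch: term-document matrix -> truncated SVD -> cosine similarity.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import TruncatedSVD
    from sklearn.metrics.pairwise import cosine_similarity

    docs = ["gene expression in genomics studies",
            "fuzzy sets for control systems",
            "genomics and gene regulation"]

    term_doc = TfidfVectorizer().fit_transform(docs)      # documents x terms
    svd = TruncatedSVD(n_components=2)                     # dimensionality reduction
    doc_topic = svd.fit_transform(term_doc)                # document-topic matrix

    print(cosine_similarity(doc_topic[:1], doc_topic))     # similarity to the first document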


latent dirichlet allocation  was developed by Blei, Ng, and Jordan (2003) to overcome the limitations of both the LSA and pLSA models. Nowadays, Latent Dirichlet Allocation (LDA) is considered to be one of the most utilized and robust probabilistic topic models. The fundamental concept of LDA is the assumption that every document is represented as a mixture of topics, where each topic is a discrete probability distribution that defines the probability of each word appearing in a given topic. These topic probabilities provide a compressed representation of a document. In other words, LDA models all of the d documents in the collection as a mixture over n latent topics, where each of the topics describes a multinomial distribution over the words w in the vocabulary (Alghamdi and Alfalqi, 2015).

LDA fixes several weaknesses of pLSA, such as the incapacity to assign a probability to previously unseen documents (because pLSA learns the topic mixtures only for documents seen in the training phase) and the overfitting problem (i.e., the number of parameters in pLSA increases linearly with the size of the corpus) (Salatino, 2015).

An extension of the conventional LDA is the hierarchical LDA (Griffiths et al., 2004), where topics are grouped in hierarchies. Each node in the hierarchy is associated with a topic, and a topic is a distribution across words. Another comparable approach is the Relational Topic Model (Chang and Blei, 2010), which is a mixture of a topic model and a network model for groups of linked documents. The attribute of every document is its text (i.e., discrete observations taken from a fixed vocabulary), and the links between documents are hyperlinks or citations.

LDA is a prevalent method for discovering topics (Blei and Lafferty, 2006; Bolelli, Ertekin, and Giles, 2009; Bolelli et al., 2009; Hall, Jurafsky, and Manning, 2008; Wang and Blei, 2011; Wang and McCallum, 2006). However, it has been found that training and fine-tuning a stable LDA model can be very challenging and time-consuming (Agrawal, Fu, and Menzies, 2018).
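A small sketch of fitting an LDA model with Gensim on a toy corpus is shown below; the corpus, number of topics, and number of passes are illustrative only.

    from gensim.corpora import Dictionary
    from gensim.models import LdaModel

    texts = [["gene", "expression", "genomics"],
             ["fuzzy", "sets", "control"],
             ["genomics", "gene", "regulation"]]

    dictionary = Dictionary(texts)
    corpus = [dictionary.doc2bow(t) for t in texts]        # bag-of-words per document

    lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=2, passes=10)
    print(lda.print_topics())                              # per-topic word distributions
    print(lda.get_document_topics(corpus[0]))              # topic mixture of one document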

correlated topic model  One of the main drawbacks of LDA is its incapability of capturing dependencies among the topics (Alghamdi and Alfalqi, 2015; He et al., 2017). Therefore, Blei and Lafferty (2007) proposed the Correlated Topic Model as an extension of the LDA approach. In this model, the authors replaced the Dirichlet distribution with a logistic-normal distribution that models pairwise topic correlations with the Gaussian covariance matrix (He et al., 2017).

However, one of the shortcomings of this model is its computational cost. The number of parameters in the covariance matrix increases quadratically with the number of topics, and parameter estimation for the full-rank matrix can be inaccurate in high-dimensional space (He et al., 2017). Therefore, He et al. (2017) proposed their own method to overcome this problem. The authors introduced a model that learns dense topic embeddings and captures topic correlations through the similarity between the topic vectors. According to He et al., their method allows efficient inference in the low-dimensional embedding space and significantly reduces the previous cubic or quadratic time complexity to linear with respect to the topic number.

5.2.2 Topic Evolution of Scientific Publications

Although the topic modeling task is essential, it has a significant offspring task, namely the modeling of topic evolution. Methods able to identify topics within the context of time are highly applicable in the scientific domain. They allow an understanding of topic lineage and reveal how research on one topic influences another. Generally, these methods can be classified according to the primary sources of information they employ: text, citations, or key phrases.

text-based  Extensive textual collections have dynamic co-occurrences of word patterns that change over time. Thus, models that track topic evolution over time should consider both the word co-occurrence patterns and the timestamps (Blei and Lafferty, 2006).

In 2006, Wang and McCallum proposed a Non-Markov Continuous Time model for the detection of topical trends in a collection of NeurIPS publications (an overall time span of 17 years was considered). The authors made the following assumption: if a pattern of word co-occurrence exists for a short time, then the system creates a narrow-time-distribution topic, whereas for long-term patterns the system generates a broad-time-distribution topic. The principal property of this approach is that it models topic evolution without discretizing time, i.e., the state at time t + t1 is independent of the state at time t (Wang and McCallum, 2006).

Another work, by Blei and Lafferty (2006), proposed the Dynamic Topic Models. The authors assumed that the collection of documents is organized based on certain time spans and that the documents belonging to every time span are represented with a K-component model. Thereby, the topics are associated with time span t and emerge from topics corresponding to time span t − 1. One of the main differences of this model compared to other existing approaches is that it utilizes a Gaussian distribution for the topic parameters instead of the conventional Dirichlet distribution and can therefore capture the evolution over various time spans.


citation-based  analysis is the second popular direction that is considered to be effective for trend identification (He et al., 2009; Le, Ho, and Nakamori, 2005; Shibata, Kajikawa, and Takeda, 2009; Shibata et al., 2008; Small, 2006). This method assumes that citations in their various forms (i.e., bibliographic citations, co-citation networks, citation graphs) indicate a meaningful relationship between topics, and it uses citations to model topic evolution in the scientific domain. Nie and Sun (2017) utilize this approach along with Latent Dirichlet Allocation (LDA) and k-means clustering to identify research trends. The authors first use LDA to extract features and determine the optimal number of topics. Then they use k-means to obtain thematic clusters, and finally, they compute citation functions to identify the changes of clusters over time.

However, the citation-based approach's main drawback is that a consistent collection of publications associated with a specific research topic is required for these techniques to detect the cluster and novelty of a particular topic (He and Chen, 2018). Furthermore, there are also many trend detection scenarios (e.g., research news articles or research blog posts) in which citations are not readily available. That makes this approach inapplicable to this kind of data.

keyphrase-based  Another standard solution is based on the use of keyphrase information extracted from research papers. In this case, every keyword is representative of a single research topic. Systems like Saffron (Monaghan et al., 2010), as well as Saffron-based models, use this approach. Saffron is a research toolkit that provides algorithms for keyphrase extraction, entity linking, and taxonomy extraction. Asooja et al. (2016) utilize Saffron to extract the keywords from LREC proceedings and then propose a trend forecast based on regression models as an extension to Saffron.

However, this method raises many questions, including the following: how to deal with the noisiness of keywords (i.e., irrelevant keywords) and how to handle keywords that have their own hierarchies (i.e., sub-areas). Other drawbacks of this approach include the ambiguity of keywords (e.g., "Java" as an island and "Java" as a programming language) or the necessity to deal with synonyms, which could be treated as different topics.

5.3 summary

An emerging trend detection algorithm aims to recognize topics that were earlier inappreciable but are now gaining importance and interest within a specific domain. Knowledge of emerging trends is especially essential for stakeholders, researchers, academic publishers, and funding organizations. Assume a business analyst in a


biotech company who is interested in the analysis of technical articles for recent trends that could potentially impact the company's investments. A manual review of all the available information in a specific domain would be extremely time-consuming or even infeasible. However, this process could be supported through Intelligent Process Automation (IPA) algorithms. Such software would assist human experts who are tasked with identifying emerging trends through automatic analysis of extensive textual collections and visualization of the occurrences and tendencies of a given topic.


6 BENCHMARK FOR SCIENTIFIC TREND DETECTION

This chapter covers work already published at international peer-reviewed conferences. The relevant publication is Moiseeva and Schütze, 2020. The research described in this chapter was carried out in its entirety by the author of the thesis. The other author of the publication acted as an advisor.

Computational analysis and modeling of the evolution of trends is a significant area of research because of its socio-economic impact. However, no large publicly available benchmark for trend detection currently exists, making a comparative evaluation of methods impossible. We remedy this situation by publishing the benchmark TRENDNERT, consisting of a set of gold trends (resp. downtrends) and document labels that is available as an unlimited download, and a large underlying document collection that can also be obtained for free. We propose Mean Average Precision (MAP) as an evaluation measure for trend detection and apply this measure in an investigation of several baselines.

6.1 motivation

What is an emerging research topic, and how is it defined in the literature? At the time of emergence, a research topic often does not attract much recognition from the scientific community and is exemplified by just a few publications. He and Chen (2018) associate the appearance of a trend with some triggering event, e.g., the publication of an article in a high-impact journal. Later, the research topic starts to grow faster and becomes a trend (He and Chen, 2018).

We adopt this as our key representation of a trend in this work: a trend is a research topic with a strongly increasing number of publications in a particular time interval. In contrast to trends, there are also downtrends, which refer to topics that decline in demand or growth as they evolve. Therefore, we define a downtrend as the converse of a trend: a downtrend is a research topic that has a strongly decreasing number of publications in a particular time interval. Downtrends can be as important to detect as trends, e.g., for a funding agency planning its budget.

To detect both types of topics (i.e., trends and downtrends) in textual data, one employs computational analysis techniques and NLP. These methods enable the analysis of extensive textual collections and provide valuable insights into topical trendiness and evolution over


time. Despite the overall importance of trend analysis, there is, to our knowledge, no large publicly available benchmark for trend detection. This makes a comparative evaluation of methods and algorithms impossible. Most of the previous work has used proprietary or moderately sized datasets, or the evaluation measures were not directly bound to the objectives of trend detection (Gollapalli and Li, 2015).

We remedy this situation by publishing the benchmark TRENDNERT¹. The benchmark consists of the following:

1. A set of gold trends compiled based on an extensive analysis of the meta-literature.

2. A set of labeled documents (based on more than one million scientific publications), where each document is assigned to a trend, a downtrend, or the class of flat topics, and supported by the topic title.

3. An evaluation script to compute Mean Average Precision (MAP) as the measure for the accuracy of trend detection.

We make the benchmark available as unlimited downloads². The underlying research corpus can be obtained for free from Semantic Scholar³ (Ammar et al., 2018).

6.1.1 Outline and Contributions

In this work, we address the challenge of benchmark creation for (down)trend detection. This chapter is structured as follows: Section 6.2 introduces our model for corpus creation and its fundamental components. In Section 6.3, we present the crowdsourcing procedure for the benchmark labeling. Section 6.5 outlines the proposed baseline for trend detection, our experimental setup, and the outcomes. Finally, we illustrate the details of the distributed benchmark in Section 6.6 and conclude in Section 6.7.

Our contributions within this work are as follows:

• TRENDNERT is the first publicly available benchmark for (down)trend detection.

• TRENDNERT is based on a collection of more than one million documents. It is among the largest collections that have been used for trend detection and therefore offers a realistic setting for developing trend detection algorithms.

• TRENDNERT addresses the task of detecting both trends and downtrends. To the best of our knowledge, the task of downtrend detection has not been addressed before.

1 The name is a concatenation of trend and its ananym dnert, because the benchmark supports the evaluation of both trends and downtrends.
2 https://doi.org/10.17605/OSF.IO/DHZCT
3 https://api.semanticscholar.org/corpus


6.2 corpus creation

In this section, we explain the methodology we used to create the TRENDNERT benchmark.

6.2.1 Underlying Data

We employ a corpus provided by Semantic Scholar⁴. It includes more than one million papers, published mostly between 2000 and 2015 in about 5000 computer science journals and conference proceedings. Every record in the collection consists of a title, key phrases, an abstract, the full text, and metadata.

6.2.2 Stratification

The distribution of documents over time in the entire original dataset is skewed (see Figure 11). We discovered that clustering the entire collection, as well as a random sample, produces corrupt results because more weight is given to later years than to earlier years. Therefore, we first generated a stratified sample of the original document collection. To this end, we randomly pick 10,000 documents for each year between 2000 and 2016, for an overall sample size of 160,000.

Figure 11: Overall distribution of papers in the entire dataset. Years 1975 to 2016.

4 https://www.semanticscholar.org


6.2.3 Document Representations & Clustering

As mentioned, this work's primary focus was the creation of a benchmark, not the development of new algorithms or models for trend detection. Therefore, for our experiments, we selected an algorithm based on k-means clustering as a simple baseline method for trend detection.

In conventional document clustering, documents are usually represented as bag-of-words (BOW) feature vectors (Manning, Raghavan, and Schütze, 2008). Those, however, have a major weakness: they neglect the semantics of words (Le and Mikolov, 2014). Still, recent work in the representation learning domain, and particularly doc2vec – the method proposed by Le and Mikolov (2014) – can provide representations for document clustering that overcome this weakness.

The doc2vec⁵ algorithm is inspired by techniques for learning word vectors (Mikolov et al., 2013) and is capable of capturing semantic regularities in textual collections. It is an unsupervised approach for learning continuous distributed vector representations for text (i.e., paragraphs, documents). This approach maps documents into a vector space such that semantically related documents are assigned similar vector representations (e.g., an article about "genomics" is closer to an article about "gene expression" than to an article about "fuzzy sets"). Formally, each paragraph is mapped to an individual vector, represented by a column in matrix D, and every word is also mapped to an individual vector, represented by a column in matrix W. The paragraph vector and word vectors are concatenated to predict the next word in a context (Le and Mikolov, 2014). Other work has already successfully employed this type of document representation for topic modeling, combining it with both Latent Dirichlet Allocation (LDA) and clustering approaches (Curiskis et al., 2019; Dieng, Ruiz, and Blei, 2019; Moody, 2016; Xie and Xing, 2013).

In our work, we run doc2vec on the stratified sample of 160,000 papers and represent each document as a length-normalized vector (i.e., a document embedding). These vectors are then clustered into k = 1000 clusters using the scikit-learn⁶ implementation of k-means (MacQueen, 1967) with default parameters. The combination of document representations with a clustering algorithm is conceptually simple and interpretable. Also, the comprehensive comparative evaluation of topic modeling methods utilizing document embeddings performed by Curiskis et al. (2019) showed that doc2vec feature representations with k-means clustering outperform several other methods⁷ on three evaluation measures (Normalized Mutual Information, Adjusted Mutual Information, and Adjusted Rand Index).

5 We use the implementation provided by Gensim: https://radimrehurek.com/gensim/models/doc2vec.html
6 https://scikit-learn.org/stable
7 Hierarchical clustering, k-medoids, NMF, and LDA.


We run 10 trials of stratification and clustering, resulting in 10 different clusterings. We do this to protect against the variability of clustering and because we do not want to rely on a single clustering for proposing (down)trends for the benchmark.

6.2.4 Trend & Downtrend Estimation

Recalling our definitions of trend and downtrend: a trend (resp. downtrend) is defined as a research topic that has a strongly increasing (resp. decreasing) number of publications in a particular time interval.

Linear trend estimation is a widely used technique in predictive (time-series) analysis that supports statements about tendencies in the data by correlating the measurements with the times at which they occurred (Hess, Iyer, and Malm, 2001). Considering the definition of trend (resp. downtrend) and the main concept of linear trend estimation models, we utilize linear regression to identify trend and downtrend candidates in the resulting clustering sets.

Specifically, we count the number of documents per year for a cluster and then estimate the parameters of the best-fitting line (i.e., its slope). Clusters are then ranked according to the resulting slope, and the n = 150 clusters with the largest (resp. smallest) slope are selected as trend (resp. downtrend) candidates. Thus, our definition of a single-clustering (down)trend candidate is a cluster with an extremely ascending or descending slope. Figure 12 and Figure 13 give two examples of (down)trend candidates.
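A sketch of this slope-based ranking is shown below (np.polyfit as the line-fitting routine; the data layout is an assumption for illustration):

    import numpy as np

    YEARS = np.arange(2000, 2016)                          # 16 years, as in the sample

    def cluster_slope(counts_per_year):
        # Slope of the least-squares line fitted to the yearly document counts.
        slope, _intercept = np.polyfit(YEARS, counts_per_year, deg=1)
        return slope

    def rank_candidates(cluster_counts, n=150):
        # cluster_counts: dict mapping cluster id -> list of 16 yearly counts.
        ranked = sorted(cluster_counts, key=lambda c: cluster_slope(cluster_counts[c]))
        return ranked[-n:], ranked[:n]                     # trend / downtrend candidates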

6.2.5 (Down)trend Candidates Validation

As detailed in Section 6.2.3, we ran ten series of clustering, resulting in ten discrete clustering sets. We then estimated the (down)trend candidates according to the procedure described in Section 6.2.4. Consequently, for each of the ten clustering sets, we obtained 150 trend and 150 downtrend candidates.

Nonetheless, for the benchmark, among all the (down)trend candidates across the ten clustering sets, we want to sort out only those that are consistent over all sets. To obtain consistent (down)trend candidates, we apply the Jaccard coefficient, a metric that is used for measuring the similarity and diversity of sample sets. We define an equivalence relation ∼R: two candidates (i.e., clusters) are equivalent if their Jaccard coefficient is > τsim, where τsim = 0.5. Following this notion, we compute the equivalence of (down)trend candidates across the ten sets.
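The equivalence test can be sketched as follows (clusters represented as sets of document ids):

    def jaccard(a, b):
        # Jaccard coefficient: |A ∩ B| / |A ∪ B|.
        return len(a & b) / len(a | b)

    def equivalent(cluster_a, cluster_b, tau_sim=0.5):
        # Two candidates are equivalent if their Jaccard coefficient exceeds tau_sim.
        return jaccard(cluster_a, cluster_b) > tau_sim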

Finally, we define an equivalence class as an equivalence class trend candidate (resp. equivalence class downtrend candidate) if it contains trend (resp. downtrend) candidates in at least half of the clusterings.


Figure 12: A trend candidate (positive slope). Topic: Sentiment and Emotion Analysis (using Social Media Channels).

Figure 13: A downtrend candidate (negative slope). Topic: XML.


This procedure gave us in total 110 equivalence class trend candidates and 107 equivalence class downtrend candidates, which we then annotated for our benchmark⁸.

6.3 benchmark annotation

To annotate the equivalence class (down)trend candidates obtained from our model, we used the Figure-Eight⁹ platform. It provides an integrated quality-control mechanism and a training phase before the actual annotation (Kolhatkar, Zinsmeister, and Hirst, 2013), which minimizes the rate of poorly qualified annotators.

We designed our annotation task as follows: a worker is shown a description of a particular (down)trend candidate along with a list of possible (down)trend tags. Descriptions are mixed and presented in random order. The temporal presentation order of candidates is also arbitrary. The worker is asked to pick from the list of (down)trend tags the item that matches the cluster (resp. candidate) best. Even if several items are a possible match, the worker must choose only one. If there is no matching item, the worker can select the option other. Three different workers evaluated each cluster, and the final label is the majority label. If there is no majority label, we present the cluster repeatedly to the workers until there is a majority.

The descriptors of candidates are collected through the following procedure:

• First, we review the documents assigned to a candidate cluster.

• Then, we extract keywords and titles of the papers attributed to the cluster. Semantic Scholar provides keywords and titles as part of the metadata.

• Finally, we compute the top 15 most frequent keywords and randomly select 15 titles. These keywords and titles form the descriptor of the cluster candidate presented to the crowd-workers.

We create the (down)trend tags based on keywords, titles, and metadata such as the publication venue. The cluster candidates were manually labeled by graduate employees (i.e., domain experts in the computer science domain) considering the abovementioned items.

6.3.1 Inter-annotator Agreement

To ensure the quality of the obtained annotations, we computed the Inter-Annotator Agreement (IAA) – a measure of how well two (or more) annotators can make the same decision for a particular category. To do that, we used the following metrics:

8 Note that the overall number of 217 refers to the number of (down)trend candidates and not to the number of documents. Each candidate contains hundreds of documents.
9 Formerly called CrowdFlower.

krippendorff's α  is a reliability coefficient developed to measure the agreement among observers or measuring instruments that draw distinctions among typically unstructured phenomena or assign computable values to them (Krippendorff, 2011). We chose Krippendorff's α as an inter-annotator agreement measure because it – unlike other specialized coefficients – is a generalization of several reliability criteria and applies to situations like ours, where we have more than two annotators and a given annotator only labels a subset of the data (i.e., we have to deal with missing values).

Krippendorff (2011) has proposed several metric variations, each of which is associated with a specific set of weights. We use Krippendorff's α considering any number of observers and missing data. In this case, Krippendorff's α for the overall number of 217 annotated documents is α = 0.798. According to Landis and Koch (1977), this value corresponds to a substantial level of agreement.
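For reference, Krippendorff's α is commonly expressed in terms of observed and expected disagreement:

    \alpha = 1 - \frac{D_o}{D_e}

where D_o is the disagreement actually observed among the annotators and D_e is the disagreement expected by chance; α = 1 indicates perfect reliability, and α = 0 indicates agreement at chance level.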

figure-eight agreement rate  Figure-Eight¹⁰ provides an agreement score c for each annotated unit u, which is based on the majority vote of the trusted workers. The score has been shown to perform well compared to the classic metrics (Kolhatkar, Zinsmeister, and Hirst, 2013). We computed the average C for the 217 annotated documents; C is 0.799.

Since both Krippendorff's α and the internal Figure-Eight IAA are rather high, we consider the obtained annotations for the benchmark to be of good quality.

6.3.2 Gold Trends

We compiled a list of computer science trends for the year 2016 based on the analysis of twelve survey publications (Ankerholz, 2016; Augustin, 2016; Brooks, 2016; Frot, 2016; Harriet, 2015; IEEE Computer Society, 2016; Markov, 2015; Meyerson and Mariette, 2016; Nessma, 2015; Reese, 2015; Rivera, 2016; Zaino, 2016). For this year, we identified a total of 31 trends; see Table 20.

Since there is variation in the naming of a specific trend, we created a unique name for each of them. Out of the 31 trends, 28 (90.3%) are instantiated as trends by our model, which we confirmed in the crowdsourcing procedure¹¹. We searched for the three missing trends using a variety of techniques (i.e., inspection of non-trends and downtrends, random browsing of documents not assigned to trend candidates, and keyword searches).

10 Former CrowdFlower platform.
11 The author verified this based on her judgment of the equivalence of one of the 31 gold trend names in the literature with one of the trend tags.


Two of the missing trends do not seem to occur in the collection at all: "Human Augmentation" and "Food and Water Technology". The third missing trend, "Cryptocurrency and Cryptography", does occur in the collection, but its occurrence rate is very small (ca. 0.01). We do not consider these three trends in the remainder of this chapter.

Computer Science Trend                          Computer Science Trend

Autonomous Agents and Systems                   Natural Language Processing
Autonomous Vehicles                             Open Source
Big Data Analytics                              Privacy
Bioinformatics and (Bio) Neuroscience           Quantum Computing
Biometrics and Personal Identification          Recommender Systems and Social Networks
Cloud Computing and Software as a Service       Reinforcement Learning
Cryptocurrency and Cryptography*                Renewable Energy
Cyber Security                                  Robotics
E-business                                      Semantic Web
Food and Water Technology*                      Sentiment and Emotion Analysis
Game-based Learning                             Smart Cities
Games and (Virtual) Augmented Reality           Supply Chains and RFIDs
Human Augmentation*                             Technology for Climate Change
Machine/Deep Learning                           Transportation and Energy
Medical Advances and DNA Computing              Wearables
Mobile Computing

Table 20: 31 areas identified as computer science trends for the year 2016 in the media. 28 of these trends are instantiated by our model as well and are thus covered in our benchmark. Topics marked with an asterisk (*) were not found in our underlying data collection.

In summary, based on our analysis of the meta-literature, we identified 28 gold trends that also occur in our collection. We will use them as the gold standard that trend detection algorithms should aim to discover.

6.4 evaluation measure for trend detection

Whether something is a trend is a graded concept. For this reason, we adopt an evaluation measure based on a ranking of candidates, specifically Mean Average Precision (MAP). The score denotes the average of the precision values at the ranks of correctly identified and non-redundant trends.

We approach trend detection as computing a ranked list of sets of documents, where each set is a trend candidate. We refer to documents as trendy, downtrendy, or flat, depending on which class they are assigned to in our benchmark. We consider a trend candidate


c as a correct recognition of gold trend t if it satisfies the following requirements:

• c is trendy,

• t is the largest trend in c,

• |t ∩ c| / |c| > ρ, where ρ is a coverage parameter,

• t was not recognized earlier in the list.

These criteria give us a true positive (i.e., the recognition of a gold trend) or a false positive (otherwise) for every position of the ranked list, and a precision score for each trend. The precision for a trend that was not observed is 0. We then compute MAP; see Table 21 for an example.
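A sketch of this evaluation, mirroring the example in Table 21, is given below; each ranked candidate is reduced to its largest gold trend, a trendy/non-trendy flag, and its coverage |t ∩ c| / |c|, and this data layout is an assumption for illustration.

    def mean_average_precision(ranked_candidates, gold_trends, rho=0.5):
        # ranked_candidates: list of (largest_gold_trend, is_trendy, coverage) tuples.
        recognized, true_positives, precision_at = set(), 0, {}
        for rank, (trend, is_trendy, coverage) in enumerate(ranked_candidates, start=1):
            correct = (is_trendy and trend in gold_trends
                       and coverage > rho and trend not in recognized)
            if correct:
                true_positives += 1
                recognized.add(trend)
                precision_at[trend] = true_positives / rank
        # Gold trends that were never recognized contribute a precision of 0.
        return sum(precision_at.get(t, 0.0) for t in gold_trends) / len(gold_trends)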

Gold Trends          Gold Downtrends

Cloud Computing      Compilers
Bioinformatics       Petri Nets
Sentiment Analysis   Fuzzy Sets
Privacy              Routing Protocols

     Trend Candidate      |t ∩ c| / |c|   tp/fp   P

c1   Cloud Computing      0.60            tp      1.00
c2   Privacy              0.20            fp      0.50
c3   Cloud Computing      0.70            fp      0.33
c4   Bioinformatics       0.55            tp      0.50
c5   Sentiment Analysis   0.95            tp      0.60
c6   Sentiment Analysis   0.59            fp      0.50
c7   Bioinformatics       0.41            fp      0.43
c8   Compilers            0.60            fp      0.38
c9   Petri Nets           0.80            fp      0.33
c10  Privacy              0.33            fp      0.30

Table 21: Example of the proposed MAP evaluation (with ρ = 0.5). MAP is the average of the precision values 1.0 (Cloud Computing), 0.5 (Bioinformatics), 0.6 (Sentiment Analysis), and 0.0 (Privacy), i.e., MAP = 2.1/4 = 0.525. True positive: tp. False positive: fp. Precision: P.

6.5 trend detection baselines

In this section, we investigate the impact of four different configuration choices on the performance of trend detection by way of ablation.


6.5.1 Configurations

abstracts vs. full texts  The underlying data collection contains both abstracts and full texts¹². Our initial expectation was that the full text of a document comprises more information than the abstract and should therefore be a more reliable basis for trend detection. This is one of the configurations we test in the ablation. We apply our proposed method to the full-text collection (solely document texts, without abstracts) and to the abstract collection (only abstract texts) separately and observe the results.

stratification  Due to the uneven distribution of documents over the years, the last years (with an increased volume of publications) may get too much impact on the results compared to earlier years. To investigate this, we conduct an ablation experiment where we compare (i) randomly sampling 160k documents from the entire collection and (ii) stratified sampling. In stratified sampling, we select 10k documents for each of the years 2000 – 2015, resulting in a sampled collection of 160k.

length Lt of interval  Clusters are ranked by trendiness, and the resulting ranked list is evaluated. To measure the trendiness or growth of topics over time, we fit a line to an interval {(i, ni) | i0 ≤ i < i0 + Lt} by linear regression, where Lt is the length of the interval, i is one of the 16 years (2000, ..., 2015), and ni is the number of documents that were assigned to cluster c in that year. As a simple default baseline, we apply the regression to half of the entire interval, i.e., Lt = 8. There are nine such intervals in our 16-year period. As the final measure of growth for the cluster, we take the maximum of the nine individual slopes. To determine how much this configuration choice affects our results, we also test a linear regression over the entire time span, i.e., Lt = 16. In this case, there is a single interval.

clustering method  We consider Gaussian Mixture Models (GMM) and k-means. We use GMM with a spherical type of covariance, where each component has the same variance. For both, we proceed as described in Section 6.2.3.

12 Semantic Scholar provided the underlying dataset at the beginning of our work in 2016.

6.5.2 Experimental Setup and Results

We conducted experiments on the Semantic Scholar corpus and evaluated a ranked list of trend candidates against the benchmark, considering the configurations mentioned above. We adopted Mean Average Precision (MAP) as the principal evaluation measure. As a secondary evaluation measure, we computed recall at 50 (R50), which estimates the percentage of gold trends found in the list of the 50 highest-ranked trend candidates. We found that configuration (0) in Table 22 works best. We then conducted ablation experiments to discover the importance of the configuration choices.

                 (0)        (1)        (2)        (3)        (4)

document part    A          F          A          A          A
stratification   yes        yes        no         yes        yes
Lt               8          8          8          16         8
clustering       GMMsph     GMMsph     GMMsph     GMMsph     kμ
MAP              36 (03)    07 (19)    30 (03)    32 (04)    34 (21)
R50 avg          50 (012)   25 (018)   50 (016)   46 (018)   53 (014)
R50 max          61         37         53         61         61

Table 22: Ablation results. Standard deviations are in parentheses. GMM clustering is performed with the spherical (sph) type of covariance. K-means clustering is denoted as kμ in the ablation table.

comparing (0) and (1)¹³  we observed that abstracts (A) are a more reliable representation for our trend detection baseline than full texts (F). A possible explanation for this observation is that an abstract is a summary of a scientific paper that covers only the central points and is semantically very concise and rich. In contrast, the full text contains numerous parts that are secondary (e.g., future work, related work) and thus may skew the document's overall meaning.

comparing (0) and (2)  we recognized that stratification (yes) improves results compared to the setting with non-stratified (no), randomly sampled data. We would expect the effect of stratification to be even more substantial for collections in which the distribution of documents over time is more skewed than in ours.

comparing (0) and (3)  the length of the interval is a vital configuration choice. As we assumed, the 16-year interval is rather long, and an 8-year interval can be a better choice, primarily if one aims to find short-term trends.

comparing (0) and (4)  we recognized that the topics we obtain from k-means have a similar nature to those from GMM. However, GMM, which takes variance and covariance into account and estimates a soft assignment, still performs slightly better.

66 benchmark description

Below we report the contents of the TRENDNERT benchmark14

1 Two records containing information on documents One filewith the documents assigned to (down)trends and the otherfile with documents assigned to flat topics

13Numbers (0) (1) (2) (3) (4) are the configuration choices in the ablation table14httpsdoiorg1017605OSFIODHZCT

[ September 4 2020 at 1017 ndash classicthesis ]

67 summary 83

2 A script to produce document hash codes

3 A file to map every hash code to internal IDs in the benchmark

4 A script to compute MAP (default setting is ρ = 25)

5 READMEmd a file with guidance notes

Every document in the benchmark contains the following informa-tion

bull Paper ID internal ID of documents assigned to each documentin the original Semantic Scholar collection

bull Cluster ID unique ID of meta-cluster where the document wasobserved

bull LabelTag Name of the gold (down)trend assigned by crowd-workers (eg Cyber Security Fuzzy Sets and Systems)

• Type of (down)trend candidate: trend (T), downtrend (D), or flat topic (F).

• Hash ID15 of each paper from the original Semantic Scholar collection.

15 MD5-hash of First Author + Title + Year.
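
To make the released files more tangible, the following hypothetical record and hash computation illustrate the fields listed above. The concrete field names, file format, and string concatenation are assumptions for illustration; only the MD5 scheme itself is taken from footnote 15.

    import hashlib

    # Hypothetical benchmark record; field names are illustrative, not the released format.
    record = {
        "paper_id": "0123456789",    # internal Semantic Scholar document ID
        "cluster_id": 42,            # meta-cluster in which the paper was observed
        "label": "Cyber Security",   # gold (down)trend name assigned by crowdworkers
        "type": "T",                 # T = trend, D = downtrend, F = flat topic
    }

    def hash_id(first_author, title, year):
        # MD5 hash over "First Author + Title + Year" (exact concatenation assumed).
        return hashlib.md5(f"{first_author}{title}{year}".encode("utf-8")).hexdigest()

    record["hash_id"] = hash_id("Jane Doe", "A Hypothetical Paper Title", 2016)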

6.7 summary

Emerging trend detection is a promising task for Intelligent Process Automation (IPA) that allows companies and funding agencies to automatically analyze extensive textual collections in order to detect emerging fields and forecast future developments by employing machine learning algorithms. Thus, the availability of publicly accessible benchmarks for trend detection is of significant importance, as only thereby can one evaluate the performance and robustness of the techniques employed.

Within this work, we release TRENDNERT – the first publicly available benchmark for (down)trend detection that offers a realistic setting for developing trend detection algorithms. TRENDNERT also supports downtrend detection – a significant problem that was not addressed before. We also present several experimental findings on trend detection. First, stratification improves trend detection if the distribution of documents is skewed over time. Second, abstract-based trend detection performs better than full-text-based detection if straightforward models are used.

6.8 future work

There are several possible directions for future work:

• The presented approach to estimate the trendiness of a topic by means of linear regression is straightforward and can be replaced by more sophisticated alternatives, such as Kendall's τ correlation coefficient or the locally weighted regression smoother LOESS (Cleveland, 1979); a small sketch of both options is given after this list.

• Another challenging direction would be to replace the k-means clustering algorithm that we adopted within this work with the LDA model, while retaining the document representations, as was done in Dieng, Ruiz, and Blei (2019). The question to investigate would then be whether topic modeling outperforms the simple clustering algorithm in this particular setting, and whether the resulting system would benefit from the topic modeling algorithm.

• Furthermore, it would be of interest to conduct more research on the following question: what kind of document representation can make effective use of the information in full texts? Recently proposed contextualised word embeddings could be a possible direction for these experiments.
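
The sketch referred to in the first item above contrasts the linear-regression slope used as the trendiness score in this work with Kendall's τ as one possible replacement. It assumes NumPy and SciPy; the yearly counts are made up for illustration.

    import numpy as np
    from scipy.stats import kendalltau

    years  = np.array([2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016])
    counts = np.array([12, 15, 21, 25, 33, 41, 52, 60])  # papers per year in one topic (made-up)

    slope, intercept = np.polyfit(years, counts, deg=1)  # trendiness as the slope of a linear fit
    tau, p_value = kendalltau(years, counts)             # rank-correlation alternative to the slope

    print(f"slope={slope:.2f}, tau={tau:.2f}")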


7 DISCUSSION AND FUTURE WORK

As we observed in the two parts of this thesis, Intelligent Process Automation (IPA) is a challenging research area due to its multifaceted nature and its application in real-world scenarios. Its main objective is to automate time-consuming and routine tasks and to free up knowledge workers for more cognitively demanding tasks. However, this undertaking faces many difficulties, such as a lack of resources for both training and evaluation of algorithms, as well as insufficient performance of machine learning techniques when applied to real-world data and practical problems. In this thesis, we investigated two IPA applications within the Natural Language Processing (NLP) domain – conversational agents and predictive analytics – and addressed some of the most common issues.

• In the first part of the thesis, we addressed the challenge of implementing a robust conversational system for IPA purposes within the practical e-learning domain, considering the conditions of missing training data. The primary purpose of the resulting system is to perform the onboarding process between tutors and students. This onboarding reduces repetitive and time-consuming activities while allowing tutors to focus on purely mathematical questions, which is the core task with the most impact on students. Besides that, by interacting with users, the system augments the resources with structured and labeled training data that we used to implement learnable dialogue components.

The realization of such a system was associated with several challenges. Among others were missing structured data and ambiguous user-generated content that requires precise language understanding. Also, a system that simultaneously communicates with a large number of users has to maintain additional meta policies, such as fallback and check-up policies, to hold the conversation intelligently.

• Moreover, we have shown a potential extension of the rule-based dialogue system using transfer learning. We utilized a small dataset of structured dialogues obtained in a trial run of the rule-based system and applied the current state-of-the-art Bidirectional Encoder Representations from Transformers (BERT) architecture to solve the domain-specific tasks. Specifically, we retrained two of the three components in the conversational system – Named Entity Recognition (NER) and the Dialogue Manager (DM) (i.e., the NAP task) – and confirmed the applicability of such algorithms in a real-world, low-resource scenario by showing the reasonably high performance of the resulting models.

• Last but not least, in the second part of the thesis, we addressed the issue of missing publicly available benchmarks for the evaluation of machine learning algorithms on the trend detection task. To the best of our knowledge, we released the first open benchmark for this purpose, which offers a realistic setting for developing trend detection algorithms. Besides that, this benchmark supports downtrend detection – a significant problem that was not sufficiently addressed before. We additionally proposed an evaluation measure and presented several experimental findings on trend detection.

The ultimate objective of Intelligent Process Automation (IPA) should be the creation of robust algorithms under low-resource and domain-specific conditions. This is still a challenging task; however, some of the experiments presented in this thesis revealed that this goal could potentially be achieved by means of Transfer Learning (TL) techniques.


BIBLIOGRAPHY

Agrawal, Amritanshu, Wei Fu, and Tim Menzies (2018). "What is wrong with topic modeling? And how to fix it using search-based software engineering." In: Information and Software Technology 98, pp. 74–88 (cit. on p. 66).

Alan, M. (1950). "Turing." In: Computing machinery and intelligence. Mind 59.236, pp. 433–460 (cit. on p. 7).

Alghamdi, Rubayyi and Khalid Alfalqi (2015). "A survey of topic modeling in text mining." In: Int. J. Adv. Comput. Sci. Appl. (IJACSA) 6.1 (cit. on pp. 64–66).

Ammar, Waleed, Dirk Groeneveld, Chandra Bhagavatula, Iz Beltagy, Miles Crawford, Doug Downey, Jason Dunkelberger, Ahmed Elgohary, Sergey Feldman, Vu Ha, et al. (2018). "Construction of the literature graph in Semantic Scholar." In: arXiv preprint arXiv:1805.02262 (cit. on p. 72).

Ankerholz, Amber (2016). 2016 Future of Open Source Survey Says Open Source Is The Modern Architecture. https://www.linux.com/news/2016-future-open-source-survey-says-open-source-modern-architecture (cit. on p. 78).

Asooja, Kartik, Georgeta Bordea, Gabriela Vulcu, and Paul Buitelaar (2016). "Forecasting emerging trends from scientific literature." In: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), pp. 417–420 (cit. on p. 68).

Augustin, Jason (2016). Emerging Science and Technology Trends: A Synthesis of Leading Forecasts. http://www.defenseinnovationmarketplace.mil/resources/2016_SciTechReport_16June2016.pdf (cit. on p. 78).

Blei, David M. and John D. Lafferty (2006). "Dynamic topic models." In: Proceedings of the 23rd International Conference on Machine Learning. ACM, pp. 113–120 (cit. on pp. 66, 67).

Blei, David M., John D. Lafferty, et al. (2007). "A correlated topic model of science." In: The Annals of Applied Statistics 1.1, pp. 17–35 (cit. on p. 66).

Blei, David M., Andrew Y. Ng, and Michael I. Jordan (2003). "Latent Dirichlet allocation." In: Journal of Machine Learning Research 3.Jan, pp. 993–1022 (cit. on p. 66).

Bloem, Peter (2019). Transformers from Scratch. http://www.peterbloem.nl/blog/transformers (cit. on p. 45).

Bocklisch, Tom, Joey Faulkner, Nick Pawlowski, and Alan Nichol (2017). "Rasa: Open source language understanding and dialogue management." In: arXiv preprint arXiv:1712.05181 (cit. on p. 15).

Bolelli, Levent, Seyda Ertekin, and C. Giles (2009). "Topic and trend detection in text collections using latent Dirichlet allocation." In: Advances in Information Retrieval, pp. 776–780 (cit. on p. 66).


Bolelli, Levent, Seyda Ertekin, Ding Zhou, and C. Lee Giles (2009). "Finding topic trends in digital libraries." In: Proceedings of the 9th ACM/IEEE-CS Joint Conference on Digital Libraries. ACM, pp. 69–72 (cit. on p. 66).

Bordes, Antoine, Y-Lan Boureau, and Jason Weston (2016). "Learning end-to-end goal-oriented dialog." In: arXiv preprint arXiv:1605.07683 (cit. on pp. 16, 17).

Brooks, Chuck (2016). 7 Top Tech Trends Impacting Innovators in 2016. http://innovationexcellence.com/blog/2015/12/26/7-top-tech-trends-impacting-innovators-in-2016 (cit. on p. 78).

Burtsev, Mikhail, Alexander Seliverstov, Rafael Airapetyan, Mikhail Arkhipov, Dilyara Baymurzina, Eugene Botvinovsky, Nickolay Bushkov, Olga Gureenkova, Andrey Kamenev, Vasily Konovalov, et al. (2018). "DeepPavlov: An Open Source Library for Conversational AI." In: (cit. on p. 15).

Chang, Jonathan, David M. Blei, et al. (2010). "Hierarchical relational models for document networks." In: The Annals of Applied Statistics 4.1, pp. 124–150 (cit. on p. 66).

Chen, Hongshen, Xiaorui Liu, Dawei Yin, and Jiliang Tang (2017). "A survey on dialogue systems: Recent advances and new frontiers." In: ACM SIGKDD Explorations Newsletter 19.2, pp. 25–35 (cit. on pp. 14–16).

Chen, Lingzhen, Alessandro Moschitti, Giuseppe Castellucci, Andrea Favalli, and Raniero Romagnoli (2018). "Transfer Learning for Industrial Applications of Named Entity Recognition." In: NL4AI@AI*IA, pp. 129–140 (cit. on p. 43).

Cleveland, William S. (1979). "Robust locally weighted regression and smoothing scatterplots." In: Journal of the American Statistical Association 74.368, pp. 829–836 (cit. on p. 84).

Collobert, Ronan, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa (2011). "Natural language processing (almost) from scratch." In: Journal of Machine Learning Research 12.Aug, pp. 2493–2537 (cit. on p. 16).

Cuayáhuitl, Heriberto, Simon Keizer, and Oliver Lemon (2015). "Strategic dialogue management via deep reinforcement learning." In: arXiv preprint arXiv:1511.08099 (cit. on p. 14).

Cui, Lei, Shaohan Huang, Furu Wei, Chuanqi Tan, Chaoqun Duan, and Ming Zhou (2017). "SuperAgent: A customer service chatbot for e-commerce websites." In: Proceedings of ACL 2017, System Demonstrations, pp. 97–102 (cit. on p. 8).

Curiskis, Stephan A., Barry Drake, Thomas R. Osborn, and Paul J. Kennedy (2019). "An evaluation of document clustering and topic modelling in two online social networks: Twitter and Reddit." In: Information Processing & Management (cit. on p. 74).

Dai, Zihang, Zhilin Yang, Yiming Yang, William W. Cohen, Jaime Carbonell, Quoc V. Le, and Ruslan Salakhutdinov (2019). "Transformer-XL: Attentive language models beyond a fixed-length context." In: arXiv preprint arXiv:1901.02860 (cit. on pp. 43, 44).


Daniel, Ben (2015). "Big Data and analytics in higher education: Opportunities and challenges." In: British Journal of Educational Technology 46.5, pp. 904–920 (cit. on p. 22).

Deerwester, Scott, Susan T. Dumais, George W. Furnas, Thomas K. Landauer, and Richard Harshman (1990). "Indexing by latent semantic analysis." In: Journal of the American Society for Information Science 41.6, pp. 391–407 (cit. on p. 65).

Deoras, Anoop, Kaisheng Yao, Xiaodong He, Li Deng, Geoffrey Gerson Zweig, Ruhi Sarikaya, Dong Yu, Mei-Yuh Hwang, and Gregoire Mesnil (2015). Assignment of semantic labels to a sequence of words using neural network architectures. US Patent App. 14/016,186 (cit. on p. 13).

Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova (2018). "BERT: Pre-training of deep bidirectional transformers for language understanding." In: arXiv preprint arXiv:1810.04805 (cit. on pp. 42–45).

Dhingra, Bhuwan, Lihong Li, Xiujun Li, Jianfeng Gao, Yun-Nung Chen, Faisal Ahmed, and Li Deng (2016). "Towards end-to-end reinforcement learning of dialogue agents for information access." In: arXiv preprint arXiv:1609.00777 (cit. on p. 16).

Dieng, Adji B., Francisco J. R. Ruiz, and David M. Blei (2019). "Topic Modeling in Embedding Spaces." In: arXiv preprint arXiv:1907.04907 (cit. on pp. 74, 84).

Donahue, Jeff, Yangqing Jia, Oriol Vinyals, Judy Hoffman, Ning Zhang, Eric Tzeng, and Trevor Darrell (2014). "DeCAF: A deep convolutional activation feature for generic visual recognition." In: International Conference on Machine Learning, pp. 647–655 (cit. on p. 43).

Eric, Mihail and Christopher D. Manning (2017). "Key-value retrieval networks for task-oriented dialogue." In: arXiv preprint arXiv:1705.05414 (cit. on p. 16).

Fang, Hao, Hao Cheng, Maarten Sap, Elizabeth Clark, Ari Holtzman, Yejin Choi, Noah A. Smith, and Mari Ostendorf (2018). "Sounding Board: A user-centric and content-driven social chatbot." In: arXiv preprint arXiv:1804.10202 (cit. on p. 10).

Friedrich, Fabian, Jan Mendling, and Frank Puhlmann (2011). "Process model generation from natural language text." In: International Conference on Advanced Information Systems Engineering, pp. 482–496 (cit. on p. 2).

Frot, Mathilde (2016). 5 Trends in Computer Science Research. https://www.topuniversities.com/courses/computer-science-information-systems/5-trends-computer-science-research (cit. on p. 78).

Glasmachers, Tobias (2017). "Limits of end-to-end learning." In: arXiv preprint arXiv:1704.08305 (cit. on pp. 16, 17).

Goddeau, David, Helen Meng, Joseph Polifroni, Stephanie Seneff, and Senis Busayapongchai (1996). "A form-based dialogue manager for spoken language applications." In: Proceeding of Fourth International Conference on Spoken Language Processing, ICSLP '96. Vol. 2. IEEE, pp. 701–704 (cit. on p. 14).


Gohr, Andre, Alexander Hinneburg, Rene Schult, and Myra Spiliopoulou (2009). "Topic evolution in a stream of documents." In: Proceedings of the 2009 SIAM International Conference on Data Mining. SIAM, pp. 859–870 (cit. on p. 65).

Gollapalli, Sujatha Das and Xiaoli Li (2015). "EMNLP versus ACL: Analyzing NLP research over time." In: EMNLP, pp. 2002–2006 (cit. on p. 72).

Griffiths, Thomas L. and Mark Steyvers (2004). "Finding scientific topics." In: Proceedings of the National Academy of Sciences 101.suppl 1, pp. 5228–5235 (cit. on p. 65).

Griffiths, Thomas L., Michael I. Jordan, Joshua B. Tenenbaum, and David M. Blei (2004). "Hierarchical topic models and the nested Chinese restaurant process." In: Advances in Neural Information Processing Systems, pp. 17–24 (cit. on p. 66).

Hall, David, Daniel Jurafsky, and Christopher D. Manning (2008). "Studying the history of ideas using topic models." In: Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, pp. 363–371 (cit. on p. 66).

Harriet, Taylor (2015). Privacy will hit tipping point in 2016. http://www.cnbc.com/2015/11/09/privacy-will-hit-tipping-point-in-2016.html (cit. on p. 78).

He, Jiangen and Chaomei Chen (2018). "Predictive Effects of Novelty Measured by Temporal Embeddings on the Growth of Scientific Literature." In: Frontiers in Research Metrics and Analytics 3, p. 9 (cit. on pp. 64, 68, 71).

He, Junxian, Zhiting Hu, Taylor Berg-Kirkpatrick, Ying Huang, and Eric P. Xing (2017). "Efficient correlated topic modeling with topic embedding." In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, pp. 225–233 (cit. on pp. 66, 67).

He, Qi, Bi Chen, Jian Pei, Baojun Qiu, Prasenjit Mitra, and Lee Giles (2009). "Detecting topic evolution in scientific literature: How can citations help?" In: Proceedings of the 18th ACM Conference on Information and Knowledge Management. ACM, pp. 957–966 (cit. on p. 68).

Henderson, Matthew, Blaise Thomson, and Jason D. Williams (2014). "The second dialog state tracking challenge." In: Proceedings of the 15th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL), pp. 263–272 (cit. on p. 15).

Henderson, Matthew, Blaise Thomson, and Steve Young (2013). "Deep neural network approach for the dialog state tracking challenge." In: Proceedings of the SIGDIAL 2013 Conference, pp. 467–471 (cit. on p. 14).

Hess, Ann, Hari Iyer, and William Malm (2001). "Linear trend analysis: a comparison of methods." In: Atmospheric Environment 35.30: Visibility, Aerosol and Atmospheric Optics, pp. 5211–5222. issn: 1352-2310. doi: https://doi.org/10.1016/S1352-2310(01)00342-9. url: http://www.sciencedirect.com/science/article/pii/S1352231001003429 (cit. on p. 75).


Hofmann, Thomas (1999). "Probabilistic latent semantic analysis." In: Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann Publishers Inc., pp. 289–296 (cit. on p. 65).

Honnibal, Matthew and Ines Montani (2017). "spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing." In: To appear 7 (cit. on p. 15).

Howard, Jeremy and Sebastian Ruder (2018). "Universal language model fine-tuning for text classification." In: arXiv preprint arXiv:1801.06146 (cit. on p. 44).

IEEE Computer Society (2016). Top 9 Computing Technology Trends for 2016. https://www.scientificcomputing.com/news/2016/01/top-9-computing-technology-trends-2016 (cit. on p. 78).

Ivancic, Lucija, Dalia Suša Vugec, and Vesna Bosilj Vukšic (2019). "Robotic Process Automation: Systematic Literature Review." In: International Conference on Business Process Management. Springer, pp. 280–295 (cit. on pp. 1, 2).

Jain, Mohit, Pratyush Kumar, Ramachandra Kota, and Shwetak N. Patel (2018). "Evaluating and informing the design of chatbots." In: Proceedings of the 2018 Designing Interactive Systems Conference. ACM, pp. 895–906 (cit. on p. 8).

Khanpour, Hamed, Nishitha Guntakandla, and Rodney Nielsen (2016). "Dialogue act classification in domain-independent conversations using a deep recurrent neural network." In: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pp. 2012–2021 (cit. on p. 13).

Kim, Young-Bum, Karl Stratos, Ruhi Sarikaya, and Minwoo Jeong (2015). "New transfer learning techniques for disparate label sets." In: Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pp. 473–482 (cit. on p. 43).

Kolhatkar, Varada, Heike Zinsmeister, and Graeme Hirst (2013). "Annotating anaphoric shell nouns with their antecedents." In: Proceedings of the 7th Linguistic Annotation Workshop and Interoperability with Discourse, pp. 112–121 (cit. on pp. 77, 78).

Kontostathis, April, Leon M. Galitsky, William M. Pottenger, Soma Roy, and Daniel J. Phelps (2004). "A survey of emerging trend detection in textual data mining." In: Survey of Text Mining. Springer, pp. 185–224 (cit. on p. 63).

Krippendorff, Klaus (2011). "Computing Krippendorff's alpha-reliability." In: Annenberg School for Communication (ASC) Departmental Papers 43 (cit. on p. 78).

Kurata, Gakuto, Bing Xiang, Bowen Zhou, and Mo Yu (2016). "Leveraging sentence-level information with encoder LSTM for semantic slot filling." In: arXiv preprint arXiv:1601.01530 (cit. on p. 13).

Landis, J. Richard and Gary G. Koch (1977). "The measurement of observer agreement for categorical data." In: Biometrics, pp. 159–174 (cit. on p. 78).


Le, Minh-Hoang, Tu-Bao Ho, and Yoshiteru Nakamori (2005). "Detecting emerging trends from scientific corpora." In: International Journal of Knowledge and Systems Sciences 2.2, pp. 53–59 (cit. on p. 68).

Le, Quoc V. and Tomas Mikolov (2014). "Distributed Representations of Sentences and Documents." In: ICML. Vol. 14, pp. 1188–1196 (cit. on p. 74).

Lee, Ji Young and Franck Dernoncourt (2016). "Sequential short-text classification with recurrent and convolutional neural networks." In: arXiv preprint arXiv:1603.03827 (cit. on p. 13).

Leopold, Henrik, Han van der Aa, and Hajo A. Reijers (2018). "Identifying candidate tasks for robotic process automation in textual process descriptions." In: Enterprise, Business-Process and Information Systems Modeling. Springer, pp. 67–81 (cit. on p. 2).

Levenshtein, Vladimir I. (1966). "Binary codes capable of correcting deletions, insertions, and reversals." In: Soviet Physics Doklady 8, pp. 707–710 (cit. on p. 25).

Liao, Vera, Masud Hussain, Praveen Chandar, Matthew Davis, Yasaman Khazaeni, Marco Patricio Crasso, Dakuo Wang, Michael Muller, N. Sadat Shami, Werner Geyer, et al. (2018). "All Work and No Play?" In: Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems. ACM, p. 3 (cit. on p. 8).

Lowe, Ryan Thomas, Nissan Pow, Iulian Vlad Serban, Laurent Charlin, Chia-Wei Liu, and Joelle Pineau (2017). "Training end-to-end dialogue systems with the Ubuntu Dialogue Corpus." In: Dialogue & Discourse 8.1, pp. 31–65 (cit. on p. 22).

Luger, Ewa and Abigail Sellen (2016). "Like having a really bad PA: the gulf between user expectation and experience of conversational agents." In: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems. ACM, pp. 5286–5297 (cit. on p. 8).

MacQueen, James (1967). "Some methods for classification and analysis of multivariate observations." In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability. Vol. 1. Oakland, CA, USA, pp. 281–297 (cit. on p. 74).

Manning, Christopher D., Prabhakar Raghavan, and Hinrich Schütze (2008). Introduction to Information Retrieval. Web publication at informationretrieval.org. Cambridge University Press (cit. on p. 74).

Markov, Igor (2015). 13 Of 2015's Hottest Topics In Computer Science Research. https://www.forbes.com/sites/quora/2015/04/22/13-of-2015s-hottest-topics-in-computer-science-research/#fb0d46b1e88d (cit. on p. 78).

Martin, James H. and Daniel Jurafsky (2009). Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Pearson/Prentice Hall, Upper Saddle River.

Mei, Qiaozhu, Deng Cai, Duo Zhang, and ChengXiang Zhai (2008). "Topic modeling with network regularization." In: Proceedings of the 17th International Conference on World Wide Web. ACM, pp. 101–110 (cit. on p. 65).


Meyerson, Bernard and Mariette DiChristina (2016). These are the top 10 emerging technologies of 2016. http://www3.weforum.org/docs/GAC16_Top10_Emerging_Technologies_2016_report.pdf (cit. on p. 78).

Mikolov, Tomas, Kai Chen, Greg Corrado, and Jeffrey Dean (2013). "Efficient estimation of word representations in vector space." In: arXiv preprint arXiv:1301.3781 (cit. on p. 74).

Miller, Alexander, Adam Fisch, Jesse Dodge, Amir-Hossein Karimi, Antoine Bordes, and Jason Weston (2016). "Key-value memory networks for directly reading documents." In: (cit. on p. 16).

Mohanty, Soumendra and Sachin Vyas (2018). "How to Compete in the Age of Artificial Intelligence." In: (cit. on pp. III, V, 1, 2).

Moiseeva, Alena and Hinrich Schütze (2020). "TRENDNERT: A Benchmark for Trend and Downtrend Detection in a Scientific Domain." In: Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence (cit. on p. 71).

Moiseeva, Alena, Dietrich Trautmann, Michael Heimann, and Hinrich Schütze (2020). "Multipurpose Intelligent Process Automation via Conversational Assistant." In: arXiv preprint arXiv:2001.02284 (cit. on pp. 21, 41).

Monaghan, Fergal, Georgeta Bordea, Krystian Samp, and Paul Buitelaar (2010). "Exploring your research: Sprinkling some saffron on semantic web dog food." In: Semantic Web Challenge at the International Semantic Web Conference. Vol. 117. Citeseer, pp. 420–435 (cit. on p. 68).

Monostori, László, András Márkus, Hendrik Van Brussel, and E. Westkämpfer (1996). "Machine learning approaches to manufacturing." In: CIRP Annals 45.2, pp. 675–712 (cit. on p. 15).

Moody, Christopher E. (2016). "Mixing Dirichlet topic models and word embeddings to make lda2vec." In: arXiv preprint arXiv:1605.02019 (cit. on p. 74).

Mou, Lili, Zhao Meng, Rui Yan, Ge Li, Yan Xu, Lu Zhang, and Zhi Jin (2016). "How transferable are neural networks in NLP applications?" In: arXiv preprint arXiv:1603.06111 (cit. on p. 43).

Mrkšić, Nikola, Diarmuid Ó Séaghdha, Tsung-Hsien Wen, Blaise Thomson, and Steve Young (2016). "Neural belief tracker: Data-driven dialogue state tracking." In: arXiv preprint arXiv:1606.03777 (cit. on p. 14).

Nessma, Joussef (2015). Top 10 Hottest Research Topics in Computer Science. http://www.pouted.com/top-10-hottest-research-topics-in-computer-science (cit. on p. 78).

Nie, Binling and Shouqian Sun (2017). "Using text mining techniques to identify research trends: A case study of design research." In: Applied Sciences 7.4, p. 401 (cit. on p. 68).

Pan, Sinno Jialin and Qiang Yang (2009). "A survey on transfer learning." In: IEEE Transactions on Knowledge and Data Engineering 22.10, pp. 1345–1359 (cit. on p. 41).

Qu, Lizhen, Gabriela Ferraro, Liyuan Zhou, Weiwei Hou, and Timothy Baldwin (2016). "Named entity recognition for novel types by transfer learning." In: arXiv preprint arXiv:1610.09914 (cit. on p. 43).

Radford, Alec, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever (2019). "Language models are unsupervised multitask learners." In: OpenAI Blog 1.8 (cit. on p. 44).

Reese, Hope (2015). 7 trends for artificial intelligence in 2016: 'Like 2015 on steroids'. http://www.techrepublic.com/article/7-trends-for-artificial-intelligence-in-2016-like-2015-on-steroids (cit. on p. 78).

Rivera, Maricel (2016). Is Digital Game-Based Learning The Future Of Learning? https://elearningindustry.com/digital-game-based-learning-future (cit. on p. 78).

Rotolo, Daniele, Diana Hicks, and Ben R. Martin (2015). "What is an emerging technology?" In: Research Policy 44.10, pp. 1827–1843 (cit. on p. 64).

Salatino, Angelo (2015). "Early detection and forecasting of research trends." In: (cit. on pp. 63, 66).

Serban, Iulian V., Alessandro Sordoni, Yoshua Bengio, Aaron Courville, and Joelle Pineau (2016). "Building end-to-end dialogue systems using generative hierarchical neural network models." In: Thirtieth AAAI Conference on Artificial Intelligence (cit. on p. 11).

Sharif Razavian, Ali, Hossein Azizpour, Josephine Sullivan, and Stefan Carlsson (2014). "CNN features off-the-shelf: an astounding baseline for recognition." In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 806–813 (cit. on p. 43).

Shibata, Naoki, Yuya Kajikawa, and Yoshiyuki Takeda (2009). "Comparative study on methods of detecting research fronts using different types of citation." In: Journal of the American Society for Information Science and Technology 60.3, pp. 571–580 (cit. on p. 68).

Shibata, Naoki, Yuya Kajikawa, Yoshiyuki Takeda, and Katsumori Matsushima (2008). "Detecting emerging research fronts based on topological measures in citation networks of scientific publications." In: Technovation 28.11, pp. 758–775 (cit. on p. 68).

Small, Henry (2006). "Tracking and predicting growth areas in science." In: Scientometrics 68.3, pp. 595–610 (cit. on p. 68).

Small, Henry, Kevin W. Boyack, and Richard Klavans (2014). "Identifying emerging topics in science and technology." In: Research Policy 43.8, pp. 1450–1467 (cit. on p. 64).

Su, Pei-Hao, Nikola Mrkšić, Iñigo Casanueva, and Ivan Vulić (2018). "Deep Learning for Conversational AI." In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Tutorial Abstracts, pp. 27–32 (cit. on p. 12).

Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le (2014). "Sequence to sequence learning with neural networks." In: Advances in Neural Information Processing Systems, pp. 3104–3112 (cit. on pp. 16, 17).

Trautmann, Dietrich, Johannes Daxenberger, Christian Stab, Hinrich Schütze, and Iryna Gurevych (2019). "Robust Argument Unit Recognition and Classification." In: arXiv preprint arXiv:1904.09688 (cit. on p. 41).


Tu, Yi-Ning and Jia-Lang Seng (2012). "Indices of novelty for emerging topic detection." In: Information Processing & Management 48.2, pp. 303–325 (cit. on p. 64).

Turing, Alan (1949). London Times letter to the editor (cit. on p. 7).

Ultes, Stefan, Lina Barahona, Pei-Hao Su, David Vandyke, Dongho Kim, Inigo Casanueva, Paweł Budzianowski, Nikola Mrkšić, Tsung-Hsien Wen, et al. (2017). "PyDial: A multi-domain statistical dialogue system toolkit." In: Proceedings of ACL 2017, System Demonstrations, pp. 73–78 (cit. on p. 15).

Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin (2017). "Attention is all you need." In: Advances in Neural Information Processing Systems, pp. 5998–6008 (cit. on p. 45).

Vincent, Pascal, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol (2008). "Extracting and composing robust features with denoising autoencoders." In: Proceedings of the 25th International Conference on Machine Learning. ACM, pp. 1096–1103 (cit. on p. 44).

Vodolán, Miroslav, Rudolf Kadlec, and Jan Kleindienst (2015). "Hybrid dialog state tracker." In: arXiv preprint arXiv:1510.03710 (cit. on p. 14).

Wang, Alex, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman (2018). "GLUE: A multi-task benchmark and analysis platform for natural language understanding." In: arXiv preprint arXiv:1804.07461 (cit. on p. 44).

Wang, Chong and David M. Blei (2011). "Collaborative topic modeling for recommending scientific articles." In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, pp. 448–456 (cit. on p. 66).

Wang, Xuerui and Andrew McCallum (2006). "Topics over time: a non-Markov continuous-time model of topical trends." In: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, pp. 424–433 (cit. on pp. 65–67).

Wang, Zhuoran and Oliver Lemon (2013). "A simple and generic belief tracking mechanism for the dialog state tracking challenge: On the believability of observed information." In: Proceedings of the SIGDIAL 2013 Conference, pp. 423–432 (cit. on p. 14).

Weizenbaum, Joseph et al. (1966). "ELIZA – a computer program for the study of natural language communication between man and machine." In: Communications of the ACM 9.1, pp. 36–45 (cit. on p. 7).

Wen, Tsung-Hsien, Milica Gasic, Nikola Mrksic, Pei-Hao Su, David Vandyke, and Steve Young (2015). "Semantically conditioned LSTM-based natural language generation for spoken dialogue systems." In: arXiv preprint arXiv:1508.01745 (cit. on p. 14).

Wen, Tsung-Hsien, David Vandyke, Nikola Mrksic, Milica Gasic, Lina M. Rojas-Barahona, Pei-Hao Su, Stefan Ultes, and Steve Young (2016). "A network-based end-to-end trainable task-oriented dialogue system." In: arXiv preprint arXiv:1604.04562 (cit. on pp. 11, 16, 17, 22).


Werner, Alexander, Dietrich Trautmann, Dongheui Lee, and Roberto Lampariello (2015). "Generalization of optimal motion trajectories for bipedal walking." In: pp. 1571–1577 (cit. on p. 41).

Williams, Jason D., Kavosh Asadi, and Geoffrey Zweig (2017). "Hybrid code networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning." In: arXiv preprint arXiv:1702.03274 (cit. on p. 17).

Williams, Jason, Antoine Raux, and Matthew Henderson (2016). "The dialog state tracking challenge series: A review." In: Dialogue & Discourse 7.3, pp. 4–33 (cit. on p. 12).

Wu, Chien-Sheng, Richard Socher, and Caiming Xiong (2019). "Global-to-local Memory Pointer Networks for Task-Oriented Dialogue." In: arXiv preprint arXiv:1901.04713 (cit. on pp. 15–17).

Wu, Yonghui, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, et al. (2016). "Google's neural machine translation system: Bridging the gap between human and machine translation." In: arXiv preprint arXiv:1609.08144 (cit. on p. 45).

Xie, Pengtao and Eric P. Xing (2013). "Integrating Document Clustering and Topic Modeling." In: Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence (cit. on p. 74).

Yan, Zhao, Nan Duan, Peng Chen, Ming Zhou, Jianshe Zhou, and Zhoujun Li (2017). "Building task-oriented dialogue systems for online shopping." In: Thirty-First AAAI Conference on Artificial Intelligence (cit. on p. 14).

Yogatama, Dani, Michael Heilman, Brendan O'Connor, Chris Dyer, Bryan R. Routledge, and Noah A. Smith (2011). "Predicting a scientific community's response to an article." In: Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, pp. 594–604 (cit. on p. 65).

Young, Steve, Milica Gašić, Blaise Thomson, and Jason D. Williams (2013). "POMDP-based statistical spoken dialog systems: A review." In: Proceedings of the IEEE 101.5, pp. 1160–1179 (cit. on p. 17).

Zaino, Jennifer (2016). 2016 Trends for Semantic Web and Semantic Technologies. http://www.dataversity.net/2017-predictions-semantic-web-semantic-technologies (cit. on p. 78).

Zhou, Hao, Minlie Huang, et al. (2016). "Context-aware natural language generation for spoken dialogue systems." In: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pp. 2032–2041 (cit. on pp. 14, 15).

Zhu, Yukun, Ryan Kiros, Rich Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, and Sanja Fidler (2015). "Aligning books and movies: Towards story-like visual explanations by watching movies and reading books." In: Proceedings of the IEEE International Conference on Computer Vision, pp. 19–27 (cit. on p. 44).



contents
4.9 Summary 58
4.10 Future Work 59
ii clustering-based trend analyzer 61
5 foundations 63
5.1 Motivation and Challenges 63
5.2 Detection of Emerging Research Trends 64
5.2.1 Methods for Topic Modeling 65
5.2.2 Topic Evolution of Scientific Publications 67
5.3 Summary 68
6 benchmark for scientific trend detection 71
6.1 Motivation 71
6.1.1 Outline and Contributions 72
6.2 Corpus Creation 73
6.2.1 Underlying Data 73
6.2.2 Stratification 73
6.2.3 Document Representations & Clustering 74
6.2.4 Trend & Downtrend Estimation 75
6.2.5 (Down)trend Candidates Validation 75
6.3 Benchmark Annotation 77
6.3.1 Inter-annotator Agreement 77
6.3.2 Gold Trends 78
6.4 Evaluation Measure for Trend Detection 79
6.5 Trend Detection Baselines 80
6.5.1 Configurations 81
6.5.2 Experimental Setup and Results 81
6.6 Benchmark Description 82
6.7 Summary 83
6.8 Future Work 83
7 discussion and future work 85
bibliography 87


LIST OF FIGURES

Figure 1 Taxonomy of dialogue systems 10
Figure 2 Example of a conventional dialogue architecture 12
Figure 3 OMB+ online learning platform 24
Figure 4 Rule-Based System: Dialogue flow 36
Figure 5 Macro F1 evaluation, NAP – default model. Comparison between German and multilingual BERT 52
Figure 6 Macro F1 test, NAP – default model. Comparison between German and multilingual BERT 53
Figure 7 Macro F1 evaluation, NAP – extended model 54
Figure 8 Macro F1 test, NAP – extended model 55
Figure 9 Macro F1 evaluation, NER – Comparison between German and multilingual BERT 56
Figure 10 Macro F1 test, NER – Comparison between German and multilingual BERT 57
Figure 11 (Down)trend detection: Overall distribution of papers in the entire dataset 73
Figure 12 A trend candidate (positive slope). Topic: Sentiment and Emotion Analysis (using Social Media Channels) 76
Figure 13 A downtrend candidate (negative slope). Topic: XML 76


LIST OF TABLES

Table 1 Natural language representation of a sentence 13
Table 2 Rule 1: Admissible configurations 27
Table 3 Rule 2: Admissible configurations 27
Table 4 Rule 3: Admissible configurations 27
Table 5 Rule 4: Admissible configurations 28
Table 6 Rule 5: Admissible configurations 28
Table 7 Case 1: ID completeness 31
Table 8 Case 2: ID completeness 31
Table 9 Showcase 1: Short flow 33
Table 10 Showcase 2: Organisational question 33
Table 11 Showcase 3: Fallback and human-request policies 34
Table 12 Showcase 4: Contextual question 34
Table 13 Showcase 5: Long flow 35
Table 14 General statistics for the conversational dataset 46
Table 15 Detailed statistics on possible system actions in the conversational dataset 48
Table 16 Detailed statistics on possible named entities in the conversational dataset 48
Table 17 F1-results for the Named Entity Recognition task 51
Table 18 F1-results for the Next Action Prediction task 51
Table 19 Average dialogue accuracy computed for the Next Action Prediction task 51
Table 20 Trend detection: List of gold trends 79
Table 21 Trend detection: Example of MAP evaluation 80
Table 22 Trend detection: Ablation results 82


ACRONYMS

AI Artificial Intelligence

API Application Programming Interface

BERT Bidirectional Encoder Representations from Transformers

BiLSTM Bidirectional Long Short Term Memory

CRF Conditional Random Fields

DM Dialogue Manager

DST Dialogue State Tracker

DSTC Dialogue State Tracking Challenge

EC Equivalence Classes

ES ElasticSearch

GM Gaussian Models

IAA Inter Annotator Agreement

ID Informational Dictionary

IPA Intelligent Process Automation

ITS Intelligent Tutoring System

LDA Latent Dirichlet Allocation

LSA Latent Semantic Analysis

LSTM Long Short Term Memory

MAP Mean Average Precision

MER Mutually Exclusive Rules

NAP Next Action Prediction

NER Named Entity Recognition

NLG Natural Language Generation

NLP Natural Language Processing

NLU Natural Language Understanding

OMB+ Online Mathematik Brückenkurs

PL Policy Learning

pLSA probabilistic Latent Semantic Analysis

REST Representational State Transfer

RNN Recurrent Neural Network

RPA Robotic Process Automation

RegEx Regular Expressions

TL Transfer Learning

ToDS Task-oriented Dialogue System


1 INTRODUCTION

Robotic Process Automation (RPA) is a type of software bot that imitates manual human activities, like entering data into a system, logging into accounts, or executing simple but repetitive workflows (Mohanty and Vyas, 2018). In other words, RPA-systems allow business applications to work without human involvement by replicating human worker activities. A further benefit of RPA-bots is that they are much cheaper compared to the employment of a human worker, and they are also able to work around the clock. Overall, the advantages of software bots are hugely appealing and include cost savings, accuracy and compliance while executing processes, improved responsiveness (i.e., bots are faster than human workers), and finally, such bots are usually agile and multi-skilled (Ivancic, Vugec, and Vukšic, 2019).

However, one of the main benefits and, at the same time, drawbacks of RPA-systems is their ability to precisely fulfill the assigned task. On the one hand, the bot can carry out the task accurately and diligently. On the other hand, it is highly susceptible to changes in defined scenarios. Being designed for a particular task, the RPA-bot is often not adaptable to other domains or even simple changes in a workflow (Mohanty and Vyas, 2018). This inability to adapt to changing conditions gave rise to a further area of improvement for RPA-bots – Intelligent Process Automation (IPA) systems.

IPA-bots combine Robotic Process Automation (RPA) with Artificial Intelligence (AI) and thus can perform complex and more cognitively demanding tasks that require reasoning and language understanding. Hence, IPA-bots go beyond automating simple "click tasks" (as in RPA) and can accomplish jobs more intelligently – employing machine learning algorithms and advanced analytics. Such IPA-systems undertake time-consuming and routine tasks and thus enable smart workflows and free up skilled workers to accomplish higher-value activities.

Previously, IPA techniques were primarily employed within the computer vision domain, starting with the automation of simple display click-tasks and going towards sophisticated self-driving cars with their ability to interpret a stream of situational data obtained from sensors and cameras. However, in recent times, Natural Language Processing (NLP) became one of the potential applications for IPA as well, due to its ability to understand and interpret human language. NLP methods are used to analyze large amounts of textual data and to respond to various inquiries. Furthermore, if the available data is unstructured and does not have any predefined format (i.e., customer emails), or if it is available in a variable format (i.e., invoices, legal documents), then Natural Language Processing (NLP) techniques are applied to extract the relevant information. This data can then be utilized to solve distinct problems (i.e., natural language understanding, predictive analytics) (Mohanty and Vyas, 2018). For instance, in the case of predictive analytics, such a bot would make forecasts about the future using current and historical data and therethrough assist with sales or investments.

However, NLP in the context of Intelligent Process Automation (IPA) is not limited to the extraction of relevant data and insights from textual documents. One of the central applications of NLP within the IPA domain is the use of conversational interfaces (e.g., chatbots) that enable human-to-machine interaction. This type of software is commonly used for customer support or within recommendation and booking systems. Conversational agents gain enormous demand due to their ability to simultaneously support a large number of users while communicating in a natural language. In a conventional chatbot system, a user provides input in a natural language; the bot then processes this query to extract potentially useful information and responds with a relevant reply. This process comprises multiple stages and involves different types of NLP subtasks, starting with Natural Language Understanding (NLU) (e.g., intent recognition, named entity extraction) and going towards dialogue management (i.e., determining the next possible bot action considering the dialogue history) and response generation (e.g., converting the semantic representation of the next bot action to a natural language utterance). A typical dialogue system for Intelligent Process Automation (IPA) purposes usually undertakes shallow customer support requests (e.g., answering of FAQs), allowing human workers to focus on more sophisticated inquiries.
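
To make this pipeline concrete, the following schematic sketch walks through the three stages of such a conventional chatbot loop. The components are stubs with made-up intents and templates for illustration only; they do not correspond to an actual system.

    # Schematic only: the three NLP stages of a conventional task-oriented chatbot.
    def understand(utterance):
        """NLU: intent recognition and named entity extraction (stubbed)."""
        return {"intent": "faq_support_hours", "entities": {}}

    def next_action(nlu_result, dialogue_history):
        """Dialogue management: choose the next system action given the history."""
        return "answer_faq" if nlu_result["intent"].startswith("faq") else "ask_clarification"

    def generate(action):
        """Response generation: map the chosen action to a natural language reply."""
        templates = {
            "answer_faq": "Our support team is available around the clock.",
            "ask_clarification": "Could you please rephrase your question?",
        }
        return templates[action]

    history = []
    reply = generate(next_action(understand("When can I reach support?"), history))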

Natural Language Processing (NLP) techniques may also be used indirectly for IPA purposes. There are attempts to automatically identify and classify tasks extracted from textual process descriptions as manual, user, or automated. The goal of such an NLP application is to reduce the effort required to identify suitable candidates for further robotic process automation (Friedrich, Mendling, and Puhlmann, 2011; Leopold, Aa, and Reijers, 2018).

Intelligent Process Automation (IPA) is an emerging technology and evolves mainly due to the recognized potential benefit of combining Robotic Process Automation (RPA) and Artificial Intelligence (AI) techniques to solve industrial problems (Ivancic, Vugec, and Vukšic, 2019; Mohanty and Vyas, 2018). Industries are typically interested in being responsive to customers and markets (Mohanty and Vyas, 2018). Thus, most customer support applications need to be real-time, active, and highly robust to achieve higher client satisfaction rates. The manual accomplishment of such tasks is often time-consuming, expensive, and error-prone. Furthermore, due to the massive amounts of textual data available for analysis, it does not seem feasible to process this data entirely by human means.


1.1 outline

This thesis covers two topics united by the broader theme, which is Intelligent Process Automation (IPA) utilizing statistical Natural Language Processing (NLP) methods.

The first block of the thesis (Chapter 2 – Chapter 4) addresses the challenge of implementing a conversational agent for Intelligent Process Automation (IPA) purposes within the e-learning field. As mentioned above, chatbots are one of the central applications of the IPA domain since they can effectively perform time-consuming tasks requiring natural language understanding, allowing human workers to concentrate on more valuable tasks. The IPA conversational bot that was realized within this thesis takes care of routine and time-consuming tasks performed by human tutors within an online mathematical course. This bot is deployed in a real-world setting and interacts with students within the OMB+ mathematical platform. Conducting experiments, we observed two possibilities to build the conversational agent in industrial settings: first, with purely rule-based methods, considering the missing training data and individual aspects of the target domain (i.e., e-learning). Second, we re-implemented two of the main system components (i.e., the Natural Language Understanding (NLU) and Dialogue Manager (DM) units) using the current state-of-the-art deep learning algorithm (i.e., Bidirectional Encoder Representations from Transformers (BERT)) and investigated their performance and potential use as part of a hybrid model (i.e., containing both rule-based and machine learning methods).

The second block of the thesis (Chapter 5 – Chapter 6) considers an Intelligent Process Automation (IPA) problem within the predictive analytics domain and addresses the task of scientific trend detection. Predictive analytics makes forecasts about future outcomes based on historical and current data. Hence, organizations can benefit from advanced analytics models by detecting trends and their behaviors and then employing this information while making crucial business decisions (i.e., investments). In this work, we deal with a subproblem of the trend detection task – specifically, we address the lack of publicly available benchmarks for the evaluation of trend detection algorithms. Therefore, we assembled a benchmark for the detection of both trends and downtrends (i.e., topics that become less frequent over time). To the best of our knowledge, the task of downtrend detection has not been well addressed before. Furthermore, the resulting benchmark is based on a collection of more than one million documents, which is among the largest that has been used for trend detection before, and therefore offers a realistic setting for developing trend detection algorithms.


All main chapters contain their own introduction, contribution list, related work section, and summary.


Part I

E-LEARNING CONVERSATIONAL ASSISTANT


2 FOUNDATIONS

In this block, we address the challenge of designing and implementing a conversational assistant for Intelligent Process Automation (IPA) purposes within the e-learning domain. The system aims to support human tutors of the Online Mathematik Brückenkurs (OMB+) platform during the tutoring round by undertaking routine and time-consuming responsibilities. Specifically, the system intelligently automates the process of information accumulation and validation that precedes every conversation. Therethrough, the IPA-bot frees up tutors and allows them to concentrate on more complicated and cognitively demanding tasks.

In this chapter, we introduce the fundamental concepts required for the topic of the first part of the thesis. We begin with the foundations of conversational assistants in Section 2.2, where we present the taxonomy of existing dialogue systems in Section 2.2.1, give an overview of the conventional dialogue architecture in Section 2.2.2 with its most essential components, and describe the existing approaches for building a conversational agent in Section 2.2.3. We finally introduce the target domain for our experiments in Section 2.3.

2.1 introduction

Conversational assistants – a type of software that interacts with people via written or spoken natural language – have become widely popular in recent times. Today's conversational assistants support a broad range of everyday skills, including but not limited to creating a reminder, answering factoid questions, and controlling smart home devices.

Initially, the notion of an artificial companion capable of conversing in a natural language was anticipated by Alan Turing in 1949 (Turing, 1949). Still, the first significant step in this direction was made in 1966 with the appearance of Weizenbaum's system ELIZA (Weizenbaum, 1966). Its successor was ALICE (Artificial Linguistic Internet Computer Entity) – a natural language processing dialogue system that conversed with a human by applying heuristic pattern matching rules. Despite being remarkably popular and technologically advanced for their days, both systems were unable to pass the Turing test (Alan, 1950). Following a long silent period, the next wave of growth of intelligent agents was caused by the advances in Natural Language Processing (NLP). This was enabled by access to low-cost memory, higher processing speed, vast amounts of available data, and sophisticated machine learning algorithms. Hence, one of the most notable leaps in the history of conversational agents began in the year 2011 with intelligent personal assistants. At that time, Apple's Siri, followed by Microsoft's Cortana and Amazon's Alexa in 2014, and then Google Assistant in 2016, came to the scene. The main focus of these agents was not to mimic humans, as in ELIZA or ALICE, but to assist a human by answering simple factoid questions and setting notification tasks across a broad range of content. Another type of conversational agent, called Task-oriented Dialogue System (ToDS), became a focus of industries in the year 2016. For the most part, these bots aimed to support customer service (Cui et al., 2017) and respond to Frequently Asked Questions (Liao et al., 2018). Their conversational style provided more depth than that of personal assistants, but a narrower range, since they were patterned for task solving and not for open-domain discussions.

One of the central advantages of conversational agents, and the reason why they are attracting considerable interest, is their ability to give attention to several users simultaneously while supporting natural language communication. Due to this, conversational assistants gain momentum in numerous kinds of practical support applications (e.g., customer support) where a high number of users are involved at the same time. Classical engagement of human assistants is cost-intensive, and such support is usually not accessible full-time. Therefore, automating human assistance is desirable, but also an ambitious task. Due to the lack of solid Natural Language Understanding (NLU), advancing beyond primary tasks is still challenging (Luger and Sellen, 2016), and even the most prosperous dialogue systems often fail to meet users' expectations (Jain et al., 2018). Significant challenges involved in the design and implementation of a conversational agent range from domain engineering to Natural Language Understanding (NLU) and the design of an adaptive, domain-specific conversational flow. The quintessential problems and how-to questions while building conversational assistants include:

• How to deal with variability and flexibility of language?
• How to get high-quality in-domain data?
• How to build meaningful representations?
• How to integrate commonsense and domain knowledge?
• How to build a robust system?

Besides that, current Natural Language Processing (NLP) and Artificial Intelligence (AI) techniques have weaknesses when applied to domain-specific, task-oriented problems, especially if the underlying data comes from an industrial source. As machine learning techniques heavily rely on structured and clean training data to deliver consistent performance, the typically noisy real-world data, without access to additional knowledge databases, negatively impacts the classification accuracy.


Despite this, conversational agents can be successfully employed to solve less cognitive but still highly relevant tasks. In this case, a dialogue agent can be operated as part of an Intelligent Process Automation (IPA) system, where it takes care of straightforward but routine and time-consuming functions performed in a natural language, while allowing a human worker to focus on more cognitively demanding processes.

2.2 conversational assistants

The development and design of a particular dialogue system usually vary according to its objectives, among which the most fundamental are:

• Purpose: Should it be an open-ended dialogue system (e.g., chit-chat), a task-oriented system (e.g., customer support), or a personal assistant (e.g., Google Assistant)?

• Skills: What kind of behavior should the system demonstrate?

• Data: Which data is required to develop (resp. train) specific blocks, and how can it be assembled and curated?

• Deployment: Will the final system be integrated into a particular software platform? What are the challenges in choosing suitable tools? How will the deployment happen?

• Implementation: Should it be a rule-based, information-retrieval, machine-learning, or hybrid system?

In the following subsections, we take a look at the most common taxonomies of dialogue systems and at different design and implementation strategies.

2.2.1 Taxonomy

According to their goals, conversational agents can be classified into two main categories: task-oriented systems (e.g., for customer support) and social bots (e.g., for chit-chat purposes). Furthermore, systems also vary regarding their implementation characteristics. Below we provide a detailed overview of the most common variations.

task-oriented dialogue systems are known for their ability to lead a dialogue with the ultimate goal of solving a user's problem (see Figure 1). A customer support agent or a flight/restaurant booking system exemplifies this class of conversational assistants. Such systems are more meaningful and useful than those with an entirely social goal; however, they usually cause more complications during the implementation stage. Since Task-oriented Dialogue Systems (ToDS) require a precise understanding of the user's intent (i.e., goal), the Natural Language Understanding (NLU) segment must perform flawlessly.


Besides that, if applied in an industrial setting, ToDS are usually built in a modular fashion, which means they mostly rely on handcrafted rules and predefined templates and are restricted in their abilities. Personal assistants (e.g., Google Assistant) are members of the task-oriented family as well. Compared to standard agents, though, they involve more social conversation and can be used for entertainment purposes (e.g., "Ok Google, tell me a joke"). The latter is typically not available in conventional task-oriented systems.

social bots and chit-chat systems, in contrast, regularly do not hold any specific goal and are designed for small-talk and entertainment conversations (see Figure 1). Those agents are usually end-to-end trainable and highly data-driven, which means they require large amounts of training data. However, due to the entirely learnable training, such systems often produce inappropriate responses, making them extremely unreliable for practical applications. Like chit-chat systems, social bots are rich in social conversation; however, they additionally possess the ability to solve uncomplicated user requests and conduct a more meaningful dialogue (Fang et al., 2018). Those requests are not the same as in task-oriented systems and are mostly aimed at answering simple factoid questions.

Figure 1: Taxonomy of dialogue systems according to the measure of their task accomplishment and social contribution. Figure originates from Fang et al. (2018).


The aforementioned system types can be decomposed further according to their implementation characteristics:

open domain & closed domain: A Task-oriented Dialogue System (ToDS) is typically built within a closed-domain setting. The space of potential inputs and outputs is restricted, since the system attempts to achieve a specific goal. Customer support or shopping assistants are cases of closed-domain problems (Wen et al., 2016). These systems do not need to be able to talk about unrelated topics, but they have to fulfill their particular tasks as efficiently as possible.

Chit-chat systems, in contrast, do not assume a well-defined goal or intention and thus can be built within an open domain. This means the conversation can progress in any direction and without any, or with only a minimal, functional purpose (Serban et al., 2016). However, the infinite number of possible topics and the fact that a certain amount of commonsense is required to create reasonable responses make the open-domain setting a hard problem.

content-driven & user-centric: Social bots and chit-chat systems typically employ a content-driven dialogue manner. That means that their responses are based on large and dynamic content derived from daily web mining and knowledge graphs. This approach builds a dialogue based on trending content from diverse sources, and thus it is an ideal way for social bots to catch users' attention and draw a user into the dialogue.

User-centric approaches, in turn, assume precise language understanding to detect the sentiment of users' statements. According to the sentiment, the dialogue manager learns users' personalities, handles rapid topic changes, and tracks engagement. In this case, the dialogue builds on specific user interests.

long & short conversations: Models with a short conversation style attempt to create a single response to a single input (e.g., a weather forecast). In the case of a long conversation, a system has to go through multiple turns and keep track of what has been said before (i.e., the dialogue history). The longer the conversation, the more challenging it is to train a system. Customer support conversations are typically long conversational threads with multiple questions.

response generation: One of the essential elements of a dialogue system is response generation. This can be done either in a retrieval-based manner or by using generative models (i.e., Natural Language Generation (NLG)).

The former usually utilizes predefined responses and heuristics to select a relevant response based on the input and context. The heuristic can be a rule-based expression match (i.e., Regular Expressions (RegEx)) or an ensemble of machine learning classifiers (e.g., entity extraction, prediction of the next action). Such systems do not generate any original text; instead, they select a response from a fixed set of predefined candidates. These models are usually used in an industrial setting, where the system must show high robustness, performance, and interpretability of the outcomes.

The latter, in contrast, does not rely on predefined replies. Since such models are typically based on (deep) machine learning techniques, they attempt to generate new responses from scratch. This is, however, a complicated task and is mostly done for vanilla chit-chat systems or in a research setting where structured training data is available.
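To make the retrieval-based manner described above more concrete, the following is a minimal, purely illustrative sketch (not a system described in this thesis) that selects a predefined response by TF-IDF cosine similarity; the candidate responses, the fallback message, and the threshold are invented for the example and assume scikit-learn is available.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Illustrative set of predefined candidate responses (hypothetical).
candidates = [
    "You can reset your password in the account settings.",
    "Our support team is available from 9 am to 5 pm.",
    "Please provide your order number so I can check the status.",
]

vectorizer = TfidfVectorizer()
candidate_matrix = vectorizer.fit_transform(candidates)

def select_response(user_utterance: str, threshold: float = 0.2) -> str:
    """Pick the most similar predefined response; fall back if nothing matches."""
    query_vec = vectorizer.transform([user_utterance])
    scores = cosine_similarity(query_vec, candidate_matrix)[0]
    best = scores.argmax()
    if scores[best] < threshold:
        return "Sorry, could you rephrase your question?"
    return candidates[best]

print(select_response("I forgot my password, what should I do?"))

Because no text is generated, the output is always one of the fixed candidates, which is exactly what makes this family of models robust and interpretable in industrial settings.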

2.2.2 Conventional Architecture

In this and the following subsection, we review the pipeline and methods for Task-oriented Dialogue Systems (ToDS). Such agents are typically composed of several components (Figure 2) addressing the diverse tasks required to facilitate a natural language dialogue (Williams, Raux, and Henderson, 2016).

Figure 2: Example of a conventional dialogue architecture with its main components in the bounding box: Language Understanding (NLU), Dialogue Manager (DM), Response Generation (NLG). Figure originates from the NAACL 2018 tutorial by Su et al. (2018).


A conventional dialogue architecture implements this with the following components:

natural language understanding unit has the following objectives: it parses the user input, classifies the domain (e.g., customer support, restaurant booking; not required for single-domain systems) and the intent (e.g., find-flight, book-taxi), and extracts entities (e.g., Italian food, Berlin).

Generally, the NLU unit maps every user utterance into semantic frames that are predefined according to different scenarios. Table 1 illustrates an example of a sentence representation, where the departure location (Berlin) and the arrival location (Munich) are specified as slot values, and the domain (airline travel) and intent (find-flight) are specified as well. Typically, there are two kinds of possible representations. The first one is the utterance category, represented in the form of the user's intent. The second is word-level information extraction in the form of Named Entity Recognition (NER) or Slot Filling tasks.

Sentence         Find   flight   from   Berlin   to   Munich   today
Slots/Concepts   O      O        O      B-dept   O    B-arr    B-date
Named Entity     O      O        O      B-city   O    B-city   O
Intent           Find_Flight
Domain           Airline Travel

Table 1: An illustrative example of a sentence representation.

A dialogue act classification sub-module, also known as intent classification, is dedicated to detecting the primary user goal at every dialogue state. It labels each user utterance with one of the predefined intents. Deep neural networks can be directly applied to this conventional classification problem (Khanpour, Guntakandla, and Nielsen, 2016; Lee and Dernoncourt, 2016).

Slot filling is another sub-module for language understanding. Unlike intent detection (i.e., a classification problem), slot filling is defined as a sequence labeling problem, where words in the sequence are labeled with semantic tags. The input is a sequence of words, and the output is a sequence of slots, one for every word. Variations of Recurrent Neural Network (RNN) architectures (LSTM, BiLSTM) (Deoras et al., 2015) and (attention-based) encoder-decoder architectures (Kurata et al., 2016) have been employed for this task.
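As an illustration only (not the implementation used in this thesis), the following sketch shows how slot filling can be cast as token-level sequence labeling with a BiLSTM in PyTorch; the vocabulary size, slot inventory, and token ids are made up for the example.

import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    """Minimal BiLSTM slot tagger: predicts one slot label per input token."""
    def __init__(self, vocab_size, num_slots, emb_dim=64, hidden_dim=64):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_slots)

    def forward(self, token_ids):             # (batch, seq_len)
        embedded = self.embedding(token_ids)  # (batch, seq_len, emb_dim)
        hidden, _ = self.lstm(embedded)       # (batch, seq_len, 2 * hidden_dim)
        return self.classifier(hidden)        # (batch, seq_len, num_slots)

# Toy example: a seven-token utterance such as the one in Table 1, with invented ids.
model = BiLSTMTagger(vocab_size=1000, num_slots=5)
tokens = torch.tensor([[12, 47, 3, 101, 7, 102, 55]])
logits = model(tokens)
predicted_slots = logits.argmax(dim=-1)       # one slot id per token (e.g., O, B-dept, ...)

In practice such a tagger would be trained with a token-level cross-entropy loss against BIO-style labels like those shown in Table 1.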


dialogue manager unit is subdivided into two components: the first is a Dialogue State Tracker (DST) that holds the conversational state, and the second is a Policy Learning (PL) unit that defines the system's next action.

The Dialogue State Tracker (DST) is the core component for guaranteeing a robust dialogue flow, since it manages the user's intent at every turn. The conventional methods, which are broadly applied in commercial implementations, often adopt handcrafted rules to pick the most probable candidate among the predefined ones (Goddeau et al., 1996; Wang and Lemon, 2013). However, a variety of statistical approaches have emerged since the Dialogue State Tracking Challenge (DSTC, now the Dialog System Technology Challenge) was organized by Jason D. Williams in 2013. Among the deep-learning-based approaches is the Belief Tracker proposed by Henderson, Thomson, and Young (2013). It employs a sliding window to produce a sequence of probability distributions over an arbitrary number of possible values (Chen et al., 2017). In the work of Mrkšić et al. (2016), the authors introduced a Neural Belief Tracker to identify slot-value pairs. The system used as input the user intents preceding the user input, the user utterance itself, and a candidate slot-value pair which it needs to decide about. Then the system iterated over all candidate slot-value pairs to determine which ones have just been expressed by the user (Chen et al., 2017). Finally, Vodolán, Kadlec, and Kleindienst proposed a Hybrid Dialogue State Tracker that achieved state-of-the-art performance on the DSTC2 shared task. Their model used a separately learned decoder coupled with a rule-based system.

Based on the state representation from the Dialogue State Tracker (DST), the PL unit generates the next system action, which is then decoded by the Natural Language Generation (NLG) module. Usually, a rule-based agent is used to initialize the system (Yan et al., 2017), and then supervised learning is applied to the actions generated by these rules. Alternatively, the dialogue policy can be trained end-to-end with reinforcement learning, optimizing the system's decision making toward the final performance (Cuayáhuitl, Keizer, and Lemon, 2015).

natural language generation unit transforms the next action into a natural language response. Conventional approaches to Natural Language Generation (NLG) typically adopt a template-based style, where a set of rules is defined to map every next action to a predefined natural language response. Such approaches are simple, generally error-free, and easy to control; however, their implementation is time-consuming, and the style is repetitive and hardly scalable.

Among the deep learning techniques is the work by Wen et al. (2015), which introduced neural network-based approaches to NLG with an LSTM-based structure. Later, Zhou and Huang (2016) adopted this model along with the question information, semantic slot values, and dialogue act types to generate more precise answers. Finally, Wu, Socher, and Xiong (2019) proposed Global-to-Local Memory Pointer Networks (GLMP), an architecture that incorporates knowledge bases directly into a learning framework (due to the Memory Networks component) and is able to re-use information from the user input (due to the Pointer Networks component).

In particular cases, the dialogue flow can include speech recognition and speech synthesis units (if managing a spoken natural language system) and connections to third-party APIs to request information stored in external databases (e.g., a weather forecast), as depicted in Figure 2.

2.2.3 Existing Approaches

As stated above, a conventional task-oriented dialogue system is implemented in a pipeline manner. That means that once the system receives a user query, it interprets it and acts according to the dialogue state and the corresponding policy. Additionally, based on this understanding, it may access the knowledge base to find the demanded information there. Finally, the system transforms its next action (and, if given, the extracted information) into its surface form as a natural language response (Monostori et al., 1996).

The individual components (i.e., Natural Language Understanding (NLU), Dialogue Manager (DM), Natural Language Generation (NLG)) of a particular conversational system can be implemented using different approaches, starting with entirely rule- and template-based methods and going towards hybrid approaches (using learnable components along with handcrafted units) and end-to-end trainable machine learning methods.

rule-based approaches: Though many of the latest research approaches handle the Natural Language Understanding (NLU) and Natural Language Generation (NLG) units by using statistical Natural Language Processing (NLP) models (Bocklisch et al., 2017; Burtsev et al., 2018; Honnibal and Montani, 2017), most of the industrially deployed dialogue systems still utilize handcrafted features and rules for the state tracking, action prediction, intent recognition, and slot filling tasks (Chen et al., 2017; Ultes et al., 2017). Thus, most of the modules of the PyDial framework (http://www.camdial.org/pydial) offer rule-based implementations using Regular Expressions (RegEx), rule-based dialogue trackers (Henderson, Thomson, and Williams, 2014), and handcrafted policies. Also, in order to map the next action to text, Ultes et al. (2017) suggest rules along with template-based response generation.


The rule-based approach ensures robustness and stable performance, which is crucial for industrial systems that communicate with a large number of users simultaneously. However, it is highly expensive and time-consuming to deploy a real dialogue system built in that manner. The major disadvantage is that the usage of handcrafted systems is restricted to a specific domain, and possible adaptation requires extensive manual engineering.

end-to-end learning approaches: Due to the recent advance of end-to-end neural generative models (Collobert et al., 2011), many efforts have been made to build an end-to-end trainable architecture for conversational agents. Rather than using the traditional pipeline (see Figure 2), the end-to-end model is conceived as a single module (Chen et al., 2017).

Wen et al. (2016), followed by Bordes, Boureau, and Weston (2016), were among the pioneers who adopted an end-to-end trainable approach for conversational systems. Their approach was to treat dialogue learning as the problem of learning a mapping from dialogue histories to system responses and to apply an encoder-decoder model (Sutskever, Vinyals, and Le, 2014) to train the entire system. The proposed model still had significant limitations (Glasmachers, 2017), such as missing policies to learn a dialogue flow and the inability to incorporate additional knowledge, which is especially crucial for training a task-oriented agent. To overcome the first problem, Dhingra et al. (2016) introduced an end-to-end reinforcement learning approach that jointly trains the dialogue state tracker and policy learning units. The second weakness was addressed by Eric and Manning (2017), who augmented existing recurrent network architectures with a differentiable attention-based key-value retrieval mechanism over the entries of a knowledge base. This approach was inspired by Key-Value Memory Networks (Miller et al., 2016). Later on, Wu, Socher, and Xiong (2019) proposed Global-to-Local Memory Pointer Networks (GLMP), where the authors incorporated knowledge bases directly into a learning framework. In their model, a global memory encoder and a local memory decoder are employed to share external knowledge. The encoder encodes the dialogue history, modifies the global contextual representation, and generates a global memory pointer. The decoder first produces a sketch response with empty slots. Next, it uses the global memory pointer to filter the external knowledge for relevant information and then instantiates the slots via the local memory pointers.

Despite having better adaptability compared to any rule-based system and being simpler to train, end-to-end approaches remain unattainable for commercial conversational agents that operate on real-world data. A well and carefully constructed task-oriented dialogue system in an observed domain, using handcrafted rules (in the NLU and DST units) and predefined responses for NLG, still outperforms end-to-end systems due to its robustness (Glasmachers, 2017; Wu, Socher, and Xiong, 2019).

hybrid approaches: Though end-to-end learning is an attractive solution for conversational systems, current techniques are data-intensive and require large amounts of dialogues to learn simple behaviors. To overcome this barrier, Williams, Asadi, and Zweig (2017) introduce Hybrid Code Networks (HCNs), an ensemble of retrieval and trainable units. The system utilizes a Recurrent Neural Network (RNN) for the Natural Language Understanding (NLU) block, domain-specific knowledge that is encoded as software (i.e., API calls), and custom action templates used for the Natural Language Generation (NLG) unit. Compared to existing end-to-end methods, the authors report that their approach considerably reduces the amount of data required for training (Williams, Asadi, and Zweig, 2017). According to the authors, HCNs achieve state-of-the-art performance on the bAbI dialog dataset (Bordes, Boureau, and Weston, 2016) and outperform two commercial customer dialogue systems4.

Along with this work, Wen et al. (2016) propose a neural network-based model that is end-to-end trainable but still modularly connected. In their work, the authors treat dialogue as a sequence-to-sequence mapping problem (Sutskever, Vinyals, and Le, 2014), augmented with the dialogue history and the current database search outcome (Wen et al., 2016). The authors explain the learning process as follows: At every turn, the system takes a user input represented as a sequence of tokens and converts it into two internal representations: a distributed representation of the intent and a probability distribution over slot-value pairs called the belief state (Young et al., 2013). The database operator then picks the most probable values in the belief state to form the search result. At the same time, the intent representation and the belief state are transformed and combined by a Policy Learning (PL) unit to form a single vector representing the next system action. This system action is then used to generate a sketch response, token by token. The final system response is composed by substituting the actual values of the database entries into the sketch sentence structure (Wen et al., 2016).

Hybrid models appear likely to replace the established rule- and template-based approaches currently utilized in industrial settings. The performance of most Natural Language Understanding (NLU) components (i.e., Named Entity Recognition (NER), domain and intent classification) is high enough to apply them for the development of commercial conversational agents, whereas the Natural Language Generation (NLG) unit can still be realized as a template-based solution by employing predefined response candidates and predicting the next system action with deep learning methods.

4 Internal Microsoft Research datasets in customer support and troubleshooting domains were employed for training and evaluation.


2.3 target domain & task definition

MUMIE (https://www.mumie.net) is an open-source e-learning platform for studying and teaching mathematics and computer science. Online Mathematik Brückenkurs (OMB+) is one of the projects powered by MUMIE. Specifically, Online Mathematik Brückenkurs (OMB+, https://www.ombplus.de) is a German online learning platform that assists students who are preparing for an engineering or computer science study at a university.

The course's central purpose is to assist students in refreshing their mathematical skills so that they can follow the upcoming university courses. This online course is thematically segmented into 13 parts and includes free mathematical classes with theoretical and practical content. Every chapter begins with an overview of its sections and consists of explanatory texts with built-in videos, examples, and quick checks of understanding. Furthermore, various examination modes to practice the content (i.e., training, exercises) and to check the understanding of the theory (i.e., quiz) are available. Besides that, OMB+ provides the possibility to get assistance from a human tutor. Usually, the students and tutors interact via a chat interface in written form. The language of communication is German.

The current dilemma of the OMB+ platform is that the number of students grows significantly every year, but it is challenging to find qualified tutors, and it is expensive to hire more human tutors. This results in longer waiting periods for students until their problems can be considered and processed. In general, all student questions can be grouped into three main categories: organizational questions (e.g., course, certificate), contextual questions (e.g., content, theorem), and mathematical questions (e.g., exercises, solutions). To assist a student with a mathematical question, a tutor has to know the following standard information: What kind of topic (or subtopic) does the student have a problem with? At which examination mode (quiz, chapter-level training or exercise, section-level training or exercise, or final examination) is the student working right now? And, finally, the exact question number and exact problem formulation. This means that a tutor has to ask the same onboarding questions every time a new conversation starts. This is, however, very time-consuming and could be successfully solved within an Intelligent Process Automation (IPA) system.

The work presented in Chapter 3 and Chapter 4 was enabled by the collaboration with the MUMIE and Online Mathematik Brückenkurs (OMB+) team and addresses the abovementioned problem. Within Chapter 3, we implemented an Intelligent Process Automation (IPA) dialogue system with rule- and retrieval-based algorithms that interacts with students and performs the onboarding. This system is multipurpose: on the one hand, it eases the assistance process for human tutors by reducing repetitive and time-consuming activities and therefore allows workers to focus on more challenging tasks. On the other hand, by interacting with students, it augments the resources with structured and labeled training data. The latter, in turn, allowed us to re-train specific system components utilizing machine and deep learning methods. We describe this procedure in Chapter 4.

We report that the earlier collected human-to-human data from the OMB+ platform could not be used for training a machine learning system, since it is highly unstructured and contains no direct question-answer matching, which is crucial for trainable conversational algorithms. We analyzed these conversations to obtain insights, which we then used to define the dialogue flow.

We also do not consider the automation of mathematical questions in this work, since this task would require not only precise language understanding and extensive commonsense knowledge, but also the accurate perception of mathematical notation. The tutoring process at OMB+ differs considerably from standard tutoring methods. Here, instructors attempt to guide students by giving hints and tips instead of providing an out-of-the-box solution. Existing human-to-human dialogues revealed that such a task can be challenging even for human tutors, since it is not always directly perceptible what precisely the student does not understand or has difficulties with. Furthermore, dialogues containing mathematical explanations can last for a long time, and the topic (i.e., intent and final goal) can shift multiple times during the conversation.

To our knowledge, this task would not be feasible to solve sufficiently with current machine learning algorithms, especially considering the missing training data. A rule-based system would not be the right solution for this purpose either, due to the large number of complex rules that would need to be defined. Thus, in this work, we focus on the automation of less cognitively demanding tasks, which are still relevant and must be solved in the first instance to ease the tutoring process for instructors.

2.4 summary

Conversational agents are a solution in high demand in industry due to their potential ability to support a large number of users in a multi-threaded fashion while performing a flawless natural language conversation. Many messenger platforms have been created, and custom chatbots in various constellations emerge daily. Unfortunately, most of the current conversational agents are often disappointing to use due to the lack of substantial natural language understanding and reasoning. Furthermore, Natural Language Processing (NLP) and Artificial Intelligence (AI) techniques have limitations when applied to domain-specific and task-oriented problems, especially if no structured training data is available.

Despite that, conversational agents can be successfully applied to solve less cognitive but still highly relevant tasks, namely within the Intelligent Process Automation (IPA) domain. Such IPA-bots would undertake tedious and time-consuming responsibilities to free up the knowledge worker for more complex and intricate tasks. IPA-bots can be seen as members of the goal-oriented dialogue family and are, for the most part, implemented either in a rule-based manner or through hybrid models when it comes to a real-world setting.


3 MULTIPURPOSE IPA VIA CONVERSATIONAL ASSISTANT

This chapter partially covers work already published at international peer-reviewed conferences. The relevant publication is Moiseeva et al. (2020). The research described in this chapter was carried out in its entirety by the author of this thesis. The other authors of the publication acted as advisers. At the time of writing this thesis, the system described in this work was deployed on the Online Mathematik Brückenkurs (OMB+) platform.

As we outlined in the introduction, Intelligent Process Automation (IPA) is an emerging technology with the primary goal of assisting the knowledge worker by taking care of repetitive, routine, and low-cognitive tasks. Conversational assistants that interact with users in a natural language are a potential application for the IPA domain. Such smart virtual agents can support the user by responding to various inquiries and performing routine tasks carried out in a natural language (e.g., customer support).

In this work, we tackle the challenge of implementing an IPA conversational agent in a real-world industrial context and under conditions of missing training data. Our system has two meaningful benefits. First, it reduces monotonous and time-consuming tasks and hence lets workers concentrate on more intellectual processes. Second, by interacting with users, it augments the resources with structured and labeled training data. The latter, in turn, can be used to retrain the system utilizing machine and deep learning methods.

3.1 introduction

Conversational dialogue systems can give attention to several users simultaneously while supporting natural communication. They are thus exceptionally needed for practical assistance applications where a high number of users are involved at the same time. Regularly, dialogue agents require a lot of natural language understanding and commonsense knowledge to hold a conversation reasonably and intelligently. Therefore, it is still challenging nowadays to completely substitute human assistance with an intelligent agent. However, a dialogue system could be successfully employed as part of an Intelligent Process Automation (IPA) application. In this case, it would take care of simple but routine and time-consuming tasks performed in a natural language, while allowing a human worker to focus on more cognitively demanding processes.

Recent research in the dialogue generation domain is conducted by employing Artificial Intelligence (AI) techniques like machine and deep learning (Lowe et al., 2017; Wen et al., 2016). However, those methods require high-quality data for training, which is often not directly available for domain-specific and industrial problems. Even though companies gather and store vast volumes of data, it is often of low quality with regard to machine learning approaches (Daniel, 2015), especially if it concerns dialogue data, which has to be properly structured as well as annotated. Moreover, if trained on artificially created datasets (which is often the case in a research setting), systems can solve only specific problems and are hardly adaptable to other domains. Hence, despite the popularity of deep learning end-to-end models, one still needs to rely on traditional pipelines in practical dialogue engineering, mainly while introducing a new domain.

3.1.1 Outline and Contributions

This work addresses the challenge of implementing a dialogue system for IPA purposes within the practical e-learning domain under conditions of missing training data. The chapter is structured as follows: In Section 3.2, we introduce our method and describe the design of the rule-based dialogue architecture with its main components, such as the Natural Language Understanding (NLU), Dialogue Manager (DM), and Natural Language Generation (NLG) units. In Section 3.3, we present the results of our dual-focused evaluation, and Section 3.4 describes the format of the collected structured data. Finally, we conclude in Section 3.5.

Our contributions within this work are as follows:

• We implemented a robust conversational system for IPA purposes within a practical e-learning domain and under conditions of missing training (i.e., dialogue) data.

• The system has two main objectives:

– First, it reduces repetitive and time-consuming activities and allows workers of the e-learning platform to focus solely on mathematical inquiries.

– Second, by interacting with users, it augments the resources with structured and partially labeled training data for further implementation of trainable dialogue components.


3.2 model

The central purpose of the introduced system is to interact with students at the beginning of every chat conversation and to gather information on the topic (and sub-topic), examination mode and level, question number, and exact problem formulation. Such an onboarding process saves time for human tutors and allows them to handle mathematical questions exclusively. Furthermore, the system is realized in such a way that it accumulates labeled and structured dialogues in the background.

Figure 4 displays the complete conversation flow. After the system accepts a user input, it analyzes it at different levels and extracts information if such is provided. If some of the information required for tutoring is missing, the system asks the student to provide it. When all the information points are collected, they are automatically validated and forwarded to a human tutor. The latter can then directly proceed with the assistance process. In the following, we explain the key components of the system.

3.2.1 OMB+ Design

Figure 3 illustrates the internal structure and design of the OMB+ platform. It has topics and sub-topics, as well as four different examination modes. Each topic (Figure 3, tag 1) corresponds to a chapter level and always has sub-topics (Figure 3, tag 2) that correspond to a section level. The examination modes training and exercise are ambiguous, because they correspond to either a chapter (Figure 3, tag 3) or a section (Figure 3, tag 5) level, and it is important to differentiate between them, since they contain different types of content. The mode final examination (Figure 3, tag 4) always corresponds to a chapter level, whereas a quiz (Figure 3, tag 5) can belong only to a section level. According to the design of the OMB+ platform, there are several ways in which a possible dialogue flow can proceed.

3.2.2 Preprocessing

In a natural language conversation, one may respond in various forms, making the extraction of data from user-generated text challenging due to potential misspellings or confusable spellings (e.g., Aufgabe 1a, Aufgabe 1 (a)). Hence, to facilitate a substantial recognition of intents and extraction of entities, we normalize (e.g., correct misspellings, detect synonyms) and preprocess every user input before moving forward to the Natural Language Understanding (NLU) module.


Figure 3: OMB+ online learning platform, where 1 is the Topic (corresponds to a chapter level), 2 is a Sub-Topic (corresponds to a section level), 3 is the chapter-level examination mode, 4 is the Final Examination mode (available only at chapter level), 5 are the examination modes Exercise, Training (available at section level) and Quiz (available only at section level), and 6 is the OMB+ chat.

We perform both linguistic and domain-specific preprocessing, which includes the following steps (a small illustrative sketch of such a pipeline is given after the list):

• lowercasing and stemming of the words in the input query;

• removal of stop words and punctuation;

• all mentions of X in mathematical formulations are removed to avoid confusion with the Roman numeral 10 ("X");

• in a combination of the type word "Kapitel"/"Aufgabe" + digit written as a word (i.e., "eins", "zweites", etc.), the word is replaced with a digit ("Kapitel eins" → "Kapitel 1", "im fünften Kapitel" → "im Kapitel 5"); Roman numerals are replaced with a digit as well ("Kapitel IV" → "Kapitel 4");

• detected ambiguities are normalized (e.g., "Trainingsaufgabe" → "Training");

• recognized misspellings resp. typing errors are corrected (e.g., "Difeernzialrechnung" → "Differentialrechnung");

• permalinks are parsed and analyzed; from each permalink it is possible to extract the topic, examination mode, and question number.
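The following is a minimal, purely illustrative sketch of a few of the normalization steps above (it is not the production preprocessing code); the lookup tables for number words, synonyms, and misspellings are simplified assumptions, and stemming, stop-word removal, and permalink parsing are omitted.

import re

# Illustrative lookup tables (simplified assumptions, not the real resources).
NUMBER_WORDS = {"eins": "1", "zwei": "2", "drei": "3", "vier": "4", "fünf": "5"}
ROMAN_NUMERALS = {"i": "1", "ii": "2", "iii": "3", "iv": "4", "v": "5"}
SYNONYMS = {"trainingsaufgabe": "training"}
MISSPELLINGS = {"difeernzialrechnung": "differentialrechnung"}

def normalize(text: str) -> str:
    text = text.lower()
    # Replace number words and Roman numerals following "kapitel"/"aufgabe" with digits.
    def _replace_number(match: re.Match) -> str:
        word, number = match.group(1), match.group(2)
        digit = NUMBER_WORDS.get(number) or ROMAN_NUMERALS.get(number) or number
        return f"{word} {digit}"
    text = re.sub(r"\b(kapitel|aufgabe)\s+([a-zäöüß]+)\b", _replace_number, text)
    # Normalize ambiguities and correct known misspellings token by token.
    tokens = [SYNONYMS.get(t, MISSPELLINGS.get(t, t)) for t in text.split()]
    return " ".join(tokens)

print(normalize("Ich bearbeite Kapitel eins, Trainingsaufgabe 1a"))
# -> "ich bearbeite kapitel 1, training 1a"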


3.2.3 Natural Language Understanding

We implement the Natural Language Understanding (NLU) unit utilizing handcrafted rules, Regular Expressions (RegEx), and the Elasticsearch API (https://www.elastic.co/products/elasticsearch).

intent classification: According to the design of the OMB+ platform, all student questions can be classified into three categories: organizational questions (e.g., course, certificate), contextual questions (e.g., content, theorem), and mathematical questions (e.g., exercises, solutions). To classify the input query by its intent (i.e., category), we employ weighted keyword information in the form of handcrafted rules. We assume that specific words are explicitly associated with a corresponding intent (e.g., theorem or root denotes a mathematical question; feedback or certificate denotes an organizational inquiry). If no intent could be assigned, it is assumed that the NLU unit was not capable of understanding, and the intent is unknown. In this case, the virtual agent requests the user to provide an intent manually by picking one from the mentioned three options. The queries from the organizational and contextual (i.e., theoretical) categories are directly handed over to a human tutor, while mathematical questions are analyzed by the automated agent for further information extraction.
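As an illustration of such weighted keyword rules (the actual keyword lists and weights used in the system are not reproduced here), a minimal sketch could look as follows; the German keywords, weights, and threshold are invented for the example.

# Hypothetical keyword weights per intent; the real lists are hand-curated.
INTENT_KEYWORDS = {
    "math":           {"aufgabe": 2.0, "wurzel": 1.5, "satz": 1.0},
    "organizational": {"zertifikat": 2.0, "feedback": 1.5, "anmeldung": 1.0},
    "contextual":     {"regel": 1.0, "beispiel": 1.0, "kapitel": 0.5},
}

def classify_intent(tokens, threshold=1.0):
    """Return the best-scoring intent, or 'unknown' if no rule fires strongly enough."""
    scores = {
        intent: sum(weight for token, weight in keywords.items() if token in tokens)
        for intent, keywords in INTENT_KEYWORDS.items()
    }
    best_intent, best_score = max(scores.items(), key=lambda item: item[1])
    return best_intent if best_score >= threshold else "unknown"

print(classify_intent(["wie", "bekomme", "ich", "ein", "zertifikat"]))  # -> "organizational"

If the returned intent is "unknown", the agent falls back to asking the user to pick one of the three categories manually, as described above.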

entity extraction: In the next step, the system retrieves the entities from a user message. We specify the following five prerequisite entities, which have to be collected before the dialogue can be handed over to a human tutor: topic, sub-topic, examination mode and level, and question number. This part is implemented using ElasticSearch (ES) and RegEx. To facilitate the use of ES, we indexed the OMB+ web page into an internal database. Besides indexing the titles of topics and sub-topics, we also provided supplementary information on possible synonyms and writing styles. We additionally filed OMB+ permalinks, which point to the site pages. To query the resulting database, we employ the internal Elasticsearch multi_match function and set the minimum_should_match parameter to 20%. This parameter specifies the number of terms that must match for a document to be considered relevant. Besides that, we adopted fuzziness with the maximum edit distance set to 2 characters. The fuzzy query uses similarity based on the Levenshtein edit distance (Levenshtein, 1966). Finally, the system produces a ranked list of potential matching entries found in the database within the predefined relevance threshold (we set it to θ = 15). We then pick the most probable entry as the right one and select the corresponding entity from the user input.
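A sketch of such a fuzzy multi_match query, assuming the official Elasticsearch Python client (7.x-style body argument); the index name, field names, and the local host URL are illustrative assumptions, and the thresholds simply echo the values mentioned in the text.

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed local instance

def lookup_entity(user_text: str, index: str = "omb_topics", min_score: float = 15):
    """Query indexed topic/sub-topic titles and synonyms with a fuzzy multi_match."""
    body = {
        "min_score": min_score,              # relevance threshold θ
        "query": {
            "multi_match": {
                "query": user_text,
                "fields": ["title", "synonyms"],   # hypothetical field names
                "minimum_should_match": "20%",
                "fuzziness": 2,                    # max. Levenshtein edit distance
            }
        },
    }
    response = es.search(index=index, body=body)
    hits = response["hits"]["hits"]
    return hits[0]["_source"] if hits else None   # most probable entry, if any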

Overall, the Natural Language Understanding (NLU) module receives the user input as preprocessed text and examines it across all predefined RegEx statements and for a match in the ElasticSearch (ES) database. We use ES to retrieve entities for the topic, sub-topic, and examination mode, whereas RegEx is used to extract the information on a question number. Every time an entity is extracted, it is filled into the Informational Dictionary (ID). The ID has the following six slots to be filled in: topic, sub-topic, examination level, examination mode, question number, and exact problem formulation (this information is requested in further steps).
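Conceptually, the Informational Dictionary is a record with one optional field per slot; the following dataclass is only an illustrative model of it (slot names are chosen for readability and are not the production identifiers).

from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class InformationalDictionary:
    """One optional slot per information point collected during onboarding."""
    topic: Optional[str] = None
    sub_topic: Optional[str] = None
    exam_level: Optional[str] = None      # "chapter" or "section"
    exam_mode: Optional[str] = None       # training, exercise, quiz, final examination
    question_nr: Optional[str] = None
    problem_formulation: Optional[str] = None

    def missing_slots(self):
        """Return the names of slots that still need to be requested."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]

state = InformationalDictionary(topic="Elementares Rechnen", exam_mode="quiz")
print(state.missing_slots())
# ['sub_topic', 'exam_level', 'question_nr', 'problem_formulation']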

3.2.4 Dialogue Manager

A conventional Dialogue Manager (DM) consists of two interacting components. The first component is the Dialogue State Tracker (DST), which maintains a representation of the current conversational state, and the second is the Policy Learning (PL) unit, which defines the next system action (i.e., response).

In our model, every next action of the agent is determined by the state of the previously obtained information accumulated in the Informational Dictionary (ID). For instance, if the system recognizes that the student works on the final examination, it also knows (as defined by the hand-coded logic in the predefined rules) that there is no necessity to ask for a sub-topic, because the final examination always corresponds to a chapter level (due to the layout of the OMB+ platform). If the system recognizes that the user struggles with solving a quiz, it has to ask for both the corresponding topic and sub-topic, because a quiz always relates to a section level.

To discover all of the potential conversational flows, we implement Mutually Exclusive Rules (MER), which indicate that two events e1 and e2 are mutually exclusive or disjoint, since they cannot both occur at the same time (i.e., the intersection of these events is empty: $P(e_1 \cap e_2) = 0$). Additionally, we defined transition and mapping rules. Hence, only particular entities can co-occur, while others must be excluded from the specific rule combination. Assume a topic t = Geometry. According to MER, this topic can co-occur with one of the pertaining sub-topics defined at OMB+ (e.g., s = Angle). Thus, the level of the examination would be l = Section (but l ≠ Chapter), because the student already reported a sub-section he or she is working on. In this specific case, the examination mode could be e ∈ [Training, Exercise, Quiz], but not e = Final_Examination. This is because e = Final_Examination can co-occur only with a topic (chapter level) but not with a sub-topic (section level). The question number q can be arbitrary and has to co-occur with every other combination.


This allows a formal solution to be found. Assume a list of all theoretically possible dialogue states S = [topic, sub-topic, training, exercise, chapter level, section level, quiz, final examination, question number], where for each element s_n in S it holds that:

s_n = \begin{cases} 1 & \text{if the information is given} \\ 0 & \text{otherwise} \end{cases}    (1)

This gives all general (resp. possible) dialogue states without reference to the design of the OMB+ platform. However, to make the dialogue states fully suitable for OMB+, we select from the general states only those which are valid. To define the validity of a state, we specify the following five MERs:

R1:  Topic
     ¬T
      T

Table 2: Rule 1 – Admissible topic configurations.

Rule (R1) in Table 2 denotes the admissible configurations for the topic and means that the topic can either be given (T) or not (¬T).

R2:  Examination Mode
     ¬TR   ¬E   ¬Q   ¬FE
      TR   ¬E   ¬Q   ¬FE
     ¬TR    E   ¬Q   ¬FE
     ¬TR   ¬E    Q   ¬FE
     ¬TR   ¬E   ¬Q    FE

Table 3: Rule 2 – Admissible examination mode configurations.

Rule (R2) in Table 3 denotes that either no information on the examination mode is given, or the examination mode is Training (TR), Exercise (E), Quiz (Q), or Final Examination (FE), but not more than one mode at the same time.

R3:  Level
     ¬CL   ¬SL
      CL   ¬SL
     ¬CL    SL

Table 4: Rule 3 – Admissible level configurations.

Rule (R3) in Table 4 indicates that either no level information is provided, or the level corresponds to the chapter level (CL) or to the section level (SL), but not to both at the same time.


R4:  Examination & Level
     ¬TR   ¬E   ¬SL   ¬CL
      TR   ¬E    SL   ¬CL
      TR   ¬E   ¬SL    CL
      TR   ¬E   ¬SL   ¬CL
     ¬TR    E    SL   ¬CL
     ¬TR    E   ¬SL    CL
     ¬TR    E   ¬SL   ¬CL

Table 5: Rule 4 – Admissible examination mode (only for Training and Exercise) and corresponding level configurations.

Rule (R4) in Table 5 means that the Training (TR) and Exercise (E) examination modes can either belong to the chapter level (CL) or to the section level (SL), but not to both at the same time.

R5:  Topic & Sub-Topic
     ¬T   ¬ST
      T   ¬ST
     ¬T    ST
      T    ST

Table 6: Rule 5 – Admissible topic and corresponding sub-topic configurations.

Rule (R5) in Table 6 symbolizes that we can either be given only a topic (T), the combination of a topic and a sub-topic (ST) at the same time, only a sub-topic, or no information on this point at all.

We then define a valid dialogue state as a dialogue state that meets all requirements of the abovementioned rules:

\forall s \, \big(\mathrm{State}(s) \wedge R_{1\text{-}5}(s)\big) \rightarrow \mathrm{Valid}(s)    (2)

Once we have the valid dialogue states, we perform a mapping from each valid dialogue state to the next possible system action. For that, we first define five transition rules. The order of the transition rules is important.

T_1 = \neg T    (3)

means that no topic (T) is found in the ID (i.e., it could not be extracted from the user input).

T_2 = \neg EM    (4)

indicates that no examination mode (EM) is found in the ID.


T_3 = EM, \ \text{where } EM \in [TR, E]    (5)

denotes that the extracted examination mode (EM) is either Training (TR) or Exercise (E).

T_4 = \begin{cases} T \wedge \neg ST \wedge TR \wedge SL \\ T \wedge \neg ST \wedge E \wedge SL \\ T \wedge \neg ST \wedge Q \end{cases}    (6)

means that no sub-topic (ST) is provided by the user, but the ID already contains either the combination of topic (T), training (TR), and section level (SL), or the combination of topic, exercise (E), and section level, or the combination of topic and quiz (Q).

T_5 = \neg QNR    (7)

indicates that no question number (QNR) was provided by the student (or it could not be successfully extracted).

Finally, we assume the following list of possible next actions for the system:

A = \{\text{Ask for Topic},\ \text{Ask for Examination Mode},\ \text{Ask for Level},\ \text{Ask for Sub-Topic},\ \text{Ask for Question Nr}\}    (8)

Following the transition rules, we map each valid dialogue state to a possible next action a_m in A:

\exists a \, \forall s \, \big(\mathrm{Valid}(s) \wedge T_1(s)\big) \rightarrow a(s, T)    (9)

in the case where we do not have any topic provided, the next action is to ask for the topic (T).

\exists a \, \forall s \, \big(\mathrm{Valid}(s) \wedge T_2(s)\big) \rightarrow a(s, EM)    (10)

if no examination mode is provided by the user (or it could not be successfully extracted from the user query), the next action is defined as ask for examination mode (EM).

\exists a \, \forall s \, \big(\mathrm{Valid}(s) \wedge T_3(s)\big) \rightarrow a(s, L)    (11)

in the case where we know that the examination mode EM ∈ [Training, Exercise], we have to ask about the level (i.e., training at chapter level or training at section level); thus, the next action is ask for level (L).


\exists a \, \forall s \, \big(\mathrm{Valid}(s) \wedge T_4(s)\big) \rightarrow a(s, ST)    (12)

if no sub-topic is provided, but the examination mode EM ∈ [Training, Exercise] is at section level, the next action is defined as ask for sub-topic (ST).

\exists a \, \forall s \, \big(\mathrm{Valid}(s) \wedge T_5(s)\big) \rightarrow a(s, QNR)    (13)

if no question number is provided by the user, then the next action is ask for question number (QNR).

Following this scheme, we generate 56 state transitions which specify the next system actions. Upon entering a new conversation state, we compare the extracted (or updated) data in the Informational Dictionary (ID) with the valid dialogue states and select the mapped action as the next system action.
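To make the state-to-action mapping more tangible, the following is a small illustrative sketch (not the system's actual rule set) that enumerates binary dialogue states, filters them with a simplified subset of the rules above, and maps one state to its next question; the state encoding, the reduced rules, and the resulting state count are assumptions and differ from the full 56-transition scheme.

from itertools import product

FIELDS = ["topic", "sub_topic", "training", "exercise",
          "chapter_level", "section_level", "quiz", "final_exam", "question_nr"]

def is_valid(state: dict) -> bool:
    """Simplified mutually exclusive rules (an illustrative subset of R1-R5)."""
    modes = state["training"] + state["exercise"] + state["quiz"] + state["final_exam"]
    levels = state["chapter_level"] + state["section_level"]
    if modes > 1 or levels > 1:                         # at most one mode and one level
        return False
    if state["final_exam"] and state["section_level"]:  # final exam only at chapter level
        return False
    return True

def next_action(state: dict) -> str:
    """Transition rules T1-T5 applied in order."""
    if not state["topic"]:
        return "Ask for Topic"
    if not (state["training"] or state["exercise"] or state["quiz"] or state["final_exam"]):
        return "Ask for Examination Mode"
    if (state["training"] or state["exercise"]) and not (state["chapter_level"] or state["section_level"]):
        return "Ask for Level"
    if not state["sub_topic"] and (state["quiz"] or state["section_level"]):
        return "Ask for Sub-Topic"
    if not state["question_nr"]:
        return "Ask for Question Nr"
    return "Hand over to tutor"

# Enumerate all 0/1 assignments, keep the valid ones, and inspect one transition.
valid_states = [dict(zip(FIELDS, bits)) for bits in product([0, 1], repeat=len(FIELDS))
                if is_valid(dict(zip(FIELDS, bits)))]
print(len(valid_states))  # number of states admitted by this simplified rule set
example = dict.fromkeys(FIELDS, 0) | {"topic": 1, "quiz": 1, "section_level": 1}
print(next_action(example))  # -> "Ask for Sub-Topic"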

flexibility of dst: The DST transitions outlined above are meant to reflect the current configuration of the OMB+ learning platform. Still, further MERs can be effortlessly generated to create new transitions. To examine this, we performed tests with the previous design of the platform, where all the topics except for the first one had sub-topics, whereas the first topic contained both sub-topics and sub-sub-topics. We could produce the missing transitions with the described approach. The number of permissible transitions in this case increased from 56 to 117.

3.2.5 Meta Policy

Since the proposed system is expected to operate in a real-world setting, we had to develop additional policies that manage the dialogue flow and verify the system's accuracy. We explain these policies below.

completeness of informational dictionary: The model automatically verifies the completeness of the Informational Dictionary (ID), which is determined by the number of essential slots filled in the Informational Dictionary. There are six discrete cases in which the ID is deemed to be complete. For example, if a user works in the final examination mode, the agent does not have to request a sub-topic or examination level. Hence, the ID has to be filled only with data for a topic, examination type, and question number. Whereas, if the user works in the training mode, the system has to collect information about the topic, sub-topic, examination level, examination mode, and the question number. Once the ID is complete, it is passed to the verification step. Otherwise, the agent proceeds according to the next action. The system extracts information in each dialogue state, and therefore, if the user gives updated information on any subject later in the dialogue history, the corresponding slot will be updated in the ID.

Below are examples of two final cases (out of six) in which the Informational Dictionary (ID) is considered to be complete.

Case_1 = \begin{cases} \text{intent} = \text{Math} \\ \text{topic} \neq \text{None} \\ \text{exam mode} = \text{Final Examination} \\ \text{level} = \text{None} \\ \text{question nr} \neq \text{None} \end{cases}    (14)

Table 7: Case 1 – Any topic; examination mode is final examination; examination level does not matter; any question number.

Case_2 = \begin{cases} \text{intent} = \text{Math} \\ \text{topic} \neq \text{None} \\ \text{sub-topic} \neq \text{None} \\ \text{exam mode} \in [\text{Training}, \text{Exercise}] \\ \text{level} = \text{Section} \\ \text{question nr} \neq \text{None} \end{cases}    (15)

Table 8: Case 2 – Any topic; any related sub-topic; examination mode is either training or exercise; examination level is section; any question number.
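The two cases above can be read as simple predicates over the ID slots. The following is an illustrative sketch of such a completeness check for just these two cases (slot keys and string values are invented for the example and are not the production code).

def id_is_complete(info: dict) -> bool:
    """Check the two completeness cases described in Tables 7 and 8."""
    case_final_exam = (
        info.get("topic") is not None
        and info.get("exam_mode") == "final examination"
        and info.get("question_nr") is not None
    )
    case_section_training = (
        info.get("topic") is not None
        and info.get("sub_topic") is not None
        and info.get("exam_mode") in ("training", "exercise")
        and info.get("exam_level") == "section"
        and info.get("question_nr") is not None
    )
    return case_final_exam or case_section_training

print(id_is_complete({"topic": "Elementares Rechnen",
                      "exam_mode": "final examination",
                      "question_nr": "2b"}))   # -> True (Case 1)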

verification step: After the system has collected all the essential data (i.e., the ID is complete), it continues to the final verification step. Here, the obtained data is presented to the current student in the dialogue session. The student is asked to review the correctness of the data and, if any entries are faulty, to correct them. The Informational Dictionary (ID) is, where necessary, updated with the user-provided data. This procedure repeats until the user confirms the correctness of the compiled data.

fallback policy: In cases where the system fails to retrieve information from a student's query, it re-asks the user and attempts to retrieve the information from a follow-up response. The maximum number of re-ask trials is a parameter and is set to three (r = 3). If the system is still incapable of extracting the information, the user input is considered as the ground truth and filled into the appropriate slot in the ID. An exception to this rule applies where the user has to specify the intent manually: after three unclassified trials, the dialogue session is directly handed over to a human tutor.


human request: In every dialogue state, a user can switch to a human tutor. For this, the user can enter the "human" (or German "Mensch") keyword. Therefore, every user message is checked for the presence of this keyword.
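A compact, purely illustrative sketch of the fallback and human request policies together (the keyword set, slot handling, and function names are invented; the retry threshold follows the r = 3 described above):

HUMAN_KEYWORDS = {"human", "mensch"}
MAX_RETRIES = 3  # r = 3 re-ask trials

def apply_meta_policy(user_message: str, slot: str, extracted, retries: dict):
    """Return the next step for one slot: hand over, accept a value, or re-ask."""
    if any(keyword in user_message.lower().split() for keyword in HUMAN_KEYWORDS):
        return ("handover", None)                 # human request policy
    if extracted is not None:
        return ("fill_slot", extracted)           # normal extraction succeeded
    retries[slot] = retries.get(slot, 0) + 1
    if retries[slot] >= MAX_RETRIES:
        return ("fill_slot", user_message)        # fallback: take the input as ground truth
    return ("re_ask", None)

retries = {}
print(apply_meta_policy("keine ahnung", "topic", None, retries))   # ('re_ask', None)
print(apply_meta_policy("Mensch bitte", "topic", None, retries))   # ('handover', None)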

3.2.6 Response Generation

The semantic representation of the system's next action is transformed into a natural language utterance. Hence, each possible action is mapped to precisely one response that is predefined in a template. Some of the responses are fixed (e.g., "Welches Kapitel bearbeitest du gerade?"2). Others have placeholders for custom values. In the latter case, the utterance can be formulated depending on the Informational Dictionary (ID).

Assume the predefined response "Deine Angaben waren wie folgt: a) Kapitel: K, b) Abschnitt: A, c) Aufgabentyp: M, d) Aufgabe: Q, e) Ebene: E. Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein."3 The slots [K, A, M, Q, E] are the placeholders for the information stored in the ID. During the conversation, the system selects the template according to the next action and fills the slots with extracted values if required. After the extracted information is filled into the corresponding placeholders of the utterance, the final response could be as follows: "Deine Angaben waren wie folgt: a) Kapitel: I Elementares Rechnen, b) Abschnitt: Rechenregel für Potenzen, c) Aufgabentyp: Quiz, d) Aufgabe: 1(a), e) Ebene: Abschnitt. Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein."4
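A minimal sketch of such template-based generation (the template dictionary, keys, and placeholder names are illustrative, not the production templates):

TEMPLATES = {
    "ask_topic": "Welches Kapitel bearbeitest du gerade?",
    "final_statement": ("Deine Angaben waren wie folgt: a) Kapitel: {topic}, "
                        "b) Abschnitt: {sub_topic}, c) Aufgabentyp: {exam_mode}, "
                        "d) Aufgabe: {question_nr}, e) Ebene: {exam_level}. "
                        "Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein."),
}

def generate_response(action: str, info: dict) -> str:
    """Map the next system action to its template and fill in the ID values."""
    return TEMPLATES[action].format(**info)

info = {"topic": "I Elementares Rechnen", "sub_topic": "Rechenregel für Potenzen",
        "exam_mode": "Quiz", "question_nr": "1(a)", "exam_level": "Abschnitt"}
print(generate_response("final_statement", info))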

Below, we demonstrate five sample dialogues in the original German language with varying flows:

• a short dialogue where certain information is already provided in the first student input;

• an organizational question case;

• a contextual question case;

• an example with manual intent classification;

• an example with fallback and human request policies;

• a long flow with entry correction at the final verification phase.

2 Translates to "Which chapter are you currently working on?"
3 Translates to "Your details were as follows: a) Chapter, b) Section, c) Examination Mode, d) Question Number, e) Level. Did I understand you correctly? Please answer yes or no."
4 Translates to "Your details were as follows: a) Elementary Calculus, b) Power Laws, c) Quiz, d) Question 1 (a), e) Section Level. Did I understand you correctly? Please answer yes or no."


Student: Hallo, ich habe Schwierigkeiten beim Kapitel I Elementares Rechnen mit der Übungsaufgabe 1 a.

Next-Action: [Ask for Level: Chapter or Section]
Bot: Geht es um eine Aufgabe auf Kapitel-Ebene (z.B. um eine Trainingsaufgabe im Kapitel "Geometrie") oder um eine Aufgabe auf der Ebene eines Abschnitts in einem Kapitel (z.B. um eine Aufgabe zum Abschnitt "Winkel" im Kapitel "Geometrie")? Antworte bitte mit KAP für Kapitel und SEK für Abschnitt.

Student: Kapitel

Next-Action: [Final Statement]
Bot: Deine Angaben waren wie folgt:
a) Kapitel: I Elementares Rechnen
b) Aufgabentyp: Exercise
c) Aufgabe: 1a
d) Ebene: Kapitel
Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein.

Student: yep

Next-Action: [Human Hand-Over]
Bot: Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir.

Table 9: Showcase 1 – Short flow; most of the information was extracted from the initial user input. In bold are the extracted entities.

Student: Hallo, ich habe eine Frage bzgl. eines Zertifikates, und zwar wie kann ich es austellen lassen?

Next-Action: [Organisational Question]
Bot: Es scheint sich wohl um eine organisatorische Frage zu handeln. Bitte fasse deine Frage in einem kurzen Text zusammen, damit ich sie an meinen menschlichen Kollegen weiterleiten kann.

Student: Wie bekomme ich ein Zertifikat?

Next-Action: [Human Hand-Over]
Bot: Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir.

Table 10: Showcase 2 – Organisational question. Underlined are the keywords which point to the organisational intent.


Student: hallo

Next-Action: [UNK]
Bot: Hast du eine Frage zu einer Aufgabe (MATH), zu einem Text im Kurs (TEXT) oder eine organisatorische Frage (ORG)?

Student: ma

Next-Action: [UNK]
Bot: Hast du eine Frage zu einer Aufgabe (MATH), zu einem Text im Kurs (TEXT) oder eine organisatorische Frage (ORG)?

Student: math

Next-Action: [Ask for Topic]
Bot: Welches Kapitel bearbeitest du gerade?

Student: Lineare Gleichungssysteme

Next-Action: [Ask for Examination Mode]
Bot: Bearbeitest du eine Übungsaufgabe, eine Trainingsaufgabe, einen Quiz oder eine Schlussprüfung?

Student: human

Next-Action: [Human Hand-Over]
Bot: Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir.

Table 11: Showcase 3 – Manual intent classification, fallback, and human request policies (in bold type).

Student: Halli-Hallo, könnte mir bitte jemand die Regel 72 (vertikale streckung und stauchung des graphen) erklären? Danke im Voraus

Next-Action: [Contextual Question]
Bot: Es scheint sich wohl um eine inhaltliche Frage zu handeln. Bitte fasse deine Frage in einem kurzen Text zusammen, damit ich sie an meinen menschlichen Kollegen weiterleiten kann.

Student: Ich habe Schwierigkeiten beim Verstehen der Regel 72. Was ist genau die vertikale Streckung des Graphen?

Next-Action: [Human Hand-Over]
Bot: Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir.

Table 12: Showcase 4. Contextual Question. Underlined are the keywords which point to the contextual intent.


Student: Hallo, ich kann beim Kapitel 1 nicht weiterkommen, Trainingsaufgabe 1 (a).

Next-Action: [Ask for Level]
Bot: Geht es um eine Aufgabe auf Kapitel-Ebene (z.B. um eine Trainingsaufgabe im Kapitel Geometrie) oder um eine Aufgabe auf der Ebene eines Abschnitts in einem Kapitel (z.B. um eine Aufgabe zum Abschnitt Winkel im Kapitel Geometrie)? Antworte bitte mit KAP für Kapitel und SEK für Abschnitt.

Student: ich glaube es ist eine Sektion

Next-Action: [Ask for Sub-Topic]
Bot: Wie heißt der Abschnitt, in dem sich die Aufgabe befindet, auf die sich deine Frage bezieht?

Student: ich bearbeite rechenregeln und potenzen

Next-Action: [Final Statement]
Bot: Deine Angaben waren wie folgt:
a) Kapitel: I Elementares Rechnen
b) Abschnitt: Rechenregel für Potenzen
c) Aufgabentyp: Quiz
d) Aufgabe: 1 (a)
e) Ebene: Abschnitt
Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein.

Student: nope

Next-Action: [Ask_For]
Bot: Welche der folgenden Punkte habe ich falsch erkannt? Schreibe bitte, welche der folgenden Punkte es sind: a, b, c, d, e.

Student: c

State-Transition: [Correct-Fields]
Bot: Gib bitte die korrekte Information für c) ein.

Student: Training

State-Transition: [Final Statement]
Bot: Deine Angaben waren wie folgt:
a) Kapitel: I Elementares Rechnen
b) Abschnitt: Rechenregel für Potenzen
c) Aufgabentyp: Training
d) Aufgabe: 1 (a)
e) Ebene: Abschnitt
Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein.

Student: yes

State-Transition: [Human Hand-Over]
Bot: Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir.

Table 13: Showcase 5. Long Flow: Correction of entries. In bold type are the extracted entities.


Figure 4: Dialogue Flow. Abbreviations: UNK – unknown, ID – informational dictionary, RegEx – regular expressions.


3.3 evaluation

In order to obtain feedback on the quality, functionality, and usefulness of the proposed model, we evaluated it in two steps: first, with an automated method (Section 3.3.1), utilizing 130 manually annotated conversations to verify the robustness of the system; second, with the help of human tutors (Section 3.3.2) from OMB+, to examine the user experience. We describe the details as well as the most common errors below.

3.3.1 Automated Evaluation

To manage an automated evaluation, we manually assembled a dataset with 130 conversational flows. We predefined viable initial user questions, user answers, as well as gold system responses. These flows cover the most frequent dialogues that we previously observed in human-to-human dialogue data. We examined our system by implementing a self-chatting evaluation bot. The evaluation cycle is as follows (a code sketch follows the list):

• The system accepts the first question from a predefined dialogue via an API request, preprocesses and analyzes it to determine the intent and extract entities.

• Then it estimates the appropriate next action and responds accordingly.

• This response is then compared to the system's gold answer. If the predicted result is correct, the system receives the next predefined user input from the templated dialogue and responds again as defined above. This procedure proceeds until the dialogue terminates (i.e., the ID is complete). Otherwise, the system reports an unsuccessful case number.
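A minimal sketch of this self-chatting evaluation loop is shown below; the endpoint URL, payload fields, and dialogue format are assumptions for illustration, not the actual interface of the deployed system.

```python
# Minimal sketch of the self-chatting evaluation bot (assumed API and data format).
import requests

API_URL = "http://localhost:5000/respond"  # hypothetical endpoint of the dialogue system

def evaluate(dialogues):
    """dialogues: list of dialogues, each a list of (user_input, gold_response) pairs."""
    failed_cases = []
    for case_number, dialogue in enumerate(dialogues):
        for user_input, gold_response in dialogue:
            reply = requests.post(API_URL, json={"text": user_input}).json()["response"]
            if reply != gold_response:
                failed_cases.append(case_number)  # report the unsuccessful case number
                break
    return failed_cases

# evaluate(predefined_dialogues) should return [] if all 130 cases pass.
```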

The system successfully passed this evaluation for all 130 cases.

3.3.2 Human Evaluation & Error Analysis

To evaluate the system on irregular cases, we carried out experiments with human tutors from the OMB+ platform. The tutors are experienced regarding rare or complex issues, ambiguous responses, misspellings, and other infrequent but still relevant problems which occur during a natural language dialogue. In the following, we review some common errors and describe additional observations.

misspellings and confusable spelling frequently occur in user-generated text. Since we attempt to let the conversation remain natural to provide a better user experience, we do not instruct students to write formally. Therefore, we have to account for various writing issues. One of the prevalent problems


are misspellings: German words are commonly long and can be complex, and because users often type quickly, this can lead to a wrong order of characters within a given word. To tackle this challenge, we utilized fuzzy match within ES. However, the maximum allowed edit distance in ElasticSearch (ES) is set to 2 characters, which means that all misspellings beyond this threshold cannot be correctly identified by ES (e.g., Differenzialrechnung vs. Differnetialrechnung). Another representative example is the formulation of the section or question number: the equivalent information can be communicated in several discrete ways, which has to be considered by the RegEx unit (e.g., Aufgabe 5 a, Aufgabe V a, Aufgabe 5 (a)). A similar problem occurs with confusable spelling (i.e., Differenzialrechnung vs. Differenzialgleichung).

We analyzed the cases stated above and improved our model by adding some of the most common misspellings to the ElasticSearch (ES) database or handling them with RegEx during the preprocessing step.
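For illustration, a fuzzy query against ES could look like the following sketch; it assumes the Elasticsearch Python client (8.x-style calls) and an index and field layout ("omb_entities", "chapter_name") that is purely hypothetical.

```python
# Minimal sketch of a fuzzy match query (assumed index and field names).
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

query = {
    "match": {
        "chapter_name": {
            "query": "Differnetialrechnung",  # misspelled user input
            "fuzziness": 2,                   # maximum edit distance supported by ES
        }
    }
}
hits = es.search(index="omb_entities", query=query)["hits"]["hits"]
print([h["_source"]["chapter_name"] for h in hits])
```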

elasticsearch threshold We observed both cases where the system failed to extract information although the user provided it, and examples where ES extracted information not mentioned in a user query. According to our understanding, these errors occur due to the relevancy scoring algorithm of ElasticSearch (ES), where a document's score is a combination of textual similarity and other metadata-based scores. Our examination revealed that ES mostly fails to extract the information if the user message is rather short (e.g., about 5 tokens). To overcome this difficulty, we coupled the current input u_t with the dialogue history (h = u_t, ..., u_{t-2}). That eliminated the problem and improved the retrieval quality.

Solving the opposite problem, where ES extracts false information (or information that was not mentioned in a query), was more challenging. We learned that the problem comes from short words or sub-words (i.e., suffixes, prefixes) which ES locates in a database and considers credible enough. The ES documentation suggests getting rid of stop words to eliminate this behavior; however, this did not improve the search. Also, fine-tuning of ES parameters such as the relevance threshold, prefix length5, and the minimum should match6 parameter did not result in notable improvements. To cope with this problem, we implemented a verification step where a user can edit the erroneously retrieved data.

5 The amount of fuzzified characters. This parameter helps to decrease the number of terms which must be reviewed.

6 Indicates the number of terms that must match for a document to be considered relevant.


3.4 structured dialogue acquisition

As we stated, the implemented system not only supports the human tutor by assisting students, but also collects structured and labeled training data in the background. In a trial run of the rule-based system, we accumulated a toy dataset with dialogues. The assembled dataset includes the following (an example record is sketched after the list):

• Plain dialogues with unique dialogue indexes.

• Plain ID information collected for the whole conversation.

• Pairs of questions (i.e., user requests) and responses (i.e., bot replies) with unique dialogue and turn indexes.

• Triples in the form of (Question, Next Action, Response). Information on the next system action could be employed to train a DM unit with (deep) machine learning algorithms.

• For every conversational state, we keep the entities that the system was able to extract, along with their position in the utterance. This information could be used to train a custom domain-specific NER model.
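A single record of the accumulated dataset could look like the following sketch; the field names and values are illustrative assumptions about the storage format, not its exact schema.

```python
# Minimal sketch of one collected training record (assumed field names).
example_record = {
    "dialogue_id": 17,     # unique dialogue index
    "turn_id": 3,          # turn index within the dialogue
    "question": "ich bearbeite rechenregeln und potenzen",   # user request
    "next_action": "Final Statement",                        # system's next action
    "response": "Deine Angaben waren wie folgt: ...",        # template-based bot reply
    "entities": [
        # (entity label, surface form, start offset, end offset)
        ("Subtopic", "rechenregeln und potenzen", 14, 39),
    ],
}
```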

3.5 summary

In this work, we realized a conversational agent for IPA purposes that addresses two problems. First, it decreases repetitive and time-consuming activities, which allows workers of the e-learning platform to focus solely on mathematical and hence cognitively demanding questions. Second, by interacting with users, it augments the resources with structured and labeled training data. The latter can be utilized for further implementation of learnable dialogue components.

The realization of such a system was connected with multiple challenges. Among others were missing structured conversational data, ambiguous or erroneous user-generated text, and the necessity to deal with existing corporate tools and their design.

The proposed conversational agent enabled us to accumulate structured, labeled data without any special efforts from the human (i.e., tutors') side (e.g., manual annotation and post-processing of existing data, changes of the conversational structure within the tutoring process). Once we collected structured dialogues, we were able to re-train particular segments of the system with deep learning methods and achieved consistent performance for both of the proposed tasks. We report the re-implemented elements in Chapter 4.

At the time of writing this thesis, the system described in this work was deployed at the OMB+ platform.


3.6 future work

Possible extensions of our work are:

• In this work, we implemented a rule-based dialogue model from scratch and adjusted it to the specific domain (i.e., e-learning). The core of the model is a dialogue manager that determines the current state of the conversation and the possible next action. Rule-based systems are generally considered to be hardly adaptable to new domains; however, our dialogue manager proved to be flexible to slight modifications in a workflow. One of the possible directions of future work would thus be the investigation of the general adaptability of the dialogue manager core to other scenarios and domains (e.g., a different course).

• A further possible extension would be the introduction of different languages. The current model is implemented for the German language, but the OMB+ platform provides this course also in English and Chinese. Therefore, it would be interesting to investigate whether it is possible to adjust the existing system to other languages without drastic changes in the model.


4 DEEP LEARNING RE-IMPLEMENTATION OF UNITS

This chapter partially covers work already published at international peer-reviewed conferences. The relevant publication is Moiseeva et al., 2020. The research described in this chapter was carried out in its entirety by the author of this thesis. The other authors of the publication acted as advisors.

In this chapter, we address the challenge of re-implementing two central components of our IPA dialogue system described in Chapter 3 by employing deep learning techniques. Since the data for training was collected in a (short) trial run of the rule-based system, it is not sufficient to train a neural network from scratch. Therefore, we utilize Transfer Learning (TL) methods and employ the previously pre-trained Bidirectional Encoder Representations from Transformers (BERT) model, which we fine-tune for two of our custom tasks.

4.1 introduction

Data mining and machine learning technologies have gained notable advancement in many research fields, including but not limited to computer vision, robotics, advanced analytics, and computational linguistics (Pan and Yang, 2009; Trautmann et al., 2019; Werner et al., 2015). Besides that, in recent years deep neural networks, with their ability to accurately map from inputs to outputs (i.e., labels, images, sentences) while analyzing patterns in data, became a potential solution for many industrial automatization tasks (e.g., fake news detection, sentiment analysis).

Nevertheless, conventional supervised models still have limitations when applied to real-world data and industrial tasks. The first challenge here is the training phase, since a robust model requires an immense amount of structured and labeled data. Still, there are just a few cases where such data is publicly available (e.g., research datasets), but for most tasks, domains, and languages, the data has to be gathered over a long time. The second hurdle is that, if trained on existing data from a distinct domain, a supervised model cannot generalize to conditions that are different from those faced in the training dataset. Most machine learning methods perform correctly only under a general assumption: the training and test data are mapped to the same feature space and the same distribution (Pan and Yang, 2009). When the distribution changes, most statistical models need to be retrained from scratch using newly obtained training


samples. This is especially crucial when applying such approaches to real-world data, which is usually very noisy and contains a large number of scenarios. In this case, a model would be ill-trained to make decisions about novel patterns. Furthermore, for most real-world applications, it is expensive or even not feasible at all to recollect the required training data and retrain the model. Hence, the possibility to transfer the knowledge obtained during training on existing data to a new, unseen domain is one of the possible solutions for industrial problems. Such a technique is called Transfer Learning (TL), and it allows dealing with the abovementioned scenarios by leveraging existing labeled data of some related tasks or domains.

4.1.1 Outline and Contributions

This work addresses the challenge of re-implementing two out of three central components of our rule-based IPA conversational assistant with deep learning methods. As we discussed in Chapter 2, hybrid dialogue models (i.e., consisting of both rule-based and trainable components) appear to replace the established rule- and template-based methods, which are for the most part employed in the industrial setting. To investigate this assumption and examine current state-of-the-art deep learning techniques, we conducted experiments on our custom domain-specific data. Since the available data were collected in a trial run, and the obtained dataset was too small to train a conventional machine learning model from scratch, we utilized the Transfer Learning (TL) approach and fine-tuned the existing pre-trained model for our target domain.

For the experiments, we defined two tasks:

• First, we considered the NER problem in a custom domain setting. We defined a sequence labeling task and employed Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al., 2018) as one of the transfer learning methods widely utilized in the NLP area. We applied the model to our dataset and fine-tuned it for seven (7) domain-specific (i.e., e-learning) entities.

• Second, we examined the effectiveness of transfer learning for the dialogue manager core. The DM is one of the fundamental parts of the conversational agent and requires compelling learning techniques to hold the conversation intelligently. For that experiment, we defined a classification task and applied the BERT model to predict the system's next action for every given user input in a conversation. We then computed the macro F1 score for 14 possible classes (i.e., actions) and an average dialogue accuracy.


The main characteristics of both settings are:

• the set of named entities and labels (i.e., actions) differs between source and target domain;

• for both settings, we utilized the BERT model in German and multilingual versions with additional parameters;

• the dataset used to train the target model is small and consists of highly domain-specific labels; the latter often appears in industrial scenarios.

We finally verified that the selected method performed reasonably well on both tasks. We reached a performance of 0.93 macro F1 points for Named Entity Recognition (NER) and 0.75 macro F1 points for the Next Action Prediction (NAP) task. Therefore, we conclude that both NER and NAP components could be employed to substitute or extend the existing rule-based modules.

4.2 related work

As discussed in the previous section, the main benefit of Transfer Learning (TL) is that it allows the domains and tasks used during the training and testing phases to be different. Initial steps in the formulation of transfer learning were made after the NeurIPS-951 workshop that focused on the demand for lifelong machine-learning methods that preserve and reuse previously learned knowledge (Chen et al., 2018). Research on transfer learning has attracted more and more attention since then, especially in the computer vision domain and in particular for image recognition tasks (Donahue et al., 2014; Sharif Razavian et al., 2014).

Recent studies have shown that TL can also be successfully applied within the Natural Language Processing (NLP) domain, especially to semantically equivalent tasks (Dai et al., 2019; Devlin et al., 2018; Mou et al., 2016). Several research works were carried out specifically on the Named Entity Recognition (NER) task and explored the capabilities of transfer learning when applied to different named entity categories (i.e., different output spaces). For instance, Qu et al. (2016) pre-trained a linear-chain Conditional Random Field (CRF) on a large amount of labeled data in the source domain. Then they introduced a two-linear-layer neural network that learns the difference between the source and target label distributions. Finally, the authors initialized another CRF with parameters learned in the linear layers to predict the labels of the target domain. Kim et al. (2015) conducted experiments with transferring features and model parameters between related domains where the label classes are different but may have a semantic similarity. Their primary approach was to construct label embeddings to automatically map the source and target label classes to improve the transfer (Chen et al., 2018).

1 Conference on Neural Information Processing Systems. Formerly NIPS.


However, there are also models trained in a manner such that they can be applied to multiple NLP tasks simultaneously. Among such architectures are ULMFit (Howard and Ruder, 2018), BERT (Devlin et al., 2018), GPT-2 (Radford et al., 2019), and Transformer-XL (Dai et al., 2019) – powerful models that were recently proposed for transfer learning within the NLP domain. These architectures are called language models, since they attempt to learn language structure from large textual collections in an unsupervised manner and then predict the next word in a sequence based on previously observed context words. The Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al., 2018) model is one of the most popular architectures among those mentioned above, and it is also one of the first that showed close-to-human-level performance on the General Language Understanding Evaluation (GLUE) benchmark2 (Wang et al., 2018), which includes tasks on sentiment analysis, question answering, semantic similarity, and language inference. The architecture of BERT consists of stacked transformer blocks that are pre-trained on a large corpus consisting of 800M words from modern English books (i.e., BooksCorpus by Zhu et al.) and 2,500M words of text from pre-processed (i.e., without markup) English Wikipedia articles. Overall, BERT advanced the state-of-the-art performance for eleven (11) NLP tasks. We describe this approach in detail in the subsequent section.

4.3 bert: bidirectional encoder representations from transformers

The Bidirectional Encoder Representations from Transformers (BERT) architecture is divided into two steps: pre-training and fine-tuning. The pre-training step was enabled through the two following tasks:

• Masked Language Model – the idea behind this approach is to train a deep bidirectional representation, which is different from conventional language models that are either trained left-to-right or right-to-left. To train a bidirectional model, the authors mask out a certain number of tokens in a sequence. The model is then asked to predict the original words for the masked tokens. It is essential that, in contrast to denoising auto-encoders (Vincent et al., 2008), the model does not need to reconstruct the entire input; instead, it attempts to predict the masked words. Since the model does not know which words exactly it will be asked about, it learns a representation for every token in a sequence (Devlin et al., 2018).

• Next Sentence Prediction – aims to capture relationships between two sentences A and B. In order to train the model, the authors pre-trained a binarized next sentence prediction task where pairs of sequences were labeled according to the criteria: A and B either follow each other directly in the corpus (label

2 https://gluebenchmark.com


IsNext) or are both taken from random places (label NotNext). The model is then asked to predict whether the former or the latter is the case. The authors demonstrated that pre-training towards this task is extremely useful for both question answering and natural language inference tasks (Devlin et al., 2018).

Furthermore, BERT uses WordPiece tokenization for embeddings (Wu et al., 2016), which segments words into subword-level tokens. This approach allows the model to make additional inferences based on word structure (i.e., an -ing ending denotes similar grammatical categories).

The fine-tuning step follows the pre-training process. For that, the BERT model is initialized with parameters obtained during the pre-training step, and a task-specific fully-connected layer is used after the transformer blocks, which maps the general representations to custom labels (employing a softmax operation).

self-attention The key operation of the BERT architecture is self-attention, which is a sequence-to-sequence operation with a conventional encoder-decoder structure (Vaswani et al., 2017). The encoder maps an input sequence $(x_1, x_2, \dots, x_n)$ to a sequence of continuous representations $z = (z_1, z_2, \dots, z_n)$. Given $z$, the decoder then generates an output sequence $(y_1, y_2, \dots, y_m)$ of symbols, one element at a time (Vaswani et al., 2017). To produce an output vector $y_i$, the self-attention operation takes a weighted average over all input vectors:

$$y_i = \sum_{j} w_{ij} x_j \qquad (16)$$

where $j$ is an index over the entire sequence and the weights sum to one (1) over all $j$. The weight $w_{ij}$ is derived from a dot product function over $x_i$ and $x_j$ (Bloem, 2019; Vaswani et al., 2017):

$$w'_{ij} = x_i^{T} x_j \qquad (17)$$

The dot product returns a value between negative and positive infinity, and a softmax maps these values to $[0, 1]$ to ensure that they sum to one (1) over the entire sequence (Bloem, 2019; Vaswani et al., 2017):

$$w_{ij} = \frac{\exp w'_{ij}}{\sum_{j} \exp w'_{ij}} \qquad (18)$$

Self-attention is not the only operation in transformers, but it is the essential one, since it propagates information between vectors. Other operations in the transformer are applied to every vector in the input sequence without interactions between vectors (Bloem, 2019; Vaswani et al., 2017).
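The following NumPy sketch implements Equations (16)-(18) for a sequence of input vectors; it covers only the basic (unparameterized) self-attention operation, without the query/key/value projections and multiple heads used in the full transformer.

```python
# Minimal NumPy sketch of basic self-attention, Equations (16)-(18).
import numpy as np

def self_attention(x: np.ndarray) -> np.ndarray:
    """x has shape (n, d): n input vectors of dimension d."""
    raw = x @ x.T                                   # w'_ij = x_i^T x_j     (Eq. 17)
    raw -= raw.max(axis=1, keepdims=True)           # numerical stability
    weights = np.exp(raw)
    weights /= weights.sum(axis=1, keepdims=True)   # row-wise softmax      (Eq. 18)
    return weights @ x                              # y_i = sum_j w_ij x_j  (Eq. 16)

y = self_attention(np.random.randn(5, 8))  # five token vectors of dimension 8
```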


4.4 data & descriptive statistics

The benchmark that we accumulated through the trial run includes 300 structured and partially labeled dialogues, with an average dialogue length of six (6) utterances. All conversations were held in the German language. Detailed general statistics can be found in Table 14.

Statistic                               Value

Max. Len. Dialogue (in utterances)      15
Avg. Len. Dialogue (in utterances)      6
Max. Len. Utterance (in tokens)         100
Avg. Len. Utterance (in tokens)         9
Overall Unique Action Labels            13
Overall Unique Entity Labels            7
Train – Dialogues (# Utterances)        200 (1161)
Eval – Dialogues (# Utterances)         50 (279)
Test – Dialogues (# Utterances)         50 (300)

Table 14: General statistics for the conversational dataset.

The dataset covers 14 unique next actions and seven (7) unique named entities. Detailed information on next actions and entities can be found in Table 15 and Table 16. Below, we list the possible actions with the corresponding explanations and predefined mapped statements.

unk – requests to define the intent of the question: mathematical, organizational, or contextual (i.e., fallback policy in case the automated intent recognition unit fails) → "Hast du eine Frage zu einer Aufgabe (MATH), zu einem Text im Kurs (TEXT) oder eine organisatorische Frage (ORG)?"

org – hands the discussion over to a human tutor in case of an organizational question → "Es scheint sich wohl um eine organisatorische Frage zu handeln. Bitte fasse deine Frage in einem kurzen Text zusammen, damit ich sie an meinen menschlichen Kollegen weiterleiten kann."

text – hands the discussion over to a human tutor in case of a contextual question → "Es scheint sich wohl um eine inhaltliche Frage zu handeln. Bitte fasse deine Frage in einem kurzen Text zusammen, damit ich sie an meinen menschlichen Kollegen weiterleiten kann."


topic – requests information on the topic → "Welches Kapitel bearbeitest du gerade? Antworte bitte mit der Kapitelnummer, z.B. IA, IB, II, oder mit dem Kapitelnamen, z.B. Geometrie."

examination – requests information about the examination mode → "Bearbeitest du eine Übungsaufgabe, eine Trainingsaufgabe, einen Quiz oder eine Schlussprüfung?"

subtopic – requests information about the subtopic in case of quiz or section-level training/exercise modes → "Wie heißt der Abschnitt, in dem sich die Aufgabe befindet, auf die sich deine Frage bezieht?"

level – requests information about the level of the examination mode: chapter or section → "Geht es um eine Aufgabe auf Kapitel-Ebene (z.B. um eine Trainingsaufgabe im Kapitel Geometrie) oder um eine Aufgabe auf der Ebene eines Abschnitts in einem Kapitel (z.B. um eine Aufgabe zum Abschnitt Winkel im Kapitel Geometrie)? Antworte bitte mit KAP für Kapitel und SEK für Abschnitt."

question number – requests information about the question number → "Welche Aufgabe bearbeitest du gerade? Wenn du z.B. bei Teilaufgabe c in der zweiten Trainingsaufgabe (oder der zweiten Übungsaufgabe) bist, antworte mit 2c. Bearbeitest du gerade einen Quiz oder eine Schlussprüfung, gib bitte nur die Teilaufgabe an."

final request – considers the Informational Dictionary (ID) to be complete and initiates the verification step → "Deine Angaben waren wie folgt: ... Habe ich dich richtig verstanden? Bitte antworte mit Ja oder Nein."

verify request – requests the student to verify the assembled information → "Welche der folgenden Punkte habe ich falsch erkannt? Schreibe bitte, welche der folgenden Punkte es sind: [a, b, c, ...]3"

correct request – requests the student to provide the correct information if there is an error in the assembled data → "Gib bitte die korrekte Information für [a]4 ein."

exact question – requests the student to provide a detailed explanation of the problem → "Bitte fasse deine Frage in einem kurzen Text zusammen, damit ich sie an meinen menschlichen Kollegen weiterleiten kann."

human handover – finalizes the conversation → "Vielen Dank, unsere menschlichen Kollegen melden sich gleich bei dir."

3 Variables that depend on the state of the Informational Dictionary (ID).
4 Variable that was selected as erroneous and needs to be corrected.


Action            Counts    Action            Counts

Final Request     321       Unk               80
Human Handover    300       Subtopic          55
Exact Question    286       Correct Request   40
Question Number   176       Verify Request    34
Examination       175       Org               17
Topic             137       Text              13
Level             130

Table 15: Detailed statistics on possible system actions. The Counts columns denote the number of occurrences of each action in the entire dataset.

Entity         Counts

Question Nr.   317
Chapter        311
Examination    303
Subtopic       198
Level          80
Intent         70

Table 16: Detailed statistics on possible named entities. The Counts column denotes the number of occurrences of each entity in the entire dataset.

4.5 named entity recognition

We defined a sequence labeling task to extract custom entities from user input. We considered seven (7) viable entities (see Table 16) to be recognized by the model: topic, subtopic, examination mode and level, question number, intent, as well as the entity other for the remaining words in the utterance. Since the data collected from the rule-based system already includes information on the entities, we were able to train a domain-specific NER unit. However, since the original user input was informal, the same information could be provided in different writing styles. That means that a single entity could have different surface forms (e.g., synonyms, writing styles). Therefore, entities that we extracted from the rule-based system were transformed into a universal standard (e.g., official chapter names). To consider all of the variable entity forms while post-labeling the original dataset, we determined generic entity names (e.g., chapter, question nr.) and mapped variations of entities from the user input (e.g., Chapter = [Elementary Calculus, Chapter I, ...]) to them. The overall dataset consists of 300 labeled dialogues, where 200 (with 1161 utterances) were employed for training and 100 for the evaluation and test sets (50 dialogues with ca. 300 utterances for each set, respectively).


4.5.1 Model Settings

We conducted experiments with German and multilingual BERT implementations5. The capitalization of words is significant for the German language; therefore, we ran the experiments on the capitalized data while preserving the original punctuation. We employed the available base model for both the multilingual and the German BERT implementations in the cased version. We initialized the learning rate for both models to 1e-4, and the maximum length of the tokenized input was set to 128 tokens. We conducted the experiments multiple times with different seeds for a maximum of 50 epochs, with the training batch size set to 32. We utilized AdamW as the optimizer and applied early stopping if the performance did not improve significantly after 5 epochs.
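A minimal sketch of this fine-tuning setup with the Hugging Face transformers library is shown below; the model identifier, label list, and the single training step are illustrative assumptions and omit the data loading, batching, and early-stopping logic described above.

```python
# Minimal sketch of fine-tuning German BERT for token classification (NER).
import torch
from torch.optim import AdamW
from transformers import BertTokenizerFast, BertForTokenClassification

LABELS = ["Other", "Chapter", "Subtopic", "Examination", "Level", "Question_Nr", "Intent"]

tokenizer = BertTokenizerFast.from_pretrained("bert-base-german-cased")
model = BertForTokenClassification.from_pretrained(
    "bert-base-german-cased", num_labels=len(LABELS)
)
optimizer = AdamW(model.parameters(), lr=1e-4)

encoded = tokenizer(
    "Ich habe eine Frage zum Kapitel Elementares Rechnen",
    truncation=True, max_length=128, return_tensors="pt",
)
labels = torch.zeros_like(encoded["input_ids"])  # placeholder token labels for the sketch
loss = model(**encoded, labels=labels).loss      # cross-entropy over the label set
loss.backward()
optimizer.step()
```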

4.6 next action prediction

We defined a classification problem where we aim to predict the system's next action according to the given user input. We assumed 14 custom actions (see Table 15) that we considered to be our classes. While collecting the toy dataset, every input was automatically labeled by the rule-based system with the corresponding next action and the dialogue ID. Thus, no additional post-labeling was required in this step.

We investigated two settings for this task:

• Default Setting: Utilizing a user input and the corresponding class (i.e., next action) without any supplementary context. By default, we conduct all of our experiments in this setting.

• Extended Setting: Employing a user input, the corresponding next action, and the previous system action as a source of supplementary context. For this setting, we experimented with the best performing model from the default setting.

The overall dataset consists of 300 labeled dialogues, where 200 (with 1161 utterances) were employed for training and 100 for the evaluation and test sets (50 dialogues with ca. 300 utterances for each set, respectively).

4.6.1 Model Settings

For the NAP task, we carried out experiments with German and multilingual BERT implementations as well. We examined the performance of both capitalized and lower-cased inputs, as well as plain and preprocessed data. For the multilingual BERT, we used the base model in both cased and uncased modifications. For the German BERT, we

5 https://github.com/huggingface/transformers


utilized the base model in the cased modification only6. For both models, we initialized the learning rate to 4e-5, and the maximum length of the tokenized input was set to 128 tokens. We conducted the experiments multiple times with different seeds for a maximum of 300 epochs, with the training batch size set to 32. We utilized AdamW as the optimizer and applied early stopping if the performance did not improve significantly after 15 epochs.
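Analogously, the Next Action Prediction setup can be sketched as sequence classification; in the extended setting, the previous system action is passed as additional context, here encoded as the first segment of a sentence pair. The model identifier, label index, and the pairing convention are assumptions for illustration.

```python
# Minimal sketch of Next Action Prediction as sequence classification (extended setting).
import torch
from torch.optim import AdamW
from transformers import BertTokenizerFast, BertForSequenceClassification

NUM_ACTIONS = 14  # one class per possible next action

tokenizer = BertTokenizerFast.from_pretrained("bert-base-multilingual-cased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=NUM_ACTIONS
)
optimizer = AdamW(model.parameters(), lr=4e-5)

previous_action = "Ask for Examination Mode"        # supplementary context (extended setting)
user_input = "Ich bearbeite eine Trainingsaufgabe"
encoded = tokenizer(
    previous_action, user_input,                    # encoded as a sentence pair
    truncation=True, max_length=128, return_tensors="pt",
)
gold_action = torch.tensor([7])                     # index of the gold next action (illustrative)
loss = model(**encoded, labels=gold_action).loss
loss.backward()
optimizer.step()
```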

4.7 evaluation and results

To evaluate the models, we computed the word-level macro F1 score for the Named Entity Recognition (NER) task and the utterance-level macro F1 score for the Next Action Prediction (NAP) task. The word-level F1 is estimated as the average of the F1 scores per class, each computed from all words in the evaluation and test sets. The outcomes for the NER task are represented in Table 17. For the utterance-level F1, a single class label (i.e., next action) is taken for the whole utterance. The results for the NAP task are shown in Table 18.

We additionally computed the average dialogue accuracy for the best performing NAP models. This score indicates how well the predicted next actions compose the conversational flow. The average dialogue accuracy was estimated for 50 conversations in the evaluation and test sets, respectively. The results are displayed in Table 19.
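A sketch of how these two measures can be computed is given below; the macro F1 uses scikit-learn, and the dialogue accuracy is taken as the fraction of correctly predicted actions per dialogue, averaged over dialogues, which is an assumption about the exact definition.

```python
# Minimal sketch of the evaluation measures (illustrative inputs).
from sklearn.metrics import f1_score

gold_actions = ["Topic", "Examination", "Final Request"]
pred_actions = ["Topic", "Subtopic", "Final Request"]
macro_f1 = f1_score(gold_actions, pred_actions, average="macro")

def average_dialogue_accuracy(dialogues):
    """dialogues: list of (gold_action_sequence, predicted_action_sequence) pairs."""
    per_dialogue = [
        sum(g == p for g, p in zip(gold, pred)) / len(gold)
        for gold, pred in dialogues
    ]
    return sum(per_dialogue) / len(per_dialogue)
```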

The obtained results for the NER task revealed that German BERT performed significantly better than the multilingual BERT model. The performance of the custom NER unit is at 0.93 macro F1 points for all possible named entities (see Table 17). In contrast, for the NAP task, the multilingual BERT model obtained better performance than the German BERT model. The best performing method in the default setting gained a macro F1 of 0.677 points for 14 possible classes, whereas the model in the extended setting performed better – its best macro F1 score is 0.752 for the equal number of classes (see Table 18).

Regarding the dialogue accuracy, the extended system fine-tuned with multilingual BERT obtained better outcomes than the default one (see Table 19). The overall conclusion for the NAP is that the capitalized setting increased the performance of the model, whereas the inclusion of punctuation did not influence the results.

6 An uncased pre-trained modification of the model was not available.


Task   Model   Cased   Punct   F1 (Eval)   F1 (Test)

NER    GER     ✓       ✓       0.971       0.930
NER    Mult    ✓       ✓       0.926       0.905

Table 17: Word-level F1 for the Named Entity Recognition (NER) task. In bold: best performance for evaluation and test sets.

Task   Model   Cased   Punct   Extended   F1 (Eval)   F1 (Test)

NAP    GER     ✓       ✗       ✗          0.711       0.673
NAP    GER     ✓       ✓       ✗          0.701       0.606
NAP    Mult    ✓       ✓       ✗          0.688       0.625
NAP    Mult    ✓       ✗       ✗          0.769       0.677
NAP    Mult    ✓       ✗       ✓          0.810       0.752
NAP    Mult    ✗       ✓       ✗          0.664       0.596
NAP    Mult    ✗       ✗       ✗          0.742       0.502

Table 18: Utterance-level F1 for the Next Action Prediction (NAP) task. Underlined: best performance for evaluation and test sets in the default setting (without previous action context). In bold: best performance for evaluation and test sets in the extended setting (with previous action context).

Model          Accuracy (Eval)   Accuracy (Test)

NAP default    0.765             0.724
NAP extended   0.813             0.801

Table 19: Average dialogue accuracy computed for the Next Action Prediction (NAP) task for the best performing models. In bold: best performance for evaluation and test sets.


Figure 5: Macro F1 evaluation, NAP – default model. Comparison between German and multilingual BERT.


Figure 6: Macro F1 test, NAP – default model. Comparison between German and multilingual BERT.


Figure 7: Macro F1 evaluation, NAP – extended model.


Figure 8: Macro F1 test, NAP – extended model.


Figure 9: Macro F1 evaluation, NER – Comparison between German and multilingual BERT.


Figure 10: Macro F1 test, NER – Comparison between German and multilingual BERT.


4.8 error analysis

We then investigated the cases where the models failed to predict the correct action or labeled the named entity span erroneously. Below we outline the most common errors.

next action prediction One of the prevalent flaws in the default model was the mismatch between two consecutive actions – the actions question number and subtopic. That is due to the position of these actions in the conversational flow: the occurrence of both actions in the dialogue is not strict and largely depends on the previous action. To explain this, we illustrate the following possible conversational scenarios:

• The system can directly proceed with the request for the question number if the examination mode is the final examination.

• If the examination mode is training or exercise, the system has to ask about the level first (i.e., chapter or section).

• If it is chapter-level, the system can again directly proceed with the request for the question number.

• In the case of section-level, the system has to ask about the subsection first (if this information was not provided before).

The examination of the extended model showed that the induction of supplementary context (i.e., the previous action) improved the performance of the system by about 6.0%.

named entity recognition The cases where the system failed cover mismatches between the labels chapter and other, and the labels question number and other. This failure arose due to the imperfectly labeled span of a multi-word named entity (e.g., "Elementary Calculus: Proportionality, Percentage"). The first or last words in the named entity were excluded from the span and erroneously labeled with the other label.

4.9 summary

In this chapter, we re-implemented two of the three fundamental units of our IPA conversational agent – Named Entity Recognition (NER) and the Dialogue Manager (DM) – using the methods of Transfer Learning (TL), specifically the Bidirectional Encoder Representations from Transformers (BERT) model.

We believe that the obtained results are reasonably high, considering the comparatively little amount of training data we employed to fine-tune the models. Therefore, we conclude that both NAP and NER segments could be used to replace or extend the existing rule-based


modules. Rule-based components, as well as ElasticSearch (ES), are limited in their capacities and not flexible to novel patterns, whereas the trainable units generalize better and could possibly decrease the number of inaccurate predictions in case of accidental dialogue behavior. Furthermore, to increase the overall robustness, rule-based and trainable components could be used synchronously: when one system fails (e.g., the ElasticSearch (ES) module cannot extract an entity), the conversation continues on the prediction obtained from the other model.

4.10 future work

There are several possible directions for future work:

• At the time of writing of this thesis, both units were trained and investigated on their own, that is, without being incorporated into the initial IPA conversational system. Thus, one of the potential directions of future work would be the embodiment of these units into the original system and the examination of possible failure cases. Furthermore, the resulting hybrid system could then imply the constraint of additional meta policies to prevent unexpected system behavior.

• Further investigation could be towards multi-language modality for the re-implemented units. Since the Online Mathematik Brückenkurs (OMB+) platform also supports English and Chinese, it would be interesting to examine whether a simple translation from the target language (i.e., English, Chinese) to the source language (i.e., German) would be sufficient to employ the already-assembled dataset and pre-trained units. Presumably, it should be achievable in the case of the Next Action Prediction (NAP) unit, but could lead to a noticeable performance drop for the Named Entity Recognition (NER) task, since there could be a considerable gap between translated and original entity spans.

• Last but not least, one of the further extensions of the IPA would be the eventual applicability to more cognitively demanding tasks. For instance, it is of interest to extend the model to more complex domains, like the handling of mathematical questions. Would it be possible to automate human assistance at least for less complicated mathematical inquiries, or is it still unachievable with currently existing NLP techniques and machine learning methods?


Part II

CLUSTERING-BASED TREND ANALYZER


5 FOUNDATIONS

As discussed in Chapter 1, predictive analytics is one of the potential Intelligent Process Automation (IPA) tasks due to its capability to forecast future events by using current and historical data and therethrough assist knowledge workers with, for instance, sales or investments. Modern machine learning algorithms allow an intelligent and fast analysis of large amounts of data. However, the evaluation of such algorithms and models remains one of the grave problems in this domain. This is mostly due to the lack of publicly available benchmarks for the evaluation of topic modeling and trend detection algorithms. Hence, in this part of the thesis, we address this challenge and assemble a benchmark for detecting both trends and downtrends (i.e., topics that become less frequent over time). To the best of our knowledge, the task of downtrend detection has not been well addressed before.

In this chapter, we introduce the concepts required by the second part of the thesis. We begin with an introduction to the topic modeling task and define the emerging trend in Section 5.1. Next, we examine the related work, existing approaches to topic modeling and trend detection, and their limitations in Section 5.2. In Chapter 6, we introduce our TRENDNERT benchmark for trend and downtrend detection and describe the method we employed for its assembly.

5.1 motivation and challenges

Science changes and evolves rapidly: novel research areas emerge while others fade away. Keeping pace with these changes is challenging. Therefore, recognizing and forecasting emerging research trends is of significant importance for researchers and academic publishers, as well as for funding agencies, stakeholders, and innovation companies.

Previously, the task of trend detection was mostly solved by domain experts who used specialized and cumbersome tools to investigate the data and get useful insights from it (e.g., through data visualization or statistics). However, manual analysis of large amounts of data can be time-consuming, and hiring domain experts is very costly. Furthermore, the overall increase of research data (i.e., Google Scholar, PubMed, DBLP) in the past decade makes the approach based on human domain experts less and less scalable. In contrast, automated approaches become a more desirable solution for the task of emerging trend identification (Kontostathis et al., 2004; Salatino, 2015). Due to the large number of electronic documents appearing on


the web daily, several machine learning techniques were developed to find patterns of word occurrences in those documents using hierarchical probabilistic models. These approaches are called topic models, and they allow the analysis of extensive textual collections and valuable insights from them. The main advantage of topic models is their ability to discover hidden patterns of word use and to connect documents containing similar patterns. In other words, the idea of a topic model is that documents are a mixture of topics, where a topic is defined as a probability distribution over words (Alghamdi and Alfalqi, 2015).

Topics have the property of being able to evolve; thus, modeling topics without considering time may confound precise topic identification. Modeling topics while considering time is called topic evolution modeling, and it can reveal important hidden information in the document collection, allowing topics to be recognized with the appearance of time. This approach also provides the ability to check the evolution of topics during a given time window and to examine whether a given topic is gaining interest and popularity over time (i.e., becoming an emerging trend) or whether its development is stable. Considering the latter approach, we turn to the task of emerging trend identification.

How is an emerging trend defined? An emerging trend is a topic that is gaining interest and utility over time, and it has two attributes: novelty and growth (Small, Boyack, and Klavans, 2014; Tu and Seng, 2012). Rotolo, Hicks, and Martin (2015) explain the correlation of these two attributes in the following way: up to the point of emergence, a research topic is characterized by a high level of novelty. At that time, it does not attract any considerable attention from the scientific community, and due to the limited impact, its growth is nearly flat. After some turning points (i.e., the appearance of significant scientific publications), the research topic starts to grow faster and may become a trend if the growth is fast; however, the level of novelty decreases continuously once the emergence becomes clear. Then, at the final phase, the topic becomes well-established while its novelty levels out (He and Chen, 2018).

To detect trends in textual data, one employs computational analysis techniques and Natural Language Processing (NLP). In the subsequent section, we give an overview of the existing approaches for topic modeling and trend detection.

5.2 detection of emerging research trends

The task of trend detection is closely associated with the topic modeling task, since in the first instance it detects the latent topics in a substantial collection of data. It secondarily employs techniques to investigate the evolution of a topic and whether it has the potential to become a trend.


Topic modeling has an extensive history of applications in the scientific domain, including but not limited to studies of temporal trends (Griffiths and Steyvers, 2004; Wang and McCallum, 2006) and the investigation of impact prediction (Yogatama et al., 2011). Below we give a general overview of the most significant methods used in this domain, explain their importance, and, if any, their potential limitations.

5.2.1 Methods for Topic Modeling

In order to extract a topic from textual documents (i.e., research papers, blog posts), a precise definition of the model for topic representation is crucial.

latent semantic analysis, formerly called Latent Semantic Indexing, is one of the fundamental methods in the topic modeling domain, proposed by Deerwester et al. in 1990. Its primary goal is to generate vector-based representations for documents and terms and then employ these representations to compute the similarity between documents (resp. terms) in order to select the most related ones. In this regard, Latent Semantic Analysis (LSA) takes a matrix of documents and terms and then decomposes it into a separate document-topic matrix and a topic-term matrix. Before turning to the central task of finding latent topics that capture the relationships among the words and documents, it also performs dimensionality reduction utilizing truncated Singular Value Decomposition. This technique is usually applied to reduce the sparseness, noisiness, and redundancy across dimensions in the document-topic matrix. Finally, using the resulting document and term vectors, one applies measures such as cosine similarity to evaluate the similarity of different documents, words, or queries.

One of the extensions to traditional LSA is probabilistic Latent Semantic Analysis (pLSA), which was introduced by Hofmann in 1999 to fix several shortcomings. The foremost advantage of the follow-up method is that it can successfully automate document indexing, which in turn improves LSA in a probabilistic sense by using a generative model (Alghamdi and Alfalqi, 2015). Furthermore, pLSA can differentiate between diverse contexts of word usage without utilizing dictionaries or a thesaurus. That leads to better polysemy disambiguation and discloses typical similarities by grouping words that share a similar context. Among the works that employ pLSA is the approach proposed by Gohr et al. (2009), where the authors apply pLSA in a window that slides across the stream of documents to analyze the evolution of topics. Also, Mei et al. (2008) use pLSA in order to create a network of topics.


latent dirichlet allocation was developed by Blei, Ng, and Jordan (2003) to overcome the limitations of both the LSA and pLSA models. Nowadays, Latent Dirichlet Allocation (LDA) is considered to be one of the most utilized and robust probabilistic topic models (a minimal code sketch follows at the end of this section). The fundamental concept of LDA is the assumption that every document is represented as a mixture of topics, where each topic is a discrete probability distribution that defines the probability of each word to appear in a given topic. These topic probabilities provide a compressed representation of a document. In other words, LDA models all of the d documents in the collection as a mixture over n latent topics, where each of the topics describes a multinomial distribution over a word w in the vocabulary (Alghamdi and Alfalqi, 2015).

LDA fixes such weaknesses of pLSA as the incapacity to assign a probability to previously unseen documents (because pLSA learns the topic mixtures only for documents seen in the training phase) and the overfitting problem (i.e., the number of parameters in pLSA increases linearly with the size of the corpus) (Salatino, 2015).

An extension of the conventional LDA is the hierarchical LDA (Griffiths et al., 2004), where topics are grouped in hierarchies. Each node in the hierarchy is associated with a topic, and a topic is a distribution across words. Another comparable approach is the Relational Topic Model (Chang and Blei, 2010), which is a mixture of a topic model and a network model for groups of linked documents. The attribute of every document is its text (i.e., discrete observations taken from a fixed vocabulary), and the links between documents are hyperlinks or citations.

LDA is a prevalent method for discovering topics (Blei and Lafferty, 2006; Bolelli, Ertekin, and Giles, 2009; Bolelli et al., 2009; Hall, Jurafsky, and Manning, 2008; Wang and Blei, 2011; Wang and McCallum, 2006). However, it has been found that training and fine-tuning a stable LDA model can be very challenging and time-consuming (Agrawal, Fu, and Menzies, 2018).

correlated topic model One of the main drawbacks of LDA is its incapability of capturing dependencies among the topics (Alghamdi and Alfalqi, 2015; He et al., 2017). Therefore, Blei and Lafferty (2007) proposed the Correlated Topic Model as an extension of the LDA approach. In this model, the authors replaced the Dirichlet distribution with a logistic-normal distribution that models pairwise topic correlations with the Gaussian covariance matrix (He et al., 2017).

However, one of the shortcomings of this model is its computational cost. The number of parameters in the covariance matrix increases quadratically with the number of topics, and parameter estimation for the full-rank matrix can be inaccurate in high-dimensional space (He et al., 2017). Therefore, He et al. (2017)


proposed their method to overcome this problem. The authors introduced a model which learns dense topic embeddings and captures topic correlations through the similarity between the topic vectors. According to He et al., their method allows efficient inference in the low-dimensional embedding space and significantly reduces the previous cubic or quadratic time complexity to linear with respect to the topic number.
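As a concrete illustration of the LDA model discussed above, the following scikit-learn sketch fits a small topic model on a toy corpus; the corpus, vectorizer settings, and number of topics are illustrative only.

```python
# Minimal scikit-learn sketch of LDA: documents as a mixture of latent topics.
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

documents = [
    "neural networks for machine translation",
    "statistical machine translation models",
    "topic models for scientific literature",
]
counts = CountVectorizer().fit_transform(documents)          # bag-of-words matrix
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)

doc_topic_mixture = lda.transform(counts)   # per-document topic proportions
topic_word_weights = lda.components_        # per-topic word weights over the vocabulary
```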

5.2.2 Topic Evolution of Scientific Publications

Although the topic modeling task is essential, it has a significant offspring task, namely the modeling of topic evolution. Methods able to identify topics within the context of time are highly applicable in the scientific domain: they allow understanding of the topic lineage and reveal how research on one topic influences another. Generally, these methods can be classified according to the primary sources of information they employ: text, citations, or key phrases.

text-based Extensive textual collections have dynamic co-occurrences of word patterns that change over time. Thus, models that track topic evolution over time should consider both the word co-occurrence patterns and the timestamps (Blei and Lafferty, 2006).

In 2006, Wang and McCallum proposed a Non-Markov Continuous Time model for the detection of topical trends in the collection of NeurIPS publications (an overall timestamp of 17 years was considered). The authors made the following assumption: if the pattern of word co-occurrence exists for a short time, then the system creates a narrow-time-distribution topic, whereas for long-term patterns the system generates a broad-time-distribution topic. The principal objective of this approach is that it models topic evolution without discretizing time, i.e., the state at time t + t1 is independent of the state at time t (Wang and McCallum, 2006).

Another work, done by Blei and Lafferty (2006), proposed the Dynamic Topic Models. The authors assumed that the collection of documents is organized based on certain time spans, and the documents belonging to every time span are represented with a K-component model. Thereby, the topics are associated with time span t and emerge from topics corresponding to time span t − 1. One of the main differences of this model compared to other existing approaches is that it utilizes a Gaussian distribution for the topic parameters instead of the conventional Dirichlet distribution and can therefore capture the evolution over various time spans.


citation-based analysis is the second popular direction that isconsidered to be effective for trend identification He et al 2009Le Ho and Nakamori 2005 Shibata Kajikawa and Takeda2009 Shibata et al 2008 Small 2006 This method assumes thatcitations in their various forms (ie bibliographic citations co-citation networks citation graphs) indicate the meaningful rela-tionship between topics and uses citations to model the topicevolution in the scientific domain Nie and Sun (2017) utilizethis approach along with the Latent Dirichlet Allocation (LDA)and k-means clustering to identify research trends The authorsfirst use LDA to extract features and determine the optimal num-ber of topics Then they use k-means to obtain thematic clus-ters and finally they compute citation functions to identify thechanges of clusters over time

Though the citation-based approachrsquos main drawback is that aconsistent collection of publications associated with a specificresearch topic is required for these techniques to detect the clus-ter and novelty of the particular topic He and Chen 2018 Fur-thermore there are also many trend detection scenarios (egresearch news articles or research blog posts) in which citationsare not readily available That makes this approach not applica-ble to this kind of data

keyphrase-based Another standard solution is based on the use of keyphrase information extracted from research papers. In this case, every keyword is representative of a single research topic. Systems like Saffron (Monaghan et al., 2010), as well as Saffron-based models, use this approach. Saffron is a research toolkit that provides algorithms for keyphrase extraction, entity linking, and taxonomy extraction. Asooja et al. (2016) utilized Saffron to extract the keywords from LREC proceedings and then proposed a trend forecast based on regression models as an extension to Saffron.

However, this method raises many questions, including the following: how to deal with the noisiness of keywords (i.e., irrelevant keywords), and how to handle keywords that have their own hierarchies (i.e., sub-areas). Other drawbacks of this approach include the ambiguity of keywords (e.g., "Java" as an island and "Java" as a programming language) and the necessity to deal with synonyms, which could be treated as different topics.

5.3 summary

An emerging trend detection algorithm aims to recognize topics that were earlier inappreciable but are now gaining importance and interest within a specific domain. Knowledge of emerging trends is especially essential for stakeholders: researchers, academic publishers, and funding organizations. Assume a business analyst in a biotech company who is interested in the analysis of technical articles for recent trends that could potentially impact the company's investments.


A manual review of all the available information in a specific domain would be extremely time-consuming or even not feasible. However, this process could be automated through Intelligent Process Automation (IPA) algorithms. Such software would assist human experts who are tasked with identifying emerging trends through the automatic analysis of extensive textual collections and the visualization of occurrences and tendencies of a given topic.


6 BENCHMARK FOR SCIENTIFIC TREND DETECTION

This chapter covers work already published at international peer-reviewed conferences. The relevant publication is Moiseeva and Schütze (2020). The research described in this chapter was carried out in its entirety by the author of the thesis. The other author of the publication acted as an advisor.

Computational analysis and modeling of the evolution of trends is a significant area of research because of its socio-economic impact. However, no large publicly available benchmark for trend detection currently exists, making a comparative evaluation of methods impossible. We remedy this situation by publishing the benchmark TRENDNERT, consisting of a set of gold trends (resp. downtrends) and document labels that is available as an unlimited download, and a large underlying document collection that can also be obtained for free. We propose Mean Average Precision (MAP) as an evaluation measure for trend detection and apply this measure in an investigation of several baselines.

6.1 motivation

What is an emerging research topic, and how is it defined in the literature? At the time of emergence, a research topic often does not attract much recognition from the scientific community and is exemplified by just a few publications. He and Chen (2018) associate the appearance of a trend with some triggering event, e.g., the publication of an article in a high-impact journal. Later, the research topic starts to grow faster and becomes a trend (He and Chen, 2018).

We adopt this as our key characterization of a trend in this paper: a trend is a research topic with a strongly increasing number of publications in a particular time interval. In contrast to trends, there are also downtrends, which refer to topics that move lower in their demand or growth as they evolve. Therefore, we define a downtrend as the converse of a trend: a downtrend is a research topic that has a strongly decreasing number of publications in a particular time interval. Downtrends can be as important to detect as trends, e.g., for a funding agency planning its budget.

To detect both types of topics (i.e., trends and downtrends) in textual data, one employs computational analysis techniques and NLP. These methods enable the analysis of extensive textual collections and provide valuable insights into topical trendiness and evolution over time. Despite the overall importance of trend analysis, there is, to our knowledge, no large publicly available benchmark for trend detection. This makes a comparative evaluation of methods and algorithms impossible. Most previous work has used proprietary or moderate-sized datasets, or evaluation measures were not directly bound to the objectives of trend detection (Gollapalli and Li, 2015).

We remedy this situation by publishing the benchmark TRENDNERT.1 The benchmark consists of the following:

1. A set of gold trends compiled based on an extensive analysis of the meta-literature.

2. A set of labeled documents (based on more than one million scientific publications), where each document is assigned to a trend, a downtrend, or the class of flat topics, and supported by the topic title.

3. An evaluation script to compute Mean Average Precision (MAP) as the measure for the accuracy of trend detection.

We make the benchmark available as unlimited downloads.2 The underlying research corpus can be obtained for free from Semantic Scholar3 (Ammar et al., 2018).

6.1.1 Outline and Contributions

In this work, we address the challenge of benchmark creation for (down)trend detection. This chapter is structured as follows: Section 6.2 introduces our model for corpus creation and its fundamental components. In Section 6.3, we present the crowdsourcing procedure for the benchmark labeling. Section 6.5 outlines the proposed baselines for trend detection, our experimental setup, and the outcomes. Finally, we illustrate the details of the distributed benchmark in Section 6.6 and conclude in Section 6.7.

Our contributions within this work are as follows:

• TRENDNERT is the first publicly available benchmark for (down)trend detection.

• TRENDNERT is based on a collection of more than one million documents. It is among the largest that have been used for trend detection and therefore offers a realistic setting for developing trend detection algorithms.

• TRENDNERT addresses the task of detecting both trends and downtrends. To the best of our knowledge, the task of downtrend detection has not been addressed before.

1 The name is a concatenation of trend and its ananym dnert, because it supports the evaluation of both trends and downtrends.

2 https://doi.org/10.17605/OSF.IO/DHZCT
3 https://api.semanticscholar.org/corpus


6.2 corpus creation

In this section, we explain the methodology we used to create the TRENDNERT benchmark.

6.2.1 Underlying Data

We employ a corpus provided by Semantic Scholar.4 It includes more than one million papers, published mostly between 2000 and 2015 in about 5,000 computer science journals and conference proceedings. Every record in the collection consists of a title, key phrases, abstract, full text, and metadata.

6.2.2 Stratification

The distribution of documents over time in the entire original dataset is skewed (see Figure 11). We discovered that clustering the entire collection, as well as a random sample, produces corrupt results because more weight is given to later years than to earlier years. Therefore, we first generated a stratified sample of the original document collection. To this end, we randomly pick 10,000 documents for each year between 2000 and 2016, for an overall sample size of 160,000.
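A minimal sketch of this stratification step is given below; it assumes the corpus is available as a list of records with a publication-year field (the field name, the seed, and the corpus format are illustrative, not the actual Semantic Scholar schema).

```python
import random

def stratified_sample(documents, start_year=2000, end_year=2015, per_year=10_000, seed=42):
    """Draw the same number of documents for every publication year.

    `documents` is assumed to be a list of dicts with a 'year' field; the
    field name and the seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    sample = []
    for year in range(start_year, end_year + 1):
        pool = [doc for doc in documents if doc.get("year") == year]
        # Sample without replacement; take the whole pool if it is smaller.
        sample.extend(rng.sample(pool, min(per_year, len(pool))))
    return sample
```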

Figure 11: Overall distribution of papers in the entire dataset. Years 1975 to 2016.

4 https://www.semanticscholar.org


6.2.3 Document Representations & Clustering

As we mentioned, this work's primary focus was the creation of a benchmark, not the development of new algorithms or models for trend detection. Therefore, for our experiments, we have selected an algorithm based on k-means clustering as a simple baseline method for trend detection.

In conventional document clustering, documents are usually represented as bag-of-words (BOW) feature vectors (Manning, Raghavan, and Schütze, 2008). Those, however, have a major weakness: they neglect the semantics of words (Le and Mikolov, 2014). Still, recent work in the representation learning domain, and particularly doc2vec – the method proposed by Le and Mikolov (2014) – can provide representations for document clustering that overcome this weakness.

The doc2vec5 algorithm is inspired by techniques for learning word vectors (Mikolov et al., 2013) and is capable of capturing semantic regularities in textual collections. It is an unsupervised approach for learning continuous distributed vector representations for text (i.e., paragraphs, documents). This approach maps documents into a vector space such that semantically related documents are assigned similar vector representations (e.g., an article about "genomics" is closer to an article about "gene expression" than to an article about "fuzzy sets"). Formally, each paragraph is mapped to an individual vector, represented by a column in matrix D, and every word is also mapped to an individual vector, represented by a column in matrix W. The paragraph vector and word vectors are concatenated to predict the next word in a context (Le and Mikolov, 2014). Other work has already successfully employed this type of document representation for topic modeling, combining it with both Latent Dirichlet Allocation (LDA) and clustering approaches (Curiskis et al., 2019; Dieng, Ruiz, and Blei, 2019; Moody, 2016; Xie and Xing, 2013).

In our work, we run doc2vec on the stratified sample of 160,000 papers and represent each document as a length-normalized vector (i.e., a document embedding). These vectors are then clustered into k = 1000 clusters using the scikit-learn6 implementation of k-means (MacQueen, 1967) with default parameters. The combination of document representations with a clustering algorithm is conceptually simple and interpretable. Also, the comprehensive comparative evaluation of topic modeling methods utilizing document embeddings performed by Curiskis et al. (2019) showed that doc2vec feature representations with k-means clustering outperform several other methods7 on three evaluation measures (Normalized Mutual Information, Adjusted Mutual Information, and Adjusted Rand Index).

5 We use the implementation provided by Gensim: https://radimrehurek.com/gensim/models/doc2vec.html
6 https://scikit-learn.org/stable/
7 Hierarchical clustering, k-medoids, NMF, and LDA.
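The following sketch illustrates this embedding-and-clustering baseline with Gensim's doc2vec and scikit-learn's k-means; only k = 1000 and the length normalization are taken from the text, while the remaining hyperparameters and variable names are illustrative assumptions.

```python
import numpy as np
from gensim.models.doc2vec import Doc2Vec, TaggedDocument
from sklearn.cluster import KMeans

def embed_and_cluster(tokenized_docs, k=1000):
    """doc2vec document embeddings followed by k-means, as a simple baseline.

    `tokenized_docs` is a list of token lists; k = 1000 follows the text,
    while the doc2vec hyperparameters below are illustrative assumptions.
    """
    corpus = [TaggedDocument(words=toks, tags=[i]) for i, toks in enumerate(tokenized_docs)]
    model = Doc2Vec(corpus, vector_size=300, min_count=5, epochs=20, workers=4)

    # Length-normalized document embeddings (gensim >= 4 exposes them as model.dv).
    vectors = np.vstack([model.dv[i] for i in range(len(corpus))])
    vectors /= np.linalg.norm(vectors, axis=1, keepdims=True)

    # Cluster the embeddings; labels[i] is the cluster index of document i.
    labels = KMeans(n_clusters=k, random_state=0).fit_predict(vectors)
    return vectors, labels
```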


We run 10 trials of stratification and clustering, resulting in 10 different clusterings. We do this to protect against the variability of clustering and because we do not want to rely on a single clustering for proposing (down)trends for the benchmark.

6.2.4 Trend & Downtrend Estimation

Recall our definitions of trend and downtrend: a trend (resp. downtrend) is defined as a research topic that has a strongly increasing (resp. decreasing) number of publications in a particular time interval.

Linear trend estimation is a widely used technique in predictive (time-series) analysis that proves statements about tendencies in the data by correlating the measurements with the times at which they occurred (Hess, Iyer, and Malm, 2001). Considering the definition of trend (resp. downtrend) and the main concept of linear trend estimation models, we utilize linear regression to identify trend and downtrend candidates in the resulting clustering sets.

Specifically, we count the number of documents per year for a cluster and then estimate the parameters of the best-fitting line (i.e., its slope). Clusters are then ranked according to the resulting slope, and the n = 150 clusters with the largest (resp. smallest) slope are selected as trend (resp. downtrend) candidates. Thus, our definition of a single-clustering (down)trend candidate is a cluster with an extreme ascending or descending slope. Figure 12 and Figure 13 give two examples of (down)trend candidates.
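A minimal sketch of this slope-based candidate selection is given below; it assumes per-document publication years and cluster labels are available. Only n = 150 and the year range follow the text; the argument names are illustrative.

```python
import numpy as np

def rank_clusters_by_slope(doc_years, cluster_labels, years=range(2000, 2016), n=150):
    """Fit a least-squares line to the yearly publication counts of every
    cluster and rank clusters by its slope; the n steepest positive and
    negative slopes are the trend and downtrend candidates, respectively."""
    years = list(years)
    slopes = {}
    for cluster in set(cluster_labels):
        counts = [
            sum(1 for y, c in zip(doc_years, cluster_labels) if c == cluster and y == year)
            for year in years
        ]
        slope, _intercept = np.polyfit(years, counts, deg=1)
        slopes[cluster] = slope
    ranked = sorted(slopes, key=slopes.get, reverse=True)
    return ranked[:n], ranked[-n:]  # (trend candidates, downtrend candidates)
```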

6.2.5 (Down)trend Candidates Validation

As detailed in Section 6.2.3, we run ten series of clustering, resulting in ten discrete clustering sets. We then estimated the (down)trend candidates according to the procedure described in Section 6.2.4. Consequently, for each of the ten clustering sets, we obtained 150 trend and 150 downtrend candidates.

Nonetheless, for the benchmark, among all the (down)trend candidates across the ten clustering sets, we want to sort out only those that are consistent over all sets. To obtain consistent (down)trend candidates, we apply the Jaccard coefficient, a metric that is used for measuring the similarity and diversity of sample sets. We define an equivalence relation ∼: two candidates (i.e., clusters) are equivalent if their Jaccard coefficient is > τ_sim, where τ_sim = 0.5. Following this notion, we compute the equivalence of (down)trend candidates across the ten sets.
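The equivalence test itself reduces to a Jaccard comparison of the document-ID sets of two clusters; a minimal sketch with τ_sim = 0.5 follows (the representation of a candidate as a set of document IDs is an assumption).

```python
def jaccard(cluster_a, cluster_b):
    """Jaccard coefficient of two clusters given as collections of document IDs."""
    a, b = set(cluster_a), set(cluster_b)
    return len(a & b) / len(a | b) if a | b else 0.0

def equivalent(cluster_a, cluster_b, tau_sim=0.5):
    """Two (down)trend candidates from different clusterings count as
    equivalent if their Jaccard coefficient exceeds tau_sim = 0.5."""
    return jaccard(cluster_a, cluster_b) > tau_sim
```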

Finally, we define an equivalence class as an equivalence class trend candidate (resp. equivalence class downtrend candidate) if it contains trend (resp. downtrend) candidates in at least half of the clusterings.


Figure 12: A trend candidate (positive slope). Topic: Sentiment and Emotion Analysis (using Social Media Channels).

Figure 13: A downtrend candidate (negative slope). Topic: XML.


This procedure gave us in total 110 equivalence class trend candidates and 107 equivalence class downtrend candidates that we then annotate for our benchmark.8

6.3 benchmark annotation

To annotate the equivalence class (down)trend candidates obtained from our model, we used the Figure-Eight9 platform. It provides an integrated quality-control mechanism and a training phase before the actual annotation (Kolhatkar, Zinsmeister, and Hirst, 2013), which minimizes the rate of poorly qualified annotators.

We design our annotation task as follows. A worker is shown a description of a particular (down)trend candidate along with a list of possible (down)trend tags. Descriptions are mixed and presented in random order. The temporal presentation order of candidates is also arbitrary. The worker is asked to pick from the list of (down)trend tags the item that matches the cluster (resp. candidate) best. Even if several items are a possible match, the worker must choose only one. If there is no matching item, the worker can select the option other. Three different workers evaluated each cluster, and the final label is the majority label. If there is no majority label, we present the cluster repeatedly to the workers until there is a majority.

The descriptors of candidates are collected through the following procedure:

• First, we review the documents assigned to a candidate cluster.

• Then, we extract keywords and titles of the papers attributed to the cluster. Semantic Scholar provides keywords and titles as part of the metadata.

• Finally, we compute the top 15 most frequent keywords and randomly select 15 titles. These keywords and titles form the descriptor of the cluster candidate presented to crowd-workers (see the sketch below).
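A minimal sketch of this descriptor-building step, assuming each document record exposes keyword and title fields (the field names are illustrative):

```python
import random
from collections import Counter

def build_descriptor(cluster_docs, n_keywords=15, n_titles=15, seed=0):
    """Descriptor of a candidate cluster shown to crowd-workers: the 15 most
    frequent keywords plus 15 randomly selected titles. `cluster_docs` is
    assumed to be a list of dicts with 'keywords' and 'title' fields."""
    counts = Counter(kw for doc in cluster_docs for kw in doc.get("keywords", []))
    top_keywords = [kw for kw, _ in counts.most_common(n_keywords)]

    titles = [doc["title"] for doc in cluster_docs if doc.get("title")]
    sampled_titles = random.Random(seed).sample(titles, min(n_titles, len(titles)))
    return top_keywords, sampled_titles
```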

We create the (down)trend tags based on keywords, titles, and metadata such as the publication venue. The cluster candidates were manually labeled by graduate employees (i.e., domain experts in the computer science domain) considering the abovementioned items.

6.3.1 Inter-annotator Agreement

To ensure the quality of the obtained annotations, we computed the Inter-Annotator Agreement (IAA), a measure of how well two (or more) annotators can make the same decision for a particular category.

8 Note that the overall number of 217 refers to the number of (down)trend candidates and not to the number of documents. Each candidate contains hundreds of documents.

9 Formerly called CrowdFlower.


To do that, we used the following metrics:

krippendorff's α is a reliability coefficient developed to measure the agreement among observers or measuring instruments that draw distinctions among typically unstructured phenomena or assign computable values to them (Krippendorff, 2011). We choose Krippendorff's α as an inter-annotator agreement measure because it – unlike other specialized coefficients – is a generalization of several reliability criteria and applies to situations like ours, where we have more than two annotators and a given annotator only labels a subset of the data (i.e., we have to deal with missing values).

Krippendorff (2011) has proposed several metric variations, each of which is associated with a specific set of weights. We use Krippendorff's α considering any number of observers and missing data. In this case, Krippendorff's α for the overall number of 217 annotated documents is α = 0.798. According to Landis and Koch (1977), this value corresponds to a substantial level of agreement (a small computation sketch follows at the end of this subsection).

figure-eight agreement rate Figure-Eight10 provides an agreement score c for each annotated unit u, which is based on the majority vote of the trusted workers. The score has been proven to perform well compared to the classic metrics (Kolhatkar, Zinsmeister, and Hirst, 2013). We computed the average C for the 217 annotated documents; C is 0.799.

Since both Krippendorff's α and the internal Figure-Eight IAA are rather high, we consider the obtained annotations for the benchmark to be of good quality.
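As a rough illustration of how such an agreement value can be computed, the sketch below uses the third-party krippendorff Python package on a toy annotator-by-unit matrix; the data and the choice of package are assumptions, not the setup behind the reported α = 0.798.

```python
import numpy as np
import krippendorff  # third-party package: pip install krippendorff

# Rows are annotators, columns are annotated units; np.nan marks units a
# given worker did not label. The tiny matrix is purely illustrative.
reliability_data = np.array([
    [0,      1, 2, np.nan, 1],
    [0,      1, 2, 0,      np.nan],
    [np.nan, 1, 1, 0,      1],
], dtype=float)

alpha = krippendorff.alpha(reliability_data=reliability_data,
                           level_of_measurement="nominal")
print(f"Krippendorff's alpha: {alpha:.3f}")
```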

6.3.2 Gold Trends

We compiled a list of computer science trends for the year 2016 based on the analysis of twelve survey publications (Ankerholz, 2016; Augustin, 2016; Brooks, 2016; Frot, 2016; Harriet, 2015; IEEE Computer Society, 2016; Markov, 2015; Meyerson and Mariette, 2016; Nessma, 2015; Reese, 2015; Rivera, 2016; Zaino, 2016). For this year, we identified a total of 31 trends; see Table 20.

Since there is variation as to the name of a specific trend, we created a unique name for each of them. Out of the 31 trends, 28 (90.3%) are instantiated as trends by our model, which we confirmed in the crowdsourcing procedure.11 We searched for the three missing trends using a variety of techniques (i.e., inspection of non-trends, downtrends, random browsing of documents not assigned to trend candidates, and keyword searches).

10 Former CrowdFlower platform.
11 The author verified this based on her judgment of the equivalence of one of the 31 gold trend names in the literature with one of the trend tags.


Two of the missing trends do not seem to occur in the collection at all: "Human Augmentation" and "Food and Water Technology". The third missing trend, "Cryptocurrency and Cryptography", does occur in the collection, but its occurrence rate is very small (ca. 0.01). We do not consider these three trends in the rest of the paper.

Computer Science Trend | Computer Science Trend
Autonomous Agents and Systems | Natural Language Processing
Autonomous Vehicles | Open Source
Big Data Analytics | Privacy
Bioinformatics and (Bio) Neuroscience | Quantum Computing
Biometrics and Personal Identification | Recommender Systems and Social Networks
Cloud Computing and Software as a Service | Reinforcement Learning
Cryptocurrency and Cryptography* | Renewable Energy
Cyber Security | Robotics
E-business | Semantic Web
Food and Water Technology* | Sentiment and Emotion Analysis
Game-based Learning | Smart Cities
Games and (Virtual) Augmented Reality | Supply Chains and RFIDs
Human Augmentation* | Technology for Climate Change
Machine/Deep Learning | Transportation and Energy
Medical Advances and DNA Computing | Wearables
Mobile Computing |

Table 20: 31 areas identified as computer science trends for the year 2016 in the media. 28 of these trends are instantiated by our model as well and are thus covered in our benchmark. Topics marked with an asterisk (*) were not found in our underlying data collection.

In summary, based on our analysis of the meta-literature, we identified 28 gold trends that also occur in our collection. We will use them as the gold standard that trend detection algorithms should aim to discover.

6.4 evaluation measure for trend detection

Whether something is a trend is a graded concept. For this reason, we adopt an evaluation measure based on a ranking of candidates, specifically Mean Average Precision (MAP). The score denotes the average of the precision values at the ranks of correctly identified and non-redundant trends.

We approach trend detection as computing a ranked list of sets of documents, where each set is a trend candidate. We refer to documents as trendy, downtrendy, and flat, depending on which class they are assigned to in our benchmark. We consider a trend candidate c as a correct recognition of gold trend t if it satisfies the following requirements:

• c is trendy,

• t is the largest trend in c,

• |t ∩ c| / |c| > ρ, where ρ is a coverage parameter,

• t was not earlier recognized in the list.

These criteria give us a true positive (i.e., recognition of a gold trend) or a false positive (otherwise) for every position of the ranked list, and a precision score for each trend. Precision for a trend that was not observed is 0. We then compute MAP; see Table 21 for an example.

Gold Trends: Cloud Computing, Bioinformatics, Sentiment Analysis, Privacy
Gold Downtrends: Compilers, Petri Nets, Fuzzy Sets, Routing Protocols

Trend Candidate           |t ∩ c|/|c|   tp/fp   P
c1   Cloud Computing      0.60          tp      1.00
c2   Privacy              0.20          fp      0.50
c3   Cloud Computing      0.70          fp      0.33
c4   Bioinformatics       0.55          tp      0.50
c5   Sentiment Analysis   0.95          tp      0.60
c6   Sentiment Analysis   0.59          fp      0.50
c7   Bioinformatics       0.41          fp      0.43
c8   Compilers            0.60          fp      0.38
c9   Petri Nets           0.80          fp      0.33
c10  Privacy              0.33          fp      0.30

Table 21: Example of the proposed MAP evaluation (with ρ = 0.5). MAP is the average of the precision values 1.0 (Cloud Computing), 0.5 (Bioinformatics), 0.6 (Sentiment Analysis), and 0.0 (Privacy), i.e., MAP = 2.1/4 = 0.525. True positive: tp. False positive: fp. Precision: P.
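The following sketch implements the ranking-based MAP computation described by the criteria above and reproduces the Table 21 example; it is a simplified stand-in, not the released evaluation script, and the (label, trendy, coverage) candidate representation is an assumption.

```python
def mean_average_precision(ranked_candidates, gold_trends, rho=0.5):
    """MAP over a ranked list of trend candidates, following the criteria
    above. Each candidate is a (label_of_largest_trend, is_trendy, coverage)
    tuple; this representation is a simplification for illustration."""
    recognized, precisions, tp = set(), {}, 0
    for rank, (label, is_trendy, coverage) in enumerate(ranked_candidates, start=1):
        correct = (is_trendy and label in gold_trends
                   and coverage > rho and label not in recognized)
        if correct:
            tp += 1
            recognized.add(label)
            precisions[label] = tp / rank
    # Gold trends that are never correctly recognized contribute precision 0.
    return sum(precisions.get(t, 0.0) for t in gold_trends) / len(gold_trends)

gold = {"Cloud Computing", "Bioinformatics", "Sentiment Analysis", "Privacy"}
candidates = [
    ("Cloud Computing", True, 0.60), ("Privacy", True, 0.20),
    ("Cloud Computing", True, 0.70), ("Bioinformatics", True, 0.55),
    ("Sentiment Analysis", True, 0.95), ("Sentiment Analysis", True, 0.59),
    ("Bioinformatics", True, 0.41), ("Compilers", False, 0.60),
    ("Petri Nets", False, 0.80), ("Privacy", True, 0.33),
]
print(round(mean_average_precision(candidates, gold), 3))  # 0.525, as in Table 21
```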

6.5 trend detection baselines

In this section, we investigate the impact of four different configuration choices on the performance of trend detection by way of ablation.


6.5.1 Configurations

abstracts vs full texts The underlying data collection contains both abstracts and full texts.12 Our initial expectation was that the full text of a document comprises more information than the abstract, and that it should therefore be a more reliable basis for trend detection. This is one of the configurations we test in the ablation: we employ our proposed method on the full-text collection (solely document texts without abstracts) and on the abstract collection (only abstract texts) separately and observe the results.

stratification Due to the uneven distribution of documents over the years, the last years (with an increased volume of publications) may get too much impact on the results compared to earlier years. To investigate this, we conduct an ablation experiment where we compare (i) 160k documents randomly sampled from the entire collection and (ii) stratified sampling. In stratified sampling, we select 10k documents for each of the years 2000–2015, resulting in a sampled collection of 160k.

length Lt of interval Clusters are ranked by trendiness, and the resulting ranked list is evaluated. To measure the trendiness or growth of topics over time, we fit a line to the points (i, n_i), i_0 ≤ i < i_0 + Lt, by linear regression, where Lt is the length of the interval, i is one of the 16 years (2000, ..., 2015), and n_i is the number of documents that were assigned to cluster c in year i. As a simple default baseline, we apply the regression to half of the entire interval, i.e., Lt = 8. There are nine such intervals in our 16-year period. As the final measure of growth for a cluster, we take the maximum of the nine individual slopes (a small computation sketch follows after this list). To determine how much this configuration choice affects our results, we also test a linear regression over the entire time span, i.e., Lt = 16. In this case, there is a single interval.

clustering method We consider Gaussian Mixture Models (GMM) and k-means. We use GMM with a spherical type of covariance, where each component has the same variance. For both, we proceed as described in Section 6.2.3.
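A minimal sketch of the windowed slope computation referenced in the "length Lt of interval" configuration; the yearly counts passed in are assumed to be precomputed per cluster.

```python
import numpy as np

def max_window_slope(yearly_counts, window=8):
    """Maximal regression slope over all contiguous intervals of length
    `window` in a cluster's yearly count series; with window = 16 over a
    16-year series there is a single interval (the whole span)."""
    counts = np.asarray(yearly_counts, dtype=float)
    slopes = []
    for start in range(len(counts) - window + 1):
        x = np.arange(start, start + window)
        slope, _ = np.polyfit(x, counts[start:start + window], deg=1)
        slopes.append(slope)
    return max(slopes)
```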

6.5.2 Experimental Setup and Results

We conducted experiments on the Semantic Scholar corpus and evaluated a ranked list of trend candidates against the benchmark, considering the configurations mentioned above. We adopted Mean Average Precision (MAP) as the principal evaluation measure. As a secondary evaluation measure, we computed recall at 50 (R50), which estimates the percentage of gold trends found in the list of the 50 highest-ranked trend candidates. We found that configuration (0) in Table 22 works best.

12 Semantic Scholar provided the underlying dataset at the beginning of our work in 2016.


We then conducted ablation experiments to discover the importance of the configuration choices.

                 (0)          (1)          (2)          (3)          (4)
document part    A            F            A            A            A
stratification   yes          yes          no           yes          yes
Lt               8            8            8            16           8
clustering       GMMsph       GMMsph       GMMsph       GMMsph       kμ
MAP              0.36 (0.03)  0.07 (0.19)  0.30 (0.03)  0.32 (0.04)  0.34 (0.21)
R50 avg          0.50 (0.12)  0.25 (0.18)  0.50 (0.16)  0.46 (0.18)  0.53 (0.14)
R50 max          0.61         0.37         0.53         0.61         0.61

Table 22: Ablation results. Standard deviations are in parentheses. GMM clustering is performed with the spherical (sph) type of covariance. K-means clustering is denoted as kμ in the ablation table.

comparing (0) and (1),13 we observed that abstracts (A) are a more reliable representation for our trend detection baseline than full texts (F). A possible explanation for this observation is that an abstract is a summary of a scientific paper that covers only the central points and is semantically very concise and rich. In contrast, the full text contains numerous parts that are secondary (i.e., future work, related work) and thus may skew the document's overall meaning.

comparing (0) and (2), we recognized that stratification (yes) improves results compared to the setting with non-stratified (no), randomly sampled data. We would expect the effect of stratification to be even more substantial for collections in which the distribution of documents over time is more skewed than in ours.

comparing (0) and (3), the length of the interval is a vital configuration choice. As we assumed, the 16-year interval is rather long, and an 8-year interval could be a better choice, primarily if one aims to find short-term trends.

comparing (0) and (4), we recognized that the topics we obtain from k-means have a similar nature to those from GMM. However, GMM, which takes variance and covariance into account and estimates a soft assignment, still performs slightly better.

6.6 benchmark description

Below we report the contents of the TRENDNERT benchmark:14

1. Two records containing information on documents: one file with the documents assigned to (down)trends and the other file with the documents assigned to flat topics.

13 Numbers (0), (1), (2), (3), (4) are the configuration choices in the ablation table.
14 https://doi.org/10.17605/OSF.IO/DHZCT


2. A script to produce document hash codes.

3. A file to map every hash code to internal IDs in the benchmark.

4. A script to compute MAP (default setting is ρ = 0.25).

5. README.md: a file with guidance notes.

Every document in the benchmark contains the following information:

• Paper ID: internal ID assigned to each document in the original Semantic Scholar collection.

• Cluster ID: unique ID of the meta-cluster where the document was observed.

• Label/Tag: name of the gold (down)trend assigned by crowd-workers (e.g., Cyber Security, Fuzzy Sets and Systems).

• Type of (down)trend candidate: trend (T), downtrend (D), or flat topic (F).

• Hash ID15 of each paper from the original Semantic Scholar collection (see the sketch below).
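A minimal sketch of the hash-code construction mentioned in footnote 15; the exact string normalization and separators used for the released benchmark are not specified here, so the plain concatenation below is an assumption.

```python
import hashlib

def paper_hash(first_author, title, year):
    """MD5 hash over first author, title, and year (cf. footnote 15). The
    exact normalization and separators used for the released benchmark are
    not specified here, so this plain concatenation is an assumption."""
    return hashlib.md5(f"{first_author}{title}{year}".encode("utf-8")).hexdigest()

print(paper_hash("Doe", "An Illustrative Paper Title", 2015))
```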

6.7 summary

Emerging trend detection is a promising task for Intelligent Process Automation (IPA) that allows companies and funding agencies to automatically analyze extensive textual collections in order to detect emerging fields and forecast the future by employing machine learning algorithms. Thus, the availability of publicly available benchmarks for trend detection is of significant importance, as only thereby can one evaluate the performance and robustness of the techniques employed.

Within this work, we release TRENDNERT – the first publicly available benchmark for (down)trend detection, which offers a realistic setting for developing trend detection algorithms. TRENDNERT also supports downtrend detection – a significant problem that was not addressed before. We also present several experimental findings on trend detection. First, stratification improves trend detection if the distribution of documents is skewed over time. Second, abstract-based trend detection performs better than full-text-based detection if straightforward models are used.

6.8 future work

There are several possible directions for future work:

• The presented approach of estimating the trendiness of a topic by means of linear regression is straightforward and can be replaced by more sophisticated alternatives, such as Kendall's τ correlation coefficient or the Least Squares Regression Smoother (LOESS) (Cleveland, 1979).

15 MD5-hash of First Author + Title + Year.



• It would also be challenging to replace the k-means clustering algorithm that we adopted within this work with the LDA model while retaining the document representations, as was done in Dieng, Ruiz, and Blei (2019). The question to investigate would then be whether topic modeling outperforms the simple clustering algorithm in this particular setting, and whether the resulting system would benefit from the topic modeling algorithm.

• Furthermore, it would be of interest to conduct more research on the following question: what kind of document representation can make effective use of the information in full texts? Recently proposed contextualized word embeddings could be a possible direction for these experiments.


7 DISCUSSION AND FUTURE WORK

As we observed in the two parts of this thesis, Intelligent Process Automation (IPA) is a challenging research area due to its multifacetedness and its application in real-world scenarios. Its main objective is to automate time-consuming and routine tasks and to free up knowledge workers for more cognitively demanding tasks. However, this task involves many difficulties, such as a lack of resources for both training and evaluation of algorithms, as well as insufficient performance of machine learning techniques when applied to real-world data and practical problems. In this thesis, we investigated two IPA applications within the Natural Language Processing (NLP) domain – conversational agents and predictive analytics – and addressed some of the most common issues:

• In the first part of the thesis, we addressed the challenge of implementing a robust conversational system for IPA purposes within the practical e-learning domain, considering the conditions of missing training data. The primary purpose of the resulting system is to perform the onboarding process between tutors and students. This onboarding reduces repetitive and time-consuming activities while allowing tutors to focus solely on mathematical questions, which is the core task with the most impact on students. Besides that, by interacting with users, the system augments the resources with structured and labeled training data that we used to implement learnable dialogue components.

The realization of such a system was associated with several challenges. Among others were missing structured data and ambiguous user-generated content that requires precise language understanding. Also, a system that simultaneously communicates with a large number of users has to maintain additional meta policies, such as fallback and check-up policies, to hold the conversation intelligently.

• Moreover, we have shown the rule-based dialogue system's potential extension using transfer learning. We utilized a small dataset of structured dialogues obtained in a trial run of the rule-based system and applied the current state-of-the-art Bidirectional Encoder Representations from Transformers (BERT) architecture to solve the domain-specific tasks. Specifically, we retrained two of the three components in the conversational system – Named Entity Recognition (NER) and the Dialogue Manager (DM) (i.e., the NAP task) – and confirmed the applicability of such algorithms in a real-world low-resource scenario by showing the reasonably high performance of the resulting models.

• Last but not least, in the second part of the thesis, we addressed the issue of missing publicly available benchmarks for the evaluation of machine learning algorithms for the trend detection task. To the best of our knowledge, we released the first open benchmark for this purpose, which offers a realistic setting for developing trend detection algorithms. Besides that, this benchmark supports downtrend detection – a significant problem that was not sufficiently addressed before. We additionally proposed an evaluation measure and presented several experimental findings on trend detection.

The ultimate objective of Intelligent Process Automation (IPA) should be the creation of robust algorithms in low-resource and domain-specific conditions. This is still a challenging task; however, some of the experiments presented in this thesis revealed that this goal could potentially be achieved by means of Transfer Learning (TL) techniques.


BIBLIOGRAPHY

Agrawal Amritanshu Wei Fu and Tim Menzies (2018) bdquoWhat iswrong with topic modeling And how to fix it using search-basedsoftware engineeringldquo In Information and Software Technology 98pp 74ndash88 (cit on p 66)

Alan M (1950) bdquoTuringldquo In Computing machinery and intelligenceMind 59236 pp 433ndash460 (cit on p 7)

Alghamdi Rubayyi and Khalid Alfalqi (2015) bdquoA survey of topicmodeling in text miningldquo In Int J Adv Comput Sci Appl(IJACSA)61 (cit on pp 64ndash66)

Ammar Waleed Dirk Groeneveld Chandra Bhagavatula Iz BeltagyMiles Crawford Doug Downey Jason Dunkelberger Ahmed El-gohary Sergey Feldman Vu Ha et al (2018) bdquoConstruction ofthe literature graph in semantic scholarldquo In arXiv preprint arXiv1805 02262 (cit on p 72)

Ankerholz Amber (2016) 2016 Future of Open Source Survey Says OpenSource Is The Modern Architecture httpswwwlinuxcomnews2016-future-open-source-survey-says-open-source-modern-

architecture (cit on p 78)Asooja Kartik Georgeta Bordea Gabriela Vulcu and Paul Buitelaar

(2016) bdquoForecasting emerging trends from scientific literatureldquoIn Proceedings of the Tenth International Conference on Language Re-sources and Evaluation (LREC 2016) pp 417ndash420 (cit on p 68)

Augustin Jason (2016) Emergin Science and Technology Trends A Syn-thesis of Leading Forecast httpwwwdefenseinnovationmarket-placemilresources2016_SciTechReport_16June2016pdf (citon p 78)

Blei David M and John D Lafferty (2006) bdquoDynamic topic modelsldquoIn Proceedings of the 23rd international conference on Machine learn-ing ACM pp 113ndash120 (cit on pp 66 67)

Blei David M John D Lafferty et al (2007) bdquoA correlated topic modelof scienceldquo In The Annals of Applied Statistics 11 pp 17ndash35 (citon p 66)

Blei David M Andrew Y Ng and Michael I Jordan (2003) bdquoLatentdirichlet allocationldquo In Journal of machine Learning research 3Janpp 993ndash1022 (cit on p 66)

Bloem Peter (2019) Transformers from Scratch httpwwwpeterbloemnlblogtransformers (cit on p 45)

Bocklisch Tom Joey Faulkner Nick Pawlowski and Alan Nichol(2017) bdquoRasa Open source language understanding and dialoguemanagementldquo In arXiv preprint arXiv171205181 (cit on p 15)

Bolelli Levent Seyda Ertekin and C Giles (2009) bdquoTopic and trenddetection in text collections using latent dirichlet allocationldquo InAdvances in Information Retrieval pp 776ndash780 (cit on p 66)


Bolelli Levent Seyda Ertekin Ding Zhou and C Lee Giles (2009)bdquoFinding topic trends in digital librariesldquo In Proceedings of the 9thACMIEEE-CS joint conference on Digital libraries ACM pp 69ndash72

(cit on p 66)Bordes Antoine Y-Lan Boureau and Jason Weston (2016) bdquoLearning

end-to-end goal-oriented dialogldquo In arXiv preprint arXiv160507683(cit on pp 16 17)

Brooks Chuck (2016) 7 Top Tech Trends Impacting Innovators in 2016httpinnovationexcellencecomblog201512267-top-

tech-trends-impacting-innovators-in-2016 (cit on p 78)Burtsev Mikhail Alexander Seliverstov Rafael Airapetyan Mikhail

Arkhipov Dilyara Baymurzina Eugene Botvinovsky NickolayBushkov Olga Gureenkova Andrey Kamenev Vasily Konovalovet al (2018) bdquoDeepPavlov An Open Source Library for Conver-sational AIldquo In (cit on p 15)

Chang Jonathan David M Blei et al (2010) bdquoHierarchical relationalmodels for document networksldquo In The Annals of Applied Statis-tics 41 pp 124ndash150 (cit on p 66)

Chen Hongshen Xiaorui Liu Dawei Yin and Jiliang Tang (2017)bdquoA survey on dialogue systems Recent advances and new fron-tiersldquo In Acm Sigkdd Explorations Newsletter 192 pp 25ndash35 (citon pp 14ndash16)

Chen Lingzhen Alessandro Moschitti Giuseppe Castellucci AndreaFavalli and Raniero Romagnoli (2018) bdquoTransfer Learning for In-dustrial Applications of Named Entity Recognitionldquo In NL4AIAI IA pp 129ndash140 (cit on p 43)

Cleveland William S (1979) bdquoRobust locally weighted regression andsmoothing scatterplotsldquo In Journal of the American statistical asso-ciation 74368 pp 829ndash836 (cit on p 84)

Collobert Ronan Jason Weston Leacuteon Bottou Michael Karlen KorayKavukcuoglu and Pavel Kuksa (2011) bdquoNatural language pro-cessing (almost) from scratchldquo In Journal of machine learning re-search 12Aug pp 2493ndash2537 (cit on p 16)

Cuayaacutehuitl Heriberto Simon Keizer and Oliver Lemon (2015) bdquoStrate-gic dialogue management via deep reinforcement learningldquo InarXiv preprint arXiv151108099 (cit on p 14)

Cui Lei Shaohan Huang Furu Wei Chuanqi Tan Chaoqun Duanand Ming Zhou (2017) bdquoSuperagent A customer service chat-bot for e-commerce websitesldquo In Proceedings of ACL 2017 SystemDemonstrations pp 97ndash102 (cit on p 8)

Curiskis Stephan A Barry Drake Thomas R Osborn and Paul JKennedy (2019) bdquoAn evaluation of document clustering and topicmodelling in two online social networks Twitter and Redditldquo InInformation Processing amp Management (cit on p 74)

Dai Zihang Zhilin Yang Yiming Yang William W Cohen Jaime Car-bonell Quoc V Le and Ruslan Salakhutdinov (2019) bdquoTransformer-xl Attentive language models beyond a fixed-length contextldquo InarXiv preprint arXiv190102860 (cit on pp 43 44)


Daniel Ben (2015) bdquoBig Data and analytics in higher education Op-portunities and challengesldquo In British journal of educational tech-nology 465 pp 904ndash920 (cit on p 22)

Deerwester Scott Susan T Dumais George W Furnas Thomas K Lan-dauer and Richard Harshman (1990) bdquoIndexing by latent seman-tic analysisldquo In Journal of the American society for information sci-ence 416 pp 391ndash407 (cit on p 65)

Deoras Anoop Kaisheng Yao Xiaodong He Li Deng Geoffrey Ger-son Zweig Ruhi Sarikaya Dong Yu Mei-Yuh Hwang and Gre-goire Mesnil (2015) Assignment of semantic labels to a sequence ofwords using neural network architectures US Patent App 14016186

(cit on p 13)Devlin Jacob Ming-Wei Chang Kenton Lee and Kristina Toutanova

(2018) bdquoBert Pre-training of deep bidirectional transformers forlanguage understandingldquo In arXiv preprint arXiv181004805 (citon pp 42ndash45)

Dhingra Bhuwan Lihong Li Xiujun Li Jianfeng Gao Yun-NungChen Faisal Ahmed and Li Deng (2016) bdquoTowards end-to-end re-inforcement learning of dialogue agents for information accessldquoIn arXiv preprint arXiv160900777 (cit on p 16)

Dieng Adji B Francisco JR Ruiz and David M Blei (2019) bdquoTopicModeling in Embedding Spacesldquo In arXiv preprint arXiv190704907(cit on pp 74 84)

Donahue Jeff Yangqing Jia Oriol Vinyals Judy Hoffman Ning ZhangEric Tzeng and Trevor Darrell (2014) bdquoDecaf A deep convolu-tional activation feature for generic visual recognitionldquo In Inter-national conference on machine learning pp 647ndash655 (cit on p 43)

Eric Mihail and Christopher D Manning (2017) bdquoKey-value retrievalnetworks for task-oriented dialogueldquo In arXiv preprint arXiv 170505414 (cit on p 16)

Fang Hao Hao Cheng Maarten Sap Elizabeth Clark Ari HoltzmanYejin Choi Noah A Smith and Mari Ostendorf (2018) bdquoSoundingboard A user-centric and content-driven social chatbotldquo In arXivpreprint arXiv180410202 (cit on p 10)

Friedrich Fabian Jan Mendling and Frank Puhlmann (2011) bdquoPro-cess model generation from natural language textldquo In Interna-tional Conference on Advanced Information Systems Engineering pp 482

ndash496 (cit on p 2)Frot Mathilde (2016) 5 Trends in Computer Science Research https

wwwtopuniversitiescomcoursescomputer-science-info-

rmation-systems5-trends-computer-science-research (citon p 78)

Glasmachers Tobias (2017) bdquoLimits of end-to-end learningldquo In arXivpreprint arXiv170408305 (cit on pp 16 17)

Goddeau David Helen Meng Joseph Polifroni Stephanie Seneffand Senis Busayapongchai (1996) bdquoA form-based dialogue man-ager for spoken language applicationsldquo In Proceeding of FourthInternational Conference on Spoken Language Processing ICSLPrsquo96Vol 2 IEEE pp 701ndash704 (cit on p 14)


Gohr Andre Alexander Hinneburg Rene Schult and Myra Spiliopou-lou (2009) bdquoTopic evolution in a stream of documentsldquo In Pro-ceedings of the 2009 SIAM International Conference on Data MiningSIAM pp 859ndash870 (cit on p 65)

Gollapalli Sujatha Das and Xiaoli Li (2015) bdquoEMNLP versus ACLAnalyzing NLP research over timeldquo In EMNLP pp 2002ndash2006

(cit on p 72)Griffiths Thomas L and Mark Steyvers (2004) bdquoFinding scientific top-

icsldquo In Proceedings of the National academy of Sciences 101suppl 1pp 5228ndash5235 (cit on p 65)

Griffiths Thomas L Michael I Jordan Joshua B Tenenbaum andDavid M Blei (2004) bdquoHierarchical topic models and the nestedChinese restaurant processldquo In Advances in neural information pro-cessing systems pp 17ndash24 (cit on p 66)

Hall David Daniel Jurafsky and Christopher D Manning (2008) bdquoStu-dying the history of ideas using topic modelsldquo In Proceedingsof the conference on empirical methods in natural language processingAssociation for Computational Linguistics pp 363ndash371 (cit onp 66)

Harriet Taylor (2015) Privacy will hit tipping point in 2016 httpwwwcnbccom20151109privacy-will-hit-tipping-point-

in-2016html (cit on p 78)He Jiangen and Chaomei Chen (2018) bdquoPredictive Effects of Novelty

Measured by Temporal Embeddings on the Growth of ScientificLiteratureldquo In Frontiers in Research Metrics and Analytics 3 p 9

(cit on pp 64 68 71)He Junxian Zhiting Hu Taylor Berg-Kirkpatrick Ying Huang and

Eric P Xing (2017) bdquoEfficient correlated topic modeling with topicembeddingldquo In Proceedings of the 23rd ACM SIGKDD InternationalConference on Knowledge Discovery and Data Mining ACM pp 225ndash233 (cit on pp 66 67)

He Qi Bi Chen Jian Pei Baojun Qiu Prasenjit Mitra and Lee Giles(2009) bdquoDetecting topic evolution in scientific literature How cancitations helpldquo In Proceedings of the 18th ACM conference on In-formation and knowledge management ACM pp 957ndash966 (cit onp 68)

Henderson Matthew Blaise Thomson and Jason D Williams (2014)bdquoThe second dialog state tracking challengeldquo In Proceedings of the15th Annual Meeting of the Special Interest Group on Discourse andDialogue (SIGDIAL) pp 263ndash272 (cit on p 15)

Henderson Matthew Blaise Thomson and Steve Young (2013) bdquoDeepneural network approach for the dialog state tracking challengeldquoIn Proceedings of the SIGDIAL 2013 Conference pp 467ndash471 (cit onp 14)

Hess Ann Hari Iyer and William Malm (2001) bdquoLinear trend analy-sis a comparison of methodsldquo In Atmospheric Environment 3530Visibility Aerosol and Atmospheric Optics pp 5211 ndash5222 issn1352-2310 doi httpsdoiorg101016S1352- 2310(01)00342-9 url httpwwwsciencedirectcomsciencearticlepiiS1352231001003429 (cit on p 75)


Hofmann Thomas (1999) bdquoProbabilistic latent semantic analysisldquo InProceedings of the Fifteenth conference on Uncertainty in artificial in-telligence Morgan Kaufmann Publishers Inc pp 289ndash296 (cit onp 65)

Honnibal Matthew and Ines Montani (2017) bdquospacy 2 Natural lan-guage understanding with bloom embeddings convolutional neu-ral networks and incremental parsingldquo In To appear 7 (cit onp 15)

Howard Jeremy and Sebastian Ruder (2018) bdquoUniversal languagemodel fine-tuning for text classificationldquo In arXiv preprint arXiv180106146 (cit on p 44)

IEEE Computer Society (2016) Top 9 Computing Technology Trends for2016 httpswwwscientificcomputingcomnews201601top-9-computing-technology-trends-2016 (cit on p 78)

Ivancic Lucija Dalia Suša Vugec and Vesna Bosilj Vukšic (2019) bdquoRobo-tic Process Automation Systematic Literature Reviewldquo In In-ternational Conference on Business Process Management Springerpp 280ndash295 (cit on pp 1 2)

Jain Mohit Pratyush Kumar Ramachandra Kota and Shwetak NPatel (2018) bdquoEvaluating and informing the design of chatbotsldquoIn Proceedings of the 2018 Designing Interactive Systems ConferenceACM pp 895ndash906 (cit on p 8)

Khanpour Hamed Nishitha Guntakandla and Rodney Nielsen (2016)bdquoDialogue act classification in domain-independent conversationsusing a deep recurrent neural networkldquo In Proceedings of COL-ING 2016 the 26th International Conference on Computational Lin-guistics Technical Papers pp 2012ndash2021 (cit on p 13)

Kim Young-Bum Karl Stratos Ruhi Sarikaya and Minwoo Jeong(2015) bdquoNew transfer learning techniques for disparate label setsldquoIn Proceedings of the 53rd Annual Meeting of the Association for Com-putational Linguistics and the 7th International Joint Conference onNatural Language Processing (Volume 1 Long Papers) pp 473ndash482

(cit on p 43)Kolhatkar Varada Heike Zinsmeister and Graeme Hirst (2013) bdquoAn-

notating anaphoric shell nouns with their antecedentsldquo In Pro-ceedings of the 7th Linguistic Annotation Workshop and Interoperabil-ity with Discourse pp 112ndash121 (cit on pp 77 78)

Kontostathis April Leon M Galitsky William M Pottenger SomaRoy and Daniel J Phelps (2004) bdquoA survey of emerging trend de-tection in textual data miningldquo In Survey of text mining Springerpp 185ndash224 (cit on p 63)

Krippendorff Klaus (2011) bdquoComputing Krippendorffrsquos alpha- relia-bilityldquo In Annenberg School for Communication (ASC) DepartmentalPapers 43 (cit on p 78)

Kurata Gakuto Bing Xiang Bowen Zhou and Mo Yu (2016) bdquoLever-aging sentence-level information with encoder lstm for semanticslot fillingldquo In arXiv preprint arXiv160101530 (cit on p 13)

Landis J Richard and Gary G Koch (1977) bdquoThe measurement ofobserver agreement for categorical dataldquo In biometrics pp 159ndash174 (cit on p 78)


Le Minh-Hoang Tu-Bao Ho and Yoshiteru Nakamori (2005) bdquoDe-tecting emerging trends from scientific corporaldquo In InternationalJournal of Knowledge and Systems Sciences 22 pp 53ndash59 (cit onp 68)

Le Quoc V and Tomas Mikolov (2014) bdquoDistributed Representationsof Sentences and Documentsldquo In ICML Vol 14 pp 1188ndash1196

(cit on p 74)Lee Ji Young and Franck Dernoncourt (2016) bdquoSequential short-text

classification with recurrent and convolutional neural networksldquoIn arXiv preprint arXiv160303827 (cit on p 13)

Leopold Henrik Han van der Aa and Hajo A Reijers (2018) bdquoIden-tifying candidate tasks for robotic process automation in textualprocess descriptionsldquo In Enterprise Business-Process and Informa-tion Systems Modeling Springer pp 67ndash81 (cit on p 2)

Levenshtein Vladimir I (1966) bdquoBinary codes capable of correctingdeletions insertions and reversalsldquo In Soviet physics doklady 8pp 707ndash710 (cit on p 25)

Liao Vera Masud Hussain Praveen Chandar Matthew Davis Yasa-man Khazaeni Marco Patricio Crasso Dakuo Wang Michael Mu-ller N Sadat Shami Werner Geyer et al (2018) bdquoAll Work andNo Playldquo In Proceedings of the 2018 CHI Conference on HumanFactors in Computing Systems ACM p 3 (cit on p 8)

Lowe Ryan Thomas Nissan Pow Iulian Vlad Serban Laurent Char-lin Chia-Wei Liu and Joelle Pineau (2017) bdquoTraining end-to-enddialogue systems with the ubuntu dialogue corpusldquo In Dialogueamp Discourse 81 pp 31ndash65 (cit on p 22)

Luger Ewa and Abigail Sellen (2016) bdquoLike having a really bad PAthe gulf between user expectation and experience of conversa-tional agentsldquo In Proceedings of the 2016 CHI Conference on HumanFactors in Computing Systems ACM pp 5286ndash5297 (cit on p 8)

MacQueen James (1967) bdquoSome methods for classification and analy-sis of multivariate observationsldquo In Proceedings of the fifth Berkeleysymposium on mathematical statistics and probability Vol 1 OaklandCA USA pp 281ndash297 (cit on p 74)

Manning Christopher D Prabhakar Raghavan and Hinrich Schuumltze(2008) Introduction to Information Retrieval Web publication at in-formationretrievalorg Cambridge University Press (cit on p 74)

Markov Igor (2015) 13 Of 2015rsquos Hottest Topics In Computer ScienceResearch httpswwwforbescomsitesquora2015042213-of-2015s-hottest-topics-in-computer-science-research

fb0d46b1e88d (cit on p 78)Martin James H and Daniel Jurafsky (2009) Speech and language pro-

cessing An introduction to natural language processing computationallinguistics and speech recognition PearsonPrentice Hall Upper Sad-dle River

Mei Qiaozhu Deng Cai Duo Zhang and ChengXiang Zhai (2008)bdquoTopic modeling with network regularizationldquo In Proceedings ofthe 17th international conference on World Wide Web ACM pp 101ndash110 (cit on p 65)


Meyerson Bernard and DiChristina Mariette (2016) These are the top10 emerging technologies of 2016 httpwww3weforumorgdocsGAC16_Top10_Emerging_Technologies_2016_reportpdf (cit onp 78)

Mikolov Tomas Kai Chen Greg Corrado and Jeffrey Dean (2013)bdquoEfficient estimation of word representations in vector spaceldquo InarXiv preprint arXiv13013781 (cit on p 74)

Miller Alexander Adam Fisch Jesse Dodge Amir-Hossein KarimiAntoine Bordes and Jason Weston (2016) bdquoKey-value memorynetworks for directly reading documentsldquo In (cit on p 16)

Mohanty Soumendra and Sachin Vyas (2018) bdquoHow to Compete inthe Age of Artificial Intelligenceldquo In (cit on pp III V 1 2)

Moiseeva Alena and Hinrich Schuumltze (2020) bdquoTRENDNERT A Bench-mark for Trend and Downtrend Detection in a Scientific DomainldquoIn Proceedings of the Thirty-Fourth AAAI Conference on Artificial In-telligence (cit on p 71)

Moiseeva Alena Dietrich Trautmann Michael Heimann and Hin-rich Schuumltze (2020) bdquoMultipurpose Intelligent Process Automa-tion via Conversational Assistantldquo In arXiv preprint arXiv200102284 (cit on pp 21 41)

Monaghan Fergal Georgeta Bordea Krystian Samp and Paul Buite-laar (2010) bdquoExploring your research Sprinkling some saffron onsemantic web dog foodldquo In Semantic Web Challenge at the Inter-national Semantic Web Conference Vol 117 Citeseer pp 420ndash435

(cit on p 68)Monostori Laacuteszloacute Andraacutes Maacuterkus Hendrik Van Brussel and E West-

kaumlmpfer (1996) bdquoMachine learning approaches to manufactur-ingldquo In CIRP annals 452 pp 675ndash712 (cit on p 15)

Moody Christopher E (2016) bdquoMixing dirichlet topic models andword embeddings to make lda2vecldquo In arXiv preprint arXiv160502019 (cit on p 74)

Mou Lili Zhao Meng Rui Yan Ge Li Yan Xu Lu Zhang and ZhiJin (2016) bdquoHow transferable are neural networks in nlp applica-tionsldquo In arXiv preprint arXiv160306111 (cit on p 43)

Mrkšic Nikola Diarmuid O Seacuteaghdha Tsung-Hsien Wen Blaise Thom-son and Steve Young (2016) bdquoNeural belief tracker Data-drivendialogue state trackingldquo In arXiv preprint arXiv160603777 (citon p 14)

Nessma Joussef (2015) Top 10 Hottest Research Topics in Computer Sci-ence http www pouted com top - 10 - hottest - research -

topics-in-computer-science (cit on p 78)Nie Binling and Shouqian Sun (2017) bdquoUsing text mining techniques

to identify research trends A case study of design researchldquo InApplied Sciences 74 p 401 (cit on p 68)

Pan Sinno Jialin and Qiang Yang (2009) bdquoA survey on transfer learn-ingldquo In IEEE Transactions on knowledge and data engineering 2210pp 1345ndash1359 (cit on p 41)

Qu Lizhen Gabriela Ferraro Liyuan Zhou Weiwei Hou and Tim-othy Baldwin (2016) bdquoNamed entity recognition for novel types


by transfer learningldquo In arXiv preprint arXiv161009914 (cit onp 43)

Radford Alec Jeffrey Wu Rewon Child David Luan Dario Amodeiand Ilya Sutskever (2019) bdquoLanguage models are unsupervisedmultitask learnersldquo In OpenAI Blog 18 (cit on p 44)

Reese Hope (2015) 7 trends for artificial intelligence in 2016 rsquoLike 2015on steroidsrsquo httpwwwtechrepubliccomarticle7-trends-for-artificial-intelligence-in-2016-like-2015-on-steroids

(cit on p 78)Rivera Maricel (2016) Is Digital Game-Based Learning The Future Of

Learning httpselearningindustrycomdigital-game-based-

learning-future (cit on p 78)Rotolo Daniele Diana Hicks and Ben R Martin (2015) bdquoWhat is an

emerging technologyldquo In Research Policy 4410 pp 1827ndash1843

(cit on p 64)Salatino Angelo (2015) bdquoEarly detection and forecasting of research

trendsldquo In (cit on pp 63 66)Serban Iulian V Alessandro Sordoni Yoshua Bengio Aaron Courville

and Joelle Pineau (2016) bdquoBuilding end-to-end dialogue systemsusing generative hierarchical neural network modelsldquo In Thirti-eth AAAI Conference on Artificial Intelligence (cit on p 11)

Sharif Razavian Ali Hossein Azizpour Josephine Sullivan and Ste-fan Carlsson (2014) bdquoCNN features off-the-shelf an astoundingbaseline for recognitionldquo In Proceedings of the IEEE conference oncomputer vision and pattern recognition workshops pp 806ndash813 (citon p 43)

Shibata Naoki Yuya Kajikawa and Yoshiyuki Takeda (2009) bdquoCom-parative study on methods of detecting research fronts using dif-ferent types of citationldquo In Journal of the American Society for in-formation Science and Technology 603 pp 571ndash580 (cit on p 68)

Shibata Naoki Yuya Kajikawa Yoshiyuki Takeda and KatsumoriMatsushima (2008) bdquoDetecting emerging research fronts basedon topological measures in citation networks of scientific publica-tionsldquo In Technovation 2811 pp 758ndash775 (cit on p 68)

Small Henry (2006) bdquoTracking and predicting growth areas in sci-enceldquo In Scientometrics 683 pp 595ndash610 (cit on p 68)

Small Henry Kevin W Boyack and Richard Klavans (2014) bdquoIden-tifying emerging topics in science and technologyldquo In ResearchPolicy 438 pp 1450ndash1467 (cit on p 64)

Su Pei-Hao Nikola Mrkšic Intildeigo Casanueva and Ivan Vulic (2018)bdquoDeep Learning for Conversational AIldquo In Proceedings of the 2018Conference of the North American Chapter of the Association for Com-putational Linguistics Tutorial Abstracts pp 27ndash32 (cit on p 12)

Sutskever Ilya Oriol Vinyals and Quoc V Le (2014) bdquoSequence tosequence learning with neural networksldquo In Advances in neuralinformation processing systems pp 3104ndash3112 (cit on pp 16 17)

Trautmann Dietrich Johannes Daxenberger Christian Stab HinrichSchuumltze and Iryna Gurevych (2019) bdquoRobust Argument Unit Recog-nition and Classificationldquo In arXiv preprint arXiv190409688 (citon p 41)


Tu Yi-Ning and Jia-Lang Seng (2012) bdquoIndices of novelty for emerg-ing topic detectionldquo In Information processing amp management 482pp 303ndash325 (cit on p 64)

Turing, Alan (1949). London Times letter to the editor (cit. on p. 7).

Ultes, Stefan, Lina Barahona, Pei-Hao Su, David Vandyke, Dongho Kim, Iñigo Casanueva, Paweł Budzianowski, Nikola Mrkšić, Tsung-Hsien Wen et al. (2017). „Pydial: A multi-domain statistical dialogue system toolkit." In: Proceedings of ACL 2017, System Demonstrations, pp. 73–78 (cit. on p. 15).

Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser and Illia Polosukhin (2017). „Attention is all you need." In: Advances in neural information processing systems, pp. 5998–6008 (cit. on p. 45).

Vincent, Pascal, Hugo Larochelle, Yoshua Bengio and Pierre-Antoine Manzagol (2008). „Extracting and composing robust features with denoising autoencoders." In: Proceedings of the 25th international conference on Machine learning. ACM, pp. 1096–1103 (cit. on p. 44).

Vodolán, Miroslav, Rudolf Kadlec and Jan Kleindienst (2015). „Hybrid dialog state tracker." In: arXiv preprint arXiv:1510.03710 (cit. on p. 14).

Wang, Alex, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy and Samuel R. Bowman (2018). „Glue: A multi-task benchmark and analysis platform for natural language understanding." In: arXiv preprint arXiv:1804.07461 (cit. on p. 44).

Wang, Chong and David M. Blei (2011). „Collaborative topic modeling for recommending scientific articles." In: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp. 448–456 (cit. on p. 66).

Wang, Xuerui and Andrew McCallum (2006). „Topics over time: a non-Markov continuous-time model of topical trends." In: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, pp. 424–433 (cit. on pp. 65–67).

Wang, Zhuoran and Oliver Lemon (2013). „A simple and generic belief tracking mechanism for the dialog state tracking challenge: On the believability of observed information." In: Proceedings of the SIGDIAL 2013 Conference, pp. 423–432 (cit. on p. 14).

Weizenbaum, Joseph et al. (1966). „ELIZA—a computer program for the study of natural language communication between man and machine." In: Communications of the ACM 9.1, pp. 36–45 (cit. on p. 7).

Wen, Tsung-Hsien, Milica Gašić, Nikola Mrkšić, Pei-Hao Su, David Vandyke and Steve Young (2015). „Semantically conditioned LSTM-based natural language generation for spoken dialogue systems." In: arXiv preprint arXiv:1508.01745 (cit. on p. 14).

Wen, Tsung-Hsien, David Vandyke, Nikola Mrkšić, Milica Gašić, Lina M. Rojas-Barahona, Pei-Hao Su, Stefan Ultes and Steve Young (2016). „A network-based end-to-end trainable task-oriented dialogue system." In: arXiv preprint arXiv:1604.04562 (cit. on pp. 11, 16, 17, 22).


Werner, Alexander, Dietrich Trautmann, Dongheui Lee and Roberto Lampariello (2015). „Generalization of optimal motion trajectories for bipedal walking." In: pp. 1571–1577 (cit. on p. 41).

Williams, Jason D., Kavosh Asadi and Geoffrey Zweig (2017). „Hybrid code networks: practical and efficient end-to-end dialog control with supervised and reinforcement learning." In: arXiv preprint arXiv:1702.03274 (cit. on p. 17).

Williams, Jason, Antoine Raux and Matthew Henderson (2016). „The dialog state tracking challenge series: A review." In: Dialogue & Discourse 7.3, pp. 4–33 (cit. on p. 12).

Wu, Chien-Sheng, Richard Socher and Caiming Xiong (2019). „Global-to-local Memory Pointer Networks for Task-Oriented Dialogue." In: arXiv preprint arXiv:1901.04713 (cit. on pp. 15–17).

Wu, Yonghui, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey et al. (2016). „Google's neural machine translation system: Bridging the gap between human and machine translation." In: arXiv preprint arXiv:1609.08144 (cit. on p. 45).

Xie, Pengtao and Eric P. Xing (2013). „Integrating Document Clustering and Topic Modeling." In: Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence (cit. on p. 74).

Yan, Zhao, Nan Duan, Peng Chen, Ming Zhou, Jianshe Zhou and Zhoujun Li (2017). „Building task-oriented dialogue systems for online shopping." In: Thirty-First AAAI Conference on Artificial Intelligence (cit. on p. 14).

Yogatama, Dani, Michael Heilman, Brendan O'Connor, Chris Dyer, Bryan R. Routledge and Noah A. Smith (2011). „Predicting a scientific community's response to an article." In: Proceedings of the Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, pp. 594–604 (cit. on p. 65).

Young, Steve, Milica Gašić, Blaise Thomson and Jason D. Williams (2013). „POMDP-based statistical spoken dialog systems: A review." In: Proceedings of the IEEE 101.5, pp. 1160–1179 (cit. on p. 17).

Zaino, Jennifer (2016). 2016 Trends for Semantic Web and Semantic Technologies. http://www.dataversity.net/2017-predictions-semantic-web-semantic-technologies (cit. on p. 78).

Zhou, Hao, Minlie Huang, et al. (2016). „Context-aware natural language generation for spoken dialogue systems." In: Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pp. 2032–2041 (cit. on pp. 14, 15).

Zhu, Yukun, Ryan Kiros, Rich Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba and Sanja Fidler (2015). „Aligning books and movies: Towards story-like visual explanations by watching movies and reading books." In: Proceedings of the IEEE international conference on computer vision, pp. 19–27 (cit. on p. 44).

