Process Mining: A Research Agenda. Group 2: M9301106 謝妹圜, M9401008 李宛柔, M9401304 陳志威, M9401402 林宜萱.
Agenda
Preface
Introduction to Process Mining
Challenging Problems in Process Mining
Differences in Mining Algorithms
Special Issue
Conclusion
Preface
The evolution of enterprise information systems: from WFM to BPM to BPA
Flexibility, diagnosis, and simulation are increasingly important for information systems.
The goal of process mining is to extract an explicit process model from event logs, with a focus on causal relations between activities.
Process Mining (1/2)
Method
We can construct a process model by collecting a process log that records the order in which events take place.
Example
Case 1: A, B, C, D
Case 2: A, C, B, D
Case 3: A, B, C, D
Case 4: A, C, B, D
Case 5: E, F
B and C are in parallel.
Process Mining (2/2)
From this log we can deduce, for example, the following process model:
Case 1: A, B, C, D
Case 2: A, C, B, D
Case 3: A, B, C, D
Case 4: A, C, B, D
Case 5: E, F
- The process starts with task A and finishes with task D.
- After executing A, tasks B and C are in parallel.
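The deduction on this slide can be reproduced with a small sketch. It assumes a simple directly-follows heuristic: two tasks are treated as parallel when each is seen directly after the other, and as causally ordered when only one direction is seen. The slides do not name a specific algorithm, so the heuristic and the function names below are illustrative assumptions.

```python
# Event log from the slide: one list of task names per case.
log = [
    ["A", "B", "C", "D"],
    ["A", "C", "B", "D"],
    ["A", "B", "C", "D"],
    ["A", "C", "B", "D"],
    ["E", "F"],
]

def directly_follows(traces):
    """Return every pair (x, y) where y occurs immediately after x in some trace."""
    pairs = set()
    for trace in traces:
        pairs.update(zip(trace, trace[1:]))
    return pairs

def parallel_pairs(traces):
    """Pairs observed in both orders are assumed to be parallel (assumption)."""
    df = directly_follows(traces)
    return {tuple(sorted((x, y))) for (x, y) in df if x != y and (y, x) in df}

def causal_pairs(traces):
    """Pairs observed in one order only are assumed to be causal (assumption)."""
    df = directly_follows(traces)
    return {(x, y) for (x, y) in df if (y, x) not in df}

print(parallel_pairs(log))        # {('B', 'C')}
print(sorted(causal_pairs(log)))  # [('A', 'B'), ('A', 'C'), ('B', 'D'), ('C', 'D'), ('E', 'F')]
```

The result matches the slide: A precedes both B and C, both precede D, and B and C are parallel.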
Challenging problems
- Mining hidden tasks
- Mining duplicate tasks
- Mining non-free-choice constructs
- Mining loops
- Using time
- Mining different perspectives
- Dealing with noise
- Dealing with incompleteness
- Gathering data from heterogeneous sources
- Visualizing results
- Delta analysis
Challenging problems: Mining hidden tasks
Suppose that both A and D are removed from the log below. B and C are still in parallel, so in this case it is still possible to construct a process model.
Case 1: A, B, C, D
Case 2: A, C, B, D
Case 3: A, B, C, D
Case 4: A, C, B, D
Case 5: E, F
We can detect that there is an AND-split and an AND-join.
Challenging problems: Mining duplicate tasks
A process model can have two nodes that refer to the same task. For example, suppose task E is renamed to task B:
Case 1: A, B, C, D
Case 2: A, C, B, D
Case 3: A, B, C, D
Case 4: A, C, B, D
Case 5: B, F
It is difficult to construct a process model from this log, because it is not possible to distinguish one B from the other.
Challenging problems: Mining non-free-choice constructs
Fig. 4 shows a non-free-choice construct. After executing task C there is a choice between D and E, but it is controlled by the earlier choice between A and B, so it is not a free choice.
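Why this construct is hard to mine can be sketched as follows. The example assumes that choosing A leads to D and choosing B leads to E (the slide does not say which pairing Fig. 4 uses), and it rebuilds the model from directly-follows pairs only. All names are illustrative.

```python
# Assumed log for the non-free-choice construct: A forces D, B forces E.
log = [["A", "C", "D"], ["B", "C", "E"]]

def successors(traces):
    """Map each task to the set of tasks seen directly after it."""
    succ = {}
    for trace in traces:
        for x, y in zip(trace, trace[1:]):
            succ.setdefault(x, set()).add(y)
    return succ

def generated_traces(traces):
    """Enumerate every start-to-end path allowed by the directly-follows graph."""
    succ = successors(traces)
    starts = {t[0] for t in traces}
    ends = {t[-1] for t in traces}
    found = set()

    def walk(path):
        if path[-1] in ends:
            found.add(tuple(path))
        for nxt in succ.get(path[-1], ()):
            if nxt not in path:  # do not revisit tasks, so the walk terminates
                walk(path + [nxt])

    for start in starts:
        walk([start])
    return found

spurious = generated_traces(log) - {tuple(t) for t in log}
print(sorted(spurious))  # [('A', 'C', 'E'), ('B', 'C', 'D')]
```

A model built only from local successor information allows A, C, E and B, C, D, which the log never contains. The dependency between the first and the second choice is lost.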
Challenging problems: Mining loops (1/2)
In a process it may be possible to execute the same task multiple times. Fig. 5 shows an example with a loop.
Possible event sequences are BD, BCD, BCCD, BCCCD, ...
Loops can also be used to jump back to any place in the process.
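A minimal sketch of how the simplest kind of loop shows up in a log: a task that directly follows itself. It only covers such length-one loops and is an illustration, not the method of the slides.

```python
def self_loops(traces):
    """Return the tasks that directly follow themselves in at least one trace."""
    return {x for trace in traces for x, y in zip(trace, trace[1:]) if x == y}

# The event sequences listed on the slide, one character per task.
traces = [list("BD"), list("BCD"), list("BCCD"), list("BCCCD")]
print(self_loops(traces))       # {'C'}
print(self_loops(traces[:2]))   # set()
```

With only BD and BCD in the log, C looks like an optional task. The loop becomes visible only once a trace with at least two consecutive C events has been recorded.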
Challenging problems: Mining loops (2/2)
There is a relation between loops and duplicate tasks.
In Fig. 5 task A is executed multiple times (twice) but is not in a loop. In this respect task A differs from task C: task A is a duplicate task, as mentioned before.
Challenging problems: Using time
In many cases, the log entry of each event has a timestamp. The time information can be used for two purposes:
1. Adding time information to the process model.
First mine the process model while ignoring the timestamps, then replay the log in the process model. This makes it easy to calculate flow time, waiting time, and processing time.
2. Improving the quality of the discovered process model.
If two events occur within a short time interval, it is likely that there is some causal relation between them.
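The first purpose can be sketched for flow time, taken here as the time between the first and the last event of a case. The event rows, timestamps, and function name are made up for illustration. Waiting time and processing time would additionally need separate start and completion events, which this sketch does not model.

```python
from datetime import datetime, timedelta

# Hypothetical timestamped log rows: (case id, task, ISO timestamp).
events = [
    ("case1", "A", "2024-01-01T09:00"),
    ("case1", "B", "2024-01-01T09:30"),
    ("case1", "D", "2024-01-01T11:00"),
    ("case2", "A", "2024-01-01T10:00"),
    ("case2", "D", "2024-01-01T10:45"),
]

def flow_times(rows):
    """Flow time per case: last timestamp minus first timestamp."""
    first, last = {}, {}
    for case, _task, stamp in rows:
        t = datetime.fromisoformat(stamp)
        first[case] = min(first.get(case, t), t)
        last[case] = max(last.get(case, t), t)
    return {case: last[case] - first[case] for case in first}

times = flow_times(events)
mean = sum(times.values(), timedelta()) / len(times)
print(times["case1"])  # 2:00:00
print(mean)            # 1:22:30
```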
Challenging problems: Mining different perspectives
Control-flow perspective
- Ordering of tasks, usually including timestamps
Organization perspective
- Relations between roles and groups
Information perspective
- Control data and production data
Application perspective
- The applications being used to execute tasks
Challenging problems: Dealing with noise
Noise
- Incorrectly logged information
- Information we do not need
The mining algorithm needs to distinguish exceptions from the normal flow.
Being robust to noise
- Determine a threshold value to cut off exceptions
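The threshold idea can be sketched as a frequency filter on directly-follows pairs. The log, the threshold value, and the function name are illustrative assumptions.

```python
from collections import Counter

def frequent_successions(traces, threshold):
    """Keep only the directly-follows pairs observed at least `threshold` times."""
    counts = Counter()
    for trace in traces:
        counts.update(zip(trace, trace[1:]))
    return {pair for pair, n in counts.items() if n >= threshold}

# Hypothetical log: nine normal cases and one incorrectly logged case.
log = [["A", "B", "C"]] * 9 + [["A", "C", "B"]]

print(sorted(frequent_successions(log, threshold=1)))
# [('A', 'B'), ('A', 'C'), ('B', 'C'), ('C', 'B')]
print(sorted(frequent_successions(log, threshold=2)))
# [('A', 'B'), ('B', 'C')]
```

A threshold that is too high also discards rare but legitimate behavior, so the value has to be chosen with the process in mind.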
Challenging problems: Dealing with incompleteness
See the example below. If we change the process such that tasks C1 to C9 are executed in parallel, then there are 10! possible routes, so the log is likely to be incomplete.
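The size of the problem follows from a counting argument: n tasks that are all parallel to each other can be logged in n! different orders. The sketch below checks this on three tasks by enumeration and then evaluates the factorial quoted on the slide. The function name is illustrative.

```python
from itertools import permutations
from math import factorial

def interleavings(tasks):
    """All orders in which fully parallel tasks can appear in a log."""
    return set(permutations(tasks))

small = interleavings(["C1", "C2", "C3"])
print(len(small), factorial(3))  # 6 6
print(factorial(10))             # 3628800
```

A log would need millions of distinct cases to show every route, which is why a mining algorithm cannot assume that all possible behavior has been observed.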
Challenging problems: Gathering data from heterogeneous sources
Events may be logged at several levels or in several parts of the system, for example in an ERP system like SAP.
It is not easy to collect the event log for process mining.
One approach is to use a data warehouse which extracts the information we need from these logs.
Challenging problems: Visualizing results
Another challenge is to present the results of process mining in a way that lets people gain insight from them.
ARIS PPM is used to display performance indicators such as flow time and work in progress in a way that is easy to understand.
Challenging problems: Delta analysis
Delta analysis is used to compare two models and explain their differences and commonalities. The two models are:
Descriptive or normative models
- The model that has been drawn up by people before mining
Reference models
- The model constructed after mining
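A minimal sketch of the comparison, assuming each model is reduced to a set of directed edges between tasks. Real delta analysis works on richer models, and the two edge sets and all names below are made up for illustration.

```python
def delta(predefined, mined):
    """Split the edges of two models into shared and model-specific parts."""
    return {
        "common": predefined & mined,
        "only_predefined": predefined - mined,
        "only_mined": mined - predefined,
    }

# Hypothetical models: one drawn up by people, one constructed by mining.
predefined = {("A", "B"), ("B", "C"), ("C", "D")}
mined = {("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")}

result = delta(predefined, mined)
print(sorted(result["common"]))           # [('A', 'B'), ('C', 'D')]
print(sorted(result["only_predefined"]))  # [('B', 'C')]
print(sorted(result["only_mined"]))       # [('A', 'C'), ('B', 'D')]
```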
Differences in Mining Algorithms
There is a strong relation between the mining algorithm and the type of problems.
To characterize a mining algorithm, we can start with an enumeration of the types of problems:
- Noise, incomplete logs, duplicate tasks.
Data Mining and Process Mining (1/2)
It is impossible to use existing data mining techniques directly for process mining.
- Most process mining techniques have some very specific properties.
Process mining can be seen as a sub-domain of data mining.
- Inductive bias
- Local-global dimension
- Computational complexity
- Memory requirement
Data Mining and Process Mining (2/2)
Workflow logs can contain:
- Information about the attributes of cases
- The actual route taken by a case
Traditional data mining
- The mining of decision rules that predict the routing of a case
Process mining
- Focuses on mining the process model
The Inductive Bias during Process Mining Algorithm (1/5)
Mining means searching through a large space of possible models defined by the process representation language.
The goal of the search is to find the process model that best fits the data in the workflow log.
The Inductive Bias during Process Mining Algorithm (2/5)
Process model representation languages
- Petri nets
- Block-oriented process models
- Event dependency models
Petri nets are a more expressive representation language.
The Inductive Bias during Process Mining Algorithm (3/5)
Negative effects as the size of the search space grows:
- Makes the mining technique more sensitive to noise
- Needs more data for successful mining
- Has a negative effect on computational complexity and memory requirements
The Inductive Bias during Process Mining Algorithm (4/5)
If we know that we are looking for a linear model and use linear regression as our modeling technique:
- A few data examples are sufficient
- The approach is less sensitive to noise
- The computing time is shorter than in the non-linear case
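The linear regression analogy can be made concrete with a small sketch. Five noisy samples of an assumed line y = 2x + 1 are fitted twice: once with a strong bias (a straight line, by least squares) and once with a weak bias (the degree-4 polynomial through every point). The data and function names are illustrative.

```python
def fit_line(xs, ys):
    """Least-squares straight line: returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx

def interpolate(xs, ys, x):
    """Value at x of the Lagrange polynomial passing through every sample."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

xs = [0, 1, 2, 3, 4]
ys = [1.1, 2.9, 5.1, 6.9, 9.1]  # y = 2x + 1 with noise of +/-0.1

slope, intercept = fit_line(xs, ys)
line_pred = slope * 6 + intercept   # prediction at x = 6, true value 13
poly_pred = interpolate(xs, ys, 6)
print(round(line_pred, 2), round(poly_pred, 2))  # approximately 13.02 and 25.9
```

The line recovers the underlying relation from five samples, while the more expressive polynomial fits the noise and extrapolates badly. The same trade-off applies to the choice of process representation language.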
The Inductive Bias during Process Mining Algorithm (5/5)
If we know in advance which type of process model we are looking for, and use this information when selecting the model representation language:
- We have a strong inductive bias
The Local-Global Dimension (1/3)
Different strategies can be used to find the most appropriate model.
Local strategies: step by step, using local information
- Markovian approach
Global strategies: one-strike search, using all traces in the workflow log
- Genetic search
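A local strategy in the Markovian spirit can be sketched as follows: for each task, estimate from the log how often each other task comes directly after it. This is a generic first-order estimate under the assumption that only the previous task matters, not the algorithm of any particular paper, and the names are illustrative.

```python
from collections import Counter, defaultdict

def transition_probabilities(traces):
    """Relative frequency of each direct successor, per task."""
    counts = defaultdict(Counter)
    for trace in traces:
        for x, y in zip(trace, trace[1:]):
            counts[x][y] += 1
    return {
        x: {y: n / sum(c.values()) for y, n in c.items()}
        for x, c in counts.items()
    }

log = [
    ["A", "B", "C", "D"],
    ["A", "C", "B", "D"],
    ["A", "B", "C", "D"],
    ["A", "C", "B", "D"],
]

probs = transition_probabilities(log)
print(probs["A"])  # {'B': 0.5, 'C': 0.5}
print(probs["B"])  # {'C': 0.5, 'D': 0.5}
```

Each estimate looks at one pair of neighboring events at a time, which keeps computation and memory low but ignores dependencies that span more than one step.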
The Local-Global Dimension (2/3)
Advantages of local strategies
- Less complex from a computational viewpoint
- Lower memory requirement
Disadvantage of local strategies
- Locally optimal steps do not guarantee a globally optimal process model
- For example: the non-free-choice problem
Advantage of global strategies
- More robust to noise
The Local-Global Dimension (3/3)
Combining local and global strategies
- A local search approach is used
- A global check is performed on the whole model and all data in the workflow log
Special Issue
Introduces 6 selected papers on process mining
- The first 3 papers describe systems for mining complete process models
- The 4th paper focuses on the problem of detecting concurrent behavior
- The last 2 papers introduce information about some global properties
Workflow Mining with InWoLvE
An overview of the algorithms implemented within the InWoLvE workflow mining system
InWoLvE solves the workflow mining problem in 2 steps:
1. Create a stochastic activity graph from the example set
2. Transform this graph into a workflow model
Mining Exact Models of Concurrent Workflow
An approach to mine exact workflow models from workflow logs
Uses a block-oriented representation language
Advantage
- The resulting workflow models are always exact (complete, specific, ...)
Disadvantage
- The inductive bias of the mining technique
Discovering Workflow Models from Activities Lifespans
An extension of the work of Agrawal with time information
Presents 2 new algorithms for mining process models out of workflow logs
The number of excess and absent edges in the resulting graphs is smaller than with the old algorithm
Discovering Models of Behavior for Concurrent Workflow
Focuses on the concurrent behavior of processes
A probabilistic analysis of the workflow event traces
Discovers patterns by using metrics for the number, frequency, and regularity of event occurrences
Discovery of Temporal Patterns from Process Instances
Focuses on discovering frequently occurring temporal patterns
Defines the temporal pattern discovery problem and evaluates 3 temporal pattern discovery algorithms
Business Process Intelligence
BPI supports business and IT users in managing process execution quality
It provides several features:
- Analysis
- Prediction
- Monitoring
- Control
- Optimization
Conclusion
Introduced process mining
Illustrated the potential of process mining and the challenging problems in process mining
- Hidden tasks, duplicate tasks, non-free-choice constructs, loops, time, noise, and so on
The aim is to trigger new research efforts to solve some of these problems