
Process Mining : A Research Agenda


  • Process Mining: A Research Agenda

    Group 2: M9301106 謝妹圜, M9401008 李宛柔, M9401304 陳志威, M9401402 林宜萱

  • Agenda

    Preface

    Introduction to Process Mining

    Challenging Problems in Process Mining

    Differences in Mining Algorithms

    Special Issue

    Conclusion

  • Preface

    The evolution of enterprise information systems: WFM, BPM, BPA.

    Flexibility, diagnosis, and simulation are becoming more important for information systems.

    The goal of process mining is to extract an explicit process model from event logs, with a focus on the causal relations between activities.

  • Process Mining (1/2)

    Method: we can construct a process model by collecting a process log that records the order in which events take place.

    Example:
    - Case 1: A, B, C, D
    - Case 2: A, C, B, D
    - Case 3: A, B, C, D
    - Case 4: A, C, B, D
    - Case 5: E, F

    B and C are in parallel.

  • Process Mining (2/2)

    From the example log (Cases 1 to 5) we can deduce, for example, the following about the process model:
    - Cases 1 to 4 start with task A and finish with task D.
    - After executing A, tasks B and C are in parallel.
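The deduction on this slide can be sketched in a few lines of Python. This is a minimal illustration in the spirit of ordering-relation approaches, not the presenters' own method: the function names `directly_follows` and `classify` are hypothetical, and the rule "x and y are parallel if each directly follows the other somewhere in the log" is an assumption made for the sketch.

```python
# Event log from the slide: one trace (ordered list of tasks) per case.
LOG = {
    "Case 1": ["A", "B", "C", "D"],
    "Case 2": ["A", "C", "B", "D"],
    "Case 3": ["A", "B", "C", "D"],
    "Case 4": ["A", "C", "B", "D"],
    "Case 5": ["E", "F"],
}


def directly_follows(log):
    """Return all pairs (x, y) where y immediately follows x in some trace."""
    return {(x, y) for trace in log.values() for x, y in zip(trace, trace[1:])}


def classify(log):
    """Split directly-follows pairs into causal and parallel relations.

    Assumption: a pair seen in both orders indicates parallelism;
    a pair seen in one order only indicates causality.
    """
    df = directly_follows(log)
    parallel = {(x, y) for (x, y) in df if (y, x) in df}
    causal = df - parallel
    return causal, parallel


causal, parallel = classify(LOG)
print("causal:  ", sorted(causal))
print("parallel:", sorted(parallel))
```

On the example log, B and C end up in the parallel set, while A precedes both of them and D follows both of them, which matches the model described on the slide.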

  • Challenging problems

    - Mining hidden tasks
    - Mining duplicate tasks
    - Mining non-free-choice constructs
    - Mining loops
    - Using time
    - Mining different perspectives
    - Dealing with noise
    - Dealing with incompleteness
    - Gathering data from heterogeneous sources
    - Visualizing results
    - Delta analysis

  • Challenging problems: Mining hidden tasks

    Suppose that both A and D are removed from the example log (Cases 1 to 5), while B and C are still in parallel.

    In this case it is still possible to construct a process model: we can detect that there must be an AND-split and an AND-join.

  • Challenging problems: Mining duplicate tasks

    We can have a process model with two nodes referring to the same task. For example, suppose task E is renamed to task B:
    - Case 1: A, B, C, D
    - Case 2: A, C, B, D
    - Case 3: A, B, C, D
    - Case 4: A, C, B, D
    - Case 5: B, F

    It is difficult to construct the process model, because it is not possible to distinguish the B in Case 5 from the Bs in the other cases.

  • Challenging problems: Mining non-free-choice constructs

    Fig. 4 shows a non-free-choice construct. After executing task C there is a choice between D and E, but it is controlled by the earlier choice between A and B, so it is not a free choice.

  • Challenging problems: Mining loops (1/2)

    In a process it may be possible to execute the same task multiple times. Fig. 5 shows an example with a loop.

    Possible event sequences are BD, BCD, BCCD, BCCCD, and so on. Loops can also be used to jump back to any place in the process.
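The family of traces listed for the loop can be described compactly. As a small sketch (the regular expression and the helper name `fits_loop` are illustrative assumptions, not part of the slides), the sequences correspond to "B, then zero or more C, then D":

```python
import re

# Assumed pattern for the loop example: B, then zero or more C, then D.
LOOP_PATTERN = re.compile(r"BC*D")


def fits_loop(trace):
    """Return True if the trace (a string of task letters) matches B C* D."""
    return LOOP_PATTERN.fullmatch(trace) is not None


for trace in ["BD", "BCD", "BCCD", "BCCCD", "BDC"]:
    print(trace, fits_loop(trace))
```

A mining algorithm faces the reverse problem: it sees only a finite number of such traces and has to generalize them to the loop.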

  • Challenging problems: Mining loops (2/2)

    There is a relation between loops and duplicate tasks.

    In Fig. 5, task A is executed multiple times (twice) but is not in a loop. In this respect task A differs from task C: A is a duplicate task, as discussed before.

  • Challenging problems: Using time

    In many cases each event in the log has a timestamp. The time information can be used for two purposes:

    - Adding time information to the process model.
      First mine the process model while ignoring the timestamps, then replay the log in the process model. This makes it easy to calculate flow time, waiting time, and processing time.

    - Improving the quality of the discovered process model.
      If two events occur within a short time interval, it is likely that there is some causal relation between them.
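As a rough sketch of the first purpose, the following Python computes flow time, processing time, and waiting time for one case. The event records are invented for illustration, times are plain minutes rather than real timestamps, and the helper `time_metrics` is a hypothetical name. The sketch also assumes that the tasks of a case do not overlap, so waiting time is simply flow time minus processing time.

```python
# Hypothetical timestamped log: (case, task, start_minute, end_minute).
EVENTS = [
    ("Case 1", "A", 0, 5),
    ("Case 1", "B", 8, 12),
    ("Case 1", "C", 12, 20),
    ("Case 1", "D", 25, 30),
]


def time_metrics(events):
    """Per case: flow, processing, and waiting time (non-overlapping tasks assumed)."""
    by_case = {}
    for case, _task, start, end in events:
        by_case.setdefault(case, []).append((start, end))

    metrics = {}
    for case, spans in by_case.items():
        # Flow time: from the first start to the last end of the case.
        flow = max(end for _, end in spans) - min(start for start, _ in spans)
        # Processing time: total time spent inside tasks.
        processing = sum(end - start for start, end in spans)
        metrics[case] = {
            "flow": flow,
            "processing": processing,
            "waiting": flow - processing,
        }
    return metrics


print(time_metrics(EVENTS))
```

For parallel tasks the subtraction no longer holds, which is one reason the slide suggests replaying the log in the mined model instead of working on the raw log alone.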

  • Challenging problems: Mining different perspectives

    - Control-flow perspective: ordering of tasks, usually including timestamps

    - Organization perspective: relations between roles and groups

    - Information perspective: control data and production data

    - Application perspective: the applications being used to execute tasks

  • Challenging problems: Dealing with noise

    Noise:
    - Incorrectly logged information
    - Information we do not need

    The mining algorithm needs to distinguish exceptions from the normal flow:
    - It must be robust to noise
    - A threshold value can be determined to cut off exceptions
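One simple way to picture the threshold idea is to count how often each "y directly follows x" pair occurs and keep only the pairs above a cut-off. The sketch below assumes a relative threshold per trace; the log, the threshold of 0.05, and the function name `frequent_follows` are all hypothetical choices for illustration.

```python
from collections import Counter


def frequent_follows(traces, threshold):
    """Keep directly-follows pairs whose occurrences per trace reach the threshold."""
    counts = Counter((x, y) for t in traces for x, y in zip(t, t[1:]))
    total = len(traces)
    return {pair for pair, n in counts.items() if n / total >= threshold}


# Hypothetical log: 99 regular traces plus one incorrectly logged trace.
TRACES = [list("ABCD")] * 49 + [list("ACBD")] * 50 + [list("ADCB")]

kept = frequent_follows(TRACES, threshold=0.05)
print(sorted(kept))
```

With this cut-off the pairs (A, D) and (D, C), which occur only in the noisy trace, are dropped. The same cut-off would also drop rare but legitimate behavior, so the choice of threshold is a trade-off.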

  • Challenging problems: Dealing with incompleteness

    Consider the following example. If we change the process such that tasks C1 to C9 are executed in parallel, then there are 10! possible routes. The log is therefore likely to be incomplete.
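To see why the log is likely to be incomplete, note that n tasks executed in parallel can be interleaved in n! different orders. A short check of the figure quoted on the slide (the function name `interleavings` is only illustrative):

```python
import math


def interleavings(n_parallel_tasks):
    """Number of possible orderings of n tasks that run in parallel."""
    return math.factorial(n_parallel_tasks)


print(interleavings(10))  # 3628800
```

A log would need millions of cases before every one of these routes could appear even once, so a mining algorithm cannot assume that it has seen all possible behavior.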

  • Challenging problems: Gathering data from heterogeneous sources

    Events may be logged at several levels or in several parts of the system, for example in an ERP system like SAP.

    It is therefore not easy to collect the event log needed for process mining.

    One approach is to use a data warehouse that extracts the information we need from these logs.

  • Challenging problems: Visualizing results

    Another challenge is to present the results of process mining in a way that lets people gain insight into the process.

    ARIS PPM is used to display performance indicators such as flow time and work in progress in a way that is easy to understand.

  • Challenging problems: Delta analysis

    Delta analysis is used to compare two models and explain their differences and commonalities. The two models are:

    - Descriptive or normative models: models drawn up by people before mining

    - Discovered models: models constructed by mining
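If both models are reduced to sets of "x is followed by y" edges, a very basic form of delta analysis is a set comparison. This is only a sketch under that simplifying assumption: the two edge sets below are invented, and `delta` is a hypothetical helper, not a function of any mining tool.

```python
def delta(predefined_edges, mined_edges):
    """Compare two models given as sets of (from_task, to_task) edges."""
    return {
        "common": predefined_edges & mined_edges,
        "only_predefined": predefined_edges - mined_edges,
        "only_mined": mined_edges - predefined_edges,
    }


# Hypothetical model drawn up before mining: strictly sequential A, B, C, D.
PREDEFINED = {("A", "B"), ("B", "C"), ("C", "D")}
# Hypothetical model obtained by mining: C may also directly follow A.
MINED = {("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")}

result = delta(PREDEFINED, MINED)
print(result)
```

Here the edge (A, C) appears only in the mined model, which points to behavior in practice that the predefined model does not allow.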

  • Differences in Mining Algorithms

    There is a strong relation between a mining algorithm and the types of problems it can handle.

    To characterize a mining algorithm, we can start with an enumeration of the types of problems, such as noise, incomplete logs, and duplicate tasks.

  • Data Mining and Process Mining (1/2)

    It is impossible to use existing data mining techniques directly for process mining, because most process mining techniques have some very specific properties.

    Process mining can be seen as a sub-domain of data mining. Relevant aspects include:
    - Inductive bias
    - Local-global dimension
    - Computational complexity
    - Memory requirements

  • Data Mining and Process Mining (2/2)

    Workflow logs can contain:
    - Information about the attributes of cases
    - The actual route taken by a case

    Traditional data mining: the mining of decision rules that predict the routing of a case

    Process mining: focuses on mining the process model

  • The Inductive Bias during Process Mining (1/5)

    Process mining searches through a large space of possible models, defined by the process representation language.

    The goal of the search is to find the process model that best fits the data in the workflow log.

  • The Inductive Bias during Process Mining (2/5)

    Process model representation languages:
    - Petri nets
    - Block-oriented process models
    - Event dependency models

    Petri nets are the more expressive representation language.

  • The Inductive Bias during Process Mining (3/5)

    Negative effects of a growing search space:
    - It makes the mining technique more sensitive to noise
    - More data is needed for successful mining
    - Computational complexity and memory requirements increase

  • The Inductive Bias during Process Mining (4/5)

    If we know that we are looking for a linear model and use linear regression as our modeling technique:
    - A few data examples are sufficient
    - The approach is less sensitive to noise
    - The computing time is shorter than in the nonlinear case

  • The Inductive Bias during Process Mining (5/5)

    If we know in advance which type of process model we are looking for, and use this information when selecting the model representation language, we have a strong inductive bias.

  • The Local-Global Dimension (1/3)

    Different strategies can be used to find the most appropriate model:

    - Local strategies: build the model step by step using local information, for example a Markovian approach

    - Global strategies: a one-strike search that uses all traces in the workflow log, for example genetic search
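As an illustration of what "local information" can mean, the sketch below estimates first-order transition probabilities, so each step looks only at the task that came immediately before. This is a generic Markov-style estimate written for this example, not the algorithm of any particular mining system; the function name `transition_probabilities` and the four traces are assumptions.

```python
from collections import Counter, defaultdict


def transition_probabilities(traces):
    """Estimate P(next task | current task) from directly-follows counts."""
    counts = defaultdict(Counter)
    for trace in traces:
        for current, following in zip(trace, trace[1:]):
            counts[current][following] += 1
    return {
        current: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for current, c in counts.items()
    }


# Assumed example traces, written as strings of task letters.
TRACES = ["ABCD", "ACBD", "ABCD", "ACBD"]

probs = transition_probabilities(TRACES)
print(probs["A"])
```

Because every estimate uses only adjacent pairs, the computation is cheap and needs little memory, but it cannot capture dependencies between tasks that are far apart in a trace.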

  • The Local-Global Dimension (2/3)

    Advantages of local strategies:
    - Less complex from a computational viewpoint
    - Lower memory requirements

    Disadvantage of local strategies:
    - Locally optimal steps do not guarantee a globally optimal process model, for example in the non-free-choice problem

    Advantage of global strategies:
    - More robust to noise

  • The Local-Global Dimension (3/3)

    Combining local and global strategies:
    - A local search approach is used to build the model
    - A global check is then performed on the whole model and all data in the workflow log

  • Special Issue

    The special issue introduces 6 selected papers on process mining:
    - The first 3 papers describe systems for mining complete process models
    - The 4th paper focuses on the problem of detecting concurrent behavior
    - The last 2 papers provide information about some global properties

  • Workflow Mining with InWoLvE

    An overview of the algorithms implemented within the InWoLvE workflow mining system.

    InWoLvE solves the workflow mining problem in 2 steps:
    - Create a stochastic activity graph from the example set
    - Transform this graph into a workflow model

  • Mining Exact Models of Concurrent Workflow

    An approach to mine exact workflow models from workflow logs.

    It uses a block-oriented representation language.

    Advantage: the resulting workflow models are always exact (complete, specific, and so on).

    Disadvantage: the inductive bias of the mining technique.

  • Discovering Workflow Models from Activities Lifespans

    An extension of the work of Agrawal with time information.

    Presents 2 new algorithms for mining process models out of workflow logs.

    The number of excess and absent edges in the resulting graphs is smaller than with the old algorithm.

  • Discovering Models of Behavior for Concurrent Workflow

    Focuses on the concurrent behavior of processes.

    Applies a probabilistic analysis to the workflow event traces.

    Discovers patterns by using metrics for the number, frequency, and regularity of event occurrences.

  • Discovery of Temporal Patterns from Process Instances

    Focuses on discovering frequently occurring temporal patterns.

    Defines the temporal pattern discovery problem and evaluates 3 temporal pattern discovery algorithms.

  • Business Process Intelligence

    BPI supports business and IT users in managing process execution quality.

    It provides several features:
    - Analysis
    - Prediction
    - Monitoring
    - Control
    - Optimization

  • Conclusion

    Gave an introduction to process mining.

    Illustrated the potential of process mining and its challenging problems: hidden tasks, duplicate tasks, non-free-choice constructs, loops, time, noise, and so on.

    The aim is to trigger new research efforts to solve some of these problems.