Break It Down: A Comparison of Macro- and Microtasks

Preview:

Citation preview

Break It Down: A Comparison of Macro- and Microtasks + CHI 2015 -Justin Cheng et al. /최주은 x 2015 Autumn

Why this paper

Break It Down…

“overwhelming task can sometimes be transformed into a set of smaller, more manageable micro tasks …”

- 효과적인 업무 처리에 관한 내용인가...?로 시작 - 익숙한 듯 낯선 단어 Crowdsourcing이 보임 - 그렇다면 협업에 관련된 내용인가...?

1

Why this paper

Author

Stanford University

Justin Cheng

PhD student in Computer Science

Michael S. Bernstein

Assistant Professor in Computer Science

Microsoft

Shamsi T. Iqbal

Researcher in the neXus group

Jaime Teevan

Researcher

2

Introduction

Macrotasks ▶ Microtasks

Macrotasks

Macrotasks를 쪼개어 일하는 방식 -> Microtasks 큰 일을 작은 단위로 나누고, 분업하여, 합쳐짐

Microtasks

Be transformed into number of smaller one

3

ex) create a taxonomy of the UX

Introduction

Crowdsourcing

Microtask의 가장 대표적인 행태는 크라우드소싱(crowdsourcing) 집합적 지식활동을 의미, Taxonomy와 copyediting에서 자주 볼 수 있음(대표적인 예, ‘위키피디아’)

4

Research Question

5

같은 일을 두 가지 방식(Macrotask & Microtask) 중 하나로 처리할때 발생할 수 있는 이익과 비용 문제를 일을 처리하는데 소요되는 시간과 완성된 일의 질(quality)로 판단

Macrotasks Microtasks

vs

Microtasks가 짧은 시간에 고품질의 결과를 낼 것이다?

Method

- 실험 참가자 N=110 대상

- 42명: arithmetic/ 28명: sorting/ 40명: transcription

- ‘microtasking platform’ 사용

receipt arithmetic

line sorting

audio transcription

sum the cost and their cost

sort line

transcribe 30 second audio clip

* 위 타입별로 Macro냐 Micro냐에 따라 수행 방식이 다름

Interruptions

Task Type & format

to add a pair of two-digit numbers and select an answer from a set of five possible choices

* 랜덤하게 10초 간격으로 8번

Variables

6

Figure 1. Screenshots of the three task types, showing both the macro- and microtask structure.

Task Format

While the three task types can be performed as completemacrotasks, each has also been explored in the crowdsourcingliterature broken down into microtasks (Figure 1).

Arithmetic: In the macro version, participants were shown areceipt listing 10 items and their cost. They were then askedto sum the costs and enter the total. In the micro version,participants were shown a single line at a time and asked toadd the cost of the displayed item to a running total.

Sorting: The macrotask showed seven lines of text and askedthe participant to sort them by dragging and dropping eachline into its new position. The microtask implemented ahuman-powered quicksort [7]. Participants compared pairsof lines, selecting the line with more odd numbers.

Transcription: The macro audio transcription task involvedtranscribing a complete 30 second audio clip. The microtasksplit the clip into five parts using prosodic pauses. Workerswere then asked to transcribe each part individually. Similarcrowdsourced approaches include Legion:Scribe [5], wheredifferent workers capture different parts of a clip in real-time.

Interruptions

In some conditions, we interrupted participants with distrac-tor tasks as they worked. When interruptions occurred, theyappeared eight times at random intervals, spaced roughly 10seconds apart, without regard to where in the underlying taskthe participant was. The interruption task was modal; itblocked the interrupted task and had to be completed beforethat task could be resumed.

The interruption task required participants to add a pair oftwo-digit numbers and select an answer from a set of five pos-sible choices. The choices were designed such that the incor-rect options appeared similar to the correct answer. To stopparticipants from randomly selecting an answer, they were re-quired to select the correct answer in order to continue. A pi-lot version that required participants to type their answer into

a text box revealed similar results. Arithmetic tasks are com-monly used in the literature to distract attention when study-ing interruptions [9, 1].

Experimental Design

The study followed a 3 (task type) × 2 (task format) × 2(interruptions) design. The experiment was between-subjectswith respect to task type and within-subjects with respect totask format and the presence of interruptions. Each partic-ipant was assigned a task type and completed four tasks ofthat type, performing the macro and micro versions, with andwithout interruptions. The order of these tasks was random-ized. To familiarize participants with their assigned task typeand the interruption task, participants initially completed apractice task with interruptions. After completing all trials,participants were asked if they preferred the macro- or mi-crotask structure, and completed a NASA Task Load Index(TLX) assessment [3] to measure subjective mental work-load. The entire process took under 30 minutes.

Measures

For each condition, we measured the total time it took to com-plete the task excluding the time spent on interruptions, andthe average amount of time it took to complete each interrup-tion task. To further control for external interruptions, we alsotracked any loss of browser window focus during each taskand excluded participants who left a task for more than tenseconds. We also measured the quality of work performed.For the receipt arithmetic task we measured the probabilitythat the final answer was incorrect. For the sorting task, wemeasured the Kendall’s tau distance of the submitted rankingfrom the correct one, and for audio transcription we measuredthe word error rate.

To calculate significance we used a linear mixed-effectsmodel, with participant as a random effect. p-values were cal-culated using an F-test with a Kenward-Roger correction. Amixed-design ANOVA resulted in empirically similar results.Error bars in figures depict standard errors.

2

Experimental Design

- 3X2X2 피험자내(within-subjects), 피험자간(between-subjects )설계로 진행 - 3 (task type) -(within-subjects)

- 2 (task format) - (between-subjects )

- 일을 완성하는 시간과 완성된 일의 에러를 측정

(task type)

(task format)

7

Result

Coef. SE df(num/den) F p-value

Task Time (R2 = 0.77)

(Arithmetic Macrotask) 55.2 11.8 0.00∗∗

Task Type 2/181 59.8 0.00∗∗

Sorting 29.7 17.6 0.00∗∗

Transcription 134 16.2 0.00∗∗∗

Microtask 18.9 9.30 1/377 13.3 0.00∗∗∗

Interrupted 12.8 9.30 1/377 6.00 0.01∗

Type × Micro 2/377 4.12 0.02∗

Type × Interrupted 2/377 0.42 0.61

Microtask × Interrupted -3.30 13.2 1/377 2.64 0.10ˆ

Type × Micro × Interrupted 2/377 0.46 0.63

Task Error Rate (R2 = 0.37)

(Arithmetic Macrotask) 0.10 0.04 0.00∗∗

Task Type 2/143 9.38 0.00∗∗

Sorting 0.23 0.06 0.00∗∗

Transcription 0.04 0.05 0.00∗∗∗

Microtask -0.05 0.05 1/377 15.3 0.00∗∗∗

Interrupted 0.03 0.05 1/377 0.07 0.78

Type × Micro 2/377 0.79 0.46

Type × Interrupted 2/377 0.29 0.75

Micro × Interrupted -0.02 0.06 1/377 0.01 0.93

Type × Micro × Interrupted 2/377 0.13 0.88

Table 1. A linear mixed-effects model for both task time and task er-ror rate (ˆ: p<0.1, ∗: p<0.05, ∗∗: p<0.01, ∗∗∗: p<0.001). Some non-significant effects elided for space.

Participants

A total of 110 people participated in the experiment, with42 in the arithmetic condition, 28 in the sorting condition,and 40 in the transcription condition. The experiment wasconducted using an in-house microtasking platform that out-sources crowd work to vendors, similar to CrowdFlower. Weused the Clickworker vendor, with all participants from theUnited States. Empirically similar observations were ob-tained when testing each condition on Mechanical Turk. Par-ticipants were compensated $3.00 and repeat participationwas disallowed.

RESULTS

Completing a task via microtasks took longer overall thancompleting it as a macrotask, but the microtask format ledto fewer mistakes, an easier experience, and greater stabilityin the face of interruption. Table 1 shows the strength of theimpact of task type, task format, and the presence of inter-ruptions on task time (in seconds) and task error rate. Theintercept corresponds to the arithmetic task in a macrotaskformat without interruptions.

Time

For the three task types that we studied, overall, completingthe task via microtasks took longer than doing it as a macro-task (p<.001). For instance, the uninterrupted arithmetic tasktook approximately 42 seconds longer when broken downthan whole (Figure 2). This was expected, as each task typerequires some additional work when broken into microtasks

Arithmetic Sorting Transcription

0

50

100

150

200

Macro Micro Macro Micro Macro Micro

Task

Tim

e Ex

clud

ing

Inte

rrupt

ions

(s)

Uninterrupted Interrupted

Figure 2. Microtasks incur a significant additional fixed cost. Interrup-tions cause an increase in task time for macrotasks but not microtasks.

Arithmetic Sorting Transcription

0.0

0.1

0.2

0.3

0.4

Macro Micro Macro Micro Macro Micro

Erro

r Rat

e

Uninterrupted Interrupted

Figure 3. Microtasks significantly reduce task error rates for a task. Thepresence of interruptions does not impact task quality.

to support the added structure (e.g., in the arithmetic task, par-ticipants enter many additional numbers to keep a running to-tal in the microtask condition, but only a single number in themacrotask condition), and corroborates prior work showingthat introducing breaks between microtasks increases overallcompletion time far beyond a continuous microtask workflow[6].

However, while arithmetic and sorting took significantlylonger as microtasks, this was not the case for transcription.We hypothesize that users may have been implicitly break-ing the transcription macrotask similarly to how the micro-tasks were structured (i.e. listening in parts): participantsstarted, paused or stopped the audio a median of 5.3 timesin the macrotask, but only 10.2 times in the microtask condi-tion, where the minimum number of clicks is ten (to start andstop each of five segments).

Quality

The microtask structure produced higher quality work for allthree task types, with participants making fewer errors whendoing a task as a series of microtasks than as a macrotask(p<.001). The gray bars in Figure 3 represent the number oferrors made for each task type in the uninterrupted condition.One explanation for this is that the additional work required tosupport the microtasks helped externalize some of the mentalprocessing necessary to complete the tasks, which improvedperformance. In the case of transcription, splitting audio atshorter segments of about six seconds at prosodic pauses mayhave been be a good balance between keeping subtasks rela-tively short, while still maintaining adequate context.

3

- Microtasks의 업무 완성 시간이 더 짧았는가?

- 업무 완성 시간에 대한 Interruption의 영향은?

- Microtasks의 업무 완성 시간이 더 길었음

- Interruptions은 업무 시간을 증가시키는 경향이 있음 (Macrotasks의 경우 유의미한 차이가 있음)

Time

8

Coef. SE df(num/den) F p-value

Task Time (R2 = 0.77)

(Arithmetic Macrotask) 55.2 11.8 0.00∗∗

Task Type 2/181 59.8 0.00∗∗

Sorting 29.7 17.6 0.00∗∗

Transcription 134 16.2 0.00∗∗∗

Microtask 18.9 9.30 1/377 13.3 0.00∗∗∗

Interrupted 12.8 9.30 1/377 6.00 0.01∗

Type × Micro 2/377 4.12 0.02∗

Type × Interrupted 2/377 0.42 0.61

Microtask × Interrupted -3.30 13.2 1/377 2.64 0.10ˆ

Type × Micro × Interrupted 2/377 0.46 0.63

Task Error Rate (R2 = 0.37)

(Arithmetic Macrotask) 0.10 0.04 0.00∗∗

Task Type 2/143 9.38 0.00∗∗

Sorting 0.23 0.06 0.00∗∗

Transcription 0.04 0.05 0.00∗∗∗

Microtask -0.05 0.05 1/377 15.3 0.00∗∗∗

Interrupted 0.03 0.05 1/377 0.07 0.78

Type × Micro 2/377 0.79 0.46

Type × Interrupted 2/377 0.29 0.75

Micro × Interrupted -0.02 0.06 1/377 0.01 0.93

Type × Micro × Interrupted 2/377 0.13 0.88

Table 1. A linear mixed-effects model for both task time and task er-ror rate (ˆ: p<0.1, ∗: p<0.05, ∗∗: p<0.01, ∗∗∗: p<0.001). Some non-significant effects elided for space.

Participants

A total of 110 people participated in the experiment, with42 in the arithmetic condition, 28 in the sorting condition,and 40 in the transcription condition. The experiment wasconducted using an in-house microtasking platform that out-sources crowd work to vendors, similar to CrowdFlower. Weused the Clickworker vendor, with all participants from theUnited States. Empirically similar observations were ob-tained when testing each condition on Mechanical Turk. Par-ticipants were compensated $3.00 and repeat participationwas disallowed.

RESULTS

Completing a task via microtasks took longer overall thancompleting it as a macrotask, but the microtask format ledto fewer mistakes, an easier experience, and greater stabilityin the face of interruption. Table 1 shows the strength of theimpact of task type, task format, and the presence of inter-ruptions on task time (in seconds) and task error rate. Theintercept corresponds to the arithmetic task in a macrotaskformat without interruptions.

Time

For the three task types that we studied, overall, completingthe task via microtasks took longer than doing it as a macro-task (p<.001). For instance, the uninterrupted arithmetic tasktook approximately 42 seconds longer when broken downthan whole (Figure 2). This was expected, as each task typerequires some additional work when broken into microtasks

Arithmetic Sorting Transcription

0

50

100

150

200

Macro Micro Macro Micro Macro Micro

Task

Tim

e Ex

clud

ing

Inte

rrupt

ions

(s)

Uninterrupted Interrupted

Figure 2. Microtasks incur a significant additional fixed cost. Interrup-tions cause an increase in task time for macrotasks but not microtasks.

Arithmetic Sorting Transcription

0.0

0.1

0.2

0.3

0.4

Macro Micro Macro Micro Macro Micro

Erro

r Rat

e

Uninterrupted Interrupted

Figure 3. Microtasks significantly reduce task error rates for a task. Thepresence of interruptions does not impact task quality.

to support the added structure (e.g., in the arithmetic task, par-ticipants enter many additional numbers to keep a running to-tal in the microtask condition, but only a single number in themacrotask condition), and corroborates prior work showingthat introducing breaks between microtasks increases overallcompletion time far beyond a continuous microtask workflow[6].

However, while arithmetic and sorting took significantlylonger as microtasks, this was not the case for transcription.We hypothesize that users may have been implicitly break-ing the transcription macrotask similarly to how the micro-tasks were structured (i.e. listening in parts): participantsstarted, paused or stopped the audio a median of 5.3 timesin the macrotask, but only 10.2 times in the microtask condi-tion, where the minimum number of clicks is ten (to start andstop each of five segments).

Quality

The microtask structure produced higher quality work for allthree task types, with participants making fewer errors whendoing a task as a series of microtasks than as a macrotask(p<.001). The gray bars in Figure 3 represent the number oferrors made for each task type in the uninterrupted condition.One explanation for this is that the additional work required tosupport the microtasks helped externalize some of the mentalprocessing necessary to complete the tasks, which improvedperformance. In the case of transcription, splitting audio atshorter segments of about six seconds at prosodic pauses mayhave been be a good balance between keeping subtasks rela-tively short, while still maintaining adequate context.

3

- 더 양질의 업무를 보인것은 어떤 양식인가?

- Quality에 대한 Interruption의 영향은?

- Microtasks의 에러 발생율이 낮음. 즉, 높은 퀄리티를 제공함

- Interruptions은 Microtasks에 거의 영향을 주지 않음.(Macrotasks의 경우 유의미한 차이가 있음)

Quality

Result

9

- Microtasks의 경우 시간은 Macro보다 더 소요되지만, 고품질의 결과를 만들어냄 - Interruptions에 영향을 받지 않음(일의 단위가 짧기 때문)

RQ. Microtasks가 짧은 시간에 고품질의 결과를 낼 것인가

Result

10

Conclusion

“Microtasks could enable people to complete tasks

in interruption-driven environments in a structured way

that requires less cognitive effort, and in spare moments of time. ”

- Microtasks가 방해 요인이 있는 환경에서 인지적 부담이 덜하고, 틈틈이 일을 완성할 수 있다

11