Final Year Project: Keystroke Analytics

Candidate: 77415 - James Green

2013


Contents

1 Introduction
  1.1 Biometric Keystroke Authentication
  1.2 Overview of Existing Work

2 Research Goals & Hypotheses
  2.1 How People Intuitively Fragment a Passphrase
  2.2 When learning to type a passphrase, how long does it take an individual to produce a stable signature?
  2.3 Keystroke-biometric Authentication
  2.4 Approach
  2.5 Professional Considerations
    2.5.1 Public Interest
    2.5.2 Professional Competence and Integrity
    2.5.3 Duty to Relevant Authority

3 Method
  3.1 Design of Experiment
    3.1.1 High Level Design
    3.1.2 Construction of Passphrase
    3.1.3 Length, Frequency & Format of Experiment
    3.1.4 Display and entry of pass-phrase
    3.1.5 Entry devices
    3.1.6 Data Security
    3.1.7 Summary
  3.2 Implementation
    3.2.1 Technologies
  3.3 Running the Experiment
  3.4 Analysis Techniques
    3.4.1 Using Student's T-test for Authentication
    3.4.2 Using Self-organising Maps for Identification

4 Results
  4.1 Passphrase Fragmentation
  4.2 Learning & Stabilisation
    4.2.1 Entry Duration
    4.2.2 Passphrase Observation
    4.2.3 Travel-time Stability
  4.3 Identification & Authentication
    4.3.1 T-test Authentication
    4.3.2 Self-organising Maps
    4.3.3 Dwell Timings

5 Discussion
  5.1 Evaluation of Hypotheses
    5.1.1 How people intuitively fragment a passphrase
    5.1.2 When learning to type a passphrase, how long does it take an individual to produce a stable signature?
    5.1.3 Keystroke-biometric authentication
  5.2 Limitations
  5.3 Conclusion

6 Appendices
  6.1 Digraph Timing Charts

1 Introduction

1.1 Biometric Keystroke Authentication

The fact that computers regularly store private, sensitive and classified information makes it very important that we can confidently identify their users. Traditionally this has been achieved through password authentication systems. However, these systems are far from perfect. For instance, if a password becomes compromised it is no longer adequate for authenticating its rightful owner.

In the hope of improving on this, there is ongoing research into utilising the idiosyncrasies of a user's interaction with a computer as a form of authentication. So far in this field the most promising techniques focus on patterns in the timing of a user's typing. We shall refer to this as biometric keystroke authentication. Compared to more conventional biometrics such as fingerprint or iris authentication, it offers the following advantages:

It does not require special tools or hardware, only a conventional keyboard.

It is non-invasive for the user.

It can easily be deployed in conjunction with existing authentication systems.

It can be collected without the user's knowledge.

In terms of combining with existing systems, much research has gone into investigating the viability of using the typing behaviour present upon the entry of password credentials as an additional layer of authentication. This technique could potentially overcome the shortcomings of passwords, as not only must the password be known, but it must be entered in the manner of the legitimate user. While this technique is promising, there has been little work that takes into account how a user learns to type credentials. Yet acknowledging this learning process could be a very important feature of a viable biometric keystroke authentication system.

To investigate this issue, this project will examine the timing patterns a user generates when they type a password-like phrase on a conventional keyboard. For each entry of a passphrase we will capture a temporal signature comprising the precise timings of when each key is pressed, the duration it is held down for (known as dwell) and the point at which it is released.

To gather this data we shall get a group of volunteers to learn a passphrase by repeatedly typing it into a computer. Our experiment will be set up like a traditional password entry system, where the passphrase must be entered correctly for the volunteer to proceed. The computer will be running a program that records the precise timings of each keystroke, such that afterwards we can easily run the data through a series of analysis and visualisation techniques.

This data will be used to explore several questions relating to the way in which people learn to type passwords. In general we are hoping to answer the question: Can we identify people by the way they type a password? But in terms of specific research goals, we are hoping to explore:

How do users intuitively fragment a passphrase?

When learning to type a passphrase, how long does it take people to produce a stable signature?

For a given passphrase, can we distinguish between people who spent time learning it, and those who didn't?

For a given passphrase, can we distinguish between two individuals who have both learned the passphrase?

It is hoped that answering these questions will provide valuable information for cognitive science, and practical insights for the development of biometric keystroke authentication systems.

1.2 Overview of Existing Work

Over the last few decades keystroke biometrics has been the subject of a variety of research. Rather than attempt to give a complete overview of the field, we shall focus on work that is pertinent to this report.

Gains et al [1] provide one of the first documented attempts at using typing to identify a user. This experiment saw seven professional typists enter three large bodies of text into a PDP-11 computer, which was running a program that recorded the timings between each keystroke. Using this dataset, they proceeded to investigate the time delays between certain pairs of consecutively typed letters, known as digraphs (e.g. "th" or "ng"). By performing statistical analysis on this data they showed that it was possible to use digraphs to distinguish between the different typists. While this is encouraging, its relevance to authentication is limited by the fact that the experiment was based upon a sizeable corpus of natural language, yet passwords are small and are not constrained to natural language.

Using digraphs is an approach that was expanded upon by Leggett et al [2], who integrated this technique into a dynamic system that performed continuous authentication as the user performed their everyday tasks on the computer. While this situation is very different from the more static verification we are exploring, its accompanying report does

2 Research Goals & Hypotheses

There are many issues, questions and investigations that could be pursued in this field. Due to the time constraints of this project and report we have chosen to look into three specific issues. In doing so, we wish to corroborate some of the existing research in this field, and perhaps investigate some of these questions in a novel way.

2.1 How People Intuitively Fragment a Passphrase

When people initially type a passphrase they have never seen before, the passphrase often gets fragmented into smaller pieces. That is, the passphrase is entered in a succession of quick bursts of keystrokes, separated by longer pauses. This is caused by the way in which human cognition breaks information into chunks. Put more simply, consider the situation where a computer user is presented with printed material displaying a random alphanumeric passphrase that they must enter into a computer. Some people will fail to hold the entire passphrase in working memory, so instead of entering it in one complete motion, they look back and forth between the printed phrase and the computer, entering the passphrase in stages.

When carrying out memorisation tasks like this, our performance, and hence the way in which we fragment a passphrase, is underpinned by our ability to utilise working memory. This link has been examined by many studies, but of particular relevance to passphrase-like sequences is the work of McLean & Gregg [6], who showed how it is possible to use the pauses in the verbal recital of a memorised sequence of random letters to reveal the structure of how the sequence is chunked in working memory.

Chunks are a fundamental principle of memory. They serve as a cognitive compression mechanism that groups many pieces of information into a smaller number of larger items. One might argue that the fragmentation of a passphrase seems exactly the opposite (the decomposition of a large item into several smaller ones). However, if you consider each letter in the phrase as an individual item, the task can instead be thought of as how we merge these letters together into reasonably sized groups.

While we can certainly expect to find chunking patterns in the timing of a passphrase entry, it is still a reasonably open question as to how this chunking behaviour can be used to identify individuals. Existing work in keystroke biometrics mostly builds models only around learned phrases that produce stable patterns, and tends not to consider the patterns found in the first few entries of a new passphrase as material for identification. Consequently, in this investigation we are going to study these initial patterns in an attempt to determine the extent to which the fragmentation process varies between individuals, and hence carries identifying information. Figure 2.1 shows how two people might type in the same passphrase differently.

Figure 2.1: Diagram showing two users performing fragmentation of text differently

Examining fragmentation behaviour could yield valuable insights for cognitive science. Certainly, results from this experiment should be examined in the context of existing research in this field. In terms of practical use, it is possible that authentication systems could benefit from an understanding of how users fragment passphrases they are unfamiliar with. In essence, it could comprise part of a profile of their typing behaviour.

There has already been some work that backs up this idea. For example, Banikowski et al [7] give a comprehensive overview of how the ability to organise and digest information naturally differs from one individual to the next. They note that this performance is not fixed, and improves when the user is familiar with the domain involved. Certainly, in the context of our experiment, an individual who frequently has to memorise passphrase-like sequences is more likely to be efficient and swift in their adaptation to novel phrases. Indeed, Banikowski et al demonstrate how this behaviour can be learnt and outline a variety of deliberate strategies for increasing the efficiency of working memory.

This leads to our first hypothesis: Due to variations in human chunking abilities, the way in which we perform fragmentation of a passphrase varies between individuals. If this is true, then we can consider how we might model this behaviour such that it forms part of a user's biometric profile.

To get insight into how people perform this fragmentation process, we need to examine a data set of the temporal signatures produced from volunteers typing in passwords they have never seen before. Ideally we should include a diverse range of users. To perform analysis, there should be a robust way of detecting fragmentation patterns.
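As a rough illustration of what such detection might look like, the sketch below splits a single entry into fragments wherever the gap between two successive key presses is unusually long. The function name `split_into_fragments`, the parameter `pause_factor` and the rule itself (a gap longer than a multiple of the median gap) are assumptions made for this example, not the detection method used in the project.

```python
from statistics import median


def split_into_fragments(keys, press_times_ms, pause_factor=2.0):
    """Group keystrokes into bursts separated by unusually long pauses.

    A gap between two successive key presses is treated as a fragment
    boundary when it exceeds pause_factor times the median gap of the
    entry. Both the rule and the default factor are illustrative
    assumptions, not values taken from the experiment.
    """
    if len(keys) != len(press_times_ms):
        raise ValueError("keys and press_times_ms must have the same length")
    if not keys:
        return []

    # Time elapsed between each key press and the next one.
    gaps = [b - a for a, b in zip(press_times_ms, press_times_ms[1:])]
    if not gaps:
        return ["".join(keys)]

    threshold = pause_factor * median(gaps)
    fragments = [keys[0]]
    for key, gap in zip(keys[1:], gaps):
        if gap > threshold:
            fragments.append(key)   # long pause: start a new fragment
        else:
            fragments[-1] += key    # short gap: extend the current burst
    return fragments
```

Using the median rather than the mean keeps a handful of very long pauses from inflating the threshold, which seems a reasonable property for first entries where look-backs at the prompt are common.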

2.2 When learning to type a passphrase, how long does it take an individual to produce a stable signature?

It has long been known that when acquiring a new skill, our progress tends to follow a learning curve, where our performance of a task increases logarithmically with respect to the number of times it is performed [8]. Typically after a certain number of repetitions, performance will hit a plateau at which its rate of increase will become very small. It is at this point where the task is often considered to be learned as well as it reasonably can be. In the context of learning a passphrase, performance can be measured as the time taken to perform a correct entry. By extension, this is a measurement of the speed and confidence at which an individual recalls a passphrase from memory.

To recall a passphrase it must first be encoded into long term memory. In the case of passwords, the standard technique is to use repetition. Not only does this keep the password active in working memory but it helps preserve the order of the characters in the password. As typing a password is a physical task, its transition to long term memory will be aided by what is known as procedural learning. This is a mechanism that encodes physical tasks in long term memory as a series of implicit memories. These are implicit in the sense that the repetitions will reinforce each other without us consciously recalling previous repetitions when performing the task.

Once a password is learned in this manner, the task of typing it will involve minimal engagement of working memory, with less conscious awareness of the characters that constitute it. This effect should be particularly pronounced in skilled typists, whose speed of typing will eventually transform the high level task of memorisation and recall into a lower level subconscious motor task. This effect is observed by De Luca et al [9], who demonstrated through trials that people entering a personal identification number (PIN) into a cash machine have a tendency not to remember the numbers, but instead to combine the spatial movements between individual numbers into a shape or pattern that is mentally superimposed onto the machine's number entry pad.

Given this form of learning, we can expect individuals to progress to a point where they no longer have to refer to written material to enter the password and can recall it completely from memory. As this progression happens, we can expect the entry time to decrease. Furthermore, existing work in biometric keystroke authentication suggests that during this process, the keystroke timings will converge onto reasonably consistent patterns. This is corroborated by the theory of implicit memory, which indicates that the temporal patterns should cumulatively reinforce each other by way of repeated motor actions.

This helps form our second hypothesis: Over several trials, the timing signatures produced by an individual learning to type a passphrase will stabilise by converging to a certain length and pattern. If this is indeed the case, we shall attempt to determine approximately how long it takes to generate a stable signature, and whether this time period is roughly uniform between all volunteers.
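To make the notion of a stable signature more concrete, the following sketch shows one way it might be measured: slide a window over consecutive entries and report the first window in which the timings vary only slightly. The name `first_stable_entry`, the window size, the tolerance and the use of the coefficient of variation are all assumptions chosen for illustration; the report does not prescribe this measure.

```python
from statistics import mean, pstdev


def first_stable_entry(signatures, window=5, tolerance=0.1):
    """Return the index of the first entry that begins a stable window.

    signatures: chronological list of entries, each an equal-length list
    of digraph latencies in milliseconds.
    A window is called stable when the coefficient of variation
    (standard deviation divided by mean), averaged over all digraph
    positions, is below tolerance. Returns None if no window qualifies.
    The window size and tolerance are illustrative assumptions.
    """
    for start in range(len(signatures) - window + 1):
        block = signatures[start:start + window]
        cvs = []
        # zip(*block) yields, per digraph position, its latency in each entry.
        for column in zip(*block):
            m = mean(column)
            cvs.append(pstdev(column) / m if m > 0 else 0.0)
        if cvs and mean(cvs) < tolerance:
            return start
    return None
```

A relative measure is used here so that slow and fast typists can be compared against the same tolerance; an absolute threshold in milliseconds would be an equally plausible choice.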

Being able to model the learning process in terms of keystroke signatures could be valuable in augmenting password authentication. In particular, a machine learning technique which performs authentication based on signatures might only want to form a model of the user's password entry once the user had reached a point of regularly entering the password in a stable fashion.

To investigate this, we need data of volunteers entering a password in a repeated fashion until they produce a stable signature. To help ensure this, we should use existing literature to establish a lower boundary for the number of trials to schedule. To make sure the results are not tied to one particular passphrase, it would be desirable to study different groups learning different passwords.

2.3 Keystroke-biometric Authentication

The essence of this investigation is to determine the extent to which it is possible to identify users based on the temporal signature generated when they type a passphrase on a keyboard and, if so, whether this can be used as a biometric to augment computer authentication systems. This idea is based on the assumption that different users demonstrate noticeably different temporal patterns when typing the same text. In terms of authentication, it is hoped that a malicious user with stolen credentials will enter them in a way that is distinguishable from the genuine user.

The use of human interaction with technology as a basis for identification actually predates the modern computer. For instance, in World War II, military intelligence could identify the sender of a Morse code telegram based on the rhythm of the dots and dashes [4]. Each operator had a distinctive style, known as his "fist", which according to investigations by Bryan & Harter [10] was typically enough to identify a sender. They reported that one experienced operator, who worked with over 40 operators on a regular basis, had little trouble distinguishing between their various rhythms, speeds and syncopations.

Operating a telegraph and typing on a keyboard share a lot in common. They are both learned motor skills that involve us translating human language into mechanical motions. The level of expressiveness these motions afford is surprisingly high. For instance, Epp et al [11] showed that it is possible to determine the emotional state of a computer user to a reasonable degree of accuracy, based solely on their typing style.

As explored in the previous sections, existing research has shown that the expressiveness found in typing behaviour can yield positive results for both authentication and identification using keystroke dynamics. Based upon this evidence we can expect to find strong patterns in the data we capture and hope to confirm that keystroke biometrics is a viable technique.

Out of all the techniques which have been explored, the use of self-organising maps (SOMs) is one that has been relatively neglected. This is surprising as SOMs are naturally good at identifying patterns in high dimensional data and have the ability to cluster information in an unsupervised manner. Joshi et al [12] successfully used self-organising maps for authentication by creating a small one-dimensional map for each user. However, there are still uses of SOMs that remain unexplored, such as placing several users on one SOM to reveal the differences between them. In this experiment we shall investigate the feasibility of this as a technique for user identification.

However, like other non-linear techniques such as neural networks, SOMs are not intuitive in the sense that while they can produce impressive results, it is not always clear how or why these results are achieved. Sometimes this is not so much an issue, but in this case we feel it is important to get a more intuitive grasp of the shape and form of the data. To do so we shall supplement the use of SOMs with statistical analysis and bespoke visualisation.

As previously mentioned, we aim to use these techniques to uncover idiosyncratic features of an individual's typing style. Such features are likely to be distinguished by two general properties. Firstly, they must be uniquely identifying of an individual, which may not be a problem in data sets with only a few people, but given a large number of people to identify, this is a serious consideration. Secondly, these features must be stable and not fluctuate wildly. In other words there has to be a good signal-to-noise ratio. Good robust features should be both unique and stable.

The most likely place to start looking for these properties is not in the raw timestamps of each key-press, but in the delay times between successive keystrokes. Like much of the existing literature, we shall refer to these timings as digraphs. The relationship between raw timings, dwells and digraphs can be seen in figure 2.2. To be able to properly address the question of authentication and identification we need a selection of passphrases, each of which is learned by several volunteers, who enter their chosen passphrase multiple times. In addition, it would be useful if for each of the passphrases we had a variety of people perform one-off entries. This would enable us to compare not just between people who have learned a passphrase, but between those who did and those who did not.

Figure 2.2: How digraphs are extracted from a temporal signature
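The extraction of dwells and digraphs from a temporal signature can be sketched in a few lines. The `KeyEvent` record and both function names are hypothetical, and the digraph latency is assumed here to be measured from one key press to the next; a release-to-press definition would be an equally valid reading of "delay between successive keystrokes".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyEvent:
    """One keystroke in a temporal signature (hypothetical record layout)."""
    key: str
    press_ms: float
    release_ms: float


def dwell_times(events):
    """Duration each key was held down, in milliseconds."""
    return [e.release_ms - e.press_ms for e in events]


def digraph_latencies(events):
    """Delay between successive key presses, labelled by the key pair.

    Assumption: latency is measured press-to-press. Returns a list of
    (digraph, latency_ms) tuples in typing order.
    """
    return [
        (a.key + b.key, b.press_ms - a.press_ms)
        for a, b in zip(events, events[1:])
    ]
```

For an entry of n keystrokes this gives n dwell values and n - 1 digraph values, so a fixed passphrase always produces a feature vector of the same length.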

To conclude this section, our third hypothesis is that it is possible to distinguish between individuals using the temporal signatures produced from keystroke timings. (Biometric Keystroke Authentication)

2.4 Approach

To answer these hypotheses we shall experimentally gather the required data. We will then analyse this data using a combination of visualisation, graphing, statistical tests and self-organising maps. This data shall come from volunteers interacting with bespoke software that we shall design and implement. To ensure that the data is close to what might be used in a functioning keystroke authentication system, we shall attempt to provide a realistic experience for the volunteers.

In terms of what techniques we are to use for each hypothesis, our justification is as follows:

| Hypothesis | Technique | Reasoning |
|---|---|---|
| 1 | Visualisation of fragment patterns. | Fragmentation can be visualised very naturally. Doing so will allow us to get a quick and comprehensive understanding of how volunteers go about this process. |
| 1 | Collation of common timing patterns. | This will allow us to get a quantitative measure of the diversity of fragmentation patterns. |
| 2 | Investigation of average entry times. | This will provide a coarse but effective idea of volunteers' rate of progress. It will also allow us to easily examine our work in the context of established theories of the learning curve. |
| 2 | Visualisation of volunteers' passphrase observation behaviour. | By looking at how and when the volunteers elect to observe the passphrase, we can get a good idea as to how their confidence develops when entering the passphrase. |
| 2 | Charts of digraph timings. | This will allow us to investigate a volunteer's progression towards a stable passphrase entry. We will also be able to cross-reference this with our other visualisations to see how fragmentation patterns persist beyond the first few entries. |
| 3 | T-test based authentication. | T-test based authentication is a well established way of performing authentication with temporal signatures. But as this investigation will consider the effect learning a phrase has on the resulting temporal signature, we can re-examine t-test authentication in the context of learned and unlearned impostors. |
| 3 | Self-organising maps. | The dimensionality reducing abilities of self-organising maps make them a natural candidate for keystroke biometrics, yet there have been few studies that explore them in this context. |

Like any experiment involving volunteers, there are certain legal and ethical frameworks that we must operate within. We shall explore these issues in the next section.

2.5 Professional Considerations

This project must comply with the Code of Conduct and Code of Practice published by the British Computer Society. The following sections address this project's applicability to each section of the code of conduct.

2.5.1 Public Interest

This project has little direct application to this section of the code of conduct. However, if elements of this research are used in authentication software, then arguably this increased security is in the public interest. Certainly, no one is harmed by developing keystroke dynamics as a biometric. On the contrary, keystroke dynamics is much less intrusive than traditional biometrics such as retinal or fingerprint scanning.

The only other relevant issue is that we must ensure there is no discrimination with regard to who gets chosen for participation in the experiment.

2.5.2 Professional Competence and Integrity

This section applies to how we conduct this research in the context of academia. In particular we must respect and value the viewpoints of other academics and accept criticisms of our work. It is worth noting that we may offer volunteers entry into a prize draw for a small amount of money as an incentive to take part in the project. But this is in line with the University's ethics approval, and is purely optional for the volunteers.

2.5.3 Duty to Relevant Authority

The most applicable rule of authority in this experiment, other than the ethical guidelines produced by the University of Sussex, is the Data Protection Act in UK law. We must be careful how volunteer data is stored, and how we share the results of our experiment. We must also ensure volunteers are fully aware of how their data is going to be used.

While the experiment is running, each volunteer's email address will be used to authenticate and identify them. But when the experiment is over the data collected from the volunteers will be anonymised such that it will not be possible to link identities or information back to individual participants.

3 Method

3.1 Design of Experiment

3.1.1 High Level Design

To answer our particular set of hypotheses and research goals, we need an experimental setup that includes these features:

Volunteers should type in a passphrase they have not seen before.

Volunteers must be given the chance to repeat entry of the phrase until they have learned it.

There should be several passphrases in use.

Each passphrase should be learned by many volunteers.

Each passphrase should be attempted (one-off entry) by even more volunteers.

In this case, the term "entry" refers to the typing of a passphrase on a standard computer keyboard. We shall elaborate on this, but first we shall consider the high level design of the experiment. To fully meet our requirements, the experiment will be constructed as follows:

The volunteers will be divided into several groups.

Each group will be assigned a primary phrase and a secondary phrase.

The primary phrase will be randomly generated.

The secondary phrase will be the primary phrase of another group.

Each volunteer will be required to enter the primary phrase of their group a fixed number of times. At the end of the experiment they will act as an impostor for another group by entering the secondary phrase.
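The group structure described above might be set up along the following lines. This is only a sketch: the function name `assign_groups`, the dictionary layout and the choice to take each group's secondary phrase from the next group in rotation are assumptions, since the design only requires that the secondary phrase be the primary phrase of some other group.

```python
import random


def assign_groups(volunteers, phrases, seed=None):
    """Split volunteers into one group per phrase.

    Each group learns its own primary phrase and later acts as an
    impostor on a secondary phrase, which is the primary phrase of
    another group. Taking the next group in rotation is an assumed
    convention. The phrases are expected to be distinct.
    """
    n = len(phrases)
    if n < 2:
        raise ValueError("at least two phrases are needed for impostor entries")

    rng = random.Random(seed)
    shuffled = list(volunteers)
    rng.shuffle(shuffled)  # randomise who ends up in which group

    groups = []
    for i, phrase in enumerate(phrases):
        groups.append({
            "primary": phrase,
            "secondary": phrases[(i + 1) % n],  # another group's primary
            "members": shuffled[i::n],          # deal volunteers round-robin
        })
    return groups
```

Rotating the phrases guarantees that every primary phrase is attempted by exactly one impostor group, which keeps the learned and unlearned samples for each phrase roughly balanced.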

In this setup the independent and dependent variables are:

Independent variables: Individuals, passphrase.

Dependent variable: Temporal signature.

By varying the individuals across the same passphrase we increase the certainty that identifying patterns in the temporal signatures are due to the individual, not the passphrase. By having several groups, each learning a different phrase, we can verify that identifying phenomena are consistent across different passphrases. Having each group act as one-time impostors for another group allows comparison between those who have learned a phrase and those who have not.

This high level design provides a good framework for capturing the kind of data we are interested in. However there are still lots of details to be worked out. The following sections will discuss these issues and decide on reasonable solutions.

3.1.2 Construction of Passphrase

The volunteers are to be presented with a generated passphrase. There are several options as to how the passphrase is generated.

For instance, pass-phrases could include dictionary words. However, this is likely to be a bad idea, as certain volunteers may be familiar with typing particular words. This would add an extra layer of complexity and make analysis harder.

The best way to ensure that the pass-phrases are completely new to the volunteers is to create randomly generated sequences of characters. To keep the experiment as simple as possible, it is probably preferable to keep the pass-phrases as lower-case and alphabetic only. This will also reduce the likelihood of volunteers getting thrown off by varying keyboard layouts.
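A generator of this kind is very small. In the sketch below the function name `generate_passphrase` and the default length of 10 characters are assumptions for illustration; this section does not state the length of the phrases used.

```python
import secrets
import string


def generate_passphrase(length=10):
    """Return a random pass-phrase of lower-case ASCII letters only.

    The default length is an assumed value, not one taken from the
    experiment design.
    """
    if length < 1:
        raise ValueError("length must be positive")
    return "".join(secrets.choice(string.ascii_lowercase) for _ in range(length))
```

Drawing each letter independently and uniformly makes it very unlikely that a phrase contains a familiar word, although short letter pairs that are common in English can still occur by chance.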

3.1.3 Length, Frequency & Format of Experiment

We initially considered having each volunteer participate in one or two sessions where they would sit down and repeatedly enter a pass-phrase until they had learnt it. However, while such a set-up allows complete experimental control, it is a very unnatural learning situation and is likely to feel excessively artificial or tedious. One solution would be to have a user periodically enter a password while they perform some other non-essential task (possibly of their choosing). However, this is still a fairly artificial way to learn and volunteers might not even manage to learn a password in the amount of time allotted to such an experiment.

A better solution would be to have the volunteers enter the passphrase only once or twice a day, but over a period of one or two weeks. This would allow volunteers to perform the learning in a much more natural way. One way to implement this would be to have regular meetings in which the volunteers could participate, but we feel that this is not practical.

Instead we decided that the experiment shall be distributed and remote. Users will take part, in their own time, with their own computers. This shall be achieved by placing the experiment on a website. Not only is this convenient, but it provides us with a more realistic dataset that is closer to what a real biometric keystroke authentication system would have to deal with.

In terms of the length of the experiment, Tapiador and Siguenza [5] demonstrated that a user can learn a password after approximately 20 entries. However, we also wish to capture a substantial number of signatures once the phrase has been learned. Given this, it was decided that 40 entries should give us enough data to work with. The last 4 of these entries will see the volunteer use their group's secondary phrase to act as an impostor for another group.

At each session, a volunteer will use their computer to connect to the experiment website. To reinforce the learning, each of these sessions will consist of two entries. This means that 40 entries can be gathered over 20 sessions. In an effort to make the experiment easier to participate in, volunteers can decide when they complete these sessions, with only the following restrictions:

There must be at least two hours between each session.

Up to two sessions can be performed per day.

Volunteers will be encouraged to try and complete sessions twice daily.

Should a volunteer perform one session or none on a given day, then the experiment will simply take more days to complete.

With this setup, the experiment will take a minimum of 10 days for each volunteer to complete.
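The session restrictions above can be expressed as a small check. The following is only a sketch of the rules as stated, not the code used in the experiment: the function names, the representation of earlier sessions as a list of timestamps, and the reading of "per day" as a calendar day are all our assumptions.

```python
from datetime import datetime, timedelta

MIN_GAP = timedelta(hours=2)   # at least two hours between sessions
MAX_PER_DAY = 2                # up to two sessions per day
ENTRIES_PER_SESSION = 2
TOTAL_ENTRIES = 40


def may_start_session(previous, now):
    """Return True if a new session may begin at `now`.

    `previous` is a list of datetimes of earlier sessions (a hypothetical
    representation); "day" is assumed to mean calendar day.
    """
    if any(now - earlier < MIN_GAP for earlier in previous):
        return False
    sessions_today = sum(1 for earlier in previous if earlier.date() == now.date())
    return sessions_today < MAX_PER_DAY


def minimum_days():
    """Fewest days needed: 40 entries, 2 per session, 2 sessions per day."""
    sessions = TOTAL_ENTRIES // ENTRIES_PER_SESSION
    return -(-sessions // MAX_PER_DAY)  # ceiling division
```

With two entries per session and at most two sessions per day, `minimum_days()` reproduces the 10-day minimum given above.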

3.1.4 Display and entry of pass-phrase

As the experiment begins, the volunteers are going to need to be prompted with the pass-phrase. By prompting, we mean displaying the passphrase prominently on the screen. If we were simply to tell the volunteers their passphrase, and let them choose how to store it, many of them would write it down on paper or use a password manager. But as this behaviour is inconsistent, it would devalue our data. For this reason, all prompting of the passphrase shall take place on the screen where we have full control of it.

However, there are two issues to consider here. Firstly, we must decide how long into the experiment we keep prompting the volunteer. Secondly, we must decide how the prompt displays the pass-phrase. In particular, we need to look at how we can stop the volunteer from typing in the pass-phrase as they read it off the screen. If they do this, then it is harder to see how the phrase is being chunked up and the volunteers are less likely to commit the phrase to memory.

Longevity of prompting

One option would be to slowly prompt the volunteers less as they learn the pass-phrase. But this might prove difficult to implement correctly. A better solution would be to stop prompting the volunteers automatically after a fixed amount of time into the experiment, and after this point have the volunteers elect to reveal the pass-phrase.

Method of prompting

If the volunteers are allowed to look at the prompt and type at the same time, there is a danger that those who are fluent touch typists will simply type as they read off the screen, without actually committing the password to working memory. A good solution is to design the experiment such that volunteers can toggle between viewing the prompt and typing, but cannot type and look at the same time.

Repeated entry

As explored when discussing the frequency of the experiment, to speed up learning, it is desirable to have volunteers perform entry of the pass-phrase twice in one sitting. This should not take much longer for volunteers to complete and it adds value to the experiment.

Display of entered text

Intuitively it might seem a reasonable suggestion to show the user what they have typed so far in clear text. But it is worth considering if this is the right way to implement visual feedback in this experiment.

When a user of a website enters a password it is usually displayed as a series of periods or asterisks (to stop prying eyes from picking up their password). As this is an accepted standard, implementing visual feedback in this way would make the experimental results more applicable to password authentication. Furthermore, it forces the user to enter the password in one go, with no going back to edit what has already been entered. If the user wishes to do so, they must start again. Enforcing this will result in cleaner temporal signatures that are easier to analyse.

Incorrect entry

Rather than reject individual mis-typed letters, it is easier to reject completed phrases and allow the user to enter the entire phrase again. This ensures that all accepted temporal signatures have the same structure and match the passphrase correctly.

However, it may still be worthwhile recording the keystrokes of the incorrect phrases, as this could provide insights into the learning process.

3.1.5 Entry devices

While we wish for all volunteers to use standard keyboards, the experiment is run remotely, so volunteers may attempt to enter the pass-phrase on a non-standard keyboard device (e.g. a mobile phone keyboard). The implementation of our experiment will try to prevent this.

3.1.6 Data Security

It would be unacceptable to lose experimental data, so due to the lengthy nature of this experiment there should be regular backups of the entire system. In the event of the primary instance of the experiment failing it should be possible to resume it from backup.

3.1.7 Summary

Amalgamating all these requirements gives us an experiment that will gather the data needed to investigate our hypotheses in a robust and secure fashion. A volunteer's progress through this experiment can be divided into three stages:

Stage   Phrase      Repetitions   Prompt
1       Primary     20            Immediately
2       Primary     16            Upon choosing
3       Secondary   4             Immediately

In the above table the primary phrase refers to the main passphrase that each group of volunteers will be learning. The secondary phrase refers to the phrase that each group will enter at the end to act as an impostor for another group. In stages 1 and 3, when a volunteer takes part in an entry session, they will be shown the passphrase they are learning before entry. However, in stage 2 they will be asked to enter it from memory, and will only be shown the passphrase if they elect to do so.

We have been careful to make sure that all the features needed to investigate our hypotheses are included in this experiment. The way in which the stages of the experiment correspond to each hypothesis is shown in figure 3.1. In the next section we will discuss how we implemented this design and explain what technologies we used.


Figure 3.1: Diagram showing the design of the experiment


3.2 Implementation

Using the design described in the above section, the experiment software was successfully built, tested and used to gather data. Rather than give an in-depth analysis of the code, we shall provide a more informative overview of the technologies we used, and describe the interaction with the software that volunteers experienced.

3.2.1 Technologies

We built the experiment software as a web application using the following technologies:

HTML5 + Javascript - For creating the information pages and javascript client

Twitter Bootstrap - CSS & Javascript library for styling the HTML pages

Node.js + Express middleware - Web server to serve content and power the RESTful API

Jade Templating Engine - To allow dynamic rendering of HTML

MongoDB - Database to provide persistence for data gathered.

The volunteer was authenticated with the site using their email and a password (not to be confused with the passphrase they were learning). Once authenticated they were directed to a page containing a javascript client which allowed them to perform the experiment. This client queried the server's RESTful API to retrieve the pass-phrase to prompt the volunteer with, along with any other experimental parameters. Once the pass-phrase entry was completed, the temporal signatures recorded were submitted through the same API. These were then stored in JSON notation in the MongoDB store. The entire system was backed up remotely and automatically to another server using rsync. The design of the experiment server is shown in figure 3.2.

To make the experiment easier for volunteers to complete we used the middleware to create log-in sessions. We also used long term cookies to re-authenticate volunteers who returned with the same web browser on the same computer. This way volunteers were not required to log in very often.

The software was designed to be very modular. The core modules are shown in figure 3.3. This means that if similar experiments are required, the software can very easily be reconfigured to work with different groups, passphrases and schedules.


Figure 3.2: Diagram showing the design of the experiment server

Figure 3.3: Diagram showing the node.js components used in the experiment server


Path                     Method   Resource   Functionality
/experiment              GET      HTML       Provides the experiment web client.
/data                    GET      JSON       Returns the current experiment (if there is one) for the currently logged in volunteer.
/data                    POST     JSON       Receives collected experimental data from the client.
/signup                  GET      HTML       Provides the web page where volunteers can register to take part.
/signup                  POST     JSON       Accepts registration data for new volunteers.
/login                   GET      HTML       Provides a web page where volunteers can log in.
/login                   POST     JSON       Accepts the username and password of a volunteer and logs them in.
/logout                  GET      HTML       When navigated to, logs out the currently logged in volunteer.
/participation           GET      HTML       Returns a web page that lists information about participating.
/consent                 GET      HTML       Returns a web page that records the volunteer's consent to take part.
/admin/                  GET      HTML       Returns a web page that provides an administrator's overview.
/admin/user              GET      HTML       Returns a web page that provides an overview of a volunteer's progress through the experiment.
/admin/data/summary      GET      JSON       Returns a summary of all volunteers' progress through the experiment.
/admin/data/user         GET      JSON       Returns all experiment data for a given volunteer.
/admin/data/everything   GET      JSON       Returns a database dump of all experimental data.

Figure 3.4: Table detailing the HTTP interface to the experiment server

3.3 Running the Experiment

To register to take part in the experiment, volunteers had to be given information about what they were taking part in and had to tick a box indicating that they agreed to our terms and conditions. After this they chose a username and password, then logged in for the first time. This process can be seen in figure 3.6.

When a volunteer registered they were allocated into one of the three phrase groups. In an attempt to keep group sizes even, the software always allocated each volunteer to the group with the fewest members.

Figure 3.5: The process for typing in a passphrase
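The allocation rule described above is simple enough to sketch. This is an illustration only: the function names, the dictionary of group sizes and the tie-breaking rule (lowest group ID wins) are our assumptions, not details taken from the experiment software.

```python
def allocate_group(group_sizes):
    """Return the ID of the group with the fewest members.

    `group_sizes` maps a group ID to its current member count.
    Ties are broken in favour of the lowest group ID (an assumption).
    """
    return min(sorted(group_sizes), key=lambda group: group_sizes[group])


def register(group_sizes):
    """Allocate a new volunteer and update the counts in place."""
    group = allocate_group(group_sizes)
    group_sizes[group] += 1
    return group
```

Note that this only balances the groups at registration time. It cannot compensate for volunteers who register and later drop out.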

Once signed up and authenticated, volunteers could begin completing the experiment. As per the design, volunteers could toggle between entering the phrase and looking at it. This process is shown in figure 3.5.

To make it easy to keep track of volunteers, the experiment software included an administration page which listed volunteers and showed their progress. It also provided a visual indication that highlighted volunteers who had not completed an experiment in over 24 hours. This page, pictured in figure 3.7, was not accessible by any of the volunteers. To perform preliminary analysis of the temporal signatures as they were being collected, we created another administration page that displayed the entries of each volunteer. This page, seen in figure 3.8, shows dwell times, digraphs and typing patterns.


Figure 3.6: The sign-up process for volunteers


Figure 3.7: An excerpt from the administration console

Figure 3.8: Screenshot of the volunteer info page


At the end of the experiment we had captured the data of 20 volunteers across 3 groups. A greater number of volunteers registered to take part, but many of them did not complete the experiment. Unfortunately, this meant that in our dataset of completed experiments the group sizes were uneven. These groups, the number of volunteers in each, and the phrases they learned can be seen in the following table:

Group ID   Volunteers   Primary Phrase   Secondary Phrase
1          5            zmlntdzqdmg      fhtvlqwlesf
2          8            vrfpebvvewe      zmlntdzqdmg
3          7            fhtvlqwlesf      vrfpebvvewe

To analyse this data, we decided to use the Scala programming language. Scala provides a hybrid of functional and object-oriented methodologies which makes it well suited to manipulating data. To get our JSON data into Scala, we used the Lift-web JSON parsing library. Custom visualisations were implemented using Processing.


3.4 Analysis Techniques

3.4.1 Using Student's T-test for Authentication

Student's t-test provides a measure of whether the means of two groups are statistically different from each other. Rather than just using the difference of the means, it takes into account the variability of each group. This makes it particularly useful for keystroke authentication, as different users may enter the passphrase with different degrees of variability. In essence, it provides an assessment of signal over noise.

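The t-test of two distributions, a and b, is defined as follows:

$$t = \frac{\bar{x}_a - \bar{x}_b}{\sqrt{\dfrac{\mathrm{var}_a}{n_a} + \dfrac{\mathrm{var}_b}{n_b}}} \tag{3.1}$$

Where for each group: $\bar{x}$ is the mean, $n$ is the size and $\mathrm{var}$ is the variance.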
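As a worked illustration of equation 3.1, the statistic can be computed in a few lines. This is a sketch under two assumptions of ours: that the denominator is the square root of the summed variance terms (the usual unequal-variance form), and that var is the sample variance. The function name `t_statistic` is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def t_statistic(a, b):
    """Equation 3.1 for two samples of timings.

    Each sample needs at least two values so that the sample
    variance (an assumption about 'var') is defined.
    """
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error
```

A larger spread within either group increases the denominator and pulls t towards zero, which is the sense in which the test weighs signal against noise.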

Usually the resulting value of a t-test is looked up in a statistical table that gives a measure of significance. However, in this work we shall be adapting the t-test into an authentication mechanism that determines whether or not a candidate signature belongs to a collection of temporal signatures that represent a user.

This authentication procedure will use the delay between pairs of key-presses, known as digraphs, that are present in each temporal signature. To do this we take all of an individual user's signatures, and extract a collection of timings for each digraph. This process is illustrated in figure 3.9. This means that we can take into account the variability of each individual digraph, which is advantageous as a user may vary certain digraphs more than others.

These collections of digraph timings form the user's model that we will authenticate upon. To perform authentication of a candidate signature, we simply use the t-test to compare each digraph of the candidate signature with the corresponding collection of digraphs in the user's model. The values from each t-test are summed together to give a similarity score. If the score is below a certain threshold value, then the candidate signature is deemed as belonging to the user and it passes authentication. This process is shown in figure 3.10.
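The procedure described above can be sketched as follows. The text does not spell out how a single candidate timing is compared with a collection, so this sketch makes its own assumptions, all of them ours: a signature is represented as a list of key-press timestamps in milliseconds, the candidate digraph is treated as a single observation that contributes no variance term, absolute t values are summed, and a tiny constant guards against a zero spread. All names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

EPSILON = 1e-9  # guard against zero variance (an assumption of this sketch)


def digraph_times(press_times):
    """Delays between consecutive key presses in one signature."""
    return [later - earlier for earlier, later in zip(press_times, press_times[1:])]


def build_model(signatures):
    """Collect, for each digraph position, its timings across all signatures."""
    per_signature = [digraph_times(signature) for signature in signatures]
    return [list(column) for column in zip(*per_signature)]


def similarity_score(model, candidate):
    """Sum the per-digraph t values of a candidate against the model."""
    score = 0.0
    for timings, observed in zip(model, digraph_times(candidate)):
        spread = max(sqrt(variance(timings) / len(timings)), EPSILON)
        score += abs(mean(timings) - observed) / spread
    return score


def authenticate(model, candidate, threshold):
    """Accept the candidate if its score falls below the threshold."""
    return similarity_score(model, candidate) < threshold
```

Because each digraph is scaled by its own spread, a digraph that the user types erratically contributes less to the score than one they type consistently.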


Figure 3.9: A diagram showing how digraphs are collated for use in authentication


Figure 3.10: A diagram showing how the t-test can be adapted to perform authentication


To assess the suitability of this t-test authentication method, we can perform authentication attempts using the data gathered from our experiments. For each volunteer we will simulate attempts from the volunteer in question, from impostors who have not learned the phrase (acquired from another group's secondary phrase), and from impostors who have learned the phrase (acquired from other volunteers in the same group). The proportions of incorrectly authenticated and incorrectly rejected signatures will measure the performance of this technique. To get these measures for each volunteer we perform the following procedure:

Decide on a threshold value to compare against the results of the t-test.

Gather the volunteer's signatures that are usable for authentication.

For each of the usable temporal signatures, exclude it so that it is not used for modelling. We then compare the excluded signature against the remaining signatures using t-test authentication and classify the result using the given threshold. The proportion of legitimate signatures rejected gives the false rejection rate.

Gather the same number of naive impostor signatures as there are usable signatures, and compare each naive impostor signature to the collection of usable signatures using t-test authentication. As before, classify with the threshold. The proportion of signatures accepted gives us the false authentication rate for naive impostors.

Perform the same with impostors who have learned the phrase to get the false authentication rate for trained impostors.

By averaging the false rejection and false authentication rates across volunteers, we can get a clear idea of the performance of t-test based authentication for that given threshold value. Then by repeating this process and varying the threshold value, we can experimentally determine the optimum threshold.
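The evaluation procedure above amounts to a leave-one-out test for the legitimate user plus a straightforward test for each kind of impostor. The sketch below illustrates it for one volunteer. It is not the analysis code used in this project (which was written in Scala); the names are hypothetical, signatures are assumed to be lists of digraph timings, and the scoring function repeats the single-observation assumption used for illustration in this section.

```python
from math import sqrt
from statistics import mean, variance


def score(model_signatures, candidate):
    """Summed per-digraph t values of a candidate against a set of signatures."""
    total = 0.0
    for position, observed in enumerate(candidate):
        timings = [signature[position] for signature in model_signatures]
        spread = max(sqrt(variance(timings) / len(timings)), 1e-9)
        total += abs(mean(timings) - observed) / spread
    return total


def false_rejection_rate(usable, threshold):
    """Hold each usable signature out in turn and test it against the rest.

    Needs at least three usable signatures so the rest has a variance.
    """
    rejected = 0
    for index, held_out in enumerate(usable):
        rest = usable[:index] + usable[index + 1:]
        if score(rest, held_out) >= threshold:
            rejected += 1
    return rejected / len(usable)


def false_authentication_rate(usable, impostor_attempts, threshold):
    """Proportion of impostor attempts scoring below the threshold."""
    accepted = sum(1 for attempt in impostor_attempts
                   if score(usable, attempt) < threshold)
    return accepted / len(impostor_attempts)


def sweep(usable, naive, trained, thresholds):
    """Error rates (threshold, FRR, naive FAR, trained FAR) per threshold."""
    return [(threshold,
             false_rejection_rate(usable, threshold),
             false_authentication_rate(usable, naive, threshold),
             false_authentication_rate(usable, trained, threshold))
            for threshold in thresholds]
```

Raising the threshold lowers the false rejection rate and raises both false authentication rates, so the sweep exposes the trade-off from which an operating threshold can be chosen.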

3.4.2 Using Self-organising Maps for Identification

Self-organising maps (SOM), invented by Teuvo Kohonen, provide a way to represent multidimensional data in a lower number of dimensions while still preserving the topological layout of the data. SOMs are notable for being an unsupervised learning algorithm. Instead of using target values or categories to guide the learning process, they use the structures and relationships within the input data. Input data simply consists of a collection of vectors of uniform length.


Figure 3.11: A diagram showing the components of a self-organising map.

As shown in figure 3.11, a SOM is comprised of a lattice of nodes, usually in two dimensions. Each node holds a weight vector of the same dimension as the input vectors. Initially all the weights in the network are randomised and the following steps are iteratively applied:

1. A random vector is chosen from the input data.

2. Every node is compared to this input vector using the euclidean distance between the node's weights and the input vector. The node which is most like the given input is marked as the best matching unit (BMU).

3. The neighbourhood of the BMU is calculated. This is a collection of surrounding nodes that fall within a certain distance from the BMU. This distance - the radius of the neighbourhood - is a value that starts large but decreases with each iteration.

4. Each of the nodes in the neighbourhood has its weights altered to make them more like the input vector. Nodes that are closer to the BMU will have their weights altered to a greater extent. The influence of the BMU is shown in figure 3.12.


Figure 3.12: A diagram showing the influence of a BMU within its neighbourhood.

For time step t, the radius R is calculated using the following formula:

$$R(t) = R_0 \exp\left(-\frac{t}{\lambda}\right) \tag{3.2}$$

Here $\lambda$ is a time constant that is calculated from i, the total number of iterations to perform, and n, the radius of the lattice:

$$\lambda = \frac{i}{\log(n)} \tag{3.3}$$

When shifting a node's weight towards an input vector, the amount adjusted is a proportion of the difference between the input and the weight. This proportion is known as the learning rate. To make this relationship follow a gaussian curve, the influence of the distance, $\Theta$, at a given point in time, is modelled as:

$$\Theta(t) = \exp\left(-\frac{\mathrm{dist}^2}{2R^2(t)}\right) \tag{3.4}$$

As we have just discussed, the learning rate decreases the further away a node is from the BMU. However, this learning rate, L, also decreases with respect to time in much the same way as the radius does:

$$L(t) = L_0 \exp\left(-\frac{t}{\lambda}\right) \tag{3.5}$$


Taking all of this into account, for an input vector, V, the new value of a node's weight vector is calculated as:

$$W(t+1) = W(t) + \Theta(t)\,L(t)\,\bigl(V(t) - W(t)\bigr) \tag{3.6}$$

Once the map has been trained, it can then be visualised. Each input vector can be superimposed on the map, at the position of its best matching unit, revealing the structure of the data.
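To make the training steps and equations 3.2 to 3.6 concrete, the following is a minimal sketch of the training loop. It is an illustration rather than the implementation used in this project. The function names, the default initial learning rate of 0.1, the choice of half the larger lattice side as both the lattice radius n and the initial radius, and the use of the natural logarithm are all our assumptions. The lattice must be more than two nodes wide for the logarithm to be positive.

```python
import math
import random


def train_som(inputs, width, height, iterations, initial_rate=0.1, seed=0):
    """Train a width x height lattice on `inputs` (lists of equal length)."""
    rng = random.Random(seed)
    dim = len(inputs[0])
    # weights[y][x] is the weight vector of the node at lattice position (x, y)
    weights = [[[rng.random() for _ in range(dim)] for _ in range(width)]
               for _ in range(height)]
    initial_radius = max(width, height) / 2                # assumed n and R_0
    time_constant = iterations / math.log(initial_radius)  # equation 3.3
    for t in range(iterations):
        vector = rng.choice(inputs)                        # step 1
        bmu_x, bmu_y = best_matching_unit(weights, vector)  # step 2
        radius = initial_radius * math.exp(-t / time_constant)  # equation 3.2
        rate = initial_rate * math.exp(-t / time_constant)      # equation 3.5
        for y in range(height):
            for x in range(width):
                dist_sq = (x - bmu_x) ** 2 + (y - bmu_y) ** 2
                if dist_sq > radius ** 2:
                    continue                               # outside neighbourhood
                influence = math.exp(-dist_sq / (2 * radius ** 2))  # equation 3.4
                node = weights[y][x]
                for k in range(dim):                       # equation 3.6
                    node[k] += influence * rate * (vector[k] - node[k])
    return weights


def best_matching_unit(weights, vector):
    """Lattice position (x, y) of the node closest to `vector`."""
    best, best_dist = None, float("inf")
    for y, row in enumerate(weights):
        for x, node in enumerate(row):
            dist = sum((w - v) ** 2 for w, v in zip(node, vector))
            if dist < best_dist:
                best, best_dist = (x, y), dist
    return best
```

After training, calling `best_matching_unit` for each input vector gives the lattice position at which that vector would be superimposed on the map.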


4 Results

Now that we have collected some experimental data we shall analyse it. Each of the following sections will investigate one of our three hypotheses. As our volunteers were effectively users of our software, we shall use the terms volunteer and user interchangeably.

4.1 Passphrase Fragmentation

The first of our three hypotheses concerns the way in which a volunteer initially fragments a passphrase into sections. To explore this we need a way to visualise the patterns present in the first few experiments of each volunteer in such a way that they can be examined and compared. Figure 4.1 shows our attempt at creating such a visualisation.

Figure 4.1: Visualisation showing the fragmentation patterns present when volunteers are confronted with new phrases


It shows the fragmentation patterns of the first four entries for both of the two phrases, for all volunteers. Each rectangle represents a passphrase entry. The vertical lines indicate points at which the volunteer stopped entry to look at the passphrase. The spatial position of each line indicates at what point in the entry this occurred, where the leftmost side of the rectangle is the beginning of the passphrase, and the rightmost the end. Thus the collection of lines inside a rectangle represents the fragmentation pattern of that given entry. When this visualisation was first created it did not include fragmentation patterns present in failed entries. However, sometimes the most informative patterns are displayed in failed entries, so they were introduced but displayed using a different colour line.

An initial examination of this visualisation reveals that volunteers use a wide variety of fragmentation patterns. Many entries have no fragmentation at all, while others are split into 4 pieces. This visualisation also reveals there is some degree of consistency between how they enter both phrases. In particular it shows that fragmentation of the first passphrase seems to be a strong indicator that the volunteer will fragment the second passphrase. However, some volunteers (6, 7, 8) perform little to no fragmentation upon the first phrase, but considerably fragment the second phrase. This could be due to the fact that these volunteers had become used to the format of the experiment and were surprised by the sudden change to a different passphrase, thus altering their behaviour.

While consistency between both phrases is not perfect, there are plenty of examples where a particular pattern is repeated in both passphrases. For instance, volunteer 15 has a distinct way of breaking the passphrase into two chunks. On the other hand, there are a surprising number of volunteers who demonstrate very little fragmentation, so any perceived consistency is simply the lack of any pattern. Interestingly, volunteers will often fail entry one or more times, but still not resort to looking at the phrase. This is a surprising behaviour, but it has unfortunately resulted in fewer fragmentation patterns than was hoped for.

However, some of the failed attempts did yield patterns. When an entry has patterns for both failed and successful attempts, the two patterns tend to reinforce each other. These can be seen as the purple lines on the visualisation. This demonstrates how volunteers will repeat fragmentation strategies.

Another interesting pattern is that some volunteers demonstrate convergence of fragments from one pattern to the next. This is in the sense that two separate fragments in one phrase join to become a single fragment in the next. Good examples of this include volunteers 1 and 6 on phrase one. This is revealing of how chunks of information are manipulated in memory during the learning process.


Figure 4.2: Graph showing the occurrence of different fragmentation patterns, taken from the first entry of each phrase

The chart in figure 4.2 shows the number of different fragmentation patterns present in the initial entry of each passphrase. Given 20 volunteers who entered 2 passphrases, we have 40 such patterns. Interestingly the most common pattern is no pattern at all, accounting for 40% of all initial entries. The second most popular behaviour is to split the passphrase into two sections. Beyond this, there are a variety of other patterns. Generally, it seems that longer patterns (more chunks) are less popular, and even-size chunks are preferred.

Some of the more unusual patterns could be due to the user getting familiar with our experiment by exploring its functionality. If this is the case, then this is something we should consider carefully for future experiments.


4.2 Learning & Stabilisation

Our second hypothesis focuses on the progression of the temporal signature as a volunteer learns to type a passphrase. In particular, we are hoping to examine how long it takes for them to form a stable entry, where they enter the passphrase in a consistent way. To investigate this we shall consider entry duration, passphrase prompting behaviour and digraph stability. As before, we do so using a mixture of visualisation and graphing techniques.

4.2.1 Entry Duration

Because of its simplicity, time taken to enter a passphrase is a good first measure to investigate when considering the learning process. As we explored previously, we expect the entry time to decrease logarithmically, following what is known as the learning curve. Figure 4.3 shows the duration of each entry averaged across all volunteers. As is indicated by the logarithmic trend line, the entry time does decrease following a learning curve.
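For readers who wish to reproduce a trend line of this kind, a logarithmic fit can be obtained by ordinary least squares on the logarithm of the entry number. The sketch below assumes a trend of the form duration = a + b ln(entry number); this form, and the function name `fit_log_trend`, are our assumptions, as the tool used to draw the trend line in figure 4.3 is not described here.

```python
import math


def fit_log_trend(durations):
    """Least-squares fit of duration = a + b * ln(entry number).

    `durations` holds the average duration of entries 1, 2, 3, ...
    Returns the pair (a, b); a learning curve has b < 0.
    """
    xs = [math.log(entry) for entry in range(1, len(durations) + 1)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(durations) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, durations))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b
```

Fitting the series before and after the change in prompting separately would be one way to examine whether the later entries follow a curve of their own.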

Figure 4.3: The average entry time for each

One interesting feature of figure 4.3 is that it shows an increase in the average entry duration at entry 20, which is the point at which volunteers are no longer shown the phrase by default, and must either recall it completely from memory, or elect to be prompted with the passphrase once more. Interestingly, it looks like the pattern from entry 20 onwards might follow another smaller exponential curve. It is likely this represents the volunteers' adaptation to the change in the experiment.

Previously we mentioned that giving groups of volunteers different passphrases to learn would allow us to see if the rate of learning is consistent between different random passphrases of the same length. Figure 4.4 shows how the average duration of entry progresses for each of the three phrase groups. On the whole, it seems to substantiate this idea as the three series follow similar logarithmic curves. However, there is an interesting discrepancy for the group fhtvlqwlesf on the second entry. Investigating this, it turns out there is an anomalous signature, where one volunteer began entry, but for reasons unknown paused for over 60 seconds before completing it.

Figure 4.4: Average entry time for each group

In terms of finding the point at which passphrase entry is performed in a stable fashion, it seems that from approximately the 25th entry onwards, the entry duration is reasonably consistent. While this is encouraging, one could argue that extrapolation of the trend line indicates that the duration will keep reducing. However, due to the nature of logarithmic curves, it is questionable whether any reduction in duration beyond this point will be observable over noise in the data.


4.2.2 Passphrase Observation

Another way in which we can explore the volunteers' learning process is by looking at how and when they decide to observe the passphrase. As previously explained in detail, volunteers can switch between entering and viewing the phrase, but not do both at the same time. Initially the experiment defaults to displaying the passphrase, but from entry 21 onwards, the experiment will default to not showing the passphrase and volunteers will have to choose if they wish to reveal it again.

Figure 4.5: Visualisation of when and how volunteers elected to view the passphrase

The way in which volunteers view the passphrase is visualised in figure 4.5. This visualisation is advantageous as it allows us to follow each volunteer's progress through the different stages of the learning process. In this visualisation each row represents a volunteer and each column represents a particular entry number. As there were two entries per session, the entries are paired together in twos. Each rectangle indicates a single entry of the passphrase. The colour of this rectangle represents how the volunteer decided to view the passphrase. The significance of each colour is as follows:

Red indicates they saw the passphrase at the beginning, and returned to look at it again after they started typing.

Orange indicates they looked at the passphrase at the start and only the start.

Pink indicates they looked at the passphrase, but not at the start, only after they had begun typing it.

Green indicates they completed entry without looking at the passphrase.
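The four colours above correspond to the four combinations of two yes/no observations about an entry. As a sketch (the function and parameter names are hypothetical and not taken from our visualisation code), the mapping can be written as:

```python
def entry_colour(viewed_at_start, viewed_after_typing):
    """Map how the passphrase was viewed during one entry to a colour.

    viewed_at_start: the passphrase was seen before typing began.
    viewed_after_typing: the passphrase was revealed after typing began.
    """
    if viewed_at_start and viewed_after_typing:
        return "red"
    if viewed_at_start:
        return "orange"
    if viewed_after_typing:
        return "pink"
    return "green"
```

Since the passphrase is always shown at the start of the early entries, only red and orange can occur while prompting is the default.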

The blue line between entries 20 and 21 represents the point at which volunteers are no longer prompted by default. It is worth noting that it is impossible for volunteers to get a green mark before this point, as they will always be initially shown the passphrase.

There are several interesting trends displayed in this visualisation. The most general pattern is the progression from frequently looking at the phrase, to looking at it periodically, to recalling it completely from long term memory.

From a visual inspection of the data, it seems that a little over half the volunteers managed to really commit the phrase to memory. This can be judged by looking for consecutive sequences of green boxes after approximately entry numbers 25 to 36. Interestingly, those that repeatedly refer back to the phrase at around this point progress from red to orange boxes (in the first half of the experiment) later than other volunteers. This implies that they are further back in their learning progress and potentially learn this task at a slower rate.

One volunteer is particularly anomalous as they never really seem to learn the phrase. They consistently refer back to the prompt and only begin to show signs of learning it near the very end. A possible explanation is that this volunteer simply formed a habit of relying on being prompted, and felt little need to learn the passphrase.

Another common pattern displayed in this visualisation stems from the fact that each session has two entries. Typically, in the first entry, the volunteer will look at the phrase to remember it, and in the following second entry, they will enter it straight from working memory. This pattern is particularly prominent in session 11, entries 21 and 22, where there is an orange, green pattern.

Looking at this visualisation reveals a possible limitation in the design of this experiment. Volunteers may have been able to enter the passphrase without looking at it (represented by a green box) before entry 21, but because the experiment always showed them the phrase in sessions before this point, they were not offered the chance to.

In conclusion, this visualisation shows that most volunteers learn the passphrase in somewhere between 20 and 25 entries. This is consistent with figure 4.3, which shows entry time stabilising at around this point.

4.2.3 Travel-time Stability

While duration gives us a reasonable indicator of stability, it overlooks the nuances of the temporal signatures. A better way is to examine the digraphs, that is, the travel times between keys. To do this we came up with charts that show these digraph times for each volunteer. The charts for all volunteers can be found in the appendix, but four particularly interesting charts are shown in figures 4.6 and 4.7. These charts can be read as follows:

Each series represents a certain digraph.

Each point signifies that digraph's time in a certain temporal signature.

A point's position on the y-axis signifies the time delay (gap) between the two letters in that digraph.

A point's position on the x-axis indicates which entry number this digraph belongs to. Therefore if you added all the times at a single x-axis position, you would get the duration of that entry (minus dwell times).

Each chart is scaled to 3000ms. This is because the dynamics we are interested in are sub-3000ms, and it also allows us to compare different volunteers easily.

These charts are informative, because if a series line is straight, it implies that the digraph it represents is stable. Conversely, an uneven series line indicates instability.

Looking at figure 4.6, we can see that the digraph times for volunteer 5 are initially very unstable. There are many high peaks on the chart, which are good indicators of where the passphrase was broken into fragments. For this volunteer, entry does not become particularly stable until about signature number 25. Interestingly, beyond this point digraph 5 is consistently larger than the others, a promising feature for identification. Volunteer 9, also shown in figure 4.6, contrasts with volunteer 5. In particular, their entry is comparatively much more stable and becomes so from a much earlier point. In terms of identifying features, digraph 2 seems to be consistently longer than the other digraphs.

Figure 4.7 juxtaposes two volunteers with very different temporal signatures. Volunteer 11 forms a stable signature very quickly and has digraph timings that are remarkably consistent. On the other hand, the digraphs in the signatures from volunteer 15 are comparatively erratic. Volunteer 15 never really seems to learn the phrase, and nearly always enters it in two chunks.

In conclusion, these charts indicate that after a certain number of entries, the digraph timings converge downwards towards reasonably stable values. An interesting trend is that digraphs that are originally on fragment boundaries (where the volunteer stops entry and refers to the passphrase) tend to remain longer than other digraphs even when the phrase is learnt and digraphs are relatively stabilised. In other words, the way in which a volunteer initially fragments a passphrase seems to set a precedent for how they will eventually learn to type it. Examples of digraphs that display this pattern include digraph 5 for volunteer 5, digraph 2 for volunteer 9 and digraph 4 for volunteer 11. Generally, it seems that digraphs inside original fragment boundaries are consistently small, while digraphs that are between these fragments tend to be proportionally longer. We shall explore how this relates to chunking in a later section.


Figure 4.6: Digraph stabilities for volunteers 5 & 9


Figure 4.7: Digraph stabilities for volunteers 11 & 15


4.3 Identification & Authentication

The last of our three hypotheses is concerned with the extent to which we can use the idiosyncrasies and nuances in a volunteer's typing behaviour to distinguish them. We shall investigate two very different techniques: Student's t-test and self-organising maps.

4.3.1 T-test Authentication

As was explored in the previous section, most people do develop a typing behaviour that produces a stable temporal signature. Our first method of determining the usefulness of these signatures is Student's t-test. To correctly model each user, we had to devise a way of gathering usable signatures. It was found that signatures which the volunteer typed in one go without referring to the prompt, and which do not contain any digraphs greater than 1500ms, were suitable for this purpose. It is worth noting that one volunteer was dropped from this section of the experiment for having too few usable signatures.
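The usability criteria can be illustrated with a short filter. This is only a sketch: the `Signature` record and its `prompt_views` field are hypothetical names introduced for illustration, and only the 1500ms cut-off comes from the criteria stated above.

```python
from dataclasses import dataclass
from typing import List

MAX_DIGRAPH_MS = 1500  # cut-off taken from the usability criteria in the text


@dataclass
class Signature:
    digraphs_ms: List[float]  # travel times between consecutive keys
    prompt_views: int         # hypothetical field: prompt look-ups during entry


def is_usable(sig: Signature) -> bool:
    """A signature is usable if typed in one go with no over-long digraph."""
    typed_in_one_go = sig.prompt_views == 0
    no_long_pause = all(d <= MAX_DIGRAPH_MS for d in sig.digraphs_ms)
    return typed_in_one_go and no_long_pause


def usable_signatures(signatures: List[Signature]) -> List[Signature]:
    """Keep only the signatures that satisfy both criteria."""
    return [s for s in signatures if is_usable(s)]
```

A volunteer whose filtered list is too short to build a profile would then be excluded, as happened with one volunteer here.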

With the usable signatures from each volunteer, a series of authentication attempts was simulated to find the optimal threshold for the t-test. Figure 4.8 shows the results of these simulations.

Figure 4.8: A graph showing how changing the authentication threshold affects authentication rates.

The three series on the graph show the average error rates for three distinct authentication scenarios. The first series (blue) represents the false rejection rate (FRR) of legitimate users attempting to authenticate. In this case the error rate is based on the number of times users were falsely rejected. The second series (red) represents the false authentication rate (FAR) of naive impostors who have not learned the phrase. The simulated attempts at authentication were gathered from the final phase of the experiment, where each group acts as an impostor for another group. The error rate is the number of falsely authenticated impostors. The final series (yellow) represents the false authentication rate of learned impostors. In this case the simulated attempts were gathered from other volunteers inside the same group as the volunteer being tested.

As expected, increasing the threshold decreases the FRR but increases the FARs. The optimal balance is where the sum of the FRR and the FAR is as low as possible. However, this point varies depending on the kind of impostor. When the impostor is naive and has not practised entering the phrase, the optimal threshold is around 74, with 6.6% of legitimate users falsely rejected and 10% of naive impostors falsely authenticated. However, if the impostor has learned the passphrase, then the optimal threshold is stricter, averaging around 52. In this case 17% of legitimate users are falsely rejected and 16% of trained impostors are falsely authenticated. This performance difference between the two calibres of impostor is to be expected. A more interesting trend is that as the threshold is increased, the error rate for naive impostors seems to increase linearly, but the error rate for trained impostors seems to follow a logarithmic curve. As this was surprising, we ran the simulation again with higher threshold values. The resulting graph, displayed in figure 4.9, shows that the two false authentication rates do in fact both follow a curve, and that they eventually converge.
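The threshold search described above can be sketched as follows. The sketch assumes that each simulated attempt has already been reduced to a single score and that an attempt is accepted when its score does not exceed the threshold, which matches the observation that raising the threshold lowers the FRR and raises the FAR. The function names and the scoring convention are illustrative assumptions, not the exact t-test adaptation used in the experiment.

```python
import numpy as np


def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (FRR, FAR) when an attempt is accepted if score <= threshold."""
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    frr = float(np.mean(genuine > threshold))    # legitimate users rejected
    far = float(np.mean(impostor <= threshold))  # impostors accepted
    return frr, far


def best_threshold(genuine_scores, impostor_scores, thresholds):
    """Pick the threshold with the lowest FRR + FAR (first one wins on ties)."""
    totals = [
        sum(error_rates(genuine_scores, impostor_scores, t)) for t in thresholds
    ]
    return thresholds[int(np.argmin(totals))]
```

Running `best_threshold` once with naive-impostor scores and once with trained-impostor scores would give the two separate optima discussed above.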

Figure 4.9: A graph showing false authentication rates for naive and trained impostors converging


4.3.2 Self-organising Maps

Self-organising maps (SOMs) are our second technique for exploring the usability of signatures. Given that SOMs are non-linear and allow unsupervised modelling, they contrast nicely with t-test authentication and give us a qualitatively different approach. As SOMs preserve the topological properties of their input, assessment of our data shall be performed by visualising the output of each SOM. As with the t-test authentication, we shall only include usable temporal signatures (using the same criteria).

To get the best results with a SOM, there are several parameters that must be tuned. The two most important of these are the learning rate and the initial neighbourhood size. So the first goal was to find good values for these parameters. It was found that increasing the learning rate increased the density of clusters in the map and increasing the initial neighbourhood size increased how spread out the map was within the available space. It was determined that, depending on the input data, there were two combinations of parameters that seemed to produce particularly good results: learning rate 0.8 with initial neighbourhood size SOMSize/4, and learning rate 0.5 with initial neighbourhood size SOMSize/3. Because these parameters could be pre-tuned in a production environment, and for the sake of brevity, we shall present each SOM with its best-performing parameters. All SOMs in this section were trained over 1000 iterations.
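To make the role of these two parameters concrete, the following is a minimal SOM training sketch in NumPy. It is not the implementation used in this project: the square grid, the random initialisation, the linear decay of both parameters and the Gaussian neighbourhood function are all assumptions chosen for brevity, as is the default grid size.

```python
import numpy as np


def train_som(data, som_size=20, learning_rate=0.8, initial_radius=None,
              iterations=1000, seed=0):
    """Train a square SOM; returns weights of shape (som_size, som_size, dim)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    dim = data.shape[1]
    if initial_radius is None:
        initial_radius = som_size / 4  # the "SOMSize/4" setting from the text
    # Initialise node weights uniformly within the range of the data.
    low, high = data.min(axis=0), data.max(axis=0)
    weights = rng.uniform(low, high, size=(som_size, som_size, dim))
    # Grid coordinates of every node, used for neighbourhood distances.
    rows, cols = np.indices((som_size, som_size))
    for step in range(iterations):
        frac = step / iterations
        # Assumed schedule: rate and radius both decay linearly over training.
        rate = learning_rate * (1.0 - frac)
        radius = max(initial_radius * (1.0 - frac), 0.5)
        sample = data[rng.integers(len(data))]
        # Best matching unit: the node whose weights are closest to the sample.
        dists = np.linalg.norm(weights - sample, axis=2)
        bmu_r, bmu_c = np.unravel_index(np.argmin(dists), dists.shape)
        # Gaussian neighbourhood centred on the BMU.
        grid_dist_sq = (rows - bmu_r) ** 2 + (cols - bmu_c) ** 2
        influence = np.exp(-grid_dist_sq / (2.0 * radius ** 2))
        # Pull each node towards the sample in proportion to its influence.
        weights += rate * influence[..., None] * (sample - weights)
    return weights
```

The second parameter combination would correspond to calling `train_som(data, learning_rate=0.5, initial_radius=som_size / 3)` with the same grid size.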

In our visualisation each node in the map's network is represented as a rectangle. Each filled-in rectangle represents a particular passphrase entry, its colour representing which volunteer the entry belongs to. Each of these passphrase entries is placed over the node that its digraph timings are closest to (using Euclidean distance). It is worth reiterating that as the SOM is being trained, it does not know which volunteer each entry belongs to; it simply arranges them according to the topography of the input. Thus the horizontal and vertical dimensions of the map do not correspond to any particular values. They are actually an approximate two-dimensional representation of the spatial arrangement between the digraphs in higher-dimensional space.
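The placement step can be sketched as a nearest-node lookup. The sketch assumes the trained map is available as a NumPy array of shape (rows, columns, dimensions); the function names are illustrative and not taken from the project code.

```python
import numpy as np


def best_matching_unit(weights, entry):
    """Return the (row, col) of the node closest to entry in Euclidean distance."""
    dists = np.linalg.norm(weights - np.asarray(entry, dtype=float), axis=2)
    row, col = np.unravel_index(np.argmin(dists), dists.shape)
    return int(row), int(col)


def place_entries(weights, entries, volunteer_ids):
    """Map each occupied grid cell to the volunteers whose entries land on it."""
    cells = {}
    for entry, volunteer in zip(entries, volunteer_ids):
        cells.setdefault(best_matching_unit(weights, entry), []).append(volunteer)
    return cells
```

Volunteer labels are used only here, after training, to colour the cells; they play no part in arranging the map.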

Our first SOM, shown in figure 4.10, displays the 10-dimensional travel time digraphs of all volunteers mapped out into a two-dimensional space. An initial look at this SOM is fairly promising. While not all entries are located in meaningful positions and some volunteers are scattered across the map, many of the entries do form distinguishable and interesting patterns. For instance, entries of certain volunteers form very clear clusters, while others occupy less dense but equally well defined spaces. This implies that to be useful, a collection of temporal signatures does not all have to be clustered around a central point. They can in fact vary to a greater extent than one may imagine and still remain suitable for authentication as long as they form contiguous regions in hyperspace. In other words, the user can vary their input as long as the variations collectively form a gradual continuum and are not extremely different. This is more apparent in the SOM of the digraphs for passphrase group zmlntdzqdmg, pictured in figure 4.11, where one can see several different volunteers forming well defined, but not circular, regions.

Figure 4.10: Self-organising Map of the digraph times from all volunteers


Figure 4.11: Self-organising Map of the digraph times from volunteers in group zmlntdzqdmg


Figure 4.12: Self-organising Map of the digraph times from volunteers in group fhtvlqwlesf


Figure 4.13: Self-organising Map of the digraph times from volunteers in group vrfpebvvewe

The SOM of group fhtvlqwlesf shown in figure 4.12 demonstrates reasonable separation between the different volunteers. On the other hand, the SOM of group vrfpebvvewe, shown in figure 4.13, demonstrates the weakest patterns out of all three phrase groups. In particular, volunteer 9 has signatures that fall right across the entire map. This is surprising, as cross-referencing back to figures 4.5 and 4.6 indicates that this volunteer learns to type the phrase quickly and enters it in a very stable fashion. One explanation is that their entry of the passphrase is in fact too perfect and lacks any idiosyncratic features. This would indicate that some users are inherently easier to identify than others.

In conclusion, SOMs demonstrate very promising results. Not only do they provide a robust way of visualising higher-dimensional data, but the patterns shown are clear enough to consider the maps a viable approach to the identification problem. Furthermore, the output of SOMs could be used as the input for more conventional machine learning techniques such as k-means clustering and neural networks. We shall explore these possibilities in a later section.


4.3.3 Dwell Timings

Thus far the focus of this investigation has mainly been on the travel timings between keystrokes, known as digraphs. One feature of the temporal signatures that we have neglected up to this point is the duration each key is held down for, known as dwell. Because some existing work has asserted that the dwell times carry important biometric information [13], we shall now take the time to explore this feature. To do this we shall repeat our investigation of t-test authentication and self-organising maps, but this time, rather than extract inter-key digraph times from the temporal signatures, we shall instead extract the key dwell times. Other than this, we require no further adaptations to our methods, as both digraph timings and dwell timings are simply number vectors.
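The two feature vectors can be illustrated with a short extraction sketch. It assumes each keystroke is recorded as a (press, release) timestamp pair in milliseconds, and that travel time is measured from one key's release to the next key's press; both the event format and that definition of travel time are assumptions made for illustration.

```python
from typing import List, Tuple

# (press_ms, release_ms) for one key, listed in typing order (assumed format)
KeyEvent = Tuple[float, float]


def dwell_times(events: List[KeyEvent]) -> List[float]:
    """How long each key was held down: one value per key."""
    return [release - press for press, release in events]


def digraph_times(events: List[KeyEvent]) -> List[float]:
    """Travel time between consecutive keys: one value per adjacent pair.

    Assumption: measured from the release of one key to the press of the next.
    """
    return [nxt[0] - cur[1] for cur, nxt in zip(events, events[1:])]
```

For an 11-character passphrase this yields 11 dwell times and 10 digraph times, which is why the dwell vectors have one more dimension than the digraph vectors.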

Figure 4.14 shows the performance of t-test based authentication when using dwell times as feature vectors. As before, we consider the FRR of legitimate volunteers and the two FARs of naive and trained impostors. From this graph we can see the dwell times do indeed seem to carry enough idiosyncratic information to give moderate performance for authentication. A threshold of 52 produces an FRR of 30% and yields 10% for both FARs. However, unlike with digraphs, there is only a marginal difference between the FARs of the two calibres of impostor, with naive impostors being only marginally easier to correctly reject. In other words, learning the phrase seems to have little effect on the dwell times. Furthermore, it seems much easier to exclude naive impostors with digraphs than it is with dwells.

Figure 4.14: A graph showing how changing the threshold affects authentication rates when using dwell times as features


Figure 4.15 shows a SOM that has been trained on the dwell times. Our technique is the same as before, the only difference being that instead of using 10-dimensional digraph vectors, we are using 11-dimensional dwell timing vectors. As might be expected from the results of the t-test authentication, dwell timings do indeed seem to carry enough distinguishing information for a SOM to arrange the entries of many of the volunteers into discrete regions. The cohesiveness of these regions is not perfect, but the patterns shown in the map are far from random.

Figure 4.15: Self-organising Map of the dwell times from all volunteers

While this is a good indicator that dwell time is a useful feature, we feel that due to the nature of this experiment, perceived idiosyncratic features and distinguishable patterns may in fact be attributable to the mechanical variations between the different keyboards used by our volunteers. Variations in key thicknesses and key springing mechanisms could well affect the dwell time captured by our experiment. However, as was seen in figure 4.14, there is a very slight variation between the error rates of learned and naive impostors when using dwell timings, so there is still a possibility that dwell times are useful.


5 Discussion

5.1 Evaluation of Hypotheses

5.1.1 How people intuitively fragment a passphrase

We hypothesised that due to variations in human chunking abilities, the way in which volunteers fragment passphrases would vary between individuals. The evidence we collected goes some way towards supporting this. What was surprising, however, is the number of volunteers who did not perform any fragmentation at all. Nevertheless, although our dataset is small, there was a variety of different fragmentation patterns among the volunteers who did fragment the phrase. It was even apparent that some volunteers had particular strategies, such as always dividing the phrase into two roughly equal-sized sections. Because we built the experiment specifically to capture the way in which the user switches between looking at the phrase and entering the phrase, we were able to capture behaviour like this particularly easily.

We also mentioned the possibility of using these behaviours as part of a user's biometric profile. While our data does show some volunteers with consistent behaviour, many of the volunteers had very different behaviour on each passphrase they entered. To be more conclusive, we would need a dataset that captured volunteers being introduced to many more phrases.

In terms of the cognitive science of chunking, there are two interesting trends in the data. Firstly, it appears that volunteers prefer to digest the passphrase in as few fragments as possible. Given that fragments are a good indication of how the data is divided into chunks in working memory, this implies that people have an aversion to needlessly creating extra chunks in working memory. The second observation is that volunteers prefer chunking patterns that create evenly sized chunks. This suggests that there is some cognitive mechanism for efficiently dividing up data. This is precisely the kind of phenomenon explored by Simon [14], who states that we can increase the amount of information that working memory contains simply by building larger and larger chunks, each chunk containing more information than before.

To conclude, the hypothesis seems to be partially supported. However, while providing insights that help us understand the passphrase learning process, our results indicate that fragmentation patterns are of limited practical use for keystroke authentication.


5.1.2 When learning to type a passphrase, how long does it take an individual to produce a stable signature?

We hypothesised that over the duration of the experiment the temporal signatures produced by volunteers would converge to stable values. We feel that our data supports this hypothesis. In particular, our use of charts to display the progression of digraph times across the span of the experiment showed series lines that tend towards stable values. We also verified that, for a random alphabetic phrase, it takes volunteers somewhere between 20 and 30 entries to reach a point at which entry is performed in a stable fashion. We confirmed that during this process the average entry time follows a logarithmic learning curve.
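One way to check for a logarithmic learning curve is to fit average entry time against the logarithm of the entry number. The sketch below assumes a model of the form time = a + b ln(n) and uses an ordinary least-squares fit; the model form and the function name are illustrative assumptions rather than the procedure used in this project.

```python
import numpy as np


def fit_log_curve(entry_times_ms):
    """Least-squares fit of time = a + b * ln(n) for n = 1, 2, ...; returns (a, b)."""
    times = np.asarray(entry_times_ms, dtype=float)
    n = np.arange(1, len(times) + 1)
    # polyfit returns the slope first, then the intercept, for degree 1.
    b, a = np.polyfit(np.log(n), times, 1)
    return float(a), float(b)
```

A negative b indicates that entry time falls as the phrase is practised, with most of the improvement occurring over the first few entries.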

When looking at the visualisation which depicted the volunteers' observation behaviour, one could distinguish three distinct phases. Initially, volunteers look back and forth between prompt and entry very often. We assume that at this stage the passphrase mostly exists in working memory. Next, volunteers progress to the point where they have mostly learned the passphrase, but still need to be prompted to assist in recalling it. This indicates that at the very least, at this stage, the passphrase is encoded in short term memory. Finally, towards the end of the experiment volunteers reach a point at which the passphrase enters long term memory and typing it becomes an implicit motor task. One limitation of the design of the experiment is that because it will always prompt volunteers up until a certain point, it is hard to detect the point at which volunteers progressed to this final stage.

Although more subtle than other patterns in the data, there was some evidence to suggest that the way in which a user initially fragments a phrase sets a precedent for how they will eventually type the passphrase. This is something that merits further investigation.

Out of all the people who took part, volunteer 14 was particularly fast and consistent in how they learned to type the passphrase. In particular, they managed to complete entry of the primary passphrase without ever having to fragment it. After the experiment we talked to the volunteer and they claimed to have lots of experience working in retail, where they have to frequently memorise the random sequences of numbers and letters on barcodes. While anecdotal, this supports the idea that certain skills involving working memory can be improved by practice. In this instance, it is quite possible that volunteer 14 has developed a learning schema that facilitates the memorisation of short alphanumeric codes.

In conclusion, there is fairly good evidence for progressive formation of stable typing patterns. A glance at the digraph charts shown in the appendix reveals that many (but not all) volunteers show this progression. Although the data is noisy, there are still distinct patterns that show digraphs converging on certain values. Thus we consider this hypothesis to be supported reasonably well.


5.1.3 Keystroke-biometric authentication

We initially hypothesised that it would be possible to use temporal signatures to distinguish between individuals. As we researched and developed our techniques we realised that this task can be split into two specific sub-problems: authentication and identification.

The first problem is that of authenticating a user. This involves making a binary decision as to whether a candidate signature matches the profile of the user it claims to belong to. We demonstrated that an adaptation of Student's t-test proved reasonably effective at this task. In doing so we noted that there is a trade-off between the rates of falsely authenticated impostors and falsely rejected users. This is a parameter that could be tuned as desired in a real-life system.

We also showed that the t-test is a lot better at detecting naive impostors than it is at detecting trained ones. Much of the t-test's ability to catch out naive impostors comes from the fact that they enter the passphrase in a slow and unsure way. When it comes to distinguishing between users who have learned the same phrase, the task is much harder, as the only features to work with are subtle idiosyncrasies and small nuances. While in many ways this is a truer test of distinguishing between people, in a practical authentication scenario it is unlikely that the impostor will have really learned the phrase. Then again, if keystroke biometric authentication systems gained popularity then this would not necessarily be the case.

The second problem is the task of identifying a user by matching a candidate temporal signature onto the correct owner in a collection of known user profiles. We demonstrated that self-organising maps are reasonably capable of using digraphs to divide up volunteers into zones and clusters in the output space.

While our investigation into this stopped here, we think that identification should be possible by finding the best matching unit (BMU) for a candidate passphrase and assigning it an identity based on the owner of the signatures that surround the BMU. In the regions of the SOM that are populated by more than one user, this technique should be able to reflect this uncertainty by reporting that a signature matched more than one user.
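As this idea was not implemented in the project, the following is only a sketch of how it might look. It assumes the BMU grid position of every enrolled signature has already been computed, and it uses a square neighbourhood of hypothetical radius one cell around the candidate's BMU; the function name and the radius are illustrative choices.

```python
from collections import Counter


def identify(candidate_bmu, known_bmus, owners, radius=1):
    """Return {owner: share of nearby signatures} around the candidate's BMU.

    candidate_bmu: (row, col) of the candidate's best matching unit.
    known_bmus:    (row, col) BMU of each enrolled signature.
    owners:        owner label of each enrolled signature, in the same order.
    radius:        hypothetical neighbourhood size in grid cells.
    """
    r0, c0 = candidate_bmu
    votes = Counter(
        owner
        for (r, c), owner in zip(known_bmus, owners)
        if max(abs(r - r0), abs(c - c0)) <= radius
    )
    total = sum(votes.values())
    if total == 0:
        return {}  # no enrolled signatures nearby: identity unknown
    return {owner: count / total for owner, count in votes.items()}
```

Returning a share per owner, rather than a single label, lets the caller see when a region of the map is shared between several users.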

With refinement, SOMs could be used in conjunction with other machine learning methods. For example, the output of a SOM could be used as the input for a clustering technique such as k-means clustering. Ideally, centroids could converge on clusters and be able to classify regions of the search space. However, some of the patterns are
