
Ertuğrul Mikail Tunç, login: abjz066, Supervisor: Dr. Ilir Gashi 2012-2013

City University London

Information Systems BSc (Hons) Final Year Project Report

Academic Year: 2012-13

A Comparative Analysis of the Hardware and

Software Component Performance on a Virtualised

Environment Vs. a Physical Environment

By

Ertuğrul Mikail Tunç

Project supervisor: Dr. Ilir Gashi

5 August 2013


Version History

Version   Author                  Date                      Description
0.1       Ertuğrul Mikail Tunç    03/03/2013                Initial document skeleton
0.2       Ertuğrul Mikail Tunç    06/05/2013                Introduction
0.3       Ertuğrul Mikail Tunç    14/05/2013                Literature review
0.4       Ertuğrul Mikail Tunç    01/07/2013                Method
0.5       Ertuğrul Mikail Tunç    08/07/2013 – 17/07/2013   Database Analysis
0.6       Ertuğrul Mikail Tunç    18/07/2013 – 21/07/2013   Web Analysis
0.7       Ertuğrul Mikail Tunç    27/07/2013 – 31/07/2013   Improved presentation and readability of document; improved quality of data in results section
0.8       Ertuğrul Mikail Tunç    01/08/2013 – 02/08/2013   Conclusion
0.9       Ertuğrul Mikail Tunç    03/08/2013                Final review; minor changes
1.0       Ertuğrul Mikail Tunç    04/08/2013                Final Piece


Contents

A Comparative Analysis of the Hardware and Software Component Performance on a Virtualised Environment Vs. a Physical Environment

1 Introduction
2 Outputs Summary
3 Literature Review
4 Method
    4.1 Analysis
    4.2 Design and Implementation
    4.3 Evaluation
5 Results
    5.1 Database Test Results
    5.2 Summary
    5.3 Web Test Results
    5.4 Summary
6 Conclusions and Discussions
    6.1 Introduction
    6.2 Project Objectives and Research Questions
    6.3 General Conclusions
    6.4 Implications for Future Work
    6.5 Project Management
7 Glossary
8 Appendices


1 Introduction

Virtualisation is one of the hottest topics in Information Technology. Businesses are moving to virtualised technologies at a remarkable rate. Previously, only large corporations had the cash flow and technical expertise to implement a virtualised platform, and even then only servers or desktops would be virtualised.

Today, the influx of virtualisation software, services and hardware means that even small and medium-sized businesses are taking advantage of virtualisation. Software developers are moving their mobile applications (take Instagram as an example), desktop applications (the popular file-syncing application Dropbox uses Amazon S3 virtualised storage to store user files), web applications (easyJet's seating service utilises Microsoft's Windows Azure virtualisation technology) and much more onto virtualised platforms.

Gartner research from 2012 shows that "virtualisation penetration has surpassed 50% of all server workloads" [1].

Other research and analysis undertaken by Guy Rosen shows that the number of instance requests for the Amazon EC2 service in a single region over a 24-hour period was approximately 50,000 [2]. This is a staggering number of instance requests and, although the analysis is based on black-box observations (Amazon does not make this information publicly available), I would not be surprised if the figures were close to the truth.

These trends indicate that virtualisation is only going to gain more ground and reach a wider audience. By this I mean that it will not be limited to highly specialised technical personnel as it was many years ago; it will be available to people who are not very technical (technical with respect to virtualisation and hardware operations) and who are, for example, looking to deploy a database instance in the 'cloud' somewhere for their mobile app or web service to interact with. The images in Appendix B illustrate this shift in the requirements for using the technology, from highly technical to not very technical.

[1] Magic Quadrant for x86 Server Virtualization Infrastructure. 2013. [ONLINE] Available at: http://www.gartner.com/technology/reprints.do?id=1-1AVRXJL&ct=120612 [Accessed 08 April 2013].
[2] Anatomy of an Amazon EC2 Resource ID :: Jack of all Clouds :: Guy Rosen on Cloud Computing. 2013. [ONLINE] Available at: http://www.jackofallclouds.com/2009/09/anatomy-of-an-amazon-ec2-resource-id/ [Accessed 09 April 2013].


They show that the average John Doe can come along, shift a few sliders here and there, and deploy a virtual environment.

The reason I mention all of this is to make it very clear that virtualisation and cloud computing (the two go hand-in-hand) are becoming more and more prevalent, and we are beginning to see business users demanding virtualisation without necessarily understanding the technical advantages and disadvantages of the technology; it is simply seen as the next 'cool' thing that 'everyone else is doing'.

The question I will answer in this final year project dissertation is one of performance: whether workloads (such as a SQL database and a web application) are affected by putting them on a virtualised platform. For example, will web application X perform any better or worse on physical hardware than on a virtualised hypervisor?

As there are countless scenarios which could be tested with regard to this hypothesis, I will limit the scope of my research by keeping the configuration static while making the results and analysis as comparative as possible (i.e., installing the exact same testing software, running the exact same tests, keeping the configuration between the virtual and physical OS as similar as possible, etc.).

The main objective of this research project is to compare the performance of different workloads on both physical and virtual infrastructure and to conclude, with sufficient evidence, which infrastructure performs better under the different workloads and by how much (in some measurable form). I aim to do this by:

o Running a variety of tests to try to achieve a realistic approach to testing. These tests are documented fully in the test plan referenced in Appendix C. The test plan contains all of the tests I plan to run along with the configurations of the tests, number of users, test types and definitions.

These tests will be run from a 'load injector': a (LAN) server I will build which will run the tests against the virtual and physical servers.

o The different types of applications I will be running my tests against are as follows:

SQL Database


Web application

o I shall collect performance data using monitoring tools such as:

PerfMon (Windows performance monitoring utility)

ESXi performance monitoring metrics (host level metrics)

The data from these tools are to be collected for the duration of the tests that are run.

Analysis of the data will consist of the following: descriptive comparisons of the main software and hardware component metrics, such as CPU utilisation (hardware) and SQL database transactions (application). These will be described, graphed and tabulated after being analysed. The two servers (virtual and physical) will be closely compared, so every test analysed will have a minimum of two sets of data.

Underlying causes and hypotheses will be investigated where inconsistencies occur in the test results (e.g., if a test against a SQL database consumes 100% CPU in test A but only 25% in test B, I will thoroughly investigate the causes behind test A's results and present my theories to the reader).


2 Outputs Summary

In this section I will briefly go over the outputs that have been produced.

Test Plan

o The test plan briefly defines the test terminology as well as describing the test cases

o The output is a test plan document

o The recipient/end-user will be anyone who would like to scrutinise the research and/or carry out similar research

o Recipients could use this research to make informed decisions as to what type of technology to use when deploying an application and/or server.

o The test plan can be found in Appendix C

Analysis of the tests

o One of the main outputs of this research project is the analysis of the tests run.

o The output is in the form of graphs and supporting text

o The recipient/end-user will be anyone who is interested and/or researching the

field

o Recipients could use this research to make informed decisions as to what type of

technology to use when deploying an application and/or server.

o The analysis can be found in section 5 – Results

o The main findings are that the physical machine outperforms the virtual machines in every test, and that the virtual machine which is over-committed on physical resources performs significantly worse than both the physical machine and the non-over-committed virtual machine.

o The physical machine also had a higher throughput rate than both virtual machines in all tests.


3 Literature Review

This chapter summarises the reading undertaken in order to enable me to complete the project.

VMware performance benchmarking technical documentation (VMware, n.d.)

VMware performance benchmarking documentation provided technical information about how test beds were set up as well as how performance was monitored. For example, the Zimbra Collaboration Server Performance on VMware vSphere 5 documentation outlines the methodology and how the performance test environment is set up. I used some of the ideas from there, e.g., keeping the native and virtual environments as similar as possible to make the results as valid as possible.

Also, the VMware documentation clearly states the exact configuration used in the tests. I incorporated this into this research project by stating as much of my configuration as I could.

JMeter user manual (The Apache Software Foundation, n.d.)

JMeter is a free, open-source performance testing tool developed in Java. It is widely

used in performance testing circles and allows a performance tester to be very flexible

with the creation of test cases and scenarios. It allows performance testing of many

known standards such as HTTP, TCP, VT100 terminals, JDBC etc.

As there is a lot to know about the configuration and set-up of the scenarios, I often referred back to the JMeter user manual to learn what a particular configuration or option did.


Microsoft TechNet library – Performance Monitor Getting Started Guide for Windows Server

2008 (Microsoft, 2007)

Performance Monitor is a native, built-in tool in all modern versions of Windows. It allows you to define performance metrics to monitor on a system, log that data on a schedule and export it to a CSV file. There are hundreds of performance metrics to choose from, so I decided to use the main ones, such as CPU, disk and memory, together with application-specific metrics such as SQL processor time.

The TechNet library has an extensive list of the performance monitoring metrics and describes their format and how they are collected. For example, some metrics are averaged over the collection interval (15 seconds by default) whereas others are just 'snapshots' of the performance data. For analysis purposes, it is important to understand how the data is formatted, otherwise there could be questions over the validity of the results.
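The exact counter set used is not reproduced here, but for the metrics analysed in Section 5 (CPU utilisation, the SQL Server and IIS working sets, and disk seconds per write) the standard Windows counter paths would be along the lines of \Processor(_Total)\% Processor Time, \Process(sqlservr)\Working Set, \Process(w3wp)\Working Set and \LogicalDisk(_Total)\Avg. Disk sec/Write; these are given purely as an illustration of the kind of counters Performance Monitor exposes, not as a record of the exact configuration used.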

Frontiers of High Performance Computing and Networking – ISPA 2006 Workshops (Min, Di

Martino, Yang, Guo, & Ruenger, 2006)

This book studies the performance and scalability of compute-intensive commercial server workloads (specifically a Java server workload benchmark called SPECjbb2005). Many of my methodologies were derived from this book; for example, how tests were set up, what type of data to log and what types of tests to run. The text extract below is the methodology used by the authors – it is very similar to the methodology I used, i.e., employ a platform running a hypervisor (VMware ESXi in my case), run performance monitoring tools, run the tests on the native and virtual platforms and then analyse the results.

“Our measurement-based methodology employs an Intel Xeon platform running with

[…] We run out workloads on the Xen hypervisor and employ several performance

tools […] to extract overall system performance […] and architectural behaviour […].

For comparison, we also measure the performance of SPECjbb2006 in a native (non-

virtualized) platform. We expect that the findings from out [sic] study will be useful to

VMM architects and platform architects as it provides insight into performance

bottlenecks” (page 465)


4 Method

4.1 Analysis

I have worked in the performance testing and capacity management field for just over 3 years

and a trend I have noticed is that many organisations are moving their applications from

traditional, physical based infrastructure to virtualised environments such as Amazon’s AWS

and Microsoft’s Windows Azure. Elastic capacity, easier management over product

development and pricing are usually the key points for moving from physical to virtual.

IT-savvy people have always said that virtualisation carries an overhead compared to physical hardware and will throw a percentage in the air (one I hear often is 5-10%), but there is rarely any up-to-date technical analysis to back these figures or statements up.

Therefore, for this research project I decided to design test beds, load test scenarios and performance monitoring in order to analyse the data and see whether I could quantify the virtualisation overhead on more modern IT infrastructure (more modern CPU architecture) and software (the virtualisation hypervisor).

During my analysis of the field of virtualisation, and from my own experience, I learnt that there are far more variations of hardware set-ups, configurations, types of workload (for example, a storage server or an application that serves mathematically intensive transactions) and workload configurations than I could possibly test in the given timeframe. Therefore a caveat of this research is that the conclusion should be taken only as a guideline, and performance testing/benchmarking should be performed to fit one's own environment and set-up.


4.2 Design and Implementation

Three types of scenarios were designed for this research project. Scenario 1 is based on testing against the physical infrastructure. Scenario 2 is based on testing against the virtual infrastructure with no over-commitment, and scenario 3 was inspired by how public cloud providers maximise their hardware utilisation by using a virtualisation technique called 'over-commitment', in other words over-subscription of the physical resources.

What this means in practice is that a physical resource can be shared by multiple virtual

machines; resulting in cheaper running costs for the cloud provider because they are

maximising utilisation on hardware already provisioned.

For example, let's pretend I provide computing resources to customers and I currently have one physical machine with a quad-core processor and 8 GB of RAM. If customer 1 decides they want to put a 4-core, 8GB virtual machine on my host then they have used up 100% of the physical capacity, meaning no one else can put a VM on the same host. Realistically speaking, it is unlikely that this customer will always use 100% of the processor time and 100% of the memory allocation. This means there is potential for a lot of hardware waste.

This is where over-commitment comes in (and most, if not all, public cloud providers use it): customer 2 can come along and provision a second VM with, say, 2 cores and 4GB of RAM (subject to how much over-commitment is allowed by the hypervisor, which is set by the administrator). This means there is a higher likelihood of the hardware being utilised and less 'waste'. Of course it also means there is more chance of contention, the worst case being when customer 1 and customer 2 are both highly utilised at the same time.
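To put numbers on the example above: with customer 1's 4-core/8GB VM and customer 2's 2-core/4GB VM on the quad-core/8GB host, 6 virtual cores are allocated against 4 physical cores and 12GB of virtual RAM against 8GB of physical RAM, i.e. the host is over-committed by 50% on both CPU and memory.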

From a technical point of view this means hardware resource sharing, performed by the hypervisor, which has a performance impact on the virtual machines.


Each test within a scenario is run three times to account for variation from different runs.

Scenario 1 – Physical environment

Test 1 – Web Application Testing

Test 2 – Database Application Testing

Scenario 2 – Virtual Machine environment

Test 1 – Web Application Testing

Test 2 – Database Application Testing

Scenario 3 – Virtual Machine with over-commit environment

Test 1 – Web Application Testing

Test 2 – Database Application Testing

I designed the test scenarios for the web and database applications using Apache JMeter, the popular open-source performance testing tool introduced in the Literature Review section.

The test scenarios will be available as a .jmx file as part of this research.

The test plan (Appendix C) details the two workloads I will be testing and the types of tests I will be running, along with the test terminology.
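Although the exact invocation is not recorded in this report, each .jmx file can be executed repeatably from the load injector in JMeter's non-GUI mode, for example jmeter -n -t scenario.jmx -l results.jtl (the file names here are illustrative only).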


Database scenario:

The database scenario consists of a number of typical database transactions across two tables: SQLStress.Users and SQLStress.Orders. The Orders table was pre-populated with 1 million rows of test data and the Users table was left empty.

First I needed to prepare the database and tables. I created the database (named SQLStress) using the SQL Server Management Studio interface, and created the tables and fields using the SQL script shown in Figure 4-1.

Figure 4-1

-- The SQLStress schema is assumed to exist already (e.g. created with CREATE SCHEMA SQLStress;)
CREATE TABLE SQLStress.Users
(
    Username  varchar(64),
    FirstName varchar(255),
    LastName  varchar(255),
    UniqueId  varchar(255),
    Password  binary(20)
)

CREATE TABLE SQLStress.Orders
(
    Username  varchar(64),
    OrderName varchar(255),
    OrderId   varchar(64)
)

Then there was a need to populate the Orders table so that I could perform somewhat realistic inner joins between the two tables. I ran the script in Figure 4-2 one million times to generate the required data.

Figure 4-2

INSERT INTO SQLStress.Orders (Username, OrderName, OrderId)
VALUES ('${userName}', 'This is an order description', '${uniqueId}')


The parameters (${parameterName}) are passed in from a .csv file. An example of the data in the .csv file can be seen in Figure 4-3.

Figure 4-3

userName,firstName,lastName,uniqueId,password

username1000001,firstname,lastname,1000001,username1000001

username1000002,firstname,lastname,1000002,username1000002
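The report does not state how this .csv file was produced. As an illustration only, a minimal C# sketch that writes a parameter file in the format of Figure 4-3 (the file name and the one-million-row range are assumptions, not the actual tool used) could look like this:

using System.IO;

class GenerateTestData
{
    static void Main()
    {
        // Write a JMeter parameter file in the same layout as Figure 4-3.
        using (var writer = new StreamWriter("users.csv"))
        {
            writer.WriteLine("userName,firstName,lastName,uniqueId,password");
            for (int i = 1000001; i <= 2000000; i++)   // 1,000,000 rows of test data
            {
                writer.WriteLine($"username{i},firstname,lastname,{i},username{i}");
            }
        }
    }
}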

After this preparation of the database, there is the actual test scenario.

The important configuration to note can be seen in Figure 4-4 (the Literature Review section

provides a reference to the JMeter user manual which further explains the terms):

Figure 4-4

Number of threads (users): 50 – the number of virtual users active in the test.

Ramp-up period: 10 seconds – how long before all threads will start running. Ramping them up too fast can cause issues (for example, if you try to start 500 threads in 1 second on a web server, it will likely result in a denial of service).

Constant delay: 500 milliseconds – a constant delay/wait time of 500 ms in between transactions to make the throughput profile more realistic; on a website, for example, it is realistic for a user to wait a short time before clicking on the next transaction.

Random delay: 500 milliseconds – a random delay of up to 500 ms, which goes hand-in-hand with the constant delay above. The total delay between transactions is therefore between 500 ms (the constant delay sets the minimum wait time) and 1,000 ms (the constant delay plus the maximum random delay).

Duration of test: 60 minutes – each test ran for a whole hour.
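One practical implication of these timer settings, for anyone reproducing the tests: the think time between consecutive transactions is uniformly distributed between 500 ms and 1,000 ms, averaging roughly 750 ms per transaction per thread. Ignoring server response times entirely, 50 threads therefore generate at most around 50 / 0.75 ≈ 67 transactions per second, or roughly 4,000 transactions per minute in total; this is only a theoretical upper bound derived from the configuration, not a measured figure.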


There are a total of 5 transactions per loop, so each thread will go through all 5 of the transactions below (with the delay between each transaction) and then start again.

Three SELECT statements

Figure 4-5

SELECT TOP 100 *
FROM ${table}
WHERE FirstName = 'firstname'

One INSERT statement

Figure 4-6

DECLARE @HashThis binary(20);
SELECT @HashThis = CONVERT(binary(20), '${password}');
INSERT INTO SQLStress.Users (Username, FirstName, LastName, UniqueId, Password)
VALUES ('${userName}', '${firstName}', '${lastName}', ${uniqueId}, HASHBYTES('SHA1', @HashThis))

One INNERJOIN statement

Figure 4-7

SELECT SQLStress.Orders.OrderId, SQLStress.Users.Username
FROM SQLStress.Orders
INNER JOIN SQLStress.Users
    ON SQLStress.Orders.Username = SQLStress.Users.Username
WHERE SQLStress.Users.Username = '${userName}'


Web Application scenario:

The web application is a basic prime number generator created in ASP.NET 4. The code can be seen in Appendix E.

In simple terms, when the user visits the page they are served a static page with a text field. In this field the user enters a number which defines how many prime numbers to generate. Loading the static page is the GET request; entering the number, sending it to the web server and getting a response back is the POST request.

The important configuration to note can be seen in Figure 4-8:

Figure 4-8

Number of threads (users): 50 – the number of virtual users active in the test.

Ramp-up period: 10 seconds – how long before all threads will start running. Ramping them up too fast can cause issues (for example, if you try to start 500 threads in 1 second on a web server, it will likely result in a denial of service).

Constant delay: 500 milliseconds – a constant delay/wait time of 500 ms in between transactions to make the throughput profile more realistic; on a website, for example, it is realistic for a user to wait a short time before clicking on the next transaction.

Random delay: 500 milliseconds – a random delay of up to 500 ms, which goes hand-in-hand with the constant delay above. The total delay between transactions is therefore between 500 ms (the constant delay sets the minimum wait time) and 1,000 ms (the constant delay plus the maximum random delay).

Duration of test: 60 minutes – each test ran for a whole hour.
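The timer settings here are identical to those in the database scenario (Figure 4-4), so the same rough pacing note applies: an average think time of about 750 ms per request per thread and, ignoring server response times, a theoretical ceiling of roughly 4,000 requests per minute across the 50 threads.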


Figure 4-9 and 4-10 show the two web transactions that were executed in the testing.

The GET transaction:

Figure 4-9

http://${server}/stress

Method: GET

The POST transaction:

Figure 4-10

http://${server}/stress
Method: POST
Parameters:
    __VIEWSTATE: a dynamic value used by ASP.NET web pages to persist the state of web forms across post-backs
    __EVENTVALIDATION: ensures that events raised on the client originate from the controls rendered by ASP.NET
    td1: 500 – the text box defined in Default.aspx which holds the number of prime numbers to generate
    b1: Submit – the submit button
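Because __VIEWSTATE and __EVENTVALIDATION are dynamic values generated per page load, they would typically be correlated in JMeter (for example with a Regular Expression Extractor applied to the preceding GET response) rather than hard-coded; the report does not record exactly how this was configured, so this is noted only as the usual approach.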


Infrastructure and Server Set-up:

The physical infrastructure runs on the following specifications:

Dell PowerEdge 1950 rack server

Two Intel Xeon E5345 Processors

Eight 1GB IBM RAM 667MHz ECC Buffered PC2-5300F (FRU: 39M5784, P/N: 38L5903)

Two 73.4 GB 10K RPM, IBM eServer ST973401SS, SAS drives

The virtual infrastructure runs on the following specifications:

VMware ESXi, 5.1.0, 799733

The virtual machines will run on the same underlying physical hardware as described above.

The reason there are two SAS hard drives is that one has Windows Server 2008 R2 installed, whereas the other has VMware ESXi installed as the hypervisor, with Windows Server 2008 R2 installed as a virtual machine on that hypervisor. When I want to run a test on the virtual or physical set-up, I simply swap the drives.

Figure 4-11 is a table of the server configurations during each scenario and test.

Figure 4-11

Scenario 1 (Physical Machine) and Scenario 2 (Virtual Machine)
    Tests: Web Application and Database Application
    Configuration: two CPUs, 8 cores, 8GB RAM; one Windows Server 2008 R2 Enterprise

Scenario 3 (Virtual Machine with over-commit)
    Tests: Web Application and Database Application
    Configuration: two CPUs, 8 cores, 8GB RAM; one Windows Server 2008 R2 Enterprise, plus an additional VM with 1 CPU, 4 cores, 4GB RAM running Ubuntu Desktop


Database Application Set-up:

SQL Server 2008 R2 Enterprise

Default configuration

Web Application Set-up:

IIS 7.5

Default configuration

4.3 Evaluation

I completed all of the evaluation and analysis using Microsoft Excel. I used formulas, Pivot

tables and graphing to analyse and present the data in this research paper.

All of the data will be provided as part of the research so that there is opportunity to expand

on this research or customise the output to suit the reader.

Please see Appendix D for the location of these files.
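For readers who would rather script the analysis than use Excel, the percentile figures reported in Section 5 can be reproduced with a few lines of code. The sketch below is a minimal C# equivalent, assuming the linear-interpolation method used by Excel's PERCENTILE.INC function; it is illustrative only and not the actual workbook logic I used.

using System;
using System.Collections.Generic;
using System.Linq;

static class ResponseTimeStats
{
    // Returns the p-th percentile (0 < p <= 1) of a set of response times,
    // using linear interpolation between the two closest ranks.
    public static double Percentile(IEnumerable<double> responseTimesMs, double p)
    {
        double[] sorted = responseTimesMs.OrderBy(v => v).ToArray();
        double rank = p * (sorted.Length - 1);      // zero-based fractional rank
        int lower = (int)Math.Floor(rank);
        int upper = (int)Math.Ceiling(rank);
        double fraction = rank - lower;
        return sorted[lower] + fraction * (sorted[upper] - sorted[lower]);
    }
}

For example, the 90th percentile of the INNERJOIN response times would be ResponseTimeStats.Percentile(times, 0.90).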


5 Results

In this section of the research project, I will discuss all of the data I have analysed.

The types of results I will show are as follows:

Database Test Results

o 90th, 95th and 99th Percentiles for Transaction Response Times (3 graphs for the

database transactions – SELECT, INSERT and INNERJOIN)

o Full distribution line graph of response times (3 graphs for the database

transactions – SELECT, INSERT and INNERJOIN)

o CPU (1 graph showing average CPU utilisation across the three environments)

o Memory (1 graph showing average working set for the SQL Server process for

all three environments)

o Disk (1 graph showing disk seconds per write across the three environments)

Web Test Results

o 90th, 95th and 99th Percentiles for Transaction Response Times (2 graphs for the

web transactions – GET and POST)

o Full distribution line graph of response times (2 graphs for the web transactions

– GET and POST)

o CPU (1 graph showing average CPU utilisation across the three environments)

o Memory (1 graph showing average working set for the W3WP process for all

three environments)

o Disk (1 graph showing disk seconds per write across the three environments)


5.1 Database Test Results

The first set of results are percentiles of each individual database transaction's response time; e.g., 90% of users will see a response time of x milliseconds or less for the INNERJOIN database transaction.

As per Figure 5-1, the response times for the INNERJOIN SQL transaction show that the

Physical database server performs slightly faster than the virtual and significantly faster than

the over-committed virtual machine.

Across the percentiles, the transactions on the virtual environment are 4.6% slower than on the physical, and the transactions on the over-committed virtual machine are 33.8% slower than on the physical.

Figure 5-1


In Figure 5-2, we can see the average response times of the INNERJOIN statement throughout

the duration of the test (60 minutes).

The pattern we observe here is similar to the one in Figure 5-1 where the Physical machine

performs better than the other two.

We also see here that the behaviour of the over-committed virtual machine is more erratic when

compared to the other two which seem more stable.

This behaviour can be explained by CPU scheduling on the hypervisor due to contention for

the physical resources.

Figure 5-2


Figure 5-3 indicates that the response times for the SELECT SQL transaction follow a similar

pattern whereby the Physical performs the fastest, followed by the virtual and over-committed

virtual.

The average response times for the 90th and 95th percentiles show that the virtual machine is approximately 6% slower than the physical machine.

Analysis also shows that the over-committed virtual machine is 50.6% slower than the

physical machine.

Figure 5-3


In Figure 5-4, we see consistently higher response times for both virtual machines.

Again we can observe the erratic behaviour of the over-committed virtual machine.

Figure 5-4


Figure 5-5 shows the response times for the INSERT SQL transaction. Again a similar pattern emerges, with the virtual machine on average 5.7% slower at the 90th and 95th percentiles.

An unexpected result here is that the 99th percentile statistics show the virtual machine as 1 millisecond faster than the physical machine; however, the difference is too small to be of any significance.

The over-committed virtual machine is consistently slower than the physical machine, by an average of 46%.

Figure 5-5

In Figure 5-6, we see consistently higher response times for both virtual machines. Again there

is erratic behaviour of the over-committed virtual machine.


Figure 5-6

Figures 5-7, 5-8 and 5-9 show the transactions per minute for each SQL transaction. As the response times in the graphs above indicate that the physical performs best, the virtual slightly worse and the over-committed virtual significantly worse, we would expect to see this reflected in the transaction throughput: if a transaction is quick to complete then we should see more of them, and if it is slow we should see fewer of them due to the longer processing time.

This is exactly what we see here: the physical machine's transaction throughput is only slightly higher than the virtual machine's, but significantly higher than the over-committed virtual machine's.

Averages across the three SQL transactions show that the virtual machine achieves 1.4-1.5% less transaction throughput per minute and the over-committed virtual machine 9.9% less.


This means that for a system which pushes through high volumes of transactions, the

performance and throughput of these transactions can be impacted by using virtualisation and

public cloud technologies.

Figure 5-7

Figure 5-8


Figure 5-9

Figure 5-10 indicates that CPU utilisation in all three environments is almost exactly the same – approximately 372-374% average CPU utilisation (out of 800%, because the systems utilise 8 cores).

Figure 5-10


The data from Figure 5-11 was taken from vSphere. This data is collected directly from the hypervisor

and shows us the CPU utilisation (out of 100%) for each object.

As per the table in the Method section (Scenario 3), we know that the hardware allocation was as

follows:

Over-committed VM = 2 CPUs, 8 Cores, 8GB RAM

Ubuntu Desktop = 1 CPU, 4 Cores, 4GB RAM

To briefly explain how to interpret the graph, I will go through the Windows Server 2008 R2 virtual machine. We can see that its CPU usage averaged 44% utilisation of the 2 CPUs allocated to it. This makes sense because Figure 5-10 shows that the CPU utilisation was approximately 374% out of 800%.

The important note to take from this graph is the average CPU utilisation of the Ubuntu virtual

machine as it tells us how much contention was present during the test.

The figure is 57% utilised which equates to 28.5% over-contention on the test Windows Server VM.
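For clarity, the arithmetic behind these figures: the Ubuntu VM's 57% average utilisation of its 4 allocated cores corresponds to roughly 2.3 physical cores, which is about 28.5% of the 8 physical cores also allocated to the Windows Server VM. Likewise, the Windows Server VM's roughly 374% out of 800% in PerfMon corresponds to about 47% of its allocation, consistent with the ~44% reported by vSphere.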

Figure 5-11


Figure 5-12 shows the average SQL Server working set. There is nothing unusual to report here, as the memory usage of the three servers is between 1 and 1.2GB.

Figure 5-12

Figure 5-13 shows the disk seconds per write, i.e. how many seconds it takes to complete a write to disk. Although the numbers may look insignificant, remember that on a high-throughput production system there could easily be hundreds of thousands or even millions of writes per second, where these numbers could mean the difference between low and high response times.

The average disk seconds per write on the physical machine was 0.0035 seconds, or 3.5ms. On the virtual machine this was 4.7ms and on the over-committed virtual machine it was 5.3ms.
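Relative to the physical machine, these figures are the source of the percentages quoted in the summary below: 4.7 ms / 3.5 ms ≈ 1.34, i.e. roughly 34% longer per write on the virtual machine, and 5.3 ms / 3.5 ms ≈ 1.51, i.e. roughly 51% longer on the over-committed virtual machine.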

Figure 5-13


5.2 Summary

In summary, the analysis shows that the Physical machine performs the fastest in terms of

transaction response times, throughput and component metrics as compared to the two virtual

machines.

We see that the INNERJOIN, SELECT and INSERT transactions on the virtual machine take

an average of 4.6%, 6% and 5.7% longer to complete respectively. Throughput is also 1.5%

less on the virtual machine.

The over-committed virtual machine takes 33.8%, 50.6% and 46% longer to complete for the

same transactions and executes approximately 10% less throughput.

Lastly, analysis shows that the component metric with the most impact on performance is the

seconds per disk write as shown in Figure 5-13. Briefly, the virtual machine took an average

of 34% longer to perform a disk write and the over-committed virtual machine took an

average of 51% longer to complete a disk write as compared to the physical machine.

The numbers are in the milliseconds; however, on a system with a high number of writes (and potentially reads, as I suspect a similar performance degradation would apply) this could cause a disk bottleneck and poor performance for end users.


5.3 Web Test Results

The first set of results are percentiles of each individual web transaction's response time; e.g., 90% of users will see a response time of x milliseconds or less for the GET HTTP transaction.

Figure 5-14 shows us the response times for the GET transaction. We can see that the Physical

environment outperforms the two virtual environments as expected and the virtual outperforms

the over-committed virtual which is also expected.

Across the percentile figures, the Virtual machine is 210% slower at completing the transaction

than the Physical and the over-committed virtual machine is 596% slower than the Physical.

Figure 5-14


Figure 5-15 shows very distinctly the differences in performance of the three environments. Another interesting pattern to note here is that the transaction response time is quite steady on the physical environment, whereas the response times on the two virtual machines are not only higher but also more erratic. This could mean that on a web server with more dynamic and content-heavy web pages, users could experience very varied response times.

Figure 5-15


Figure 5-16 shows the response times for the POST transaction. Again, the physical outperforms the two virtual environments; however, the performance of the two virtual environments is very similar, and they cross over at the 99th percentile statistic.

My calculations show that across the percentiles, the virtual environment was an average of 33.7% slower than the physical environment and the over-committed virtual machine was 36.3% slower than the physical.

Figure 5-16

Figure 5-17 shows the response time graph of the POST request. The differences in response time are not as substantial as they are for the GET request; however, you can still see that the physical machine outperforms the two virtual machines. The erratic pattern is also present here for the two virtual machines.


Figure 5-17

Figures 5-18 and 5-19 show the transactions per minute for each web transaction (GET and POST). As the response times in the graphs above indicate that the physical performs best, the virtual slightly worse and the over-committed virtual significantly worse, we would expect to see this reflected in the transaction throughput: if a transaction is quick to complete then we should see more of them, and if it is slow we should see fewer of them due to the longer processing time.

This is exactly what we see here: the physical machine's transaction throughput exceeds the virtual machine's, and significantly exceeds the over-committed virtual machine's.

Averages show that the virtual machine achieves 2.1% less transaction throughput per minute and the over-committed virtual machine 4% less.


Figure 5-18

Figure 5-19


Figure 5-20 shows that the CPU utilisation profile is unlike what we saw for the database tests (Figure 5-10), where utilisation was quite static and all three environments were within a 2% range of each other. Here you can immediately see that the over-committed VM is not only consuming more CPU (approximately 40-50% more – about half a CPU core) but also has an erratic profile.

Figure 5-20

The only explanation I could come up with for this profile was CPU stolen time, which was recorded in Windows PerfMon via the ESXi hypervisor's APIs.

CPU stolen time tells us the time the VM was able to run but was not scheduled to run (potentially because of the over-contention of physical resources caused by the Ubuntu VM).

We can see the CPU stolen milliseconds in Figure 5-21, which shows an erratic profile during the three tests on the over-committed virtual machine. Although the wait times may seem insignificant, any delay is a direct hit on the performance of the virtual machine.


Figure 5-21

The data from Figure 5-22 was taken from vSphere and is similar to the one I discussed in

Figure 5-11.

This data is collected directly from the hypervisor and shows us the CPU utilisation (out of

100%) for each object.

As per the table in the Method section (Scenario 3), we know that the hardware allocation was

as follows:

Over-committed VM = 2 CPUs, 8 Cores, 8GB RAM

Ubuntu Desktop = 1 CPU, 4 Cores, 4GB RAM

The important note to take from this graph is the average CPU utilisation of the Ubuntu

virtual machine as it tells us how much contention was present during the test.

The figure is 50% utilised which equates to 25% over-contention on the test Windows Server

VM.
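(The arithmetic is the same as for the database test: 50% of the Ubuntu VM's 4 allocated cores is 2 cores, and 2 of the host's 8 physical cores is 25%.)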


Figure 5-22

There is nothing unusual to report for the web server working set.

Figure 5-23


The disk activity in Figure 5-24 for the web testing is similar to the profile we saw in the database test analysis. The average disk seconds per write on the physical machine was 0.0068 seconds, or 6.8ms. On the virtual machine this was 7.9ms and on the over-committed virtual machine it was 9.6ms.
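As with the database tests, these translate directly into the percentages given in the summary below: 7.9 ms / 6.8 ms ≈ 1.16 (about 16% longer per write) and 9.6 ms / 6.8 ms ≈ 1.41 (about 41% longer).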

Figure 5-24

5.4 Summary

In summary, the physical machine outperformed the two virtual machines; a result similar to that of the database server testing (Section 5.2).

We see that the GET and POST transactions on the virtual machine take an average of 210%

and 33.7% longer to complete respectively. Throughput is also 2.1% less on the virtual

machine.

The over-committed virtual machine takes 596% and 36.3% longer to complete for the same

transactions and executes approximately 4% less throughput.

Component performance analysis also shows that the over-committed virtual machine CPU

utilisation is about half a core higher as well as a lot more erratic than the other two machines

(Figure 5-20).

The last component analysis concerns the seconds per disk write. The results here are similar to those in the database testing (Section 5.2): the virtual machine took an average of 16% longer to complete a disk write and the over-committed virtual machine took an average of 41% longer, as compared to the physical machine.


6 Conclusions and Discussions

6.1 Introduction

In this chapter I will tie up and discuss the main project themes.

6.2 Project Objectives and Research Questions

The Introduction (Section 1) outlines the research question and objectives.

Briefly, the objectives were to run performance tests against a web and SQL server with

performance monitoring enabled and analyse the test results and performance log data.

I was to use the output of this analysis to answer the research question which was to find out

which environment was better for performance for the two workloads (the web application

and SQL server).

All the objectives were successfully met and the research question was answered, as I will discuss in Section 6.3.

6.3 General Conclusions

Here I will discuss my main findings and conclusions of the research project.

My main finding is that the performance of the physical machine was superior in every test run as compared to the two virtual machines.

The virtual machine with no over-commitment performed only slightly worse than the physical machine, while the over-committed virtual machine performed the worst in every test.

This answers the research question as I can definitively say that the Physical machine was better

for performance for the two tested workloads.

However I cannot give a definitive number or percentage as to how much worse the virtual

machines perform as the results varied in the two scenarios tested (SQL and web application).


This is to be expected as they are two different workloads and the hypervisor may work better

with one type of load than another.

What I can say is that, generally speaking, the overhead of virtualisation management is marginal (a few percent), while the overhead on an over-committed virtual machine can be significantly more, depending on the level of over-commitment; i.e., the higher the over-commitment, the higher the contention, which means greater performance degradation and more sporadic behaviour.

Of course, it all depends on the system configuration, hypervisor configuration, levels of over-commitment and how modern the hardware is (more modern CPU architectures support more virtualisation instructions, which means less hypervisor overhead).

It is also important to note that over-commitment is not inherently a bad thing. Over-commitment, if done well, can be an excellent use of capacity, and the only way it can be done well is if the workloads and systems are understood properly. For example, if I know that System A will consume resource x at a certain time of day, there is no problem over-committing resource x to System B at other times of the day. Unfortunately, in a public cloud environment (such as Amazon Web Services) it is almost impossible to characterise the systems and workloads, which makes it very difficult to over-commit well and thus causes contention of the resources.

My final point and recommendation is in line with the authors of the Frontiers of High

Performance Computing and Networking (Min, Di Martino, Yang, Guo, & Ruenger, 2006)

book where they say:

“[…] virtualization is currently not used in high-performance computing (HPC) environments.

One reason for this is the perception that the remaining overhead that VMMs introduce is

unacceptable for performance-critical applications and systems” (page 475)

I strongly believe that, at this moment in time, virtualisation is not the way forward for high-performance computing, which depends on millisecond responses.

For all other, non-HPC environments, virtualisation should be considered, as it has many potential benefits for an organisation's IT.


6.4 Implications for Future Work

I believe work in this area can be improved by putting more time into testing different types of workload scenarios, using more modern hardware and trying out different levels of over-commitment on test virtual machines. Another interesting test would be to compare different hypervisors, to see whether hypervisor x runs workload y more efficiently than hypervisor z.

6.5 Project Management

In this section, I will discuss my own progress in the project, including management and control, what I have learnt and what I could have done differently.

Management and control of the project was done by keeping up with deadlines that my supervisor and I had set and agreed upon. If deadlines could not be met then they were extended, along with a reason why, and usually followed up with a brief meeting with my supervisor.

During the project I learnt many things but one of the most important to me was the project

management aspect; making sure I could meet my deadlines and if not, to keep everyone

involved (my supervisor in this case) updated so that they can provide some useful advice or

suggestions.

I learnt that it is important to start projects as early as possible, as not everything goes to plan and the more time you have, the less pressure there is to solve any issues you may come across. If I were to do this project again, this is one area I would definitely have taken more seriously, as it was a real issue during my project: tests were re-run many times and test tools were changed due to inconsistencies in the results.


7 Glossary

Over-commitment – A virtualisation term for the over-allocation of physical resources to more than one virtual machine. For example, on a host which has a total capacity of 8GB RAM, you can allocate 8GB RAM to each of two virtual machines; the host's memory would be over-committed in this case.

Contention – Usually where there is over-commitment of the physical hardware there is contention. This means that two or more virtual machines are contending for the same physical resource.

High performance computing (HPC) – Typically HPC systems are used to solve advanced computational problems and depend on near real-time response times and stability – something which cannot be guaranteed by virtual machines at this time. Example applications which can be classified as HPC uses include data mining, simulations, modelling, visualisation of complex data and rapid mathematical calculations.

Performance Monitor (PerfMon) – A built-in Windows tool used to define and collect performance metrics.


8 Appendices

A. Please see the Project Definition Document titled Volume 1 – Project Definition Document which will be attached after these Appendices

B. Windows Azure and Amazon (AWS) graphical representations of launching instances


C. Test Plan

Please see the Test Plan document titled Volume 2 – Test Plan which will be attached after these Appendices

D. Research results Excel files

Please see the Excel documents provided on CD as part of the research submission

E. Prime Number ASP.NET framework version 4 application

Default.aspx

<%@ Page Language="C#" AutoEventWireup="true" CodeFile="Default.aspx.cs" Inherits="_Default" %>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head runat="server">
    <title></title>
</head>
<body>
    <form id="form1" runat="server">
    <div>
        Enter number of prime numbers needed: <asp:TextBox id="td1" runat="server" /><br />
        <asp:Button ID="b1" OnClick="submitEvent" Text="Submit" runat="server" /><br /><br />
        <asp:ListBox ID="lb1" runat="server" AutoPostBack="true"></asp:ListBox>
    </div>
    </form>
</body>
</html>

Default.aspx.cs

using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using System.Web.UI;
using System.Web.UI.WebControls;

public partial class _Default : System.Web.UI.Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        // On the initial GET request, pre-fill the text box with a default of 10.
        if (!IsPostBack)
        {
            td1.Text = "10";
        }
    }

    protected void submitEvent(object sender, EventArgs e)
    {
        // Read how many prime numbers the user asked for. The default is 10;
        // note that TryParse leaves 'needed' as 0 if the text is not a valid number.
        int needed = 10;
        if (!string.IsNullOrEmpty(td1.Text))
        {
            Int32.TryParse(td1.Text, out needed);
        }

        lb1.Items.Clear(); // Remove all items before repopulating the list box.
        foreach (int a in generatePrimeNumbers(needed))
        {
            lb1.Items.Add(a.ToString());
        }
    }

    // Generates 'total' prime numbers, searching upwards from a random starting
    // integer so that each POST request performs a CPU-bound amount of work.
    private List<int> generatePrimeNumbers(int total = 10)
    {
        int rand = new Random().Next();
        int count = 0;
        List<int> primeNumbers = new List<int>();
        while (count != total)
        {
            if (isPrime(rand))
            {
                primeNumbers.Add(rand);
                count++;
            }
            rand++;
        }
        return primeNumbers;
    }

    // Simple trial-division primality test.
    private bool isPrime(int val)
    {
        // Even numbers: only 2 is prime.
        if ((val & 1) == 0)
        {
            return val == 2;
        }
        // Test odd divisors up to the square root of val.
        for (int i = 3; (i * i) <= val; i += 2)
        {
            if ((val % i) == 0)
            {
                return false;
            }
        }
        return val != 1;
    }
}

Web.Config

<?xml version="1.0"?>
<!--
  For more information on how to configure your ASP.NET application, please visit
  http://go.microsoft.com/fwlink/?LinkId=169433
-->
<configuration>
  <system.web>
    <compilation debug="true" targetFramework="4.0"/>
  </system.web>
</configuration>