
Dealing with Continuous Data Processing, ConFoo 2012



Page 1: Dealing with Continuous Data Processing, ConFoo 2012


DEALING WITH CONTINUOUS DATA PROCESSING

Michael Peacock, February, 2012

Page 2: Dealing with Continuous Data Processing, ConFoo 2012


WHAT’S IN STORE

• Case study: a large, data-focused project
• Challenges and solutions we faced
• Dealing with legacy code and legacy issues
• Application availability
• Handling incoming data
• Processing incoming data
• Storing the data
• Querying the data quickly
• Reporting against the data
• Keeping the application responsive

Page 3: Dealing with Continuous Data Processing, ConFoo 2012


A DISCLAIMER

• What we have done isn’t perfect
• Requirement changes, rapid growth and the move from contractors to an in-house team meant everyone learnt more as we went along
• Some decisions have been good
• Some decisions have been bad
• We have to live with both, because of the size of our application and the cost of alterations
• We have learnt a lot, and we have lots of tips for architecture, code, planning and scaling

Page 4: Dealing with Continuous Data Processing, ConFoo 2012


MICHAEL PEACOCK

Web Systems Developer
Telemetry Team
Smith Electric Vehicles Corp

Lead Developer / Technical Author / Occasional Speaker

@michaelpeacock
[email protected]

Page 5: Dealing with Continuous Data Processing, ConFoo 2012


SMITH ELECTRIC VEHICLES

Page 6: Dealing with Continuous Data Processing, ConFoo 2012


NO. NOT MILK FLOATS (ANYMORE)

ALL ELECTRIC, COMMERCIAL VEHICLES.

Photo courtesy of kenjonbro: http://www.flickr.com/photos/kenjonbro/4037649210/in/set-72157623026469013

Page 7: Dealing with Continuous Data Processing, ConFoo 2012


ALL-ELECTRIC COMMERCIAL VEHICLES

Page 8: Dealing with Continuous Data Processing, ConFoo 2012


ELECTRIC VEHICLES

• Quick Specs
• 16,500 – 26,000 lbs gross vehicle weight
• 7,121 – 16,663 lbs payload
• 50 – 240 km range
• Top speed 80 km/h

• Applications
• Flat bed
• Military applications
• Minibuses
• School buses

Page 9: Dealing with Continuous Data Processing, ConFoo 2012


DATA CHALLENGES FROM ELECTRIC VEHICLES

• New, continually evolving technology
• Viability evidence required
• Government research
• Customers wanting to continually monitor performance:
• How much energy are they using?
• How much does each charge cost?
• How far have they travelled?
• How much energy did they regenerate?
• Have their drivers been converted correctly?

Page 10: Dealing with Continuous Data Processing, ConFoo 2012


ENTER TELEMETRY

• Monitor the vehicle’s CANBus:
• Drive train
• Battery management system
• Battery modules
• Charger

• Monitor additional sensors:
• GPS location
• Climate

• Collect and broadcast information
• Team: contractors to staff

Page 11: Dealing with Continuous Data Processing, ConFoo 2012


HUGE DATA VOLUMES

2,500 pieces of data, per second, per vehicle

Page 12: Dealing with Continuous Data Processing, ConFoo 2012


CURRENT STATS

• ~500 telemetry-enabled vehicles in service
• Telemetry has now been made a standard feature for all new vehicles that roll off the production line
• Large MySQL solution:
• 1.5 billion inserts per day
• A constant minimum of 4,000 inserts per second
• More data on the way

Page 13: Dealing with Continuous Data Processing, ConFoo 2012


INITIAL MANDATE

For a small number of vehicles, collect a small amount of data that we can provide to customer-authorised grant authorities.

Page 14: Dealing with Continuous Data Processing, ConFoo 2012


INITIAL SOLUTION

• A single table of data
• A key field to identify the vehicle
• Vehicles communicating directly with our servers

Page 15: Dealing with Continuous Data Processing, ConFoo 2012


NEW MANDATE

• Collect everything
• We need data for:
• Grant authorities
• Customer performance data
• Service issues
• Warranty issues
• Technology evaluation and continuous improvement
• It’s all important!
• Display data in real time
• Keep and display historical data
• Calculate and show performance data

Page 16: Dealing with Continuous Data Processing, ConFoo 2012


DATA STARTS TO INCREASE

• The initial team took a sledgehammer to the huge database table:
• Created a separate database for each vehicle

http://www.flickr.com/photos/robotapocalypse/245508884/

Page 17: Dealing with Continuous Data Processing, ConFoo 2012


http://www.flickr.com/photos/holyoutlaw/5920882576

Page 18: Dealing with Continuous Data Processing, ConFoo 2012


SCALING

Initial application-level sharding meant we could scale out with new databases and new database servers as the number of vehicles went up.

It doesn’t help if the amount of data we collect, or the data retention period, increases; and it still leaves us with slow-running queries.
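As a rough illustration (a sketch with hypothetical names, not our production code), application-level sharding by vehicle means resolving the right database and server before every query:

// each vehicle's data lives in its own database; a config map says which
// server hosts that database, so new servers can absorb new vehicles
function getVehicleConnection( array $shardMap, $vehicleId )
{
    $host = $shardMap[ $vehicleId ]; // e.g. 'db3.internal'
    $dsn = 'mysql:host=' . $host . ';dbname=vehicle_' . (int) $vehicleId;
    return new PDO( $dsn, DB_USER, DB_PASS );
}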

Page 19: Dealing with Continuous Data Processing, ConFoo 2012


PROBLEM #0: INSERTS

• We utilise a dedicated server and a number of dedicated applications to pull these messages and process them
• This needs to happen quickly enough for live data to be seen through the web interface
• Data is rapidly converted into batch SQL files, which are imported into MySQL via LOAD DATA INFILE
• This results in a high number of inserts per second (20,000 – 80,000)
• LOAD DATA INFILE isn’t enough on its own... (see the sketch below)
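As a hedged sketch of that batch approach (file, table and column names are hypothetical), the processing applications write decoded rows to a CSV file and bulk-load it in a single statement rather than issuing thousands of INSERTs:

// write a batch of decoded data points to a CSV file
$file = '/tmp/batch_' . time() . '.csv';
$fh = fopen( $file, 'w' );
foreach( $dataPoints as $point )
{
    fputcsv( $fh, array( $point['vehicle'], $point['field'], $point['value'], $point['recorded'] ) );
}
fclose( $fh );

// import the whole file in one statement
$db = new PDO( 'mysql:host=dbserver;dbname=telemetry', DB_USER, DB_PASS,
    array( PDO::MYSQL_ATTR_LOCAL_INFILE => true ) );
$db->exec( "LOAD DATA LOCAL INFILE '" . $file . "'
    INTO TABLE data
    FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
    ( vehicle_id, field_id, value, recorded_at )" );
unlink( $file );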

Page 20: Dealing with Continuous Data Processing, ConFoo 2012


INITIAL ARCHITECTURE

Page 21: Dealing with Continuous Data Processing, ConFoo 2012


PROBLEM #1: AVAILABILITY

Page 22: Dealing with Continuous Data Processing, ConFoo 2012


PROBLEM #2: CAPACITY

Sometimes, the data is too much

www.flickr.com/photos/eveofdiscovery/3149008295

Page 23: Dealing with Continuous Data Processing, ConFoo 2012


CAPACITY

Page 24: Dealing with Continuous Data Processing, ConFoo 2012


OPTION 1: CLOUD INFRASTRUCTURE

• Cloud-based infrastructure gives:
• More capacity
• More failover
• Higher availability

Page 25: Dealing with Continuous Data Processing, ConFoo 2012


PROBLEM WITH CLOUD INFRASTRUCTURE

• Huge volumes of data inserts into a MySQL solution: sub-optimal in virtualised environments
• Existing investment in enterprise hardware
• Security and legal issues with storing the data off-site

Page 26: Dealing with Continuous Data Processing, ConFoo 2012


SOLUTION #1: MQ

www.flickr.com/photos/gadl/89650415/in/photostream

Page 27: Dealing with Continuous Data Processing, ConFoo 2012


CLOUD BASED MESSAGE QUEUE
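As a minimal sketch of the consumer side (QueueClient, decodeTelemetry and appendToBatchFile are hypothetical stand-ins; the hosted queue product isn't named here), the flow looks roughly like this:

// pull batches of raw telemetry messages off the hosted queue
$queue = new QueueClient( QUEUE_ENDPOINT, QUEUE_KEY, QUEUE_SECRET );
while( true )
{
    $messages = $queue->receive( 100 );
    foreach( $messages as $message )
    {
        // decode the CANBus payload into individual data points
        $rows = decodeTelemetry( $message->body );
        appendToBatchFile( $rows ); // feeds the LOAD DATA INFILE batches
        // acknowledge only after a successful decode, so a crash means
        // redelivery rather than data loss
        $queue->delete( $message );
    }
}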

Page 28: Dealing with Continuous Data Processing, ConFoo 2012


“THE CLOUD” ISN’T PERFECT

www.flickr.com/photos/brapps/403257780

Page 29: Dealing with Continuous Data Processing, ConFoo 2012


PROBLEM #3: STORAGE SYSTEM STARTS TO CRACK

http://www.flickr.com/photos/mknott/2855987266

Page 30: Dealing with Continuous Data Processing, ConFoo 2012


SOLUTION: GENIUS DBA & VENDOR CONSULTANCY

• Constantly tweaking the servers and configuration to get more and more performance
• Pushing the capabilities of our SAN, tweaking configs where no DBA has gone before
• www.samlambert.com
• http://www.samlambert.com/2011/07/how-to-push-your-san-with-open-iscsi_13.html
• http://www.samlambert.com/2011/07/diagnosing-and-fixing-mysql-io.html
• [email protected]

Sam Lambert – DBA Extraordinaire

Page 31: Dealing with Continuous Data Processing, ConFoo 2012


LIVE, REAL-TIME INFORMATION

Page 32: Dealing with Continuous Data Processing, ConFoo 2012


LIVE DATA: PROBLEMS

• Showing data in real time causes a number of headaches:
• Processing the huge number of inserts
• Data and legacy application architecture
• Race conditions
• Accessing the data quickly
• A global view

Page 33: Dealing with Continuous Data Processing, ConFoo 2012


LIVE, REAL TIME INFORMATION: PROBLEM

• The original database design dictated:
• All data points were stored in the same table
• Each type of data point required a separate query, sub-query or join to obtain
• The workings of the remote device collecting the data, and of the processing server, dictated:
• GPS co-ordinates can be up to 6 separate data points, including: longitude; latitude; altitude; speed; number of satellites used to get the location; direction

Page 34: Dealing with Continuous Data Processing, ConFoo 2012


LIVE DATA: GLOBAL MAP

Page 35: Dealing with Continuous Data Processing, ConFoo 2012


REAL TIME INFORMATION: CONCURRENT

• Initial solution from the original developers:
• Pull as many pieces of real-time information through asynchronously as possible
• Involved the use of Flash-based “widgets” which called separate PHP scripts to query the data
• Pages loaded relatively quickly
• Data points took a little time to load
• Not good enough, and caused race conditions

Page 36: Dealing with Continuous Data Processing, ConFoo 2012


RACE CONDITIONS

The initial team made use of a Flash-based charting library. This was both good and bad.
• The Good:
• Requests for data were asynchronous
• The main page loaded quickly
• The Bad:
• Each chart was a separate query
• Each chart was a separate request
• Session authentication introduced race conditions

Page 37: Dealing with Continuous Data Processing, ConFoo 2012


LOTS OF DATA: RACE CONDITIONS

• Sessions in PHP close at the end of the execution cycle
• Unpredictable query times
• A large number of concurrent requests per screen

Session Locking

• Completely locks out a user’s session, as PHP hasn’t yet closed it from the previous request

Page 38: Dealing with Continuous Data Processing, ConFoo 2012


RACE CONDITIONS: PHP & SESSIONS

session_write_close()

Added after each write to the $_SESSION array; closes the current session.

(Requires a call to session_start() immediately before any further reads or writes.)
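In practice the pattern looks like this (standard PHP session functions; the session keys are purely illustrative):

session_start();
$_SESSION['user_id'] = $userId; // authenticate / write
session_write_close();          // release the session lock immediately

// a long-running query or export can now run WITHOUT blocking the
// user's other concurrent AJAX requests

session_start();                // re-open before touching the session again
$_SESSION['last_export'] = time();
session_write_close();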

Page 39: Dealing with Continuous Data Processing, ConFoo 2012


LIVE: STABLE BUT SLOW

Page 40: Dealing with Continuous Data Processing, ConFoo 2012


CACHE THE DATA

• We cached the most up-to-date data points for each vehicle in an in-memory key-value store
• Memcache!
• Allows us to quickly access live data (which changes continually) without hitting the database

Page 41: Dealing with Continuous Data Processing, ConFoo 2012


MEMCACHE

// instantiate
$mc = new Memcache();
// set the memcache server and port
$mc->connect( MEMCACHE_SERVER, MEMCACHE_PORT );
// get data based on a key
$value = $mc->get( 'key' );

Page 42: Dealing with Continuous Data Processing, ConFoo 2012


MEMCACHE FAILOVER

Depending on the importance of your caching system, you might want to consider High Availability.

Tools such as the Moxi memcached proxy can do this, acting as a single memcached server while connecting to many behind it.

Page 43: Dealing with Continuous Data Processing, ConFoo 2012


CACHING WITHIN LEGACY CODE

Integrating caching with both legacy and new code was difficult. We used a lazy loading registry to pass around a common object which provided access to databases, caching servers, and other shared data.

Page 44: Dealing with Continuous Data Processing, ConFoo 2012


LAZY LOADING REGISTRY

public function getObject( $key )
{
    // already created? return the stored instance
    if( array_key_exists( $key, $this->objects ) )
    {
        return $this->objects[ $key ];
    }
    elseif( array_key_exists( $key, $this->objectSetup ) )
    {
        // lazy-load: include the abstract parent (if any) and the class
        // itself, then instantiate, store and return it
        if( ! is_null( $this->objectSetup[ $key ]['abstract'] ) )
        {
            require_once( FRAMEWORK_PATH . 'registry/aspects/' . $this->objectSetup[ $key ]['folder'] . '/' . $this->objectSetup[ $key ]['abstract'] . '.abstract.php' );
        }
        require_once( FRAMEWORK_PATH . 'registry/aspects/' . $this->objectSetup[ $key ]['folder'] . '/' . $this->objectSetup[ $key ]['file'] . '.class.php' );
        $o = new $this->objectSetup[ $key ]['class']( $this );
        $this->storeObject( $o, $key );
        return $o;
    }
    elseif( $key == 'memcache' )
    {
        // requesting memcache for the first time: instantiate, connect, store and return
        $mc = new Memcache();
        $mc->connect( MEMCACHE_SERVER, MEMCACHE_PORT );
        $this->storeObject( $mc, 'memcache' );
        return $mc;
    }
}

Page 45: Dealing with Continuous Data Processing, ConFoo 2012


LLR: NOT FIT FOR PURPOSE

Although the Lazy Loading Registry works for us, bolting the memcache connection into it stretches its flexibility.

Better approach: a Dependency Injection Container

We could instead prepare configuration data and pass it, within a DIC, to the relevant features.
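A minimal sketch of the idea (a hypothetical container, not a specific library): services are registered as factory closures along with their configuration, and are built lazily on first request.

class Container
{
    private $factories = array();
    private $instances = array();

    // register a closure that knows how to build the service
    public function register( $key, $factory )
    {
        $this->factories[ $key ] = $factory;
    }

    // build on first request, then reuse the stored instance
    public function get( $key )
    {
        if( ! isset( $this->instances[ $key ] ) )
        {
            $this->instances[ $key ] = call_user_func( $this->factories[ $key ], $this );
        }
        return $this->instances[ $key ];
    }
}

$container = new Container();
$container->register( 'memcache', function( $c ) {
    $mc = new Memcache();
    $mc->connect( MEMCACHE_SERVER, MEMCACHE_PORT );
    return $mc;
} );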

Page 46: Dealing with Continuous Data Processing, ConFoo 2012


REAL TIME INFORMATION: # OF REQUESTS

• Currently, each piece of “live data” is loaded into a Flash graph or widget, which updates every 30 seconds via an AJAX request
• The move from MySQL to Memcache reduces database load, but the large number of requests still adds strain to the web server
• We are making more use of text- and image-based representations, all of which can be updated from a single AJAX request (see the sketch below)
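A sketch of that single endpoint (the cache key scheme is hypothetical): one request returns every live data point for a vehicle as JSON, and the page updates all of its widgets from that one payload.

// live.php?vehicle=42: one AJAX call instead of one per widget
$live = $mc->get( 'live:' . (int) $_GET['vehicle'] );
header( 'Content-Type: application/json' );
echo json_encode( $live );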

Page 47: Dealing with Continuous Data Processing, ConFoo 2012


RACE CONDITIONS: USE A TEMPLATE ENGINE

• V1 of the system mixed PHP and HTML

• You can’t re-initialise your session once output has been sent

• All new code uses a template engine, so session interaction has no bearing on output: by the time the template is processed and output, all database and session work has long been completed

Page 48: Dealing with Continuous Data Processing, ConFoo 2012


RACE CONDITIONS: USE A SINGLE ENTRY POINT

• Race conditions are further exacerbated by PHP timeout values
• Certain exports, actions and processes take longer than 30 seconds, so need a longer execution time than the default
• Initially the project lacked a single entry point, and execution flow was muddled
• A single entry point makes it easier to enforce a low default timeout, which intensive controllers or models can then override

Page 49: Dealing with Continuous Data Processing, ConFoo 2012


A LITTLE BREATHING SPACE

We now had an application which was:
• Stable: sessions were not locking
• Quick in parts:
• Fleet overview (map)
• Vehicle live screen
• Gaining the confidence of users
• Unfortunately...
• The speed was only apparent, not real

Page 50: Dealing with Continuous Data Processing, ConFoo 2012


THINGS WERE STILL SLOW

• Generating performance data
• Backing up and archiving data
• Exporting data for grant authorities (our initial mandate!)
• Viewing historical data
• Viewing analytical data

Page 51: Dealing with Continuous Data Processing, ConFoo 2012


Page 52: Dealing with Continuous Data Processing, ConFoo 2012


GENERATING PERFORMANCE REPORTS

To look at detailed performance data we needed to:
• Extract a range of data points for a particular day and vehicle:
• Battery current and voltage
• Motor speed (revs per minute)
• Pedal positions
• Perform some calculations:
• Energy usage
• Distance and speeds
• Driving styles

Page 53: Dealing with Continuous Data Processing, ConFoo 2012


NOSQL

NoSQL solutions sound like a prime candidate for this type of problem, and NoSQL is something we are looking to experiment with. However, we had some problems:
• The original team picked MySQL
• Our skill set is MySQL
• Testing, R&D and migration would take a long time

Page 54: Dealing with Continuous Data Processing, ConFoo 2012


REGULAR PROCESSING

• Introduced regular, automated data processing (sketched below)
• Performance calculations were done overnight, for every vehicle
• Saved as a summary record, one per vehicle per day
• Really, really, really easy and quick to pull out, aggregate, compare and analyse
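A sketch of the nightly job (the calculator and summary table are illustrative): a day of raw samples is reduced to one row per vehicle, so later reports read one row instead of millions.

// runs overnight via cron
$day = date( 'Y-m-d', strtotime( 'yesterday' ) );
foreach( $vehicles as $vehicle )
{
    // hypothetical calculator: energy usage, distance, speeds, driving style
    $summary = $calculator->summariseDay( $vehicle, $day );
    // one summary row per vehicle per day
    $db->insert( 'daily_summaries', $summary );
}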

Page 55: Dealing with Continuous Data Processing, ConFoo 2012


Page 56: Dealing with Continuous Data Processing, ConFoo 2012


QUERY SPEED BECOMES MORE IMPORTANT

• We were now doing this processing for every vehicle
• Not just for requested vehicles, on demand

We needed to be able to pull the data out even faster than before.

Page 57: Dealing with Continuous Data Processing, ConFoo 2012


SHARDING

• Pulling data out was still very slow, especially when analysing data over the course of a day
• We decided to shard the database again, this time based on the date: the week number
• Data stored before we implemented sharding wasn’t in a sharded table

Page 58: Dealing with Continuous Data Processing, ConFoo 2012


WHICH BUCKET?

public function getTableNameFromDate( $date )
{
    // we can't query for future data, so if the date is in the future
    // reset it to today
    $date = ( $date > date( 'Y-m-d' ) ) ? date( 'Y-m-d' ) : $date;
    // get the time in seconds since the epoch
    $stt = strtotime( $date );
    // is the query date since we implemented sharding?
    if( $date >= $this->switchOver )
    {
        // ISO week numbers: early January can fall in week 52 or 53 of the
        // previous year, so attribute those dates to that year
        $year = ( date( 'm', $stt ) == 1 && date( 'W', $stt ) >= 52 ) ? date( 'Y', $stt ) - 1 : date( 'Y', $stt );
        // add the year and the week number to the table name, and return
        return 'data_' . $year . '_' . date( 'W', $stt );
    }
    else
    {
        // return the legacy table
        return 'data';
    }
}

Page 59: Dealing with Continuous Data Processing, ConFoo 2012


QUERY MAGIC

private function parseQuery( $sql, $date = null )
{
    // default to today, then substitute the sharded table name into the query
    $date = ( is_null( $date ) ) ? date( 'Y-m-d' ) : $date;
    return sprintf( $sql, $this->shardHelper->getTableNameFromDate( $date ) );
}
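Calling code then writes its queries with a %s placeholder for the table name, for example (column names illustrative):

$sql = $this->parseQuery(
    "SELECT field_id, value, recorded_at FROM %s WHERE vehicle_id = 42",
    '2012-02-01'
);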

Page 60: Dealing with Continuous Data Processing, ConFoo 2012


BACKUPS & ARCHIVES

• Sharding makes backing up and archiving easier
• Simply export a sharded week table and save it
• No long queries which have to find data based on date
• If it’s an archive (not just a backup):
• DROP the table afterwards
• No need to run a slow DELETE based on a wide-ranging WHERE clause

Page 61: Dealing with Continuous Data Processing, ConFoo 2012


SHARDING: AN EXCUSE

DBMSs such as MySQL don’t support sharding out of the box, so you need to alter every query that touches the data.

Use it as an excuse to:
• Check your data types
• Check your indexes
• Improve your queries

Page 62: Dealing with Continuous Data Processing, ConFoo 2012


DATA TYPES

• When prototyping a system and designing a database schema, it’s easy to be sloppy with your data types and fields
• DON’T BE
• Use as little storage space as you can
• Ensure each data type uses as little space as possible
• Use only the fields you need (see the sketch below)
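As an illustrative sketch (hypothetical columns), tightening types pays off on every one of billions of rows: an unsigned SMALLINT (0 to 65,535) costs 2 bytes instead of INT's 4, which is roughly 3 GB saved per 1.5 billion rows.

$db->exec( "ALTER TABLE data
    MODIFY field_id SMALLINT UNSIGNED NOT NULL,
    MODIFY recorded_at TIMESTAMP NOT NULL" );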

Page 63: Dealing with Continuous Data Processing, ConFoo 2012


INDEX OPTIMISATION

• Keep it small
• From our legacy days of one database on one server, we had a column that told us which vehicle the data related to
• This was still there... as part of an index... despite the fact the application hadn’t required it for months

Page 64: Dealing with Continuous Data Processing, ConFoo 2012


QUERIES: OPTIMISATION TIPS

• Run every query through EXPLAIN EXTENDED
• Check it hits the indexes
• Remove non-deterministic functions like CURDATE() from queries, to ensure the query cache is hit (see the sketch below)
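For example (table and column names illustrative): MySQL's query cache never caches statements containing non-deterministic functions such as CURDATE() or NOW(), so baking the literal date in makes the repeated statement text cacheable.

// uncacheable: CURDATE() makes the statement non-deterministic
// $sql = "SELECT field_id, value FROM data WHERE recorded_at >= CURDATE()";

// cacheable: identical literal SQL text for every request made today
$sql = "SELECT field_id, value FROM data WHERE recorded_at >= '" . date( 'Y-m-d' ) . "'";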

Page 65: Dealing with Continuous Data Processing, ConFoo 2012


CONCURRENT DATABASE CONNECTIONS

Sharding to multiple databases and multiple servers means it’s crucial to keep track of your connections.
• Export and processing scripts touched every database
• If you don’t keep track and close connections when you are finished, you risk flooding MySQL with too many connections

Page 66: Dealing with Continuous Data Processing, ConFoo 2012


REPORTS: DB CONNECTION MANAGEMENT

• In addition to our calculated reports, we also need to export key pieces of information to grant authorities
• Initially, our PHP-based export scripts held one database connection per database (~400 databases)
• Re-written to maintain only one connection per server, switching the selected database as needed
• Toggles instruct each export run to apply to only one of the servers at a time
• Modulus magic runs multiple export scripts per server

Page 67: Dealing with Continuous Data Processing, ConFoo 2012


CONCURRENT DATA PROCESSING

Depending on the batch processing we are doing, we can sometimes run multiple instances per database server.

// split the fleet between $concurrent worker instances; this instance
// ($instance) takes every vehicle whose position matches its slot
$counter = 1;
foreach( $vehicles as $vehicle )
{
    if( ( $counter % $concurrent ) == $instance )
    {
        $vehiclesToProcess[] = $vehicle;
    }
    $counter++;
}
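Running, say, three copies of the script with $concurrent = 3 and $instance set to 0, 1 and 2 splits the fleet evenly across the workers, with no co-ordination needed between them.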

Page 68: Dealing with Continuous Data Processing, ConFoo 2012


SYSTEM ARCHITECTURE

Page 69: Dealing with Continuous Data Processing, ConFoo 2012


SYSTEM ARCHITECTURE

Page 70: Dealing with Continuous Data Processing, ConFoo 2012


APPLICATION QUALITY

• When dealing with lots of data, quickly, you need to ensure:
• You process it correctly
• You can act fast if there is a bug
• You can act fast when refactoring

Page 71: Dealing with Continuous Data Processing, ConFoo 2012


DEPLOYMENT

• When dealing with a stream of data, rolling out new code can mean pausing the processing work being done
• Put deployment measures in place to make the deployment switch-over instantaneous

Page 72: Dealing with Continuous Data Processing, ConFoo 2012


ADDITIONAL POINTS

Page 73: Dealing with Continuous Data Processing, ConFoo 2012


DON’T LOG “EVERYTHING”: EXTRAPOLATE AND ASSUME

• Our telemetry unit broadcasts each data point once per second
• Data doesn’t change every second, e.g.:
• Battery state of charge may take several minutes to lose a percentage point
• Fault flags only change to 1 when there is a fault
• Make an assumption:
• We compare the data to the last known value… if it’s the same we don’t insert it; instead we assume it was the same (see the sketch below)
• Unfortunately, this requires us to put additional checks and balances in place
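A sketch of that comparison (the key scheme is hypothetical, reusing the Memcache store described earlier): a reading is only queued for insert when it differs from the vehicle's last known value.

$key = 'last:' . $vehicleId . ':' . $fieldId;
$last = $mc->get( $key );
if( $last === false || $last != $value )
{
    // the value changed (or was never seen): record it and queue the insert
    $mc->set( $key, $value );
    $rowsToInsert[] = array( $vehicleId, $fieldId, $value, $timestamp );
}
// otherwise skip the insert; readers assume the value held steady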

Page 74: Dealing with Continuous Data Processing, ConFoo 2012


EXTRAPOLATE AND ASSUME: “INTERLATION”

• Built a special library which:
• Accepted a number of arrays, each representing a collection of data points for one variable on the truck
• Used key indicators and time differences to work out if/when the truck was off, and so when extrapolation should stop
• For each time data was recorded, pulled down the data for the other variables, for consistency

Page 75: Dealing with Continuous Data Processing, ConFoo 2012


INTERLACE

* Add an array to the interlation
public function addArray( $name, $array )

* Get the time that we first receive data in one of our arrays
public function getFirst( $field )

* Get the time that we last received data in any of our arrays
public function getLast( $field )

* Generate the interlaced array
public function generate( $keyField, $valueField )

* Break the interlaced array down into separate days
public function dayBreak( $interlationArray )

* Generate an interlaced array and fill for all timestamps within the range of _first_ to _last_
public function generateAndFill( $keyField, $valueField )

* Populate the new combined array with key fields using the common field
public function populateKeysFromField( $field, $valueField = null )

http://www.michaelpeacock.co.uk/interlation-library

Page 76: Dealing with Continuous Data Processing, ConFoo 2012


ABSTRACTION

When dealing with large data sets, abstraction becomes key

You can’t easily change your data structure to fit the application, so the application needs to be able to change rapidly to fit the data.

Page 77: Dealing with Continuous Data Processing, ConFoo 2012


TECHNICAL TIPS

• Measure your application’s performance, data throughput and so on
• This is a data-at-scale problem in itself
• Use as much RAM on your servers as is safe to do so
• We give MySQL 80% of each DB server’s 100 – 140 GB of RAM

Page 78: Dealing with Continuous Data Processing, ConFoo 2012


WHAT DO WE HAVE NOW?

Now we have a fast, stable, reliable system:
• Pulling in millions of messages per day from a queue
• Decoding those messages into 1.5 billion data points per day
• Inserting 1.5 billion data points into MySQL per day
• Generating performance data, and exporting grant authority reports, daily
• Getting more sleep at night than we used to

Page 79: Dealing with Continuous Data Processing, ConFoo 2012


CONCLUSIONS

• Plan for scale, as best you can
• Where data collection is critical, use a message queue, ideally hosted or “cloud based”
• Leverage products, services and platforms that excel at tasks you need help with, be it hosting, message queues or email
• Carefully plan your database schema and data types; store only the data you need
• Hire a genius DBA to push your database engine and your storage solution
• Make use of data caching systems to reduce strain on the database
• Calculations and post-processing should be automated and done during dead time
• Add more tools to your toolbox: PHP needs lots of friends in these situations
• Watch out for session race conditions; where they can’t be avoided, use session_write_close(), a template engine and a single entry point
• Reduce the number of continuous AJAX calls

Page 80: Dealing with Continuous Data Processing, ConFoo 2012


Q & A

Michael Peacock
Web Systems Developer – Telemetry Team – Smith Electric Vehicles
[email protected]

Senior / Lead Developer, Author & Speaker
[email protected]
www.michaelpeacock.co.uk

@michaelpeacock

http://www.slideshare.net/michaelpeacock