About Me

Joshua Silver
4th-year CS major, graduating in May
Specialization: Databases

Interests:
- The business side of computing … and no, not IT
- How can companies use technology to improve and enable their business?
- Think Enterprise Web 2.0, mobile strategies, viral promotion on the internet, the Netflix recommendation engine, e-commerce, etc.
- Startups!

Sleepers & Workaholics
Caching Strategies in Mobile Computing

Authors: Dr. Daniel Barbará and Dr. Tomasz Imielinski

Presented by: Joshua Silver, Fall 2008

Dr. Daniel Barbará
- Professor at George Mason University
- Several patents associated with mobile caching

Dr. Tomasz Imielinski
- Professor at Rutgers University
- Senior VP: Search Technology at Ask.com

The Big Picture Problem

- Wireless devices have limited bandwidth, limited storage, and limited battery life
- To save power, devices go offline
- Mobile devices appear randomly in new cells
- This makes data caching difficult, since the server can’t track client caches

Then and Now

- Paper written in 1994
- Devices, bandwidth, and battery limitations are different today
- The essential problem still exists

With an explosion of wireless devices, the problem is even greater

Source: CTIA—The Wireless Association. http://www.infoplease.com/ipa/A0933563.html

24 Million in 1994

>240 Million in 2008

… and that doesn’t even take into account proprietary handheld units (like UPS driver delivery computers, Amazon Kindles, grocery store handheld scanners, etc.)

Why Caching is Important

Conserve:
1. Computational resources
2. Battery life
3. Network bandwidth

Can’t store the entire dataset on a handheld:
- US maps on a GPS unit
- Delivery routes for UPS drivers
- Contact list on a Blackberry

Traditional Strategies Fail

In a traditional client-server model, the server:
- keeps track of client caches
- pushes only the changes / sends cache invalidation messages

BUT… the server lacks knowledge of:
- Which units are in its cell
- Which units are powered ON

Quintessential problem: Client caches in a mobile environment cannot be tracked by a server

The Solution

Purpose: "…to propose a taxonomy of different cache invalidation strategies and study the impact of clients' disconnection times on their performance."

Sleepers & Workaholics proposes a few solutions and evaluates their effectiveness with mathematical rigor

Evaluation Criteria

Complicated math! The paper’s appendices have the details.

Essentially: define two types of mobile units
- Sleepers (offline/off all the time)
- Workaholics (never go offline)
- Almost all real-world devices fall in between

How do you compare? Normalize by defining a “hit ratio”, since it affects overall throughput:

$$H = \frac{\text{valid cache hits}}{\text{total data size}}$$

Strategies to Evaluate

Proposed strategies:
- Timestamps (TS)
- Amnesic Terminals (AT) (only remembering part, like amnesia)
- Signatures (SIG)

Control strategy:
- No Cache (NC)

Timestamps

- Each cache entry has a timestamp
- Synchronous, history-based, uncompressed in nature

SERVER:
- Communicates with clients every n seconds (and retries until successfully connected)
- Sends a list of changed items and their associated timestamps (to accommodate potential delay in transmission)

CLIENT:
For each item in cache:
- If the entry is in the report received from the server, purge it from the cache
- If NOT in the report, simply update its timestamp to the current time
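The client-side rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `CacheEntry`, `apply_ts_report`, and `report_time` are hypothetical, and the check that the reported update is newer than the cached copy is an assumption about why timestamps are sent (the slide itself only says to purge reported items).

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    value: str
    timestamp: float  # last time this copy was known to be valid


def apply_ts_report(cache, report, report_time):
    """Apply one invalidation report to a client cache.

    cache:       dict mapping item id -> CacheEntry
    report:      dict mapping item id -> server-side update timestamp
    report_time: time the report was issued (hypothetical parameter)
    """
    for item_id in list(cache):  # copy the keys so we can delete while looping
        entry = cache[item_id]
        # Assumption: purge only if the reported update is newer than our copy.
        if item_id in report and report[item_id] > entry.timestamp:
            del cache[item_id]
        else:
            # Not reported as changed: the copy is valid as of this report.
            entry.timestamp = report_time
    return cache
```

A client that slept through several reports would still need some rule for deciding whether its whole cache is too old to trust; that part is not covered by this sketch.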

Amnesic Terminals

- Each cache entry has an identifier
- ALSO synchronous, history-based, uncompressed in nature

SERVER:
- Notifies clients of the identifiers of items changed since the last invalidation report

CLIENT:
For each item in cache:
◦ If in report, purge from cache
◦ If NOT in report, do nothing
◦ ALSO, if enough time has elapsed, drop the WHOLE cache and rebuild it completely
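A rough Python sketch of the Amnesic Terminals client, assuming the cache is a plain dict and the report is a set of changed identifiers. The function name `apply_at_report` and the threshold `max_gap` (how long is "enough time") are hypothetical; the slide does not give a value.

```python
def apply_at_report(cache, changed_ids, now, last_report_time, max_gap):
    """Apply one report of changed identifiers to a client cache.

    cache:            dict mapping item id -> cached value
    changed_ids:      set of ids changed since the server's last report
    now:              time this report was heard
    last_report_time: time the previous report was heard, or None
    max_gap:          hypothetical threshold; a longer silence than this
                      means reports may have been missed
    Returns the time to remember as the last report heard.
    """
    if last_report_time is None or now - last_report_time > max_gap:
        # Too much time has elapsed: nothing in the cache can be trusted.
        cache.clear()
    else:
        for item_id in changed_ids:
            cache.pop(item_id, None)  # purge if present, otherwise do nothing
    return now
```

Because the report carries only identifiers, the client keeps no per-item history, which is where the "amnesic" name comes from.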

Signatures

- Checksums are calculated over the value of the data to form a signature
- Since the mobile unit does not have the entire database, it needs an algorithm to compute a partial checksum (see the paper's appendix)
- Signatures are combined using XOR
- Synchronous, state-based, compressed reports

SERVER:
- Broadcasts the set of combined signatures

CLIENT:
- An item in the cache is declared invalid if it belongs to “too many” unmatching signatures (it is suspected of being out of date)
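To make the XOR idea concrete, here is a simplified Python sketch. It is not the paper's algorithm: it assumes the server and clients agree on the item subsets in advance, that the client remembers the combined signatures from the last report it heard, and that "too many" is a fixed count `threshold`. All function names are hypothetical, and SHA-256 is just a stand-in checksum.

```python
import hashlib
from functools import reduce


def item_signature(value: str, bits: int = 32) -> int:
    """Checksum over an item's value (truncated SHA-256 as a stand-in)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[: bits // 8], "big")


def combined_signature(values) -> int:
    """XOR the per-item signatures of a subset into one combined signature."""
    return reduce(lambda acc, v: acc ^ item_signature(v), values, 0)


def build_report(database, subsets):
    """Server side: one combined signature per agreed-upon subset."""
    return [combined_signature(database[i] for i in subset) for subset in subsets]


def invalid_items(cached_ids, subsets, old_report, new_report, threshold):
    """Client side: flag items that sit in more than `threshold` changed subsets."""
    unmatching = [s for s, old, new in zip(subsets, old_report, new_report)
                  if old != new]
    stale = set()
    for item_id in cached_ids:
        suspect_count = sum(1 for s in unmatching if item_id in s)
        if suspect_count > threshold:
            stale.add(item_id)
    return stale
```

The report size depends on the number of subsets, not the number of updates, which is why the reports count as compressed. The price is that an unchanged item can be flagged by mistake when it shares several subsets with items that did change.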

No Cache

There is no cache

SERVER:
- Responds to direct queries from the client with the appropriate information

CLIENT:
- Queries the database directly any time an item is needed

Conclusions on Effectiveness

The best strategy depends on circumstances:
- Signatures are best for long sleepers, when the disconnection period is long and difficult to predict
- Timestamps are best for query-intensive scenarios, when the rate of queries is greater than the rate of updates, provided that units are not workaholics
- Amnesic Terminals are best for workaholics, units that are awake most of the time

Still not satisfied… how can we improve effectiveness?

Only 2 options:
1. Update less often, or
2. Send less info

Relax the Consistency of the Cache

- Depending on the data type, data may not need to be exact (e.g., stocks, weather)
- Allow values to vary by a set tolerance (like .05% for stock prices, or weather reports outdated by up to 2 hours)
- Makes shorter invalidation reports possible

How Do We Decide to Update?

- Consider cached copies to be quasi-copies
- Each quasi-copy has a coherency condition attached to it

Coherency conditions:
- Delay condition: updated based on time
- Arithmetic condition: updated based on the difference between the data and the quasi-copy
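The two coherency conditions can be illustrated with a short Python sketch. The names `QuasiCopy`, `violates_delay`, and `violates_arithmetic` are hypothetical, and treating the arithmetic tolerance as a relative difference is an assumption made to match the ".05% for stock prices" example.

```python
from dataclasses import dataclass


@dataclass
class QuasiCopy:
    value: float
    cached_at: float  # time (in seconds) the copy was last refreshed


def violates_delay(copy: QuasiCopy, now: float, max_age: float) -> bool:
    """Delay condition: the copy is older than the allowed age."""
    return now - copy.cached_at > max_age


def violates_arithmetic(copy: QuasiCopy, current_value: float,
                        tolerance: float) -> bool:
    """Arithmetic condition: the real value drifted past a relative tolerance."""
    if copy.value == 0:
        return current_value != 0
    return abs(current_value - copy.value) / abs(copy.value) > tolerance


# Stock price with a .05% tolerance: small moves do not force an update.
stock = QuasiCopy(value=100.0, cached_at=0.0)
print(violates_arithmetic(stock, 100.04, tolerance=0.0005))
print(violates_arithmetic(stock, 100.10, tolerance=0.0005))

# Weather report allowed to be up to 2 hours old.
weather = QuasiCopy(value=21.5, cached_at=0.0)
print(violates_delay(weather, now=3 * 3600, max_age=2 * 3600))
```

Only items whose condition is violated need to appear in an invalidation report, which is how relaxing consistency shortens the reports. The arithmetic check needs the current value, so it presumably has to run wherever that value is known.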

Criticism

- The assumptions about which resources are most scarce are no longer really accurate (e.g., bandwidth is better than predicted, battery life is longer)
- Units are rarely powered down
  ◦ Battery life is better than predicted
  ◦ Battery life is not the only thing that dictates use patterns … reception does also
- Units still lose reception frequently
  ◦ Today’s most common “sleeper” condition, yet explicitly excluded from the definition in S&W