How to evaluate and improve the quality of mHealth behaviour change tools
1. Yorkshire Centre for Health Informatics
How to evaluate & improve the quality of mHealth behaviour change tools?
Jeremy Wyatt DM FRCP ACMI Fellow
Leadership Chair in eHealth Research, University of Leeds; Clinical Adviser on New Technologies, Royal College of Physicians, London
[email protected]
2. Agenda
- Why mHealth BC tools? Is there a quality problem?
- What does quality mean: in general? For mHealth BC tools?
- What methods might improve quality?
- How to evaluate the quality of these tools?
- Some example studies
- Conclusions
3. Why mHealth for behaviour change?
1. Face-to-face contacts do not scale
2. Smartphone hardware is used by >75% of adults: cheap, convenient, fashionable; inbuilt sensors / wearables allow easy measurements; multiple channels: SMS, MMS, voice, video, apps
3. mHealth software enables: unobtrusive alerts to record data or take action; incorporation of BCTs (e.g. BCTs are present in 96% of adherence apps, median 2 BCTs per app); tailoring, which makes BC more effective (corrected d = 0.16; SR of 21 studies on BC websites, Lustria, J Health Comm 2013)
4. Why digital channels?
Mean public sector cost per completed encounter across 120 councils: face to face £8.60, letter £5.00, telephone £2.83, digital £0.15.
Source: Cabinet Office Digital Efficiency Report, 2013
5. A broken market?
Price (US$) of 47 smoking cessation apps versus evidence score (data from Abroms et al 2013). Evidence score: a high score means the app adheres to US Preventive Services Task Force guidelines.
Regression: y = 3.7 - 0.1x, R² = 0.016, i.e. price is essentially unrelated to evidence.
6. Privacy and mHealth apps
- Permissions requested: use accounts, modify USB, read phone ID, find files, full net access, view connections
- Our study of 80 apps: an average of 4 clear privacy breaches per health app, but only 1 per medical app
- "We know that - we read the Terms & Conditions!" (this one was only 1,200 words, but many are much longer)
With Hannah Panayiotou & Anam Noel, Leeds medical students
7. What is quality?
"The totality of features and characteristics of a product or service that bear on its ability to satisfy stated or implied needs" (ISO 9000:2005, Quality management systems - Fundamentals and vocabulary)
I.e. quality is about fitness for purpose: a good quality product complies with client requirements.
[That 30-page ISO document costs £96.50!]
8. Possible processes to improve quality of mHealth tools

Method | Advantages | Disadvantages | Examples
Wisdom of the crowd (user ranking) | Simple | Hard for users to assess quality; click-factory bias | Current app stores, MyHealthApps
Users apply quality criteria | Explicit | Requires widespread dissemination; can everyone apply them? | RCP checklist
Classic peer-reviewed article | Rigorous (?) | Slow, resource intensive, doesn't fit the app model | 47 PubMed articles
Physician peer review | Timely, dynamic | Not as rigorous; scalable? | iMedicalApps, MedicalAppJournal
Developer self-certification | Dynamic | Requires developers to understand & comply; checklist must fit apps | HON Code?, RCP checklist
Developer support | Resource light | Technical knowledge needed; multitude of developers | BSI PAS 277
CE marking, external regulation | Credible | Slow, expensive; apps don't fit the national model | NHS App Store, FDA, MHRA
9. BSI publicly accessible standard (PAS) 277 on health & medical apps
- A PAS is not a full BSI standard, but a supportive framework for developers
- Steering group includes DHACA, BUPA, NHS AHSN, app developers, notified bodies, HSCIC, RCP, the NIHR mental health technology centre, a patient representative and a medical device manufacturer
- Now published on the BSI web site with free access
10. Regulation of medical apps by FDA, FCC
- If classified as a medical device by the FDA, a product must demonstrate efficacy, but only about 100 apps have so far been classified as medical devices
- The FDA decided to exercise "enforcement discretion" on most medical apps, so it has not actually banned any apps yet
- However, the Federal Communication Commission has banned some apps with misleading claims, e.g. Acne Cure (no evidence for the claimed benefit of the iPhone screen backlight)
- Sharpe, New England Center for Investigative Reporting: "Many health apps are based on flimsy science at best, and often do not work." Washington Post, November 12th 2012
11. We need to think differently

Old think | New think
Paternalism: we know & determine what is best for users | Self-determination: users decide what is best for them
Regulation will eliminate harmful apps after release | Prevent bad apps: help app developers understand safety & quality
The NHS must control apps, apply rules and safety checks | Self-regulation by the developer community; consumer choice informed by truth in labelling
App developers are in control | Aristotle's civil society* is in control
Quality is best achieved by laws and regulations | Quality is best achieved by consensus and culture change
The aim of apps is innovation (sometimes above other considerations) | App innovation must balance benefits and risks
An apps market driven by viral campaigns and unfounded claims of benefit | An apps market driven by fitness for purpose (ISO) & evidence of benefit

*The elements that make up a democratic society, such as freedom of speech and an independent judiciary, collaborating for common wellbeing
12. User ratings: app display rank versus adherence to evidence
Study of 47 smoking cessation apps (Abroms et al, 2013)
13. What went wrong with user rankings and reviews?
Wall Street Journal, Nov. 24, 2013: "Inside a Twitter Robot Factory: Fake Activity, Often Bought for Publicity Purposes, Influences Trending Topics", by Jeff Elder.
"One day earlier this month, Jim Vidmar bought 1,000 fake Twitter accounts for $58 from an online vendor in Pakistan. Mr. Vidmar programs accounts like these to 'follow' other Twitter accounts, and to rebroadcast tweets."
http://on.wsj.com/18A2hr9
14. Promoting quality in the marketplace
- Quality is defined by ISO as fitness for purpose; however, app users and purposes vary, so a simple good / bad quality mark is insufficient
- We need a checklist of optional criteria to empower users, clinicians, commissioners and app developers
- Select the criteria you need according to the complexity of the app and the contextual risk (Lewis & Wyatt, JMIR 2014)
- Labelling apps will make quality explicit, adding a new Darwinian selection pressure to the apps marketplace
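The idea of "select the criteria you need according to contextual risk" can be sketched as a small lookup. This is a hypothetical illustration only: the criterion names and the three risk tiers are my own placeholders, not the actual Lewis & Wyatt (JMIR 2014) criteria.

```python
# Hypothetical sketch: each quality criterion is tagged with the lowest
# contextual risk tier at which it should be applied. Names and tiers are
# illustrative placeholders, not the published checklist.
CRITERIA = [
    ("declares sponsor and developer", "low"),
    ("states information sources", "low"),
    ("usability tested", "medium"),
    ("privacy policy and data protection", "medium"),
    ("calculations verified against a gold standard", "high"),
    ("evidence of clinical benefit", "high"),
]

RISK_ORDER = {"low": 0, "medium": 1, "high": 2}

def criteria_for(context_risk: str) -> list[str]:
    """Return every criterion whose tier is at or below the app's contextual risk."""
    level = RISK_ORDER[context_risk]
    return [name for name, tier in CRITERIA if RISK_ORDER[tier] <= level]
```

A low-risk wellness app would then face only the basic transparency criteria, while a high-risk app (e.g. one calculating a drug dose) would face the full list.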
15. Risk Framework for mHealth apps Context in which app used
Lewis T, Wyatt JC. JMIR 2014
16. Our draft quality criteria for apps, based on Donabedian 1966
- Structure = the app development team, the evidence base, use of an appropriate behaviour change model etc.
- Processes = app functions: usability, accuracy etc.
- Outcomes = app impacts on user knowledge & self-efficacy, user behaviours, resource usage
Wyatt JC, Curtis K, Brown K, Michie S. Submitted to Lancet
17. Structure: what should a high quality app include?
- Appropriate sponsor & developer skills
- Appropriate components: an appropriate underlying health promotion theory; a sensible choice of behaviour change techniques; a good user interface; accurately programmed algorithms; adequate security; accurate knowledge from a reliable source (NICE?)
Study method: independent expert review
18. Process: does the app work well? Suggested measures:
- App download rates [a surrogate for user recognition]
- App usage rates immediately after download & later
- Time taken to enter data and receive a report
- Acceptability, usability
- Accuracy of calculations and data
- Appropriateness of any advice given
Study method: lab tests and surveys
19. Case study of CVD risk apps: methods
- Assembled search terms: heart attack, cardiovascular disease etc.
- Searched for & downloaded all iPhone / iPad free & paid apps that the public might use to assess personal risk
- Assembled 15 scenarios varying in risk from 1% to 98%
- Assessed the risk figure, format and advice given by each app
- Definition of error: app above 20% & gold standard below 20%, or vice versa (old NICE guidance; or 10%, new guidance)
With Hannah Cullumbine & Sophie Moriarty, Leeds medical students
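The error definition above is just a threshold-disagreement rule, which can be stated precisely in a few lines. This is a minimal sketch of that rule, assuming risks are expressed as fractions; the function names are mine, and the study's exact handling of values exactly on the threshold is not stated in the slide.

```python
# Sketch of the study's error rule: an app misclassifies a scenario when it
# and the gold standard (QRisk2) fall on opposite sides of the risk threshold.
# Boundary handling (>= vs >) is an assumption; the slide does not specify it.
THRESHOLD = 0.20  # old NICE guidance; rerun with 0.10 for the newer guidance

def misclassified(app_risk: float, gold_risk: float,
                  threshold: float = THRESHOLD) -> bool:
    """True when the app and the gold standard disagree about the threshold."""
    return (app_risk >= threshold) != (gold_risk >= threshold)

def misclassification_rate(app_risks, gold_risks,
                           threshold: float = THRESHOLD) -> float:
    """Fraction of scenarios the app misclassifies relative to the gold standard."""
    pairs = list(zip(app_risks, gold_risks))
    errors = sum(misclassified(a, g, threshold) for a, g in pairs)
    return errors / len(pairs)
```

Running this over the 15 scenarios per app would yield the per-app rates reported on the results slides.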
20. Overall results
- Located 21 apps; only 19 (7 paid) gave figures
- All 19 communicated risk using percentages (cf. advice from Gigerenzer, BMJ 2004)
- One app said "see your GP" every time; none of the rest gave advice
- Some apps refused to accept key data, e.g. age > 74, diabetes
[Screenshot: Heart Health App]
21. Misclassification rates for the 20% threshold
- Misclassification rates varied from 7% to 33%
- 5 (26%) of 19 apps misclassified 25% or more of the scenarios and 3 (16%) misclassified 20-24%, so 8 (42%) misclassified at least a fifth of the scenarios
- Median error rate: free apps 13%, paid apps 27%; the error rate for free apps was significantly lower than for paid apps (p = 0.026, Mann-Whitney U test, U = 13.5)
[Chart: per-app misclassification rate at the 20% threshold; paid apps in dark blue]
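The free-versus-paid comparison above uses a Mann-Whitney U test, which for small samples reduces to counting, across all free/paid pairs, how often one group's error rate is below the other's (ties count a half). A minimal sketch of the U statistic itself (computing the p-value additionally needs exact tables or a library such as `scipy.stats.mannwhitneyu`; the error-rate values below are illustrative, not the study's data):

```python
def mann_whitney_u(x, y) -> float:
    """Mann-Whitney U statistic for sample x against sample y:
    counts pairs (a, b) with a < b, counting ties as 0.5.
    A small U means x values tend to be smaller than y values."""
    u = 0.0
    for a in x:
        for b in y:
            if a < b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

# Illustrative use: per-app error rates for two groups (made-up numbers)
free_errors = [0.07, 0.13, 0.13, 0.20]
paid_errors = [0.20, 0.27, 0.33]
u = mann_whitney_u(free_errors, paid_errors)
```

With the study's 12 free and 7 paid apps, the maximum possible U is 12 × 7 = 84, so the reported U = 13.5 indicates the free apps' error rates sat well below the paid apps'.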
22. Assessing app accuracy
- Only applies to apps that give advice, calculate a risk / drug dose etc.
- Need a gold standard for correct advice / risk [QRisk2 in our case]
- Need a representative case series, or plausible simulated cases; ideally, users should enter the case data or their own data
- How accurate is accurate enough: accurate enough to get used? Accurate enough to encourage the user to take action?
23. Intervention modelling experiments
Aim: to check an intervention before an expensive large-scale study (MRC framework: Campbell, BMJ 2007)
What to measure:
- acceptability, usability
- accuracy of data input by users, accuracy of output
- whether users correctly interpret the output
- stated impact of the output on decisions, self-efficacy, action
- users' emotional response to the output
- user impressions & suggested improvements
24. Example IME: how to make prescribing alerts more acceptable to doctors?
- Background: interruptive alerts annoy doctors
- Randomised IME in 24 junior doctors, each viewing 30 prescribing scenarios, with prescribing alerts presented in two different ways
- Same alert text presented as a modal dialogue box (interruptive) or on the ePrescribing interface (non-interruptive)
- Funded by Connecting for Health, carried out by an academic F2 doctor; published as Scott G et al, JAMIA 2011
25. Interruptive alert, shown in a modal dialogue box
26. Non-interruptive alert, same text
27. Outcomes: the bottom line
What is the impact of a health promotion app on:
- user knowledge and attitudes
- self-efficacy
- short-term behaviour change
- long-term behaviour change
- individual health-related outcomes
- population health?
Study method: randomised controlled trials, often with surrogate outcomes (e.g. My Meal Mate app, Carter et al, JMIR 2013)
28. Online RCT on Fogg's persuasive technology theory & NHS organ donation register sign-up
Persuasive features tested:
1. URL includes https, dundee.ac.uk
2. University logo
3. No advertising
4. References
5. Address & contact details
6. Privacy statement
7. Articles all dated
8. Site certified (W3C / Health on Net)
Nind, Sniehotta et al 2010
29. Why bother with impact evaluation - can't we just predict the results?
No: the real world of people and organisations is too messy / complex to predict whether a technology will work, e.g.:
- Diagnostic decision support (Wyatt, MedInfo 89)
- Integrated medicines management for a children's hospital (Koppel, JAMA 2005)
- MSN Messenger for nurse triage (Eminovic, JTT 2006)
30. A proposed evaluation cascade for mHealth apps

Area | Topics | Methods
Source | Purpose, sponsor, user, cost | Inspection
Safety | Data protection, usability | Inspection; HCI lab / user tests
Content | Based on sound evidence; proven behaviour change methods | Inspection
Accuracy | Calculations, advice | Scenarios with a gold standard
Potential impact | Ease of use in the field; understanding of output | Intervention modelling experiments
Impact | Knowledge, attitudes, self-efficacy; health behaviours, outcomes | Within-subject experiments; field trials
31. Conclusions
1. The quality of mHealth tools varies too much
2. User & professional reviews, developer self-certification and regulation are not enough
3. To help reduce "apptimism" and strengthen the other strategies, we need to agree quality criteria, evaluate apps against them, & label apps with the results
4. We have the evaluation methods (e.g. rating quality of evidence, usability / accuracy studies, RCTs)
5. This will support patients, health professionals and app developers in maximising the benefits of mHealth