Monitor, Diagnose, and Optimize Java Workloads on Premises and in the Cloud

Oracle OpenWorld, 2015

CON9740_Mohammad-OOW - JVMD Session - Oct 2015 - Tahseen - V1

Agenda

• The company

• Our platform

• JVMD on EM 12

• JVMD on EM 13 - Expectations

Therap Services

• Integrated documentation and communication services for the Developmental Disability industry

• Private and Government customers

• De-facto National Leader

• SaaS, 24x7 operation

• 1M+ lines of Java code, Spring & JEE

• 100+ application modules, 20+ WARs in one EAR

• Heavy use of JMS, Coherence, RMI

Stats

• Very efficient in resource usage

• ~250K active and heavy users, ~10K concurrent users (up from 8.5K six months ago)

• Translates to 46K+ peak requests/minute, ~2.2M requests/hour

• 600,000 application page views/hour

Application Lifecycle

• 3 major releases/year, minor/bug-fix releases every few weeks

• Multiple testing environments

• 2 identical sites, each with:

  • 8 WebLogic instances (4 hosts), version 12.1.2.0.0

  • 1 in-memory DB (in-house)

  • 1 Oracle DB 12c (moved six months ago), GoldenGate replication

Our EM Deployment

• EM: 12.1.0.4.0

• One EM instance per production data center (2), one for the dev/staging environment.

• Owned by our system team; used by the system team, operations, and DBAs

• We try to move to the latest version fairly quickly, preferably within months, depending on the product release schedule

JVMD Use Case at Therap

• Primarily for Application Performance Management (APM)

• Proactively fix bottlenecks before they actually become problems

• Diagnosing incidents

• Right now primarily for production, plus limited but high-value use during big infrastructure changes (WebLogic upgrades, etc.)

• Want to increase usage during dev cycles but it needs to get easier to use (13.1 looks promising)

What it provides

• Active thread by state

• Thread transitions, automatic snapshots, and detailed information in snapshots (request, ECID, SQL, stack trace)

• Top requests, methods, SQL, etc.

• All sorts of WebLogic and JVM metrics that we get access to
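
To make "active threads by state" concrete, here is a minimal sketch that counts the live threads of the current JVM by state using only standard JDK calls. It is an illustration of the idea, not how JVMD collects its data; the class name ThreadStateSnapshot is hypothetical.

```java
import java.util.EnumMap;
import java.util.Map;

final class ThreadStateSnapshot {

    /** Counts the live threads of this JVM, grouped by Thread.State. */
    static Map<Thread.State, Integer> countByState() {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        // getAllStackTraces() returns one entry per thread that is live at call time.
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            counts.merge(t.getState(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        countByState().forEach((state, n) -> System.out.println(state + ": " + n));
    }
}
```

Taking such a snapshot at a regular interval and plotting the counts over time gives a rough picture of where threads spend their time (running, blocked on a lock, waiting).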

Log4j > Logback

• Log4j synchronous logging had a concurrency issue

• This was limiting requests/second despite low CPU usage

• The undiscovered bottleneck was causing requests to pile up and crash the server

• We moved to Logback, which had better concurrency, and then to async logging to make it even more scalable
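
The reason async logging scales better can be sketched in a few lines: request threads only hand the message to a bounded in-memory queue, and a single background thread performs the slow write. The sketch below uses plain JDK classes and is not Logback's implementation; the class name AsyncLogSketch and the drop-when-full policy are assumptions made for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

final class AsyncLogSketch implements AutoCloseable {
    // Unique sentinel object, compared by identity, that tells the writer to stop.
    private static final String STOP = new String("stop");

    private final BlockingQueue<String> queue;
    private final Thread writer;

    AsyncLogSketch(int capacity, Consumer<String> sink) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.writer = new Thread(() -> drain(sink), "async-log-writer");
        this.writer.setDaemon(true);
        this.writer.start();
    }

    /** Called by request threads. Never blocks; returns false if the queue is full and the message was dropped. */
    boolean log(String message) {
        return queue.offer(message);
    }

    /** Runs on the writer thread: the only place where the (slow) sink is called. */
    private void drain(Consumer<String> sink) {
        try {
            while (true) {
                String message = queue.take();
                if (message == STOP) {
                    return;
                }
                sink.accept(message);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Waits until everything queued before close() has been written. */
    @Override
    public void close() {
        try {
            queue.put(STOP);
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

With synchronous logging every request thread competes for the same lock around the write; here the contention is reduced to a short queue operation, at the cost of possibly dropping messages when the queue is full.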

JVMD 13.1 - Better Data Exploration

• The UI is much faster and more intuitive

• Allows us to slice and dice the data from many different angles

• Allows us to focus on a specific time period (and remembers it)

• Allows us to create a focused workset (JVM, request, session, etc.) and look at it from different perspectives

JVMD 13.1 - Better Information

• Gives us a lot more information (not just top 5 like before)

• Gives us information in a more statistically relevant fashion (max, avg, count)

• Allocation, DB information

Allocation Information

• Allocation information at a granular level is the most difficult information to find

• It's extremely useful, and essential in the cases where you need it

• You can take proactive measures based on the allocation profile, because a bad allocation profile often suggests design problems

• It's very hard to guess what is causing allocation, which makes it hard to build a very lean application

• e.g. while reducing allocation, I had to change an int to an Integer because the conversion (autoboxing) on map lookups accounted for a large percentage of allocation
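
The int-to-Integer example can be illustrated with a small sketch. It assumes the situation was a Map keyed by Integer that was queried with a primitive int; the class and method names are hypothetical, and the actual application code is not shown in the slides.

```java
import java.util.Map;

final class BoxedKeyLookup {

    /** Looks up the same key repeatedly with a primitive int: every get() autoboxes the key. */
    static long sumWithPrimitiveKey(Map<Integer, Long> map, int key, int lookups) {
        long sum = 0;
        for (int i = 0; i < lookups; i++) {
            // Compiles to map.get(Integer.valueOf(key)). Outside the small-value cache
            // (-128..127 by default) this may allocate a new Integer per call.
            sum += map.get(key); // assumes the key is present
        }
        return sum;
    }

    /** Same lookups, but the key is boxed once by the caller and then reused. */
    static long sumWithBoxedKey(Map<Integer, Long> map, Integer key, int lookups) {
        long sum = 0;
        for (int i = 0; i < lookups; i++) {
            sum += map.get(key); // no boxing here: key is already an Integer
        }
        return sum;
    }
}
```

Both methods return the same result; the difference only shows up in an allocation profile, which is why this kind of cost is hard to guess without a tool. Whether the JIT removes the temporary Integer depends on the JVM and the call site.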

SQL to Application Code

• You can look at the top SQL and find the stack trace

• Very important for us since we have over 100 modules and 4000 queries in the system, which sometimes makes it very hard to figure out where a particular query comes from

• It's doubly important for generic queries where we can't get the bind variables
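
To show what "SQL to application code" means in practice, here is a minimal hand-rolled sketch that remembers which application method first issued a given SQL text by inspecting the call stack. This is not how JVMD does it (JVMD samples threads and needs no code changes); QueryOriginRegistry, SampleDao, and the SQL text are all hypothetical names used for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class QueryOriginRegistry {
    private final Map<String, StackTraceElement> origins = new ConcurrentHashMap<>();

    /** Remembers the first stack frame outside this class as the origin of the SQL text. */
    void record(String sql) {
        if (!origins.containsKey(sql)) {
            origins.putIfAbsent(sql, firstCallerFrame());
        }
    }

    /** Returns the recorded origin, or null if the SQL text was never recorded. */
    StackTraceElement originOf(String sql) {
        return origins.get(sql);
    }

    private static StackTraceElement firstCallerFrame() {
        String self = QueryOriginRegistry.class.getName();
        for (StackTraceElement frame : new Throwable().getStackTrace()) {
            if (!frame.getClassName().startsWith(self)) {
                return frame;
            }
        }
        // Fallback so the map never receives a null value.
        return new StackTraceElement("unknown", "unknown", null, -1);
    }
}

final class SampleDao {
    static final String USERS_BY_ORG = "SELECT id FROM users WHERE org_id = ?";

    static void loadUsers(QueryOriginRegistry registry) {
        registry.record(USERS_BY_ORG); // a real DAO would execute the statement here
    }
}
```

After SampleDao.loadUsers(registry) has run, registry.originOf(SampleDao.USERS_BY_ORG) points at the loadUsers frame, which is the kind of query-to-code mapping that is hard to maintain by hand across 100+ modules.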

JFR Integration

• Nothing beats JFR (Java Flight Recorder) for the absolute lowest-level metrics

• JVMD is the only tool I know of that can integrate with JFR

Thank You!

Tahseen Mohammad, Principal Software Engineer, Therap Services

tahseen ~AT~ therapservices.net