5 April 2018

3 Questions to Measure the Performance and Efficiency of Your IT Strategy

Is Your Enterprise IT Strategy both Efficient and Performant?

In today’s increasingly digitized business reality, where more and more processes and interactions with suppliers, partners and customers are implemented in software and online 24×7, three overarching pressures continue to drive Business strategy. What cannot be overlooked, however, is the need to tie IT Strategy and metrics to Business Strategy in order to deliver both efficiency and performance.

In a recent blog post, 3 Business Performance Challenges of Digitization, we discussed the main categories of those existential challenges to Business Strategy:

  • Agility – Businesses that deploy new applications and processes in software the fastest win
  • Performance – Businesses whose applications and services perform well capture market share; those that do not lose it
  • Cost Efficiency – Businesses whose overall cost-model is the most efficient are the most profitable and capture the most market share

AIOps (Artificial Intelligence for IT Operations) is a growing theme with a game-changing premise: apply machine learning technologies to automate the continuous management of the IT resources underpinning critical business applications, services and processes. In How to Consolidate Performance Monitoring Metrics to Realize AIOps Benefits, we discussed a critical requirement for yielding valuable analytic results: the ability to understand time-series performance metrics across time, and in a fashion that contextually relates them to the precise infrastructure environments supporting those Business Services. A consolidated and fully related monitoring metric data store provides the ability to always have both business and technical performance context across all applications and the infrastructure supporting them.

While necessary, a consolidated and fully related time-series performance metric store across the entire application and infrastructure stack is not, on its own, sufficient to answer the next logical question the Business should be asking. Following the old maxim, “if you don’t measure it, you cannot manage it”, that question should be:

What is the actual performance and efficiency of our IT Strategy for DevOps/agile delivery of ever more innovative applications, services and processes?

Three Components of “Measuring to Manage” Performance and Efficiency of IT Strategy

The questions needing to be answered are:

  1. What level of application and service performance is required? And what does normal performance look like when we are able to deliver it?
  2. When abnormalities or exceptions occur, why? And what is the fastest way to fix them?
  3. Over the business cycle, when and why will our delivery of business services at acceptable performance be constrained by our IT strategy?

Let’s take a closer look at each of these fundamental questions.

Question 1: What should application and service performance be and how are we doing?

Absolutely foundational is the ability to answer a two-part question:

  • What must the actual performance of our applications, services and processes be, hour by hour and day by day, as measured by the only metrics that ultimately matter: how much work is getting done, and how quickly? (A minimal sketch of these two measurements follows this list.)
  • What are the actual underlying IT resource requirements for the delivery of those required service levels?
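
To make those two measurements concrete, here is a minimal sketch in Python that computes hourly throughput and a 95th-percentile response time from a hypothetical list of completed transactions. The field names, sample data and percentile choice are illustrative assumptions, not a prescription.

```python
# Minimal sketch: hourly throughput and 95th-percentile response time from
# hypothetical transaction records. Field names and data are illustrative.
import math
from collections import defaultdict
from datetime import datetime

transactions = [
    # (completion timestamp, response time in seconds) -- invented sample data
    (datetime(2018, 4, 5, 9, 12), 0.42),
    (datetime(2018, 4, 5, 9, 48), 0.55),
    (datetime(2018, 4, 5, 10, 3), 1.80),
    (datetime(2018, 4, 5, 10, 7), 0.51),
]

def p95(values):
    """Nearest-rank 95th percentile."""
    ordered = sorted(values)
    return ordered[math.ceil(0.95 * len(ordered)) - 1]

# Group response times by the hour in which each transaction completed.
by_hour = defaultdict(list)
for completed_at, response_time in transactions:
    by_hour[completed_at.replace(minute=0, second=0, microsecond=0)].append(response_time)

# Throughput is simply how many transactions completed in the hour;
# the p95 response time captures "how quickly" the work was done.
for hour, response_times in sorted(by_hour.items()):
    print(f"{hour:%Y-%m-%d %H:00}  throughput={len(response_times)} tx/h  p95={p95(response_times):.2f}s")
```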

Hand in hand with understanding the business requirement for performance over time comes understanding the actual behavior of the entire system in terms of what is normal and abnormal. After all, if things are performing acceptably to service level requirements, and all associated and related metrics are also behaving normally, there’s “nothing to see here, move along”. Understanding the normal and abnormal behavior of actual service delivery over time is, in effect, the “table stakes” knowledge the business requires to properly manage IT investments.

By understanding what is required, what is normal and, importantly, what is abnormal, the Business has its first and most important management-by-exception tool: the ability to focus on only what matters in the context of the business.
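
One simple way to operationalize “normal versus abnormal” is a per-metric statistical baseline. The sketch below assumes a plain hourly series for a single metric and flags points more than three standard deviations from a rolling mean; the window size, threshold and data are illustrative assumptions only, not how any particular product computes its baselines.

```python
# Minimal sketch: flag abnormal points in a metric series using a rolling
# mean/standard-deviation baseline. Window size and the 3-sigma threshold
# are illustrative choices, not product defaults.
from statistics import mean, stdev

def abnormal_points(series, window=24, threshold=3.0):
    """Return (index, value) pairs that fall outside the rolling baseline."""
    flagged = []
    for i in range(window, len(series)):
        baseline = series[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma > 0 and abs(series[i] - mu) > threshold * sigma:
            flagged.append((i, series[i]))
    return flagged

# Hypothetical hourly p95 response times (seconds); the spike at the end is abnormal.
hourly_p95 = [0.5, 0.52, 0.48, 0.51, 0.49, 0.5, 0.53, 0.47] * 4 + [2.4]
print(abnormal_points(hourly_p95, window=24))
```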

Question 2: When things aren’t normal, why? And does it matter?

Once the foundational knowledge of required service delivery levels has been established, and a system is in place to measure delivery performance and its normal and abnormal behavior over time, the next logical step is to answer a further set of questions:

  • When things are abnormal, what are those abnormalities?
  • Do they matter now, will they matter later?
  • And, if they do matter now or later, what is the fastest way to identify the cause of the abnormality and thus focus efforts to quickly resolve (or prevent) service delivery issues?

Only a data store of time-series performance metrics, combined with automated, continuous calculation of the relationships across all the dynamically changing application and infrastructure components of the business applications, processes and services, can deliver the foundational data set required to map abnormal behavior in service delivery. That mapping is what makes it possible to directly relate abnormal service behavior to abnormal behavior in and across the dynamic infrastructure technology stack supporting those services.

The use of event-correlation schemes alone, against unrelated sets of objects and metrics, not only limits one to after-the-fact forensics (which are not conducive to proactive, preventative processes); it also limits one to statistical correlation or other pattern-based analytics that make a “best guess” at what “might be” causing outage and degradation events.

Time-series metric analysis schemes run against sets of unrelated time-series metrics across the application and resource stack are hindered by the same “best guess at what might be” limitation. Simple correlation in time is not causation; it leads to “false positives” which, at best, waste staff time chasing them, or worse, to “false negatives” which breed complacency, service degradations and outages.
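
To illustrate the difference, the sketch below assumes a small, hypothetical relationship graph (service to VMs to hosts and datastores) and a set of metrics already flagged as abnormal, and scopes the investigation to only the components that actually support the affected service, rather than correlating the service metric against every metric in the estate. All component names and relationships are invented for the example.

```python
# Minimal sketch: scope anomaly investigation to infrastructure components
# that are actually related to the affected service. Topology and anomaly
# flags are invented for illustration.
from collections import deque

# Relationship graph: service -> VMs -> hosts/datastores (hypothetical).
topology = {
    "checkout-service": ["vm-101", "vm-102"],
    "vm-101": ["esx-host-7", "datastore-3"],
    "vm-102": ["esx-host-9", "datastore-3"],
}

# Metrics currently flagged abnormal by baselining: (component, metric name).
abnormal_metrics = {
    ("datastore-3", "write_latency_ms"),
    ("esx-host-2", "cpu_ready_pct"),      # abnormal, but unrelated to checkout
}

def related_anomalies(service):
    """Breadth-first walk of the relationship graph, collecting only the
    abnormal metrics on components that support the given service."""
    findings, seen, queue = [], {service}, deque([service])
    while queue:
        component = queue.popleft()
        findings += [(c, m) for (c, m) in abnormal_metrics if c == component]
        for neighbor in topology.get(component, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return findings

# Only datastore-3's latency is reported; esx-host-2's anomaly is out of scope.
print(related_anomalies("checkout-service"))
```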

Question 3: When and why will Business Services be constrained?

To optimize life-cycle investments in IT infrastructure, whether in on-premises, hybrid and/or cloud environments, a foundational requirement is to understand how much and what type of resources are required across the business cycle. However, this alone is again “table stakes”. This understanding allows the organization to react to the need for more resources through processes that acquire and deploy them. Traditional environments were limited to long-wave physical resource planning, approval, acquisition and deployment cycles that could in many cases take months (or even a year). More modern virtualized cloud platforms allow for relatively rapid (in the case of private cloud IaaS and PaaS environments) and automated resource provisioning in a near on-demand fashion. This can be further automated and optimized using public-cloud-compatible applications and services.

But just because one can react quickly to an instantaneous, real-time demand for additional physical resources does not mean one is fully cost-optimized. Public cloud environments are notoriously expensive per unit of compute and storage when used in such automated, “give me more now” models. On-premises private cloud IaaS and PaaS offer additional cost efficiencies when implemented this way, but still require pre-staging and provisioning enough extra shared hardware resource overall to meet spikes in demand.

The only way to extract the utmost cost efficiency while simultaneously assuring acceptable performance of service delivery is to have a continuous, forward-looking understanding of when, and under what conditions, specific IT resources will become a constraint, as indicated by Google’s four Golden Signals described in “Site Reliability Engineering”. Only forward-looking analytics and analysis against the normal (and abnormal) behavior of applications, transactions and services can deliver this level of proactive insight with sufficient foreknowledge to enable the most cost-effective decisions.

In addition to cost efficiency, forward-looking analytics against a consolidated data set of related, cross-stack performance data enable a new level of service-risk prevention: you will always know, well in advance, what the business-volume constraints in your infrastructure will be, and when they will occur.
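
As a simple illustration of what “forward-looking” can mean, the sketch below fits a least-squares linear trend to a hypothetical daily saturation metric (one of the four Golden Signals) and projects how many days remain before it crosses a capacity threshold. Real forecasting would need to account for seasonality across the business cycle; the data, the 85% threshold and the method are illustrative assumptions.

```python
# Minimal sketch: estimate when a saturation metric will hit a capacity
# threshold by extrapolating a least-squares linear trend. Data points and
# the 85% threshold are illustrative assumptions.
def days_until_threshold(daily_utilization, threshold=85.0):
    n = len(daily_utilization)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(daily_utilization) / n
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, daily_utilization))
             / sum((x - x_mean) ** 2 for x in xs))
    intercept = y_mean - slope * x_mean
    if slope <= 0:
        return None  # no upward trend, so no projected constraint
    crossing_day = (threshold - intercept) / slope
    # Days remaining are counted from the last observed day (index n - 1).
    return max(0.0, crossing_day - (n - 1))

# Hypothetical daily storage utilization (%) trending upward over two weeks.
utilization = [62, 63, 63, 65, 66, 66, 68, 69, 70, 71, 71, 73, 74, 75]
print(f"Projected days until 85% saturation: {days_until_threshold(utilization):.1f}")
```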

Implementing Efficient and Performant Business Service Delivery

At OpsDataStore we understand the importance of proactively ensuring the performance and reliability of the revenue-generating applications powering businesses. That’s why OpsDataStore delivers the whole picture, so you can optimize performance, increase business agility, measure operational risk and drive ROI for every technology investment.

OpsDataStore uniquely collects the metrics across the application, compute, network and storage stacks and automatically and continuously creates the relationships between them. It continuously analyzes all the related data to calculate baselines of normal and abnormal behavior for every metric, in the context of all other related metrics, and all mapped to the business metrics that matter: application, transaction and service performance and response times. This enables the automated and ad-hoc delivery of dashboards, reports and analyses:

  • Continuous service-level dashboards and reports for critical applications: their behavior over time, when and where they are normal, and exception-based visualizations and reports of all related objects and metrics.
  • Continuous visibility and drill-down, on an exception basis, across all pair-wise anomalies to immediately discover the abnormally performing infrastructure causing performance problems with applications, transactions and services.
  • Continuous forward-looking analytics to “play offense” in the context of the Google Golden Signals associated with critical business services.

Summary

As enterprises continue the relentless pace of innovation and digitization, competitive pressures will increasingly force the Business to ensure its IT investments are fully optimized for both performance AND cost, across the application, transaction and service lifecycle. To do this effectively, all the “metrics that matter” must be instrumented (à la the Google Golden Signals), they must be related together in context over time, and they must be automatically analyzed to determine what is normal and abnormal, when, and why.

Learn more about how OpsDataStore delivers efficient and performant IT to the Business.