3.3 Compute Management

What is it?

Staying in control of the compute platform is crucial to having the right capacity at the right time. This requires collecting performance and capacity measurements and using them for short-term and long-term planning.

You have to collect certain metrics to understand the installed capacity and how it is actually used. From an energy efficiency point of view, you should collect these metrics:

  • Utilization of processor, memory and network

Does a server have capacity left? On virtualization hosts this is often determined by available memory. Do you know why you bought that server with the fastest quad-core processor? And do you know how much of that processing power is actually used? Without these kinds of metrics you cannot decide whether a new server is needed or how it should be sized.

  • Utilization of local disks or remote storage

Disks are kept separate from the metrics above because many servers today use a local disk only for the operating system, while data is stored on a NAS, a SAN or a remote database. If a local disk is used for data, monitor its capacity. Remote filesystems are worth monitoring too, to understand the actual utilization of the provisioned capacity.

  • Power draw

Servers draw power, but do you know how much? Can you relate server activity to power draw? Does the user of that server, or more likely the application or business service that this server provides, pay for the energy? Do you see different power draws between similar types of servers? Having this data helps you to charge back energy costs and to specify new servers more precisely.
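The metrics listed above can be combined into a simple per-server report. The following is a minimal sketch in Python, not a prescribed implementation: the `Sample` record, the `summarize` function, the one-hour interval and the price per kWh are all hypothetical and only illustrate how utilization and power draw could feed an energy chargeback.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Sample:
    """One measurement interval for a server (hypothetical record layout)."""
    cpu_pct: float      # processor utilization, 0-100
    mem_pct: float      # memory utilization, 0-100
    power_watts: float  # average power draw during the interval


def summarize(samples, interval_hours, price_per_kwh):
    """Return average and peak utilization, energy used and its cost."""
    # Energy = sum of (average watts x hours) per interval, converted to kWh.
    energy_kwh = sum(s.power_watts for s in samples) * interval_hours / 1000
    return {
        "cpu_avg": mean(s.cpu_pct for s in samples),
        "cpu_peak": max(s.cpu_pct for s in samples),
        "mem_avg": mean(s.mem_pct for s in samples),
        "mem_peak": max(s.mem_pct for s in samples),
        "energy_kwh": energy_kwh,
        "energy_cost": energy_kwh * price_per_kwh,
    }


# Illustrative values only: three one-hour samples, assumed price 0.20 per kWh.
samples = [Sample(10, 40, 200), Sample(30, 50, 220), Sample(20, 60, 210)]
report = summarize(samples, interval_hours=1.0, price_per_kwh=0.20)
print(report)
```

Comparing such reports between similar servers is one way to spot differences in power draw at comparable load.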


Why should you care?

Servers run the business applications and other services. They use energy and take up capacity of the cooling system. On average, servers have much more processing capacity than is actually used, yet they use not much less energy under low load. Consolidating applications onto a smaller set of servers will 1) reduce the required size of the data center, 2) reduce energy consumption and 3) reduce the load on the cooling system.
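The effect of consolidation can be estimated with a rough calculation. The sketch below assumes a simple linear power model (idle power plus a load-dependent part), which is a common approximation and not something measured here. The function names, the idle and peak wattages and the 50% target utilization are hypothetical values chosen for illustration.

```python
import math


def power_watts(util, idle_watts, peak_watts):
    """Assumed linear model: power grows linearly from idle to peak with load."""
    return idle_watts + (peak_watts - idle_watts) * util


def consolidate(n_servers, avg_util, target_util, idle_watts, peak_watts):
    """Estimate server count and total power before and after consolidation."""
    total_load = n_servers * avg_util  # load expressed in "full servers"
    # The small epsilon keeps floating-point noise from adding a server.
    needed = max(1, math.ceil(total_load / target_util - 1e-9))
    new_util = total_load / needed
    before = n_servers * power_watts(avg_util, idle_watts, peak_watts)
    after = needed * power_watts(new_util, idle_watts, peak_watts)
    return needed, before, after


# Illustrative values only: 10 servers at 10% load, consolidated to 50% load.
needed, before_w, after_w = consolidate(10, 0.10, 0.50, 150.0, 300.0)
print(needed, round(before_w), round(after_w))
```

Because idle power dominates at low load in this model, most of the estimated saving comes from switching off servers rather than from running the remaining ones more efficiently.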


How does it relate to the other items in the model?

Which KPIs are directly related to this KPI?

  • KPI x86 Asset Efficiency and RISC Asset Efficiency require this KPI to improve. Combined with the capacity management process, it can improve the number and utilization of devices.
  • KPI Change and Configuration Management is supported by this KPI to record and store capacity data of the devices.
  • KPI Product Lifecycle Management requires this KPI to understand the actual utilization of the devices. This is input for determining the operational value and for deciding what to do when a device needs replacement.
  • KPI Capacity Management is the process that uses this KPI as its input to understand past, current and future utilization, assuming stable development.
  • KPI Service Level Management matches this KPI to the expectations that are agreed in the service levels.
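
The use of this KPI by Capacity Management, estimating future utilization from past and current measurements, can be illustrated with a simple trend projection. This is only a sketch under the assumption of stable, roughly linear growth. The function name, the monthly sampling and the 80% threshold are hypothetical.

```python
def months_until_threshold(history_pct, threshold_pct):
    """Fit a least-squares line to monthly utilization and project forward.

    Returns the number of months after the last sample at which the trend
    reaches threshold_pct, or None if utilization is not growing.
    """
    n = len(history_pct)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(history_pct) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, history_pct))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    if slope <= 0:
        return None  # flat or shrinking utilization never reaches the threshold
    intercept = y_mean - slope * x_mean
    month_reached = (threshold_pct - intercept) / slope
    return max(0.0, month_reached - (n - 1))


# Illustrative values only: four monthly averages, assumed planning limit of 80%.
forecast = months_until_threshold([40, 42, 44, 46], 80)
print(forecast)
```

A result of None signals that, on the current trend, no extra capacity is needed. A small positive result signals that a purchase or consolidation decision is due soon.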