IBM is upgrading its z/OS mainframe operating system with features that help customers predict failures and reduce the time it takes to recover from performance slowdowns.
A feature known as Predictive Failure Analysis is the most interesting aspect of the new software release, due out in September, says IBM distinguished engineer Robert Rogers. The feature was present in an earlier release but the new version is more capable of analysing error records and resource utilisation to predict system failures, he says. It uses modelling algorithms to determine what the normal state of a machine is and report on changes that aren't apparent right away but could eventually lead to major problems.
Predictive Failure Analysis "can be used during application testing to identify previously unknown potential problem areas before the application is put into production or can be used in production systems to help identify issues before they become serious," IBM says.
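The article does not describe how Predictive Failure Analysis models a machine's normal state. As a rough illustration of the general idea only, the sketch below learns a baseline from early samples of a metric and reports the point at which a short moving average drifts away from it. The function name, parameters and sample values are hypothetical and are not taken from z/OS.

```python
from statistics import mean, stdev


def find_drift(samples, baseline_size=20, window=5, threshold=3.0):
    """Return the index where the recent average first leaves the baseline, or None.

    Hypothetical illustration only: this is not IBM's algorithm.
    """
    # Treat the first baseline_size samples as the "normal" state.
    baseline = samples[:baseline_size]
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        return None  # a perfectly flat baseline gives no scale to compare against

    # Slide a short window over the later samples. Averaging smooths out
    # single spikes, so only a sustained shift crosses the threshold.
    for end in range(baseline_size + window, len(samples) + 1):
        recent = mean(samples[end - window:end])
        if abs(recent - mu) / sigma > threshold:
            return end - 1
    return None


# Made-up metric: steady at first, then creeping upwards by 0.5 per sample.
steady = [100, 102] * 10
creeping = steady + [101 + 0.5 * i for i in range(1, 21)]
print(find_drift(creeping))
```

No single sample in the creeping series stands out from its neighbours, but the windowed average eventually moves more than three standard deviations from the baseline, which is the kind of gradual change the article says is not apparent right away.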
IBM updates the mainframe operating system every September, and previews the changes to be made six months beforehand. The upcoming version is z/OS Version 1 Release 12, and will be compatible with all mainframe hardware released in the past 10 years. Release 12 will also be compatible with the next version of the mainframe hardware, which is expected to be released later this year.
Some, but not all, of the new features will be made available to customers who stick with older versions of the operating system. Once Release 12 is issued, Release 10 will be the oldest version still eligible for the highest level of support.
In the grand scheme of things, this year's operating system update "is not one of the biggest releases", Rogers says. "We haven't added any huge new componentry like our Parallel Sysplex clustering."
The operating system also includes a new Run Time Diagnostics function which, in a similar vein to Predictive Failure Analysis, analyses key indicators of a system and identifies the root causes of system degradation.
For example, the diagnostics function might identify an application that is looping, meaning that it is repeating a sequence of instructions over and over again without terminating, Rogers says. When a problem arises, the system gives mainframe administrators several corrective actions to choose from to maintain high levels of availability.
One of the main goals is to reduce time to recovery, the amount of time between when a system slows down and when it returns to normal performance.
Mainframes have long had great uptimes, but such success can make it difficult to determine the cause of rare outages, and solve them without having to take the entire system down, Rogers says.
"If something almost never fails, then when it does, nobody knows what to do," he says. "Nobody knows how to tell what the cause of the failure is and how to correct it."
The new version of the OS will also include software updates in the areas of storage management and scalability, advanced cryptography, automatic partitioning, and workload-driven provisioning.