HPM and managing the "9s"

For those of us who supply or work daily with Tier 1 TEMs, NSPs, and government agencies, a term that should be at the front of our minds is “High Availability (HA).” An important extension of that, particularly in those markets, is “Service Availability (SA).”
Service Availability is the use of redundancy, fault prediction and avoidance, low Mean Time To Repair (MTTR), and seamless recovery from system failure to ensure continuous availability of a service, regardless of the software, hardware, or user involved in delivering it. Without those design considerations, HA systems that typically strive for five-9s (99.999 percent) availability – equating to 5.26 minutes of downtime annually – risk the loss of precious system resources, seconds, and potentially a “9.”
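
To put the cost of losing a “9” in numbers, the downtime budget quoted above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: it assumes a 365-day year, and the function name annual_downtime_minutes is a hypothetical helper, not part of any HA or SA specification.

```python
# Minutes in a non-leap year (assumption: 365 days).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def annual_downtime_minutes(nines: int) -> float:
    """Downtime budget per year for an availability of N nines (5 -> 99.999%)."""
    unavailability = 10.0 ** -nines  # e.g. 0.00001 for five nines
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    # Print the budget for two through six nines.
    for n in range(2, 7):
        availability_pct = 100 * (1 - 10.0 ** -n)
        print(f"{n} nines ({availability_pct:.4f}%): "
              f"{annual_downtime_minutes(n):.2f} minutes per year")
```

Under these assumptions, five 9s allows roughly 5.26 minutes of downtime per year, while dropping to four 9s raises the budget tenfold, to about 52.6 minutes.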

Keeping ATCA HA

As a high-performance computing platform initially intended for next-generation communications, and eventually diversifying into military/aerospace and other applications as well, AdvancedTCA (ATCA) has been directly concerned with HA and SA since its development stage more than 10 years ago. At that point, concerns were raised about the AdvancedTCA shelf manager’s ability to conduct efficient operations within the shelf, as the standard shelf had grown to support 14 or 16 board slots, each of which could potentially house a blade needing management, among other Field Replaceable Units (FRUs). These concerns were addressed by a PICMG mandate for a Hardware Platform Management (HPM) layer in all ATCA-compliant shelves, boards, and modules. This layer is separate from the operating systems and applications that run on the boards and modules. It enables remote management of the hardware-level aspects of ATCA systems, with required management controllers at the shelf level and locally on every board or module.

At the same time, and not coincidentally, as the AdvancedTCA specification (PICMG 3.x) was being constructed, a separate consortium, the Service Availability Forum (SA Forum or SAF), was formed to address concerns regarding availability in COTS architectures. The SAF, which would eventually include members from industry organizations such as PICMG, CP-TA (prior to dissolving in October 2011 after a transfer of assets to PICMG), the SCOPE Alliance, and The Foundation, began work in 2001 on carrier-grade service availability middleware, including a Hardware Platform Interface (HPI), which defines an API for platform management, as well as an HPI-to-AdvancedTCA mapping specification.

Within PICMG, work on the HPM layer continued, with major portions of the and specifications addressing the HPM-related aspects of each of those architectures. In addition, work began on the first of a series of HPM.x specifications that focus on particular needs at the HPM layer and applied across xTCA. HPM.1, the Intelligent Platform Management Controller (IPMC) Firmware Upgrade specification, adopted in June 2007, defines an upgrade agent for xTCA board and module level management controllers (generically IPMCs), and is now pervasively adopted in the xTCA ecosystem.

Since then, the HPM layer for xTCA has continued to mature, including in the ATCA Extensions and 40 Gig Ethernet initiatives. In the xTCA-wide PICMG HPM.x series, HPM.2 is nearing completion. It standardizes local FRU management controllers that are LAN-attached, that is, connected to an in-shelf LAN, complementing the xTCA-mandated Intelligent Platform Management Bus (IPMB). Particularly significant in this specification is the capacity to establish remote serial communications via the in-shelf LAN to multiple serial ports on LAN-attached FRUs. Also nearing finalization, HPM.3 standardizes a method for management controllers to retrieve operational parameters, especially communications-related parameters like IP addresses, using the Dynamic Host Configuration Protocol (DHCP).

In this issue, Mark Overgaard, chairman of the HPM.x subcommittee, establishes a foundation for ATCA HPM in “COTS management building blocks for AdvancedTCA: Proven over the first decade,” which is followed by an industry perspective on effective implementation of a Shelf Manager by Mike Thompson of Pentair/Schroff. Finally, on page 19 of this issue, Jari Ruohonen of Nokia Siemens Networks (NSN) describes how critical the HPM layer is to the success of NSN’s ATCA-based telecommunications products, and the company’s need for continued evolution of that layer, including the completion and wide adoption of HPM.2.

Increased availability

Recently there have been a couple of developments concerning the management layer above the HPM subsystem of xTCA systems. The OpenSAF project recently announced Release 4.2 of its SAF-compliant middleware, and the SAF published a textbook, Service Availability: Principles and Practices. Information on both of those, as well as more on HPI, can be found at www.saforum.org. For more details on xTCA Hardware Platform Management, or to find out how to get involved in the fall’s “Intelligent 40 Gig” issue, send me an e-mail at blewis@opensystemsmedia.com.
Happy managing!