Full-dress rehearsal readies AdvancedTCA systems

Telecom Equipment Manufacturers (TEMs) can get AdvancedTCA systems to market faster, and at lower cost, by “rehearsing” the systems for their real-world roles.

Realizing all the benefits of standards-based AdvancedTCA components requires a combination of telecom and AdvancedTCA expertise. Two key benefits, improved modularity and multivendor compatibility, are predicated on solid thermal design and system integration practices, often derived from years of industry experience. Complementing the modularity and capability guidelines built into AdvancedTCA specifications with these practices enables developers to achieve higher levels of system performance, robustness, and reliability.

While PICMG 3.0 provides one-stop shopping for AdvancedTCA specifications, including detailed support material to enable interoperability of components at multiple levels, there’s no guarantee that multivendor AdvancedTCA components will flawlessly work together for all applications and targeted conditions. As a result, equipment manufacturers need to conduct thorough thermal characterization and perform rigorous integration testing to ensure system stability.

Thermal integration

A comprehensive set of specifications promotes AdvancedTCA interoperability, including requirements for thermal engineering such as the parameters under which vendors should measure airflow. For example, the specification gives a range of acceptable board impedance values, which is useful for conducting performance tests. The airflow direction and distribution defined in PICMG 3.0 aid designers seeking to maximize board cooling. In support of AdvancedTCA, the Communications Platforms Trade Association (CP-TA) publishes thermal guidelines and develops interoperability test tools that enable vendors to speed up design and increase their confidence in interoperability. However, these specifications and guidelines alone don’t ensure that all components will work together perfectly. Each chassis configuration has unique thermal dynamics, so unforeseen challenges sometimes arise. For example, the airflow volume through a specific slot depends on the impedance of the board it supports as well as the impedance of boards in nearby slots.
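The coupling between neighboring slots can be illustrated with a deliberately simple model. The sketch below is not taken from PICMG 3.0 or CP-TA material; it assumes each slot obeys a quadratic impedance law (pressure drop = k × flow²), that all slots share the same pressure drop between common inlet and exhaust plenums, and that the fans deliver a fixed total flow (a real fan follows a fan curve instead). The function name and the coefficient values are hypothetical.

```python
import math

def split_airflow(total_flow_cfm, impedance_coeffs):
    """Split a fixed total airflow across parallel slots.

    Assumption: each slot obeys dP = k * Q**2 and every slot sees the
    same pressure drop, so slot flow is proportional to 1 / sqrt(k).
    """
    weights = [1.0 / math.sqrt(k) for k in impedance_coeffs]
    total_weight = sum(weights)
    return [total_flow_cfm * w / total_weight for w in weights]

# Hypothetical coefficients in arbitrary units: three identical boards.
baseline = split_airflow(90.0, [1.0, 1.0, 1.0])

# Replace the board in the first slot with one of four times the impedance.
changed = split_airflow(90.0, [4.0, 1.0, 1.0])

print([round(q, 1) for q in baseline])  # [30.0, 30.0, 30.0]
print([round(q, 1) for q in changed])   # [18.0, 36.0, 36.0]
```

Even in this toy model, changing one board shifts the airflow available to every other slot, which is why slot airflow cannot be read from a single board's data sheet.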

Turbulence factor

Although airflow across actual system boards is turbulent, some vendors publish airflow data based on unrealistic laminar-flow test boards with minimal impedance. (See the Tutorial in the online edition of this issue: Fluid dynamics.) When specifying performance, most vendors list only the maximum achieved airflow, which is actually a transient operating condition that isn’t necessarily indicative of normal operation. Despite the comprehensiveness of the AdvancedTCA thermal specification, TEMs still need to characterize their system as a whole under real-world conditions. They should also consider using simulation tools to assess worst-case conditions and system response to faults.

Simulation tools

Board and chassis performance are tightly coupled, and it’s important to address their thermal requirements from a system-level interoperability perspective. Table 1 lists some of the simulation tools critical for modeling thermal interactions between various components.

Table 1: Thermal Design and Simulation Tools

Board-level thermal simulation

Thermal simulation tools, such as FloTherm from Mentor Graphics, are essential for optimizing component placement on a board. The FloTherm simulation results in Figure 1 show two design placement iterations of a 13U board. The initial simulation run (left board) indicates multiple components are too hot (in red) while others are receiving more airflow than needed. The airflow is bypassing hot components and flowing instead through other sections of the board, and/or the airflow is being preheated by other parts. This component placement disrupts airflow, squeezing high-powered parts into too small a space. Following some board layout changes (right board), a subsequent simulation shows that improved placement balances airflow and heat dissipation, leaving a smaller number of devices requiring additional attention.

Figure 1

Chassis-level airflow simulation

Simulation tools can model chassis thermal characteristics from various angles, as the 14-slot 12U chassis loaded with uniform impedance blade models in Figure 2 shows. The front chassis view (left side) shows that the airflow is relatively uniform across the blades, except at the outer slots, where blue indicates a relatively low level of airflow. The side chassis view (right side) again indicates that the airflow is mostly uniform through the chassis; however, the rear of the front blade receives more airflow than the faceplate side. This information identifies the worst-case locations that warrant the majority of the analysis time, and it is taken into account during system-level simulations.

Figure 2

System-level thermal simulation

The greatest payback from system-level thermal simulation occurs before the chassis and blade components have all been designed, because such simulation can prevent expensive redesign iterations. Typically, once the chassis and blades exist, it’s easier and more efficient to test the physical system-level configuration in the lab and then use simulation tools to help analyze the test results.

One approach, employed by RadiSys, is to use Computer-Aided Design (CAD) models of the boards and chassis to assemble a virtual chassis with CPU blades installed. Blade-level thermal models are “installed” in a chassis model, and the simulation computes the airflow through the assembled chassis, as shown in Figure 3. The simulation allows the developer to step through the chassis one slice at a time, represented by the purple plane going through the chassis, and look at any particular location within it. This detailed and comprehensive perspective helps developers optimize component placement on the board (while it’s being designed and before it’s built) and see how it will run in an actual system. Simulations may also be used to predict system behavior under different operating conditions, such as full board power or decreased airflow to reduce system noise.

Figure 3

Simulating various system conditions

The previous section discussed how airflow across the board is affected by the chassis structure and board topology, but it’s also important to simulate system behavior under more difficult conditions. Some of these conditions may have been identified during the board and chassis simulations, like the highest power board, the worst chassis slot, and the worst interacting neighbors. Various conditions, such as maximum power consumption, may be difficult to reproduce in the lab, but they can easily be simulated. It’s also possible to simulate a component failure, like a faulty fan, or service procedures like Rear Transition Module (RTM) removal and fan tray replacement.
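As a rough illustration of sweeping operating conditions to find a worst case, the sketch below uses the standard energy balance for air, ΔT = P / (ρ · c_p · Q), to estimate exhaust air temperature for every combination of board, slot, and inlet temperature. It is a back-of-the-envelope stand-in for a real thermal simulation, not the RadiSys method; the board powers, slot airflows, and inlet temperatures are hypothetical, and the air properties are approximate sea-level values.

```python
from itertools import product

AIR_DENSITY = 1.2         # kg/m^3, approximate at sea level and about 20 C
AIR_CP = 1005.0           # J/(kg*K), specific heat of air
CFM_TO_M3S = 0.000471947  # cubic feet per minute to cubic meters per second

def air_temp_rise_c(power_w, flow_cfm):
    """Bulk air temperature rise across a board: dT = P / (rho * cp * Q)."""
    return power_w / (AIR_DENSITY * AIR_CP * flow_cfm * CFM_TO_M3S)

# Hypothetical inputs, for illustration only.
board_power_w = {"cpu_blade": 200.0, "switch_blade": 120.0}
slot_flow_cfm = {"slot_1": 25.0, "slot_7": 35.0}
inlet_temps_c = [25.0, 40.0, 55.0]

cases = []
for (board, power), (slot, flow), inlet in product(
    board_power_w.items(), slot_flow_cfm.items(), inlet_temps_c
):
    exhaust = inlet + air_temp_rise_c(power, flow)
    cases.append((exhaust, board, slot, inlet))

# The tuple sorts on exhaust temperature first, so max() is the hottest case.
worst = max(cases)
print(f"worst case: {worst[1]} in {worst[2]}, "
      f"{worst[3]:.0f} C inlet, {worst[0]:.1f} C exhaust")
```

With these inputs the hottest case is, as expected, the highest-power board in the lowest-airflow slot at the highest inlet temperature. A real simulation adds what this estimate ignores: local hot spots, preheating by upstream parts, and interaction between neighboring boards.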

Figure 4 shows a simulated fan failure on a computing blade, where the temperatures of two CPUs are monitored. The simulation models the entire service procedure, from fan failure through to the fan tray replacement, including opening the chassis service door, which drives the temperatures to their highest levels.

Figure 4
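The shape of such a transient can be approximated with a single lumped thermal mass. The sketch below is not the simulation behind Figure 4; it integrates C · dT/dt = P − (T − T_ambient) / R with forward Euler, changing the thermal resistance R at each stage of the service procedure. All names and values (CPU power, heat capacity, resistances, durations) are hypothetical.

```python
def simulate_cpu_temp(phases, power_w, heat_capacity_j_per_k, ambient_c, dt_s=1.0):
    """Integrate C*dT/dt = P - (T - T_ambient)/R with forward Euler.

    phases: list of (duration_s, thermal_resistance_k_per_w) tuples.
    The run starts at the steady-state temperature of the first phase.
    Returns a list of (time_s, temp_c) samples.
    """
    temp = ambient_c + power_w * phases[0][1]
    t = 0.0
    trace = [(t, temp)]
    for duration_s, r_k_per_w in phases:
        for _ in range(int(duration_s / dt_s)):
            net_heat_w = power_w - (temp - ambient_c) / r_k_per_w
            temp += net_heat_w / heat_capacity_j_per_k * dt_s
            t += dt_s
            trace.append((t, temp))
    return trace

# Hypothetical service procedure; a higher R means worse cooling.
phases = [
    (60, 0.40),   # normal operation
    (120, 0.55),  # one fan failed
    (60, 0.80),   # service door open, fan tray removed
    (240, 0.40),  # replacement fan tray installed
]
trace = simulate_cpu_temp(phases, power_w=80.0,
                          heat_capacity_j_per_k=150.0, ambient_c=40.0)

peak_time, peak_temp = max(trace, key=lambda sample: sample[1])
print(f"start {trace[0][1]:.1f} C, peak {peak_temp:.1f} C at t={peak_time:.0f} s, "
      f"end {trace[-1][1]:.1f} C")
```

In this model the peak occurs at the end of the door-open phase, mirroring the behavior described above, and the temperature then decays back toward its normal steady state once cooling is restored.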

Generally, complicated component interactions make determining the absolute worst-case condition difficult. However, simulation tools usually enable developers to evaluate a wide range of system configurations and operating conditions more easily than through lab testing. With its board and chassis models, RadiSys can simulate most combinations of board sets, chassis, and operating environments.

Thermal testing lab

As developers transition from design simulation to prototype testing, the RadiSys onsite environmental lab is used for board-level and system-level testing. The photograph on the left shows three RadiSys chambers, including one that’s large enough to test a 16-slot system and manage the heat produced by a system running at maximum power. The chambers support full environmental testing at the shelf level over all temperature and humidity ranges.

System integration testing and plugfests: different animals

System integration testing, like thermal integration testing, is essential for ensuring component modularity and multivendor compatibility. System integration is more than simply standards-based compliance testing. System engineers must understand the entire system in order to identify interoperability and compatibility issues and coordinate resolution with partners.

How does system integration testing differ from plugfests? Plugfests primarily address basic interoperability from electrical and management perspectives, whereas integration testing covers more functional areas, such as security, resource provisioning, and other system behaviors like performance and throughput. Additionally, system integration testing must be capable of testing large configurations, systems too big for plugfests.

Developed over a number of years, RadiSys system integration capabilities are particularly applicable to standards-based architectures like AdvancedTCA. System engineers and experts at RadiSys have worked with AdvancedTCA since its infancy and contributed to the foundation specifications. RadiSys engineers come from different disciplines and have hundreds of years of combined experience. They can track down and fix unexpected interactions among hardware, drivers, operating systems, and middleware, even if this means coordinating the necessary vendors to find the underlying problem. This integration and interoperability experience is invaluable for identifying and resolving issues early in the design phase. The team is constantly improving test coverage, increasing test efficiency, and lowering cost. Their broad perspective enables them to view systems like a TEM.

Venkataraman Prasannan (VP) is the Senior Director responsible for the AdvancedTCA product line at RadiSys Corp. He has more than 20 years of telecom and networking experience in marketing, product, and business management. He has been involved in AdvancedTCA product planning and customer engagements since the inception of the industry standard.