Signal integrity performance optimization of the package interconnect structure with PCI Express Gen 3 interface

Signal integrity issues across the PCIe communication interface have become increasingly problematic as system crosstalk has risen with the data transfer rates of next-gen devices. However, as Jitesh Shah explains, first-pass performance evaluations and Gen 3 test devices can lead to crosstalk performance optimization.

The next generation of PCI Express (PCIe) devices transmits and receives data at 8 Gbps. These devices are intended to be pin-compatible with the previous generation of PCIe devices, whose SERDES interface switches at 5 Gbps. With this leap in data rate, signal integrity issues that were of minimal concern for prior generations have become a challenge for the next-generation PCIe device family. Most of these issues are attributed to the device pinout, specifically the locations of the Transmitter (TX) and Receiver (RX) pins.

This discussion explores design techniques implemented in the interconnect path from the chip to the PCB. These techniques mitigate the impact of a pinout constrained by pin compatibility and package size, and bring signal integrity performance on par with or better than that of the previous generation of devices.

PCIe communication interface: speeds and signal qualities

PCIe transmits and receives data serially, one bit at a time, over a communication channel. Legacy PCI interfaces, by comparison, use a parallel bus with multiple interconnects sending and receiving data simultaneously. A serial link may appear to transfer less data per clock cycle, but it can be clocked considerably faster than a parallel link and achieve an order of magnitude higher data transfer rate. A parallel bus is typically limited to data transfers in the 100s of MHz range, whereas a serial interface transfers data in the GHz region. For example, the typical bandwidth-per-pin ratio of PCI-X (64 bit, 133 MHz) to PCIe Gen 1 is 7.1 to 100. Factors allowing a serial interface to clock data faster include:

·    Clock skew between different interconnect channels is irrelevant.

·    Since a wide bus is not required, a serial interface occupies significantly less space, allowing room for better noise isolation from neighboring aggressor signals.

·    The serial interface employs differential signaling technology, which has an inherent advantage of better power supply and/or crosstalk noise rejection capability.


Although the move from parallel to serial initially reduced design challenges, a new crop of challenges has surfaced, and these are increasingly difficult to address as interface speeds rise from one generation to the next. One constraint in designing a device with a next-generation PCIe interface is pin compatibility with the previous generation of devices. One of the issues addressed in this discourse is the impact on signal integrity performance when speeds increase beyond 5 Gbps but the interconnect structure remains constant. Strategies to mitigate degradation in signal quality are also discussed. As shown in Figure 1, two transmit and two receive pairs are located adjacent to each other on the pinout without adequate power or ground pin shielding between them.

Figure 1: Inadequately shielded package pinout of a device with a PCIe Gen 3 interface results in signal integrity issues.


The vertical interconnect transition for the routing from the bump to the solder ball in the package is typically dictated by the pinout. In scenarios where the pinout has limitations such as those shown in Figure 1, the unshielded, unbounded vertical structures of adjacent pairs will couple electromagnetically to each other, causing excessive pair-to-pair crosstalk and associated signal integrity problems, including excessive jitter and EMI radiation.

Interconnect description and first-pass performance evaluation

The device on the PCIe interface is a flip chip and is in a Ball Grid Array (BGA) package. For a first-pass performance evaluation, a section of the package with three transmit differential pairs is extracted and modeled using an FEM-based 3D electromagnetic tool. The extracted model has sufficient bandwidth to simulate signals with edges as fast as the 10 ps range. Three pairs are extracted to enable victim-aggressor-type crosstalk simulations. This first-pass design is well optimized to meet the return loss specification of -15 dB at the fundamental frequency, and -10 dB at the first harmonic frequency of the data rate.
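The numbers in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it converts the 8 Gbps data rate to its fundamental frequency (half the bit rate for an alternating NRZ pattern), converts the return loss targets to linear reflection magnitudes, and estimates the bandwidth implied by a 10 ps edge using the common 0.35 / rise-time rule of thumb. The function names and the rule of thumb are assumptions introduced here; they do not come from the electromagnetic tool or from the PCIe specification.

```python
def fundamental_frequency_hz(data_rate_bps):
    """Fundamental of an alternating 1010... NRZ pattern: half the bit rate."""
    return data_rate_bps / 2.0


def reflection_magnitude(s11_db):
    """Convert a return loss quoted as S11 in dB (a negative number) to linear |S11|."""
    return 10.0 ** (s11_db / 20.0)


def bandwidth_from_edge_hz(rise_time_s):
    """Rule-of-thumb bandwidth for a given rise time (assumption: 0.35 / t_r)."""
    return 0.35 / rise_time_s


if __name__ == "__main__":
    f0 = fundamental_frequency_hz(8e9)
    print(f"Fundamental frequency: {f0 / 1e9:.1f} GHz")

    # Return loss targets quoted in the text.
    for spec_db in (-15.0, -10.0):
        gamma = reflection_magnitude(spec_db)
        print(f"{spec_db:.0f} dB -> |S11| = {gamma:.3f}, "
              f"reflected power = {gamma ** 2 * 100:.1f} %")

    # Fastest edge the extracted model is said to support.
    bw = bandwidth_from_edge_hz(10e-12)
    print(f"10 ps edge -> roughly {bw / 1e9:.0f} GHz of model bandwidth")
```

In these terms, a -15 dB return loss means about 3 percent of the incident power is reflected, and -10 dB means 10 percent.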

In an ideal situation, differential crosstalk is non-existent due to the common-mode noise rejection property of a differential system. In the real world, however, crosstalk will always exist due to uneven coupling between the aggressor and victim pairs. A typical crosstalk setup and the associated Near-End Crosstalk (NEXT) and Far-End Crosstalk (FEXT) measurement points are shown in Figure 2.
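The effect of uneven coupling can be illustrated with a toy model. In this sketch, each leg of the aggressor pair couples a fraction of its voltage onto each leg of the victim pair, and the receiver sees only the difference between the two victim legs. The four coupling coefficients are hypothetical values chosen for illustration; they are not results from the extracted package model.

```python
def differential_crosstalk_v(aggressor_diff_swing_v,
                             k_ap_vp, k_an_vp, k_ap_vn, k_an_vn):
    """Differential noise on a victim pair in a simple linear coupling model.

    k_ap_vp: coupling from aggressor+ to victim+ (dimensionless fraction)
    k_an_vp: coupling from aggressor- to victim+
    k_ap_vn: coupling from aggressor+ to victim-
    k_an_vn: coupling from aggressor- to victim-
    """
    # The aggressor legs swing in opposite directions by half the differential swing.
    v_ap = +aggressor_diff_swing_v / 2.0
    v_an = -aggressor_diff_swing_v / 2.0

    noise_vp = k_ap_vp * v_ap + k_an_vp * v_an  # noise on victim+
    noise_vn = k_ap_vn * v_ap + k_an_vn * v_an  # noise on victim-

    # A differential receiver only responds to the difference between the legs.
    return noise_vp - noise_vn


if __name__ == "__main__":
    # Perfectly balanced coupling: the noise cancels.
    print(differential_crosstalk_v(1.0, 0.05, 0.05, 0.05, 0.05))

    # Uneven coupling (hypothetical values): a residual differential noise remains.
    print(differential_crosstalk_v(1.0, 0.10, 0.04, 0.06, 0.03))
```

When all four coefficients are equal, the result is zero, which is the ideal case described above. Any imbalance leaves a residual that appears as differential crosstalk.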

Figure 2: The NEXT and FEXT measurement points in a differential system.


Packaging, board vias, and connectors are the major contributors to crosstalk in a system. The optimized design of the package with a PCIe interface should be within the NEXT and FEXT targets of 5 percent of the total aggressor swing.

For the evaluation, the top and bottom pairs of the extracted model are excited with a pseudo-PCIe Gen 3 driver while the center victim pair is held quiet. The NEXT and FEXT observed on the center victim pair exceed the allowed crosstalk by 80 to 100 percent, as shown in Figure 3.
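To put these figures in proportion, the sketch below checks a crosstalk measurement against the 5 percent target. Read literally, exceeding the target by 80 to 100 percent corresponds to crosstalk of roughly 9 to 10 percent of the aggressor swing. The 1.0 V aggressor swing and the noise amplitudes are hypothetical values chosen for illustration, not measurements from Figure 3, and the function names are introduced here.

```python
import math

CROSSTALK_TARGET = 0.05  # 5 percent of the total aggressor swing, per the text


def crosstalk_ratio(victim_noise_v, aggressor_swing_v):
    """Crosstalk as a fraction of the aggressor swing."""
    return victim_noise_v / aggressor_swing_v


def excess_over_target_percent(ratio, target=CROSSTALK_TARGET):
    """How far a crosstalk ratio is above (positive) or below (negative) the target."""
    return (ratio / target - 1.0) * 100.0


def ratio_to_db(ratio):
    """Express a voltage ratio in dB."""
    return 20.0 * math.log10(ratio)


if __name__ == "__main__":
    aggressor_swing_v = 1.0  # hypothetical aggressor swing
    for noise_v in (0.04, 0.09, 0.10):  # hypothetical victim noise amplitudes
        ratio = crosstalk_ratio(noise_v, aggressor_swing_v)
        excess = excess_over_target_percent(ratio)
        verdict = "within target" if ratio <= CROSSTALK_TARGET else "exceeds target"
        print(f"{ratio * 100:.1f} % of swing ({ratio_to_db(ratio):.1f} dB): "
              f"{excess:+.0f} % relative to target, {verdict}")
```

Expressed in decibels, the 5 percent target is about -26 dB relative to the aggressor swing.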

Figure 3: The first-pass interconnect layout and crosstalk performance.


Although it is assumed that the predominant source of crosstalk is the vertical transition region of the core via and solder ball, an attempt is made to rule out noise injected onto the victim pair from other regions of the package interconnect structure.

The center and bottom pairs are shielded from each other in the vertical transition region by the ground pins between them. To show that most of the noise is induced by the locations of the unshielded transmit pairs in the solder ball region, another simulation is performed with the top pair driven and the center and bottom pairs held quiet. If the crosstalk seen on the center pair is close to or equal to that seen in Figure 3, the biggest source of crosstalk is the pin location of the two transmit pairs. It also follows that the solder bump region and the transmission line region do not contribute significantly to the crosstalk induced on the center victim pair. As can be seen from Figure 4, both NEXT and FEXT magnitudes are nearly identical to those in Figure 3.

Figure 4: First-pass interconnect crosstalk performance with the bottom aggressor pair held quiet.


Optimizing interconnect structure to improve crosstalk

Since the biggest contribution to inter-pair crosstalk has been shown to come from the solder ball region, and the pinout cannot change because of backward compatibility, most of the optimization must be done around the vertical structure of the core vias in the package. The vias drilled through the core of the substrate are 800 μm long and are the longest unbounded coupled regions of the entire interconnect. In the optimized version, instead of the core vias of each differential pair being aligned with their respective solder balls, they are shifted away from each other, with ground return vias centered between them, as shown in Figure 5. The associated crosstalk performance improvement is also shown in Figure 5. The NEXT and FEXT targets, 5 percent of the total aggressor swing, are achieved with this simple modification.

Figure 5: The optimized interconnect core via layout and crosstalk performance.


The discontinuity through the core via and solder ball transition is predominantly capacitive, and the capacitance added by the extra return vias marginally degrades the transmit pair return loss performance. However, the return loss remains well within PCIe Gen 3 specifications and is not an issue.

Validating the simulation

The expected crosstalk performance of the device under consideration was validated against another test device that uses the same PCIe Gen 3 SERDES interface. This test product has been exhaustively validated in a Gen 3 environment with no signal integrity issues. To benchmark the signal integrity performance of the proposed package, a similar approach of model extraction and simulation is used to simulate the crosstalk performance of the test device. Figure 6 highlights the differential pairs extracted and the associated crosstalk performance of the quiet transmit and receive pairs.

Figure 6: Validating the optimized interconnect performance with another PCIe Gen 3 device.


There are a few things to consider about the results in Figure 6. This device is packaged with wire-bond interconnects between the chip and package. The pad ring is well optimized, with the right mix of return path connections interspersed with signals to minimize crosstalk in the wire-bond region, which is evident from the crosstalk performance seen on the RX pair. Since the RX wires are close to the aggressor TX wires, a pad ring with an inferior pad layout could exhibit significant crosstalk on RX, but in this case the crosstalk is well within PCIe Gen 3 specifications (less than 10 mV peak-to-peak for both NEXT and FEXT). The crosstalk on the quiet TX pair when the neighboring TX pair is switching is 3x that seen on RX. This once again shows the dominance of crosstalk due to the pinout. No optimization is done at the solder ball region in this package since the crosstalk is within specifications. The low pinout-induced crosstalk in the wire-bond package is attributed to its relatively smaller solder ball and via geometries, as compared to those of a typical build-up flip chip package such as the one shown in Figure 3.

Walking the walk despite the crosstalk

By correlating the crosstalk performance of the optimized package for the device with a Gen 3 interface to that of an existing Gen 3 test device, it can be shown that even with a constrained pinout, the optimization strategies used to isolate core vias deliver acceptable signal integrity performance.

With the continuous march toward higher bandwidth, the techniques discussed herein are critical for the successful implementation of the device in a system. IDT expends a significant amount of effort in getting the design right the first time by employing a variety of tools and expertise, as demonstrated in this study. Similar design and optimization techniques are continuously deployed as needed in a range of IDT devices such as PCIe- and SRIO-based switch and signal integrity products.

Jitesh Shah is a Principal Engineer within the Central Manufacturing Group at Integrated Device Technology. Jitesh has more than ten years of experience managing component and system-level design, signal and power integrity, thermo-mechanical and thermal management issues. Jitesh can be reached at