MicroTCA.4 continues culture of COTS signal processing at DESY
In this interview with engineers in the Machine Beam Controls Group at DESY, Holger Schlarb and Michael Fenner discuss the data acquisition requirements of the accelerator community, and how a tradition of using commercial off-the-shelf (COTS) signal processing solutions continues to improve uptime and maximize performance for some of the most complex machines on Earth.
SCHLARB: We are in charge of all of the fast feedback systems that control the electron beam in the DESY accelerators. That includes orbit stabilization and arrival time stabilization, RF control of the accelerating structures, timing distribution, RF synchronization, and optical synchronization down to the femtosecond level. Our part of the group mostly does the analog and digital hardware design for systems that preprocess the analog signals and then digitize them. We also do quite a bit of FPGA design, because of the high data rates and because the feedback system topologies are built into the FPGAs.
For the accelerator, high data rates are required to optimize the slow controls or to debug the machine, but most of the data is ultimately compressed very strongly and then thrown away. There is not really physics behind it; it is just to get the maximum out of the controls or to increase reliability. What we do is sample all of the analog data at rather high rates – typically 125 megasamples per second (MSPS), 200 MSPS, and sometimes 500 MSPS, at 14- to 16-bit resolution. This data is then compressed in FPGAs. So a first data reduction is done there, then the data is shipped over to the CPU, where a second data reduction is done, and what remains is what goes into the actual system. This data is kept for one to two weeks before being compressed further.
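To put these rates in perspective, the Python sketch below works out the raw per-channel data rate and illustrates a two-stage reduction chain. It is a minimal illustration under stated assumptions, not DESY's firmware or software: each sample is assumed to occupy a 16-bit word, both reduction stages are modelled as simple block averaging, and the decimation factors (100 in the "FPGA" stage, 10 in the "CPU" stage) are hypothetical.

```python
def raw_rate_bytes_per_s(sample_rate_hz: float, bits_per_sample: int) -> float:
    """Raw ADC output rate for one channel, in bytes per second."""
    return sample_rate_hz * bits_per_sample / 8


def block_average(samples: list[float], factor: int) -> list[float]:
    """Boxcar decimation: replace each block of `factor` samples by its mean.

    Trailing samples that do not fill a whole block are dropped.
    """
    n_blocks = len(samples) // factor
    return [sum(samples[i * factor:(i + 1) * factor]) / factor for i in range(n_blocks)]


if __name__ == "__main__":
    # Raw rates for the sampling rates quoted in the interview (16-bit words assumed).
    for rate in (125e6, 200e6, 500e6):
        mb_s = raw_rate_bytes_per_s(rate, 16) / 1e6
        print(f"{rate / 1e6:.0f} MSPS x 16 bit -> {mb_s:.0f} MB/s per channel")

    # Hypothetical two-stage reduction: factor 100 ("FPGA"), then factor 10 ("CPU").
    signal = [float(i % 10) for i in range(10_000)]
    stage1 = block_average(signal, 100)
    stage2 = block_average(stage1, 10)
    print(len(signal), len(stage1), len(stage2))  # 10000 100 10
```

At 125 MSPS a single 16-bit channel already produces 250 MB/s, which is why the first reduction has to happen in the FPGA before the data reaches the CPU.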
FENNER: We also have extremely high demands regarding analog performance, so we need an environment that supports very high-speed analog design – we’re talking about low noise, and we’re talking about space for the analog electronics. At the same time, on the digital side we need very high data throughput, so digital links with speeds up to 10 Gbps using PCI Express Gen3. In addition, we have high computing power requirements, so we use Intel Core i7 CPUs inside the system.
SCHLARB: What you need, specifically for analog performance, is very good clocks going into the analog-to-digital converters (ADCs); otherwise you ruin your data and have to clean it up locally. Also, everything is triggered, so the machine always runs synchronously: a trigger is sent out, and at that moment every station knows the electron beam is coming. The precision we need here is on the order of 10 picoseconds for many applications, and some high-end applications deal with clocks at the sub-100-femtosecond level. So basically we try to reach the limits of the ADCs themselves, which are on the order of 50-60 femtoseconds; that is very demanding and not easy to achieve.
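A common textbook way to see why clock quality matters is to compare the jitter-limited SNR of an ADC sampling a full-scale sine wave, SNR = -20·log10(2π·f_in·t_j), with the quantization-limited SNR of an ideal N-bit converter, 6.02·N + 1.76 dB. The Python sketch below evaluates both. The 100 MHz input frequency is a hypothetical example, since the interview does not specify input frequencies, and real converters add further noise sources on top of these two bounds.

```python
import math


def jitter_limited_snr_db(f_in_hz: float, jitter_rms_s: float) -> float:
    """SNR bound (dB) set only by RMS sampling-clock jitter, full-scale sine input."""
    return -20.0 * math.log10(2.0 * math.pi * f_in_hz * jitter_rms_s)


def ideal_adc_snr_db(bits: int) -> float:
    """Quantization-limited SNR (dB) of an ideal N-bit ADC, full-scale sine input."""
    return 6.02 * bits + 1.76


if __name__ == "__main__":
    f_in = 100e6  # hypothetical input frequency: 100 MHz
    for jitter in (1e-12, 100e-15, 50e-15):
        snr = jitter_limited_snr_db(f_in, jitter)
        print(f"jitter {jitter * 1e15:6.0f} fs -> SNR limit {snr:5.1f} dB")
    for bits in (14, 16):
        print(f"ideal {bits}-bit ADC -> {ideal_adc_snr_db(bits):5.1f} dB")
```

With these formulas, 50 fs of RMS jitter at 100 MHz caps the SNR at about 90 dB, comparable to the ideal 14- to 16-bit range (about 86 to 98 dB), whereas 1 ps of jitter would cap it at about 64 dB and waste much of the converter's resolution.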
What we’ve observed over the past few years is that the power of FPGAs has increased so dramatically that we can now do things we couldn’t have done 5 or 10 years ago. This allows much higher sampling rates for preprocessing, which gives us more precise data and more capability in controlling the machines. With the improvement in digital electronics, suddenly there are much higher data rates we have to cope with. The other thing that increases the data rate is that we are now able to use camera systems to monitor the beams and beam sizes, which was not feasible a few years ago. So the hunger for more bandwidth and more processing power comes from these developments, and it puts us in a better position when managing the controls. I would say that is a real advantage for rather little money. The European XFEL that we’re currently building has an investment cost of about 1 billion euros, and machine beam controls is maybe 5 percent of that. So the development and the entire volume of investment is 20-25 million euros, which is a rather small amount, but you can gain a lot if you increase reliability by just 5 percent. You make back the beam controls investment after just a few years of operation.
Given these factors, can you provide insight into DESY’s involvement in MicroTCA.4?
SCHLARB: In 2005 the decision was made that for the next collider we would move away from the VME architecture that is commonly used in our community, because the bus is parallel, it’s too slow, it’s very noisy, and it would not have met our reliability requirements. So this time we decided to look at the xTCA family for reliability reasons, and in 2007 the decision was made to develop with xTCA for the European XFEL accelerator.
We had two groups, one that looked at MicroTCA.0 (mTCA.0) and one that looked at AdvancedTCA (ATCA). The conclusion was that the ATCA form factor was too big because we have a lot of distributed systems – we have 200 racks in the XFEL distributed over 3 km, so about one every 10 meters. ATCA was not really suited for that because you would have to bring a lot of cables to one point, and mTCA.0 wasn’t suitable because there was not enough PCB area for the analog preprocessing electronics. At this point, the Stanford Linear Accelerator Center (SLAC) and DESY, along with many others, started an effort to adapt the xTCA family to the physics community. This became mTCA.4, which was driven by the need for a specification that provided all the benefits of a telecommunications architecture, such as reliability and high data throughput, while also incorporating the demands of the physics community. The outcome was a platform that is a little bigger and, in particular, has larger rear transition modules (RTMs), since the RTMs were very small at that time.
FENNER: A unique selling point of mTCA.4 is that it provides as much space for the analog part as for the digital part. That is not found in, for example, ATCA. In addition, we paid a lot of attention to getting clean power supplies from the market. Because of our high requirements on the analog side, our noise requirements for such a system are much stricter, which is why the power supplies that were good enough for digital mTCA.0 applications were not suitable for us. So we put a lot of effort into pushing the manufacturers to develop very clean, very low noise power supplies, and we supported the rack manufacturers in making systems that were low noise in terms of EMI.
What also differentiates xTCA from other standards is the full management of the boards: supplying power, removing power, supervising temperature, increasing fan speeds if something gets too hot, and hot-plug capability. All of this is new compared to older systems like VME. It is very important that all of this exists, because without it we would not have such high reliability in the system.
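The management functions listed above amount to a supervision loop running on the shelf-management side. The Python sketch below models one pass of such a loop under simplifying assumptions: the Board class, the temperature thresholds, the hysteresis band, and the fan-level scale are all hypothetical. Real MicroTCA systems implement this in the MicroTCA Carrier Hub (MCH) and the module management controllers over IPMI, not in application code like this.

```python
from dataclasses import dataclass


@dataclass
class Board:
    """Hypothetical model of a managed board in one shelf slot."""
    slot: int
    temperature_c: float
    powered: bool = True


def supervise(boards: list[Board], fan_level: int,
              warn_c: float = 60.0, critical_c: float = 85.0,
              hysteresis_c: float = 10.0, max_fan_level: int = 15) -> tuple[int, list[int]]:
    """One pass of a hypothetical thermal-management loop.

    - Any powered board at or above critical_c is powered down.
    - Fans step up one level if the hottest board powered at the start of
      this pass is at or above warn_c, step down one level only once it
      falls below warn_c - hysteresis_c, and otherwise hold.
    Returns the new fan level and the slots that were powered down.
    """
    powered = [b for b in boards if b.powered]
    hottest = max((b.temperature_c for b in powered), default=0.0)

    shut_down = []
    for b in powered:
        if b.temperature_c >= critical_c:
            b.powered = False  # remove payload power to protect the board
            shut_down.append(b.slot)

    if hottest >= warn_c:
        fan_level = min(fan_level + 1, max_fan_level)
    elif hottest < warn_c - hysteresis_c:
        fan_level = max(fan_level - 1, 0)
    return fan_level, shut_down


if __name__ == "__main__":
    shelf = [Board(1, 45.0), Board(2, 70.0), Board(3, 90.0)]
    level, off = supervise(shelf, fan_level=5)
    print(level, off)  # 6 [3]
```

The hysteresis band keeps the fans from oscillating when a board hovers near the warning threshold; changing the fan level by one step per pass is likewise an arbitrary choice for the sketch.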
SCHLARB: As I already mentioned, we have about 200 racks and 5,000 complex AdvancedMC (AMC) boards with a lot of components on them. The other thing is that these electronics are installed in the accelerator tunnel, so access is limited. We have access once a week for a few hours, so reliability and diagnostics are very important for us: if something breaks, we have to be able to fix it from outside the tunnel, or at least diagnose what is broken so that on the next maintenance day we can exchange that particular component and immediately continue running. For the controls group this is very important because only one or two dozen people have to maintain all of this. That is where standardization really matters, so that not everyone is using specialized electronics with different interfaces, and so on. Standardization is what makes it possible to run such a complex system with a very limited number of personnel.
Now, we are pretty happy. At the moment we can say that this standard meets the analog performance requirements, and we also have the reliability and digital performance from the telecommunications background. We are in large-scale production now, so we are ordering all of the boards and starting to install them into the facility.
What other improvements does mTCA.4 include?
SCHLARB: What we found for RF in the mTCA.4 specification as it is currently written is that we have to deal with very complex cable management, because we put a lot of RF cables into the system, and many of them were redundant. So in addition to the digital AMC backplane, we introduced a second backplane behind it: an analog backplane that transports only very high-quality analog signals. This is becoming part of the mTCA.4 standard, and it eases cable management because it reduces the number of cables you need. This is of general interest because the radar community, the lidar community, or anyone who deals with a lot of RF signals or has a lot of sensors to integrate runs into similar problems. The RF backplane reduces this cable management issue significantly, and there are also some additional modules for it. We aren’t limited in performance compared to proprietary, isolated RF modules, as we can transport RF signals with femtosecond precision over this RF backplane, which meets the requirements of the market.
A particular point of interest is also the separation between AMCs and RTMs. AMC boards are usually more complex digital boards with big FPGAs and modern ADCs, while RTMs are mainly for the analog electronics, which are cheaper and have much longer lifecycles because analog electronics don’t change quickly (it is normally about 10-20 years before they reach end of life). For the AMC boards the turnaround is much faster because of the FPGAs, CPUs, and DSPs they use, so it is more like two or three years until new ones reach the market. This separates the lifecycle management of your electronics, and it also separates the developer groups: one is more of a digital developer group, and the other is analog electronics people who have no clue about how MicroTCA works, but who know analog development. That’s a big plus for this standard.
Why are standards-based solutions so important for an organization like DESY?
SCHLARB: The open architecture is very important for us because we have to worry about vendor lock-in. We have to be very careful that we can always supply our machine with electronics, so we have to protect ourselves against companies that might make different strategic decisions. So having multiple vendors for our components is very important, because obviously we don’t want to build standard components ourselves. We are not building CPUs; there are highly specialized companies for that. There is also a pretty long tradition at DESY of supporting open standards.