Integrating flow processors to scale cybersecurity to 40 Gbps and beyond, part 2
In part 1 of this article, we discussed what flow processors are and how they differ from cache-based general-purpose CPUs and network processors. We also described a heterogeneous flow processing architecture that utilizes multiple proven techniques to increase overall system performance, including dynamic load balancing and stateful, per-flow traffic management and action processing.
In this second part, we will look deeper into what flow processing is and delve into descriptions of specific applications that require it. We will also discuss potential hardware implementations to realize the vision of flow processing.
What is a flow?
More network users, more applications, and a movement into big data and cloud computing are some of the fundamental forces driving the dramatic increase in network throughput that we are experiencing. This combination of factors ultimately results in more individual conversations, or flows, traversing our networks at any time. In fact, measurements have shown that a fully saturated 10 Gbps link may be carrying millions of flows simultaneously.
A flow is a unidirectional sequence of packets, all sharing a set of common packet-header values. As few as two packet header fields – the source and destination IP addresses – or as many as 40 varied Layer 2 through Layer 4 header values can identify and define a flow (Figure 1). Most network equipment employing ASICs or fixed-function network processors – including Ethernet switches and IP routers – is unable to process traffic on a per-flow basis; instead, it makes decisions based solely on information contained in datagram headers and typically supports only a few packet header fields for flow definition and analysis. These devices process traffic packet by packet, retaining no state information about previous packets after each forwarding decision. This stateless packet handling is adequate for core switching and routing devices, but fails to meet the requirements of flow-based cybersecurity, network analytics, and intelligent Software-Defined Networking (SDN) applications.
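The flow definition above can be sketched in a few lines. This is a minimal illustration using the classic 5-tuple; the field names and example addresses are hypothetical, and a real flow processor may key on many more Layer 2 through Layer 4 fields.

```python
from collections import namedtuple

# Illustrative 5-tuple flow key; real flow definitions may use up to
# ~40 L2-L4 header fields.
FlowKey = namedtuple("FlowKey", "src_ip dst_ip src_port dst_port protocol")

def flow_key(pkt: dict) -> FlowKey:
    """Extract a unidirectional flow key from a parsed packet header."""
    return FlowKey(pkt["src_ip"], pkt["dst_ip"],
                   pkt["src_port"], pkt["dst_port"], pkt["protocol"])

# Two packets of the same conversation map to the same flow key...
a = flow_key({"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2",
              "src_port": 49152, "dst_port": 443, "protocol": "tcp"})
b = flow_key({"src_ip": "10.0.0.1", "dst_ip": "10.0.0.2",
              "src_port": 49152, "dst_port": 443, "protocol": "tcp"})
assert a == b
# ...while the reverse direction is a distinct (unidirectional) flow.
rev = FlowKey("10.0.0.2", "10.0.0.1", 443, 49152, "tcp")
assert rev != a
```

Because the key is hashable, it can index the massive flow tables discussed later, which is exactly what stateless per-packet devices lack.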
Software-Defined Networking and OpenFlow
In taking a more detailed look at some of the applications that require flow processing, one must look no further than SDN. OpenFlow is a standards-based SDN technology that has the potential to disrupt the existing switching and routing landscape. OpenFlow is based on a switching device with an internal flow table and a standardized interface to a centralized controller to add and remove flow entries and apply actions. OpenFlow provides an open, programmable, virtualized switching platform where the underlying switching hardware is controlled via software that runs in an external, decoupled control plane.
OpenFlow changes how networks are fundamentally built by allowing network administrators or programmable software to define the paths that flows take through a network, regardless of the underlying network topology and the particular hardware over which traffic traverses. OpenFlow allows networks to be carved into slices, where a particular slice is allocated a flow-specific path through the infrastructure and may optionally allocate portions of the network resources across that path. In addition, the OpenFlow standards are progressing to allow services to be included on top of this flow-based forwarding plane (www.openflow.org).
A processing architecture for implementing OpenFlow uses specialized, flow-aware, programmable processors that support line rate flow forwarding, high flow setup rates, massive flow tables, flow identification based on any set of flow ID fields, and flexible action processing. In contrast to other architectures like Ethernet switches and configurable network processors (NPUs), flow processors are completely programmable by nature and offer a very high instruction rate that can be applied to incoming flows. Flow processors can also provide stateful flow-based forwarding/pinning, nested flow forwarding actions for millions of flows, and dynamic flow-based load balancing. Additional benefits include the ability to offload traffic classification, provide flow content analysis, perform protocol offloads, rewrite packets, splice connections, terminate protocols, and support symmetric and public key (PKI) cryptography operations.
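The flow-pinning and load-balancing behavior described above can be sketched as follows. This is a hypothetical model, not an OpenFlow implementation: a flow-table miss triggers a one-time worker assignment (analogous to flow setup), and every later packet of that flow stays pinned to the same worker on the fast path.

```python
import zlib

NUM_WORKERS = 4
flow_table = {}  # flow key -> assigned worker (the installed "action")

def assign_worker(key: tuple) -> int:
    # Stable hash so the same flow always lands on the same worker.
    return zlib.crc32(repr(key).encode()) % NUM_WORKERS

def dispatch(key: tuple) -> int:
    worker = flow_table.get(key)
    if worker is None:              # table miss: the flow setup path
        worker = assign_worker(key)
        flow_table[key] = worker    # install the flow entry
    return worker                   # table hit: line-rate fast path

k = ("10.0.0.1", "10.0.0.2", 49152, 443, "tcp")
assert dispatch(k) == dispatch(k)   # all packets of a flow stay pinned
```

In a real system the miss path would consult an external controller and the table would hold millions of entries in dedicated memory; the point here is only the hit/miss split and the per-flow stickiness.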
While SDN and OpenFlow are evolving technologies, the concept of flow processing is not a new development – it has been a mainstay in many network and cybersecurity applications for years. Considering the evolution of today's threat landscape, numerous cybersecurity applications would prove ineffective without flow-based, stateful processing of network traffic at line rates.
Intrusion Detection and Prevention
Cybersecurity applications like Intrusion Detection and Prevention Systems (IDS/IPS) rely on stateful processing of flows, considering that modern attack vectors use evasion techniques to avoid detection, such as spreading malicious traffic across packet boundaries, payloads, and even IP fragments. The popular open source IDS/IPS application Snort, for example, includes a preprocessing module that reassembles an entire Transmission Control Protocol (TCP) flow to run signature-based rules against the connection payload, rather than simply examining traffic on a per-packet basis.
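A short sketch shows why per-packet matching misses attacks split across segment boundaries and why stateful reassembly catches them. The signature and payloads are hypothetical, and the example assumes in-order segments for simplicity.

```python
# Illustrative signature and TCP segments (offset, payload); an attacker
# has split the suspicious string across two segments.
SIGNATURE = b"/etc/passwd"
segments = [(0, b"GET /cgi?file=/etc/"), (19, b"passwd HTTP/1.0\r\n")]

# Per-packet inspection: no single segment contains the signature.
assert not any(SIGNATURE in data for _, data in segments)

# Stateful reassembly: order segments by sequence offset, then scan the
# reconstructed stream, as a Snort-style preprocessor would.
stream = b"".join(data for _, data in sorted(segments))
assert SIGNATURE in stream
```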
Lawful Intercept and traffic management
Lawful Intercept and traffic management applications using Deep Packet Inspection must retain a per-flow state because reliable analysis often requires seeing across individual packet boundaries to identify protocols and applications. These applications may also use flow-based heuristics and behavioral analysis to reliably detect applications or protocols even if advanced obfuscation or encryption techniques are in use.
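The need to see across packet boundaries can be illustrated with a toy classifier that keeps per-flow state. The patterns below are illustrative stand-ins, not a real DPI signature set, and a production engine would combine many more signals with behavioral heuristics.

```python
# Hypothetical prefix patterns; real DPI uses far richer signatures.
PATTERNS = {b"\x16\x03": "tls", b"GET ": "http", b"SSH-": "ssh"}

flows = {}  # flow key -> buffered payload bytes (the per-flow state)

def classify(key, payload: bytes):
    buf = flows.setdefault(key, b"") + payload
    flows[key] = buf[:16]  # retain only what the heuristics need
    for prefix, proto in PATTERNS.items():
        if flows[key].startswith(prefix):
            return proto
    return None  # undecided: wait for more packets of this flow

k = ("10.0.0.1", "10.0.0.2", 52000, 22, "tcp")
assert classify(k, b"SS") is None      # first packet: not enough data
assert classify(k, b"H-2.0") == "ssh"  # cross-packet match succeeds
```

A stateless, per-packet classifier would have given up after the first two bytes; the buffered per-flow state is what makes the identification reliable.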
Network forensics and Data Loss Prevention (DLP)
Network forensics, Data Loss Prevention (DLP), and antivirus applications terminate flows at the TCP layer, parse the application protocol (such as HTTP, SMTP, and peer-to-peer), and potentially even reassemble file attachments to scan for threats and monitor for confidentiality breaches.
Stateful next-generation firewalls, or devices that integrate traditional firewall and IPS capabilities, are taking the flow processing concept even further by combining application and user identification, DPI, SSL inspection, and flow-based security processing with data plane Layer 2 switching, Layer 3 routing, network address and port translation, and VPN termination.
These applications would all be impossible without stateful flow-based processing, but differ from OpenFlow in that they don't use flow-based processing for forwarding, but rather for higher-layer security processing.
When integrating a flow processing architecture, designers have a variety of choices from the perspective of hardware implementation. At the most fundamental level, flow processors are available as silicon for custom hardware designs. This allows architects the utmost flexibility in bringing flow processing solutions to the market. For customers that are more focused on software applications and do not typically undertake hardware projects, flow processors are also available as accelerated plugin PCIe network interface cards and reference appliance platforms. This allows x86-based COTS servers to be used as a base platform for applications requiring flow processing in combination with x86 GPCPUs.
Many manufacturers have selected AdvancedTCA (ATCA) as their chosen architecture, as it provides an open, blade-based standard with an incredible ecosystem of hardware and software options. This openness and flexibility offers manufacturers a complete and varied set of potential building blocks upon which to develop applications.
ATCA allows the rapid addition of flow processing capabilities through ATCA-based flow processing blade designs in development throughout the ATCA ecosystem. ATCA also offers the ability to maintain pace with advances in x86 processing based on Moore's law and ever-increasing CPU core scaling, whereas a proprietary design would require a re-engineering stage every 18 months to two years to keep pace.
High-speed flow processor-to-GPCPU datapath
Regardless of the hardware platform chosen, utilizing multiple processor types in a combined architecture demands that the varied elements in the solution have an effective technique for sharing data. Due to the nature of the applications in question, each and every data packet may need to ultimately reach the GPCPU for processing. This demands a large data pipe connecting these processors, supporting up to 100 Gbps of throughput.
PCIe is the appropriate technology to tightly couple these discrete processors. The PCIe Gen 2 bit rate is 5 gigatransfers per second (GT/s), using an 8b/10b encoding scheme that provides an interconnect bandwidth of 4 Gbps per lane. By changing the encoding scheme to 128b/130b and introducing a technique called scrambling, PCIe Gen 3 supports a useful bandwidth of nearly 8 Gbps per lane with only approximately 1.5 percent encoding overhead. Considering that most PCIe devices offer multiple interfaces with 8 or 16 lanes per interface, either PCIe Gen 2 or Gen 3 products can offer the bandwidth needed to connect flow processors to GPCPUs.
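The figures above follow from a simple calculation: per-lane useful bandwidth is the transfer rate multiplied by the encoding efficiency, scaled by the lane count. A quick sketch of the arithmetic:

```python
def pcie_gbps(gt_per_s: float, payload_bits: int, total_bits: int,
              lanes: int = 1) -> float:
    """Useful bandwidth in Gbps: transfer rate x encoding efficiency x lanes."""
    return gt_per_s * payload_bits / total_bits * lanes

gen2_x8 = pcie_gbps(5.0, 8, 10, lanes=8)     # 8b/10b: 20% overhead
gen3_x8 = pcie_gbps(8.0, 128, 130, lanes=8)  # 128b/130b: ~1.5% overhead
assert gen2_x8 == 32.0            # 4 Gbps/lane x 8 lanes
assert round(gen3_x8, 1) == 63.0  # ~7.88 Gbps/lane x 8 lanes
```

Two Gen 3 x8 interfaces, or a single x16, thus comfortably cover the 100 Gbps aggregate the previous paragraph calls for when paired across multiple devices.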
Flow to 40 Gbps and beyond
Whether using proprietary hardware designs, PCIe cards in COTS servers, network appliances, or ATCA, the application dictates the overall computing workload that needs to be handled. Flow processing has been a stalwart technique for effective security applications for many years, and new and exciting technologies like OpenFlow are also purely based on flows rather than packet processing. An effective overall product architecture to scale these applications to 40 Gbps and beyond, with support for millions of simultaneous flows and high flow setup rates, combines flow processors and GPCPUs in a heterogeneous architecture. This has proven to be a practical way to offer high-performance data plane throughput, scalability, and reliability while retaining security effectiveness on a variety of hardware footprints.
Scrambling involves applying a known binary polynomial to a data stream; the data can be recovered on the other side of the interface by running it through a feedback topology using the inverse polynomial.
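The footnote's idea can be sketched with an additive scrambler: XOR the data with a pseudo-random sequence from a linear-feedback shift register (LFSR), and have the receiver run the same LFSR (same seed and polynomial) and XOR again to recover the data. The taps below are illustrative, not the actual PCIe Gen 3 scrambler polynomial.

```python
def lfsr_stream(seed: int, taps=(16, 14, 13, 11), nbits: int = 16):
    """Generate pseudo-random bytes from a toy Fibonacci LFSR."""
    state = seed
    while True:
        bit = 0
        for t in taps:
            bit ^= (state >> (nbits - t)) & 1
        state = ((state << 1) | bit) & ((1 << nbits) - 1)
        yield state & 0xFF  # one pseudo-random byte per step

def scramble(data: bytes, seed: int = 0xACE1) -> bytes:
    # XOR is its own inverse: applying the same stream twice recovers data.
    return bytes(b ^ r for b, r in zip(data, lfsr_stream(seed)))

msg = b"flow processing"
wire = scramble(msg)
assert wire != msg            # scrambled on the wire
assert scramble(wire) == msg  # same LFSR on the far side recovers it
```

The scrambling breaks up long runs of identical bits on the link, which is what lets PCIe Gen 3 drop 8b/10b encoding and its 20 percent overhead.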