Software framework for custom Network Intelligence applications
Deep Packet Inspection (DPI) technology is maturing and a variety of DPI software and system solutions are entering the market. These solutions are typically focused on identifying various widely used network applications such as e-mail protocols (SMTP, POP, IMAP), webmail services (Yahoo, Hotmail), or IM/chat and social media services (Facebook, Twitter). While these services are mainstream, there still exists a vast array of regional services, specialized applications, and custom usage models for DPI where these solutions fall short. So what about these specialized applications? Must developers create their own DPI implementations from the ground up in order to implement monitoring, data retention, or surveillance products for regional services? Online gaming monitoring? Special usage models?
In this issue, we’ll explore requirements behind specialized DPI applications and take a look at a recently announced custom plug-in software development kit for DPI that provides an example of support infrastructure for specialized DPI applications.
How DPI works
DPI is predicated on the ability to parse through the various layers of a packet in order to determine if it is “relevant” – relevance depends on the specific application. For example, in a surveillance application where the government has a warrant to obtain a person’s e-mail correspondence, the relevant packets are limited to POP3, IMAP, SMTP, or webmail packets. All other packets can be ignored. Or maybe an Internet gaming company is monitoring their system for bad or inappropriate language. In this case, only the chat packets within the context of the game server environment would be relevant.
Looking at the IP and TCP/UDP layers of an IP packet doesn’t provide a lot of information, and certainly doesn’t identify application-level services other than perhaps identifying the IP address of servers that might be running a particular network application or service. In order to adequately process application content, individual IP packets need to be reassembled into the original network service application message. For example, if an application needs to scan for the word “bomb” within the body of all e-mails on the network, the DPI software needs to capture and reassemble the IP packets, decode the POP3 or IMAP protocol, identify the user transaction being performed (such as log in, mail read, and mail send), gather the body of the e-mail in the message for mail reads and mail sends, then scan the mail body, subject line, and attachments for the desired words or phrases. In order to do this, TCP reassembly (and often higher layer decompression or analysis) is required.
Figure 1 shows a diagram of how TCP reassembly works. Each individual TCP segment carries a sequence number and a payload of known length. Using the sequence numbers, the TCP receiver (or a DPI probe observing the connection) can place each segment in the proper order to reconstruct the entire message riding on top.
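The reordering step can be illustrated with a minimal Python sketch. It is not the DeepProbe implementation; the class and method names are hypothetical, it handles one direction of one connection, and it ignores 32-bit sequence number wraparound, SYN/FIN accounting, and memory limits.

```python
from typing import Dict


class StreamReassembler:
    """Hypothetical helper: rebuilds one direction of a TCP byte stream."""

    def __init__(self, initial_seq: int) -> None:
        self.next_seq = initial_seq          # sequence number of the next expected byte
        self.pending: Dict[int, bytes] = {}  # out-of-order payloads keyed by sequence number
        self.stream = bytearray()            # contiguous bytes recovered so far

    def add_segment(self, seq: int, payload: bytes) -> None:
        if not payload or seq + len(payload) <= self.next_seq:
            return  # empty segment, or a retransmission of bytes already delivered
        existing = self.pending.get(seq)
        if existing is None or len(payload) > len(existing):
            self.pending[seq] = payload
        self._drain()

    def _drain(self) -> None:
        # Append buffered segments for as long as they are contiguous.
        while self.pending:
            seq = min(self.pending)
            if seq > self.next_seq:
                break  # gap: an earlier segment has not arrived yet
            data = self.pending.pop(seq)
            new_bytes = data[self.next_seq - seq:]  # trim any overlap with delivered bytes
            self.stream += new_bytes
            self.next_seq += len(new_bytes)


# Segments arrive out of order and one is retransmitted.
r = StreamReassembler(initial_seq=1000)
r.add_segment(1009, b"ld")
r.add_segment(1000, b"hello")
r.add_segment(1000, b"hello")
r.add_segment(1005, b" wor")
```

After the last call, `r.stream` holds the bytes in their original order. A production probe would also need per-flow timeouts and a cap on buffered data.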
Other things also “get in the way” of DPI applications. Services like webmail ride on top of the Hypertext Transfer Protocol (HTTP), which carries requests (such as GET or POST messages) and their responses. Within the HTTP headers lie cookies and other information about the HTTP connection. Further, large responses are often chunked for transfer and compressed to use less bandwidth. Therefore, before any deep inspection of the content can take place, the reassembled HTTP messages might need to be dechunked and decompressed as well. There is a further complication: a single logical client HTTP session may hop across various TCP ports (and even server addresses). So, when tracking a single HTTP session, the DPI software must be able to recognize that session in the midst of ever-changing IP four tuples (client IP address and port, server IP address and port).
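As a rough illustration of the dechunk and decompress steps, the following Python sketch decodes an HTTP/1.1 chunked body and then undoes gzip compression using the standard library. The function names and the lowercase-keyed `headers` dictionary are assumptions made for this example; trailers and encodings other than gzip are ignored.

```python
import gzip


def dechunk(body: bytes) -> bytes:
    """Decode an HTTP/1.1 'Transfer-Encoding: chunked' message body."""
    out = bytearray()
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        # The chunk-size line is hexadecimal and may carry extensions after ';'.
        size = int(body[pos:line_end].split(b";", 1)[0].strip(), 16)
        pos = line_end + 2
        if size == 0:
            break  # last chunk; any trailers are ignored in this sketch
        out += body[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return bytes(out)


def decode_body(headers: dict, body: bytes) -> bytes:
    """Return the plain payload of a fully reassembled HTTP message body."""
    if "chunked" in headers.get("transfer-encoding", "").lower():
        body = dechunk(body)
    if headers.get("content-encoding", "").lower() == "gzip":
        body = gzip.decompress(body)
    return body
```

Only after both steps does the body contain text that a scan for words or phrases can match.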
Architecture for DPI
IP Fabrics, Inc (www.ipfabrics.com) has recently announced a product called OpenIntelligence, available for their DeepProbe intelligent probe. Traditionally, the DeepProbe contains a set of modules that perform DPI (and something they call Deep Application Inspection) to track e-mail, webmail, IM/chat, and other social media services. The OpenIntelligence system consists of a software development kit that enables custom application processing to be written, loaded, and run on the DeepProbe. These custom application plug-ins make use of the existing DPI infrastructure within the DeepProbe, which enables faster and easier development of specialized monitoring, sensing, and data retention applications.
Before looking at the customizable environment, let’s look at the DeepProbe DPI product architecture, shown in Figure 2.
The overall DeepProbe DPI architecture consists of a configuration and provisioning block, packet processing block, Linux environment, statistics and alarms, and network processor (NPU) packet processing block.
The configuration and provisioning block stores information regarding relevant packet flow definitions. This might include what network services are being looked at, specific scan strings being looked for, or specific endpoints (such as e-mail addresses and usernames). The information in this block is typically persistent, and on reset or power cycle the configuration and provisioning information is re-established in the system.
The packet processing block contains a number of components labeled “SM” in Figure 2. These SMs, or Surveillance Modules, implement specific packet processing for a network application (such as webmail service, IM/chat service, and Twitter capture). The rest of the packet processing block implements the infrastructure mentioned at the beginning of the article – flow tables, TCP reassembly, HTTP dechunk/decompress, and so on.
Deep Packet Inspection requires a lot of horsepower. At 1 Gbps, a minimum-size packet (64 bytes, or 512 bits) must be completely processed within 0.512 microseconds for the DPI engine to maintain wire speed; at 10 Gbps, that budget shrinks to roughly 51 nanoseconds. As a result, multicore processors and/or network processors are used to provide stages of packet processing. Figure 2 shows a network processor block that performs some of the initial IP/TCP/UDP packet processing in order to reduce the overall traffic to an amount the DPI engine can manage. In DeepProbe’s case, the NPU also does a small amount of content string matching in order to identify HTTP GET/POST messages. This pre-filtering pays off: a significant portion of global Internet traffic is now video, so an NPU’s ability to keep that traffic from entering the DPI engine is a big win.
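The per-packet time budget follows directly from frame size and line rate. The short Python calculation below is an illustrative sketch; the function name is made up for this example. The optional `overhead_bytes` parameter lets the reader include the 20 bytes of Ethernet preamble, start-of-frame delimiter, and inter-frame gap, which the 64-byte figure leaves out.

```python
def packet_budget_ns(link_gbps: float, frame_bytes: int = 64, overhead_bytes: int = 0) -> float:
    """Time available per packet, in nanoseconds, to keep up with the line rate."""
    bits_on_wire = (frame_bytes + overhead_bytes) * 8
    # bits divided by (gigabits per second) gives nanoseconds
    return bits_on_wire / link_gbps
```

For a bare 64-byte frame, `packet_budget_ns(1)` gives 512 ns (0.512 microseconds) and `packet_budget_ns(10)` gives 51.2 ns. Counting the Ethernet overhead with `overhead_bytes=20` raises these to 672 ns and 67.2 ns, which is still a very small window for any per-packet work.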
The plug-in environment
Now that we’ve covered the overall DPI environment, let’s take a closer look under the hood at the OpenIntelligence custom plug-in capability. There are two components to the OpenIntelligence capability – the development environment and the runtime environment.
The runtime environment is shown in Figure 3. The figure illustrates all the services provided within the plug-in environment. The gold block labeled “CPI Library” is the component that’s filled in by the developer of the custom application. Each of the CPI Library components runs within the context of a single Linux thread (called the “CPI Thread” in the diagram).
When the system starts up, the CPI process (which runs within the packet processing block from Figure 2) launches a Linux thread for each CPI library in the system. The CPI thread links to its library, then calls the library entry point for initialization. Provisioning and configuration information is sent to the CPI library on power cycle/reset or when provisioning commands come in. The CPI library utilizes the “Load Signatures” service that sends IP address, port number, and content string signatures to the network processor. This provides the pre-filter information needed by the network processor to send relevant packets to the CPI library.
The CPI library has at its disposal a number of services within the DPI engine – among them flow table management, TCP, and HTTP protocol processing. The information comes to the plug-in as a completely reassembled (and in the case of HTTP, dechunked and decompressed) message. That simplifies the coding effort within the plug-in.
The meaning of “flow” can be deceiving. The common industry understanding of a “flow” is a five tuple: the source IP address and port number, destination IP address and port number, and the IP protocol. However, as described previously, if a logical webmail flow jumps among various client ports and even server IP addresses, there must be a mechanism in place to identify these higher layer logical flows as a single entity and map the single logical flow to the changing five tuple flows. IP Fabrics describes this as a key part of their Deep Application Inspection (DAPI) capability. The custom plug-in library can define these higher layer logical flows and take advantage of the DAPI infrastructure.
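One way to picture the mapping between a logical flow and its changing five tuples is a pair of lookup tables keyed by an application-level identifier. The Python sketch below is an illustration only and does not describe how DAPI is built. It assumes that some identifier, such as a session cookie extracted from the HTTP headers, is available to tie connections together, and all names in it are hypothetical.

```python
from typing import Dict, NamedTuple, Optional, Set


class FiveTuple(NamedTuple):
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str


class LogicalFlowTable:
    """Hypothetical table that groups many five-tuple flows under one session ID."""

    def __init__(self) -> None:
        self.by_session: Dict[str, Set[FiveTuple]] = {}
        self.by_tuple: Dict[FiveTuple, str] = {}

    def observe(self, flow: FiveTuple, session_id: str) -> None:
        # Called when a connection reveals its session identifier (assumed: a cookie).
        self.by_session.setdefault(session_id, set()).add(flow)
        self.by_tuple[flow] = session_id

    def session_for(self, flow: FiveTuple) -> Optional[str]:
        # Later packets on a known five tuple resolve directly to the logical flow.
        return self.by_tuple.get(flow)


table = LogicalFlowTable()
first = FiveTuple("10.0.0.5", 50112, "203.0.113.10", 80, "tcp")
second = FiveTuple("10.0.0.5", 50347, "203.0.113.22", 80, "tcp")  # new port, new server
table.observe(first, "cookie-abc123")
table.observe(second, "cookie-abc123")
```

Both connections now resolve to the same logical session even though the client port and server address differ.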
Flows being processed by the plug-in library typically consist of client-to-server commands and server-to-client responses. The CPI library can control whether the DPI/DAPI engine passes the flow packet by packet, as an entire command, or as an entire response. For example, for HTTP transactions, the application processing block typically wants to see the command packet by packet until it identifies the HTTP request being made. Once the request is identified, the information of interest usually arrives in the response. At that point, the CPI library can tell the environment to stop delivering command packets and deliver only the entire HTTP response message when it arrives. This makes processing considerably more efficient.
The OpenIntelligence product is essentially a software development kit that includes the source code and libraries needed to build, load, unload, and run plug-in libraries on the DeepProbe product. An example application, tutorial, and library documentation are also included. One of the examples is the MailDotCom service, so right out of the box the developer can get familiar with how the interface and processing work by getting a MailDotCom account, setting breakpoints in the MailDotCom library, and tracing through the processing and output messages delivered.
Application and inter-application tracking
Most DPI applications today focus on tracking a specific network application, but sometimes that doesn’t provide all the information necessary. Custom DPI plug-in capabilities like the new OpenIntelligence product enable creative and unique usage models for Deep Packet Inspection. For example, let’s investigate how a custom DPI framework could be used to follow not just a single application flow but interactions between multiple flows.
Let’s take an example like online transactions using eBay and PayPal. The general operational model here is that a buyer has a registered eBay and PayPal account, as does the seller. The seller posts a product on eBay, potential buyers bid, and the winner pays the seller via the PayPal accounts.
From the DPI perspective, the general event tracking for eBay might be account creation, account log in, account posting a sale, account auction/sale complete, account log out, and account close. The PayPal events might be something like account creation, account log in, payment received, payment sent, log out, and account close.
Now let’s look at a DPI use case involving potential online fraud. One method of online fraud involves an eBay seller selling something they don’t have, collecting the money via PayPal, and then closing their PayPal and eBay accounts. If a DPI application were tracking the PayPal and eBay applications separately, the PayPal or eBay events by themselves wouldn’t throw any red flags – they are all perfectly legal, valid transactions. Therefore, nothing happens and some number of weeks later when an angry buyer hasn’t received their product and the seller accounts have been closed, a phone call or complaint is registered and investigations begin. But a custom DPI application that was created with eBay/PayPal interaction rules, such as “notify if an eBay seller closes their PayPal account while they still have a pending eBay transaction,” would identify potential fraud in real time and increase the chances of preventing it. Doing this kind of application interaction tracking has the potential for powerful network intelligence.
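The interaction rule quoted above can be sketched as a small event correlator in Python. This is an illustration under strong assumptions: the event names are invented for the example, and the sketch assumes the same person has already been linked across the two services, which is a hard problem in its own right.

```python
from typing import Dict, List, Set


class FraudCorrelator:
    """Hypothetical rule: flag a PayPal account close while eBay sales are pending."""

    def __init__(self) -> None:
        self.pending_sales: Dict[str, Set[str]] = {}  # user -> item IDs still open
        self.alerts: List[str] = []

    def on_event(self, service: str, user: str, event: str, item: str = "") -> None:
        if service == "ebay" and event == "sale_posted":
            self.pending_sales.setdefault(user, set()).add(item)
        elif service == "ebay" and event == "sale_complete":
            self.pending_sales.get(user, set()).discard(item)
        elif service == "paypal" and event == "account_close":
            open_items = self.pending_sales.get(user)
            if open_items:
                self.alerts.append(
                    f"{user} closed PayPal account with {len(open_items)} pending eBay sale(s)"
                )


correlator = FraudCorrelator()
correlator.on_event("ebay", "seller42", "sale_posted", item="camera-001")
correlator.on_event("paypal", "seller42", "payment_received")
correlator.on_event("paypal", "seller42", "account_close")
```

None of the three events is suspicious alone; the alert arises only because state from one application is consulted when an event from the other arrives.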
Hopefully this look into the technical details of DPI has provided some insight into the infrastructure behind the technology. Custom frameworks built into Deep Packet Inspection solutions that take advantage of this infrastructure can help enable and accelerate the development of a wide range of Network Intelligence capabilities for telecom, law enforcement, and the enterprise.
For more information, contact Curt at email@example.com.