The Deep Packet Inspection spectrum: Evolution from technology development to plug-and-play products

Deep Packet Inspection (DPI) has been around for a while, and many companies across a myriad of industries build DPI capabilities into their products. For example, intrusion detection and virus scanning both rely on DPI to identify possible threats to an enterprise or its computers. Another trendy DPI application is personalized marketing, where analytics programs use DPI to monitor which websites you visit and the kind of content you are interested in. That information enables smarter ad placement on the sites you visit and in the content you download.

Software technologies tend to evolve in phases. First, companies develop software around the technology "in-house." Next, software vendors offer packages, development kits, and libraries so that companies that do not want to shoulder the development themselves can still incorporate the technology. Sometimes the technology spawns an open-source initiative, which tends to drive down the price of off-the-shelf solutions. Finally, the technology becomes interoperable with specific applications and platforms, completing its evolution from "do-it-yourself" to "plug-and-play."

Where does DPI sit on this "technology-to-product" spectrum?

DPI explained

In the general sense, DPI involves looking at traffic flows above the transport layer and inside application messages to gather information about who is running an application and what the application is doing. To understand the technical complexity behind DPI, you first need to think about all the layers involved. The fundamental building block of an Internet Protocol (IP)-based network is the IP packet. In practice, packet size is bounded by the maximum transmission unit (MTU) of the underlying link: typically 1,500 bytes on Ethernet, although jumbo frames can carry around 9,000 bytes per packet if the network supports them. The transport layer above IP turns these packets into a logical transport abstraction. For example, the User Datagram Protocol (UDP) carries discrete "datagrams" (a datagram larger than the MTU is fragmented across several IP packets), while the Transmission Control Protocol (TCP) provides a byte-stream abstraction in which the flow is treated as one continuous stream of bytes, reassembled in order from however many packets it took to carry it. Above TCP, the Hypertext Transfer Protocol (HTTP) represents the largest percentage of data on the Internet because it is the predominant protocol used between web browsers and websites. HTTP further divides a TCP flow into a series of request/response transactions between a client and server, and the message bodies in those transactions may be chunked and compressed.
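To make the byte-stream idea concrete, here is a minimal Python sketch of how DPI software might rebuild one direction of a TCP connection from segments that arrive out of order or are retransmitted. It is an illustration under simplifying assumptions (a single direction of a single connection, no 32-bit sequence-number wraparound, and an initial_seq equal to the sequence number of the first payload byte); the StreamReassembler class is hypothetical and not taken from any DPI product.

```python
from dataclasses import dataclass, field


@dataclass
class StreamReassembler:
    """Rebuild one direction of a TCP byte stream (hypothetical helper).

    Simplifications: no 32-bit sequence-number wraparound, and initial_seq is
    the sequence number of the first payload byte (the ISN + 1 after the SYN).
    """
    initial_seq: int
    next_seq: int = field(init=False)                        # next byte we expect
    pending: dict[int, bytes] = field(default_factory=dict)  # seq -> buffered payload
    data: bytearray = field(default_factory=bytearray)       # in-order bytes so far

    def __post_init__(self) -> None:
        self.next_seq = self.initial_seq

    def add_segment(self, seq: int, payload: bytes) -> None:
        """Buffer a segment, then deliver every byte that has become contiguous."""
        if payload:
            self.pending[seq] = payload
        for start in sorted(self.pending):
            if start > self.next_seq:
                break  # a hole remains before this segment; wait for more data
            chunk = self.pending.pop(start)
            end = start + len(chunk)
            if end > self.next_seq:
                # Append only bytes not yet delivered (handles partial overlap).
                self.data += chunk[self.next_seq - start:]
                self.next_seq = end
            # Otherwise the segment is a pure retransmission and is dropped.


r = StreamReassembler(initial_seq=1000)
r.add_segment(1005, b" world")  # arrives early: buffered, nothing delivered
r.add_segment(1000, b"hello")   # fills the hole: both segments delivered
r.add_segment(1000, b"hello")   # retransmission: ignored
print(bytes(r.data))            # b'hello world'
```

Keeping early segments in a dictionary keyed by sequence number keeps the sketch short; a high-rate implementation would more likely use a bounded, preallocated buffer per flow.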

To look into the application-layer data, DPI software must:

  1. Reassemble multiple IP packets (reordering out-of-order TCP segments and discarding retransmissions) to reconstruct the transport-layer data
  2. If the flow is HTTP, perform dechunking and decompression
  3. Correlate the multiple HTTP transactions into a logical client/server session
  4. Match signatures in the HTTP transactions to identify the web application, who the communicating endpoints are, and what the endpoints are doing
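Step 2 can be illustrated with a short Python sketch that undoes HTTP/1.1 chunked transfer coding and then gzip content coding. It assumes the message body has already been separated from the headers and that header names have been lower-cased; dechunk and decode_http_body are hypothetical names, and a production decoder would also need to handle trailers, other codings such as deflate, and bodies split across many packets.

```python
import gzip


def dechunk(body: bytes) -> bytes:
    """Decode an HTTP/1.1 'Transfer-Encoding: chunked' body.

    Each chunk is '<hex size>[;extensions]\\r\\n<data>\\r\\n'; a zero-size chunk
    ends the body. Trailer fields after the last chunk are ignored here.
    """
    out = bytearray()
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        size_field = body[pos:line_end].split(b";", 1)[0]  # drop chunk extensions
        size = int(size_field.strip(), 16)
        if size == 0:
            return bytes(out)
        start = line_end + 2
        out += body[start:start + size]
        pos = start + size + 2  # skip the chunk data and its trailing CRLF


def decode_http_body(body: bytes, headers: dict[str, str]) -> bytes:
    """Undo transfer coding first, then content coding (gzip only in this sketch)."""
    if headers.get("transfer-encoding", "").lower() == "chunked":
        body = dechunk(body)
    if headers.get("content-encoding", "").lower() == "gzip":
        body = gzip.decompress(body)
    return body


# Build a gzip-compressed body and send it as two chunks.
compressed = gzip.compress(b"<html>hello</html>")
half = len(compressed) // 2
wire = (
    f"{half:x}\r\n".encode() + compressed[:half] + b"\r\n"
    + f"{len(compressed) - half:x}\r\n".encode() + compressed[half:] + b"\r\n"
    + b"0\r\n\r\n"
)
headers = {"transfer-encoding": "chunked", "content-encoding": "gzip"}
print(decode_http_body(wire, headers))  # b'<html>hello</html>'
```

The order matters: chunking is a transfer coding applied on top of the compressed content, so it has to be removed before decompression.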

A classic example of DPI involves e-mail services, where it may be used for lawful intercept or network surveillance. There are three common e-mail protocols: the Post Office Protocol (POP) and the Internet Message Access Protocol (IMAP), used between mail clients and mail servers, and the Simple Mail Transfer Protocol (SMTP), used to submit mail and to relay it between servers. DPI applications in this environment attempt to recognize e-mail addresses within these protocols. That means the DPI software must not only piece together the POP/IMAP/SMTP messages but also understand each mail protocol well enough to extract the envelope commands and message headers that contain e-mail addresses. Each mail protocol has a standard layer 4 port (a "well-known" port number, such as 25 for SMTP, 110 for POP3, and 143 for IMAP), but DPI software cannot bank on this because mail servers can be configured to use other ports. That puts an additional burden on DPI software: it must identify the protocol from content signatures in the traffic on any transport port.
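Here is a minimal Python sketch of both ideas: identifying the mail protocol from the server's greeting instead of the port number, and pulling sender and recipient addresses out of SMTP envelope commands. The signature table and function names are hypothetical. The patterns rely on standard greetings (SMTP servers open with a "220" reply, POP3 with "+OK", IMAP with "* OK" or "* PREAUTH"); because FTP servers also greet with "220", the SMTP pattern additionally requires "SMTP" on the greeting line, a heuristic that will miss servers with unusual banners.

```python
import re
from typing import Optional

# Hypothetical signature table, matched against the server's first line.
GREETING_SIGNATURES = [
    ("smtp", re.compile(rb"^220[ -][^\r\n]*SMTP")),  # e.g. "220 mx.example.com ESMTP"
    ("pop3", re.compile(rb"^\+OK")),                 # POP3 positive greeting
    ("imap", re.compile(rb"^\* (?:OK|PREAUTH)")),    # IMAP untagged greeting
]

# SMTP envelope commands carry the sender and recipient addresses.
ENVELOPE_RE = re.compile(
    rb"^(MAIL FROM|RCPT TO):\s*<([^>]*)>", re.IGNORECASE | re.MULTILINE
)


def identify_mail_protocol(server_first_line: bytes) -> Optional[str]:
    """Return 'smtp', 'pop3', or 'imap' if the greeting matches, else None."""
    for name, pattern in GREETING_SIGNATURES:
        if pattern.match(server_first_line):
            return name
    return None


def smtp_envelope_addresses(client_stream: bytes) -> dict[str, list[str]]:
    """Collect MAIL FROM / RCPT TO addresses from the client side of an SMTP flow.

    Simplification: a real decoder tracks SMTP state and skips the message
    body after DATA, where a line could also start with "RCPT TO:".
    """
    result: dict[str, list[str]] = {"from": [], "to": []}
    for verb, address in ENVELOPE_RE.findall(client_stream):
        key = "from" if verb.upper() == b"MAIL FROM" else "to"
        result[key].append(address.decode("ascii", "replace"))
    return result


session = (
    b"EHLO client.example.org\r\n"
    b"MAIL FROM:<alice@example.org>\r\n"
    b"RCPT TO:<bob@example.com>\r\n"
    b"DATA\r\n"
)
print(identify_mail_protocol(b"220 mx.example.com ESMTP ready\r\n"))  # smtp
print(smtp_envelope_addresses(session))
# {'from': ['alice@example.org'], 'to': ['bob@example.com']}
```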

Once the e-mail protocols are decoded and the e-mail addresses are identified, DPI still needs to determine "what" the e-mail application is doing. Is the client reading e-mail? Sending e-mail? Does the e-mail contain attachments?
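A rough Python sketch of answering those questions: map client commands to coarse actions, and detect attachments by parsing the message with the standard library's email package. The command-to-action table and the function names are hypothetical and deliberately coarse (for example, an IMAP FETCH may retrieve only flags rather than a message body).

```python
from email import message_from_bytes, policy
from typing import Optional

# Hypothetical mapping from (protocol, client command) to a user-level action.
COMMAND_ACTIONS = {
    ("pop3", b"RETR"): "reading",
    ("pop3", b"DELE"): "deleting",
    ("imap", b"FETCH"): "reading",
    ("imap", b"APPEND"): "storing",
    ("smtp", b"DATA"): "sending",
}


def client_action(protocol: str, command_line: bytes) -> Optional[str]:
    """Map one client command line to an action, or None if it is not tracked."""
    words = command_line.split()
    if protocol == "imap":
        words = words[1:]  # IMAP commands start with a client-chosen tag, e.g. b"a001"
        if words and words[0].upper() == b"UID":
            words = words[1:]  # treat "UID FETCH" like FETCH
    if not words:
        return None
    return COMMAND_ACTIONS.get((protocol, words[0].upper()))


def has_attachment(raw_message: bytes) -> bool:
    """True if any MIME part declares 'Content-Disposition: attachment'."""
    msg = message_from_bytes(raw_message, policy=policy.default)
    return any(part.get_content_disposition() == "attachment" for part in msg.walk())


raw = (
    b"From: alice@example.org\r\n"
    b"To: bob@example.com\r\n"
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/mixed; boundary="XYZ"\r\n'
    b"\r\n"
    b"--XYZ\r\n"
    b"Content-Type: text/plain\r\n"
    b"\r\n"
    b"See attached.\r\n"
    b"--XYZ\r\n"
    b"Content-Type: application/pdf\r\n"
    b'Content-Disposition: attachment; filename="report.pdf"\r\n'
    b"\r\n"
    b"JVBERi0=\r\n"
    b"--XYZ--\r\n"
)
print(client_action("imap", b"a001 FETCH 1 BODY[]"))  # reading
print(has_attachment(raw))                            # True
```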

DPI is not easy: it involves classifying packet streams, understanding common network transports, reverse engineering application communications, and extracting the information relevant to the application.
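The first of those abilities, classifying packet streams, usually starts with a flow table. The Python sketch below, an illustration under stated assumptions rather than any particular library's API, keys each flow on a direction-independent 5-tuple so that both sides of a conversation land in the same entry, then labels the flow from the first payload that matches a signature. The two-entry signature list (HTTP request or response prefixes, and a TLS handshake record header) is hypothetical and far smaller than any real product's.

```python
from typing import Callable, NamedTuple


class FlowKey(NamedTuple):
    proto: int    # IP protocol number: 6 = TCP, 17 = UDP
    a_ip: str
    a_port: int
    b_ip: str
    b_port: int


def flow_key(proto: int, src_ip: str, src_port: int, dst_ip: str, dst_port: int) -> FlowKey:
    """Sort the two endpoints so both directions of a conversation share one key."""
    (a_ip, a_port), (b_ip, b_port) = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    return FlowKey(proto, a_ip, a_port, b_ip, b_port)


# Hypothetical, deliberately tiny signature list checked against payload bytes.
SIGNATURES: list[tuple[str, Callable[[bytes], bool]]] = [
    ("http", lambda p: p.startswith((b"GET ", b"POST ", b"HTTP/1."))),
    ("tls", lambda p: len(p) >= 3 and p[0] == 0x16 and p[1] == 0x03),  # handshake record
]


class FlowTable:
    def __init__(self) -> None:
        self.labels: dict[FlowKey, str] = {}

    def observe(self, proto: int, src_ip: str, src_port: int,
                dst_ip: str, dst_port: int, payload: bytes) -> str:
        """Classify a packet's flow, keeping the first label that matched."""
        key = flow_key(proto, src_ip, src_port, dst_ip, dst_port)
        if self.labels.get(key, "unknown") != "unknown":
            return self.labels[key]  # already classified: skip signature checks
        label = next((name for name, test in SIGNATURES if payload and test(payload)), "unknown")
        self.labels[key] = label
        return label


table = FlowTable()
first = table.observe(6, "192.0.2.10", 51000, "198.51.100.7", 80, b"GET / HTTP/1.1\r\n")
second = table.observe(6, "198.51.100.7", 80, "192.0.2.10", 51000, b"")  # reply direction
print(first, second, len(table.labels))  # http http 1
```

Note that the HTTP request arrived on port 80 here, but classification used only the payload bytes, which is what lets this approach work on nonstandard ports.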

From DPI "do-it-yourselfers" to new players

At one end of the spectrum are the "do-it-yourselfers": Cisco Systems, Juniper Networks, and a number of router, intrusion detection, and intelligent firewall companies have all created their own classification and DPI software for their more advanced equipment. Other networking players, such as Procera Networks, Sandvine, and Allot Communications, focused on DPI-enabled systems for telecommunications and enterprise applications.

Over the past 10 years, new DPI entrants have popped up around the world on the premise that DPI was going to be a big industry. Companies like ipoque, Qosmos, and the recently acquired Vineyard Networks all had or have packaged DPI software solutions of this kind, and some also released appliances with DPI built in. IP Fabrics, founded in 2002 to make network processors easier to program for DPI applications, has since evolved from DPI programming languages for network processors to more fully featured DPI software and appliances.

Last year, Wind River strongly promoted its Application Acceleration Engine, which provides a robust software environment that includes Internet Protocol Security (IPsec), DPI, pattern matching, and flow classification. Software from companies like Qosmos and Sensory Networks fits within this framework to provide a more complete, interoperable, multi-vendor solution.

Sensory Networks is a new company whose software pattern-matching engine, HyperScan, offers low-latency, low-overhead, massively parallel matching to accelerate DPI applications.
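HyperScan's internal algorithms are not described here, so as a generic illustration of why multi-pattern matching matters to DPI, the Python sketch below implements the classic Aho-Corasick automaton. This is a swapped-in textbook technique, not HyperScan's method. Its key property is that a single pass over the payload reports every occurrence of every pattern, however many patterns are loaded.

```python
from collections import deque


class AhoCorasick:
    """Classic Aho-Corasick automaton over bytes (patterns must be non-empty)."""

    def __init__(self, patterns: list[bytes]) -> None:
        self.goto: list[dict[int, int]] = [{}]  # state -> {byte: next state}
        self.fail: list[int] = [0]              # failure link for each state
        self.out: list[list[bytes]] = [[]]      # patterns that end at each state
        for pat in patterns:
            self._insert(pat)
        self._build_failure_links()

    def _insert(self, pat: bytes) -> None:
        state = 0
        for byte in pat:
            nxt = self.goto[state].get(byte)
            if nxt is None:
                nxt = len(self.goto)
                self.goto[state][byte] = nxt
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
            state = nxt
        self.out[state].append(pat)

    def _build_failure_links(self) -> None:
        # Breadth-first, so a state's failure target is finished before its children.
        queue = deque(self.goto[0].values())  # depth-1 states fail to the root
        while queue:
            state = queue.popleft()
            for byte, child in self.goto[state].items():
                queue.append(child)
                f = self.fail[state]
                while f and byte not in self.goto[f]:
                    f = self.fail[f]
                self.fail[child] = self.goto[f].get(byte, 0)
                # Inherit matches that end at the failure target (suffix patterns).
                self.out[child] += self.out[self.fail[child]]

    def search(self, data: bytes) -> list[tuple[int, bytes]]:
        """Return (start offset, pattern) for every match, in one pass over data."""
        matches = []
        state = 0
        for i, byte in enumerate(data):
            while state and byte not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(byte, 0)
            for pat in self.out[state]:
                matches.append((i - len(pat) + 1, pat))
        return matches


ac = AhoCorasick([b"he", b"she", b"his", b"hers"])
print(ac.search(b"ushers"))  # [(1, b'she'), (2, b'he'), (2, b'hers')]
```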

What about open source?

Yes, Virginia, there is an open-source initiative. It originally went by the name OpenDPI. That initiative was eventually made private, but its codebase evolved into the nDPI project. nDPI is primarily a flow classification library that can identify signatures for over 160 different network services or applications. The open-source codebase provides good insight into how flows are recognized and classified, but taking the package and integrating it into a product still takes a significant amount of effort.

DPI and packet processors

DPI takes a significant amount of processing power to string together packets, identify flows, and perform dechunking and decompression. For that reason, DPI applications often make use of packet processors, which take some of the "heavy lifting" away from DPI software by identifying the offsets within a packet where the IP header, TCP header, HTTP message, and content start. Packet processors may also perform some identifier matching to help with classification or application endpoint identification.
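For a sense of what that offset-finding work involves, here is a Python sketch that does it in software for the simplest case. It assumes an untagged Ethernet II frame carrying an unfragmented IPv4 packet with a TCP segment, and returns None for anything else (VLAN tags, IPv6, UDP, fragments). The find_offsets function and Offsets type are hypothetical names; only the header field positions come from the IPv4 and TCP header formats.

```python
import struct
from typing import NamedTuple, Optional


class Offsets(NamedTuple):
    ip: int       # start of the IPv4 header
    l4: int       # start of the TCP header
    payload: int  # start of the TCP payload (e.g., the first byte of an HTTP request)


def find_offsets(frame: bytes) -> Optional[Offsets]:
    """Locate IPv4, TCP, and payload offsets in an untagged Ethernet II frame."""
    if len(frame) < 14:
        return None
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != 0x0800:                   # not IPv4
        return None
    ip = 14
    if len(frame) < ip + 20 or frame[ip] >> 4 != 4:
        return None
    ihl = (frame[ip] & 0x0F) * 4              # IPv4 header length in bytes
    (flags_frag,) = struct.unpack_from("!H", frame, ip + 6)
    if flags_frag & 0x3FFF:                   # MF flag set or nonzero fragment offset
        return None
    if frame[ip + 9] != 6:                    # protocol 6 = TCP
        return None
    l4 = ip + ihl
    if len(frame) < l4 + 20:
        return None
    data_offset = (frame[l4 + 12] >> 4) * 4   # TCP header length in bytes
    return Offsets(ip, l4, l4 + data_offset)


# Test frame; addresses, checksums, and lengths are left zero for brevity.
eth = b"\x00" * 12 + b"\x08\x00"
ipv4 = bytes([0x45, 0, 0, 0, 0, 0, 0x40, 0, 64, 6]) + b"\x00" * 10  # IHL 5, DF, TCP
tcp = b"\x00" * 12 + bytes([0x50]) + b"\x00" * 7                  # data offset 5
frame = eth + ipv4 + tcp + b"GET / HTTP/1.1\r\n"
off = find_offsets(frame)
print(off)                                 # Offsets(ip=14, l4=34, payload=54)
print(frame[off.payload:off.payload + 3])  # b'GET'
```

Both header lengths are variable (IPv4 and TCP options), which is why the payload offset has to be computed per packet rather than assumed.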

Cavium has developed DPI software called TurboDPI for its OCTEON packet processors. The package runs on the OCTEON processor itself and performs much of the heavy lifting involved in DPI.

Where is DPI on the spectrum today?

It is clear that there are revenue-generating "killer apps" for DPI, and many companies have developed DPI variants to incorporate into their products. Along the way, software companies have developed libraries and packages to incorporate into DPI applications, and even packet processor companies are providing supporting software. The nDPI initiative shows that the open source community is also involved.

However, there are a couple of significant barriers to bringing DPI software from where it is now to truly "plug-and-play." The first is the lack of standardized software interfaces and reference diagrams. Without a standardized framework, it is very difficult to provide plug-and-play operation for software systems, and there are a lot of moving parts to standardize: classification interfaces, the application decode interface, and software management are just a few.

The second barrier involves the sheer volume of network applications and the rate at which they change. There are literally tens of thousands of network applications in use, with more being developed by the hour. Their importance rises and falls, so whether your product supports the applications that matter at any given moment is hit-and-miss. And because these protocols change (some of them frequently), keeping the decoders up to date takes a lot of effort: any change by the network application provider can break the DPI software for that application.

Still searching for plug-and-play DPI

DPI has certainly evolved, and if you are keeping score, you can check off almost all the milestones along the way. But due to the multitude of applications, varying requirements by industry, and the complexity of maintaining the code, the plug-and-play milestone may be hard to reach anytime soon.
