Architecture, modeling, coordination, and synchronization of multicore applications

A recent Venture Development Corporation (VDC) study cites that the commercial market for components, tools, and related services for multicore reached $286 million in 2010, an increase of more than 36 percent over 2009. Further, the percentage of design teams incorporating multicore processors has more than doubled from 2008 (6.3 percent) to the end of 2011 (16.1 percent).

According to the same VDC study, two of the primary issues impeding the adoption of multicore are:

  1. The software rework required to transition to multicore
  2. A lack of familiarity with parallel programming techniques

In this month’s column, we’re going to take a closer look at architecture options for multicore, then dig a bit deeper to address these two main obstacles to multicore adoption.

Multicore architectures

The first step is to figure out the most effective way to utilize multicore for your application. There are two common software paradigms here – Symmetric Multiprocessing (SMP) and Asymmetric Multiprocessing (AMP).

One approach would be to run the same software on all cores. This approach will (in theory) scale performance of the system as more cores are added. This approach lends itself to SMP. Operating systems support this by providing an “SMP version,” which virtualizes the multicore processor. The operating system then takes care of executing the software across those cores to maximize the performance of the system and utilization of each core.

Another approach is to separate your system into functional components, then assign each of those functional components to a core (or group of cores). This is the approach used in AMP. For example, say your application breaks down into a graphical display interface, a user input interface, and a real-time processing component. You could dedicate one core to running the graphical display, one core to running the user interface, and one core to performing real-time processing. The benefit of this approach is typically cost reduction and miniaturization of the system. Traditional systems may employ a couple of processors and perhaps an ASIC or FPGA for all the functional components of the system. If you can eliminate these components and run these functions on top of a multicore processor that partitions its cores, then you remove component cost and reduce the footprint of the platform.

There are also hybrid approaches where you group multiple cores together and run those cores in an SMP environment, but separate out other cores for dedicated processing.

In general, if you’re trying to miniaturize your system, lower component count, or you require software isolation between certain functional components, you are probably interested in AMP. If you have a lot of small software algorithms that don’t have a lot of dependencies among each other and you can benefit by breaking up these algorithms into threads and applying more independent compute engines, then you’re probably interested in SMP.

Multicore software modeling

Software design, analysis, and modeling may be the most challenging part of developing a multicore application, because you have to “think parallel.” This means viewing your software as a collection of processing code blocks. Some of these blocks may have dependencies on other blocks, and some may be able to run without any dependencies. The key to “thinking parallel” is to visualize which code blocks have dependencies and which do not, then implement the threads of execution so that blocks without dependencies run as independent threads, while blocks that depend on one another are grouped into a single thread of execution. Of course, these are general guidelines, and there may be good design reasons to implement some dependent blocks independently.

The following description and examples apply primarily to SMP. Figures 1-3 attempt to illustrate the “thinking parallel” concept. Figure 1 shows a program that consists of logical code blocks A through E. We’re going to apply these code blocks and various dependency examples to illustrate the performance impact using a four-core multicore processor.

Figure 1: An example of a program consisting of five blocks of executable code.

Figure 2 shows what happens if the execution times of each code block are equal and E depends on D depends on C depends on B depends on A. You’ll notice – due to the dependencies – no speed-up is achieved. In fact, it’s possible that due to the overhead of communications and data passing between cores, the performance actually goes down and you end up with multiple cores that are significantly under-utilized!

Figure 2: This example illustrates the amount of time needed for a four-core multicore processor to execute code when each code block is dependent on the preceding code block.

Figure 3 shows what happens if B depends on A and D depends on C, and E has no dependencies. Applying this to a four-core processor can reduce the execution time to roughly the longest execution time of the A/B chain, the C/D chain, or E. If the execution time of each block is equal and takes time “t,” we’ve reduced the total execution time from 5t to 2t on the multicore system.

Figure 3: Fewer or no code block dependencies reduce execution time dramatically over code blocks with multiple dependencies.
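The structure behind Figure 3 can be sketched with ordinary threads. The following is a minimal illustration (in Python, with hypothetical `block_*` functions standing in for real work): the A→B and C→D chains each stay within one thread because of their dependencies, while E runs independently.

```python
# Sketch of Figure 3: two dependency chains (A->B, C->D) and an
# independent block E, each mapped to its own thread of execution.
# The block_* functions are hypothetical stand-ins for real work.
from concurrent.futures import ThreadPoolExecutor

def block_a():
    return "A"

def block_b(a):
    return a + "B"   # B depends on A's result

def block_c():
    return "C"

def block_d(c):
    return c + "D"   # D depends on C's result

def block_e():
    return "E"       # E has no dependencies

def chain_ab():
    return block_b(block_a())   # dependent blocks grouped in one thread

def chain_cd():
    return block_d(block_c())

def run_parallel():
    # Three independent threads of execution; on a four-core SMP
    # system the OS is free to schedule each on a separate core.
    with ThreadPoolExecutor(max_workers=3) as pool:
        f_ab = pool.submit(chain_ab)
        f_cd = pool.submit(chain_cd)
        f_e = pool.submit(block_e)
        return f_ab.result(), f_cd.result(), f_e.result()
```

The point here is the structure, not the timing: in CPython specifically, true CPU parallelism would require processes rather than threads, but the same decomposition applies in any SMP environment. If each block takes time t, the two-block chains dominate at 2t versus 5t for a sequential run.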

Of course, these illustrations are simplified. Not only might there be dependencies, but there may also be shared resources to consider. Shared resources within a multicore software environment typically need to be made thread-safe through the use of locks, and passing information between threads in a multicore system may be more complex if you aren’t using an SMP operating system, which can hide a lot of these complexities.
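To make the shared-resource point concrete, here is a small sketch using Python’s threading module (the counter stands in for any shared resource): without the lock, concurrent read-modify-write sequences can lose updates; with it, access is serialized and the result is deterministic.

```python
# Sketch: a lock making a shared resource thread-safe.
# The counter stands in for any resource touched by multiple threads.
import threading

counter = 0
counter_lock = threading.Lock()

def worker(increments):
    global counter
    for _ in range(increments):
        # The lock makes the read-modify-write atomic; without it,
        # two threads could read the same value and lose an update.
        with counter_lock:
            counter += 1

def run(n_threads=4, increments=10_000):
    global counter
    counter = 0
    threads = [threading.Thread(target=worker, args=(increments,))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

With four threads each performing 10,000 increments, the locked version always returns 40,000, whereas an unlocked version may silently return less.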

There are some tools on the market that can help you with modeling performance of your software on multicore before you spend the effort reorganizing. Prism, from CriticalBlue, can perform analysis of source code, identify where synchronization points and dependencies are, and guide your development to provide some analysis/benchmarking information to make sure you meet performance expectations once you’ve moved to multicore.

Companies like Wind River and MontaVista Software also have a lot of experience with SMP operating systems, approaches, and development tools. Access to those kinds of resources can be very helpful in accelerating the multicore learning curve.

Coordination and synchronization of multicore applications

Once modeling is complete and dependencies have been identified (and eliminated where possible), the next level of issues arises: implementing communication between dependent code blocks that may run on different cores. The dependencies can vary – perhaps it’s a resource shared among multiple code blocks, like an I/O device, or a dynamic library call that is not thread safe. In some cases, the driver or library can be made to run within a multicore environment by doing its own locking and synchronization so that it’s transparent to the threads of execution. This is typically the preferred method when it comes to sharing resources.
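One way to picture a driver doing its own locking is a thin wrapper that serializes access internally, so calling threads need no coordination of their own. This is only a sketch; `SerialPort` here is a hypothetical non-thread-safe device, not a real API.

```python
# Sketch: a driver that does its own locking, so synchronization is
# transparent to the threads that use it. SerialPort is hypothetical.
import threading

class SerialPort:
    """Hypothetical non-thread-safe device: writes land in a buffer."""
    def __init__(self):
        self.buffer = []

    def write(self, data):
        self.buffer.append(data)

class ThreadSafeSerialPort:
    """Driver-style wrapper: locks internally, so callers running on
    any core can call write() without holding their own locks."""
    def __init__(self, port):
        self._port = port
        self._lock = threading.Lock()

    def write(self, data):
        with self._lock:   # synchronization hidden inside the driver
            self._port.write(data)

def hammer(driver, n_threads=4, writes=100):
    threads = [threading.Thread(
                   target=lambda: [driver.write("x") for _ in range(writes)])
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Because the lock lives inside the wrapper, the code blocks sharing the device stay unchanged when they move from a single-core to a multicore deployment.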

Data passing is a different animal. If you’re running within an SMP operating system environment, the operating system typically has the same mechanisms available to the threads as they would normally use for data passing – memory queues, pipes, shared memory, and so on. But if you’re running without a virtualized software layer, care must be taken so that no matter which core the code blocks doing the data passing are running on, a shared memory path is available for reading and writing in addition to the synchronization method.
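Under an SMP operating system, those familiar mechanisms look much like a conventional producer/consumer arrangement. A minimal sketch using a thread-safe queue (the doubling step is just a stand-in for the dependent block’s work):

```python
# Sketch: data passing between dependent code blocks via a
# thread-safe queue, as an SMP operating system would provide.
import queue
import threading

def producer(q, items):
    for item in items:
        q.put(item)        # the queue handles locking internally
    q.put(None)            # sentinel: no more data

def consumer(q, results):
    while True:
        item = q.get()     # blocks until data is available
        if item is None:
            break
        results.append(item * 2)   # stand-in for the dependent block's work

def run_pipeline(items):
    q = queue.Queue()
    results = []
    p = threading.Thread(target=producer, args=(q, items))
    c = threading.Thread(target=consumer, args=(q, results))
    p.start(); c.start()
    p.join(); c.join()
    return results
```

Without an SMP operating system underneath, the queue itself would have to be built on a memory region reachable from every core involved, plus an explicit synchronization primitive – which is exactly the complexity the virtualized software layer hides.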

Software trends

Software support for embedded multicore systems is also picking up steam. I mentioned CriticalBlue, a company whose software offerings are targeted specifically at multicore applications, and traditional software players like Wind River and MontaVista are doing their part as well, updating their operating systems for SMP operation. Traditional software companies are also developing hypervisors and tighter coordination mechanisms between different operating systems for AMP operation, and upgrading analysis and debug tools to better find and fix bugs that occur during testing. All these updates and improvements promise to make developing multicore embedded systems easier, more efficient, and more cost-effective.

MontaVista presented at an OpenSystems Media webcast last year entitled “Beyond Virtualization: A Novel Software Architecture for SoCs.” At this webcast, they introduced the concept of “containers” – lightweight virtualization that isolates processes and resources without the complexities of full virtualization. Containers could be a nice way to implement applications that would benefit from AMP’s isolation and protection of components. Another interesting innovation presented was the “Bare Metal Engine,” a MontaVista technology that provides a user-mode environment with direct access to the underlying hardware. No kernel-mode programming is required, and you can maximize performance on a given core within a multicore processor this way. So there are a number of innovations happening in the software world to provide flexibility for the wide range of applications that can take advantage of multicore.

Hardware trends

Before finishing up, it’s worth mentioning some hardware trends for supporting multicore in a more robust way. Virtualization often plays a big role within a multicore environment – as mentioned earlier, an SMP operating system or hypervisor middleware component can significantly decrease the complexity of developing multicore software. But adding layers can be slow, so many multicore processors are adding instructions to assist with virtualization tasks. For example, Intel has included the VT-x instruction set, and the Freescale e500mc and ARM Cortex-A15 also include instructions assisting with virtualization that SMP operating systems and hypervisor middleware developers can take advantage of. Within the processor, multiple levels of caching and/or independent caches for each core can be important for faster instruction execution.

Smarter I/O and device drivers are also a significant trend supporting multicore. As mentioned previously, offloading network protocol work and providing multithread/multicore queuing and delivery within an I/O device also help with multicore software development. Some I/O devices also provide special interrupt configurations to support particular use cases in a multicore environment.

The multicore trend

Can your embedded system benefit from multicore? Hardware and software support for multicore processors is reaching critical mass. If you’re looking to consolidate components and increase your performance/power ratio, now is the time to investigate how a multicore solution might work for you. While it’s true that multicore development can be more complex than traditional single-core CPU development, tools and technology have advanced to the point that multicore adoption is accelerating, and a growing knowledge base is available to help. Given that your embedded systems competitors are likely already looking into multicore, perhaps the better question is “Can you afford not to explore a multicore solution?”