Tuesday, May 29, 2007

Programming in our parallel future

The future of computing is parallel, or “multi-core” as it’s currently called. In 2005 dual-core processors became available; in 2007 it’s quad core; presumably we’ll see octa-core machines in 2008. Intel has shown a prototype 80-core processor and expects to ship such processors in 2011. In ten years’ time a typical computer will probably have hundreds of cores. Many applications have a ten-year life span, so how will you ensure that the system you are working on now will be able to utilize the computer hardware of the future? Let’s examine some ways to introduce opportunities for parallelism into your system.

The sun always shines
The first method is that of the lazy optimist: someone else will fix it. Improved compilers, library implementations, runtime environments or RDBMSs will take care of the problem, and you’ll just keep coding as usual. I wouldn’t bet on it. More specifically, I wouldn’t bet my company, or my client’s company, on it.

Is the old school the new school?
A common architecture in Unix applications (and Windows applications designed by Unix people) is that of Pipes and Filters. The application consists of a number of processes communicating through pipes or sockets. There is a natural parallelism here: each process can run on its own core. There can also be a nice correspondence between the problem domain and the executing system; often each process represents a domain object. In practice there are rarely more than a dozen processes, and most of the work is often performed by two or three of them, so the actual speedup when running on a multicore system is not linear. After 2008 or so such a system won’t be able to take advantage of new hardware; it will continue to run at the same speed on newer, nominally faster computers. One trick to speed up the system is to run several instances of the critical processes, but if they weren’t designed for that, it will probably be a lot of work to make it possible.
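To make the pattern concrete, here is a minimal sketch of a filter in Java (the class name and the example pipeline are mine, invented for illustration): a small program that reads lines from standard input, transforms them, and writes them to standard output, so the operating system can chain several such processes together with pipes.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

// A minimal filter: read lines from stdin, uppercase them, write to stdout.
// The OS runs each filter as its own process and connects them with pipes,
// so a pipeline like `cat log.txt | java UppercaseFilter | sort` gets
// process-level parallelism for free, one core per stage.
public class UppercaseFilter {
    public static void main(String[] args) throws Exception {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.println(line.toUpperCase());
        }
    }
}
```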

Windows style
A long time ago Windows wasn’t very process-friendly, and the pipe implementation wasn’t very good either, so Pipes and Filters was not a good fit for the Windows platform. Dynamic GUIs that didn’t freeze up were important, however. So instead of partitioning applications into processes, there was only one process with several threads of execution.

In theory there is not much difference between a process and a thread: a process has its own address space, while a thread uses someone else’s. In practice they are used quite differently. Threads are more often transient, created and destroyed dynamically. Threaded designs usually make no attempt to map onto the domain the way the processes of a Pipes and Filters system do; the threads are considered implementation details. (On the other hand, this can be a source of strength: it means that we can have lots of threads.)

Threads are also a constant source of headaches in a way that processes aren’t. One problem is synchronization: usually it’s a lack of synchronization in some unusual situation, which makes the system crash; sometimes it’s too much synchronization, which makes the system freeze. In a Pipes and Filters system you get this synchronization for “free” from the operating system.

A practical tip here: multithreaded applications absolutely must be developed on multicore machines. I’ve worked on no fewer than three different multithreaded systems developed on single-core machines. When deployed on multicore machines for performance reasons, they all promptly crashed. The applications were under-synchronized, which wasn’t obvious on a single-core machine because the thread scheduler introduced some implicit synchronization: the threads were scheduled in the same order almost every time. On the multicore machines the scheduling became non-deterministic and things fell apart.

This brings us to the big problem with threads: they’re just too hard. These systems were carefully designed and reviewed by very smart people. Ad-hoc thread designs require an inordinate amount of work to be successful, so to be productive we need structured approaches to using threads. Fortunately, there are at least two threading patterns that scale well and isolate the application programmer from the plumbing. It’s no accident that both are in use at Google, a company that relies on massive clusters of standard PCs. MapReduce (in its current form) originated at Google; distributed event systems have been independently invented elsewhere.
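The under-synchronized failure mode described above is easy to reproduce. Here is a minimal sketch in Java (invented for illustration, not taken from any of the systems mentioned): two threads increment a shared counter without locking, a bug that often stays hidden on a single core but shows up immediately on multicore hardware.

```java
// Two threads increment a shared counter with no synchronization.
// On a single-core machine the scheduler interleaves them coarsely
// and lost updates are rare; on a multicore machine the threads
// really do run simultaneously and the final count comes up short
// almost every run.
public class LostUpdates {
    static int counter = 0; // shared, unsynchronized

    public static void main(String[] args) throws InterruptedException {
        Runnable work = () -> {
            for (int i = 0; i < 1_000_000; i++) {
                counter++; // read-modify-write, not atomic
            }
        };
        Thread t1 = new Thread(work);
        Thread t2 = new Thread(work);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        // Expected 2000000; typically far less on multicore hardware.
        // A synchronized block around the increment (or an AtomicInteger)
        // fixes it, at the cost of serializing the hot path.
        System.out.println("counter = " + counter);
    }
}
```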

Distributed event systems
In a distributed event system, the entire system consists of event handlers. An event handler may generate a number of new events, each triggering an event handler. The potential for parallelism arises when a large number of events may be active in the system simultaneously, their handlers executing on different cores. This paradigm frees the application programmer from concerns about threads, synchronization and which handler should run on which core, but someone has to implement an event distribution framework which takes care of that efficiently. Such frameworks have been developed in-house at Google and other companies, but as far as I know none are available commercially or in open source.
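To show the shape of the idea, here is a minimal single-machine sketch in Java (all names are mine, invented for illustration; a real framework would add error handling, back-pressure and distribution across machines). Handlers are registered per event type, and each delivery runs as a task on a thread pool, so independent events execute on different cores. A handler parallelizes its work simply by posting more events.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.BiConsumer;

// A toy event distribution framework: the application consists only of
// event handlers; the framework decides which core runs which handler.
public class EventBus {
    private final ExecutorService pool =
            Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
    private final Map<String, BiConsumer<EventBus, Object>> handlers =
            new ConcurrentHashMap<>();

    // Associate a handler with an event type.
    public void register(String eventType, BiConsumer<EventBus, Object> handler) {
        handlers.put(eventType, handler);
    }

    // Dispatch an event asynchronously; the handler receives the bus
    // so it can post follow-up events, fanning work out across cores.
    public void post(String eventType, Object payload) {
        BiConsumer<EventBus, Object> handler = handlers.get(eventType);
        if (handler != null) {
            pool.submit(() -> handler.accept(this, payload));
        }
    }
}
```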

MapReduce
Anyone who has programmed in Lisp or other functional languages should be familiar with Map and Reduce. Map applies a function to each element of a collection, returning a new collection containing the results of the applications. An example could be applying the uppercase function to a string, returning a new string of the same letters in UPPERCASE. Reduce also applies a function to the elements of a collection, returning a single value. The canonical example is applying ‘+’ to a list of numbers, returning the sum of the numbers. So why is this interesting?

  1. It turns out that many algorithms can be expressed, elegantly, in terms of Map and Reduce (MapReduce).
  2. MapReduce can be efficiently implemented on a wide variety of multicore hardware.
  3. The application programmer using MapReduce does not have to worry much about concurrency, threads or synchronization. Leave that to the übergeeks implementing MapReduce.
In addition to the implementations used inside Google, there are fortunately open source implementations of MapReduce available in Java and Ruby.
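A minimal sketch of the two halves using Java’s streams API (this is just the functional map/reduce idea on a single machine, not Google’s distributed framework):

```java
import java.util.List;

// Map applies a function to every element independently, so the elements
// can be processed on different cores; reduce combines the results with
// an associative operation like '+'.
public class MapReduceSketch {
    public static void main(String[] args) {
        // Map: uppercase each word. parallelStream() spreads the work
        // across cores with no explicit thread code in sight.
        List<String> upper = List.of("map", "reduce").parallelStream()
                .map(String::toUpperCase)
                .toList();
        System.out.println(upper); // [MAP, REDUCE]

        // Reduce: the canonical '+' example, summing a list of numbers.
        int sum = List.of(1, 2, 3, 4).parallelStream()
                .reduce(0, Integer::sum);
        System.out.println(sum); // 10
    }
}
```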

Conclusion
It is possible to design systems now that will utilize both current and future hardware efficiently, and still be understandable and maintainable.
