We have designed and implemented a prototype board to evaluate the performance of various DMA initiation algorithms. The board plugs into the TurboChannel I/O bus of a DEC Alpha 3000 model 300 workstation. All of its logic is contained in a single FPGA that is directly accessible from user applications via shadow addressing. The board runs at 12.5 MHz. For each DMA method we perform a simple test that initiates 1,000 DMA operations. Successive DMA operations go to (or from) different addresses, so as to eliminate any caching effects that intervening write buffers may induce. In the Repeated Passing of Arguments method, a memory barrier is used to make sure that repeated accesses to the same address are not collapsed in (or serviced by) the write buffer. Table 1 presents the average time each algorithm takes to start a DMA operation.
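The measurement procedure described above can be sketched roughly as follows. This is only an illustrative harness, not the code used in our experiments: the function type `dma_start_fn`, the stand-in routine `stub_dma_start`, and the `stride` parameter are hypothetical names introduced here, and the stub merely touches memory so that the sketch runs on any POSIX machine without the prototype board.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical signature for one DMA initiation method. */
typedef void (*dma_start_fn)(volatile uint64_t *src,
                             volatile uint64_t *dst, size_t len);

/* Stand-in for a real initiation routine (assumption): it only reads the
 * source word and writes the destination word, so no hardware is needed. */
static void stub_dma_start(volatile uint64_t *src,
                           volatile uint64_t *dst, size_t len)
{
    *dst = *src + len;
}

/* Average time, in microseconds, to initiate n operations.
 * Operation i uses its own source and destination addresses, so that no
 * two successive operations touch the same location.
 * buf must hold at least 2 * n * stride elements. */
static double avg_initiation_us(dma_start_fn start, volatile uint64_t *buf,
                                size_t n, size_t stride)
{
    struct timespec t0, t1;

    assert(n > 0 && stride > 0);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < n; i++) {
        volatile uint64_t *src = buf + (2 * i) * stride;
        volatile uint64_t *dst = buf + (2 * i + 1) * stride;
        start(src, dst, sizeof(uint64_t));
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / 1e3 / (double)n;
}
```

Calling `avg_initiation_us(stub_dma_start, buf, 1000, 8)` with a buffer of 16,000 words mirrors the 1,000-operation test. A harness for a real device would replace the stub with the initiation sequence of the method under test.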
We see that kernel-level DMA costs close to 19 microseconds, which is a little more than the cost of an empty system call on this workstation. Fortunately, all user-level DMA methods perform about an order of magnitude better than kernel-based DMA. The best method is "Extended Shadow Addressing", which takes a little more than one microsecond. This is as expected, since this method needs only two assembly instructions to pass all DMA arguments to the network interface. The other user-level DMA methods take 2.3-2.6 microseconds, which is also expected, since they use twice as many accesses to the network interface.
We should mention, however, that our implementation is pessimistic: user-level DMA can achieve considerably better performance in modern systems that use faster buses. The TurboChannel bus that we used runs at 12.5 MHz, while recent buses, such as PCI, run at frequencies as high as 66 MHz.
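To make the difference in bus speed concrete, the small sketch below converts a bus clock into a cycle time and computes the ratio between two clocks. The helper names `bus_cycle_ns` and `bus_clock_ratio` are hypothetical and introduced only for this illustration. The figures are plain arithmetic on the two clock rates quoted above; how much of the difference carries over to DMA initiation time is not something we have measured.

```c
#include <assert.h>

/* Duration of one bus cycle in nanoseconds, for a clock given in MHz. */
static double bus_cycle_ns(double clock_mhz)
{
    assert(clock_mhz > 0.0);
    return 1000.0 / clock_mhz;
}

/* How many times faster the second clock is than the first. */
static double bus_clock_ratio(double slow_mhz, double fast_mhz)
{
    assert(slow_mhz > 0.0);
    return fast_mhz / slow_mhz;
}
```

For the 12.5 MHz TurboChannel bus, `bus_cycle_ns(12.5)` gives 80 ns per cycle, whereas a 66 MHz bus has a cycle of roughly 15 ns, a clock ratio of about 5.3.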
Table 1: Comparison of DMA initiation algorithms.