Popular contemporary computing environments consist of powerful workstations connected via a network which, in many cases, has high throughput, resulting in systems called workstation clusters, or Networks of Workstations (NOWs). The availability of such computing and communication power gives rise to new applications like multimedia, high-performance scientific computing, real-time applications, engineering design and simulation, and so on. Until recently, only high-performance parallel processors and supercomputers were able to satisfy the computing requirements of these applications. Fortunately, the development of superscalar RISC processors has significantly increased the computing ability of modern workstations and microcomputers. At the same time, recent improvements in high-speed link technology have led to the development of communication networks that sustain bandwidths on the order of Gigabits per second (Gbps). To allow fast processors to make efficient use of all the available bandwidth, several user-level memory-mapped network interfaces have been developed [2, 5, 9] and manufactured [7, 4, 14]. Most of these interfaces use Direct Memory Access (DMA) operations to transfer data from one workstation to another. DMA has traditionally been used to transfer data between (fast) main memory and (slow) magnetic disks, freeing the host processor from the burden of transferring the data itself.
DMA management has traditionally been done by the operating system kernel. The operating system is the only trusted entity allowed to access DMA registers; user applications are not allowed to initiate DMA operations by themselves. There are two reasons why traditional systems require operating system involvement in starting a DMA operation:
In previous decades, since the overhead of operating system involvement in initiating a DMA was small compared to the DMA data transfer itself, no attempt was made to allow user applications to start DMA operations. In contemporary fast local area networks, however, starting a DMA operation from inside the operating system kernel may take longer than the network transfer itself! For this reason, several researchers have started to address the problem of letting user applications initiate a DMA. Pioneering work in the SHRIMP and FLASH projects has pinpointed the importance of user-level DMA operations and has proposed initial solutions to user-level DMA. Unfortunately, these approaches to user-level DMA require modifications to the operating system kernel. To function correctly, both approaches modify the operating system's context switch handler in order to enforce atomicity of user-level DMA operations and avoid race conditions. The SHRIMP approach requires that the context switch handler abort all half-started DMA operations (so that no race condition can happen), while the FLASH approach requires that the context switch handler inform the DMA engine of the identity of the running process at context switch time (so that the DMA engine has enough information to avoid race conditions). Although adding a few lines of code to the context switch handler seems a trivial change, it may turn out to be a major obstacle to the success of user-level DMA for the following reasons:
In this paper we propose four solutions to the user-level DMA problem that require no modifications to the operating system kernel. Two of them are novel, and the other two are elaborations of our older designs. Our methods allow user applications to initiate DMA operations securely and atomically from user level, without any changes to the operating system kernel.