Figure 3: The key-based approach to user-level DMA
Although the PAL code approach to user-level DMA is simple, it requires the host processor to be an Alpha processor. In this section we describe a method that provides atomic user-level DMA without the need to execute uninterrupted code. The idea behind our approach is the following:
The DMA engine is equipped with several (say 4 to 8) register contexts. Each context has a source register, a destination register, and a size register, with the obvious meanings. Each context is mapped into the memory address space so that the processor can access it. Distinct contexts are mapped into distinct memory pages, so that each process gets access rights for only a single context. Each process that is allowed to start user-level DMA operations is allowed to write into one such context (the operating system divides the contexts fairly and efficiently among competing processes). These registers hold the arguments of each process's DMA operations. Thus, if a process is interrupted while starting a DMA operation, its arguments cannot be mixed with another process's arguments, since each process has its own set of context registers to write its arguments into.
The idea sounds simple: each process has its own space in the DMA engine, so that DMA arguments from two different processes do not get mixed as a result of a context switch.
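The one-context-per-page mapping can be illustrated with a short sketch. The page size, the base address, and the function names below are assumptions chosen for illustration; the text fixes only the number of contexts (4 to 8) and the rule that distinct contexts live in distinct pages.

```python
PAGE_SIZE = 8192             # assumed page size
NUM_CONTEXTS = 4             # the text suggests 4 to 8 contexts
CONTEXT_BASE = 0x8000_0000   # hypothetical base address of the DMA engine


def context_page(context_id):
    """Address of the page that holds register context `context_id`."""
    if not 0 <= context_id < NUM_CONTEXTS:
        raise ValueError("no such context")
    return CONTEXT_BASE + context_id * PAGE_SIZE


def context_of(addr):
    """Inverse mapping: the context an address falls in, or None."""
    offset = addr - CONTEXT_BASE
    if 0 <= offset < NUM_CONTEXTS * PAGE_SIZE:
        return offset // PAGE_SIZE
    return None
```

Because every context occupies its own page, the operating system can grant a process access to exactly one context by mapping only that page into the process's address space.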
Unfortunately, a user-level application cannot use regular load and store operations to access these registers and load them with the arguments of a DMA operation. Recall that in user-level DMA, argument passing is done using shadow addressing - the address of the load/store operation is a shadow address, and is used as an argument to the DMA operation. Thus, a process that would like to pass a physical address to a register context will pass the context identification as the data argument of the store operation, since the address argument of the store operation has already been reserved to pass the shadow address. For example, to write the physical address that corresponds to virtual address vaddress into a register context, a user-level process would execute the following instruction:
    STORE context_id TO shadow(vaddress)
The above instruction is interpreted by the DMA engine as follows: Extract the paddress from the shadow(paddress), and put it in register context context_id. Effectively, to start a DMA, a process makes a sequence of uncached store operations like the above one. Unfortunately, in this way, any user process will be allowed to write an address argument into any register context. To prohibit this erroneous behavior, along with the context identification, a key is passed in the data argument of the store operation. The key is given to the user process by the operating system. Possession of the key implies that the user process is allowed to write to this register context. Thus, a physical address is passed to a DMA engine as follows:
    STORE key#context_id TO shadow(vaddress)
The above instruction is interpreted by the DMA engine as follows: Use the physical address that corresponds to shadow(paddress), and store it as an argument in the register context context_id, only if the provided key matches the key stored by the operating system in the DMA engine, in memory locations unreadable by user processes.
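The key check can be sketched as a small software model of the engine's decode step. This is an illustration under stated assumptions, not the hardware design: the bit layout of key#context_id (context identification in the low bits, key above it), the use of a single high address bit to mark shadow addresses, and all names are hypothetical.

```python
CONTEXT_BITS = 3        # assumed: low bits of the data word select the context
SHADOW_BIT = 1 << 40    # assumed: one high address bit marks shadow addresses


def pack(key, context_id):
    """Build the key#context_id data word (the bit layout is an assumption)."""
    return (key << CONTEXT_BITS) | context_id


class KeyedContexts:
    def __init__(self, num_contexts=4):
        self.keys = [None] * num_contexts   # written by the operating system only
        self.addresses = [[] for _ in range(num_contexts)]

    def os_set_key(self, context_id, key):
        """Privileged operation: install the key that guards a context."""
        self.keys[context_id] = key

    def shadow_store(self, shadow_paddr, data):
        """Handle STORE data TO shadow(paddress); return True if accepted."""
        context_id = data & ((1 << CONTEXT_BITS) - 1)
        key = data >> CONTEXT_BITS
        if context_id >= len(self.keys) or self.keys[context_id] is None:
            return False                    # no such context, or no key installed
        if key != self.keys[context_id]:
            return False                    # wrong key: the store is ignored
        # Strip the shadow bit to recover the physical address argument.
        self.addresses[context_id].append(shadow_paddr & (SHADOW_BIT - 1))
        return True
```

In this model a store that carries the wrong key leaves the target context untouched, which is the property the key is meant to provide.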
Using the above instruction, the address arguments of the DMA operations are securely passed to the DMA engine. However, one more argument needs to be passed: the size of the DMA transfer. This is passed using a regular store operation to the address that corresponds to the register context. Any store operation to any register within a context is directed to the size register only, i.e. the user cannot read or write the source and destination registers of a register context using regular load/store operations; otherwise, the user would be able to start DMA from/to illegal addresses. Thus, although the register context is mapped into a process's address space, the process can only modify the size register of the context. A read operation from a register context returns the number of bytes that still need to be transferred (-1 means failure, 0 means completed DMA operation).
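The access rules for a mapped context can be summarized in a minimal model. The class and attribute names are hypothetical, and the model covers only the rules stated above (every store lands in the size register, every load returns the status), not the start of a transfer.

```python
class ContextPage:
    """Model of one register context as seen through regular loads and stores."""

    def __init__(self):
        self.source = None        # reachable only through keyed shadow stores
        self.destination = None   # reachable only through keyed shadow stores
        self.size = 0
        self.remaining = 0        # bytes still to be transferred
        self.failed = False

    def store(self, offset, value):
        # Whichever register offset is addressed, only the size register changes.
        self.size = value

    def load(self, offset):
        # Loads never expose source or destination; they return the status:
        # -1 on failure, otherwise the bytes left (0 means completed).
        return -1 if self.failed else self.remaining
```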
A user-level DMA operation is initiated as shown in Figure 3.
The first two STORE operations pass the physical addresses that correspond to the arguments of the DMA operation. The third operation stores the size of the DMA transfer to the register context of the current user process. Finally, the last operation initiates the DMA operation and reads the status result back.
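The four operations described above can be sketched as a small simulation. Several details are assumptions made for the sketch and are not stated in the text: the first accepted shadow store fills the source register and the second the destination register, the data word keeps the context identification in its low bits, a high address bit marks shadow addresses, virtual and physical addresses coincide, and the copy completes instantly so the status read returns 0. All names are hypothetical.

```python
CONTEXT_BITS = 3        # assumed layout of the key#context_id data word
SHADOW_BIT = 1 << 40    # assumed marker bit for shadow addresses


class DmaEngine:
    def __init__(self, memory, keys):
        self.memory = memory    # bytearray standing in for physical memory
        self.keys = keys        # one key per context, installed by the OS
        self.ctx = [self._empty() for _ in keys]

    @staticmethod
    def _empty():
        return {"src": None, "dst": None, "size": None}

    def shadow_store(self, shadow_addr, data):
        cid = data & ((1 << CONTEXT_BITS) - 1)
        key = data >> CONTEXT_BITS
        if cid >= len(self.keys) or key != self.keys[cid]:
            return              # wrong key or context: the store is ignored
        c = self.ctx[cid]
        slot = "src" if c["src"] is None else "dst"   # assumed ordering rule
        c[slot] = shadow_addr & (SHADOW_BIT - 1)

    def store_size(self, cid, size):
        self.ctx[cid]["size"] = size

    def load_status(self, cid):
        """Start the transfer and return the status (-1 failure, 0 done)."""
        c = self.ctx[cid]
        self.ctx[cid] = self._empty()
        if c["src"] is None or c["dst"] is None or c["size"] is None:
            return -1
        src, dst, size = c["src"], c["dst"], c["size"]
        if size < 0 or max(src, dst) + size > len(self.memory):
            return -1
        self.memory[dst:dst + size] = self.memory[src:src + size]
        return 0


def user_dma(engine, key, cid, src, dst, size):
    """The user-level sequence: two keyed stores, a size store, a status load."""
    word = (key << CONTEXT_BITS) | cid
    engine.shadow_store(SHADOW_BIT | src, word)   # STORE key#context_id TO shadow(src)
    engine.shadow_store(SHADOW_BIT | dst, word)   # STORE key#context_id TO shadow(dst)
    engine.store_size(cid, size)                  # STORE size TO context
    return engine.load_status(cid)                # LOAD status FROM context
```

Since each process writes only into its own context, a context switch between any two of the four operations does not disturb the arguments: another process may run a complete transfer in its own context in between, and the interrupted sequence still finishes correctly.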
The reader will notice that both address arguments are passed using store instructions, while in previous solutions the source address argument was passed using a load instruction. This restriction implies that only processes that have both read and write access to the source address will be able to do user-level DMA operations from it. We believe that this is not a significant limitation: most parallel and distributed applications that send data using DMA have both read and write access to these data.
Another limitation of this method seems to be its probabilistic nature: a lucky user may "guess" a key and may start illegal DMA transfers. We believe that this is highly unlikely: in 64-bit architectures, there will be close to 60 bits available for the key field, which makes the probability of guessing correctly practically zero. It would be easier for a malicious process to guess the UNIX password of another user than to guess a DMA key!
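The size of this risk can be checked with a back-of-the-envelope calculation. The sketch below assumes a key of exactly 60 bits chosen uniformly at random and an attacker who never repeats a guess; the guessing rate passed to the second function is a free parameter, not a figure from the text.

```python
KEY_BITS = 60   # "close to 60 bits" in the text; exactly 60 assumed here
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def guess_success_probability(attempts, key_bits=KEY_BITS):
    """Chance that at least one of `attempts` distinct guesses hits the key."""
    return min(1.0, attempts / 2 ** key_bits)


def expected_years_to_guess(guesses_per_second, key_bits=KEY_BITS):
    """Average time to hit the key, trying about half the key space."""
    expected_guesses = 2 ** (key_bits - 1)
    return expected_guesses / guesses_per_second / SECONDS_PER_YEAR
```

Under these assumptions a single guess succeeds with probability 2^-60, and even at a hypothetical rate of one million guesses per second the expected search time is on the order of tens of thousands of years.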