Physical Address Extension
Old hands may already have groaned at the preceding heading. The means for a 32-bit operating system to use physical addresses above 4GB was built into Intel’s 32-bit processors well over a decade ago [1] and has been supported by Microsoft since Windows 2000. If you haven’t heard of it, or haven’t thought that it applies to Windows Vista, then one reason may be that Microsoft advertises it only as a feature of the server editions such as Windows 2000 Server and Windows Server 2003, and only then for the more expensive levels with names like Enterprise and Datacenter. However, even Windows 2000 Professional can be configured, without contrivance, to access memory above 4GB by using Physical Address Extension (PAE). This is old technology. It’s also widely and deeply misunderstood technology, arguably more than any other in the history of personal computing.
The essence of PAE is that the 32-bit registers used by 32-bit instructions in a 32-bit operating system do not in practice address physical memory. This is because of very old technology called paging. The 32-bit register holds what is called a linear address [2]. The processor translates linear addresses to physical addresses by looking through page tables, which are configured by the operating system. For the 80386 in 1985, each page table entry (PTE) was 32 bits and allowed for a 32-bit physical address. However, there is nothing fundamental to that. What’s fundamental is only that every linear address must either map to a physical address or be not-present. There is no reason at all that the linear and physical address spaces must be the same size. With a suitably different translation algorithm, the physical address space can be as big as Intel wants to allow. This theoretical point, which I expect was appreciated at Intel at least as early as 1985, got its real-world implementation in the P6 family of processors, beginning with the Pentium Pro in 1995. Since then, with only very few exceptions, Intel’s processors that are suitable for running 32-bit Windows all have enough address lines for accessing 64GB of memory and all support a translation algorithm for using all that memory in 32-bit code. PAE is this alternative translation algorithm.
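To make the two translation algorithms concrete, here is a minimal sketch of how a 32-bit linear address is cut into table indices for 4KB pages, first with the original two-level scheme and then with PAE. The field widths (10-10-12 and 2-9-9-12) are those of Intel's architecture; the function names and the sample frame number are illustrative only, and the sketch models the arithmetic, not a real page-table walk.

```python
PAGE_SHIFT = 12  # 4KB pages in both schemes


def split_non_pae(linear):
    """Original scheme: 10-bit directory index, 10-bit table index, 12-bit offset."""
    return {
        "pde": (linear >> 22) & 0x3FF,
        "pte": (linear >> 12) & 0x3FF,
        "offset": linear & 0xFFF,
    }


def split_pae(linear):
    """PAE: 2-bit pointer-table index, two 9-bit indices, 12-bit offset."""
    return {
        "pdpte": (linear >> 30) & 0x3,
        "pde": (linear >> 21) & 0x1FF,
        "pte": (linear >> 12) & 0x1FF,
        "offset": linear & 0xFFF,
    }


def physical_address(frame_number, offset):
    """Combine the page frame number found in a PTE with the page offset."""
    return (frame_number << PAGE_SHIFT) | offset


linear = 0x7FFE1234
print(split_non_pae(linear))
print(split_pae(linear))

# A 64-bit PTE has room for a frame number wider than 20 bits, so the same
# 32-bit linear address can land above 4GB in physical memory.
frame = 0x123456  # illustrative frame number only
print(hex(physical_address(frame, linear & 0xFFF)))
```

The linear address is 32 bits in both cases. Only the width of the frame number stored in the PTE decides how far the physical address can reach.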
The practical outcome for 32-bit operating systems in general is that although any one instruction can form addresses for only 4GB of linear address space, those 4GB can be drawn from anywhere in any size of physical address space. For Windows in particular, the design is that the linear address space changes for each process. In 32-bit Windows, a process’s user-mode code is allowed between 2GB and 3GB of linear address space (depending on the increaseuserva boot option), with the remainder of the 4GB being reserved for system-level code. Both 32-bit and 64-bit Windows can use all of physical memory, including above 4GB, but 32-bit Windows can give no more than 3GB to each application.
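The split of the 4GB linear address space can be written out as simple arithmetic. The sketch below assumes that increaseuserva is given in megabytes and accepted between 2048 and 3072, which is my understanding of the option rather than something stated above; the function name is hypothetical.

```python
MB = 1 << 20
GB = 1 << 30


def address_space_split(increaseuserva_mb=None):
    """Return (user_bytes, system_bytes) for one process's 4GB linear space.

    Assumption: the option is a count of megabytes in the range 2048..3072,
    and its absence means the default 2GB of user-mode address space.
    """
    user_mb = 2048 if increaseuserva_mb is None else increaseuserva_mb
    if not 2048 <= user_mb <= 3072:
        raise ValueError("increaseuserva is assumed to lie between 2048 and 3072")
    user = user_mb * MB
    return user, 4 * GB - user


print(address_space_split())      # default split
print(address_space_split(3072))  # largest user-mode share
```

Whatever the split, it is a division of linear address space per process. It places no limit on how much physical memory the processes can use between them.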
The difference between that and the “fully utilise” in Dell’s fine print seems very fine to me, especially while I don’t have any real-world applications that need (or even use) as much as half a GB for each running instance. Until such software becomes common for ordinary use outside of specialised contexts, this difference from full utility does not of itself justify a rush to 64-bit operating systems, and certainly does not justify disturbing a working, trusted installation of 32-bit Windows Vista. If you have a program that uses memory by the gigabyte, then upgrading to a 64-bit version of that program to run on a 64-bit operating system is your only path ahead. If your concern is only that the system and all your applications may together use all your 4GB or more, then keeping your 32-bit operating system is at least an option for you, or would be if Microsoft would provide you with license data to let you use the PAE support that Microsoft has already coded into the product.
PAE Is Not An Ugly Hack
Some commentators seem to have trouble grasping the naturalness of a physical address space that is larger than the linear address space. Perhaps they have been distracted by paging’s historical role as technology for dealing with a shortage of physical memory. Perhaps they have in mind the history of MS-DOS, which was kept alive for many years with ever more ways that programmers might write new code to access ever more memory than the basic 640KB.
PAE is nothing like that. It is no more a concern to any software than is paging. After all, it is nothing but a variant algorithm for paging. Just as hardly any software is concerned that linear addresses are translated to physical addresses, even less software is affected by how linear addresses are translated to physical addresses. Application-level code and even most system-level code is entirely unconcerned and unaffected. Except for the operating system’s memory manager and for the relatively few drivers that work directly with physical memory, e.g., for such things as Direct Memory Access (DMA), no 32-bit software needs any recoding to benefit from a more-than-32-bit physical address space.
Even kernel-mode drivers don’t need to know anything specific to PAE, much less be written specially to support it. All that’s required is a general awareness that physical memory addresses may be wider than 32 bits and that accommodation of this comes naturally from following the documentation. Far from being an ugly hack, PAE requires pretty much nothing of anyone. Indeed, to write a driver whose faults will be exposed by PAE, you actually have to work at it.
When working with physical memory addresses, device drivers need to do 64-bit arithmetic. This should be natural to them, since Microsoft’s development kit for device driver programming has recommended it for well over a decade and defines a 64-bit PHYSICAL_ADDRESS type for all functions that receive or return physical memory addresses.
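The kind of fault that 64-bit arithmetic avoids is easy to show in miniature. The sketch below models a driver that keeps a physical address in a 32-bit variable: Python integers do not overflow, so the truncation is imitated with a mask. The function names and the sample address are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def store_in_32_bits(physical):
    """Model a careless driver that keeps a physical address in 32 bits."""
    return physical & MASK32


def store_in_64_bits(physical):
    """Model a driver that keeps the address in a 64-bit type throughout."""
    return physical & MASK64


buffer_address = 0x1_2000_0000  # a page at 4.5GB, reachable only with PAE
print(hex(store_in_32_bits(buffer_address)))  # high bits silently lost
print(hex(store_in_64_bits(buffer_address)))  # address preserved
```

The truncated value is still a plausible physical address, just the wrong one, which is why such a bug goes unnoticed until memory above 4GB is actually in use.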
For the particular matter of working with DMA, device drivers need to conform to the long-documented functional requirements for setting up and managing their DMA transfers. In particular, they need to be aware that the DMA functions may succeed only partially, and need to be called again to complete the request. The most significant, but not the only, reason for partial success is that the necessary double buffers could not all be set up. Double buffering is a technology for when a device cannot handle the full range of possible physical memory addresses. For instance, an old type of device (such as a floppy disk drive controller) may be limited to 24-bit physical addresses. To get data from the controller to physical memory above 16MB, the driver must use the DMA functions properly, so that the controller actually transfers the data into a double buffer below 16MB and the DMA functions then copy the data to where it was wanted. Of course, most devices can handle 32-bit physical addresses and increasingly many can handle 64-bit addresses. Some drivers for 32-bit Windows assume that since all physical addresses fit 32 bits, their 32-bit device needs no double buffering. They then take shortcuts with their use of the DMA functions, especially to skimp on handling failures or partial successes. If these drivers are not fixed, then the use of physical memory above 4GB will expose the liberty that they have taken with the documented coding model. Note that if the device can handle 32-bit physical memory addresses but not 64-bit, then its driver needs to be fixed for 64-bit Windows, too.
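The difference between a driver that follows the coding model and one that takes the shortcut can be sketched with a toy model. Everything here is hypothetical: DmaAdapter, map_transfer and BOUNCE_BASE are stand-ins invented for illustration and are not the names or signatures of the real DMA functions. The only behaviour modelled is that a mapping request may be satisfied in part when a double buffer is needed.

```python
BOUNCE_BASE = 0x0010_0000  # hypothetical address of a low double buffer


class DmaAdapter:
    """Toy stand-in for the system's DMA functions (not the real API)."""

    def __init__(self, address_bits, bounce_bytes):
        if bounce_bytes <= 0:
            raise ValueError("bounce buffer must have a positive size")
        self.limit = 1 << address_bits    # first address the device cannot reach
        self.bounce_bytes = bounce_bytes  # double-buffer space available per call

    def map_transfer(self, target, remaining):
        """Return (device_visible_address, bytes_mapped, double_buffered)."""
        if target + remaining <= self.limit:
            return target, remaining, False  # device can reach the memory directly
        # Otherwise map only as much as the double buffer can hold.
        return BOUNCE_BASE, min(remaining, self.bounce_bytes), True


def careful_transfer(adapter, target, length):
    """Call again until the whole request has been mapped."""
    steps, done = [], 0
    while done < length:
        address, mapped, bounced = adapter.map_transfer(target + done, length - done)
        steps.append((address, mapped, bounced))
        done += mapped
    return steps


def careless_transfer(adapter, target, length):
    """Shortcut: assume one call always maps everything."""
    _, mapped, _ = adapter.map_transfer(target, length)
    return mapped  # may be less than length, and the driver never notices


adapter = DmaAdapter(address_bits=32, bounce_bytes=0x1000)
print(careful_transfer(adapter, 0x0020_0000, 0x2800))    # below 4GB: one step
print(careful_transfer(adapter, 0x1_0000_0000, 0x2800))  # above 4GB: three steps
print(hex(careless_transfer(adapter, 0x1_0000_0000, 0x2800)))
```

With all memory below 4GB the careless version happens to work, since the first call always maps the whole request. It is only when a buffer lies above 4GB that the partial result appears and the shortcut loses data.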
PAE and Performance
Some commentators say that PAE comes with some hideous cost in performance. Compared with the original algorithm that maps 32-bit linear addresses to 32-bit physical addresses, PAE is slower. It has one extra level to its page tables. Each PTE is twice as big. The operating system therefore has more work to do when preparing and maintaining the page tables, and since the Translation Lookaside Buffer (TLB) has only half the capacity, memory references are more likely to miss the TLB and require additional bus cycles. The reduction in performance is surely measurable. If you have no need to access memory above 4GB and are concerned enough, then you would not enable PAE. Note however that Microsoft itself does not regard this performance cost as worth troubling over (as will be clear shortly, under the heading Data Execution Prevention).
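The size of the bookkeeping cost can be estimated from the PTE sizes alone. The sketch below assumes 4KB pages and page tables that each occupy one page, and counts only the lowest-level tables needed to map a given span of linear address space. The function names are illustrative, and the sketch says nothing about TLB behaviour or timing.

```python
PAGE_SIZE = 4096
GB = 1 << 30


def table_geometry(pte_bytes):
    """Return (entries per page table, bytes of address space one table maps)."""
    entries = PAGE_SIZE // pte_bytes
    return entries, entries * PAGE_SIZE


def tables_to_map(span_bytes, pte_bytes):
    """Number of lowest-level page tables needed to map span_bytes."""
    _, mapped_per_table = table_geometry(pte_bytes)
    return -(-span_bytes // mapped_per_table)  # ceiling division


for name, pte_bytes in (("non-PAE", 4), ("PAE", 8)):
    entries, mapped = table_geometry(pte_bytes)
    tables = tables_to_map(GB, pte_bytes)
    print(name, entries, "entries,", mapped // (1 << 20), "MB per table,",
          tables, "tables,", tables * PAGE_SIZE // (1 << 20), "MB of tables per GB")
```

On these assumptions, mapping 1GB takes 1MB of page tables without PAE and 2MB with it: a doubling, but of a small number.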
Anyway, for access to memory above 4GB, the appropriate comparison is not between using PAE and not, but between using PAE and using the native 64-bit algorithm. For this comparison, not only are the PTEs the same size but the algorithms are very similar. To the processor, it’s PAE that is slightly simpler and plausibly quicker, but the memory manager in a 64-bit operating system can benefit from using 64-bit registers when working with the PTEs. These are very fine trade-offs relative to the enormous overheads that embellish some of the wilder misunderstandings of PAE on the Internet.
For a rough-and-ready assessment of these trade-offs, consider Microsoft’s own performance measurement, as given by the Windows Experience Index. Surely this is meant to have some objectivity, even if comparison of ratings for 32-bit and 64-bit Windows may not be strictly fair. On this article’s test machine, the “Memory (RAM)” component of the Windows Experience Index is consistently 5.0 in 64-bit Windows Vista and is just as consistently 5.1 in 32-bit Windows Vista, even with PAE and the use of memory above 4GB.
Choosing PAE
Whether the memory manager in the Windows kernel uses PAE is configurable through the pae boot option. Indeed, 32-bit Windows Vista is supplied with two kernels:
* an ordinary kernel which uses 32-bit PTEs without PAE, and has no code for working with physical addresses above 4GB;
* a PAE kernel which uses 64-bit PTEs with PAE, and does have code for working with physical addresses above 4GB.
The two kernels are respectively NTOSKRNL.EXE and NTKRNLPA.EXE, both in the Windows System directory. The loader (WINLOAD.EXE) knows how to set up the linear address space for mapping to physical addresses with or without PAE, but each kernel is specialised to one algorithm for the mapping. The pae option tells the loader which kernel to load.
Data Execution Prevention
If you have a modern machine of the sort that manufacturers are fitting with 4GB of RAM, then you very likely are running the PAE kernel already. This is not so that you can benefit from PAE and use physical memory above 4GB, else this article would not exist. It is instead to give you what Microsoft calls Data Execution Prevention (DEP). This protects you from programs that try to execute data, whether in error or from (suspected) malice. The connection with PAE is that DEP depends on the Execute Disable bit that Intel has defined in 64-bit PTEs, such that DEP can only be enabled if PAE is also enabled. Because Microsoft wants you to benefit from DEP, the typical practice of Windows Vista is to select the PAE kernel if you haven’t specified that you want it and even if you have specified that you don’t want it. (If your machine supports DEP, then a necessary condition for disabling PAE is that you also disable DEP by setting nx to AlwaysOff.)
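The rule just described can be written as a small decision function. This is a simplified model of the behaviour as stated in this article, not a reconstruction of the loader's code: the function name is hypothetical, the option values are assumed to be the usual ones for the pae and nx boot options, and other conditions that may influence the loader's choice are left out.

```python
def select_kernel(pae_option, nx_option, cpu_supports_dep):
    """Simplified model of which kernel the loader picks.

    pae_option: "Default", "ForceEnable" or "ForceDisable" (assumed values)
    nx_option:  "OptIn", "OptOut", "AlwaysOn" or "AlwaysOff" (assumed values)
    """
    dep_wanted = cpu_supports_dep and nx_option != "AlwaysOff"
    if dep_wanted:
        # DEP needs the Execute Disable bit, which exists only in 64-bit PTEs,
        # so PAE is enabled whatever the pae option says.
        return "NTKRNLPA.EXE"
    if pae_option == "ForceEnable":
        return "NTKRNLPA.EXE"
    return "NTOSKRNL.EXE"


print(select_kernel("Default", "OptIn", True))           # PAE kernel, unasked
print(select_kernel("ForceDisable", "OptIn", True))      # PAE kernel, despite the option
print(select_kernel("ForceDisable", "AlwaysOff", True))  # ordinary kernel
```

The middle case is the one that surprises people: asking for PAE to be disabled has no effect unless DEP is disabled too.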