Single Address Space Operating System

In a Single Address Space OperatingSystem (SASOS) all processes run within a single global virtual address space; protection (if provided by the OS) is provided not through conventional address space boundaries, but through ProtectionDomain?s that dictate which pages of the global address space a process can reference. This operating system organization is facilitated by microprocessor architectures that support 64 bits of virtual address space and promotes efficient data sharing and cooperation, both between complex applications and between parts of the operating system itself.

Relevant papers:
"Unlike the move from 16- to 32-bit addressing, a 64-bit address space will be revolutionary instead of evolutionary with respect to the way operating systems and applications can use virtual memory. Consider that 40 bits can address a terabyte, two orders of magnitude beyond the primary and secondary storage capacity of all but the largest systems today, and that a 64-bit address space, consumed at a rate of 100 megabytes per second, would last five thousand years."

"Large address spaces remove a basic assumption that has driven operating systems since the 1960s, namely, that virtual addresses are a scarce resource that must be multiply allocated in order to supply each executing program with sufficient name space. The small address spaces of current architectures cause most operating systems to execute each program in a private virtual address space. While private address spaces facilitate protection, they interfere with sharing and cooperation between applications, because virtual addresses in shared data are ambiguous; that is, their interpretation depends on the process using the address."

Examples of SingleAddressSpaceOperatingSystems with protection boundaries:

Examples without:

Single Address Space is a low-level implementation detail

Why? Because as a high-level concept, it is a NameSpace which has the following properties:

Obviously, this violates all the NameSpace ideals. How so?

Flatness means that you can run out of space before the namespace is exhausted. For instance, you can run out of contiguous space very easily. In a hierarchical space such as a filesystem, you can never run out of space because you just create it as you need it where you need it.

Finiteness means that you do run out of space no matter what. Now, some idiots quoted above claim that 64 bits should be enough for everybody (where have we heard that before?) but in fact 64 bits is awfully small when you need to use the first 32 bits to denote the IP address of the machine the data is on.

Why would you need to do that? Because at a high level you must deal with distribution, and deal with it in a uniform manner, which means that the mechanism you provide to address remote data must be the same mechanism as that for local data; memory-mapping. And of course, the namespace must be globally consistent. Using IP numbers is the simplest solution.

Finally, a 64 bit address space is implementation specific. Notice how the first 32 bits are IP addresses. This means that it isn't human readable or abstract. It is a distinct CodeSmell when you have implementation specific details at the highest levels of abstraction.

It would also be a problem when everything has to be converted to 128 bit IPv6. At that point, you need two cells for the IP, and a cell for the on-machine address (may as well use a full 64 bits). At that point, it may be reasonable to use one more (to keep alignment even) as a mask into a 64-bit minimum addressable unit instead of addressing each 8-bit byte with 256 bits of address info.

What properties should a NameSpace have at a high level?

Question: Is inclusion of an IP address as part of virtual memory really part of the referenced documents or is this an ad hoc extension? A hard-coded IP address would seem to violate the concept of a virtual memory space.

It's an extension, since I don't think the OSes above tried to provide a globally consistent namespace (shame on them!). But consider that anything but an IP address would violate OAOO, replicating the DNS and internet name assignment authorities. And those are more odious violations of common sense than hard-coding IP addresses.

It seems to me, however, use of an IP address is a physical map not a virtual map. Per the above scheme, when I exceed the virtual memory in my machine, the next byte of memory I try to access would come from my IP address + 1. Also, note that IP address space is not fully populated, some addresses are not valid by design, and some addresses (i.e. broadcast) overload multiple machines. It also appears that requiring every machine expose 32 bits (no more, no less) of virtual memory per task is probably not feasible. Although the idea of allow virtual memory to cross over to multiple machines is interesting, I don't think requiring fixed size memory segments of every machine nor assuming contiguous IP addresses is valid. Underneath the virtual memory space one needs to providing mapping to a set of machines and each of these machines may expose a different size memory segment for use. I'll leave the discussion of how to do a task switch to later.

I'm gratified you find memory-mapping interesting. Myself I find it deadly dull. A trivial implementation detail of no relevance to anybody. Some points:

Memory mapping has no relevance to anybody? That statement strikes me as rather obtuse if not downright ignorant - the sort of grandiose "I don't care about the details" argument that an ArmchairArchitect would make.

Is a SingleAddressSpaceOperatingSystem a requirement for OrthogonalPersistence? It seems, at least, that it would make it more convenient.

No. Single-address space is a requirement for absolutely nothing. Nor does it make anything more convenient at a sufficiently high level. Single address space is a patently low-level idea that has zero influence at a sufficiently high level. Especially with transparent distribution thrown in to gum up the works.

All a single-address space enables is the use of physical pointers in place of logical pointers. So instead of having process1234:memory_element1235, you have simply memory_element12341235. IOW, single-address space makes it easier for OS and application designers to screw up by conflating two separate levels of the OS. It allows someone to write an application that, for example, can only run on one particular machine at a specific location in memory of a vast networked system. Single-address space provides programmers with more rope to hang themselves with. And for what? Because in theory it's supposed to be faster.

Such anger! I note in passing that "transparent distribution" is probably an oxymoron and at the very least an AlarmBellPhrase

Executing on a remote CPU should be as simple as executing on a particular one of the local CPUs. Accessing remote data should be as simple as accessing local data from a particular module or component. In a well-designed system, distribution is so non-special as to be a non-event; an event indistinguishable from all the other common events. That's what I call transparency.

If you mean something else, such as the idea that a process shouldn't be able to know where it's executing, even when it cares. That abstractions should be provided which make lower-level details opaque, then that's because you're ignorant of the subject matter. Something is not transparent just because you hide it behind an impenetrable brick wall and call it transparent.

And I have seen the kind of mistake described above. The guy started out by explaining that since it was his master's thesis, he wasn't concerned with speed or efficiency. Then he conflated the physical (hard disk) pointers with the logical (log segment) pointers. In the end, he had all kinds of trouble because his logs couldn't be backed up and restarted on another disk or machine.

But to reiterate; while single address space is good because it fits the ExoKernel principle of OS design, it remains an entirely low-level detail. It is irrelevant to the high-level programmer, thinker or OS designer.

Nor does it make anything more convenient at a sufficiently high level -- Funnily enough, there is no level of abstraction, at least on any computer with which I'm familiar, at a sufficiently high level that good performance isn't a convenience, and all other things being equal, SAS should always perform at least as well as a multi–address-space system, by virtue of the fact that no work is cheaper than some work. Even disregarding this point, things like object persistence and data sharing are a hell of a lot easier to reason about in a single address space. Ideally, this wouldn't make any difference at your "sufficiently high level" of abstraction, but in practice, features tend to be a bit more convenient when the guys who are responsible for curating the low-level facilities underpinning those features have a snowball's chance in hell of reasoning about them.

Many RealTimeOperatingSystems also use a single-address space architecture, such as VxWorks. Of course vxWorks has nothing resembling protection domains (the traditional version, not vxWorks AE, which does have them) - essentially in vxWorks, everything (including the kernel) is one giant incestuous process.

Yeah, I don't like all the qualifiers on the definition at the top of this page. MicrowareOsNine used a single address space. All programs had to be position independent so they could be loaded anywhere. If one program wrote into another programs memory, that was just tough.

Uhhh... what happens when the transitive closure of dependencies of a program is millions of lines of C? The program crashes a lot. You'd be better off forcing everything in the OS to use a capability language if you want any sort of robustness.

A RelationalWeenie perspective on this is that OS memory management is (or should be) basically a database technology issue. The data "slots" in a database are usually virtual. Such a "cell" could be indexed by program name, subroutine name/ID, instance ID, variable name, etc. This would be slower than current approaches perhaps, but indirection and flexibility often cost CPU cycles.

Thankfully, it has been 25 years since I have seen programs that don't fit into physical memory. There are known operating system techniques to handled program overlays, so I would stay away from using a database for program segments. Virtual memory is usually driven by data quantity instead of code quantity, and current database technology probably provides a better solution for many applications. It may not be appropriate for large, "flat" data elements like large word processing files or graphic images, though. Also, database caching might solve some of the performance issues, but may restrict the use of stored procedures. If the performance issues are resolved, it may not be necessary to keep large sets of data in the memory space.

Wow, so much ranting here. There seems to be a lot of confusion over what a single address space system might look like, and how it might be implemented. There are certainly architectures that do not have the problems of the single *physical* address space system strawman created above.

For example, the forthcoming "Cell" architecture by sony features a single global distributed virtual address space. These virtual locations are cached in a given systems local memory with a MOSEI style protocol.

Now, if we chose to be positive and creative, we could combine these ideas with some from the 10 year old single address system research. A distributed system with a single, non-reusing, virtual address space might be very interesting for some applications. A single distributed system image with inherent multi-versioning?

Maybe I'm not enough of a pessimist, but I think there are some interesting ideas under this topic, and it doesn't deserve to get blasted just because someone believes there's only one valid conceptualization of a namespace.

Some of the more offensive ranting has been edited.

A single address space for an operating system - including all applications - is feasible if we use a memory-safe language for both the OS and the apps. Such an approach has been pursued in Singularity and Midori, but hasn't really made it out of the labs.

Applications in such an OS would be a bit different: instead of compiling down to x86, you'd compile down to some memory-safe intermediate language or bytecode. The OS could then compile it the rest of the way, into native code, and cache the result. The OS as a whole would be bootstrapped by this process, perhaps modulo a few special modules that are hand-written in assembly.

A lot of hardware-level security (TLBs, rings, full context switches, etc.) could be eliminated in such a system, resulting in simpler hardware and simpler, richer, more efficient IPC.

So, can any of the gurus tell me if SGI IRIX on Origin qualifies as a SASOS? ~~~~ 2014.04.08

One could literally hide physical memory in the 64-bit space and it would never be found. This opens interesting security questions. ~~~~ Menachen was here


EditText of this page (last edited April 9, 2014) or FindPage with title or text search