The Unix and CeePlusPlus
way of spelling DoesNotUnderstand
Most often seen as a result of attempting to dereference a broken or null pointer.
Also known as a SegFault
is the error which arises when a program tries to access a memory address to which it is not allowed access (if the hardware and OS provide MemoryProtection?
, that is). SegFault
is a UnixOperatingSystem
term; in the Windows world, the terms GeneralProtectionFault
The name derives from the fact that unix organises memory into virtual memory pages or segments, each potentially with its own read and write permissions; writing to a page marked read-only, or reading from a page that does not allow reading, will cause a hardware interrupt which is translated by the kernel into a software interrupt: a SegmentationViolation?
The word "segment" dates to the 1970s, when Unix ran on systems (e.g. the pdp11, and later, the 80286 and 80386) that did not have virtual memory paging, but did have a small number of memory protected "segments"; the two terms are sometimes used synonymously, other times they are used in contrast. When contrasted, typically all "pages" on a system have the same size (e.g. 8192 bytes), while "segments" typically are variable length. [See Note 1]
Also the term "page" tends to imply hardware supported virtual memory, that allows pages to be loaded on demand, in response to a hardware interrupt on attempted access (in which case no signal is delivered to the user process), whereas the older "segment" implied hardware memory protection but no ability to demand-load pages; instead entire processes were swapped in and out of memory.
Compare with "bus error", the software signal (standard value 7) that means that, although the memory page was potentially accessible according to its read/write permissions, the attempted memory access caused the hardware to complain with an interrupt. This occurs most often on a RISC machine when word-oriented memory is accessed on a non-word-aligned boundary. It can also happen when non-existent or buggy memory is mapped into the address space.
C programmers can spend their whole careers in environments where Bus Error never happens, since it depends on the nature of the machine and details of the kernel implementation. Also some Unix kernels arbitrarily decide to deliver all such hardware complaints as Segmentation Violations, so it's not guaranteed that one can get a Bus Error on any architecture.
- No they can't. Bus error also happens when getting an I/O error when paging in executable code.
Related signals that are translations of hardware conditions:
SIGILL - illegal instruction (e.g. cpu doesn't recognize opcode)
SIGTRAP - trap ("interrupt") from user space to kernel space (some CPUs have special instructions to do this, typically used to implement system calls; this can indicate e.g. a trap instruction not used by convention for a system call)
SIGIOT - IO trap : error during execution of IO instruction (not all CPUs have such instructions) (Some CPUs only execute IO instructions in "supervisor mode". When a user-mode program tries to execute such instructions, the hardware traps it, switches to supervisor mode, and lets the OS handle it. Some paranoid OS assume the program is up to no good and/or has started executing random data -- so the OS kills that user-mode program. Other OS that need to run legacy applications emulate the instruction and return to the program -- this allows me to run ancient legacy applications that were designed to print by diddling with the parallel port hardware, but now I can tell the OS to print to a network printer or email a PostScript
file. That's more than you needed to know -- sorry.)
SIGFPE - interrupt triggered by floating point unit; typically floating point divide by zero.
See /usr/include/*/signal.h for definitions (not descriptions), e.g. on Linux /usr/include/asm/signal.h
SegmentationFault is the way a static-typed system spells DoesNotUnderstand.
- More specifically, SegmentationFault is a way a type-unsafe language spells DoesNotUnderstand. In a typesafe statically-typed language like Haskell, you shouldn't see segfaults.
- Does Haskell have a "cast" operator? No. What's unsafePerformIO then?
- None of the above. It's the way the hardware spells DoesNotUnderstand. Some languages do their best to prevent this from happening, but a language implementation bug can allow SegmentationFault to occur in programs written in any language whatsoever. It's not so different than divide-by-zero, which hardware also dislikes, and which, again, some languages try hard to prevent. Granted, a bug-free implementation of Haskell should indeed prevent all segfaults (in the absence OS bugs and hardware bugs and cosmic rays etc etc), but consider that a bug-free assembler language program will also avoid segfaults.
- Hmm, the hardware spells TechnicalFailures?, but SegmentationFault is the way the OperatingSystem spells DoesNotUnderstandhandled? (by using the CPUs MemoryManagementUnit?). Hmm? You are saying the same thing as the first part of my bullet point immediately below. Why?
- Even more literally, SegmentationFault is the Unix OS way of representing hardware-DoesNotUnderstand, and can be thrust upon innocent processes--i.e. "kill -SIGSEGV process-id". The point of all of which is that this doesn't truly have anything to do with type safety after all; that's a conflation.
- It should be pointed out that type errors cause UndefinedBehavior, of which SegmentationFault is a common case. Type errors are certainly not guaranteed to cause a seg fault or other processor/OS exception; programmers are lucky when they do. Often, they silently tromp over something, doing damage which wont be observable by the user or by the environment for some time. When type errors do cause segfaults, it's only a (fortunate) coincidence.
- Agreed in spirit, but segv itself is not literally "undefined behavior". I know exactly what the behavior is. And I have been known to catch segvs and continue, under the right circumstances, because of this, so it's more than a NitPick. People in some camps break out in hives at the thought of UndefinedBehavior, but really, there is no such thing; it's a causal universe (even though causality is odd in relativistic and quantum domains).
- Segfaults and their causes are usually well-defined; agreed. Type errors are what I was referring to when I said "undefined"; and they are. Of course, your observation below is point on.
- Things that are called "undefined" are actually more like "the standards committee/vendor ain't gonna guarantee anything on the subject even in the present, let alone how it might change in the future, so you're on your own in that area". Well, what else is new. Systems programmers frequently have to deal with things that are supposedly "undefined". I like static type safety; I don't like "UndefinedBehavior" being treated like campfire stories about the boogeyman coming to torture naughty programmers.
- Of course, when you go from application software (which in many cases, strives towards "portability", even if portability means "runs on both Win2K and WinXP") into the systems domain, many things which are undefined at one layer become well-defined at lower layers. System-level ABIs have to frequently set in stone the things that language-defining standards bodies won't touch. When engaging in systems programming; there are many behaviors which you need to depend upon (and are dependable on a given system), which are "undefined" at the high, portable layer. A common one in C/C++ systems code is, "what happens when I cast an int to a pointer"? Applications programmers should, in general, never do such things (nor should their problem domain ever require it); but it's a natural and normal thing for programmers who work on the bare metal to do. At any rate, "undefined" behavior is bad at any layer; but behavior which is well-specified for a particular layer of the software stack isn't undefined when its use is constrained to where it's defined--the defined-ness of a given construct is context dependent. OTOH, there are things whcih even systems programmers should shy away from--such as uninitialized pointers, which are just as dangerous and useless in the systems programming domain as in the application programming domain.
Whatever happened to the comic strip http://segfault.org/
1. The term "segment" has a very specific use in the world of Intel X86 CPU architecture. X86 machines determine address access by applying two overlapping 16-bit (or larger) registers. The high order register determines the "segment" of memory that is being accessed, offset to the left by four (or more) bits. The low order register is then added to this offset to determine the final address.