Self Modifying Code

A PathologicalCase? of SelfReference. (Hmm, maybe that's stretching the term SR too far?)

Self-modifying code is a piece of software which achieves its goal by rewriting itself as it goes along. It is almost universally ConsideredHarmful now, but it used to be a neat trick in the toolbox of programmers of small or old computers.

But hang about, people work by modifying their own code!! Isn't it going to rather hamstring the development of Artificial Intelligence if we prohibit SelfModifyingCode. -- DavidElsdon?

We're self-modifying, sure, but AreWeCode?


Some older computers (circa 1960) had subroutine returns that worked by changing an address in an instruction. Thus it wasn't a trick but an essential part of the architecture. The architecture was, needless to say, "quaint" by modern standards -- RobertField

It was also used to step through an array, by changing the address part of a memory reference instruction inside of a loop. This was done in the Edsac before it got index registers. The PDP-1 had an instruction for overwriting just the address part of a word in memory from the address part of the accumulator.

Now it is mostly used for fun, or to confuse people.

Usually, the term refers to code which causes the compiled instruction stream for the underlying processor to be mutated; it does not (usually) refer to any of the following:

But DynamicCompilation is SelfModifyingCode - for a suitable definition of "self". Or must the compilation be sufficiently advanced (read: generic)? I think SelfModifyingCode carries the air of an ad-hoc solution, a hack. But a) that need not be so and b) the principle is the same: write an instruction, that is executed by the same executable later (by the way, executable is a ver highlevel concept to use to define such a simple concept).

Examples

Someone suggested a TuringMachine (presumably meaning a UniversalTuringMachine). I don't think this need be an example; the "code" of a TuringMachine is in its state table, which can't be changed. If you use this TuringMachine to emulate another one, I don't think there's any reason why you can't do so without separating "code" and "data".

Some old games used self-modifying code to confuse disassemblers and make it harder to disable copy protection.

It was commonly used as part of program initialization on the BbcMicro to save memory, although strictly speaking there isn't much actual modification to "shuffling yourself down to overwrite the disk filing system storage area". Other tricks included deleting from the BASIC part of the program the assembler code, once it had been assembled to machine code.

The practice was knocked on the head somewhat, at least in the AcornComputer? timeline, mid way through the AcornArchimedes machines. The early ARM chips handled self-modifying machine code without any problems (VonNeumannArchitecture), but the StrongARM upgrade with its HarvardArchitecture broke the assumptions made by many programmers. (argh, memory fog! What was the problem the ARM3 had, caused by caching?)


Someone suggested:

Variables that are set and changed for the purpose of controlling the behavior of code are commonly called flags. FlagsAreSelfModifyingCode.

That is NOT self-modifying code. Flags control which, pre-existing code gets run. They do not modify the pre-existing code. They do control behavior, but it is possible to usefully distinguish running immutable code with varying behavior due to varying inputs vs. direct mutation of the code.

Machine code controls which pre-existing behaviors of the CPU get run. How is changing one different from changing the other?

Huh? Neither of those is necessarily SelfModifyingCode. If you change one or the other, it was clearly not a self modification. For machine-code, SelfModifyingCode would be machine-code running on the CPU that reaches into its own in-memory or on-disk representation and tweaks that representation. And, even in the case of FPGAs and transmeta toolsets where you have a CPU that is modifiable, it generally isn't self modifying.

In many situations commonly accepted as SelfModifyingCode, the machine code being modified is a different part of the program than that doing the modifying.

But still the same program, right? i.e. there is already code in the running program that is reachable and configured to execute to the location being modified. SelfModifyingCode isn't the same as runtime compilation or dynamic link loading.

Still, for the sake of argument, going by your definition, what's the difference between a flag that instructs the program to modify the flag doing the instructing and a machine code instruction that instructs the CPU to modify the machine code instruction doing the instructing?

The answer to that is the same as the answer to: What's the difference between flags and code? Code is for describing and composing behaviors. Flags don't offer the latter capability as, by nature, flags cannot reference one another. Additionally, flags lack the former capability; the only 'behavior' they can describe is whatever primitive behavior is imbued into them: every flag is a new primitive. Flags don't constitute code. Code may be data, but not all data is code.


Game over

The HarvardArchitecture, at least as the term is used here, has separate caches for data and instructions. This means that modifications to code just about to be executed may not yet have been written to main memory, or may be obscured by an older copy in the instruction cache. (The x86 processor is a notable exception to this). In addition, many operating systems disallow execution of user-writeable pages (and conversely disallow user modification of the CodeSegment?); any attempt to create SelfModifyingCode on such a platform results in a GeneralProtectionFault.

Sudden changes like banning self-modifying code make backwards compatibility difficult in some cases. They generally seem to spawn numerous ViciousHacks? to provide the backwards compatibility at some other cost. NamelessConcept: employing horrible kludges in the name of backwards compatibility. All systems I've seen have done it. Win16 --> Win32 is not one I have suffered from directly, but I hear many complain about it. ShoehorningCompatibility?? Backwards contortability? CompatiblityBarnacles??


There are, of course, cases for SMC. I wrote a program several years ago that would run on an 8080/8085 and, naturally, the Z80. There were places in the code, however, where using the Z80's LDDR/LDIR and other high-speed block instructions was a desirable thing, if available.

Part of the program's startup procedure was a bit of sword swallowing to determine which flavor of CPU was in use and, if it was a Z80, patch the jump/call vectors of certain routines.

A similar technique was used to patch the jumps/calls to certain video handling routines based on the kind of display.

The rationale was simple: don't continually evaluate things whose state can be known at the start and which will not change during execution; instead, pre-determine such execution paths and let the thing just run.

Exactly. This can be one way of implementing OnceAndOnlyOnce. My example was the inner loop of a TextEditor search routine. It turned out that forward and reverse searches had inner loops that were identical but for a couple of machine instructions. It ended up saving code and duplication to have a single search routine and copy in the appropriate instructions when the forward or reverse function was called. Of course, this is serious micro-optimization; I only bothered because this was written in AssemblyLanguage already, and it was meant to be a "twee" editor (< 4K). -- IanOsgood


To lower the maintainability cringe-factor, this sort of stuff is only done in extreme cases these day. It's hard enough to debug a normal program, but debugging a self-modifying program can be plain impossible.

A notable exception to this trend, however, is JustInTimeCompilation. The JIT can simply create different code depending on factors that are constant at runtime. While this is different from all-out SMC, it still gets you a lot of the same benefits.


From coding on the Atari 800 up until DOS TSRs, I wrote dozens of self-modifying programs. This was occasionally intentional.


CategoryEvil

EditText of this page (last edited July 18, 2014) or FindPage with title or text search