Separate Io From Calculation

A special case of SeparationOfConcerns.

Problem: Methods are not reusable because they have too much responsibility. In particular, they do IO and they also calculate, so whenever you want to do one of those things, you end up doing both.

Solution: Separate IO and calculation as much as possible. Methods that do only IO or only calculation are considered reusable. Methods that do both are not, because whenever you want one of the two you end up doing both.

Can we have someone finish up the pattern template?

Related patterns: ModelViewController, ModelDelegate


SeparateIoFromCalculation is a rule you can't follow all the time. Somehow, you must glue together I/O and calculation. For better reuse though, having functions that do I/O and functions that do calculation is better.

Example: You have a method that calculates sqrt and prints sqrt at the same time. It is called from several thousand places in your program. Then you need to calculate sqrt but you don't need to print it. Or you need sqrt of the same number several times, but you don't want to calculate the expensive operation several times.

It is better to separate this method into calculateSqrt and printSqrt. printSqrt is so similar to print that probably it is better to just use print. Then calculateSqrt() can be simply called sqrt().
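A minimal Python sketch of that split, for illustration only. The names sqrt_and_print and report are hypothetical; math.sqrt is the standard library function.

```python
import math

def sqrt_and_print(x):
    """Mixed version: you cannot get the root without also printing it."""
    result = math.sqrt(x)
    print(result)
    return result

def report(values):
    """Separated version: the caller decides when to calculate and when to print."""
    roots = [math.sqrt(v) for v in values]  # calculation only, done once per value
    total = sum(roots)                      # results reused without recomputing
    for r in roots:                         # I/O kept in one place
        print(r)
    return total
```

With the separated version, a caller that only needs the total can drop the printing loop without touching the calculation.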


Some of the SeparateIoFromCalculation definitions on this page are naive approaches that only work with data sets that always fit in available memory. They fall apart when you can't read and write all the data at once.

Given the most recent responses above ("then you will have more time to deal with such issues", "rule of thumb"), we may have finally reached violent agreement on the matter - if that truly is the summary.


The above seems to be arguing at a different LevelOfAbstraction. By focusing on a single implementation (reading into memory, then calculating), one may miss that the point is to separate by code module and not by time. The use of ContinuationsAndCoroutines, streams or other forms of LazyEvaluation could work around the memory issues here. -- TylerMac
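One way to picture "separate by module, not by time" is with Python generators, shown here as a rough sketch. The function names are made up for this example; the I/O and the calculation live in different functions, yet only one value is in flight at a time.

```python
def read_numbers(lines):
    """I/O side: lazily yield one parsed number per non-blank line."""
    for line in lines:
        line = line.strip()
        if line:
            yield float(line)

def running_total(numbers):
    """Calculation side: consumes any iterable and keeps only the current sum."""
    total = 0.0
    for n in numbers:
        total += n
        yield total

def write_totals(totals, out):
    """I/O side: write each result as soon as it is available."""
    for t in totals:
        out.write(f"{t}\n")
```

A call such as write_totals(running_total(read_numbers(sys.stdin)), sys.stdout) interleaves reading, calculating and writing at run time, although the code for each is separate.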


In the OO world, a similar separation is applied on methods: Some are mutators and others are inspectors. Creating a mixed method is considered bad practice, since then it becomes harder to use and reuse.

Consider, for example, a class used for creating XML documents. You can add new nodes (addNode) and you can obtain the resulting XML (getXml). If getXml() itself performs the document construction, it performs poorly and probably won't maintain its invariant correctly, unpleasantly surprising its users.
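A small sketch of one reading of this example, in Python. The class XmlBuilder and its flat node list are assumptions made for illustration; escape comes from the standard xml.sax.saxutils module.

```python
from xml.sax.saxutils import escape

class XmlBuilder:
    def __init__(self, root_name):
        self._root_name = root_name
        self._nodes = []  # list of (name, text) pairs

    def add_node(self, name, text=""):
        """Mutator: changes the document and returns nothing."""
        self._nodes.append((name, text))

    def get_xml(self):
        """Inspector: reads the current state and never modifies it."""
        body = "".join(f"<{n}>{escape(t)}</{n}>" for n, t in self._nodes)
        return f"<{self._root_name}>{body}</{self._root_name}>"
```

Because get_xml() changes nothing, calling it twice in a row gives the same answer, which is what callers of an inspector expect.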


This is probably the most familiar abstraction in computer science, right before DataStructures (AbstractDataTypes).

The idea is that I/O must not be performed by the same methods/functions that perform calculations.

The most basic example of this is the original BASIC keywords LET, PRINT and INPUT. Then when you wrote your first structured program in Pascal to calculate the square root of a number, you could functionally decompose it into: input the number, calculate the root of that number, print the result.

Is this a DesignPattern?

There are many ways to decompose the program above:

  1. Write all in one big method.

  2. One method loops, another inputs the number, and another calculates and outputs.

  3. One method loops, another inputs the number and calculates, and finally another outputs.

  4. One method loops, another inputs the number, another calculates, and finally another outputs.

Only number 4 is as correct as possible according to SeparateIoFromCalculation.

Also notice that in number 4, the method that loops still does both I/O and calculation, by calling the others. Number 4 is considered better only because it has fewer methods that break this rule.
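Decomposition number 4 might look roughly like this in Python. All four function names are invented for this sketch, and a blank line or end of input is assumed to end the loop.

```python
import math

def read_number(inp):
    """Input only: return the next number, or None on a blank line or end of input."""
    line = inp.readline()
    return float(line) if line.strip() else None

def square_root(x):
    """Calculation only."""
    return math.sqrt(x)

def write_result(out, value):
    """Output only."""
    out.write(f"{value}\n")

def main_loop(inp, out):
    """Glue: the one method where I/O and calculation meet."""
    while True:
        number = read_number(inp)
        if number is None:
            break
        write_result(out, square_root(number))
```

Only main_loop touches both sides, which matches the observation above that the looping method is the one remaining rule-breaker.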


This page promotes a process decomposition. The example given is also known as a ReadEvalPrintLoop. A variation on this is known as the MasterControlProgram, or MCP, which is cast as the central evil by AlanKayIsTron because over-application of this pattern leads to heavily moded software. See TheEnd. An alternative decomposition is as objects that can read and print themselves, or the spreadsheet that decomposes calculations as a sea of cells that are directly viewed and manipulated.

See NakedObjects.


The problem is that sometimes it is more code to keep them separated.

One shot:

  get x
  for each i in x
    alter i to j
    print j
  end for
Separated:

  get x
  for each i in x
    alter i to j
    store j in y
  end for
  .....
  get y
  for each j in y
    print j
  end for
Can you please change this example so that it actually makes sense?


The code above is the wrong way to separate the code, because it turns one loop into two. At first glance it may look correct, but it runs slower and uses additional storage. Besides, the first function does IO and calculation and the second does the same; they are only separated by a blank line. That is not what I was trying to imply. SeparateIoFromCalculation is about a function or method doing either IO or a calculation, not both.

More code doesn't mean worse code. In the one-shot code above, where IO and calculation are done in one method, the purpose of the method is less clear. When you see "get x" in a line of code, its name doesn't tell you anything. You could change the name from "get x" to "getAndPrint x". But what would you do if you only wanted to print all the values in x? You can't call "get x", because that would alter the contents of x. So what would you do? The approach used in the separated version is not the correct one. The purpose of the original method is to "modify and print", so you should have

 modify x
 for each i in x
      alter i to j
 end for

 print x
 for each i in x
      print i
 end for
I assume the modified i is stored back to corresponding place in x. If not, it means you probably do that calculation just to do the output formatting; that way, the calculation may be ok. Or you could have done

 print x
 for each i in x
       print format i
 end for

 format i
 return altered i
What if we wanted to save the altered thingies, but also print them?

  get x
  for each i in x
    alter i to j
    put j into x replacing i
    print j
  end for

I'm quite unconvinced that 'saving' a value is fundamentally different from I/O. It is just output to memory rather than to the UI. However, saving a temporary, where that temporary is used and discarded as a purely computational side-effect (a side-effect of calculation), would qualify as different. If, however, you're provided an address or reference to someplace to save data, then requesting or sending information from or to that address (whether it be to a printer or elsewhere in memory) is definitely I/O.

SeparateIoFromCalculation, taken fully, means ReferentialTransparency for calculations... since the calculations neither receive input (except to initiate their computation) nor provide output (except to report their result).


The benefit of SeparateIoFromCalculation is that the calculation logic is independent of the IO logic and of the data source. For example, there is a sin(double) function that calculates the sine of a double. What if its signature were changed to sin(File), calculating the sine of a double read from the first 8 bytes of the file? Every caller would then need a file just to compute a sine.
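To make the sin example concrete, here is a hedged Python sketch. The helper names read_double and sin_from_stream are hypothetical, and the on-disk encoding (an 8-byte little-endian IEEE double) is an assumption of this example. The calculation stays math.sin on a float; reading bytes is a separate, thin layer.

```python
import math
import struct

def read_double(stream):
    """I/O only: read one 8-byte little-endian double from a binary stream."""
    data = stream.read(8)
    if len(data) != 8:
        raise EOFError("expected 8 bytes for a double")
    return struct.unpack("<d", data)[0]

def sin_from_stream(stream):
    """Thin glue: the calculation itself is still math.sin(float)."""
    return math.sin(read_double(stream))
```

Callers that already have a number keep calling math.sin directly; only callers that really start from a file pay for the I/O layer.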

Separating IO from calculation makes your code more reusable. Usually the IO/GUI code changes when you are polishing your software (improving the program's looks, supporting a new format). As long as your calculation code is separate from the IO code, you can be sure the calculation stays correct, whether or not you change the IO part.

Surely not all calculation can be separated from IO. For example, how would you write a FileWriter or an ExcelFileFormatReader if you shouldn't be interacting with the file? The point is to keep the IO code in the low-level class. Only that class really needs to know its source of input.

The other alternative would be to write a lot of calculation functions without IO. Then when IO is needed, upper level functions would link them together.

It has been my experience that the first choice (low level objects print themselves) gives you better ObjectOriented systems, in which you really do not care about the IO. All IO is resolved in lower level classes: UI, database, etc. The program is just a bunch of rules. I would say construction rules. For example, to get a doctor you need a person that has a medical "speciality". So you can either pick a "speciality", after which the list of doctors (persons) appears and the user selects in the UI which doctor he/she prefers. Or perhaps you are entering a new doctor, in which case you select the "speciality" and then add the person-related information.

The second choice (upper levels in the system connect the model and the UI) is exactly like StructuredProgramming. This leads to systems that are very hard to change. -- GuillermoSchwarz


I know that by now this page is really a DeadHorse, but I think I'll beat on it a little more.

I was astonished that there was even any debate on separating I/O from calculation, but then it occurred to me that most of us have never programmed in AssemblyLanguage.

To someone coding in assembly, the question, "should I mix the I/O with the calculations" would never be asked: it's absurd on its face. Perhaps the most general thing that is immediately apparent is that, while calculations are essentially abstract concepts, I/O is (in lower level languages) quite married to the platform of the moment. Consider:
  LOAD POINTER REGISTER WITH LOCATION OF BUFFER
  LOAD ACCUMULATOR REGISTER WITH "FETCH" API CODE
  CALL OS API LOCATION  ; OS returns number of bytes in COUNTER REGISTER
  JUMP TO BOTTOM OF LOOP IF COUNTER REGISTER ZERO
  TOP OF LOOP
  LOAD BYTE REGISTER WITH BYTE AT CURRENT POINTER REGISTER
  (do something useful [calculation] with the byte)
  INCREMENT THE POINTER REGISTER
  DECREMENT THE COUNTER REGISTER
  JUMP TO TOP OF LOOP IF COUNTER REGISTER NOT ZERO
  BOTTOM OF LOOP
So, where did the byte come from?

The OS owns the I/O for this routine, and we don't want to have to know how it does that, we just want the bytes to arrive in our buffer.

In those instances where we *must* write an I/O driver to fetch or put data, we certainly don't want the code that does the meaningful calculations tangled up in it!

Now, in a higher level language, since "Boy, bring me another byte" handles I/O as a fairly abstract concept, we may see no real need for separation but, trust me, you want the nuts and bolts of I/O done away from your pure-bred algorithms.

Now, don't get me wrong, I have nothing against I/O routines as a whole - heck, some of my finest code is I/O routines - but I wouldn't want one to marry my data.

-- GarryHamilton


I/O can't be separated from calculations. I/O is just one form of calculation.

I/O is necessary for computation, not calculation. Calculation is transformation of values, and is performed via computation.


A better way to describe this may be to move from this:
 loop {
    x = highRiskRead()
    process(x)
 }
To this:
 loop {
   x = highRiskRead()
   insertIntoStructure(myStruct, x)
 }

 loop {
    x = getNextItem(myStruct, ...)
    process(x)
 }
This can simplify error-handling because other than running out of disk space (assume data structure is cached), we should not have to do heavy error-handling in the second loop because we can just conclude that something is too messed up if there is a problem and suddenly stop without worrying about cleanup. However, the first approach may need complex error-handling to undo half-done processing because we know the IO is full of risk. I find error-handling is just simpler when high-risk processes are separated from low-risk processes. Ideally, all errors should be handled gracefully, but I find that complex error-handling for rare systems-related problems is not worth the code volume and tough to test, so displaying an error message and closing down is often sufficient. For example, I don't put disk-full error-handling on every structure "insert" command because if the disk is full there are probably far worse things to worry about. It is usually not an application's job to monitor disk or cache space, except maybe in life-support systems or bulk-load batches.
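The two-phase idea can be sketched in Python as follows. The function names, the None end-of-input convention and the use of OSError for a failed read are all assumptions of this example. Note that the sketch buffers every item in memory before processing starts.

```python
def load_all(read_item):
    """Risky phase: call read_item() until it returns None, collecting results."""
    items = []
    while True:
        item = read_item()
        if item is None:
            return items
        items.append(item)  # buffers everything in memory

def process_all(items, process):
    """Low-risk phase: pure processing over data already in hand."""
    return [process(x) for x in items]

def run(read_item, process):
    """Glue: a read failure surfaces before any processing has started."""
    try:
        items = load_all(read_item)
    except OSError as err:
        return f"read failed: {err}", []
    return "ok", process_all(items, process)
```

If a read fails, process() has not been called yet, so there is no half-done processing to undo.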

Ideally, a language would have a central handling routine for system errors so that we can give a message and perhaps write to a log if there are problems, but otherwise don't worry about graceful recovery. If task-specific graceful recovery is needed, apply specific handling code to only those critical sections.

It might be true that in resource-sensitive domains separation is too costly, because it tends to require an intermediate buffer structure. It may be a trade-off between developer productivity and machine productivity.

-- top

The 2nd example above is a good example of an "unbounded memory" algorithm. There's no upper limit to the memory it will consume. It might not exhaust virtual memory, but you can't reliably predict how much memory it will use. That means you can't predict how many instances can be hosted on the same machine or what its impact will be on other processes. This kind of algorithm doesn't scale gracefully and should be avoided.

The preferred alternative is to use a bounded memory algorithm that solves the same problem. The 1st example above might be one of those.

 startTransaction
 loop {
    x = highRiskRead()
    process(x)
 }
 endTransaction

 loop {
    startTransaction
    x = highRiskRead()
    process(x)
    endTransaction
 }
Even if you do all of the reads before processing and all of the writes after processing, you still need to do them inside a transaction. It's never OK to leave the system in an inconsistent state, no matter when you do your I/O. If you interleave them, you can (depending on the requirements of the processing and other parts of the system) use smaller transactions that cover single iterations instead of the entire loop.

This is the standard model for steps in a workflow, or requests in a server. Start a transaction, wait for a message, read the message, process the message, send zero or more messages, commit the transaction. This allows you to pipeline operations and distribute I/O & CPU bandwidth between multiple processes.
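As a rough illustration of one transaction per iteration, here is a Python sketch that uses the standard sqlite3 module as a stand-in for whatever transactional store the real system has. The table names, the process rule and the helper names are invented for this example.

```python
import sqlite3

def process(value):
    """Calculation only; rejecting negative input is a made-up rule."""
    if value < 0:
        raise ValueError(f"negative input: {value}")
    return value * 2

def run_per_item(conn, items):
    """One small transaction per iteration: a failure undoes only that item."""
    failed = []
    for item in items:
        try:
            with conn:  # commits on success, rolls back if the body raises
                conn.execute("INSERT INTO seen(value) VALUES (?)", (item,))
                conn.execute("INSERT INTO results(value) VALUES (?)", (process(item),))
        except ValueError:
            failed.append(item)
    return failed

def make_db():
    """Create an in-memory database with the two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE seen(value INTEGER)")
    conn.execute("CREATE TABLE results(value INTEGER)")
    return conn
```

Running run_per_item(make_db(), [1, -2, 3]) leaves rows for 1 and 3 in both tables; the partial write for -2 is rolled back, so the store stays consistent even though I/O and processing are interleaved.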

The goal is not to outright eliminate transactions and I/O. It is to simplify them and/or "bulkify" them in order to reduce the need for complex escape conditions and error handling. It simplifies the portion of the code that does the I/O. Of course, there are always exceptions to rules of thumb.

    sub generic_source_loop(&risky, :&onErr:($)) {
        loop {
            try {
                risky();
                CATCH {
                    onErr($!);
                }
            }
        }
    }
    sub do_calculations(*@input) {
        map {process $^x} @input;
    }
    do_calculations generic_source_loop { highRiskRead() };
Tadaa!


In ObjectOrientation, this is about separating MutatorMethods from InspectorMethods, CommandQuerySeparation.


I've encountered a situation where SeparateIoFromCalculation and data-driven-programming (perhaps a variation of TableOrientedProgramming) seemed to reduce the need for HOFs (HigherOrderFunctions). I had this domain-specific report presentation engine. It provided formatting and did optional row and column totals. However, each report cell needs custom formatting for different usages (instances) of the report engine. Either I copy-and-paste the report engine for each instance to customize it, or I find a nice way to apply situation-specific formatting rules to the cells. A bunch of domain factors could be involved in how a cell is formatted. One approach I started with went something like this:

  function reportEngine(dataStruct, cellDisplayHOF, ...) {
     ...
     while (R = rowLoop(...)) {
        while (C = columnLoop(...)) {
           ...
           cellDisplayHOF(dataStruct[R,C].value, factorA, factorB, factorC, etc...)
           ...
        }  // next C
     }  // next R
     ...
  }
The problem is that the report engine had to track and pass all the situational factors that could affect cell formatting, such as cell colors, alignment, value formatting with things like "$123,456.00" for some dollar amounts, etc. The situational logic would go something like, "If this is in region X and is one of the special product categories specified by person Y after date Z, then make the cell red".

Some factors that affected formatting came from parameters passed to "reportEngine", and some were attached to the data structure because they were more cell-specific. Different situations would use different factors such that for consistency I had to assume the widest number of factors. I was shuffling around too much info through the report engine. It was a middle-man that shouldn't have to care about all this situation-specific info.

I decided that a better solution would be a data-driven one that separated the format calculation from display. A structure/table like this was more useful:

  structure: reportCells
  ---------------
  rowID
  columnID
  rawValue  // needed for totals calcs
  cellFormatString  // TD attributes
  displayValue  // may have formatting, such as commas in big numbers.
One routine generates the reportCells table/structure, and the other simply displays it. The display routine no longer has to care about domain factors that may affect the display. Here is a rough idea of how the display portion works:

  function displayReport(reportCells, ...) {
    ...
    while (R = rowLoop(...)) {
      print("{tr}");  // HTML row start
      while (C = columnLoop(...)) {
         print("{td %1}", reportCells[R,C].cellFormatString);
         print("%1{/td}", reportCells[R,C].displayValue);
      }  // next C
      print("{/tr}");  // HTML row end
    } // next R
    ...
  }
  // (Braces used in HTML tags instead of angle-brackets to not confuse wiki.)
The "computation" of the cell formatting no longer has to pass through the display portion. It is already done in a prior step. (This is an oversimplification, but gives an idea of what is going on. For one, the data structure may have had other columns used in other calculations and stuff, but it didn't matter because the display routine ignored extra columns.)
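A compact Python sketch of the same two-step idea follows. The ReportCell fields mirror the reportCells structure above; the formatting rule (negative values get a CSS class) and the function names are invented for illustration and are not the original engine's logic.

```python
from dataclasses import dataclass

@dataclass
class ReportCell:
    row_id: int
    column_id: int
    raw_value: float    # kept for totals calculations
    cell_format: str    # TD attributes
    display_value: str

def build_cells(rows):
    """Calculation: decide formatting up front (hypothetical rule)."""
    cells = []
    for r, row in enumerate(rows):
        for c, value in enumerate(row):
            fmt = 'class="negative"' if value < 0 else ""
            cells.append(ReportCell(r, c, value, fmt, f"${value:,.2f}"))
    return cells

def render_html(cells, out):
    """Display: writes what it is given and knows nothing about domain rules."""
    current_row = None
    for cell in sorted(cells, key=lambda k: (k.row_id, k.column_id)):
        if cell.row_id != current_row:
            if current_row is not None:
                out.write("</tr>\n")
            out.write("<tr>")
            current_row = cell.row_id
        attrs = f" {cell.cell_format}" if cell.cell_format else ""
        out.write(f"<td{attrs}>{cell.display_value}</td>")
    if current_row is not None:
        out.write("</tr>\n")
```

New formatting rules change only build_cells; render_html stays the same for every usage of the report.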

The downside is that some of the looping structure is duplicated in the format calculator and the display portion for many usages (but not all because the order of calculation depends on usage-specific issues). But it is worth it.

In this case, use of data-driven programming and SeparateIoFromCalculation avoided the need for HOFs, and simplified the app.


See also SeparateDomainFromPresentation, AvoidExceptionsWheneverPossible, ResultSetSizeIssues
CategoryInfoPackaging

Last edited February 18, 2011