Procedural Methodologies

Discussion of various procedural methodologies and techniques


I find it best to divide up procedural software into "tasks" or "events". Each task reads and sometimes changes the "state" from/to the database. It is a variation of the old "input-->process-->output" paradigm, except that a good portion of the i/o is with the database. The database becomes the backbone, not the code. The tasks are like cows feeding from (and to, yuk) a trough (troff?), which is the database. (See BigSoupOfClasses)

There might be some "master" tasks which dispatch to the tasks. In larger systems, the database is often used as a guide to this dispatching. For example, a table may store menu and/or UI event items along with the name of the script (filename) to run upon that menu or event. (This is a little harder to do in static languages, but is still possible using big switch/case statements to translate from a name to a routine. I suggest it be auto-generated from a compile script if such approach is used.) Related: EvalVsPolymorphism.

In my observation, the size and complexity of these tasks stays roughly the same regardless of the application or database size, at least for interactive systems.

But as the system gets larger, it is the *interaction* between the tasks that will kill you. The impact of any change in any task cannot be assessed without going through ALL other tasks, and no one can keep all that in his head, unless the tasks are grouped into modules with associated tables which other tasks cannot touch (but that breaks the simple "input-->process-->output" paradigm). "Gee, what if taskX puts a '2' in this row here instead of '1'?" "Uh... taskY will barf because it only expects '0' or '1'...)".

I don't see how it is any worse than OOP. The "flapping butterfly" thing will apply to anything. Do you want a log of which procedure made the data change in the DB? I need more specifics about what is bothering you.

The simple "input-->process-->output" (IPO for short) paradigm means that the DB is basically a big global variables store. By interactions I mean Non-locality, No Access Control or Constraint Checking, Implicit coupling, see GlobalVariablesAreBad

How is every task having potential access to the DB more evil than every class having potential access to every other class? In other words, talking to a People table or a People class is not that much different. They are a protocol of sorts. It is not "variables" which are global, but *protocols*, in both cases (DB and classes). DatabaseNotMoreGlobalThanClasses

in short, the IPO paradigm means every task is couple with every other tasks. In addition, since the data in the DB are not just some random bits, but represent some reality, there are rules about them that is *implicit* among *all* of the tasks, but is not spelled out anywhere. If you spell out the assumptions as DB checks/triggers, you have just rediscovered one reason of putting code and data together in OOP.

E.g., if a numeric column represents AGE, all tasks will probably assume the values will be positive. But what if the age is not known? Is it okay to put a NULL there? You wouldn't know unless you have all the details of the all tasks in your head, which will get impossible if the system has 20, 30 or more tasks. Any tasks that do a simple JOIN or any comparison will fail, such as counting records by age groups.

Sorry, but I find this example murky. Whether it is okay or not to put a Null there is part of the table interface (schema). One needs to know that just like they need to know a class interface. IOW, "ask the interface description" in both cases.

What's more, any tasks that takes the AGE, do something and put it back to some other table will spread the problem to other tables. So if you have a production system using this IPO paradigm, any bad/unexpected data may contaminate any number of records until someone notices the problem. Even when the problem is noticed, you got to have the production data to find out why, because the chance of you putting in the same bad data to reproduce the problem is practically zero.

And no, this does not apply to OOP because you can spell out any such assumptions in the Person objects representing such entity, in fact, the possibility of unknown age will be evident when you try to write the constructors of the Person class.

Triggers and referential integrity rules can do such validation. Plus, such validation applies to *any* language that uses the DB. If you do it with OO classes, then most likely only the language written in the same language as the constructors will be able to use it. IOW, it is a wider GateKeeper. Sure, you could perhaps have something like SOAP, but that is very similar to a "remote procedure call".

(BTW, this seems to be drifting toward the topic of FlawsOfProceduralDiscussion? or something rather than just descriptions of methodologies.)

Is it more appropriate to call this IPO a paradigm rather than a "methodology"?

I don't know. The borders between the two are blurry.


How is every task having potential access to the DB more evil than every class having potential access to every other class? In other words, talking to a People table or a People class is not that much different. They are a protocol of sorts. It is not "variables" which are global, but *protocols*, in both cases (DB and classes). ...[and later]... Sorry, but I find this example murky. Whether it is okay or not to put a Null there is part of the table interface (schema). One needs to know that just like they need to know a class interface. IOW, "ask the interface description" in both cases. ...[and then]... Triggers and referential integrity rules can do such validation. Plus, such validation applies to *any* language that uses the DB.

Either you are putting logic and constraints into the database or you are not. The initial description: "Each task reads and sometimes changes the "state" from/to the database. It is a variation of the old "input-->process-->output" paradigm, except that a good portion of the i/o is with the database" suggest that you should not, otherwise the it is no longer simply "input-->process-->output". If the logic is not in the database, you cannot "talk" to the People table as you could with a class, because the People table is just a data-store."

If you *do* put the logic and constraints in the DB, you are just putting code and data together as in OOP, but in the DB instead of the usual application program. So this ProceduralMethodologies is just doing OOP with the database, or is there more?

Well, I suppose this gets into the messy issue of what the precise definition of OOP is. It might be warrented to see if database constraints/triggers pre-date OOP and/or if they were motivated by OOP or created independently. Related is FileSystemAlternatives.

Let's ignore chronology for a moment, but concentrate on the difference between this ProceduralMethodologies and the usual OOP approach. What are the difference? From the description at the top, it seems that in ProceduralMethodologies, you should have:

But further down, it becomes:

Which is then similar to OOP with one class monopolizing all access to DB (which is how some OOP systems are done), and with a BigSoupOfClasses each representing a "task". What is the difference? Is the DB dumb or intelligent? Is the business rules in the "tasks" or in the DB?

Suppose it was "dumb". We could make functions that are the "official" access to DB entities (usuall for updating). Same role as a class. But if we consider the "smart" approach, I don't think that qualifies as OO. A BigSoupOfTasks? is usually easier to navigate than a BigSoupOfClasses because first, they are all tasks and only tasks. You know what they are. And second, often there is a table(s) or "dispatching" master task(s) that can be used to navigate to them. IOW, a kind of "task catalog". You can often easily navigate directly from the UI to the proper task. The relationship between UI's and tasks/events is usually cleaner and simpler and more consistent than between UI and OO classes IMO. There is usually a "backbone" which dispatches directly to tasks in procedural. That backbone is either a dispatching task, or an event and/or menu table(s). That backbone serves as a pretty good map to the tasks.

Regarding whether this is different from OO, OO tends to blur the distinction between the database and the application (see ReinventingTheDatabaseInApplication). Database tend to encourage one to divide stuff between shared concepts and application-specific concepts. OO tends to ignore the distinction or grow it organically. Second, is the "table shape" of relational that OO does not have to follow (see TablesAndObjectsAreTooDifferent and OoLacksMathArgument).


Re: "Is the business rules in the "tasks" or in the DB?"

It does not have to be mutually-exclusive. See FileSystemAlternatives.

I suppose "business rule" is kind of a vague term also. If the Customer table contains a code(s) indicating whether the best way to contact him/her is via email, phone, and/or paper letter, isn't that a "business rule"?

If it is just a character like 'P', 'M', 'E', then no. If there is another table mapping 'P' = phone, 'M' = mail, 'E' = email, and it gets to affect how the system works (i.e. shown to sales people on screen, or triggers the email system) then yes. The reason is if you only have the character 'P' in the DB, some task will have to know that 'P' = phone. OTOH, if there is another table or stored procedure storing 'P'=phone, any task just need to join that table or invoke that procedure and display/pass the result to other system.

The most important thing is the effect on the tasks. If you only have 'P' in the DB (i.e. business rules in tasks), you can change one task put 'T' in the DB meaning "telegram", BUT you have to update all other tasks that expects only P/M/E to understand T properly.

I find that a rather minor distinction to pivot the definition off of. Repeating the description rather than factor it to one spot might be bad factoring, but it does not really change the nature of what is going on. Related: ConstantTable.


Would it be correct to say that Procedural Methodologies group methods based on cohesion of functionality, while Object Oriented Methodologies group methods based on cohesion of data? For example, in a Procedural approach, one would start with a base Initialize() method and add lower level Initialize methods roughly in the chronological calling sequence. In an Object Oriented approach, one would have multiple objects each with an Initialize or Constructor method. In the Object Oriented approach, the chronological order is hidden, while the interactions between the Initialize method and other methods on common data is made obvious.

Some might consider that "scattered" rather than "hidden". What is the measure for "hidden" anyhow? One could argue that the sequence is easier to inspect if it is together. Relational can also make it easier to query by chronology or whatever factor you want to look at if you control that with data (see CodeAvoidance.) OO still has to deal with chronology, whether you can readily see it or not. Perhaps a sample application would help resolve this.


Re: "How do you share context between ever smaller tasks?"

I prefer subroutines/functions that have the option of "inheriting" the variable scope from the parent. (Support for it in most languages is often poor.) This is *not* good for general libraries or highly-shared subroutines, but is very helpful for breaking up specific tasks into sub-tasks. Unlike some others, I don't believe in trying to make every subroutine try to be globally "generic" by using only parameters for passing context. In other words, "leaky" scope is fine for nearby task-specific stuff. I generally divide up my designs into "tasks". Each task is usually composed of one "starting point" routine (similar to "main" in some languages), and the rest of the routines in the task are subservient to the main routine. Thus, the code tends to look like:

  ----Task A----
    mainRoutineForTaskA(...){...}
    supportingRoutine1(...){...}
    supportingRoutine2(...){...}
    supportingRoutine3(...){...}
    ....
  ----end Task A----
  ----Task B----
    mainRoutineForTaskB(...){...}
    supportingRoutine1(...){...}
    supportingRoutine2(...){...}
    supportingRoutine3(...){...}
    ....
  ----end Task B----
  ----Task C----
    mainRoutineForTaskC(...){...}
    supportingRoutine1(...){...}
    supportingRoutine2(...){...}
    supportingRoutine3(...){...}
    ....
  ----end Task C----
  ...etc...

Ideally, the supporting routines are invisible to other tasks to reduce name collisions. However, occasionally we do want to make them visible. Thus, such hiding should probably be an option per routine instead of forced by the language. We thus have two somewhat orthogal "settings" for a given routine:

--top--

So how do you share context between the supporting routines? If it isn't global data and it isn't passed as arguments, I don't know where you are keeping it. (BTW, your design looks like most of my classes, with the only difference being that I can group tasks together to share a common context at a scope smaller than global.)

"Regional" variables. In otherwords, module-level. Many languages use regional variables instead of caller-scope to achieve more or less the same thing.

Your modules sound a lot like classes. What languages are these that use "regional variables"?

Most "scripting" languages use them. Perl for example. In most of them, variables declared in the "main" section become regional whether you want them to or not. In Pascal you make them by nesting routines. Maybe your OO is not really OO after all.

It doesn't matter if my OO is "real OO". It sounds like the only difference between your code and my code is that I can create distinct instances of these regional contexts instead of reusing a single instance all the time.

What does that get you? DeltaIsolation does not work very long.

It has nothing to do with DeltaIsolation. It lets me create as many regional contexts as I need.

Why do you need it? What is an example?

Why do I need multiple regional contexts? To build software. If I want 10 foobars, I get 10 foobars. Why would I settle for one foobar? And my regional contexts don't accidentally colide with other regional contexts.

Usually there is a data-centric solution to such. And, they don't suffer from PolymorphismLimits. Is InternationalUiExample related to "regional contexts"?


Procedural Design Rules of Thumb

A more extensive list can be found at TwentyFiveOrSoRulesToBuildSoftwareThatWorksAndWhichIsEasyToMaintain (pending a better topic name).


Why does this page have so much about databases on it?

Because TopMind was here. Everywhere he goes he talks about databases, regardless of context. I think he gets a penny for every database sold.

PageAnchor: nested_reports

There are many problem domains that do not involve interacting with databases. How do the above suggestions apply to scientific computing, signal processing, or interactive graphics, just to pick three?

My domain is custom business applications. I cannot speak for all domains. There may be some related discussion in AreRdbmsSlow. If you wish to propose a topic name change or add your own observations about procedural methodologies in other domains, be my guest. -- top


Noun Indexing

One of the complaints against task-oriented grouping of code is that the associations with entities are not included (contrast with ResponsibilityDrivenDesign). I agree that such association can be a nice thing to have when searching or exploring code. However, there are two problems I find with the tight associations found in other paradigms:

Thus, although I agree that indexing by noun/entity is useful, it should be "soft" rather than hard-wired or imposing itself heavily into the code. Here are some suggestions for "gently" providing such information into a code repository:

             // entities: customers, orders, products

--top


In ProceduralProgramming, usually there is some kind of interface module or table that dispatches the "tasks" based on user input or other "events". For example, in LegacyCode it might like like:

  select on userOption
  case 'D' 
    deleteSalesOrder()
  case 'A'
    addSalesOrder()
  case 'V'
    viewSaleseOrder()
  ....
  case 'H',' '
    salesOrderHelp()
  end select

In larger systems, a ControlTable may be used for such. In GUI systems the "tasks" generally correspond to events. The code for dispatching is usually hidden from the application developer, but generally would look similar to the above or a ControlTable.

Batch setups are even simpler. You generally have something like:

   stepOne()
   if (condition....) {
      stepTwo()
   }
   if (condition....) {
      stepThree()
   }
   if (condition....) {
      stepFour()
   }
   .....
   cleanUp()

Note that I prefer to have the NounModel mostly in the database. Thus, my code is generally not shaped-by nor grouped by entities or noun taxonomies. Maybe some ProceduralProgramming or FunctionalProgramming fans do such, but I am assuming otherwise here. Well, I take that back: I might group some related task modules by entity if there is an opportunity, but this depends on the language and environment.

In short, there is a usually a "backbone" of some sort that dispatches the tasks (or event handlers). This backbone is fairly consistent and usually closely tied to the user interface so that it is fairly easy to map from user requests to tasks. Most user requests are described in terms of interfaces.


Re: if a numeric column represents AGE, all tasks will probably assume the values will be positive. But what if the age is not known? Is it okay to put a NULL there?

(Probably should move the below text to a new page, but discussing it here for now)

A practical option for eliminating the NULL in a column with numbers would be to use negative values, assuming your column supports negative numbers (you can change the type specification of the column if it currently only supports positive values).

Take an example: the AGE column was input via a data entry person, who was reading the AGE off some forms that were written in pencil or pen by customers. If the "Customer Did Not Provide Age" then put in -1 into the AGE column. "Cannot Read Age Due To Messy Handwriting" would be -2, and "Customer Did Not Wish to Disclose Age" is -3. The negative values act as an enumeration (-1, -2, -3, ...) and can be expanded when needed. This way, when anyone queries the database for bad AGE data, he can find the AGE data factually. One can query all the positive values of AGE and find legitimate ages. One can find all unknown AGE's by searching for negative values. In fact, it is no longer unknown - you know WHY the age is not entererd because now you have facts about why the age wasn't entered.

Using negative values, one can create another table that maps these negative values to their descriptions - then one can query this table and find out info about what the negative values mean. This is much much more factual and helpful than NULL.

I suspect this has to do with why Codd wanted to offer several null values later on in his research rather than a single NULL - because he wanted more descriptive "nulls" that had more meaning. Unfortunately, decisions about NULL'S and what to do with them are one of the biggest problems in databases. Deciding how to avoid NULL's elegantly is extremely tough - and no, I do not find Date and Darwen's complex "How to Handle Missing Information" solutions really elegant or convenient, although at least they attempt to resolve the NULL problem.

Possibly some would argue that using negative values as suggested above, is kind of like supporting several different types of nulls. Not so, because negative values can be queried and strictly specified in another table. This is much more specific and factual than a NULL or multi option NULL.

A problem arises when negative values already exist in the data column. No longer can negative values be used, then, for errors or missing information. This is a similar problem to error return codes from function, and the decisions that have to be made when returning errors in API's (and no, throwing exceptions and killing the program is not the most elegant solution either).

A problem arises with string columns, or columns of a type that do not support negative values.

Maybe databases need to support a special placeholder value for errors and missing information, which references another table that one can extend. However, BOOLEAN values should always be BOOLEAN.. without any choice of NULL or any special value for missing information.

Is this bringing back three valued logic? No: because booleans remain only TRUE and FALSE.. while other columns that aren't booleans can contain error values that are factually represented via negative numbers, or some other special representation such as ~~Error1, ~~Error2, ~~Error3, where these errors reference another table with key 1,2,3 referencing descriptions of these errors.

There was a big discussion a few weeks ago concerning complex values along with error messages or indicators, but I cannot remember where it is right now. I'll link back to it if/when I encounter it again. --top


See Also: TableOrientedProgramming, BigSoupOfClasses, EventDrivenProgramming, RelationalVsXp, QueryAndLoop, FundamentalFlawsInProceduralDesigns, StepwiseRefinement
DecemberZeroSeven

CategoryModellingLawsAndPrinciples CategoryMethodology

EditText of this page (last edited February 20, 2010) or FindPage with title or text search