Data Structure Centric View Discussion

This is to discuss a viewpoint that "everything is a data structure" way of looking at things.

It is motivated by issues raised under TableOrientedProgramming and ObjectsAreDictionaries.

Note that viewing everything as a DataStructure is not necessarily the same as wanting to work with everything as a direct DataStructure (as opposed to code). It is simply a conceptual exercise for development, theory, and discussion.


(Abbreviation: DSCV = Data Structure Centric View)

Most software can be viewed as merely data structures, whether you agree with the value of doing such or not. Compilers and interpreters often convert code into some kind of data structure in order to process it. (Not that its particular format is necessarily best for humans.)

If this is the case, why not take this approach instead of the code-centric approach that ObjectOrientedProgramming seems to take. In other words, code-as-data-structure versus data-structure-as-code. While I don't aim to have zero code, I think OO goes too far in the code direction. DSCV-based paradigms (like TableOrientedProgramming) tries to minimize dependency on code and complex code structure. Under these, code tends to be small and local "snippets" that solve particular tasks or handle particular events.

Not only is my own thinking clearer with a DSCV, but one can use query or query-like languages to inspect data structures from different perspectives. This is something that is not found (naturally) with code. IOW, I want to be able to issue SQL-like queries on my software structures and contents to give me custom views that hilite what I want to study and hide away what I don't.

In theory, a better view of the software reduces maintenance effort and reduces errors caused by not being able to see different aspects of the software.

For a thought experiment, assume your favorite piece of software is turned into a large data structure(s). For the sake of discussion, lets assume OOP software that is converted to something similar to what is in ObjectsAreDictionaries. One could not say it is "worse" for being like that, because it does and is the same thing. Only its representation has changed. (Lets ignore speed issues for the moment. Note that it is necessarily faster or slower, but speed issues would just mucks up the discussion at this point IMO.)


Re: Most software can be viewed as merely data structures

I don't understand. How can executable code be viewed as a data structure? There is a high degree of coupling between data and methods, but that does not make them synonymous.

That is a lower level than I like to focus on, but RAM can be viewed as one big array. The level I like to focus on would be more like the diagrams in chapter 8 of Bertrand Meyer's "Object Oriented Software Construction, 2nd.Ed". Those are the kind of mental structures that run in my mind when looking at OOP code. It shows objects as boxes with sub-boxes holding data or pointers or references to other objects or classes (a dynamically typed language would be a bit different). I find the relational approach "cleaner" in that one is dealing with a higher-order math in a sense. I cannot quantify "high-order", it is just a feeling at this point. I am working on trying to describe that feeling better, but it is tough because it is at the very edge of my articulation skills. It might all be subjective, but I can't tell yet.


If one agrees with the above, then a second issue arises: Which kind of data structure? Common candidates include the HierarchicalDatabase, NetworkDatabase, and RelationalDatabase. The relational approach is superior in my opinion. OO tends to resemble the first two.


The LispLanguage is often regarded as the first attempt to meld or blur the distinction between data and code in a high-level language. Thus, the concept itself has a lot of followers. The biggest remaining issue is what kind of data structure philosophy to base it all around. Concepts like TableOrientedProgramming can be viewed as extending the LISP list concept into a relational view of "structures". The network model (NetworkDatabase) and tree-based models (HierarchicalDatabase) are other contenders.

It is also one thing that keeps people away from lisp, because some people don't like blurring the distinction. Many skeptical people would start scratching their head when something is muddying the water instead of clarifying.

"code-centric approach that ObjectOrientedProgramming seems to take"

I don't agree with your basic strawman proposal. Object oriented programming combines code and data into a unit. The argument is not data-centric versus code-centric, but rather "data without code and code without data" centric (data structures and procedures) versus "data with code" centric (classes).

I am not sure what you mean here. There are basically these possible "repositories":

  A. Textual Code
  B. Database (or shared "structured" storage of some sort)

1. Algorithm in A, Data in B 2. Algorithm and Data in A 3. Algorithm and Data in B 4. Algorithm in B, Data in A (probably rare except for xfers and conversion)

(There are more if we copy to both, but that is often a violation of OnceAndOnlyOnce IMO)

Which combination are you focusing on?

I'm really not sure where you are going with this, but I think the proper representation is:

 * Algorithm and Data in A, Algorithm and Data in B

I am not sure what you mean by "repository" and I assume you are referring to the operating environment. "A" (Textual Code) runs under an computer operating system (such as UNIX, WindowsNT, or [stretching the definition a bit] IIS or a Java Virtual Machine), while "B" (database) runs under a database management system (such as Oracle 9i). "B" tends to be data structure + procedural code, while "A" can be either data structure + procedural code or object oriented classes.

Think of the difference between managing a grid, list, or table of information via a grid widget (spreadsheet-like) versus managing it via ASCII HTML tags (LI, Table, TR, TD, etc.). Most OOP code resembles the latter in "feel" IMO. It is usually stretched out in a linear fashion and has a lot of "packaging" surrounding each item of information. Viewing it as a data structure, such as table grids and/or outline trees, makes it visually easier to navigate, see patterns, and apply "structure transformation operations" to alter your view of it in an ad-hoc or temporary way in order to study or work with it from different perspectives.


What I usually see in practice (and this seems to be relatively successful) is groups of classes bound together via passed data structures. You can see this either inside a program, often when tying GUI classes to business rule classes, or between programs via pipes, sockets, MQ messages, XML, or whatever. Data-centric or class-centric views are equally valid and it merely depends upon what details you are interested in at the time.

Pipes and sockets for GUI's? Do you mean client/server like?

[Aside: I intended the above to imply pipes, sockets, etc. are used between programs not for GUI classes within a program.]

Most GUI to business logic coupling is done by exposing the internal class data values and copying them from one to the other. The pure OO approach would be to combine a GUI and business class into one in order to encapsulate and share the data.

I disagree. OO means Object Oriented. That means that you make objects out of things. That means that each different class is an object ... a hammer is a business tool. It does something, namely hit things. Your eyes are a visual tool. They see things. A TV is also a visual thing. It displays something for your eyes to see. You could consider a board with nails waiting to be hammered a data repository. As you can see with my oversimplified example, the pure OO approach would never combine a GUI and business class into one. Instead, you'd have many different business classes doing work (hammers, saws, and drills, for example) on data classes (boards, nails, and screws), the result of which is being transmitted to and displayed by a GUI class (the TV showing you the home improvement show). Thus, the object oriented approach uses various different thingys with lines connecting them, as described above. -- Dan Cook

There is no consensus definition of "object" nor OOP. You will get a different story from each practitioner. Your definition seems to be the "self-handling noun" version, which is close to Simula-67's roots.


Here is an attempt to consolidate my reasoning for preferring a data-structure-centric view and TableOrientedProgramming.

1. The program structure can be represented as a data structure. (Or at least a good portion of it.)

2. Having data structures allows us to sift through information and alter our viewpoint in an ad-hoc fashion to improve our "work surface" based on current needs. (This is done by query languages and data structure viewers/browsers/navigators, etc.)

3. Programming code representations generally don't have the benefits of #2.

4. The above imply that programming code is best represented as data structures. (Or, at least a good portion of it.)

5. There are generally two competing general-purpose data structures in common use: NetworkDatabase-like (a web of dictionaries with pointers linking them), and Relational.

6. Out of the two listed in #5, relational has the best theory, query languages, and presentation techniques surrounding it (OoLacksMathArgument).

7. If we put #4 and #6 together, the implication is something like TableOrientedProgramming.

Feel free to poke holes in this line of reasoning.


Re: The program structure can be represented as a data structure

I am not sure what this means. Could you elaborate on how a program structure can be represented as a data structure?

Well, see ObjectsAreDictionaries for an example of how objects and/or classes might be represented as a DataStructure. EntityRelationshipDiagrams could be represented as a data structure. A typical subroutine using structured programming could be divided up into a tree to reflect the block structure. Chapter 8 of ObjectOrientedSoftwareConstruction 2 (see above) has some graphical examples of object representations. (This is not necessarily saying that it all should be represented as such, but merely that it can.)

See: AbstractSyntaxTree

I have read through the references and I still don't understand. How would I represent a many to one relationship in an ERD as a data structure? Doesn't a data structure only allow a one to one relationship between members (although, I guess, a zero to one relationship could be implied). Also, I am getting confused as you seem to be jumping around between "program structure," "objects and/or classes," and "EntityRelationship? diagrams." I would not consider those to be synonomous. A little further clarification would really help me in understanding your point.

I suppose this pivots on the definition of DataStructure. Note that I consider a relational structure to be a DataStructure also.

(Layout example moved to EntityRelationshipDiagram)


This is a little tutorial to illustrate how data-structuralists (DS-ists) tend to think about things. I will base this on some common OOP concepts to server as a comparative guide. Note that other DS fans may have slightly differing views. Please chime in if you have a different take.

Encapsulation

Encapsulation has multiple meanings in it. One aspect is physically grouping related operations together. A DS-ist does not "see" things as grouped physically together. Things being together is an "operation" or "query" or "view" of or on a data structure. You apply a formula or criteria to generate a view of what you want to see together at any given moment. In my observation one's desired view of a given thing is highly situational and relative. In this view, DS-ist philosophy helps bring about as-needed views of things. In other words, encapsulations can overlap. Things being "related" is often multi-dimensional. But, printouts and text tend to be one-dimensional. Thus, only one relatedness factor can be emphasized at any given time (per viewer).

(There may be absolutionist DS-ists out there who think that one view or grouping should always serve as the primary view {see PrimaryNoun}, but I have not met any. Thus, I will assume a relativistic emphasis.)

Another aspect of encapsulation is "protection of innards". I will divide this into state integrity and user permissions. State integrity is ensuring that state (data) will fit certain criteria. One technique for enforcing integrity is the GateKeeper approach--have only one access point so that any changes to state have to go through a pre-defined gate guard (interface). as long as the gate-keeper checks everything that comes in, any item or change that passes through is sure to have passed the inspection. Another approach is "update triggers". Whenever the state is about to change, an algorithm is executed that inspects the change request to see if it is valid. DS-ists tend to use or prefer the trigger approach. However, the two may really be different ways of looking at the same thing.

User permissions is the management of who or what is allowed to access something. (It can be further divided into types of access, such as read-only, add-only, etc.) There are various ways to enforce user-permissions. Scope modifiers are one approach. For example, some programming languages will have scope modifiers such as Local, Protected, Friend, etc. The problem with these is that they don't provide much discrimination to the outside world. IOW, they are all-or-nothing more or less. A more flexible approach is an AccessControlList (ACL). ACL's are a natural fit for data structures. However, they can be more complicated to manage. But that is often the price for flexibility.

Inheritance

The biggest problem a DS-ist has with inheritance is that as a tree structure, it is one of many possible data structures (views) to apply to something. If you need a tree data structure, that is fine. However, if the same information needs to participate in some other structure (view), then that should not be hindered. Being an IS-A relationship under one DS view should not prevent nor hinder it from being part of an HAS-A relationship in another situation.

Inheritance is basically a "look-up" strategy. It is a kind of search-path that is used in order to find a sought-after item (data or algorithm). If node A does not have the item, then we check node A's "parent list" for other nodes to go searching in. A tree-shaped look-up strategy is considered no more or less important than any other look-up methodology. A DS-ist will wish to be able to switch or alter a look-up strategy with relative ease. Thus, the tree should be virtual or ad-hoc.

Polymorphism

As in inheritance (above), a DS-ist views polymorphism as just another look-up strategy. Nodes are simply searched to find a unique match under polymorphism. Polymorphism often does this look-up based on the "type" of the node. If later we want to alter the look-up to be based on some other criteria besides or in addition to "type", it should be easy to change that.

We may also want to change how many "matches" we get back. For example, rather than just finding the one match usually assumed for polymorphism, we may want to return several candidates and either operate on them, or apply further criteria to narrow down which one gets selected. For example, a GUI event may be "observed" by multiple listeners, not just one. The order that the listeners are notified may be based on a ranking field. We may start out with a simple look-up (dispatch), but it may grow more complicated over time. We want to avoid ContinuityPrinciple problems.

(I personally think that "type" is too rigid a concept. The "type" of something is often relative to the user or needs. If you are talking about a "type of person" or a "type of car", then it is usually within a specific context, rather than a universal attribute or taxonomy. The feature or field that serves as the "type" indicator can vary widely. Most attempts at universal taxonomies fail in practice unless trivial.)

As you can see, A DS-ist tends to see OOP's view of these concepts as rather rigid. How something is grouped or how things are searched (dispatched) is rather open-ended in practice. Giving special names to a few select approaches, such as Polymorphism, Inheritance, etc., tends to put blinders on one WRT the myriad ways to "surf" our structures. It is like designating 6-wheeled vehicles as sacred, but ignoring or downplaying the usage of 4, 8, 9, or 18 wheels.


Pondering: MergeMe with DataCentricThinking ?


See Also: SparseColumns, TableOrientedProgramming, CodeAvoidance

EditText of this page (last edited February 6, 2012) or FindPage with title or text search