Abstract Data Type

A data type can be considered abstract when it is defined in terms of operations on it, and its implementation is hidden (so that we can always replace one implementation with another for, e.g., efficiency reasons, and this will not interfere with anything in the program). Thus, speaking about such a type, we leave its implementation aside considering it irrelevant to the topic, unless we directly discuss the implementation.

Most ObjectOrientedLanguages provide the feature of user-defined abstract data types. In C++ (CeePlusPlus) it can be done with a class which has no public/protected data members (only privates), nor friends, nor any methods which return pointers/references to any of the private fields.

Excepting the issue of mutability, AbstractDataType shares a feature set in common with CoinductiveDataType.
An intuitive explanation: The term function is used in the mathematical sense here: as a mapping from one set of values (the domain) to another (the range).[2] This differs from the usual programmatic notion due to the fact that it just associates input and output values and has no means of causing programmatic side effects (such as assignment to global variables.)

For an explanation from a practical point of view see chapter 6 of BertrandMeyer's book "ObjectOrientedSoftwareConstruction". (The chapter is entitled "Abstract Data Types".)

Abstract data types were first formulated in their pure form in CLU:

"Programming with Abstract Data Types", B. Liskov and S. Zilles. SIGPlan Notices, 9(4):50.59, 1974.

"Abstraction Mechanisms in CLU", B. Liskov, A. Snyder, R. Atkinson, and C. Schart. Communications of the ACM, 20:564.576, 1977. See "Object-Oriented Programming Versus Abstract Data Types" http://www.cs.utexas.edu/users/wcook/papers/OOPvsADT/CookOOPvsADT90.pdf

A fuller (mathematical) account of ADTs can be found in:

K. Meinke and J.V. Tucker, Universal Algebra, Handbook of Logic for Computer Science. Volume I: Mathematical Structures (ed. S. Abramsky, D. Gabbay and T. Maibaum), Oxford University Press, Oxford (1992) 189-411

My working definition of the concept is a set of pre-defined operations that act on a data structure or grouping of "state". If you have a true ADT, then you can swap or change the internal implementation of the ADT without breaking existing code that uses the ADT. ADT is basically a contract with users of the ADT. The contract does not dictate internal implementation.

Modelling a notion of stored state

Mathematics tends not to deal with mutable state, and so mathematical ADT's are perforce immutable. It doesn't necessarily follow that software representations of ADTs must be immutable. For me the Abstract part of AbstractDataType refers to their hidden implementation, not to their lack of mutable state. If this is a difference in terminology, it's quite widespread. -- DaveHarris

ADTs don't so much hide implementation, as leave it unspecified. They describe results without mandating an implementation. For example a stack ADT might include equations like:

  pop(push(item,stack))= item
  pop(emptystack) = erroritem
  isempty(emptystack)= true
The stack could be implemented as an array or linked list. The important thing is the equations hold for the implementation.

Conventional mathematical notation may have no explicit support for mutability, but it is possible to model mutability and "side-effects" without too much difficulty.

You can do this with an algebra (a concrete ADT implementation) by recognizing that a dynamic system will have only one value at each point in discrete time. Each "point in discrete time" occurs between operations that can affect the state of the system (for example assignment).

Here at UWS[1] lots of people use a "next-state function" to model hardware (and some fairly simple) software systems. (This function effectively codes-up a FiniteStateMachine.)

S - the set of all states E - the set of all events (these could model method dispatch for instance)

System execution can be modelled by a function of the form:

 do_one_step : S * E -> S
		 {new_value1	if e=...
		 {new_valuen	if e=...
		 {error_value	otherwise
This function works like the iterator design pattern:
"Give me the present state and the operation to be done next and I'll give you back the resulting state".

There is no concept of "storage" here, but it's obvious that the result of the function describes the state of the system at the next point in time after the state given as the first argument to the function.

Part of my research involves making this work for OO systems (i.e. one "next-state function" per class and an overall "system next state function"). The interesting (ie. difficult) part comes managing the inter-class dependencies.

What is an object/object-orientation?

AbstractDataType(s) plus PolyMorphism plus ExplicitReuse? equals ObjectOrientation. SideEffect(s) may or may not be involved. -- DaveHarris

I used to think this way about ObjectOrientation, but now disagree with this due to the lack of a notion of object identity...

In my models I define state sets for each class of object, and a set of all object identities that can exist. So an object can be written (i,s) where i is an element of a set of all object identities, and s is an element of all possible states for an object of a certain class.

The reason I got interested in ValueObject s is that the conventional OO notion of identity (that I've tried to model) doesn't seem to work. Identity is solely a function of state with ValueObject s, rather than being independent of the state.

JimRumbaugh wrote in the OMT book (ISBN 0-13-629841-9 , Chapter 3, section 3.1.1 "Objects"):

... "The term identity means that objects are distinguished by their inherent existence and not by descriptive properties they may have..."

Example use of identity notion: Class of persons with name and address attributes of type string. Two John Smiths at 3 Wallaby Street can exist without confusion (identical state, different identity values).

ValueObject counterexample: Boolean class - (if such a class is subject to the quotation from the OMT book) multiple instantiations of "true" have different object id values(!) In this example, identity is really determined purely by equality of object state. [Unless you implement the Boolean class according to the FlyweightPattern... After all, there are only 2 distinct values.] ...

I'm quite happy to consider functional languages to be OO even if they don't have side-effects and/or identity, provided they have the other things I mentioned.

I don't think we have any argument over what is actually going on or whether it is good. This is a disagreement over terminology. And that's OK - I didn't comment because I wanted the whole world to use the same terminology, I just wanted to point out that more than one terminology is in common use. -- DaveHarris

On object identity: for those of a philosophical bent, have a look at John Locke's definition of substance (paraphrased and cross referenced at http://www.xrefer.com/entry/553633), the "I know not what" that underlies the two John Smiths at 3 Wallaby street mentioned earlier, that is conventionally not taken to be part of their state, but is that which makes them different. I find it easiest to think of this notion of object identity in terms of that inviolable thing to which all other attributes stick. For example, in the physical world, "object identity" is something akin to the "space" any object takes up - since two bodies can't occupy the same space at the same time, then a body's coordinates in some kind of space denote its identity. "Space" in a computer memory, for example, is the memory address, obviously; in a database it would be the row ID; etc. -- LairdNelson

A RelationalWeenie complaint about ADT's is that they tend to force an IS-A view of data and data structures. A stack or queue is a particular viewpoint of data, not a self-contained thing. Often you want to view the same data under different perspectives or conditions. If you want to lock certain people out of seeing certain things, then deny them the proper access, not by making a one-size-fits-all interface. Limiting your interface or implementation just because current requirements don't need other views is not the solution to data protection or security. Plus, it is poor future-proofing. A RelationalWeenie starts with full functionality of structures and removes (hides) stuff as needed rather than add to a structure piece by piece to get functionality, reinventing the database brick by brick. Subtracting is generally easier than adding features.

ADTs don't force an IS-A view of data. The same data can be viewed simultaneously through multiple ADTs via references.

That is the start of a hand-rolled database.

It could be the start of a "hand-rolled" anything. Databases, CRUD apps, operating systems, computer games, compilers, just about every piece of software uses multiple concrete forms of ADTs referencing the same data. Only we don't call it "hand-rolling", we call it "programming".

Doesn't that go against what ADT's are all about? True ADT's would not allow operations outside of the original interface specification without changing the spec. If not, then how does an ADT differ from a database? (Hmmmm. Perhaps a database is an ADT of sorts, since you are only supposed to access the data through pre-determined query languages or API's.)

Does what go against what ADTs are all about?

If anything can just "point to" an internal node in a data structure, then it violates ADT. You can't just reach into an ADT stack and copy or reference the data nodes, because that is an internal issue. You would not be able to swap the implementation. I suppose you could add a "dump" operation to the ADT, but that is copying data instead of referencing, violating OnceAndOnlyOnce. If you have an agreed-upon way of referencing the internals so that non-stack thingies can reference the internal stack data nodes (records), then you are starting to build a primitive database around the stack.

I don't understand what you're talking about. Imagine a tree of references and a map of references to the same strings. That doesn't mean trees and maps aren't ADTs, and it doesn't say anything about mutability of the strings. The ADTs don't force an IS-A view of the data, and that isn't a primitive database.

If your ADT assumes a string implementation, then it is not an ADT. You cannot map to the internals of a structure unless either: 1, you know its internal implementation, or 2, you have an interface on your ADT that allows internal references. The first violates ADT rules because things would break if you changed the internal implementation, and the second is a primitive database. (A database is a big ADT, more or less). At the very least, it is no longer a "stack", in the strict sense.

The strings are not part of the implementation of tree or map. The example doesn't involve "mapping" to the internals of a structure. You can add the same string references to a tree and a map, both of which are ADTs. It is not a primitive database. It is two ADTs referencing the same data.

Do you mean like:

  s = "foo bar"
Where the RAM address is stored instead of a value? If so, what if one does:

  s = "new value"
One has now changed what the stack and queue look like without using their operators. Besides, we had to decide in advance to only store addresses instead of values to allow sharing. Not if s is a reference and those methods are pass by value. All you've done is cause s to refer to another string. You haven't modified the references in the stack or the queue. Any statically typed language will require that you decide the types of push() and add() in advance. You can make that type as generic as you like.

What you have then is essentially an index to RAM structures. I still classify such a system (data structure plus "stack", "queue", etc. indexes) as essentially a roll-your-own RAM database of the "navigational" sort.

You may choose to classify things any way you wish. However the rest of the world will merely consider you ignorant of basic computer science. This may be the essence of the difficulty you face when debating with people from a computer science background. We classify things based on a few thousand years of mathematical theory and approximately 70 years of computer science. You don't. Hence we don't understand you and you don't understand us. You may well be right but if asked to choose between following you and following the people who built the universe of software I know who I'm going to choose.

Computer Science is a goddam art, NOT a science. See DisciplineEnvy. A Long tradition does not make stacks "right". There is no mathematical theory that says stacks are "right". If there is, please show it. Besides, if "stack" is defined only as an interface, then RDBMS can be used (in whole or in part) to implement one.

Erm, no. Arts are judged on aesthetics. Sciences and crafts are judged on utility, fitness for purpose and correctness. Stacks are neither right nor wrong. They're merely fit for their purpose or not. There's a conventional definition of stack which people in the field use. You don't have to use that definition but then no-one will understand you. Just like you can redefine red to mean green but don't be surprised if no-one takes your opinions of strawberries seriously.

Are you suggesting we stick with somewhat archaic ideas just to ease communication? (QwertySyndrome) I can see the idea of a stack as a "concept" to "process the most-recent entry first", but making hard-wired interfaces based on that is going too far. Besides, there are more than one "language" to base communication on.

Precisely. That's why we keep using 0 and 1. If you come up with something new give it a new name. Don't rename existing concepts because you won't be able to communicate unless everybody accepts your renaming. ADTs are basic building blocks. You use them to build things like tables, sets, relations etc. Then you use those things to solve your problems.

That's right. Oracle, MySQL, and Microsoft make table engines so that custom application developers like me no longer have to use primitive structures. That is called "progress". That is called using higher-level tools and constructs to simplify projects.

Thus tables aren't basic building blocks because they're constructed using basic building blocks.

It is all relative, or fractal to be more precise. To a carpenter, a drill is a basic tool. Yet it is composed of hundreds of building-block parts, such as standard nuts and bolts and layered wires.

[No it isn't relative. You're advocating (using your metaphor) replacing bolts with a drill. Other people are pointing out that whilst drills are very useful there's no point in using them when a bolt will do. I keep making this same point: it's an AbstractionInversion to suggest replacing fundamental and basic data structures with tables that are composed of these same fundamental and basic data structures. In others words if I want a BTree it doesn't make sense to use a database table that's been implemented using a BTree and then jump through SQL's hoops to make it behave like a BTree.]

I actually feel Top's last point was fair enough. Tables, or even whole databases, can be building blocks if you're looking down from a great enough level of system abstraction. Top did not (on this page) advocate the use of tables in a manner that would clearly be an AbstractionInversion. In a sense, the table update and query interface could itself be qualified as an 'ADT'. So could the DBMS interface. If ADTs are basic building blocks, then tables may also qualify. Not everything is relative, but some things are.

Continued or related discussion in:
See also: ThereAreNoTypes, AdapterPattern

View edit of April 22, 2012 or FindPage with title or text search