Limits Of Hierarchies

Hierarchical taxonomies of just about anything are usually problematic, except perhaps for trivial ones. I think software engineering has gotten carried away with trees. Even in the life sciences, taxonomies tend to be somewhat arbitrary, because bacteria and other organisms can grab DNA and stick it into a far-off organism, busting the tree.

Hierarchies can give way to folksonomies -- CrowdSourced organization.

I think hierarchies are good for front-end navigation, but should not serve as the only organization mechanism, especially internally (except for the trivial).

See further discussion of taxonomy in biology in LimitsOfHierarchiesInBiology.


Product catalogs, company org charts, and book grouping (Dewey Decimal) are also commonly organized as hierarchies. However, I have found situations where something can belong to two or more groups, so that such items make lousy leaves. I found (overlapping) sets more fitting than hierarchies.

Re: "sets are more fitting" - sets are rather general; they always fit. Hierarchies (i.e. trees, right?) are sets, but overly specific (i.e. restrictive) ones, because they lack support for orthogonality. A multi-dimensional tree could be a solution: you start with a tree, then add another one with the same nodes but different edges (maybe drawn in another color). Compare the move from ObjectOrientedProgramming to AspectOrientedProgramming. -- AndreasHaferburg
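A minimal Python sketch of the "multi-dimensional tree" idea, assuming each dimension (each "color") is stored as its own child-to-parent map over one shared set of nodes. The node names, the `EDGES` layout, and the helpers `ancestors` and `all_parents` are hypothetical illustrations, not an established library or the poster's design.

```python
# Same nodes, several independent edge sets ("colors"). Each color is a
# child -> parent map, so each one on its own is an ordinary tree.
EDGES = {
    "by_flavor": {"cola": "drinks", "diet cola": "cola"},
    "by_diet": {"low-calorie": "products", "diet cola": "low-calorie"},
}


def ancestors(node, color):
    """Return the path from node up to the root of one color's tree."""
    parents = EDGES[color]
    path = []
    while node in parents:
        node = parents[node]
        path.append(node)
    return path


def all_parents(node):
    """The node's parent in every color that places it somewhere."""
    return {color: parents[node]
            for color, parents in EDGES.items() if node in parents}
```

Because each color is still a plain tree, ordinary tree navigation keeps working; the node simply has one parent per dimension instead of one parent overall.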

Re: "company org charts" - the very idea of a management hierarchy, because it conflicts with the reality of information flow, tends to produce the FearUncertaintyAndDread that in turn produces heavyweight methodologies and BigDesignUpFront. -- TomRossen

I don't know about that. I have been a victim of the MushroomPrinciple due to "low ranking".

That would seem to be a variant of FearUncertaintyAndDoubt caused by hierarchy - so how are you disagreeing with me? -- tr

Perhaps I am not correctly parsing "it conflicts with the reality of information flow".

Sure, a rigidly hierarchical (some would say "tall") org chart hinders communication. But some sort of hierarchy is absolutely necessary. I would suggest that "absolutely unambiguous assignment of who's responsible for what" is the first problem you have to solve in most organizations, and hierarchical structures do this pretty effectively.

I have tried before to use org hierarchies in applications, and it did not go over very well. Managers wanted them "tweaked" for different needs. Also, "matrix organizations" are not uncommon, which means you have two bosses.

What is meant by "they make lousy leaves"? This seems to be the crux of the argument against hierarchy, but it is never defined. It is a metaphor, not an argument; the argument is that categories overlap.

Overlapping categories? Isn't that what multiple inheritance (CeePlusPlus) and interfaces (JavaLanguage) are for?

Multiple inheritance results in "structures" that are no longer a hierarchy, or at least a "clean" hierarchy. Some of us consider them a kludge, and believe SetTheory, relational, or something similar to be a superior way to manage such. -- top

For other alternatives to multiple inheritance in object-oriented code, see also traits, multimethods and CLOS.

Level-Per-Attribute Anti-Pattern

One thing that really ticks me off is the creation of trees where each orthogonal feature becomes a level. Example hierarchy:


  Coke
  --Diet
  ----Caffeinated
  ----No Caffeine
  --Regular
  ----Caffeinated
  ----No Caffeine
  Lemon-Lime soda
  --Diet
  ----Caffeinated
  ----No Caffeine
  --Regular
  ----Caffeinated
  ----No Caffeine
  Iced Tea
  --Diet
  ----Caffeinated
  ----No Caffeine
  --Regular
  ----Caffeinated
  ----No Caffeine
(I added dashes because the spacing keeps getting messed up by a rogue browser somewhere.)

Some of the problems with this approach are that adding new attributes balloons the leaf count, and that the levels arbitrarily rank some factors above others. For example, if your doctor insisted that you avoid caffeine, you would probably want that attribute at the highest level, to more easily see which flavors lack a caffeine-free version. But somebody watching their calories may want the Diet attribute moved higher. Level-per-factor can easily grow into an AddingEpicycles kind of mess.
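To make both problems concrete, here is a small Python sketch, assuming each drink is stored flat as a bundle of its three attributes. The leaf count is the product of the attribute value counts (3 × 2 × 2 = 12), so every new attribute multiplies it, and any level ordering, caffeine-first or calories-first, can be generated on demand instead of being hard-wired. `build_tree` and the attribute keys are hypothetical names.

```python
from itertools import product

FLAVORS = ["Coke", "Lemon-Lime soda", "Iced Tea"]
CALORIES = ["Diet", "Regular"]
CAFFEINE = ["Caffeinated", "No Caffeine"]

# Flat records: each product is just a bundle of orthogonal attributes.
ITEMS = [
    {"flavor": f, "calories": c, "caffeine": k}
    for f, c, k in product(FLAVORS, CALORIES, CAFFEINE)
]


def build_tree(items, levels):
    """Nest items one level per attribute, in whatever order the viewer wants."""
    if not levels:
        return len(items)  # leaf: how many products sit at this point
    first, rest = levels[0], levels[1:]
    groups = {}
    for item in items:
        groups.setdefault(item[first], []).append(item)
    return {key: build_tree(group, rest) for key, group in groups.items()}
```

The doctor's view is `build_tree(ITEMS, ["caffeine", "flavor", "calories"])`; the calorie-watcher's view is `build_tree(ITEMS, ["calories", "flavor", "caffeine"])`. Neither ordering has to be stored.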

Unfortunately, XML and hierarchical file systems require that kind of kluge. Is something like the following any better?
	Diet -> /Data/Coke-Diet
	Regular -> /Data/Coke-Regular
	Diet -> /Data/Lemonlime-Diet
	Regular -> /Data/Lemonlime-Regular
	Coke -> /Data/Coke-Diet
	Lemonlime -> /Data/Lemonlime-Diet
	Coke -> /Data/Coke-Regular
	Lemonlime -> /Data/Lemonlime-Regular
This has even more leaves, but a script can be written to automate their generation, and all of the actual data is in the /Data directory.
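A sketch of such a generation script in Python, assuming the listing means one directory per attribute value, e.g. a `Coke` directory holding a `Diet` link and a `Diet` directory holding a `Coke` link, both pointing at `Data/Coke-Diet`. The directory layout and `build_views` are assumptions for illustration; it needs a file system that permits symbolic links.

```python
from itertools import product
from pathlib import Path

FLAVORS = ["Coke", "Lemonlime"]
CALORIES = ["Diet", "Regular"]


def build_views(root):
    """Write each data file once under Data/, then generate both link views."""
    root = Path(root).resolve()  # absolute targets keep the links valid anywhere
    data = root / "Data"
    data.mkdir(parents=True, exist_ok=True)
    for flavor, cal in product(FLAVORS, CALORIES):
        target = data / f"{flavor}-{cal}"
        target.write_text(f"{flavor} {cal}\n")  # the only real copy of the data
        # Flavor-first view (e.g. Coke/Diet) and calorie-first view (e.g. Diet/Coke).
        for outer, inner in ((flavor, cal), (cal, flavor)):
            link = root / outer / inner
            link.parent.mkdir(exist_ok=True)
            if not link.is_symlink():
                link.symlink_to(target)
```

Rerunning the script is harmless: the data files are rewritten and existing links are left alone, so adding a flavor means editing one list and running it again.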

This problem can (most times) be resolved in OO programming by using has-a relationships instead of strict is-a relationships.

But then you lose what is "special" about OO, and start rolling your own NetworkDatabase of sorts.

This is utterly false. Replacing inheritance with composition is a very powerful OO 'move', and, as mentioned below, it features in many design patterns.
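A minimal Python sketch of the has-a alternative, assuming the soda example: instead of a subclass for every combination (DietCaffeinatedCola, RegularCaffeineFreeCola, ...), each orthogonal feature becomes a field. `Flavor`, `Drink`, and `label` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Flavor(Enum):
    COLA = "cola"
    LEMON_LIME = "lemon-lime soda"
    ICED_TEA = "iced tea"


@dataclass(frozen=True)
class Drink:
    # has-a: each orthogonal feature is a field, not a level in a class tree,
    # so a new feature is one more field rather than a doubling of subclasses.
    flavor: Flavor
    diet: bool
    caffeinated: bool

    def label(self) -> str:
        words = ["Diet" if self.diet else "Regular"]
        if not self.caffeinated:
            words.append("caffeine-free")
        words.append(self.flavor.value)
        return " ".join(words)


# All 12 combinations, with no class per combination.
CATALOG = [Drink(f, d, c) for f in Flavor for d in (True, False) for c in (True, False)]
```

Filtering by any single feature, e.g. `[x for x in CATALOG if not x.caffeinated]`, no longer depends on where that feature sits in a hierarchy.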

I must say that some of the cooler things I've learned about OO are enabled by this, and yet have no analogue in databases of any kind. Many patterns are good examples of this.

Could you select one for analysis here?

What about the whole Facet/Tag approach?

  Facets = { "cola", "lemon-lime soda", "iced tea", "diet", "regular", "caffeinated", "caffeine free" }
And then each item in the system has 0-n tags?
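A sketch of the tag approach in Python, assuming a single flat tag vocabulary matching the `Facets` line and a small hypothetical catalog. A query is simply "items whose tag set contains all requested tags", so the order in which tags are named never matters.

```python
FACETS = {"cola", "lemon-lime soda", "iced tea", "diet", "regular",
          "caffeinated", "caffeine free"}

# Hypothetical catalog: every item carries 0..n tags drawn from FACETS.
ITEMS = {
    "Diet cola": {"cola", "diet", "caffeinated"},
    "Caffeine-free diet cola": {"cola", "diet", "caffeine free"},
    "Lemon-lime soda": {"lemon-lime soda", "regular", "caffeine free"},
    "Sweet tea": {"iced tea", "regular", "caffeinated"},
    "Mystery drink": set(),  # zero tags is allowed
}


def tagged(*required):
    """Names of items carrying every requested tag; tag order is irrelevant."""
    wanted = set(required)
    unknown = wanted - FACETS
    if unknown:
        raise ValueError(f"unknown tags: {sorted(unknown)}")
    return sorted(name for name, tags in ITEMS.items() if wanted <= tags)
```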

If we wanted stronger typing, we could define each of those as a different dimension with valid values, and every object has to exist at an intersection of those dimensions (but not every point has to correspond to an item).

"Caffeine-free diet cola" would then be "caffeine-free" within one dimension, "diet" within another, and "cola" within a third. We can, of course, add an arbitrary number of dimensions to the system, each with its own 'points' and constraints. And not all dimensions need to be considered at once, sort of how a two-dimensional shape becomes an infinite prism when you add a third dimension.

-- JonathanMitchem
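The "stronger typing" variant could be sketched in Python like this, assuming each dimension is a named set of valid points and every item must name exactly one valid point per dimension. `DIMENSIONS`, `validate`, and `grid` are hypothetical names; the grid lists every intersection, while real items occupy only some of them.

```python
from itertools import product

# Hypothetical dimensions, each with its own set of valid points.
DIMENSIONS = {
    "flavor": {"cola", "lemon-lime", "iced tea"},
    "calories": {"diet", "regular"},
    "caffeine": {"caffeinated", "caffeine-free"},
}


def validate(item):
    """An item must sit at exactly one valid point on every dimension."""
    if set(item) != set(DIMENSIONS):
        raise ValueError(f"expected dimensions {sorted(DIMENSIONS)}, got {sorted(item)}")
    for dim, value in item.items():
        if value not in DIMENSIONS[dim]:
            raise ValueError(f"{value!r} is not a valid {dim}")
    return item


def grid():
    """Every intersection of the dimensions; not every point needs a real item."""
    names = sorted(DIMENSIONS)
    return [dict(zip(names, combo))
            for combo in product(*(sorted(DIMENSIONS[n]) for n in names))]
```

Adding a dimension (say, packaging) is one more dictionary entry; the cost is that existing items must then be given a value for it, which is exactly what the stricter typing buys.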

From a practical perspective, it seems it would be difficult to track all that in textual classes, and it would require programmer intervention to add or change products. A similar issue came up in PayrollExample. I'd suggest a database, as usual. By the way, this example is revisited below. -- top
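A minimal sketch of the database suggestion, using Python's built-in `sqlite3` with an in-memory database; the table and column names are assumptions. New products are rows rather than new classes, and the doctor's question from the Coke example becomes an ordinary query.

```python
import sqlite3


def make_catalog():
    """In-memory product table; new products are rows, not new classes."""
    db = sqlite3.connect(":memory:")
    db.execute("""
        CREATE TABLE product (
            name        TEXT PRIMARY KEY,
            flavor      TEXT NOT NULL,
            diet        INTEGER NOT NULL,  -- 1 = diet, 0 = regular
            caffeinated INTEGER NOT NULL   -- 1 = caffeinated, 0 = caffeine-free
        )""")
    db.executemany("INSERT INTO product VALUES (?, ?, ?, ?)", [
        ("Cola", "cola", 0, 1),
        ("Diet cola", "cola", 1, 1),
        ("Caffeine-free diet cola", "cola", 1, 0),
        ("Lemon-lime soda", "lemon-lime", 0, 0),
        ("Iced tea", "iced tea", 0, 1),
    ])
    return db


def flavors_without_caffeine_free(db):
    """The doctor's view: flavors that have no caffeine-free version at all."""
    rows = db.execute("""
        SELECT flavor FROM product
        GROUP BY flavor
        HAVING MIN(caffeinated) = 1
        ORDER BY flavor""")
    return [flavor for (flavor,) in rows]
```

The calorie-watcher's view is just another `WHERE diet = 1`; neither view required deciding which attribute sits "higher".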

I maintain that like "a tree falling in the forest", classification hierarchies exist only when there is an observer to classify things. Long ago when conducting manufacturing seminars, I would point out that there are only piece parts - all other levels in a bill of material are organizing artifacts, as evidenced by differences between the "manufacturing" and the "engineering" bills of material or how the manufacturing BOM needs to change as the manufacturing process changes.

The fact that the same observation seems to hold for most hierarchies suggests to me that, from a systems perspective, there should be multiple, independent organizational structures available to access separately stored entity data at the lowest level. In the example above, the attributes of Coke/lemon-lime, diet/non-diet, and caffeine/caffeine-free can be inserted into the hierarchy at any arbitrary level, yielding multiple classification structures that, depending on the needs of the observer, have varying utility. Or consider structuring footwear products: multiple sizes, men's/women's, casual/formal, loafers/boots/pumps/slippers/sandals, etc. Again, among the combinatorially many possible structures, the suitability of each is a function of who is seeking the shoe data, rather than of any inherent value in a single arbitrary ordering of the attributes in the hierarchy.
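The bill-of-material point can be sketched in Python, assuming piece parts are the only stored entities and each BOM is a separate parent-to-children structure that merely points at them. The part numbers, station names, and `piece_parts` helper are hypothetical. Two very different groupings expand to the same set of real parts.

```python
# Piece parts are the only "real" entities.
PARTS = {"P1": "frame", "P2": "wheel", "P3": "bolt", "P4": "motor"}

# Each bill of material is an independent organizing structure over PARTS.
ENGINEERING_BOM = {            # grouped by function
    "bike": ["chassis", "drivetrain"],
    "chassis": ["P1", "P3"],
    "drivetrain": ["P2", "P4"],
}
MANUFACTURING_BOM = {          # grouped by assembly station
    "bike": ["station-A", "station-B"],
    "station-A": ["P1", "P2"],
    "station-B": ["P3", "P4"],
}


def piece_parts(bom, node):
    """Expand any grouping node down to the piece parts it ultimately contains."""
    if node in PARTS:
        return {node}
    return set().union(*(piece_parts(bom, child) for child in bom[node]))
```

Changing the manufacturing process means editing `MANUFACTURING_BOM` only; the parts and the engineering view are untouched.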

I would think that the XML structure problem is best solved by not encoding an arbitrary hierarchy in the XML structure, but by including only the end products, and including with each the attributes (diet-caffeine-flavor-packaging size, etc.) that might later be used to organize one or several externally defined classifications.
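One way this could look, sketched with Python's standard `xml.etree.ElementTree`: the XML holds only end products as flat elements with attributes, and any grouping is computed by the reader at query time. The element and attribute names and the `classify` helper are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Only end products, each with its own attributes; no grouping is encoded
# in the element nesting. Element and attribute names are hypothetical.
CATALOG = """
<products>
  <product name="Cola" flavor="cola" diet="no" caffeine="yes" size="12oz"/>
  <product name="Diet Cola" flavor="cola" diet="yes" caffeine="yes" size="2L"/>
  <product name="Lemon-Lime" flavor="lemon-lime" diet="no" caffeine="no" size="12oz"/>
  <product name="Iced Tea" flavor="iced tea" diet="no" caffeine="yes" size="2L"/>
</products>
"""


def classify(xml_text, *attrs):
    """Group product names by whichever attributes the caller cares about."""
    groups = defaultdict(list)
    for product in ET.fromstring(xml_text.strip()).iter("product"):
        key = tuple(product.get(a) for a in attrs)
        groups[key].append(product.get("name"))
    return dict(groups)
```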

But there are classification groupings that are independent of, or at least need to be derived from, item-level attributes. Examples for soft drinks might include sales rank (perhaps by territory or route salesman, invoking other hierarchies); for library classifications, an era (using externally defined groupings based on publication date) or significance based on citations in other publications.

It seems important to keep item attributes at the lowest level, without reflecting any external (perhaps transient) structure. For example, keeping the geographic coordinates of plants and sales offices would enable external conversion/classification into derived attributes like zip, area and dialing codes, political subdivisions, time zone, continent, hemisphere, national holidays, work weeks, sales/value-added tax rates, etc.
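A small Python sketch of deriving classifications from stored low-level attributes rather than storing the classifications themselves. `Site`, `hemisphere`, `nominal_utc_offset`, and `era` are hypothetical helpers; the UTC offset uses the nautical one-hour-per-15-degrees approximation, which real, politically drawn time zones do not follow, so a production system would consult a time-zone database instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    name: str
    lat: float  # degrees, positive = north
    lon: float  # degrees, positive = east


def hemisphere(site: Site) -> str:
    """Derived on demand from the stored coordinates."""
    return ("N" if site.lat >= 0 else "S") + ("E" if site.lon >= 0 else "W")


def nominal_utc_offset(site: Site) -> int:
    """Nautical approximation: one hour per 15 degrees of longitude.
    Legal time zones follow political borders, so this is only a sketch."""
    return round(site.lon / 15)


def era(year: int, width: int = 50) -> str:
    """Bucket a publication year into an externally chosen era width."""
    start = year - year % width
    return f"{start}-{start + width - 1}"
```

If the era width or the time-zone rules change, only the derivation changes; the stored coordinates and dates stay as they are.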

I've not been able to find any taxonomy or classification theory (that reaches beyond biology) that would help me better understand the problem, although I keep thinking there should be some 101-level course that explains it all. -- JimRussell

While I cannot seem to formulate as good an example as you, Jim, the one thing that concerns me is your focus on data. Taxonomy doesn't necessarily mean that all we are looking at is how the attributes of one taxonomic member differ from another's. At the same time, I agree that, depending on the observer of a system, the classifications have different meanings. Using your Coke example, if the system simply sold soft drinks, then Coke would not be a type. However, if the system modelled the different types of soft drink, perhaps for some analysis, some taxonomy might be required. Maybe I am missing something here, but I have never really seen over-complicated hierarchies at the implementation stage. In analysis they offer a perspective on the problem domain, but at design time there are ways to deal with a lot of taxonomy problems. I would concur that there are situations where the same thing is classified differently at different times, and a rigid hierarchy is not an enabler in that situation. I try to stay away from thinking about my classifications as just data.

Behavioral classification is even trickier than data classification, IMO. Orthogonal influencers are especially tricky in such taxonomies.

I maintain that like "a tree falling in the forest", classification hierarchies exist only when there is an observer to classify things. -- JimRussell

{It perhaps should be pointed out that there are other classification approaches besides trees. Set theory comes to mind as a competitor.}

Violent agreement. You might find XFML interesting (XFML is a simple XML format for exchanging metadata in the form of faceted hierarchies). My contention is that classifications are relationships formed by a point of view. There is no reality that can be modeled. -- AnonymousDonor

I don't know, it looks to me like they are reinventing relational databases the long way around there. I would rather look at a City schema and issue queries against it than fiddle with Yet Another XML Format (XqueryLanguage).

Related: XmlSucks

I maintain that like 'a tree falling in the forest', classification hierarchies exist only when there is an observer to classify things. -- JimRussell

While I agree with most of the sentiments expressed here, I can think of one interesting counter example to this claim: ontogeny. The process of division that leads from [a single] fertilized egg to [a single] multi-cellular organism. -- EricHodges

Have you seen those drawings that classify all living things in the world on one or another branch of the "tree of life"? EricHodges is *not* talking about that. We are talking about a *single* living organism.

The process of division that leads from [a single] fertilized egg to [a single] multi-cellular organism creates a concrete hierarchy of cells. Each new cell is a specialization of its ancestor. This could be one of the most successful DesignPatterns on the planet, since all multi-cellular organisms are constructed this way. -- EricHodges

I'm talking about ontogeny (of a single organism), not taxonomies (classifying many organisms). In which case, it's not a counter example, as Jim's assertion says "classification hierarchies" (it may have been edited since you wrote this; please refactor if so). Sure, there are other kinds of hierarchies in nature; for example, I believe that trees (you know, the big green leafy things) are tree-structured. {Move issue to LimitsOfHierarchiesInBiology perhaps.}

But every green leafy on the tree is of the same Type. -- AnonymousDonor

Cells forming in an embryo each have one and only one ancestor, forming a perfect tree.

The existence of a perfect tree still does not exclude non-tree views for various purposes.

I'm talking about how a multicellular organism grows from a single cell. I'm not claiming the tree structure excludes non-tree views, just that this tree structure isn't imposed by an observer.

Not that I don't share your interest in the example, but I just wanted to point out that seeing this "as a hierarchy" is still dependent on the observer - IOW JimRussell's position is still valid: that the "existence" of the hierarchy is because the observer (a person) interprets what is happening that way. -- BillCaputo

In what way does the existence of this hierarchy depend on an observer? A single cell divides. Those cells divide. Etc. An external observer isn't imposing tree structure on them in any way I can see. That structure is a result of how they reproduce.

It is not the mere existence, but the importance given to it. A given observer may perhaps not give a honk about the cell's ancestry for their particular study. That info may not even be available. Say the observer found the cell in a sewer system. Its parents or "family" may have drifted hundreds of miles away or be deeply hidden in trillions of other unrelated cells. If they had GodsGoogle, perhaps they could find the actual hierarchy, but they are just human.

Further, isn't accidental cross-mixing of cellular material and/or some DNA still fairly likely over the longer term? Thus, there are no "pure" trees in biology. Again, this is an issue for LimitsOfHierarchiesInBiology IMO.

The claim was "classification hierarchies exist only when there is an observer to classify things". I provided an exception to that claim. It doesn't matter if a given observer doesn't care about a cell's ancestry. The hierarchy exists independently of the observer. It doesn't matter if the cell is separated from the hierarchy. The hierarchy existed before the cell was separated. A virus might move DNA between cells, but that's an exceptional case. My original exception stands.

But you don't [provide an exception to that claim]. Where is the original zygote in the full-grown organism? Chances are it's not even there, together with many of its descendants, and the full perfect tree exists only in your imagination. Also, is there a way to establish a unique ancestral path between any two cells in the body? I seriously doubt that... So your tree exists only in some abstract/historical sense. Also, the analogy is all wrong: you are trying to explain simpler things (cars, geometric shapes, etc.) by referring to _very_ complex ones, like life. Every cell has in its DNA all the information necessary to build an entire organism (and probably much more), so even if it's a tree, it's surely an unusual one, with every node as complex as the whole tree. Another fallacy is that you presuppose there is a classification of all cells in the body other than that of an observer. I'm not sure this is true, i.e. that there are N well-defined classes of cells and an unambiguous way to establish the class of cell X that is the same for any observer. There are, as we now know, stem cells that transform themselves into other kinds of cells, and probably many other strange things; otherwise it's difficult to explain how all the different cell types develop from a single cell.

Obviously there are trees/hierarchies in nature, like genealogical trees (or kinship systems if you want to avoid historicality), rivers/tributaries, mountain systems, etc., but these are mainly value-sorted aggregates of uniform objects, and when you try to put some qualitative difference (like in oceans/seas/bays/etc.), you necessarily introduce subjectivity. Main problem is typology, not structure.

I suppose a lot of things in nature have "links". Whether anybody knows or cares about them is another thing.

[EditHint: Move biology-related stuff above to LimitsOfHierarchiesInBiology]

Have you seen those drawings that classify all living things in the world on one or another branch of the "tree of life"?

These issues are taken up in LimitsOfHierarchiesInBiology. Note that even if we by chance agreed that biology had a "natural hierarchy", there are other ways to view biological organisms besides just trees. For example, "all animals that live in deserts" is a view mostly independent of genetic history, since reptiles, mammals, birds, insects, etc. can all live in the desert. I call this the "has-a" viewpoint of structures, as opposed to an "is-a" view. We don't want trees to make non-tree views difficult. Thus, one should avoid "hard-wiring" info to trees.

That page deals with taxonomies. Taxonomy of clades or species has trouble because of sexual reproduction. Each organism can have two parents, and the two parents may not be part of the same clade/species, depending on the observer's definitions.

I think we can agree that such trees are a UsefulLie which may reflect reality, but not perfectly. Just how useful is where most of the debate lies.

Yes, indeed I do find the XFML link interesting. Thanks much! After a quick glance, I want to read the book! I suspect I will find clarifications of things I have never been able to express well. (And to the other responder, relational databases for sure are one of the tools that could be used to maintain independent classification structures - but too often the designers mix the classifications in with the data. When used with junction tables (the MS name) to maintain many-to-many relationships, they would be ideal in that classification hierarchies can come and g[r]o[w] independently of the primary data being stored. But that is just a tool choice.) -- JimRussell
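A minimal sketch of the junction-table arrangement, using Python's built-in `sqlite3`. The schema and the `scheme` column (so several classifications can coexist) are assumptions for illustration. Items live in one table, classifications in another, and the many-to-many link between them in a third, so a whole classification can be added or retired without touching the item data.

```python
import sqlite3

SCHEMA = """
CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE category (id INTEGER PRIMARY KEY, scheme TEXT NOT NULL, label TEXT NOT NULL);
-- junction table: many items per category, many categories per item
CREATE TABLE item_category (
    item_id     INTEGER REFERENCES item(id),
    category_id INTEGER REFERENCES category(id),
    PRIMARY KEY (item_id, category_id)
);
"""


def setup():
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO item VALUES (?, ?)", [(1, "Diet Cola"), (2, "Iced Tea")])
    db.executemany("INSERT INTO category VALUES (?, ?, ?)", [
        (10, "shelf", "Soft drinks"),
        (11, "health", "Low calorie"),
        (12, "health", "Caffeinated"),
    ])
    db.executemany("INSERT INTO item_category VALUES (?, ?)",
                   [(1, 10), (1, 11), (1, 12), (2, 10), (2, 12)])
    return db


def items_in(db, scheme, label):
    rows = db.execute("""
        SELECT i.name FROM item i
        JOIN item_category ic ON ic.item_id = i.id
        JOIN category c ON c.id = ic.category_id
        WHERE c.scheme = ? AND c.label = ?
        ORDER BY i.name""", (scheme, label))
    return [name for (name,) in rows]


def drop_scheme(db, scheme):
    """Retire a whole classification without touching the items themselves."""
    db.execute("DELETE FROM item_category WHERE category_id IN "
               "(SELECT id FROM category WHERE scheme = ?)", (scheme,))
    db.execute("DELETE FROM category WHERE scheme = ?", (scheme,))
```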

I am curious about the problems or issues with "mixing classifications in with data". Is a classification *not* data?

Classification is not necessarily just data, though a classification may have attributes that shape it. I could classify a keyboard as a type of computer input device. It has some keys, but its classification embodies more than just its keys: it sends messages to the computer, and even receives signals back. But it is an input device, a special kind of input device. Things other than attributes should inform one's reasoning when engaging in the process of finding classifications/abstractions.


XFML looks okay, but is simplistic. In another life, I built a document query system (impenetrably described in DocQueryInSql) which required structured categories (structured facets?) among other things. The point is that we want an associative lookup on attributed facts.

XML is a little overinvolved with its hierarchical structures, and thus is very limiting. Any structuring can always be described using a semi-ordered 'key' tuple which describes the (relational) attributes of some data.
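A rough Python sketch of the key-tuple idea, assuming each fact is stored under a fixed-length tuple whose positions are attributes, and a query may constrain any subset of positions (`None` meaning "any value"). A tree path is the special case where you may only fill positions from the left; here no position is privileged. `FACTS` and `lookup` are hypothetical.

```python
# Each fact is stored under a tuple key: (flavor, calories, caffeine).
FACTS = {
    ("cola", "diet", "caffeinated"): "Diet Cola",
    ("cola", "diet", "caffeine-free"): "Caffeine-Free Diet Cola",
    ("lemon-lime", "regular", "caffeine-free"): "Lemon-Lime",
}


def lookup(pattern):
    """Match a partial key; None in a position means 'any value there'."""
    return sorted(value for key, value in FACTS.items()
                  if all(p is None or p == k for p, k in zip(pattern, key)))
```

`lookup((None, None, "caffeine-free"))` constrains only the last position, a query that a flavor-first tree could answer only by walking every branch.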

These attributes may be meaningful (as in a relational database), or may be artificial (as in an explicitly structured approach e.g. objects). Often a mixture.

The meaningful approach is less constrained, thus the (potentially) greater power of relational set theory.

Nevertheless, the most powerful approach is to use a system which supports an arbitrary association of attributes at a single point of observation.

Interestingly, this is the most natural approach when people compose queries, since there is no inhuman structure to deal with.

I could blather on more, but only if anyone actually cares. For those who know what I'm talking about, this may make sense. Greetings brothers :). -- RichardHenderson

If relational can indeed provide "taxonomies on demand", then why is it not more commonly recommended to solve classification debates by making them user-defined or local based on existing attributes? Is it perceived to lack built-in tree-friendly operations for those who want a tree view? Do developers prefer to see the "shape" of their classification in code in order to relate to it better?

Relational can express any taxonomy, since it basically ties things together. It doesn't produce anything as such, it is just an abstraction of knowledge into two basic entities 'things' and 'relationships'. This is my approach anyway. Implementing such systems is surprisingly difficult in relational databases (go figure) as they have loads of added constraints that have no place in the abstract scheme.

Relational does not define "relationships". It only defines tables. Relationships are generally considered virtual, ad hoc, or "calculated". (A key is info about the entity, not really a "relationship", although that is probably a matter of semantics.) In practice, indexes may speed up commonly used relationships, but they should not affect the resulting content. Further, I am not sure relational by definition excludes tree-friendly operations. Some vendors have added tree-friendly operations. This is a controversial topic in relational. See RelationalAndTrees.
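As one example of a tree-friendly operation in SQL, here is a sketch using a recursive common table expression (`WITH RECURSIVE`, standardized in SQL:1999 and supported by SQLite), run through Python's `sqlite3`. The `node` table with a `parent_id` column is an assumed layout; other vendors have their own syntax for the same idea, such as Oracle's `CONNECT BY`.

```python
import sqlite3


def make_db():
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE node (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT)")
    db.executemany("INSERT INTO node VALUES (?, ?, ?)", [
        (1, None, "Beverages"),
        (2, 1, "Soft drinks"),
        (3, 2, "Cola"),
        (4, 2, "Lemon-lime"),
        (5, 1, "Tea"),
    ])
    return db


def descendants(db, root_id):
    """Walk the parent_id links downward from one node, tracking depth."""
    rows = db.execute("""
        WITH RECURSIVE sub(id, name, depth) AS (
            SELECT id, name, 0 FROM node WHERE id = ?
            UNION ALL
            SELECT n.id, n.name, sub.depth + 1
            FROM node n JOIN sub ON n.parent_id = sub.id
        )
        SELECT name, depth FROM sub ORDER BY depth, name""", (root_id,))
    return rows.fetchall()
```

The tree here is just one relationship stored in one column; other relationships over the same rows remain ordinary joins.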

People seem to like trees. Maybe we have a bit of cortex specially for grokking such things. A UsefulLie perhaps? The problem is that trying to keep it conceptually simple may make for compromised implementations if trees are not really a perfect match for the problem space. In other words, they may be a UsefulLie for the users, but not for the implementors. I have been in the middle of such battles. JustMakeItRight.

Question: Are "types" inherently bound to hierarchies? Is there such thing as set-based types? IOW, are types bound by IS-A, or can they also be HAS-A? Is this off-topic? -- AnonymousDonor

Not off-topic, IMHO. Start trying to define types, properties, classes, and sets in terms of each other and you get into a mess. But yes, this gets near the area of philosophy where ontology and epistemology intersect, which is pretty interesting territory, and difficult; it is also easy to argue that it has nothing to do with the history of programming ideas. But databases and file systems have to get organized somehow, and it's usually hierarchical in some sense, and usually runs into some limitation or other...

Usually hierarchical? Not databases, at least not this decade. And file systems don't have to be hierarchical. (FileSystemAlternatives). I agree that hierarchies are usually easier for end-users, and perhaps some programmers, to understand, but IMO they just do not scale well. It is a battle between (initial) comfort and accuracy IMO. I wish I could sell the concept of sets to more end users to escape the shackles of trees. But first developers.

Unfortunately, there is a major legacy issue with trying to change. Floppy disks, USB drives, ISO CD-ROMs: they all expect a hierarchical file system. What we need are better ways to treat file system trees as sets, rather than forcing file systems to be sets, since most file systems will remain trees for a long time.

QwertySyndrome, perhaps.

Once you draw lines between the nodes in a tree, you start to have a network. Most trees are an abbreviation of a representation of reality that, if elaborated, would look more like a SemanticNet. Keeping the "strongest" connections results in a taxonomy, but there are usually many other relationships that could be made explicit. In development, that is the purpose of ClassDiagrams etc., but for a UserInterface, MFC Library documentation, or textbook theories (e.g. biology), trees are easier to deal with and remember.

I am not sure I agree with that. I have seen enough class diagrams that seem rather arbitrary. Perhaps trees are better than nothing, but if I have a computer I feel I should be able to use set-based queries to see them in situation-specific groupings. To me, that is closer to the ideal. Trees are just a consolation prize. This parallels the movement from the hierarchical Dewey Decimal system for book libraries to multi-index-based computer searches libraries use now. Trees were okay in the physical world, but now we can move on because we have the technology (cue Steve Austin background music).

A Network is a Graph, and a Graph is a set {Nodes, Arcs}, where Nodes is also a set (of nodes) and Arcs is a set of tuples, possibly with labels. So representing operating system objects as a network still lets you traverse that network using set-based tools. The question is: what are we trying to represent? OS objects? An application? A form in an application? A general domain that many applications reference (e.g. biological objects for a system such as BlueGene)? As ObjectOrientedOperatingSystems become more prevalent, they will have an ObjectModel with an implicit Package/ClassDiagram, whether arbitrary or not, created by Microsoft, Sun, or the OpenSource community (Linux, BSD), depending on what you are using. Even today, all Windows COM(+) and .NET objects, both those that make up the operating system and those you build yourself, can be browsed with the ObjectBrowser, and various tools exist to view Java classes as ClassDiagrams, though JavaDocs (a TreeView) are more customary. Different ways of querying OS and/or application objects (whether from the command line or a GUI front end) can be created for different purposes, recognizing that this is the UserInterface layer. You can already write your own RecursiveDescent parser to do set-based queries on objects if that is your preference. Here is an example of what I think you are looking for:

  S = {files : FileSize(files) > "10Meg"}, print(S)
You could today write a C# program to parse that query, pass it to the FileSystemObject (choosing the appropriate methods from its ObjectModel), and return the result in set notation.
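The same set-builder query can be sketched without writing a parser, here in Python rather than the C# mentioned above, assuming "10Meg" means 10 MiB and that "files" means regular files anywhere below a chosen root. `big_files` is a hypothetical name; `pathlib.Path.rglob` and `stat().st_size` are standard library calls.

```python
from pathlib import Path

TEN_MEG = 10 * 1024 * 1024  # assumption: "10Meg" means 10 MiB


def big_files(root, threshold=TEN_MEG):
    """S = {f : FileSize(f) > threshold}, over every regular file below root."""
    return {p for p in Path(root).rglob("*")
            if p.is_file() and p.stat().st_size > threshold}
```

`print(sorted(big_files(".")))` plays the role of `print(S)`. The result is a flat set, regardless of how deep in the directory tree each file happens to live.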

I would point out that perhaps this belongs in FileSystemAlternatives. This appears to be getting into another HolyWar between RelationalDatabase and OO (NavigationalDatabase) organization philosophies. At this point we can probably defer to an existing topic (or topics) on the subject.

[I am only using files as one example, since the person advocating set-based queries seems interested in that. I did not mention databases at all although that is covered by ObjectRelationalMapping. My point was that where the LimitsOfHierarchies is reached for representing things-in-the-world, networks, of which ClassDiagrams are one common tool used by programmers, take over. That is entirely in keeping with the opening thesis of the page: "hierarchies ... should not serve as the only organization mechanism"]

Perhaps CategorizationModels is more appropriate.

[DAGs (DirectedAcyclicalGraphs) are already described on that page. I am not trying to describe what DirectedGraphs or ClassDiagrams are, but to illustrate that they help in dealing with the frustration of trying to shoehorn everything into hierarchies, which is the topic of discussion. My initial statement was that in drawing lines between nodes of trees (which is the impulse you get when trees don't fit), you end up with networks, which is OK. There are formalisms to deal with it. Someone disagreed; I elaborated. You don't have to believe in UML specifically or even ObjectOriented methods: most programmers use some form of CirclesBoxesAndArrows regardless of their "doctrine". Once the diagram loses its tree structure, you are talking about networks.]

I will agree that membership-based sets don't have very good visual representations (that I have seen). However, this does not limit their usefulness. I suppose we could convert sets into graphs, but the result for non-trivial things is often messy as a visual. That is why I tend to focus on query-ability rather than following pointers around: look at a specific aspect(s) rather than the whole structure.

Can you give an example of using set queries for a specific aspect of a system as you're describing?

Google's search engine, although it is a less formal approach from the UI perspective. See the links in FileSystemAlternatives for more formal ones.

It looks interesting, but I think a wiki metaphor would satisfy both what you are looking for and those who like diagrams. In this wiki, you can search text as in google but also take a VisualTour in context. There is a difference between diagrammatic "mess" and complexity. I have worked in engineering (chemical processing and electrical power) as well as large software projects and have seen very complex schematics professionally rendered and partitioned up so that each stakeholder could follow the information they need.

I have not found VisualTour very useful so far. I think some kind of indented text (sort of outline-style) could be just as useful. The indentation level would indicate "distance" from the current topic. One could sort and filter the list on different factors.
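A sketch of that indented-text idea in Python, assuming page links are available as a simple adjacency map (the links shown are hypothetical, though the page names appear on this page). A breadth-first search assigns each page its shortest link distance from the current topic, and the indentation shows that distance.

```python
from collections import deque

LINKS = {  # hypothetical page-to-page links
    "LimitsOfHierarchies": ["FileSystemAlternatives", "RelationalAndTrees"],
    "FileSystemAlternatives": ["RelationalAndTrees", "XmlSucks"],
    "RelationalAndTrees": [],
    "XmlSucks": [],
}


def outline(start, max_depth=2):
    """Breadth-first search: indentation = link distance from the current topic."""
    distance = {start: 0}
    order = [start]
    queue = deque([start])
    while queue:
        page = queue.popleft()
        if distance[page] == max_depth:
            continue  # don't expand past the requested radius
        for nxt in LINKS.get(page, []):
            if nxt not in distance:
                distance[nxt] = distance[page] + 1
                order.append(nxt)
                queue.append(nxt)
    return ["  " * distance[p] + p for p in order]
```

Sorting or filtering the result on other factors (date, size, author) would only need a different ordering of `order`, since the distance is kept separately.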

What Alternatives to Hierarchies Are There?

What alternative is there to a hierarchy? The only one that I can identify is to have to work with every discriminator simultaneously, whether that discriminator is applicable or not. With a hierarchy, one can address discriminators sequentially and ignore discriminators that may be applicable in one case but not in another.

When one does queries on, say, sets, one specifies the criteria for the result. Example:

  where it belongs to set A or set D and not to set F
We don't have to mention every set. Other examples:

This argument, however, also applies to hierarchies. One does not need to provide every possible categorization with trees, hence the "limit" is no longer valid. Trees exist in database tables as well; linkages between tables can only exist where they are explicitly defined. Linkages between dissimilar columns are possible, though meaningless.
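The set query quoted above translates directly into set operations. A Python sketch, assuming hypothetical membership data and reading "A or D and not F" as (A or D) and not F; note that set Z exists but never needs to be mentioned.

```python
# Hypothetical membership data: which documents belong to which sets.
SETS = {
    "A": {"doc1", "doc2", "doc3"},
    "D": {"doc4", "doc5"},
    "F": {"doc2", "doc5"},
    "Z": {"doc1", "doc9"},  # exists, but the query never has to mention it
}


def query():
    """'belongs to set A or set D and not to set F', read as (A or D) and not F."""
    return (SETS["A"] | SETS["D"]) - SETS["F"]
```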

Sometimes there is a key(s) or factor(s) that provides a unique result, and sometimes there is not. If a table has a 'parentID' column, clearly there is either zero or one parent (zero if the record is already the top-most parent). One may have to deal with the potential of multiple results in some cases. Whether this choice is generally required by the problem domain or is a result of using the wrong structure or paradigm needs further exploring, perhaps on a case-by-case basis. The issue of multiple results is also explored in SetsAndPolymorphism.

How does this relate to the limits of hierarchy? Returning 0, 1, or more record sets is the result of a specific operation and does not say anything about the general relationships.

Doesn't the definition of the keys in a particular table define a hierarchy? Don't the keys defined within a table create a tree?

Are you talking about implementation of indexes? Perhaps they do, but that is an implementation detail that the developer should not have to concern themselves with. In the future, B-trees may be replaced by a superior indexing technology, for example, and the database should still work with existing code.

Math Proof Elusive

For thoroughness, it perhaps should be pointed out that trees can represent anything that say sets represent. It is mostly a matter of TuringTarpit and perhaps OnceAndOnlyOnce issues. I cannot point to any mathematical theory that says "trees suck". My complaints are mostly based on experience. I did not start out in my career with the skepticism of trees I have now. It was "earned". I tried modeling stuff in trees and saw others' tree models that simply did not scale well. Trying to clean up file directory trees has embittered me also. -- top

Moved from WikiIsNotaForum

[JohnKugelman,] You seem to be assuming [in WikiIsNotaForum] that categorization means that a given page has to live in one particular spot of a single hierarchy. But there are better categorization schemes, like FacetedClassification, that allow things to belong to multiple overlapping hierarchies. For example, the JavaLanguage might belong to "Computers / Programming Languages" as well as "People / James Gosling".
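A rough sketch of how facets might be stored, assuming each item simply carries a list of paths rather than one position in a single tree. The facet data and the build_index helper are hypothetical, for illustration only.

```python
from collections import defaultdict

# Hypothetical facet data: each item carries several paths, one per facet,
# instead of living at exactly one spot in one tree.
facets = {
    "JavaLanguage": [
        ("Computers", "Programming Languages"),
        ("People", "James Gosling"),
    ],
    "SmalltalkLanguage": [
        ("Computers", "Programming Languages"),
    ],
}

def build_index(facets):
    """Map every path prefix to the set of items filed at or below it."""
    index = defaultdict(set)
    for item, paths in facets.items():
        for path in paths:
            for depth in range(1, len(path) + 1):
                index[path[:depth]].add(item)
    return index

index = build_index(facets)
print(sorted(index[("Computers",)]))               # ['JavaLanguage', 'SmalltalkLanguage']
print(sorted(index[("People", "James Gosling")]))  # ['JavaLanguage']
# The same item is reachable from more than one top-level heading:
java_roots = [r for r in ("Computers", "People") if "JavaLanguage" in index[(r,)]]
print(java_roots)                                  # ['Computers', 'People']
```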

That's why I said [Inflicting categories implies a] "somewhat hierarchical" [structure] rather than "strictly hierarchical". As a mathematical graph, it's the difference between a tree and a DirectedAcyclicGraph (DAG). But both structures still have the idea of a top, a root. Does Wiki have a root page? Sure, it has various StartingPoints, but those are just places for beginners to get started, not top-level pages.

Wiki is definitely not acyclic.
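To make the tree / DAG / cyclic distinction concrete, here is a small classifier sketch, assuming the graph is given as (parent, child) edge pairs. It uses Kahn's algorithm for the cycle check. The page names in the edge lists are placeholders, not the actual link structure of this wiki.

```python
from collections import defaultdict, deque

def classify(edges):
    """Classify a directed graph given as (parent, child) edges.

    "cyclic": some path leads back to where it started (like wiki links).
    "tree":   acyclic, a single root, every other node has exactly one parent.
    "dag":    acyclic, but some node has several parents or there are several roots.
    """
    children = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
        nodes.update((parent, child))

    # Kahn's algorithm: repeatedly remove nodes that have no incoming edges left.
    remaining = {n: indegree[n] for n in nodes}
    queue = deque(n for n in nodes if remaining[n] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for child in children[node]:
            remaining[child] -= 1
            if remaining[child] == 0:
                queue.append(child)
    if visited < len(nodes):
        return "cyclic"  # some nodes never reached indegree 0

    roots = [n for n in nodes if indegree[n] == 0]
    if len(roots) == 1 and all(indegree[n] <= 1 for n in nodes):
        return "tree"
    return "dag"

tree_edges = [("Root", "Computers"), ("Root", "People"),
              ("Computers", "Programming Languages")]
dag_edges = tree_edges + [("Programming Languages", "JavaLanguage"),
                          ("People", "JavaLanguage")]
wiki_edges = [("FrontPage", "WikiIsNotaForum"),
              ("WikiIsNotaForum", "LimitsOfHierarchies"),
              ("LimitsOfHierarchies", "WikiIsNotaForum")]

print(classify(tree_edges))  # tree
print(classify(dag_edges))   # dag
print(classify(wiki_edges))  # cyclic
```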

Wiki pages aren't organized in a hierarchy, but each page does impose a hierarchical ordering on its contents. A single thought can't belong to multiple sections or pages.

Hierarchies seem acceptable when there are up to roughly 100 to 300 "items". After this, they don't work very well. For example, long wiki topics often end up having to use a fair number of PageAnchors, which are cross-branch pointers, a sign that the outline-type tree is failing to serve us properly. -- top

I think Bob Dylan says it all in the title of "Gotta Serve Somebody". Everything can be viewed as a member of a hierarchy, starting with time or things (or a deity or two). When you feel you are taking a non-hierarchical view of something, you are just ignoring the assumed superset. When we build a room in a house, that is a hierarchy. When we call a fruit a fruit, the implicit superset is foods or trees or whatever. There is always a superset.

I don't disagree that everything can be thought of as a tree. But there are a lot of other EverythingIsa competitors. On the UsefulLie scale, I just don't rank it very high for most things. When humans use "type" in everyday speech, it is generally a temporary ad hoc classification useful only for the context of a given discussion. For example, "I don't like the type of people who don't say 'hi' in the morning." Or, "Sorry, that is not quite the type of shirt I am shopping for today."

All those "EverythingIsa competitors" are a part of the type "CounterArgumentsToReality?". Except EverythingIsaThing?. Both EverythingIsaHierarchy? and EverythingIsaNetwork? are very valuable concepts.

Valuable relative to what? Are you suggesting that trees are a perfect or nearly perfect model for most things? That I would have to strongly disagree with, as already described above.

No. I am saying that any thing can be viewed as a member of a hierarchy. I hate rules like EverythingIsaHierarchy?, but I like concepts like EverythingIsaHierarchy?. They give choice - they suggest alternative views, which can resolve design conflict.

I am not quite sure what you mean.

. . . . .

I am a fellow "set" fan. I try to find clean hierarchies that fit the domain, but I just cannot. Exceptions to trees come along and ruin the simplicity of the original hierarchy. Throwing branch layers at the problem only turns it into a sprawling weed, often with duplicate children or at least parts of children. Sets seem more forgiving and don't require one to get it right up front. Until somebody can give specific techniques to find the right hierarchies that don't get beat up by unexpected variations down the road, I cannot bring myself to like them. The people who dictate all the goofy business rules don't seem to respect lasting tree-ness. They want what they want when they want it, regardless of whether it troubles our dear trees. Sets do have the drawback of not being easy to chart, but I can learn to live with that. One just has to build up tools for set inspection and create nice set conventions. Sets can still emulate trees if by chance you do encounter a pure, lasting tree.
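A small sketch of the set-based approach, assuming items are tagged with plain sets of category names. The catalog data and the having helper are hypothetical. An "exception" is just one more tag rather than a new branch layer, and a genuine tree fits as the special case where each item's tags are its chain of ancestors.

```python
# Hypothetical catalog: each item is tagged with a set of categories.
catalog = {
    "stapler":    {"office", "hardware"},
    "printer":    {"office", "electronics"},
    "smartphone": {"electronics", "personal"},
    "desk lamp":  {"office", "electronics", "furniture"},  # the "exception": just another tag
}

def having(items, *required):
    """Items carrying every requested tag (a subset test on each tag set)."""
    wanted = set(required)
    return sorted(name for name, tags in items.items() if wanted <= tags)

print(having(catalog, "office"))                 # ['desk lamp', 'printer', 'stapler']
print(having(catalog, "office", "electronics"))  # ['desk lamp', 'printer']

# A pure tree emulated with sets: each item is tagged with its ancestor path.
tree_as_sets = {
    "cola":         {"drinks", "drinks/soda"},
    "lemonade":     {"drinks", "drinks/soda"},
    "orange juice": {"drinks", "drinks/juice"},
}
print(having(tree_as_sets, "drinks/soda"))       # ['cola', 'lemonade']
```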
See what K Alan Drexler has to say on this:
The advantage of databases (although they don't make full use of set operations) is that they don't impose a world view on data. Any hierarchy instantly limits your world view to something subjective. Let's say I have a set of data in tables showing each country and its official languages. Obviously they overlap, since Switzerland has 4 official languages. If I put this into a hierarchy, it would force me to choose either country or language as the top level. The problem is that, for querying, the user may want their own view of the data (and your code may want yet another view). In some cases you only want to know which countries speak English, and in others you want to know which languages a country speaks. This becomes worse with more tiers. Say you have a set of countries like the E.U. The more levels/relationships/sets, the more restrictive a hierarchy gets. Objects don't solve this problem either, because they create hierarchies that force nouns to be superior over verbs. For example, try adding a parameter to a method shared by 20 classes: that takes 20 changes instead of 1.
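The country/language example can be sketched as a single flat set of facts that is queried from either direction, with no top level chosen in advance. The data below is illustrative only, and eu is a deliberately partial membership set.

```python
# Illustrative data only: a flat set of (country, official language) facts.
official = {
    ("Switzerland", "German"), ("Switzerland", "French"),
    ("Switzerland", "Italian"), ("Switzerland", "Romansh"),
    ("Canada", "English"), ("Canada", "French"),
    ("Ireland", "English"), ("Ireland", "Irish"),
    ("Austria", "German"),
}
eu = {"Ireland", "Austria"}  # partial E.U. membership, for illustration

def countries_speaking(language):
    """Language-first view: which countries have this official language?"""
    return sorted(c for c, lang in official if lang == language)

def languages_of(country):
    """Country-first view: which official languages does this country have?"""
    return sorted(lang for c, lang in official if c == country)

print(countries_speaking("English"))  # ['Canada', 'Ireland']
print(languages_of("Switzerland"))    # ['French', 'German', 'Italian', 'Romansh']

# Another grouping (the E.U.) is one more set, not another tree level.
eu_languages = sorted({lang for c, lang in official if c in eu})
print(eu_languages)                   # ['English', 'German', 'Irish']
```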

Unfortunately, we don't have the technology in languages or DBs to view and operate on sets. SQL/DataSets?/Classes/Arrays/Collections/etc are all crap. They should all include set operations and expressions instead of imposing proprietary restrictions.

Your OO example is really bad. OO languages like Smalltalk don't have 20 different implementations of a polymorphic method. And if they do then that's because they're needed and whining about it doesn't help. But the real problem is that you're trying to create some kind of analogy which doesn't even make sense. I can't figure out what "Objects create hierarchies that force nouns to be superior over verbs" is even supposed to mean.

And after thinking on it for a few minutes, I have a vague idea of what you might mean and I'm convinced you're simply wrong. Or rather, you would be wrong if your statement were interpreted in a neutral context instead of in the context where you're talking about the "advantage of databases", because in that context you'd be a fool. Databases "force nouns to be superior over verbs" infinitely more than OO ever does. In that context, OO is a clear step forward towards equality.

But ignoring the implicit comparison with databases, a much better example of OO not solving the multiple views issue is the CircleAndEllipseProblem. Which brings to mind that although there are OO approaches that don't impose a worldview on data, you won't find any of these in a typical Smalltalk image. You wouldn't because an image is organized for the benefit of the code that lives in it, so the view imposed on the objects is the one that benefits the code most. This is obvious and is why looking for viewpoint independence in an OO image is foolishness.

If that "image" is small and local, then why bother with the complexities of OO? -- top

[That's not a good question. By analogy: if the "solution" can be expressed in a small bit of SQL, then why bother with the complexities of relational? Perhaps it is the complexities of relational that keep the solution small. You should first ask to what degree the language's OO nature influences whether the image is 'small and local'.]

Hierarchies As Views Of "Flat" Data

My perspective on using hierarchies to organize some data is this: a particular hierarchical model is a useful view into a set of data. But the hierarchy should not be the primary way of storing the data. If the data storage is separate from the hierarchical model, and the hierarchy references the data from that separate store, then you can have multiple hierarchical models (either trees or DAGs or whatever) looking at the same data.

The soda example:

In these examples, type (flavor), caffeination and diet are properties of the data members. The data members are referred to in the tree views, but the real data is in the separate Drinks Data area. The views all "share" that data. The hierarchical views can be constructed by hand, by an automatic query, or whatever. Each of (Diet Data, Caffeine View, Diet View, Types View) happens to be a hierarchy, so you could represent each of those in an XML file or whatever (similar to the bullet lists I wrote above). (But it would work if they were not strict trees, too.)
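Since the original soda listings are not reproduced here, the following is a hypothetical reconstruction: a flat list standing in for Drinks Data, and a tree_view helper (also hypothetical) that projects it into nested views on whichever properties you pick. The leaves hold drink names, i.e. references into the shared store, not copies of the records.

```python
from collections import defaultdict

# Hypothetical stand-in for "Drinks Data": one flat record per drink.
drinks = [
    {"name": "Cola",               "flavor": "cola",   "caffeine": True,  "diet": False},
    {"name": "Diet Cola",          "flavor": "cola",   "caffeine": True,  "diet": True},
    {"name": "Caffeine-Free Cola", "flavor": "cola",   "caffeine": False, "diet": False},
    {"name": "Cherry Soda",        "flavor": "cherry", "caffeine": False, "diet": False},
    {"name": "Diet Cherry Soda",   "flavor": "cherry", "caffeine": False, "diet": True},
]

def tree_view(records, *levels):
    """Project flat records into nested dicts, one level per chosen property.

    Leaves are sorted lists of names, so every view refers back to the same
    flat store instead of owning its own copy of the data.
    """
    if not levels:
        return sorted(r["name"] for r in records)
    groups = defaultdict(list)
    for r in records:
        groups[r[levels[0]]].append(r)
    return {key: tree_view(group, *levels[1:]) for key, group in groups.items()}

caffeine_view = tree_view(drinks, "caffeine")
types_view = tree_view(drinks, "flavor", "diet")
print(caffeine_view[True])         # ['Cola', 'Diet Cola']
print(types_view["cherry"][True])  # ['Diet Cherry Soda']
```

Changing which property sits at the top is just a different argument order, which is the point of keeping the hierarchy as a view rather than as the storage.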

I guess this is similar in concept to NetworkDatabase or ObjectOrientedDatabase, except that the core data store (Drinks Data) is effectively a relational table of properties, so you can search that with the power and efficiency of relational databases.

-- ReedHedges

One could say that you want a tree-based report writer or browser that can project selected factors into a tree for exploration purposes. I see nothing wrong with such a concept, because it is only a temporary view and does not impose a tree as the native structure of the info. The "banded" kind of report writers can do this by letting you select which factor goes on which level. However, in practice it often takes extra preprocessing to get data into a form that the report writer likes. As for what view the user wants, a tree of the above would be a bit awkward in my opinion. Some kind of QueryByExample with pull-down lists with an "any" option (or radio buttons) for each choice may be more appropriate.

  Flavor:    [X]Any  [_]Cherry  [_]Cola ...
  Sweetener: [_]Any  [X]Sugar   [_]Aspartame  [_]Splenda
  Caffeine:  [_]Any  [X]Yes     [_]No
  Bottling:  [_]Any  [X]Cans    [_]Liter
(Actually, check-boxes could be used instead of radio buttons for even more flexibility, because we are allowing multiple potential matches.)
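A minimal sketch of the check-box variant of the form, assuming each field is either left at "Any" or given a set of acceptable values. The drink records and the query_by_example helper are hypothetical.

```python
# Hypothetical drink records; field names mirror the form above.
drinks = [
    {"name": "Cola",        "flavor": "Cola",   "sweetener": "Sugar",     "caffeine": "Yes", "bottling": "Cans"},
    {"name": "Diet Cola",   "flavor": "Cola",   "sweetener": "Aspartame", "caffeine": "Yes", "bottling": "Cans"},
    {"name": "Cherry Soda", "flavor": "Cherry", "sweetener": "Sugar",     "caffeine": "No",  "bottling": "Liter"},
    {"name": "Cherry Cola", "flavor": "Cherry", "sweetener": "Sugar",     "caffeine": "Yes", "bottling": "Cans"},
]

ANY = None  # a field left at "Any" places no constraint

def query_by_example(records, criteria):
    """Keep records whose fields match every criterion that is not ANY.

    Each criterion is ANY or a set of acceptable values; ticking several
    boxes in one row means "any of these values".
    """
    return [
        r["name"]
        for r in records
        if all(allowed is ANY or r[field] in allowed
               for field, allowed in criteria.items())
    ]

# The form as filled in above: Any flavor, Sugar, caffeinated, in cans.
form = {"flavor": ANY, "sweetener": {"Sugar"}, "caffeine": {"Yes"}, "bottling": {"Cans"}}
print(query_by_example(drinks, form))  # ['Cola', 'Cherry Cola']

# Check-box variant: accept either sweetener.
either = {**form, "sweetener": {"Sugar", "Aspartame"}}
print(query_by_example(drinks, either))  # ['Cola', 'Diet Cola', 'Cherry Cola']
```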

See also: MutuallyExclusiveCategoriesDontScale, TaxonomyOfPatterns, CollectionHierarchies, HierarchicalDatabase, ThereAreNoTypes, FileSystemAlternatives, CompositionInsteadOfInheritance, RealWorldHierarchies, MultipleCategorizationPattern, ImperfectHierarchy, TreeUberAlles, CategorizationModels, AttributesInNameSmell, StaffingEconomicsVersusTheoreticalElegance
CategoryPolymorphism, CategoryHierarchy, CategoryClassification

Last edited January 28, 2014