# Relational Algebra

See SetTheory. RelationalAlgebra is to SetTheory as algebra is to NumberTheory.

You mean as number theory is to algebra, or as algebra is to arithmetic, right? Number theory is just about impossible without algebra, and the deeper you go the more you need.
That's not an introduction or explanation, that's an analogy. Can you point at a serious reference?

SetTheory is as serious a reference as I know of on this wiki (but see http://wombat.doc.ic.ac.uk/foldoc/foldoc.cgi?relational+algebra) and is the natural starting point for an enquiry of this nature, since RelationalAlgebra is as meaningless without a basic understanding of SetTheory as algebra is without a basic understanding of NumberTheory. It also contains some examples of modified RelationalAlgebra.

I guess I'm looking for something that comes from it that's (a) useful and (b) non-obvious. I've never seen any examples of that so I'm left wondering why the huge fuss is made of it. Coming from a formal mathematics background it seems a bit light-weight to me.

That it is no less light-weight than the RelationalCalculus is proven by the correctness of CoddsReductionAlgorithm?. Whether the algorithm is correct and the two are isomorphic, I couldn't say, but it has always been my working hypothesis. Whether anything useful and non-obvious has ever come from the Relational Calculus (or SetTheory... or CategoryTheory...) is another imponderable.

Set theory has given non-obvious results - whether useful or not depends on what you are trying to do. Category theory is more like a refactoring of mathematics, a way to discuss different things in the same terms.

Relational Algebra has given useful results* - whether non-obvious or not depends on who you are and when. Category theory is more like a refactoring of set theory than of mathematics, since, by GoedelsTheorem, mathematics cannot be wholly refactored except into a system that allows contradictory propositions.

* RelationalDataBaseManagementSystems and StructuredQueryLanguage
DateAndDarwen, in their book Foundation for Future Database Systems: TheThirdManifesto, describe a drastically simplified relational algebra which is very useful for reasoning about relational operations. They call their algebra 'A', short for 'ALGEBRA', which in turn stands for A Logical Genesis Explains Basic Relational Algebra. Cute. Now there's a back-asswards BackroNym if I've ever heard one. Almost as bad as what the D in TutorialDee stands for. :)

The algebra emphasizes relational theory's roots in predicate logic rather than set theory. (Many practitioners tend to reason about relational operations in terms of sets, which can lead to confusion. But not as much confusion as SQL causes, since SQL's tolerance for duplicate rows implies that it works with bags rather than sets.) SmalltalkCulturalAssumption? here?

• What SmalltalkCulturalAssumption? is that?
• Pure set programming can be done if you don't like bags; one has to be a real purist to get up in arms about the fact that bags are allowed; they're quite handy in many areas of programming, not just with databases. {Date shows in 'An Introduction to Database Systems' how going from set to bag operations can lead to a loss of data integrity, which is something to get up in arms about :) }
• We're not talking about general programming here, but about RelationalAlgebra. Bags don't make sense in this context. Each tuple represents a fact; stating a fact twice is of no use, and leads to a different set of characteristics for operations in the algebra than those which are assumed by the relational model.

Some notation conventions used below:
• All placeholder letters are lower-case.
• Placeholder letters from the start of the alphabet represent relations.
• Placeholder letters from the end of the alphabet represent attribute names.
• Placeholder letters from the end of the alphabet in parentheses represent a set of attributes.
• H(a) means "heading of relation a", which is a set of attributes names and their associated types. This is useful for expressions involving meta-data.
• (x) - (y) : Asymmetric set difference; result is set of attributes from (x) that don't appear in (y).
• (x) + (y) : Set union; result is set of attributes that appear in either (x) or (y))
• (x) * (y) : Set intersection; result is set of attributes that appear in both (x) and (y))
• {x, y = expr} : Represents an "operator relation", where expr uses ad-hoc notation to describe how y is computed from x. The tuples of an operator relation are all of those for which the operator is defined, which is a potentially infinite set. Attributes without defining expressions are inputs, and typically listed first. Attributes with defining expressions are outputs. There can be multiple inputs and/or outputs. Such operator relations allow extension to be treated as an AND of relations. The idea of operators as relations is fundamental to the algebra; this notation is a local convention.

Primitive operators:
• a AND b : Natural join. This is an inner join on attributes with the same name in each relation. If there are no attributes that match by name, it devolves to a Cartesian join.
• a OR b : An extended form of union; if the headings of the operands differ, then "missing" attributes take on all possible values. Thus the result may be very large or even infinite. When the operands have the same heading, then this is the same as a traditional SQL UNION, except that all duplicates are always removed.
• NOT a : Produces a relation containing all tuples with a's heading that could possibly exist, but are not in a -- typically a very large or infinite result.
• a REMOVE (x) : This is the projection of a over all attributes except those in the attribute set (x). The resulting relation has heading (H(a) - (x)).

Note that some expressions can not be practically computed; the relational algebra is thus more expressive than a query language based on it can be. In particular, (NOT a) cannot, in general, be computed in isolation. Likewise, (a OR b) cannot usually be computed unless (H(a) = H(b)).

• a RENAME (x, y) : Renames an attribute; equiv. to (a AND {x, y = x} REMOVE x))
• a COMPOSE b : Equivalent to ((a AND b) REMOVE (H(a)-H(b)) + (H(b) - H(a))).

Here are some useful transformations involving the primitive operators. Analogies to Boolean algebra are obvious in some places:
• associativity
• (a AND b) AND c = a AND (b AND c)
• (a OR b) OR c = a OR (b OR c)
• commutativity
• a AND b = b AND a
• a OR b = b OR a
• idempotence
• a AND a = a
• a OR a = a
• (a REMOVE (x)) AND a = a
• distributivity
• (a OR b) AND c = ((a AND c) OR (b AND c))
• (a AND b) OR c = ((a OR c) AND (b OR c)) ?
• (a REMOVE (x)) REMOVE (y) = a REMOVE (y) if (y) >= (x)
• (a OR b) REMOVE (x) = (a REMOVE (x)) OR (b REMOVE (x))
• (a AND b) REMOVE (x) = ((a REMOVE (x)-H(b)) AND (b REMOVE (x)-H(a)) REMOVE (x)*H(a)*H(b) ?
• (a REMOVE (x)) AND (b REMOVE (y)) = (a AND b) REMOVE (x)+(y) iff (x)*(y)*H(a)*H(b) empty

Many of their operations are not very user-friendly in my opinion. Sure, one can get used to them over time, but I think a more "natural" set would be appropriate. For example, we use AND, OR, and NOT as our "primary" Boolean operators because they reflect spoken language, but in theory we only need NAND. -- top

The primitive set of operations given above is, of course, not the only possible set to start from. However, I find AND, OR, NOT, and REMOVE all very easy to understand, and easily related to SQL. I'm curious which you find to be "unfriendly" -- the analogies to spoken language hold here as well as they do in Boolean algebra, up to a point. REMOVE (projection) doesn't map quite so easily, but it's essential to relational algebra in some form.

• D&Ds AND and OR (aka outer union) operations don't meet all the lattice axioms (hint: absorption law). It's not even a lattice, let alone Boolean Algebra. Therefore, there is absolutely no basis for NAND operation to speak of. -- Vadim Tropashko'

The result of NAND on two finite relations would, in general, not be practically computable, so I don't think it would be a very useful choice -- despite the similarities, this isn't quite the same as Boolean algebra. (Although in fairness, computing OR is similarly impractical in the general case.) -- DanMuller

Example?

Of? The non-computability of NAND? Note that NAND is equivalent to NOT AND. Note that the complement (NOT) of a relation a is a relation with all possible tuples not in a. If the types of the attributes of a relation all have finite bounds, the result of NOT a will be finite (but possibly very large). If any of the types are infinite (for instance a string without a length constraint), then the result will have infinitely many tuples.

• It is clear from the above that some of the operators may result in sets which are infinite. Are all sets generated by the above operators RecursivelyEnumerable? Infinite sets can be handled by LazyEvaluation, should you ever need to enumerate one; but non-RecursivelyEnumerable sets cannot. Given that many non-RecursivelyEnumerable languages can be specified as the inverse (over the alphabet in question) of an infinite but RecursivelyEnumerable language over the same alphabet--similar to what NOT does above--this is an important question to answer.

• I think that in general, whether a result is RecursivelyEnumerable would depend on the types of the attributes. Since NOT of a includes all tuples with the same heading as a that can be validly formed and which are not part of a, and since D&D specify attribute types in terms of sets of values, a result would only be RecursivelyEnumerable if the types of the attributes of a are RecursivelyEnumerable. In practice, it is easy to come up with a very useful set of more restrictive query operators that do not generate such large results. Specifically, UNION is a computation-friendly form of OR which requires that its operands have the same heading, and NOT is simply omitted from the operators of your query language. The relational NOT operator is still useful in describing the definition of some operators. For instance, the relational operation "a MINUS b" can be defined as "a AND NOT b", where a and b have the same heading. -- DanMuller

• Recursive enumerability is known to be closed under (set) union and intersection--the union/intersection of two recursively enumerable sets is R.E; however it is not closed under the complement (NOT) operation. (Though the complement of a finite set, even if infinite, is fully decideable--a stronger constraint than being recursively enumerable).

• Relational AND and OR do not correspond directly to set intersection and union, so I'm not quite following where you're going with this. -- DM

The above discussion seems to involve some issues that were dealt with in the early 20th century in the context of the RussellParadox, and which resulted in the slogan "there is no universe (of discourse)". One stays out of trouble if and only if one insists that NOT {set A} is defined only for the case where A is a subset of B, in which case NOT A is defined to be B minus A.

Unconstrained approaches were demonstrated to allow the RussellParadox. -- DougMerritt

[...aha, I should have realized there are related pages here: SetOfAllSets UniversalSet NewFoundations]

Note that the "base" operations and the operations actually used in a language do not necessarily have to be the same, as long as the derived operations can be defined in terms of the base operations. Thus, perhaps a more "user friendly" set of operations can be derived. Also, a set of hierarchical traversal operations would be nice, but were not in the original set. The result is still a result set (table). An analogy would be AND and OR from BooleanLogic. In theory we only need NAND (NOT AND), but create the others from combinations of NAND to fit spoken language in order to make it easier for humans to relate to. I have been working on a set of "human friendly" relational operators, but am not satisfied with it enough to "publish" it yet. -- top

Contributors: DanMuller