Cartesian Join

Every possible combination of the elements of two or more sets. For example, a Cartesian join of (A,B) and (1,2,3) is (A,1) (A,2) (A,3) (B,1) (B,2) (B,3). Try it with larger set sizes and you will quickly realize why it can be very unpleasant. Accidental usage has been known to bring systems to a crawl.

That is not every combination - it's taking exactly one element from each set (in every way possible).

SQL example:

   SELECT a.cust_id, b.product_id, b.product_name
     FROM order a, product b

Without a specification of how the join is to be done, the DB will blithely return, for every row in order, all of the rows in product. If you have 4,000 orders and 200 products you will get back 800,000 rows of nonesense.

This is usually an SQL CodeSmell, but there are occasions when this is exactly what you need (these solutions are found after overcoming SqlMyopia). For example, it can come in handy when you need to generate large volumes of test data for stress testing.

But it would be nice if it was switched off unless explicitly requested, such as having a CARTESIAN keyword. Things that are rarely used but costly should not be easily triggered by accidental and common typing omissions. It is like putting the Launch Nukes key right next to the Enter key.


See TemplateTables for an example


Shouldn't this page more properly be called CartesianProduct?

I guess CartesianProduct could be better, cartesian join is definitely a bad name, but the above select is not a cartesian product. Maybe CartesianSelect? ??

CartesianJoins are how CartesianProduct is implemented in SQL.

Wait a minute, how do you think you define cartesian product ? The thing you describe is not a cartesian product. The same way, a cartesian join is a nonsensical term. There is no such thing, unless you want to define it by making an abuse of language.

Ok, second iteration: CartesianJoins are SQL joins with no limiting condition, resulting in CartesianProduct.

Cartesian products are defined in set theory, which is the basis of relational algebra and relational calculus. A definition in context is available in: http://www.cs.odu.edu/~toida/nerzic/content/set/set_operations.html

Ok, but you forgot to mention that SQL has little to do with set theory (it just doesn't operate on sets). So you can safely say that sometimes you can generate a cartesian product by using a SELECT. Join already implies a join condition and cartesian join is hardly a good name. Anyway the naming issue is minor, the example is very nice indeed, but maybe it needs a definition or reference to the weighted moving average. The domain of this example is also very interesting because statistical analysis is not easy to do by combining relational operators (and much less the SQL constructs), and that's one of the reasons major databases come with specialized statistical modules to extend the SQL.

The correct "join" term for this construct is a cross join.

I found another use for CartesianJoins besides generating test data (TestDataGenerator). One can use it to create "zero records". For example, suppose you are making a report that has to have an entry for every department, even if that department does not have any data for a particular month. One can Cartesion join a virtual or dummy record with zero values to the department table, and then UNION that result with the regular query. After summing (grouping) by department, there will be an entry for every department. Missing actual data will just result in zeros.

Unless I've misunderstood your description, I believe you're describing what is usually known as a left outer join or a right outer join. If it involves non-matching values in both tables, then it's a full outer join. See http://en.wikipedia.org/wiki/Join_(SQL) Unfortunately, most if not all SQL implementations make it impossible to specify explicit values for the non-matching rows in an outer join, and therefore require explicit null-to-zero conversions and the like on the result of the join prior to performing aggregate operations. In some cases (such as not much data), that might make use of a CartesianProduct as described above more practical than an outer join. -- DaveVoorhis

No, it's not an outer join. Perhaps I should work up sample one of these days. It's not just "null filling"; it's more than that. However, I agree with your suggestion to be able to indicate what outer joins use as the default for missing items.


Common uses for joining to a table but having no inter-table constraining conditions in the "where" or "on" clauses: One can imagine valid business reasons for wanting a CartesianJoin result. It's uncommon, but possible.

Examples:

I want to join all the possible choices from one list with all the possible choices of another list -- to ensure that all possible combinations are tested or reported or "covered" by the processing. Like, I want to join a list of all airports to itself, to get a list of all possible source and destination airports (probably filtering out "source = destination" cases), to ensure that it's possible to find a route from every possible airport to every other possible airport.


See also: EveryCombinationInManyProgrammingLanguages

EditText of this page (last edited March 27, 2014) or FindPage with title or text search