Sql Myopia

"This GemStone stuff is really great, but how do I SELECT DISTINCT PERSON.NAME FROM PERSON WHERE SEX = "MALE" UNION SELECT ..."

One way might be something like this ...

 males := Person select: [ :each | each male].
 names := males collect: [ :each | each name].
 uniqueNames := names asSet.

SqlMyopia: what a wonderful phrase. But in the example above it seems to be taken to mean something more like "RelationalMyopia?". I'll talk about it from a completely different angle.

We were building a decision support system that allowed a limited form of dynamic query generation, and had a somewhat novel and powerful feature making use of SQL's "GROUP BY" clause. But we struggled with that feature -- both the interface to it and the internal design -- before a friend and I realized that we were thinking "inside the SQL box", so to speak. SQL is a weird implementation of relational concepts: mostly the relational algebra (but not quite the relational algebra) with a little bit of the relational calculus (but not quite the relational calculus) thrown in here and there. We were thinking in SQL, and SQL was bringing its own muddy set of concepts into our thinking.

We stepped back, grabbed our old textbooks to refresh our memories, and rethought our problem strictly in terms of the algebra first and then the calculus. Enlightenment followed.

When you're working with relational databases, SQL is the only query language in town. You have to use it to talk to the database. But you don't have to think in SQL, and if you're doing complicated stuff, you probably shouldn't. --GlennVanderburg

Glenn, I couldn't have said it better myself. That's pretty much what I wanted to get across with my original example.. -- StuCharlton

TemplateTables contains an example of calculating moving average in SQL that requires thinking otside the SQL box. See also SqlPattern --NikolaToshev

I very definitely do this. Mostly, I don't suppose I do it out of any well-reasoned rationale; it's just that there's a part of me that likes the conceptual purity of a heavily nested SELECT statement.

I work mostly on database-backed web systems, where most of what's important either goes into the DB or comes out of it at some point. So it often feels appropriate to nest complex set logic in a big SQL statement -- though I try to comment complex SQL liberally as well.

I think this idea might be vaguely related to data encapsulation. In my experience, if you use three queries where you could use one, you end up littering your immediate namespace with a lot of intermediate values that are only used to massage your multiple queries together into the final set you want. (Though I'm sure this is partly a consequence of which language you're using.) Passing it a big gnarly SQL statement gets around this.

On the other hand, a lot of perfectly intelligent, competent programmers have problems with SQL, so you run the risk of confusing them by leaving difficult SQL in your code for them to stumble over. I certainly wouldn't want to minimize this risk. -- francis

Well, it's a year later and my opinion of SQL has turned around 180 degrees. It happened when I was doing a reporting tool and had to come up with demographic breakdown percentages, and histograph slices, and standard deviations, across a really wide-ranging set of data. Any time I tried to do it in SQL it just resulted in spaghetti. But when I went to the effort of doing some ObjectRelationalMapping, then handling all the complex stuff in Java objects tailored to help with the calculations, my life was much easier. - francis
I thought about what I said above (two months previous) and have another point, too: If you have a solid understanding of the set theory behind SQL, it can make your work much more efficient.

To take a simple example, let's say I want to do something like "Get all the people who belong to group A and do not belong to group B." If I didn't know how to deal with subselects, the pseudo-code might look something like this:

 Get all the people who belong to group A
 Create an empty set as the final_set
 For each person in group A:
   If the person isn't in group B:
     Add the person to the final_set
   end if
 end for

... but if I want to this with SQL, the select statement might look like:

 select user_id
 from users
 where user_id in (select user_id from user_group_map where group_type = 'A')
 and user_id not in (select user_id from user_group_map where group_type = 'B')

... and to somebody who knows how to read SQL, this is a very clean mapping of the original idea to code. You could even say that the English-language requirement ("Get all the people who belong to group A and do not belong to group B") is already pseudo-code for SQL. This is because SQL as a language is defined around set theory. And it's quicker to write because it implicitly handles subselects for you. -- francis

The above SQL seems a bit smelly because the two inner Select's have a large degree of similarity. Maybe there is a way to define the set of groups per user in one place, and then use a correlated sub-query to query that same set with the "in" and "not-in" operations instead of redefining it for both operations. Ponder....

 SELECT user_id FROM user_group_map WHERE group_type = 'A'
 SELECT user_id FROM user_group_map WHERE group_type = 'B'

Ah, basic SetTheory to the rescue.

Coming from an engineering background, I have been forced to pick up some SQL along the way, but I didn't even realise there were such things as Relational Calculus or Relational Algebra. Where can I find out more about such things? --AndraeMuys

My favorite book to cover all of these topics is FundamentalsOfDatabaseSystems by Elmasri and Navathe. It covers the math, as well as SQL.

There's something that I don't understand. Does the SqlMyopia come from an OO point of view, or from a relational point of view ? From a relational point of viewm there IS a SqlMyopia, in the sense that SQL is an engineering compromise rather than an implementation of proper relational concepts. However, if criticism comes from an OO point of view (because I know some of Stu's ideas) I'd rather prefer SqlMyopia to OODBMS Myopia.


I'm not sure what you mean by the above, but I'm tempted to say 'it comes from neither.' SQL is difficult in part because it it comes out of a rather confused theoretical basis. If you ignore SQL and just try to express what you want to do in a relational algebra or relational calculus, and then later translate that into SQL, you can sometimes solve things much more easily.

I meant it from a SQL point of view, that many view data access concepts in terms of SQL. At the time I wrote the quote, I was knee-deep in writing an OODBMS-based system, so that was my motivation then. Now, I feel that certainly there is a just-as-bad ObjectMyopia? that only sees data as an in-memory pointer graph. The OODBMS paradigm appropriate in some cases (i.e. you'd like to roll-your-own DBMS) -- it's just that one should be able to "shift perspectives" from the object (in-process / physical) level to a relational (logical) level. --StuCharlton

When you're working with relational databases, SQL is the only query language in town.

Not actually. SQL is the only popular data sublanguage, but it is not even relational. The only relational languages are Berkeley Ingres' QUEL, IBM BS12 (both obsolete), EFCodd's Alpha, ChrisDate and HughDarwen's TutorialD {TutorialDee} (both never implemented), and Alphora Dataphor's D4 (.Net only).

I didn't even realise there were such things as Relational Calculus or Relational Algebra. Where can I find out more about such things?

The classic is AnIntroductionToDatabaseSystems by ChrisDate. See RelationalLanguage for other proposed alternatives.

Related: NewQueryLanguagePressureLevel

See also: StructuredQueryLanguage, SqlFlaws, RelationalLanguage


View edit of December 5, 2011 or FindPage with title or text search