There are cases where StructuredQueryLanguage
is possibly overused, meaning that another tool could do some or all of the job simpler. Note that this does not necessarily imply that relational theory is limited, but rather is a focus on SqlFlaws
. Maybe another RelationalLanguage
could handle it better, for example.
- Sub-totalling - Often reports have sub-totals by a group. I have seen cases where SQL was used to provide the subtotals in the result data. It was ugly SQL code with a lot of duplication (OnceAndOnlyOnce violations), such as nested sub-queries with the same filtering criteria repeated over and over for each nested SELECT statement. It is usually better to handle such in the application, and some report-writing tools can do it with just a few mouse-strokes.
- On the other hand, in the absence of such report-writing tools, it may be worth doing what some report-writing tools do internally -- generate SQL code via more convenient means. It might be warranted to develop a reporting API that generates SQL, or even a reporting language with a compiler that generates SQL. Sometimes it may be beneficial to handle grouping and summarizing in the application, but other times it may mean unacceptable complexity or bandwidth costs -- especially if the raw data set is very large and the report will present a summary, or if the report is complex. As with many database issues, there may not be a general principle here that can be applied universally; it is best to make such decisions on a case-by-case basis.
- I generally subscribe to CodeGenerationIsaDesignSmell, but realize that in the real world we sometimes have to pull ugly tricks. And, subtotalling is not that hard to implement in a regular programming language; it is a common, well-documented pattern (search "control break"). The advantage is that if you add new columns to the report, you usually don't have to change the sub-totalling code. However, if you use nested, repeating queries, you have to propagate the change through the multiple nested SELECT statements. Perhaps if SQL did it in a more strait-forward way, it would not have left such a bad taste in my mouth.
Signs that you are overusing SQL:
- You have a "split-bread sandwich" query such that an outer query starts on one page and ends on another. (ThickBreadSmell)
- You see a lot of repetition of a pattern in the query. For example, having to repeat the same IN list multiple times.
- Lots of CASE WHEN statements.
- It takes you a hour to find a missing parenthesis.
- You spend a lot of time working around "I/O threshold exceed" errors, meaning the query is too complex to execute efficiently.
- You have to repeat the same column names over and over again.
- You wear out your pencil drawing lines and arrows to match and track stuff.
- You have to duplicate the same or similar column or criteria list over and over.
One of my bigger complaints about large SQL statements is that it is hard to inspect the intermediate results, something I miss from my ExBase
days. There are few or no debuggers that let you see the results of intermediate queries, for example, at least not with a lot of copy and paste. SQL as implemented lacks human-interactive DivideAndConquer
. Perhaps this is a drawback of declarative techniques. But it would be fairly easy to add intermediate inspection to something like SMEQL/TQL (SmeQl
) because one can optionally use named references instead of nesting to combine queries. (Inspection may slow down processing because the optimizer may have to create an actual intermediate table when inspection is requested instead of apply shortcuts to possibly combine steps.)
TutorialDee supports this via the WITH construct (see page 45 in TheThirdManifesto 3rd edition for an example), which introduces names to reference subexpressions. Internally, it can be implemented as syntactic sugar, and does not inhibit optimisation.
All of these MisuseOfSql
cases seem to be issues of code clarity and refactoring. Massive duplication? Parametrize into procs. Redundant subqueries? Pre-select into a temp-table.
Peculiar that there's no "Misuse Of" Cee, Perl, Smalltalk, Erlang, Ruby, Lisp or Scala pages, no?
"Parametrize [sic] into procs" often has a significant negative impact on automated optimisation and therefore usable performance; it rarely does in the general-purpose programming languages you've listed. Pre-selecting into a temp-table is unpleasantly complex, bypasses certain automated optimisations, may introduce concurrency issues, and may significantly degrade performance.
There are no "Misuse Of" pages for C, Perl, Erlang etc., because bad programming in general-purpose languages is, for the most part, obvious and well-understood amongst professional programmers. We don't need a page to know that, for example, monolithic programs should be avoided in favour of use of various code modularisation techniques. MisuseOfSql (plus SqlFlaws), though, is somewhat less understood.
See also: ReportsSmell