No Strings

I have a fundamental distaste for passing Strings around as the main interface to an OO system. For example: for SQL in Java, you are essentially creating glorified strings. This is why people make object-relational mapping tools. But you still have to drop to string-type objects to query the mapped objects. (I'm intentionally disregarding writing an algorithm to do the equivalent queries... it's not really a database at that point, just fancy serialization.)

In general, how one deals with databases is by passing strings. You make an SQL command and put it in a string. You might even encapsulate that type of string in a class... but you're still dealing with glorified strings. Ditto for the object database querying schemes I've seen so far. You make a String, or an object with string-like properties, to encapsulate your query. But the query doesn't really know what it's talking about: there's no way the compiler could infer that this query here is related to the method on this object over there.

This is why we seem to have a divide between SQL-like query languages and the (anti-)pattern of having accessors for any query the object should support. SQL people like theirs because it allows arbitrary queries of sets and relationships. OO guys like theirs because it gives them type safety and the ability to transparently query their objects. Below is what I hope to be the middle ground: the type safety and defined interfaces of query methods, with the arbitrary selection and joins of relational.

The background code to support this is not shown below... I'm of the opinion that it shouldn't matter to the app any more than the implementation of any other database. Not that it's trivial, or impossible, simply that it's an implementation that is subject to change. (I have written a working spike which would support the below, and joins between arbitrary lists, all possible thanks to proxies + runtime reflection.)


Something like SodaQuery might be the solution you're looking for: http://sodaquery.sourceforge.net/

The reason I dislike strings seems to be the one area SodaQuery does not address: referring to the contents of an object by a hardcoded string. Looking through their examples, I see references such as query.select("salary"), referring to an instance member 'salary' in an employee. If you change the name of that member, the compiler will miss the implicit reference in the String "salary", and you get a run-time error instead of a compile-time error.
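
To make the difference concrete, here is a minimal sketch in Java 8 or later. It is not SodaQuery code: Employee, byName and byReference are hypothetical names invented for the illustration. The string version compiles even when the name is misspelt and fails only when it runs; the method-reference version cannot be compiled with a stale name.

```java
import java.lang.reflect.Method;
import java.util.function.Function;

class StringVersusReference {
    // Looks the accessor up by name: a misspelt or stale name fails only when this runs.
    static Object byName(Object target, String methodName) throws ReflectiveOperationException {
        Method m = target.getClass().getMethod(methodName);
        return m.invoke(target);
    }

    // Takes the accessor itself: a stale name is a compile error at the call site.
    static <T, R> R byReference(T target, Function<T, R> accessor) {
        return accessor.apply(target);
    }

    public static void main(String[] args) throws Exception {
        Employee e = new Employee(1000);
        System.out.println(byName(e, "getSalary"));
        System.out.println(byReference(e, Employee::getSalary));
        try {
            byName(e, "getSalry"); // the typo survives compilation
        } catch (NoSuchMethodException ex) {
            System.out.println("run-time failure: " + ex.getMessage());
        }
    }
}

// Hypothetical example class, not taken from any real library.
class Employee {
    private final int salary;
    Employee(int salary) { this.salary = salary; }
    public int getSalary() { return salary; }
}
```

Renaming getSalary() with a language-aware tool updates Employee::getSalary along with every other call site, while the "getSalary" literal is left behind.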

	 : I think what he'd prefer is something like the TypeSafeJdbcWrapper.

Might be a decent target for Template Metaprogramming (or Metaprogramming of some other sort). You basically need combinatorial queries specialized and checked against a relational schema (e.g. 'sql::Query<sql::Select<table::Employees,field::Salary>,MyRelationalSchemaDescriptor>', where the relational schema descriptor carries all significant information on both the tables and the headers/types in each table.) This is certainly doable... even in C++, though C++ templates aren't designed for it.

Oh, I see now. One way to decrease the fear of run time errors is automated UnitTesting. If someone accidentally changes the value of a String your UnitTests should detect that before the code can be checked in. Type checking is fine, but unit testing is super-fine.
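
A check of that kind might look like the sketch below, written as plain Java so it stands alone; in a real project it would live in whatever xUnit framework is in use. PayrollRecord, QUERIED_NAMES and staleNames are hypothetical names, and the sketch assumes every queried name is a public no-argument method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class QueryNameCheck {
    // Hypothetical: every member name the application hands to its query layer as a string.
    static final List<String> QUERIED_NAMES = Arrays.asList("getSalary", "getName");

    // Returns the names that do not match a public no-argument method of the given type.
    static List<String> staleNames(Class<?> type, List<String> names) {
        List<String> stale = new ArrayList<>();
        for (String name : names) {
            try {
                type.getMethod(name);
            } catch (NoSuchMethodException e) {
                stale.add(name);
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        List<String> stale = staleNames(PayrollRecord.class, QUERIED_NAMES);
        if (!stale.isEmpty()) {
            throw new AssertionError("stale query names: " + stale);
        }
        System.out.println("all query names resolve");
    }
}

// Hypothetical class being queried.
class PayrollRecord {
    public int getSalary() { return 0; }
    public String getName() { return ""; }
}
```

The test detects a renamed method, but someone still has to keep the list of names up to date by hand.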

Why are method calls easier to modify than strings? What do you mean by "updated automatically"?

An editor, processor, compiler, anything which knows Java syntax will understand that a method call is indeed a method call. A String gives them no such advantage. I rather enjoy letting an algorithm do my repetitive work for me. Strings complicate this unnecessarily. They have their uses, but I like to avoid them when possible.

How does that make method calls easier to modify than strings? IDEs and compilers can check the syntax of statically typed languages, so they will complain about typos on method names, but they won't automatically update your code. And unit tests can check the syntax of your strings and complain about typos in them. What's the real advantage of methods over strings?

IDEs can change all references to a method name when I change the first. They can't do that with a string representing the same method.
 class Blah {
   // The method is identified only by its name in a string.
   private String toStringMethod = "toString";
   public String toString() { return "Blah"; }
 }
Now, I point my editor at 'public String toString()' and change it to 'public String returnLocalizedMessage()'. Does the compiler have much chance of getting this right?
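 class Blah {
   // Closure is assumed to be an interface declaring a single method: Object perform().
   private Closure toStringMethod = new Closure() { public Object perform() { return Blah.this.toString(); } };
   public String toString() { return "Blah"; }
 }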
...for instance, doesn't suffer from this problem.

I know where you're coming from. I was taught that view at college, when one of the goals was to make the language responsible for detecting problems. My first instructor bragged about how close they were in the research department to making a computer language that didn't allow you to write bugs. That was 20 years ago. Now I just test anything that can break and I don't need all that language enforcement. If you test anything that can break, accidental changes to strings will be detected by them. Life is easier that way.

I think I'm still not quite coming across... it's not that I don't test these things; I use Strings all the time for some things. I don't like how I can't talk about Java's first-class elements without using a String though, because then I can't be lazy about changing a name: you can no longer write a generalized algorithm to do a generic boring task. And I think that smells.

To be honest, I regret starting this page the way I did, because I'm not against using Strings. I'm against using them as a substitute for good language decisions. It's a OnceAndOnlyOnce thing.

Maybe it's just me and the domains I've worked in. I don't worry about using plain old SQL queries when I can get away with it. I don't miss the ability to rename methods when some of the behavior is in SQL strings. There's usually one place that talks to the database and one instance of each term that might be a method name. I think OnceAndOnlyOnce minimizes the benefit of method rename refactoring, and even minimizes the need for compile time type checking.
 {
	AnObject anObject = ...;
	AList aResult = DataBase.query(anObject, "toString()");
 }
This is bad. Really bad. "toString()" doesn't have any semantic meaning to the compiler. I want to write this instead:
 { 
	AnObject anObject = ...;
	AList aResult = ((Proxy) ((AnObject) DataBase.query(anObject)).toString()).$getList();
 }
A little bit longer, true, but now the compiler/runtime knows what I'm doing. We can refactor using language aware tools without fear that they don't understand the quoting of methods. And it isn't all bad... consider parameters:
 {
	AnObject aParameter = ...;
	AnObject anotherParameter = ...;
	AnObject anObject = ...;
	AList aResult = DataBase.query(anObject, "toString()", new Object[]{aParameter, anotherParameter});
 }
Consider the possibilities: runtime class cast exceptions cannot be ruled out by the compiler!
 { 
	AnObject aParameter = ...;
	AnObject anotherParameter = ...;
	AnObject anObject = ...;
	AList aResult = ((Proxy) ((AnObject) DataBase.query(anObject)).toString(aParameter, anotherParameter)).$getList();
 }
Still a bit longer, but we get compile time checking for most parameters now.

We start gaining ground once we have more complex queries though... because all calls to a Proxy will return an appropriately typed Proxy... we only have to convert back to a list when we want the results. And we can cache the Proxies and reuse them... the setup is a one time thing for the most part.
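
The spike itself is not shown on this page, so the following is only a guess at the general idea, using java.lang.reflect.Proxy from the standard library. Recorder, Person and collect are hypothetical names. The sketch is limited to interfaces and to methods that take no arguments and return reference types, and it does not attempt chained proxies or joins.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

class RecordingProxyDemo {
    public static void main(String[] args) {
        List<Person> people = new ArrayList<>();
        people.add(() -> "Ada");
        people.add(() -> "Grace");

        Recorder<Person> query = new Recorder<>(Person.class);
        query.proxy().getName(); // an ordinary, compiler-checked call names the member
        System.out.println(query.collect(people));
    }
}

// Hypothetical queried type; dynamic proxies only work for interfaces.
interface Person {
    String getName();
}

// Hands out a proxy that remembers which method was called on it.
class Recorder<T> {
    private final T proxy;
    private Method recorded;

    Recorder(Class<T> type) {
        InvocationHandler handler = (proxyObject, method, args) -> {
            recorded = method; // remember which member was named
            return null;       // only reference-typed return values are supported here
        };
        this.proxy = type.cast(Proxy.newProxyInstance(
                type.getClassLoader(), new Class<?>[] { type }, handler));
    }

    T proxy() { return proxy; }

    // Applies the recorded no-argument method to every item and gathers the results.
    List<Object> collect(List<? extends T> items) {
        if (recorded == null) throw new IllegalStateException("no method selected");
        List<Object> results = new ArrayList<>();
        try {
            for (T item : items) results.add(recorded.invoke(item));
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        return results;
    }
}
```

The selection is written as a real method call, so renaming getName() with a refactoring tool carries the query along with it.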

Note: I have nothing against languages which allow you to directly talk about their parts. I wish Java did understand during compile/runtime that "toString()" will be referencing that particular method. But it doesn't.

-- WilliamUnderwood

p.s., please be exceedingly rude in pointing out flaws in my train of thought... although I do have a working version of this, it was written after a good night's sleep. I haven't had one of those in a few days now (forget the 40 hour work week... I need to get away from the 40 hour work day first!).

p.p.s, for those looking at the wiki source, please leave the things like A"""List... I know they're not strictly necessary, but it's one less thing to go wrong when somebody changes it to MySpecialList? (see the problem?)
Not that it's the best solution for everything... but this, combined with a couple of other techniques, essentially gives you the benefits of relational and navigational databases.

ReinventingTheDatabaseInApplication

NavigationalDatabases have benefits? :-) -- A RelationalWeenie

I find having an operation such as 'rename' with the only aim being avoiding collisions somewhat sketchy. Navigational implies scoping, which is IMO more elegant than renaming a column to avoid a collision. The only case in relational theory where it's actually necessary is when we're referring to a product of a table against itself. The only difference between the two tables in this case is the route by which they got there, hence a rename operation to describe that route, hence a navigational aspect which is necessary for the relational model to work. -- WilliamUnderwood (relational object algebra orientated weenie)


[deleteme if satisfied]

A little bit longer, true, but now the compiler/runtime knows what I'm doing.

I don't know what you're doing. What are you doing?

lol... This is why I should get some sleep... context is a good thing, isn't it?

Maybe I need some sleep. What is the code trying to do? It isn't obvious to me from the names of the methods and variables. Why do you pass "toString()" as the 2nd parameter to DataBase.query() in the first example? What is anObject? How does it relate to querying the database?

Oops (again)... toString() means I want to talk about the toString() method of anObject. The advantage is that the vm now understands that. Really, this is a partially specified, incompletely written, horrible rewrite of what I wrote in the ManyToManyChallenge. I've been meaning to rewrite that in its own page, complete with implementation details, etc... this was a first attempt, failing because one can't write coherent sentences when one hasn't slept in three days :)

[/deleteme]


In the Unix world, strings are often what make protocols machine-independent and language-independent. The success of HTTP is partly due to this. TheUnixPhilosophy (ISBN 1555581234) talks about this. Although I am not a Unix fanatic, I generally agree with that philosophy. Strings equal easier sharing.

I don't care about sharing. I want typed Java code that can be easily refactored, where many errors can be caught early through syntax highlighting and compiling.

You seem to wish/hope that the whole world is Java to avoid translating between type systems, paradigms, languages, etc. Not a very realistic hope in many domains. I hardly think that Java is the pinnacle of languages anyhow.

It doesn't need to be. Something simple and elegant can be ported to many languages, like xUnit or ThePrevayler. Planning ahead for data sharing is classic YouArentGonnaNeedIt. Java certainly is not the pinnacle but that's not the point. I would like an elegant, typed query mechanism for <Insert language here>. I shouldn't have to put things in strings simply because strings are the LowestCommonDenominator.

But LeastFlexibleProtocolWins, so you have to use strings for inter-program communication if the programs are incompatible. Of course, it's better to have a common server that lets you share binary data, but many programs just won't use it. If you use text anyway, you'd better use some structure-preserving convention, like EssExpressions or XML [ExtensibleMarkupLanguage].

I'm assuming inter-language communication isn't a requirement. Certainly no project should plan with that in mind if it's not a requirement based on concrete need. I think this was one of the author's (implicit) assumptions as well.

Under YagniAndDatabases things are not that simple. -- AnonymousDonor

That page is a great guide. If you have a requirement that multiple applications in different languages need to access the same data using ad-hoc queries, a db with string-based queries is obviously needed. However there are many more applications that needlessly waste developer time with the difficulty of logic disappearing into strings than ones that end up needing that flexibility and must consequently be changed.


"The string is a stark data structure and everywhere it is passed there is much duplication of process. It is a perfect vehicle for hiding information." -- AlanPerlis


Isn't using a string in this way somewhat analogous to working in a dynamically-typed language?

To use a brutally simple example of a getter in Java:

   public MyType getMyType( String theNameOfTheThingToGet );

Let's say there are (currently) four possible String values to pass in: "Foo", "Bar", "Fubar", and "Snafu". This gives us four "effective method names":
   public MyType getMyType-Foo();
   public MyType getMyType-Bar();
   public MyType getMyType-Fubar();
   public MyType getMyType-Snafu();

That could be implemented as four separate "real" methods:
   public MyType getFoo();
   public MyType getBar();
   public MyType getFubar();
   public MyType getSnafu();

In fact, the latter would even allow different return types for each method. At this point, the code structure in Java and in something dynamically-typed like Smalltalk or Python would look pretty much the same.

What if we now need to add a new "name-of-something" - say, "Ugh"? In a statically-typed language, we need to expand the interface and add a new method:
   public MyType getUgh();
Then we need to go and implement it in all implementor classes. But how do we handle the case where one of those classes doesn't give a hoot about "Ugh"? Not pretty (e.g. use a base class that implements every method in the interface in some "default" way, and derive every concrete class from that).

In a dynamically-typed environment we just implement an additional method on the concrete class(es) that care, and don't worry about it in any place that doesn't (and, of course, unit-test all our modified classes to make sure they respond correctly to the new method).

Similarly, in our original interface (remember, the one with a String parameter?), we could do exactly the same thing in Java as our dynamic friends: have the classes (or even the instances! just like adding a method to an instance in Python) that care respond to the new "effective method name": getMyType-Ugh()

Granted, there are perhaps better types to use for the "theNameOfTheThingToGet" (for example, a TypeSafeEnum instead of a String), but I think the basic analogy is fair - or not?

-- GeoffSobering
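
The enum variant mentioned above could be sketched as follows, assuming Java 5 or later. ThingName and Registry are hypothetical names, and String stands in for MyType to keep the example short. Adding "Ugh" then means adding one constant, and implementors that don't care about it simply have nothing stored under it.

```java
import java.util.EnumMap;
import java.util.Map;

class EnumKeyDemo {
    public static void main(String[] args) {
        Registry registry = new Registry();
        registry.put(ThingName.FOO, "the foo");
        System.out.println(registry.getMyType(ThingName.FOO));
        System.out.println(registry.getMyType(ThingName.SNAFU)); // nothing stored under SNAFU
    }
}

// The closed set of "effective method names"; a new name is one new constant.
enum ThingName { FOO, BAR, FUBAR, SNAFU }

// Hypothetical implementor; String stands in for MyType.
class Registry {
    private final Map<ThingName, String> things = new EnumMap<>(ThingName.class);

    void put(ThingName name, String value) { things.put(name, value); }

    // The compiler rejects getMyType("Foo") and any misspelt constant.
    String getMyType(ThingName name) { return things.get(name); }
}
```

This keeps the single-method interface of the String version while letting the compiler and refactoring tools see every use of a name.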

This problem was solved many years ago in the Lisp world by using symbols and multiple namespaces.

It was also solved in SmallTalk, Python, and a host of other dynamically-typed languages, too (more recently, for sure).

My point is that using "magic strings" as a method-signature extender can be thought of as a mechanism (poor as it may be) to gain some of the benefit of dynamic-binding in a statically-bound language. Obviously, if your problem-domain is completely (or mostly) composed of entities that don't map cleanly to static-types, you probably should choose a dynamically-typed language from the start. However, if your domain has only a small number of dynamic elements mixed into a largely static set of entities (or if you're trapped in a static language for non-design reasons), then the above technique may be helpful. -- GeoffSobering

The best dynamic languages pretty much treat everything as a string. Hybrid-typing systems like PHP are too problematic IMO. The drawback is that comparison operators need to be clear about whether they are comparing as strings or numbers, but this is good IMO. You don't have to back up to find their type to know. IOW, I don't like polymorphic comparisons, at least not in scripting languages.

No they don't. The best dynamic languages have types, SmallTalk, Lisp, Python, Ruby, all have types, they don't treat everything as a string. Dynamic Typing simply means the types are checked at runtime, rather than compile time. Scripting languages are the kings of polymorphism, they do it better, and make use of it more often than most static languages, so your point was what?

I think you misread the above paragraph. It said the "best" languages, not necessarily all dynamic languages. It was a value judgement, not an assessment of all dynamic languages.

As a value judgement, it's not very helpful unless you give us some reason why you think they're the best. Offhand, I don't think I've ever used a single scripting language that used a string for everything. Maybe TCL. But Perl, PHP, Python, Ruby, Scheme, Lisp, Smalltalk, Self, and all their variants all have types. -- JonathanTang

Python fakes it well, but all member lookups in Python are really just string-keyed hashtable accesses. For a method defined on foo's class, foo.bar() is roughly foo.__class__.__dict__["bar"](foo) in Python. You can even override __getattr__ and __setattr__ so that looking up "bar" does something else.

The original scripting language: shell.

Many of the arguments here seem to mirror the dynamic-versus-static-typing HolyWar. I wonder if dynamic language fans tend to have different views on this than static-typing fans.


There is a simple trick that can sometimes be used to address some of the problems of using strings in lieu of identifiers known to the language. It will probably work in many languages; I've used it in C++, exactly for the sort of problems mentioned on this page.

In your database API (you do have a DatabaseAbstractionLayer, don't you?) you simply avoid using unadorned strings as arguments to identify things like attribute (field) names or relation (table) names. Instead, you use small classes to wrap around the strings. Thus, query.select("salary") becomes for instance query.select(attr("salary")), where "attr" is such a wrapper class. Now any competent editor can search for a regular expression "\battr\s*\(\s*"salary"\s*\)" (expressed as a Perl-style regexp here), and even replace the name for you. The mechanism can easily be subverted, of course, by using temporary string variables and such, but if programmers can be trained to "keep it simple" and pass only literal string constants to attr(), this can be fairly effective. -- DM
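
The original was C++; the same trick translated to Java might look like the sketch below. Attr and Query are hypothetical names and only attribute names are wrapped here. A bare query.select("salary") no longer compiles, so every attribute name in the code base appears inside an attr(...) call that an editor can search for.

```java
import java.util.ArrayList;
import java.util.List;

class AttrDemo {
    public static void main(String[] args) {
        Query query = new Query().select(Attr.attr("salary")).select(Attr.attr("name"));
        System.out.println(query.selectClause());
    }
}

// Thin wrapper that marks a string as an attribute (field) name.
final class Attr {
    final String name;

    private Attr(String name) { this.name = name; }

    // The only way to make an Attr; pass literal strings only, by convention.
    static Attr attr(String name) { return new Attr(name); }
}

// Hypothetical query builder that accepts wrapped names and nothing else.
final class Query {
    private final List<String> columns = new ArrayList<>();

    Query select(Attr attribute) {
        columns.add(attribute.name);
        return this;
    }

    String selectClause() { return "SELECT " + String.join(", ", columns); }
}
```

As noted above, nothing stops someone from passing a variable to attr(), so this is a searchable convention rather than a compiler check on the name itself.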

I don't buy it. I think cwillu has some very good points, and that the way to go is to use e.g. query.select.salary (or some similar but more workable syntax), avoiding strings altogether. Anything that uses strings will not get checked by the compiler, yet compiler checks are very useful. Clever tests, editors, refactoring tools, self-discipline, etc etc etc are all good things, but are orthogonal to compiler checks. -- dm

Note that some of us prefer dynamic (interpreted) languages, so compile-time checking is not something we go out of our way for.

That's a nitpick; the rest of the page is phrased in terms of a Java or C++ kind of language. It's still a good thing for the language implementation to tell you of glaring errors such as a call to a non-existent function as early as possible. If you use a string for the DB query, that guarantees that the error checking will happen as late as possible: only once the query is performed.

If instead you define the query in terms of language symbols, then an error that catches a non-existent query-related symbol will happen as early as possible in that particular language. This is true regardless of the language, without getting us bogged down in an unnecessary tangential discussion of just how early that is in various different languages.

Amen. Having written this kind of database crap in C++, I'll tell you what I did to avoid strings.

Don't ever pass a string literal to identify a table, column, or record. Use a symbol instead.

schema.h:
 const char *const kSalary = "salary";
 // etc.
Later:
 int salary = record.getMonetaryField(kSalary);
Also, database abstractions like HiberNate have alternate ways to specify queries: QueryByExample, for instance, or composing query criteria objects. Both seem a lot less error-prone than concatenating a gnarly SQL query string.

-- IanOsgood
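
As an illustration of composing criteria objects, here is a small in-memory sketch using java.util.function.Predicate from Java 8. It is not the API of HiberNate or of any other real library; Clerk, ClerkCriteria and where are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

class CriteriaDemo {
    // Keeps the rows that satisfy the composed criteria.
    static <T> List<T> where(List<T> rows, Predicate<? super T> criteria) {
        List<T> matches = new ArrayList<>();
        for (T row : rows) {
            if (criteria.test(row)) matches.add(row);
        }
        return matches;
    }

    public static void main(String[] args) {
        List<Clerk> clerks = Arrays.asList(
                new Clerk("sales", 1500), new Clerk("sales", 2500), new Clerk("ops", 3000));
        List<Clerk> result = where(clerks,
                ClerkCriteria.inDept("sales").and(ClerkCriteria.salaryAtLeast(2000)));
        System.out.println(result.size());
    }
}

// Hypothetical record type.
class Clerk {
    final String dept;
    final int salary;
    Clerk(String dept, int salary) { this.dept = dept; this.salary = salary; }
}

// Small reusable criteria; each refers to a field the compiler knows about.
class ClerkCriteria {
    static Predicate<Clerk> inDept(String dept) { return c -> c.dept.equals(dept); }
    static Predicate<Clerk> salaryAtLeast(int minimum) { return c -> c.salary >= minimum; }
}
```

The criteria are combined with method calls instead of string concatenation, so a misspelt field name fails to compile.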

I haven't had many problems with frameworks that don't completely hide the SQL. I find it gives me the best of both worlds: automating the repetitious parts of SQL, but still allowing one to use a somewhat natural form of RelationalAlgebra instead of the verbose, hard-to-read "new-dot-new" pattern described in ExpressionApiComplaints. Plus, the frameworks/utils don't have to completely cover everything SQL syntax does (if that is even possible) because they are not trying to be a total wrapper, but merely helpers. This probably reduces their size to a small fraction of that of a pure wrapper. Plus, I often test direct SQL anyhow. If I had to use new-dot-new, I'd have to rewrite it. And tools like HiberNate have their own learning curves and gotchas.

(EditHint: perhaps the above should be moved to an SQL-wrapper-specific topic.)


One problem with strings is that they turn applications into silos which are hard to integrate. One advantage with strings is that they are more human-natural for those with low InductiveReasoningAptitude (me) -- JonGrover

---

See TypeSafeJdbcWrapper, HelpersInsteadOfWrappers, ExpressionApiComplaints, PowerOfPlainText, CrossToolTypeAndObjectSharing
CategoryWrapping


Last edited July 6, 2009