The Problem With Sigils

Sigils (such as PerlLanguage's $scalar @array %hash, RubyLanguage's @member, etc) encode type or scope information in variable names, and therefore remind some people of HungarianNotation. However, one of the big complaints about HungarianNotation is that, if one changes the type, then one must also (global search and replace) change the name, a complaint that doesn't apply to sigils, since the name and the type are one and the same.

Sigils in Ruby and Perl are not just hints for the compiler; remove them, and the program no longer works the same as before (or maybe at all). In this sense, they're not the same as HungarianNotation.

Even with HungarianNotation, you can break a program by removing the warts - consider 2 variables in the same scope with the same root name, but different type (e.g. pi_count and i_count). Remove the warts, and the names are the same.

Some people wonder why any language uses such a thing, since other languages have proven it is not strictly necessary to do so. It's just a way of encoding type information, and yes, there are other ways to do it in other languages, but the advantages and disadvantages of each approach are mostly a matter of taste or perhaps ergonomics, as with many language design choices.

This method of encoding type contrasts with several others in other languages (ranging from APL to Lisp to Pascal): Some people continue to dislike sigil type systems even after significant use of Perl, and do not miss them in the languages that do not use them. It is bug-prone in Perl to refactor a list or scalar into a hash, since it is easy to forget to correctly change the sigil. A refactoring editor could help.

I believe that there is a valid complaint trying to be made here amidst refactoring; since I'm not in favor of sigils, I'm having a hard time verbalising it. ExplicitIsBetterThanImplicit?, maybe? (To which I'd counter-argue that that's a guideline, not a law; we can't afford to make everything explicit or we're back to assembly language, and the evidence that sigils justify their existence just isn't there.)

There is no difference between:

"s" is a letter, and therefore the brain reads it as part of the word "sfoo". Symbols such as "@", "$", "%", "_" are not and it is easier for the brain to skip over them.

Which brings up the issue of why the brain should have to skip over them. ;-)

The brain doesn't have to. It's a learned habit related to the programming languages that the person in question has used in the past. Someone from a Scheme background naturally considers names such as call-with-current-continuation or 'atom?' to be perfectly fine function names, because there's no difficulty using a hyphen or question mark in a function name, and lots of library routines use them.

I'm not sure why this has not come up yet, but I miss sigils in languages that don't have them. I find it much easier to tell the parts of the code apart with those visual warts strewn in - a function call is clearly obvious as such by lack of a sigil, a variable interpolated into the middle of a string stands out, and so on. Also, take into account that the sigils make it easier to unambiguously make a lot of parentheses optional, and I like using a minimally-parenthesizing style which uses them only where necessary or where beneficial to alert the reader to something. Another benefit is that good syntax highlighting is easier to implement.. Put together, all these mean that my Perl code has no more, in fact sometimes less line noise than my C or Java code, and I find it much less tedious to both write and read. -- AristotlePagaltzis

I see $foo as performing some unknown operation on the variable foo. I see sfoo as a variable name. Permit me a meta soapbox: I am thrown off by the dollar-sign or at-sign characters in variable names. Twenty years of programming (not Perl) has taught my brain that a special character identifies something far more important than a characteristic of the variable. I see the special characters reserved for *actions*. These actions can be mathematical (plus, minus, times, boolean operators, double-equals) or grouping (parentheses, braces) or indexing (square brackets) or assignment (equals). When I see special characters used for scope and type information mixed in with special characters for actions, readability suffers. After all, some languages use a dollar-sign or at-sign prefix as a dereference operator. Now that we programmers switch between many different languages, it is no longer acceptable for each language to do things its own way. I suggest that variable naming is becoming standardized across all languages (first character is a letter and the rest is a letter or number; mixedCapitalization is also becoming standard instead of underscores). BasicLanguage, PerlLanguage, and RubyLanguage are violating the standard variable naming rules. -- DaveEaton

Sigils (such as PerlLanguage's $scalar @array %hash, RubyLanguage's @member, etc) encode type or scope information in variable names, and therefore remind some people of HungarianNotation.

This is not the case in Perl - the @ in the @array is not part of the name. When you look up an element in an array, you use the $ sigil. See the section about perl's sigils below.

This is also not the case in Ruby - the @ and $ sigils are not really part of the name; they just tell the interpreter where to look for the name.

I don't think this can be said so absolutely. The documentation for Perl and Ruby says that the sigil is not part of the name, but so what? What illogic follows from saying "it is too part of the name"? None, that I see, at least in Perl; these things are in different name spaces. That seems to be equivalent to saying they're in the same name space and the sigil is part of the name. And a perl interpreter could in fact implement it that way, could it not?

Consider this bit of Perl:

 @bar = ("a", "b");
 $bar = "cde";
 print $bar[0];

This code prints the letter "a". If the sigil were part of the name, perl would see "$bar", determine that it is a scalar, and attempt to perform an array lookup on it. By any logic, that should return one of "c", "cde", undef, or be an error or exception. It should not find the letter "a".

Perl is instead looking at the first element of the array @bar. So, the symbol used is the array "bar", not the scalar "bar". It must be the case that the interpreter determines that the symbol "bar" is being used in an array context, and then looks it up in a separate array namespace.

Not at all. Perl just has additional semantics concerning the syntax "$bar[0]" that are not implied solely by its decision of how to treat "@bar" and "$bar". The array reference is yet a third kind of sigil, not an extension of the first two. A hypothetical implementation could parse "$bar[0]" and turn that into a reference to "@bar" with index 0. I am not aware of any semantics that forced Perl to use "$bar[0]" for this; "@bar[0]" here would be illegal syntax (not true, see below), and therefore is open for use - it could hypothetically have been used instead.

More tersely: nothing prevents Perl from treating "$bar[0]" as a lookup of a variable named "@bar", as an implementation detail where "@" is treated as part of the name.

In essence, this creates a new sub name space for arrays via an internal naming convention. To the perl user, it acts like a different name space, even if it is implemented as a single name table. But, perl still has to determine that "bar" must be an array before it can decide that it has an "@" in its name.

In this sense, all variables share a single name space - you only have to encode enough context in the name.

[ Actually, @bar[0] is quite legal. It's equivalent to ($bar[0]), that is, an array containing $bar[0]. Likewise, @bar[0..3] is a slice of @bar; it's the same as ($bar[0], $bar[1], $bar[2], $bar[3]). Likewise @$bar, is the array referenced by $bar, where $bar was previously assigned = \@quux. Perl treats sigils consistently as operators, not as part of the name. ]

Asking whether a Perl sigil is really part of the variable name is like asking whether the case+number ending at the end of a Latin noun is really part of the word. Call the $ in $foo[0] an inflection and it becomes much clearer, despite the analogy falling down after a little bit.

I'm not sure whether I agree or disagree, depending on what you mean, which depends on how much linguistics (not just Latin) you've studied. I might agree if we draw the parallel that I was essentially asking whether the sigil is part of Latin morphology or whether it is part of Latin syntax. All languages have devices to signal case, but it varies whether the devices are in morphology, syntax, or both.

But I think I'm quite sure that your analogy does not actually clarify the subject, no matter which way you meant it. I'm a compiler guy, so when I asked, basically I wanted to know whether implementing it that way would give rise to bugs -- I was not asking in order to find out the One True Larry Wall way of thinking, for instance.

BTW, the analogy is not entirely hypothetical. See PerliGata

Sigils in PerlLanguage indicate type:

Scalars, arrays, hashes, and subroutines each have their own "namespace", so you can simultaneously have in scope one symbol of each type with the same name!

The $ sigil is used before array and hash lookups when you are looking up a scalar:

 $foo = $bar[0];	# array lookup in @bar
 $foo = $bar{"asdf"}; # hash lookup in %bar

In both the above cases, the type of the symbol "bar" is determined by the context (the following [] or {} characters, not the leading $ character). The leading $ character only indicates that what came out of the lookup is a scalar.

When you are looking up more than one thing at a time (a slice), use the @ sigil:

 @foo = @bar[2, 4, 0];  # fetch three values from the @bar array
 @foo = @bar{'name', 'type', 'value'};  # fetch three values from hash %bar

In the last four assignments, it's the [] and {} that determine the type of the symbol "bar", and the "$" and "@" that determine whether a scalar or a list is being retrieved.

Sigils in RubyLanguage indicate scope:

In Ruby, there is no need for sigils to indicate type, as all variables are the same type - reference to object.

More completely, though, all objects are instances of a class (not a type), so an attempt to map Perl-style sigils to Ruby might involve a different sigil per class. Such an attempt would be quite braindead, of course: Ruby's OO-focus means that it's quite common for a Ruby application to contain dozens or hundreds of small classes, just as might be typical in Smalltalk.

Sigils in Ruby are a completely different beast from sigils in perl. Matz stated more than one time that sigils for globals and for instance variables are there to remind you that you should not use them directly. You should encode the global information in a class or a constant, and access the instance variables via accessor methods. When you're writing quick & dirty code you can use them, but globals are evil and the sigils are there to reify a code smell.

This entire issue is largely a matter of opinion, but I'll pipe in as somebody who's used both sigil languages and non-sigil languages. I like Ruby's way of doing it. It's good to have an inline reminder of a variable's scope. (As opposed to, say, going up to the top of the class to find its declaration, as you might do in Java or I think C++.) And it's not quite accurate to say you could do the same thing with a HungarianNotation-style system, since @ and @@ stand out a lot more than, say, m_ and c_, and are a lot less likely to be confused for part of the variable name.

Note that I don't particularly care for Perl's sigils demarking variables by type, but that's 'cause when I work in a language that encourages me to make lots of little classes, I think less about the type than about what the instance does. (Yes, this is that old static vs. dynamic debate. Again.) I'm not certain why knowing scope would be more useful than knowing class, but there you have it. -- francis

On the lighter side of Perl operators:

There is a practical issue with sigils when editing and searching code. Double clicking on the foo in @foo will highlight just foo, not the full @foo (at least in BbEdit). Highlighting @foo requires either dragging over the whole thing in the first place or a two-step process: highlight foo then shift-left-arrow (or whatever) to include the preceding @.

Sigils are hard to type, @, #, $ requires using the alt-gr key on many keyboards which is always to small and badly placed. Python has self. ("something." if you want) as sigil for instance variables. It is however considered good practice by to prepend instance variables with some kind of marker.

And what keyboard layouts would those be?

Swedish keybord layout

German keyboard layout Similar for french, japanese and almost any other kind of non-US keyboard layout... What an arrogant, self-centric question...

Sigils are not such a big problem when there is a reasonable number of them, but PerlSix is adding secondary sigils to it's OO stuff (for example $.x is the member scalar named "x"). That is getting a little out of hand (Though LarryWall has promised not to ever add tertiary sigils). -- mp

Sigils make it possible to interpolate variables directly into quoted strings, e.g.:

 print "Hello $name!\n";

Why do you need sigils in the language in order to have this? While I'm not yet aware of a language that does this, it shouldn't be all that difficult to do. In any case, sometimes I like this about sigil languages, but even in sigil languages, you don't get to put arbitrary code into a string when it would be nice to do so, and if you are referencing instance or class variables, the syntax can get ugly pretty fast. Finally, I like the flexibility that leaving sigils off gives you: there are times when it's elegant to have a variable be different types, based on the given situation, and sigils are either demonstrated meaningless in such situations (such as in PHP) or make it difficult or impossible (such as in Perl).
Related pages: LanguagePissingMatch, PythonVsPerl

View edit of January 7, 2014 or FindPage with title or text search