Typing Quadrant

More examples would be welcome. It might also be a useful to add a third dimension for explicit/implicit typing. -- AdewaleOshineye

See below for discussion of weak/strong vs safe/unsafe typing

                | Weak          | Strong        |
        Static  | Pascal        | OCaml [*]     |
                | C             | Haskell       |
                | C++           |               |
        Dynamic |               | CommonLisp    |
                | Perl          | Python        |
                |               | SmallTalk     |

[*] But see OcamlTypeSafetyProblem

There is a case for extending the matrix to include:
                | Weak          | Strong         | StronglyTypedWithoutLoopholes
        Static  |Pascal         |Haskell         |Haskell (no unsafePerformIO etc.)
                |C              |Java (w/ native)|Java (no native methods)
                |C++            |       methods) |OCaml (no unsafe modules)
                |               |OCaml           |Sather (no external classes)
                |               |Sather (w/ ext. |
                |               |  classes)      |
        Soft    |none?          |DrScheme        |E
                |               |                |
                |               |                |
        Dynamic |               |CommonLisp      |W7
                |Perl, PHP?     |Python          |Squeak-E
                |               |Smalltalk       |
The StronglyTypedWithoutLoopholes category is related to, but not entirely equivalent to, the ObjectCapabilityModel. (A language can have a strictly enforced type system, but also, e.g. global variables, which excludes it from being an ObjectCapabilityLanguage. An example of this is Java without native methods.)
The stronger a language's typing, the higher it is placed. Thus OCaml and Haskell are considered to have equivalent levels of strong typing and so are Java and C++. However, OCaml and Haskell are placed above C++/Java because they have 'stronger' typing.

C++ moved to "weak" because it allows unsafe casts, in exactly the same way that C does. Any difference between C and C++ in this respect is only a matter of style and not language design.

There's an important difference that you're missing, though. You can't write non-trivial programs in C without using weak typing. By contrast, the only time you ever need weak typing constructs in C++ is when doing low-level bit scribbling. So the language design of C++ enables use of strong typing everywhere, the language design of C does not. This difference is important. Quite. One language (C++) gives you the freedom to write in a particular way, the other (C) does not. Surely this is something that "matters". In general, since languages frequently support more that one variety of typing, it's surely at least arguable that one should speak in terms of which forms of typing a language supports, rather than saying any language only supports one form of typing and leaving it at that. For instance, both Java and C++ support both dynamic and static typing styles, so it doesn't make much sense to call either Java or C++ a "dynamically typed language" or a "statically typed language". I agree that if one were only going to use one variety of typing to describe a language, it would make most sense to use the lowest-quality one. So in this case, that makes C++ a statically typed, weakly typed language. Is this the sense in which C++ belongs in that box? But in practice, all C++ applications that I have dealt with in recent years have been strongly typed, and make extensive use of dynamically typed programming styles. So while the categorization of C++ in the box may be right in some senses, it's not conveying the full truth of the matter.

Classifiying languages by the lowest-quality variety of typing each provides makes for a very simple diagram: All languages are typed weakly and dynamically, because all languages support types that are not specific to the solution domain. So you can always use a length (a simple number) as argument to a function that expects an angle in degrees (as opposed to radian). Always. In every language. Classifying by the variety of typing that is recommended for a language, and easy to do in it, seems fairer and more useful to me.

I don't think the Weak/Strong factor is being very informative; it more like boils down to programming style. Even in ObjectiveCaml which is very strongly typed compared to C++ (but depending on C++ programmer style), you can write type-unsafe code by using the functions in the library that are marked "dangerous" (unmarshalling, for example). And how is Perl's typing so weak?

I think that Perl is listed as weakly typed because its types all autoconvert based on context. I also think that that's wrong, but I don't want to change it in case I'm wrong about why it's listed there. I thought that weak typing implied that one could pass in types to the wrong function and cause UndefinedBehavior, but Perl has well-defined semantics for type mismatches. For example, "01" + "23" == 01 + "23" == "01" + 23 == 1 + 23 == 24. qw(a b c) + "d" is 3. 12 . 34 is 1234.

Oh wait. Once you get into OOP, then you can get UndefinedBehavior. Well, not exactly undefined behavior, but you can have subs that modify the underlying blessed hash without causing an error. However, that behavior still isn't exactly "undefined".

To have both C++ and Haskell in the "strong" box doesn't seem right. C++ is stronger than Java (for the moment), but nothing like as strong as either Haskell nor OCaml, neither in terms of guarantees about correctness, nor expressive power.

WeakTyping to me means "given an object of type A, the ability to reinterpret the pattern of bits in its value as being of type B": it's intimately connected with the actual machine representation. For example, an IPv4 address in the BSD Sockets API is a "network-order unsigned long", but in a language like C you can also interpret it as a vector of octets simply by using a cast. Perl's silent coercions (e.g. between numbers and strings containing the printable representation of those numbers) are not at all the same thing, because at the machine level they require processing (such as atof()) to be done to the object.

I suppose that the Perl pack() and unpack() functions can be used to give it some form of WeakTyping ability.

I am still fuzzy on what distinguishes "weak" and "strong" on the dynamic end. My impression was it depended on whether or not variables stored a type indicator of some sort with the variable.

As I understand it, it's based on the extent to which variables are implicitly coerced from one type to another. So PHP and Perl (which automatically change strings to integers and vice versa depending on the context) are weakly typed, while Python and Lisp (which give an error if you try to apply an integer operation to a string without an explicit conversion) are strongly typed. -- JonathanTang

What if a language uses only strings to store basic variables (scalars)? If you do a numeric operation, it simply converts (or tries to) strings into internal numbers to perform the operation. In other words, there is no way to ask a variable its "type". There is nothing to "coerce". The closest you can do is have operations such as canBeInterpretedAsNumber(x). The following would create identical variables:

        x = 9;
        x = "9";
I'd consider that WeaklyTyped. I have a friend that made a language where the only data types were strings and objects (collections of strings). All arithmetic was done on strings-that-could-be-interpreted-as-numbers. It worked, certainly, but I wouldn't call that strong typing at all.

Treating all data as a string is also a well-known way to circumvent the type system in more strongly typed languages. Every object, record, collection, etc. can be represented as a string - the limiting case is just parsing the source code for the collection itself. But you have no guarantees as to data integrity or anything like that when you do that.

There are usually two ways to do "tag free" typing: the bytes can be interpreted (read) as any "type"; or strings represent all types (or at least appear that way to the programmer). In tag-free typing, there is no type tag/flag separate from the value. The value is what it is and your operations interpret it based on the value and only the value. "Checking a type" is more or less parsing and validation on the value itself. In that sense, it can have "type security". Some languages, such as ColdFusion, can even (optionally) validate function parameters in such a way. See TagFreeTypingRoadMap, --top

See also: http://www-lp.doc.ic.ac.uk/UserPages/staff/ft/alp/net/typing/strong.html for a succinct definition of the terms.

Perhaps we need a cube rather than a square, with soundness of the type system as the third dimension. A language with completely SafeTyping? is one where type errors are either impossible, or caught at runtime and handled safely with an exception or similar mechanism. A completely unsafe language is one which makes it easy to generate (undetected) type errors (and may even require engaging in dangerous constructs to do anything useful). In the middle are languages which either have holes in the typing system which are easy to avoid, or require use of constructs explicitly marked dangerous in order to circumvent the type system. An approximate listing of some languages from least safe to most safe:

To redo the above graph in 3D - heck, I won't even try. :)

                | C                     | C++                   |
                |                       |                       |
                |                       |                       |
                |       .. Safe ........|................       |
                |       | Pascal        | Java    |     |       |
                |       | (mostly)      | Haskell       |       |
                |       |               |               |       |
                |       |               |               |       |
         Weak   |       |               |               |       | Strong
                |       | Perl    | Smalltalk   |       |       |
                |       |               |               |       |
                |       |               |               |       |
                |       |               |               |       |
                |       |...............|...............|       |
                |                       |                       |
                |Forth                  |                       |
Note that most languages with DynamicTyping are generally safe - if you are doing runtime checks for type resolution; detecting type errors comes naturally. Languages with static typing may be unsafe, depending on whether the compiler allows code through that it can't prove typesafe.

A language can't be called type-safe if type errors happen but they just don't crash the physical machine. For example in Java it's very easy to get ClassCastExceptions? at runtime. The definition above says this doesn't indicate that Java's not type-safe because it's possible to catch that exception. That's a problem. A better definition of type-safety is provided here: http://matrix.research.att.com/vj/bug.html

Perhaps we have a fourth dimension - TypeSecurity?. (Or perhaps this is an extension of TypeSafe; as it is not possible to have a language which is type-safe but not type-secure. C++ implements public/private/protected, but it is trivial using casts and the like to bypass the encapsulation boundaries, should you choose. BjarneStroustrup has long said that the C++ type system was not designed to prevent "fraud" - in a language where arbitrary pointer casts are permitted; it would be impossible for it to do so.)

At any rate, the definition of "type-safe" given the paper mentioned above is as follows:

A language is type-safe if the only operations that can be performed on data in the language are those sanctioned by the type of the data.

I disagree with that definition; for one it is imprecise. For another, it simply contradicts what the typing literature has to say on the point.

There are many languages (PythonLanguage and CommonLisp) which are generally considered type-safe (ignoring the option to disable typechecks in CommonLisp) that provide no compiler-enforced encapsulation whatsoever - the concept of private methods/attributes is handled by convention, not by barriers erected by the tools. JavaLanguage is perhaps unique among programming languages in that it attempts to preserve type-security (the JVM guarantees, or at least attempts to guarantee, that features marked private can only be accessed from within the class).

This, of course, has nothing to do with the question of whether a language which traps type errors at run-time is typesafe - the general consensus among the programming community is that run-time typechecking does not disqualify a programming language from type-safety. Otherwise, none of the languages with DynamicTyping could be considered type-safe (nor could Java for that matter - it traps invalid downcasts and NullPointerExceptions at runtime. Trapping these at compile time is well-known to be an undecidable problem). Perhaps Haskell or one of the ML family could meet that strict of a definition.

Finally, I should note that the flaw in Java mentioned in the paper (the paper was written in 1997) has long since been fixed. I am unaware of any ways to compromise the type system in the current version of Java (1.4) for code executing on a conformant JavaVirtualMachine.

But a strongly-typed language will also tend to be type-safe whilst a weakly-typed language will tend not to be type-safe. The strength of a type system is a continuum indicating how many holes there are in the type system. Thus Java and C++ aren't as type safe as Haskell and so C++/Java's type system is not considered as strong. And if you're still confused, just ask Cardelli: http://citeseer.nj.nec.com/cardelli97type.html.

I'm curious. This is an area where I have a significant amount of fuzz.

I have written a lot of code in FoxPro, which allows the type of a variable to be changed on the fly. In other words, the code:
  someDate = {12/07/1941}       && assign an actual date to SomeDate?
  ? type("someDate")     && output: "D"
  someDate = someDate + (365.25 * 20)  && date math for 20th anniversary of Pearl Harbor
  ? type("someDate")     && output: "D"
  someDate = DtoC(someDate)  && now convert to a string
  ? type("someDate")     && output: "C"
  someDate = someDate + " is the 20th anniversary of Pearl Harbor"  && now extend it
  ? type("someDate")     && output: "C"
  ? someDate             && output: "12/07/1961 is the 20th ..."
is quite valid.

Operations across different types require explicit type conversions, and there is a TYPE() function that tells you what kind of data any given variable (or field/column) is. There is only one kind of "numeric" data (dates are a special case) and "logical" data is not interchangeable with numbers.

Where would you classify a language with this kind of behavior?

That appears to be DynamicTyping. While not familiar with FoxPro; languages with DynamicTyping allow the types of variables to change by assignment. Languages with StaticTyping could do this as well (there is no need for the type of a variable to be preserved across an assignment); and some with TypeInference may. Those with ManifestTyping don't do this, though it's theoretically possible to do so... you would need to extend the syntax of the language to include a way to specify a new type (and simultaneously provide it with a new object of the type, or else invalidate the value).

Note also that, in the above example:

  someDate = {^1941/12/7}  && "Strict date" expression
  ? type("someDate")     && output: "D"
  * This line throws a data type mismatch error that may be caught at design time when saving the file in the IDE under some
  * circumstances, or at compile time under other circumstances, but certainly at runtime.
  someDate = someDate + " is the 20th anniversary of Pearl Harbor"  
Thus Foxpro is "strongly typed" for some definition of "strong typing".

Anybody care to take a stab at RebolLanguage?

Strong dynamic typing (with a bit of static typing in local variables declarations), what's the catch?
"So what is "strong typing"?

As best as we can tell, this is a meaningless phrase, and people often use it in a nonsensical fashion. To some, it seems to mean the language has a type checker. To others, it means the language is sound, i.e., the type checker and run-time system are related. To most, it seems to just mean "a language like Pascal, C or Java, related in a way I can't quite make precise."

For amusement, when the phrase "strongly typed" is mentioned at a cocktail party, ask them to define it, then sit back and watch them squirm. And, please, don't use the term yourself unless you want to sound poorly-trained and ignorant. One recommended set of terminology is found in the online book ProgrammingLanguagesApplicationAndInterpretation:

                    | statically checked  | not statically checked |
        type safe   | ML, Java            | Scheme                 |
        type unsafe | C, C++              | assembly, Forth        |

Weak typing and dynamism don't have to go hand-in-hand, but often do in practice because of the trade-offs they offer. It's possible to have a strong typed dynamic languages, but it probably would not be popular because those who prefer dynamic languages tend not to want to deal with typing.

If you "invest" in a type-heavy technique, then you might as wall use a compiler because compilers can use the type info detect a wider class of problems.

We who like dynamic languages often do so because they are usually less verbose than the type-heavy languages and read more like pseudo-code, and thus easier to manually inspect and work with as code. A strong-typed dynamic language would essentially give you the worst of both worlds: type verbosity and lack of compiler checking.

In short, there's a fundamental trade-off between auto-checking and brevity. A type-heavy dynamic language gives neither benefit.

This is why they are not common.


What is a "type-heavy" language?

How does tag-based typing versus tag-free typing work on that chart? Related: ColdFusionLanguageTypeSystem

There needs to be some explicit tests or criteria for each of these. The existing ones are poor or contradictory.


View edit of October 13, 2012 or FindPage with title or text search