Java Arrays Break Type Safety

[Voting on JavaDesignFlaws page.]

A Java array containing a reference type can be cast to an array of the supertype. This breaks type safety, and therefore array accesses have to be type checked at runtime.

For example this code is valid at compile time and fails at run time.

  String[] strings = new String[1];
  Object[] objects = strings;
  objects[0] = new Integer(1); // RUN-TIME FAILURE

I don't think it is a flaw in design. If it would be impossible, you could not create any methods which work on generic arrays. No Arrays.sort(objarr). Java already suffer from primitive types not being objects and thus requiring separate handling of each of 8 or 9 types of them in generic algorithms. You propose widening this illness to all object space... -- Artur

Java doesn't have generic arrays or any kind of generics, that's the problem. If you want to sort a generic array you should create an array of Comparable objects. However that's not the same as casting an array of whose element type implements Comparable to Comparable. This is faking generics by opening a hole in the type system that requires run-time type checks. When they add generics to Java this hole should be removed. Until then, it's something we have to put up with, but it's still a flaw.

Well, is not Object (or Object[]) exactly the kind of generic you are looking for? Simply substitute "String[] strings" by "Object[] strings" in the above example. I agree with Artur, btw. -- MartinSchwartz

Yes, Object[] is a generic array. However, a String[] should not be an Object[] - that opens a hole in the type system. If one needs to do generic operations on an array, one should copy the typed array into an Object[] array, do the operation and then copy back. This would require casts, which would emphasize to readers of the code that type coercion is taking place and force one to consider type errors.

Current plans are to add JavaGenerics (TemplatesInJava?) in version 1.5. For compatibility reasons, they don't plan to close any holes, like the above.

I'm glad it stays. Btw. an example from JSR14 (Adding Generics...):

 LinkedList<String> ys = new LinkedList<String>();
 ys.add("zero"); ys.add("one");
 String y = ys.iterator().next();      

It's not correct for an object-oriented system to do this. See, entry 21.3. Actually, read from 21.2 down because it's pretty good reading.

I think the conclusion of this faq is rather due to missing run time type information in C++ and doesn't apply to java. It's nice to read, though. -- martin

This is a practical example of why method arguments must be contravariant not covariant.

Object[] defines the subscript operator to take an Object parameter as the new element. String[] is derived from Object[] and redefines the subscript operator to take a String as the new element, which is covariant w.r.t Object. Thus, String[] breaks the LiskovSubstitutionPrinciple.

The EiffelLanguage also allows covariant method arguments, which opens a hole in it's otherwise excellent type system.

Actually, contravariant method arguments are broken too. If my method requires a String[] and you pass me an Object[], that's a problem. Similarly if I operate on your String[] as if it was an Object[], that's a problem.

More specifically: If an array (or any collection) is used in a context where it is only read from and not modified, then covariance is OK, contravariance is not. If I have a function which takes an Object[] and only reads from it, passing a String[] is harmless

On the other side of the coin, if an array (or any collection) is used in a context where it is only modified and not read from, then contravariance is OK and covariance is not. If I have a function that inserts strings into the provided array (doing nothing else), passing it an Object[] is typesafe as strings may be safely inserted into an Object[].

Finally, if both reads and writes are done, than neither covariance nor contravariance is safe; only invariance is safe. Otherwise you risk either retrieving a non-string from what you think is a collection of strings, or inserting a non-string into a collection of strings.

Proper generics will solve the problem. We hope. It works fine for C++

Well, almost. It would be correct, if no subtyping relation existed between Object[] and String[]. At the point of use you'd declare that you want an "x[] where x is a subtype of Object" or an "x[] where x is a supertype of String". The first is possible in Java (I think), the second isn't. Conflating subtyping with inheritance may turn out as a dumb idea, too.

The real pain of Java only completely enforcing type safety for arrays at run-time is that there is a run-time cost to array accesses, even though no sane code will ever fail the test. -- JamesDennett

This is where smart compilers are useful. If DataflowAnalysis shows that only elements of appropriate types are assigned to an array of fully known type, then the runtime checks can be neglected without harm. This clearly doesn't apply where the array is passed in as a method argument (because then the array is only of partially known type), but it is quite useful for locally instantiated arrays. Extensive DataflowAnalysis can increase the number of situations where this optimization can be done, at the cost of potentially having to recompile more of the program when changes are made.

The problem is that you can't specify whether you are going to read from the array or write to it. Or more generally if you require elements of at least some type, or if you promise to provide elements of at least some type.

I think this problem exist in C++ too.

For generics types there is a solution planned for this:

This may be an innovation for strongly typed OO languages. (Innovation is defined as "not in C++")


Firstly, arrays do not break type safety. If they did, then upcasting an array wouldn't fail at runtime. They make it impossible for the compiler to prove the program type-safe, so the checking is deferred until runtime.

I think confusion occurs here because one the one hand, since a String is-a Object an array of Strings is-obviously-an array of Objects, and on the other it clearly isn't. The answer is in the mutability of an array.

If the array is immutable, then it safe to treat a String[] as an Object[], because an immutable String[] is always exactly like an immutable Object[].

On the other hand, if the array is mutable, then it is not usually safe to treat a String[] as an Object[].

The "wildcards" technique described in the above link is exactly what CommonLisp has been doing for years.

  > (deftype StringArray? () (array String)) ; This is the type of arrays of String
  > (deftype ObjectArray? () (array Object)) ; This is the type of arrays of Object
  > (subtypep StringArray? ObjectArray?)      ; Is StringArray? a subtype of ObjectArray??
  false, true                               ; No, it isn't. (false: it isn't, true: I'm sure)
  > (deftype AnyArray? () (array *))         ; This is the type of arrays of anything
  > (subtypep StringArray? AnyArray?)         ; Is StringArray? a subtype of AnyArray??
  true, true                                ; Yes, it is. (true: it is, true: I'm sure)

Isn't there a bigger problem here? Either the entire array must be of one type, or each element is allowed to be a different type. If we allow the second, then we cannot know at compile time whether a given expression is valid or not because arrays are not practical if they have to be filled by compile-time. Plus, it may make the array need more memory because something has to track the type of each element as they are added or changed. If we require that each element be the same type to avoid this issue, then we limit the power and flexibility of arrays.

Java arrays can specify any type, including Object, as their base type. An Object[] can hold any Java object, an String[] can only hold String's (as the String class is final; there are no subtypes for String). This isn't a problem. The problem is that Java allows String[] to be converted to Object[], when that's a type-unsafe conversion.

"Object" is just a dynamic type in static clothing.

Changing memory requirements aren't an issue for Java arrays, as 1) they are fixed size (use the Vector class if you want arrays that grow and shrink); and 2) arrays hold references to objects, not the objects themselves (the size of an array element is fixed, regardless of what that element is).

There are many times when you want homogenous containers, rather than heterogenous ones. In the relational world, for instance, all records in a given relation have to be of the same shape. I shouldn't be able to insert a PurchaseOrder? tuple into the Employees table. Likewise, I shouldn't be able to insert a SecurityManager into a String[]

Are you suggesting that two types of arrays be part of the language: dynamic and static kind? I suppose, but that just complicates an already complicated language further. I think they wisely chose the more flexible option to serve multiple purposes rather than complicate the language further. Sometimes you have to break theory for practicality sake. Then again, I am a fan of dynamic or type-free languages anyhow, but can see their reasoning from a static perspective.

ParametricPolymorphism gives you both, with a single concept. I can use vector<Base *> in C++ or use vector<Derived *>, whether I want more flexibility or a stronger invariant. (Of course, even in Java and Smalltalk, one can essentially enforce such invariants, it's just a matter of when. Collection invariants in Smalltalk, such as they exist, aren't enforced until an object is removed from a collection and a message sent to it, and the message send returns DoesNotUnderstand. Java array invariants are checked earlier, when an object is inserted into an array. C++ (and Java5) vector invariants are checked by the compiler. Like many other things; stronger invariants means greater ability to reason formally about the program; but less flexibility at runtime.

It seems one must either choose static-ness or dynamic-ness because you cannot have both.

That's like saying you have to choose either black or white because you cannot have both--a tautology. Of course you might find that grey or other colors are also valid options, so it's not a binary choice; but something cannot both be simultaneously black and white.

I am just suggesting that there is a fundamental trade-off of compile-time checking and run-time flexibility. Java arrays are not "broken" or "flawed", but rather chose a certain path down the trade-off tree that tilts toward the flexibility camp.

If we approach arrays as objects instead of syntax with square brackets, then we have situations such as:

  objectA = objectArray.getAtPosition(1);
  objectB = objectArray.getAtPosition(2);
  objectC = objectArray.getAtPosition(3);
  objectD = objectArray.getAtPosition(myPositionVariable);
Here, the type of object coming out depends on the parameter value. The compiler cannot realistically check this at run-time unless method getAtPosition is forced to declare its return type at compile time. If so, this can limit its flexibility. For example, we may want a factory-like pattern that returns different "kinds" of things based on download from a different system. The compiler can perhaps know the set of types that may be generated, but it cannot know which particular one will come out. For instance, suppose it reads in an XML GUI arrangement and different types of widgets can be generated. We may be able to know that all the return types are of master type "widget", but not which specific ones.

View edit of September 26, 2005 or FindPage with title or text search