Durable Java

Mark Davis / http://www.macchiato.com/


Abstraction


[Note to the editor: here is the contents of the left-hand first-page information. Both it and the column title have changed.]

Design your code from the start to be durable--so it can evolve without breaking your clients' code.

Dr. Mark Davis is lead architect at IBM's Center for Java Technology, Silicon Valley, co-founder and president of the Unicode Consortium, and architect for the bulk of JDK1.1 internationalization.

[Note to the editor: end of first-page left-hand information.]

In this column we are going to take up the use of abstract classes. These will include choosing when to use abstract classes (vs. interfaces), when to add abstract superclasses, and how to create instances of abstract classes. Proper use of abstract classes can go a long way towards making your code more robust. Sparked by my last column, I received some questions about String and StringBuffer performance and memory usage, so I'll take some time to follow up on that as well.


ABSTRACT CLASSES VS. INTERFACES

"Another short fit of abstraction followed,..." -- Mansfield Park

Interfaces and abstract classes are very similar; both provide a mechanism for different concrete implementations to be used by the same code, and passed as a parameter to the same methods. However, there are two significant differences between them that can trip you up if you aren't careful.

 

                  Interfaces     Abstract Classes
Inheritance       Multiple       Single
Implementations   Illegal        Allowed
Inner classes     Static only    Static and non-static

Interfaces can be extremely valuable. However, you do need to be wary of defining an interface when it should be an abstract class instead. Interfaces are subject to a very important constraint: they are impossible to modify--even with pure additions--without breaking your clients. For example, if you add a method to an interface X, you cause a runtime error in every class that implements the interface (except those lucky classes that happen to have a method with the same signature!). When you don't control all the subclasses of your interface (they may be in different organizations or different companies), you have a big problem.

There is some relaxation of this in Java 2. You used to get a runtime exception when a class was loaded, if it didn't implement the interface completely. In Java 2, you get a runtime error when the new method is called. You still end up with compatibility problems if your code ever calls the new method and a client's subclass didn't implement it.

With abstract classes, on the other hand, you can add methods without breaking compatibility: you just need to supply a reasonable default implementation for that method. For example, java.io.Writer is an abstract class. It provides methods that will write characters, arrays of characters, and Strings. If Sun wanted to add a new method that wrote, say, StringBuffers, it could easily do it just by supplying a default implementation. Subclasses could override that method for efficiency--but wouldn't have to.

 // Default implementation: delegate to the existing write(String).
 public void write(StringBuffer source) throws IOException {
   write(source.toString());
 }

If java.io.Writer were an interface, this would be impossible. With an interface, you could only add a method signature, with no implementation.

 public void write(StringBuffer source) throws IOException;

Adding this would cause all implementing classes to break (unless, by chance, they happened to have that method!). Since you can't add a method to an interface, you have to use a rather clumsy work-around. You have to make a new interface X2 that extends X and adds the method. Then, in all of your code that takes X as a parameter, add runtime tests using instanceof to distinguish between X and X2, and use different code to handle the two cases. This is exactly what happened with LayoutManager and LayoutManager2 in the AWT: Sun had to create a whole new interface. Any future extension of LayoutManager would require yet another interface, LayoutManager3, and so on. If LayoutManager were an abstract class, then this would not be necessary. You can add a new method with a default implementation to an abstract class.
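Here is a minimal sketch of that work-around. All of the names in it (Sink, Sink2, OldSink, NewSink, SinkClient) are hypothetical and stand in for X, X2, and their implementers; it is only meant to show where the instanceof test ends up.

```java
// Hypothetical names throughout: Sink, Sink2, OldSink, NewSink and SinkClient
// are illustrations only, not part of any real library.

// The interface as first released; it can no longer be changed safely.
interface Sink {
    void write(String text);
}

// The extension interface: it extends Sink and adds the new method.
interface Sink2 extends Sink {
    void write(StringBuffer text);
}

// A client class written against the first release.
class OldSink implements Sink {
    final StringBuffer log = new StringBuffer();
    public void write(String text) { log.append("old:").append(text); }
}

// A class written against the second release.
class NewSink implements Sink2 {
    final StringBuffer log = new StringBuffer();
    public void write(String text) { log.append("new:").append(text); }
    public void write(StringBuffer text) { log.append("new-buffer:").append(text); }
}

class SinkClient {
    // Every method that accepts a Sink needs this runtime test.
    static void send(Sink sink, StringBuffer text) {
        if (sink instanceof Sink2) {
            ((Sink2) sink).write(text);      // use the new method
        } else {
            sink.write(text.toString());     // fall back to the old one
        }
    }
}
```

Note that the test in send() has to be repeated in every method that wants to use the new capability, which is what makes the work-around clumsy.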

Take another example, CharacterIterator. This interface allows access to characters from a String or any other source of text. Its methods include next() and previous(), allowing iteration. However, there are many times when a user of CharacterIterator needs to get a whole chunk of text at a time, for efficiency. If CharacterIterator were an abstract class, then we could add a method that implements that in terms of the other methods, but which can be overridden for speed by any subclass:

public void getChars(int start, int end,
    char[] output, int outputStart) {
  for (char ch = setIndex(start);
       getIndex() < end;
       ch = next()) {
    output[outputStart++] = ch;
  }
}

This would be very useful, but since CharacterIterator is an interface it is not possible to add this method.
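To make the idea concrete, here is a sketch of what the abstract-class version might look like. TextIterator and StringTextIterator are hypothetical names that mimic only a small part of CharacterIterator; the point is that the subclass never mentions getChars and still gets it.

```java
// Hypothetical sketch: TextIterator and StringTextIterator are invented names
// that mimic a small part of java.text.CharacterIterator as an abstract class.
abstract class TextIterator {
    static final char DONE = '\uFFFF';

    abstract char setIndex(int position); // move, then return the char there
    abstract char next();                 // advance, then return the char there
    abstract int getIndex();

    // Added in a later release: works for every existing subclass because it
    // is written purely in terms of the abstract methods above.
    void getChars(int start, int end, char[] output, int outputStart) {
        for (char ch = setIndex(start); getIndex() < end; ch = next()) {
            output[outputStart++] = ch;
        }
    }
}

// A subclass written before getChars existed; it compiles and runs unchanged.
class StringTextIterator extends TextIterator {
    private final String text;
    private int index;

    StringTextIterator(String text) { this.text = text; }

    char setIndex(int position) {
        index = position;
        return current();
    }

    char next() {
        if (index < text.length()) index++;
        return current();
    }

    int getIndex() { return index; }

    private char current() {
        return index < text.length() ? text.charAt(index) : DONE;
    }
}
```

A subclass that stores its text in an array could override getChars with a single System.arraycopy call, but it would not have to.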

So when do you use an interface? It's when you really need to support multiple inheritance. MouseListener is a perfect example: you might want to make a new class which is only a MouseListener, but it is extremely convenient to make a class both a MouseListener and an ActionListener. To support the latter case, you need multiple inheritance.

But LayoutManager and CharacterIterator never really needed to support multiple inheritance, and would have been much better off as abstract classes.
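The MouseListener case can be sketched as follows. ClickCounter is a hypothetical class; the two listener interfaces are the real ones from java.awt.event. One object plays both roles, which would be impossible if both were abstract classes.

```java
import java.awt.event.ActionEvent;
import java.awt.event.ActionListener;
import java.awt.event.MouseEvent;
import java.awt.event.MouseListener;

// Hypothetical sketch: ClickCounter is an invented class; the two listener
// interfaces are the real ones from java.awt.event.
class ClickCounter implements MouseListener, ActionListener {
    int mouseClicks;
    int actions;

    // MouseListener methods
    public void mouseClicked(MouseEvent e) { mouseClicks++; }
    public void mousePressed(MouseEvent e) { }
    public void mouseReleased(MouseEvent e) { }
    public void mouseEntered(MouseEvent e) { }
    public void mouseExited(MouseEvent e) { }

    // ActionListener method
    public void actionPerformed(ActionEvent e) { actions++; }
}
```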

When to add Abstract Classes?

Whenever there is more than one reasonable way to implement a given API, and those different implementations can have useful differences, consider adding an abstract superclass to your class structure. You give yourself and your clients much more flexibility if you provide an abstract class and use that abstract class in your method parameters where possible. In this section, we'll use the Java String as an example. Suppose that it had an abstract class--call it StringBase. If StringBase were used in the API in general, then other subclasses could implement it, and interwork throughout the API. Methods like indexOf or regionMatches could be used with any class that provided text storage, not just String.

We don't have to look far for an example of other subclasses: StringBuffer provides one. While String was designed to be immutable and thus simpler and safer, StringBuffer has tremendous advantages in performance, as discussed in my last column. If String and StringBuffer both extended StringBase, and methods like Writer.write(String str) took StringBase parameters, then people would have the full advantages of both forms of strings, and avoid unnecessary (and costly) conversion between them. (See Follow-up below for more information.)

Note that in this example, StringBase would be neutral regarding mutability--classes that implement it may be either mutable or immutable. You couldn't overload StringBase with the property of immutability, since StringBuffer would then be an illegitimate subclass. Still, clients could use StringBase as the parameter type in all those cases where they don't care about mutability and they only need to access the character data. Additional string classes could be added that provide useful features, such as disk-based storage or efficient modification of large volumes of text. Since they would extend StringBase, they could be passed into any method that takes a StringBase as a parameter.
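As a rough sketch of this structure: since the real String and StringBuffer are final and share no such base class, the code below uses hypothetical stand-ins, FixedText (immutable) and GrowableText (mutable), under a hypothetical StringBase. The shared methods are written once, against the abstract accessors only.

```java
// Hypothetical sketch. StringBase, FixedText and GrowableText are invented
// stand-ins: the real String and StringBuffer are final and share no such base.
abstract class StringBase {
    abstract int length();
    abstract char charAt(int index);

    // Shared behavior written once against the abstract accessors.
    int indexOf(char target) {
        for (int i = 0; i < length(); i++) {
            if (charAt(i) == target) return i;
        }
        return -1;
    }

    // True if 'other' occurs in this text starting at 'offset'.
    boolean regionMatches(int offset, StringBase other) {
        if (offset < 0 || offset + other.length() > length()) return false;
        for (int i = 0; i < other.length(); i++) {
            if (charAt(offset + i) != other.charAt(i)) return false;
        }
        return true;
    }
}

// Immutable implementation, in the role of String.
final class FixedText extends StringBase {
    private final char[] chars;
    FixedText(String source) { chars = source.toCharArray(); }
    int length() { return chars.length; }
    char charAt(int index) { return chars[index]; }
}

// Mutable implementation, in the role of StringBuffer.
final class GrowableText extends StringBase {
    private final StringBuffer buffer = new StringBuffer();
    GrowableText append(String more) { buffer.append(more); return this; }
    int length() { return buffer.length(); }
    char charAt(int index) { return buffer.charAt(index); }
}
```

Here regionMatches accepts either kind of text on either side, with no conversion between the mutable and immutable forms.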

As discussed in my last column, in Java the mutable/immutable distinction is the way to compensate for the lack of const. With the above structure, a method parameter that required the use of an immutable string would be forced to specify a concrete class (String). But you may want to allow your clients the ability to specify the mutability of objects passed as parameters and allow for different implementations. For that case, you would need to add more structure that makes it explicit when a class was mutable, when it was not mutable, and when you didn't care. Of course, this might be much more work than you really care to do for every class, so you would target those cases where these capabilities were of particular importance to your clients.

When you add an abstract superclass to an existing class, you can move some implementation and fields up into the abstract superclass. But don't move fields that would be redundant in reasonable subclasses. For example, you might move the count field from String and StringBuffer into StringBase, but you would not move the character array since many reasonable implementations (such as tree-structured storage) would supersede that array, resulting in duplicate storage and messy code.

You can add an abstract base class after a first release. You then change methods in other classes to take that abstract base class as a parameter in place of your original class. However, because of issues involved in serialization (which we will discuss in the next column), you may need to add overloads instead, which will bloat your API unnecessarily. Providing an abstract base for your classes from the start gives you the maximum flexibility in extending your class structure for the future.


FACTORY METHODS VS. CONSTRUCTORS

"...if their construction could ever be deemed clever, time has long ago destroyed all its ingenuity." -- Sense and Sensibility

So how do we create instances of abstract classes, since they can't be instantiated directly with new? Factory methods provide the way. Simply speaking, a factory method is a static method of a class that returns an object of that class's type. But unlike a constructor, the actual object it returns might be an instance of a subclass. This is particularly interesting for abstract classes, since their constructors can only be invoked from subclasses, but is also valuable for concrete classes.

For example, String has 9 constructors (handling char arrays, byte arrays, Strings and StringBuffers) and 9 factory methods (handling the rest of the primitive types), such as:

public static String valueOf(char c);

 
By the way, people often forget about these factory methods, since they show up at the end of the API documentation--and have particularly unintuitive names. I have often seen the expression "" + ch used as a way to generate a String containing a single character. The more efficient way to do this is: String.valueOf(ch).

The advantage of a factory method is that it can return the same instance multiple times, or can return a subclass rather than an object of that exact type. For example, the Locale constructor would have been better written as a factory method, as we will see in a minute. Instead of calling

locale = new Locale(language, country, variant);

one would call

locale = Locale.make(language, country, variant);

Here are some of the advantages of using factory methods.

1. Object Creation

Immutable classes can avoid object creation. For example, Locale is immutable, so any returned instance can't be altered. Given that, rather than create a new instance of a locale for France, for example, a factory method can return objects from a cache.

If the immutable class is being constructed from an instance of a base class, it can also check for instances of its own type, and just return them without construction:

public static String valueOf(StringBase source) {
 if (source instanceof String) return (String) source;
 // otherwise construct a new String from source
}

2. Equality

If there are no public constructors, the instances of Locale can be precisely controlled. In this case, we could guarantee that all instances of a class come from a cache, and that they are equal if and only if their references are equal. That is, x.equals(y) if and only if x == y. This allows for very significant performance boosts in tight loops, since a == check is much faster than an equals check.

If you use factory methods exclusively, that means hiding your constructor. If your class is final, make the constructor private; otherwise make it protected.
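Points 1 and 2 can be sketched together. Region below is a hypothetical immutable class, loosely modeled on the Locale.make(...) idea above; it is not the real java.util.Locale API. The constructor is private, every instance comes from the cache, and so equals and == always agree.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: Region is an invented immutable class, loosely modeled
// on the Locale.make(...) idea; it is not the real java.util.Locale API.
final class Region {
    private static final Map<List<String>, Region> CACHE =
            new HashMap<List<String>, Region>();

    private final String language;
    private final String country;

    // Private: the class is final, so only the factory method may construct.
    private Region(String language, String country) {
        this.language = language;
        this.country = country;
    }

    // Returns the one shared instance for each language/country pair.
    static synchronized Region make(String language, String country) {
        List<String> key = Arrays.asList(language, country);
        Region cached = CACHE.get(key);
        if (cached == null) {
            cached = new Region(language, country);
            CACHE.put(key, cached);
        }
        return cached;
    }

    String getLanguage() { return language; }
    String getCountry() { return country; }

    // Object.equals is inherited unchanged, so equals and == always agree.
}
```

A caller can then compare two Region values with == inside a tight loop, which is the performance benefit described under Equality.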

3. Returning different classes

Unlike constructors, the return value from a factory method can be a subclass. This also permits the factory method to choose different subclasses based on the input parameters. For example, Calendar's factory method getInstance(Locale) illustrates this: it can return a different subclass in locales that don't use a Gregorian calendar.
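A small sketch of the same idea follows. YearFormat and its two subclasses are hypothetical; they only imitate the kind of choice that Calendar's factory method makes, and the test on the country code is an assumption made for illustration.

```java
import java.util.Locale;

// Hypothetical sketch: YearFormat and its subclasses are invented names that
// imitate the kind of choice Calendar.getInstance(Locale) makes.
abstract class YearFormat {
    abstract String format(int gregorianYear);

    // The caller asks for a YearFormat and never names a concrete subclass.
    static YearFormat make(Locale locale) {
        if ("TH".equals(locale.getCountry())) {
            return new BuddhistYearFormat();
        }
        return new GregorianYearFormat();
    }
}

class GregorianYearFormat extends YearFormat {
    String format(int gregorianYear) { return gregorianYear + " CE"; }
}

class BuddhistYearFormat extends YearFormat {
    // The Thai Buddhist era runs 543 years ahead of the Gregorian count.
    String format(int gregorianYear) { return (gregorianYear + 543) + " BE"; }
}
```

Because clients only ever see the YearFormat type, further subclasses can be added later without changing any calling code.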

4. Naming

A constructor always has the same name, so different constructors have to be distinguished by parameter types. Factory methods can have different names, either to show different purposes, or make distinctions with the same list of parameter types. For example, Color has both a constructor and a factory method with the same parameters (but very different interpretations of those parameters).

public Color(float r, float g, float b);
public static Color getHSBColor(float h, float s, float b);

 
I don't actually recommend the use of "get" in a factory method name--it is too often confused with a getter of a field. Using the name make(...) for factory methods sets them off more clearly. Variant names can be attached where necessary, such as Color.makeHSB(h, s, b).
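To illustrate the naming point with something smaller than Color, here is a sketch using a hypothetical Angle class. Two constructors taking a single double could not coexist, but two named factory methods can.

```java
// Hypothetical sketch: Angle is an invented class showing two factory methods
// with identical parameter lists but different meanings.
final class Angle {
    private final double radians;

    private Angle(double radians) { this.radians = radians; }

    // Same parameter list (one double), distinguished only by name.
    static Angle makeRadians(double radians) {
        return new Angle(radians);
    }

    static Angle makeDegrees(double degrees) {
        return new Angle(Math.toRadians(degrees));
    }

    double radians() { return radians; }
}
```

The call site now says which interpretation is meant: Angle.makeDegrees(180) and Angle.makeRadians(180) are clearly different requests.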

In fact, some advocates would go even farther than this and recommend that, as a matter of style (it should have been a matter of language design, they say), absolutely all constructors should be private or protected--that it's no one else's business whether a class manufactures a new object or recycles an old one!


Follow-up

"...and often without any attention to the imperfection of the performance"--Pride and Prejudice

Some people asked for more information about the speed differences between String and StringBuffer discussed in my last column. I'll take some space to discuss it here, since proper use of these classes can be crucial for performance in many programs. Here is basically how it works.

To start with, there is an interesting under-the-cover relationship between String and StringBuffer that makes the + operator and certain conversions between them faster than they would otherwise be. Take the following, for example:

for (int i = 0; i < source.length; ++i) {
 use(i + ": " + source[i]);
}

A straightforward (but dumb) implementation of the second line would be equivalent to:

String temp1 = String.valueOf(i);
String temp2 = temp1.concat(": ");
String temp3 = temp2.concat(source[i]);
use(temp3);

This creates three temporary objects, on the surface. However, each String contains a character array, so actually six new objects would be created. And as we saw in our last column, unnecessary object creation is very nasty for performance. But Java is smarter than that: as documented in StringBuffer, this line actually turns into the following:

 use(new StringBuffer().append(i)
  .append(": ").append(source[i])
  .toString());

This is far better, since it avoids creating all those temporary Strings (and their internal character arrays). Instead, the text is all appended to a single StringBuffer, which is only converted to a String at the end. But while better, this still involves creating three objects each call: the temporary StringBuffer, its internal character array, and a new String object.

This would be four objects instead of three, if it weren't for yet another optimization in Sun's code. When you call StringBuffer's toString(), the new String object shares the original StringBuffer's character array until a StringBuffer call would modify that array. At that point the two are decoupled: a copy of the array is made for the StringBuffer. (This is illustrated in the figure Decoupling Storage. The section marked with blue is added when any change is made to the StringBuffer.) Since this StringBuffer is temporary, and never used afterwards, that never happens and we only have three object creations per execution of that line of code.

[Figure: Decoupling Storage]

Still, we can do far better than that. Look at the following code:

StringBuffer temp = new StringBuffer();
for (int i = 0; i < source.length; ++i) {
 temp.setLength(0); // clear old contents
 temp.append(i).append(": ").append(source[i]);
 use(temp);
}

[Sidebar]

Caution: one of the coders on our Java XML parser ran into a hideous memory leak because of the hidden relationship between String and StringBuffer. Look at the following snippet illustrating the problem.

StringBuffer temp = new StringBuffer();
for (int i = 0; i < 10000; ++i) {
  temp.setLength(0);
  temp.append('<').append(tag[i]).append('>');
  stringArray[i] = temp.toString();
}

Suppose that tags are an average of 100 chars each, but the first one happens to be 10K chars long. After this code is executed, how much memory do you think the stringArray takes up?

Pause here for dramatic effect...

You might think it's about 10,000 × 100 chars = 1,000,000 chars (plus object and array overhead, of course). But sadly, you'd be wrong. The first tag causes the internal character array in temp to grow to 10K characters. From then on, every subsequent String ends up holding a character array of that full size, even if only the first 100 chars are used by the String. That comes to roughly 10,000 × 10K chars = 100,000,000 chars, so the real answer is 100 times larger!

[End of sidebar]

By moving the temporary StringBuffer out of the loop, the loop shown before the sidebar creates a fraction of the objects: only two objects per loop (not per line). Because we never take the step of creating a String, we save quite a bit on performance. This does, of course, require that use() be overloaded to take a StringBuffer parameter.
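For completeness, here is a runnable sketch of that loop. LineSink and its use(...) overloads are hypothetical, standing in for whatever use() does in your own code; the only assumption is that use(StringBuffer) reads the characters before it returns.

```java
// Hypothetical sketch: LineSink and its use(...) overloads are invented to
// stand in for the column's unspecified use() method.
class LineSink {
    private final StringBuffer received = new StringBuffer();

    // Overload that reads characters straight from the caller's buffer,
    // so no intermediate String has to be created.
    void use(StringBuffer text) {
        received.append(text).append('\n');
    }

    // Convenience overload for callers that already hold a String.
    void use(String text) {
        received.append(text).append('\n');
    }

    String contents() { return received.toString(); }

    // Labels each element of source with its index, reusing one buffer.
    static String label(String[] source) {
        LineSink sink = new LineSink();
        StringBuffer temp = new StringBuffer();   // created once, outside the loop
        for (int i = 0; i < source.length; ++i) {
            temp.setLength(0);                    // clear old contents
            temp.append(i).append(": ").append(source[i]);
            sink.use(temp);                       // picks use(StringBuffer)
        }
        return sink.contents();
    }
}
```

One design point to watch: use(StringBuffer) must copy or consume the text immediately. If it merely kept a reference to temp, the next pass through the loop would overwrite what it saw.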


Wrap-up

Proper use of abstract classes can make your code more robust, and allow you to more easily make changes in the future without breaking your clients. The general rule of thumb is to use them instead of interfaces whenever multiple inheritance is not required, and consider adding abstract superclasses for many of your concrete classes. This ties in with the use of factory methods for construction, which provide more flexibility and control of object creation than constructors do.

In the next column, I plan to discuss compatibility gotchas with serialization and changing method names. We will see why doing something as simple as adding a new method--with a new name--can break your clients!


Resources

If this topic is of interest to you, you might also look at:

  • Rich Gillam's "Java Liaison" column in the May issue of C++ Report.
  • "Well Mannered Objects" by the author,
    (http://www.ibm.com/java/education/portingc/WellMannered.html)
  • Factory methods are also used in a number of Design Patterns, such as the Singleton. See Design Patterns by Gamma, et al. for more information.

Feedback is welcome: new topics you'd like to see covered, questions about this column, or objections to what I've written.

I can be contacted at mark@macchiato.com; previous columns are archived at http://www.macchiato.com//

Copyright (c) 1999, Mark Davis, All rights reserved.