Durable Java

Mark Davis / http://www.macchiato.com/


Designing Durable APIs in Java

Durable Java | Immutables | Abstraction | Serialization | Liberté, Égalité, Fraternité | Hashing and Cloning

[Note to the editor: here is the contents of the left-hand first-page information. Both it and the column title have changed.]

Design your code from the start to be durable--so it can evolve without breaking your clients' code.

Dr. Mark Davis is lead architect at IBM's Center for Java Technology, Silicon Valley, co-founder and president of the Unicode Consortium, and architect for the bulk of JDK1.1 internationalization.

[Note to the editor: end of first-page left-hand information.]

Designing for larger-scale systems can be quite different from designing small programs. In particular, the cost of making incompatible changes becomes very high. This is the case even within a single company, and when your product is library code available to clients, incompatible changes can be difficult or impossible. This means that you have to design your code to be durable--so that the APIs can withstand the test of time, and can be extended without breaking client code.

Durability is not typically highlighted in object-oriented design books. In this column we'll focus on particular techniques for ensuring that your Java APIs are durable. On the other side of the coin, there are a lot of non-durable APIs out in the world, so we'll also discuss defensive techniques for subclassers.

This column is the product of many years of wrestling with APIs for large object-oriented software systems, first in C++ and then in Java. While my primary area of expertise in software is internationalization, I've managed architecture design and review teams for very high level software on down to very low level. During this time, I've had the benefit of working with some really outstanding people, and will draw upon what we learned in working through the issues presented here.

This is a new column for JavaReport, and I welcome any feedback from readers, both on published columns and on topics they would like to see covered in the future. (If I run low on topics, I'll resort to writing columns on traveling in Europe--and I'm not sure Dwight would like that!) I can be reached at mark@macchiato.com. Here are some of the topics I'm planning for future columns:


I try to draw as many examples as I can from the Java API (both 1.1 and 1.2). This is because those APIs are familiar to more people, not because they are bad. While there are flaws in the Java APIs, that is to be expected from any product of such complexity--you'd rather hit your window of opportunity than ship absolutely perfect APIs!

Reliable Superclasses

Designing classes for subclassing is easy. Designing classes for reliable subclassing is not. It is very easy to muck this up: you can give your subclassers too much control, or too little control, or you can force subclasses to be fragile.

What do we mean by a fragile subclass? The specification for Java classes is customarily established by the javadoc documentation, which includes a text description and the public and protected method and field signatures. Any implementation change that stays within the spec is compatible. When you are subclassing a class, you may work out how to subclass it by fishing through the Java implementation rather than by reading the spec. But your subclass is then fragile--compatible changes can blow you out of the water.

Let's look at some of the problems, and how to avoid them.

Too much control

So what is the problem with giving subclasses too much control? The disadvantage is that it ties your hands for future subclassing. For example, if you make your fields public, then you have no control over when they get changed, and you can't decide that you want a different way of organizing your data later on. If the fields are dependent on one another, then you can't make sure that they are set in tandem, or that they don't end up with inconsistent values.

Privacy issues. Generally, it's a bad idea to have fields be anything but private. If they are private, you have complete control over access, and don't have to worry about users, subclassers, or other classes in your package doing unpleasant things behind your back. You can supply access with getters and setters having the appropriate access control: public, protected, private, or package-private.
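
As a minimal sketch of the point about dependent fields, consider a hypothetical Range class (the class and its method names are invented for illustration, not taken from any Java API) whose two fields must always satisfy low <= high. Because the fields are private and are only ever set in tandem, no caller can leave them inconsistent.

```java
// Hypothetical example: Range is not part of any Java API.
final class Range {
    // Private fields: only this class can change them, so the
    // invariant low <= high is enforced in exactly one place.
    private int low;
    private int high;

    Range(int low, int high) {
        setBounds(low, high);
    }

    // The dependent fields are always set together, after validation.
    public void setBounds(int newLow, int newHigh) {
        if (newLow > newHigh) {
            throw new IllegalArgumentException("low > high");
        }
        low = newLow;
        high = newHigh;
    }

    public int getLow()  { return low; }
    public int getHigh() { return high; }
    public int length()  { return high - low; }
}
```

Had low and high been public, any client could have assigned one without the other, and the class could never later switch to storing, say, a start and a length.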

Of course, there are many circumstances where it's reasonable to make the fields public. Look at Point, for example. It's simply a collection of two simple fields x and y, which are directly accessible for speed. The best case for making fields public is when: (1) your class is final, (2) the fields are independent: one doesn't depend on another, and (3) access speed is crucial. Yet even here, a modern JIT will make a final getter or setter be just as fast as a public field, so there is often little reason to have public fields.

Don't make internal fields package-private. Package-private fields can be accessed and changed by any class in the package, intentionally or unintentionally, behind your back and a subclasser's back. A future implementation change in any other class in that package--with no API or implementation change to your class--could cause subclassers to malfunction. If package-private access is required, funnel it through a package-private getter or setter.

There are times when you want to give access to both subclassers and other classes in your package. Do this with two different methods, funneling through the protected method.

protected void setField(int value) {
 ...
}

void setFieldInternal(int value) { // package private
 setField(value);
}

Not enough control

Subclassers have insufficient control when they need to access or change private fields, but can't. They can also have insufficient control when your class (or another class in your package) goes through back doors. Let's take BorderLayout, for example.

Two of BorderLayout's fields are fine, and accessible through public getters/setters: get/setHgap, get/setVgap. But it has five package-private fields, corresponding to the "North", "South", "East", "West", and "Center" parameters passed in to Component.add. Since these are inaccessible, the only way to subclass them is to have your own copies, and override every BorderLayout method that either uses or sets them. This means duplicating the vast majority of code in the class. (Moreover, the fields are package-private, not private, which makes them more fragile.)

To avoid this problem in your development process, start by making the class final. As a part of your testing, construct a substantial subclass--one that significantly modifies the behavior of your class. Once the class has been tested for subclassability and modified to allow sufficient access to its data, final can be lifted. Remember, removing final is a compatible change--adding it is not!

Call yourself. A class also fails to give sufficient control when it changes its internal state without calling its own setter. Subclassers then cannot reliably override the setter; their only recourse is to override every method that changes the state. Look at the following example:

public class AClass {
 private int x;
 public void someMethod() {
  // ...
  x = 5 * x;                                                 // (1)
  // setX(5 * getX());                                       // (2)
  // ...
 }
 public void setX(int newValue) {
  x = newValue;
 }
 public int getX() {
  return x;
 }
}

If you wrote line (1), your subclasser couldn't just override setX--he would have to override someMethod and duplicate all of its implementation. By writing line (2) instead, your subclasser only needs to override setX. Unless a setter or getter is final, and thus can't be overridden, always call it instead of accessing the field directly.
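
To make the payoff concrete, here is a small sketch with hypothetical class names (Scaler and CountingScaler are invented for illustration). The superclass always goes through its own accessors, so a subclass that overrides only setX still sees every state change.

```java
// Hypothetical names throughout; a sketch of "call yourself".
class Scaler {
    private int x = 1;

    public void scaleByFive() {
        // Go through the accessors rather than touching x directly,
        // so that an overriding setX sees every change.
        setX(5 * getX());
    }

    public void setX(int newValue) { x = newValue; }
    public int getX() { return x; }
}

class CountingScaler extends Scaler {
    private int changes = 0;

    @Override
    public void setX(int newValue) {
        changes++;              // extra behavior added by the subclass
        super.setX(newValue);   // still let the superclass do its work
    }

    public int getChanges() { return changes; }
}
```

If scaleByFive assigned to x directly, CountingScaler would miss that change, and its count would silently be wrong.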

Call your super. When you override a method, make sure that you call your superclass's method at some point. Unless the superclass method is clearly spec'ed as not needing to be called, a compatible change could break your subclass. Look at the following simple example.

public class Foo {
    // ...
    public void setName(String name) {
        this.name = name;
        hashCache = name.hashCode();                   // (1)
    }
    public int hashCode() {
        return hashCache;
    }
    protected String name;
    protected int hashCache;                           // (2)
}

class Foo2 extends Foo {
    // ...
    public void setName(String name) {
        this.name = name;                              // (3)
        // super.setName(name);                        // (4)
        containsSlash = name.indexOf('/') >= 0;        // indexOf returns 0 for a leading slash
    }
    boolean containsSlash;
}

In version 1, Foo does not have lines (1) and (2). If Foo2 uses line (3), everything seems to work fine at first. But then along comes the author of Foo and makes a performance enhancement--all within spec--by adding (1) and (2). Foo2 then breaks: because its setName never updates hashCache, the inherited hashCode method returns a stale value. If Foo2 is coded with line (4) instead, all is well in either case. (Of course, this also illustrates the value of having private fields--if name were private this wouldn't happen!)
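
The failure can be reproduced in a few lines. The sketch below uses hypothetical names (NamedBase, FragileNamed, RobustNamed) and models only "version 2" of the superclass, the one with the hash cache; it overrides hashCode without equals purely to keep the illustration short.

```java
// "Version 2" of a superclass: setName now also maintains a hash cache.
class NamedBase {
    protected String name = "";
    protected int hashCache = 0;        // "".hashCode() is 0

    public void setName(String name) {
        this.name = name;
        hashCache = name.hashCode();
    }

    @Override
    public int hashCode() { return hashCache; }
}

// Fragile: sets the protected field directly and skips super.setName,
// so hashCache is never refreshed.
class FragileNamed extends NamedBase {
    boolean containsSlash;

    @Override
    public void setName(String name) {
        this.name = name;
        containsSlash = name.indexOf('/') >= 0;
    }
}

// Robust: lets the superclass do its own bookkeeping first.
class RobustNamed extends NamedBase {
    boolean containsSlash;

    @Override
    public void setName(String name) {
        super.setName(name);
        containsSlash = name.indexOf('/') >= 0;
    }
}
```

After setName("a/b"), a RobustNamed reports the hash of "a/b", while a FragileNamed still reports the hash cached for the empty name.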

Documentation!

Except for trivial classes, the method signatures alone are not sufficient to tell someone how to subclass. If a class is designed to be subclassable, it must document how to subclass. With more complicated arrangements of objects, your objects' methods can be called by many other classes under a wide variety of circumstances. If you don't document how to subclass, it is very easy for subclassers to produce fragile subclasses. The two main problem areas are knowing which method to override, and knowing which methods are coordinated with one another.

Which method to override? Suppose that I want to be notified whenever my Component changes its size. If Component were badly designed, it might change its size without calling an overridable method. Then I would be completely stuck.

Luckily, the designers of Component avoided that, and never set the size without calling an overridable method. But which ones? If we look at the signature for Component, we find a whole host of methods having to do with size and positioning.

If you examine the Java 1.1.4 code, you will find the following call chains:

 

setSize(Dimension) -> resize(Dimension) -> setSize(int, int) -> resize(int, int) -> setBounds(int, int, int, int)

setLocation(Point) -> setLocation(int, int) -> move(int, int) -> setBounds(int, int, int, int)

setBounds(Rectangle) -> setBounds(int, int, int, int) -> reshape(int, int, int, int)

That is, all of these methods funnel through one method: reshape(int, int, int, int). This is the primary method--all the others are secondary methods, and eventually just call the primary one. (Sometimes not along an especially efficient path!) Since this method was deprecated as part of a name change, you will also find by examining the code that all* of the 1.1 Java code funnels through the new name, setBounds(int, int, int, int). This is excellent--we only have to override one method, not all of them.

* Be careful to catch all the uses of the method in all of your classes. For example, there is a flaw in the JavaSoft renaming of reshape. The method ScrollPane.layout() calls reshape directly, and will thus not go through an overridden setBounds(). This will cause subclasses to malfunction unless they override both method names.

However, this funneling behavior is not part of the specification. That means that it is fragile: Sun or any Java licensee could change this behavior, and break your subclass. In this particular case, it is unlikely--but if you were paranoid you would not depend on unspecified behavior, and would override each of the methods.

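Let's pull back a bit, and see what you, as the superclass writer, can do to avoid this problem. In general, there are two ways to provide for reliable subclassing in this kind of case: document which method is the primary one that subclassers should override, and mark the secondary methods final so that they cannot be overridden.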

In new code, do both of these. Marking a method final will let the compiler tell your subclassers they are not doing the right thing. Unfortunately, that also means that you can never fix an old method by marking it final without breaking compatibility. So for old methods that should not be overridden, the documentation alone will have to do.
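
A minimal sketch of the two techniques used together, with a hypothetical Box class standing in for a component (none of these names come from the AWT): the javadoc names the primary method, and the secondary methods are final so the compiler rejects any attempt to override them.

```java
// Hypothetical Box class sketching the primary/secondary split.
class Box {
    private int x, y, width, height;

    /**
     * Primary method: every change of position or size goes through
     * here. Subclasses that need to react to such changes override
     * this method only, and should call super.setBounds.
     */
    public void setBounds(int x, int y, int width, int height) {
        this.x = x;
        this.y = y;
        this.width = width;
        this.height = height;
    }

    /** Secondary method: final, always funnels through setBounds. */
    public final void setSize(int width, int height) {
        setBounds(x, y, width, height);
    }

    /** Secondary method: final, always funnels through setBounds. */
    public final void setLocation(int x, int y) {
        setBounds(x, y, width, height);
    }

    public final int getX() { return x; }
    public final int getY() { return y; }
    public final int getWidth() { return width; }
    public final int getHeight() { return height; }
}

class LoggingBox extends Box {
    private int boundsChanges = 0;

    @Override
    public void setBounds(int x, int y, int width, int height) {
        boundsChanges++;                      // react to any change
        super.setBounds(x, y, width, height); // then let Box record it
    }

    public int getBoundsChanges() { return boundsChanges; }
}
```

Here the funneling is part of the documented contract rather than an accident of the implementation, so LoggingBox is not fragile.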

Coordinated methods. Many methods are coordinated with one another. In order to subclass effectively, you need to override those methods as a group. For example, let's suppose that you are building a component that needs to rearrange its contents in response to a change in size. You could start by overriding setBounds(), but you will find out that you shouldn't lay out your component again in response to this; you should override layout() and do the work there. You will then find out that layout() is only called on subclasses of Container, not on arbitrary subclasses of Component (even though it would be faster not to have to test for subclasses).

So even though Container has more fields and API than you really wanted, you will have to extend it instead of Component. Of course, none of this is documented in the API, so it is a matter of fishing through the implementation to figure it out. So, you have no choice but to have a fragile subclass.

There is no language mechanism in Java to indicate that two methods are coordinated. The only way to prevent fragile subclasses here is to document the coordination in the superclass API documentation.
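
As one possible shape for such documentation, the sketch below uses a hypothetical SketchPanel class (it is not the AWT Container, and its contract is an assumption made up for illustration). The class comment spells out which methods work together and which one a subclasser is expected to override.

```java
/**
 * Hypothetical container sketch; not the AWT Container class.
 *
 * Subclassing contract (the coordinated methods):
 *  - setSize only records the new size and marks the panel invalid.
 *    Do not rearrange contents there.
 *  - validate calls layout() once if the panel is invalid.
 *  - Override layout() to rearrange contents.
 */
class SketchPanel {
    private int width;
    private int height;
    private boolean valid = false;

    public void setSize(int width, int height) {
        this.width = width;
        this.height = height;
        valid = false;          // a later validate() will call layout()
    }

    public final void validate() {
        if (!valid) {
            layout();
            valid = true;
        }
    }

    /** Override to arrange contents; the default does nothing. */
    protected void layout() { }

    public final int getWidth() { return width; }
    public final int getHeight() { return height; }
}

class CountingPanel extends SketchPanel {
    private int layouts = 0;

    @Override
    protected void layout() { layouts++; }

    public int getLayouts() { return layouts; }
}
```

With the contract written down, a subclasser knows to override layout() rather than setSize, and the superclass author knows that this coordination can no longer change without breaking the spec.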


Here is a summary of the main guidelines for helping to make sure that your classes are designed for reliable subclassing.


Copyright (c) 1999, Mark Davis, All rights reserved.