2008/04/05

The New Builder Pattern

The idea

I like to create immutable objects, especially after reading Josh Bloch's excellent "Effective Java" book. If an object is immutable, it has only one possible state and it is a stable one, so once you successfully build an object, you don't need to care about state transitions that can make your object unstable or corrupted. And immutable objects can be shared even in a multithreaded application. There are many other pros of immutability (you can read some of them here).

There is a classical way of making immutable objects in Java: make all fields final (and private, of course), assign them only in constructors (so that the only moment when a field is set is during construction), and make the class final (so that subclasses cannot add "setter" methods). When you only have a couple of fields, that's fine, but when you have many of them you end up with a constructor with many arguments, which is ugly and difficult to use. If you have optional parameters, you can write one constructor that takes all the parameters, plus some shorter constructors that take the mandatory parameters and some of the optional ones and delegate to the big constructor, like this:


public class Foo {

    private final String mandatoryOne;
    private final String mandatoryTwo;
    private final String optionalOne;
    private final String optionalTwo;

    public Foo(String mOne, String mTwo, String optOne, String optTwo){
        this.mandatoryOne = mOne;
        this.mandatoryTwo = mTwo;
        this.optionalOne = optOne;
        this.optionalTwo = optTwo;
    }

    public Foo(String mOne, String mTwo, String optOne){
        this(mOne, mTwo, optOne, null);
    }
    ...
}


This gets messy as you add more optional parameters: you end up with a lot of constructors like these and a lot of boilerplate code. Using setters for the optional parameters is not an option, because the resulting objects are no longer immutable (any other object can change the state of your object through one of those setter methods).
Some time ago, thinking about this problem, I thought a solution could be to use a JavaBean object, with one setter per field (even for the mandatory ones), but with a kind of "seal" method that would "mark" the object as built; from that moment on, an IllegalStateException would be thrown if a setter was called. Nevertheless, I wasn't very satisfied with this approach, because setter methods that can sometimes be called and sometimes not would be confusing for the caller.
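
To make the "seal" idea concrete, here is a minimal sketch of what such a bean might look like. The names SealableFoo, seal() and checkNotSealed() are made up for this illustration, and only two fields are shown:

```java
// Hypothetical illustration of the "seal" idea; all names here are assumptions.
final class SealableFoo {

    private String mandatoryOne;
    private String optionalOne;
    private boolean sealed = false;

    public void setMandatoryOne(String value) {
        checkNotSealed();
        this.mandatoryOne = value;
    }

    public void setOptionalOne(String value) {
        checkNotSealed();
        this.optionalOne = value;
    }

    public String getMandatoryOne() {
        return mandatoryOne;
    }

    public String getOptionalOne() {
        return optionalOne;
    }

    // Marks the object as built; every later setter call fails.
    public void seal() {
        sealed = true;
    }

    private void checkNotSealed() {
        if (sealed) {
            throw new IllegalStateException("The object is sealed and cannot be modified");
        }
    }
}
```

The setters stay in the public API after seal() has been called, which is exactly what makes this approach confusing for the caller.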

Then I found the New Builder pattern, explained by Josh Bloch in this PDF presentation, which is different from the original GoF Builder pattern. This pattern uses a public inner static class as a builder. The constructor of the original class is made private, so the only way to build objects is with the inner builder class. The builder has a setter method for each optional parameter and uses a fluent idiom that allows chaining of these method calls. I like this pattern a lot, because it solves the problem elegantly and effectively.

The implementation

Josh Bloch's presentation didn't include a detailed implementation of the pattern, although the idea and the intention were very clear, so I searched for one on the Internet.

In Richard Hansen's blog you can find an implementation that seems to be closer to what Josh explains: the builder is a static nested class of the class it has to instantiate, the builder's constructor is public (so you invoke the builder with 'new'), and the builder has the same fields as its enclosing class. The 'build()' method copies the content of the builder's fields into a new instance of the enclosing class. What I don't like about this implementation is the duplication of fields (for each field in the original class you have a duplicate field in the builder).
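
As a rough sketch of this copying style (written for this post from the description above, not taken from Richard's blog), the Foo class from the first example could look like this. The getter names are assumptions:

```java
// Sketch of a copying builder: every field of Foo is duplicated in Builder.
final class Foo {

    private final String mandatoryOne;
    private final String mandatoryTwo;
    private final String optionalOne;
    private final String optionalTwo;

    public static class Builder {

        // Duplicates of the fields of the enclosing class
        private final String mandatoryOne;
        private final String mandatoryTwo;
        private String optionalOne;
        private String optionalTwo;

        public Builder(String mandatoryOne, String mandatoryTwo) {
            this.mandatoryOne = mandatoryOne;
            this.mandatoryTwo = mandatoryTwo;
        }

        public Builder optionalOne(String val) {
            this.optionalOne = val;
            return this;
        }

        public Builder optionalTwo(String val) {
            this.optionalTwo = val;
            return this;
        }

        // Copies the builder's current values into a new Foo
        public Foo build() {
            return new Foo(this);
        }
    }

    private Foo(Builder builder) {
        this.mandatoryOne = builder.mandatoryOne;
        this.mandatoryTwo = builder.mandatoryTwo;
        this.optionalOne = builder.optionalOne;
        this.optionalTwo = builder.optionalTwo;
    }

    public String getMandatoryOne() { return mandatoryOne; }
    public String getMandatoryTwo() { return mandatoryTwo; }
    public String getOptionalOne() { return optionalOne; }
    public String getOptionalTwo() { return optionalTwo; }
}
```

Because the values are copied at build time, all fields of Foo can be final, and changing the builder afterwards does not affect objects that were already built.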

In Robbie Vanbrabant's blog there is a variation of this pattern, which avoids the boilerplate code using a base class for the builder and some reflection to build the object from the builder. I don't like the use of an interface for the builder, because that way you can't add a new optional parameter without breaking existing code that uses the builder (if you change the signature of a public interface the classes that use it have to change their code to implement the new methods). Update: This is not a problem at all, as Robbie points out in a comment. Also, I don't like the use of reflection because it's slower than the normal access to fields, but I do like the way it avoids duplication of fields in the builder.
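
To give an idea of how an interface plus reflection can avoid hand-written builder code, here is a small sketch based on java.lang.reflect.Proxy. It is not Robbie's implementation; Song, SongBuilder and the rule "builder method name equals field name" are assumptions made only for this example:

```java
import java.lang.reflect.Field;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

// Hypothetical builder interface; nobody implements it by hand.
interface SongBuilder {
    SongBuilder title(String val);
    SongBuilder album(String val);
    Song build();
}

final class Song {

    // Not final in this sketch, so that plain reflection can set them
    private String title;
    private String album;

    private Song() {
    }

    public String getTitle() { return title; }
    public String getAlbum() { return album; }

    public static SongBuilder builder() {
        Map<String, Object> values = new HashMap<>();
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.getDeclaringClass() == Object.class) {
                // toString/hashCode/equals: delegate to the map of values
                return method.invoke(values, args);
            }
            if (method.getName().equals("build")) {
                Song song = new Song();
                // Assumption: each builder method is named after a field of Song
                for (Map.Entry<String, Object> entry : values.entrySet()) {
                    Field field = Song.class.getDeclaredField(entry.getKey());
                    field.setAccessible(true);
                    field.set(song, entry.getValue());
                }
                return song;
            }
            // Any other call is treated as a "setter": remember the value
            values.put(method.getName(), args[0]);
            return proxy;
        };
        return (SongBuilder) Proxy.newProxyInstance(
                SongBuilder.class.getClassLoader(),
                new Class<?>[] { SongBuilder.class },
                handler);
    }
}
```

Adding a new optional parameter then only needs a new field and a new method in the interface, at the price of one reflective field write per parameter when build() is called.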

The implementation I like most is the one found in Mario Hochreiter's blog. The builder is a nested public static class, but it changes the fields of its enclosing class directly instead of keeping duplicates. It doesn't use reflection and the builder is a class, not an interface. The only problem I see is that, in theory, with a reference to a builder you can change the state of the object it built, so you don't have the guarantee that the object is immutable. So I would add a check at the start of each "setter" of the builder that throws an IllegalStateException if the object has already been built, and a check in the 'build' method itself to ensure the object is not built more than once. Also, I would make the mandatory parameters final. Update: I have made the non-final fields of the class volatile in order to avoid visibility problems if the reference returned by the build() method is passed to code executed in another thread, as Niklas points out in his comment.

So, with the example of Mario, I would implement this pattern this way:


public class ID3Tag {

    private final String title;
    private final String artist;
    private volatile String album;
    private volatile int albumTrack;
    private volatile String comment;

    public static class Builder {

        private boolean isBuilt = false;
        private ID3Tag id3tag;

        public Builder(String title, String artist) {
            id3tag = new ID3Tag(title, artist);
        }

        public Builder album(String val) {
            if (isBuilt){
                throw new IllegalStateException("The object cannot be modified after built");
            }
            id3tag.album = val;
            return this;
        }

        public Builder albumTrack(int val) {
            if (isBuilt){
                throw new IllegalStateException("The object cannot be modified after built");
            }
            id3tag.albumTrack = val;
            return this;
        }

        public Builder comment(String val) {
            if (isBuilt){
                throw new IllegalStateException("The object cannot be modified after built");
            }
            id3tag.comment = val;
            return this;
        }
        // ... a lot more optional parameters

        public ID3Tag build() {
            if (isBuilt){
                throw new IllegalStateException("The object cannot be built twice");
            }
            isBuilt = true;
            return id3tag;
        }
    }

    private ID3Tag(String title, String artist) {
        this.title = title;
        this.artist = artist;
    }
}


The usage of this class would be:

ID3Tag tag = new ID3Tag.Builder("My Title", "My author")
        .comment("Great song").build();


I have found a similar pattern, called the Essence pattern, described here and here by Dr Herbie. This pattern uses direct access to the fields of the builder (like in a C++ structure) instead of using "setter" methods and it doesn't use "chaining" of modifications like in the New Builder Pattern ("...builder.option1(value1).option2(value2)...").
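
For comparison, a very small sketch of that style could look like the following. It is only my reading of the description above, not Dr Herbie's code; the names Message, Essence and create(), and the mandatory-field check, are assumptions:

```java
// Hypothetical sketch of an "essence" holder with directly accessible fields.
final class Message {

    private final String subject;
    private final String body;

    public static class Essence {

        // Public fields, filled in directly by the caller (no setters, no chaining)
        public String subject;
        public String body;

        public Message create() {
            // Assumed validation step: subject is treated as mandatory here
            if (subject == null) {
                throw new IllegalStateException("subject is mandatory");
            }
            return new Message(this);
        }
    }

    private Message(Essence essence) {
        this.subject = essence.subject;
        this.body = essence.body;
    }

    public String getSubject() { return subject; }
    public String getBody() { return body; }
}
```

The caller creates an Essence, assigns its fields one statement at a time, and then calls create().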


11 comments:

Robbie Vanbrabant said...

As for my proxy implementation, I don't think performance is that much of an issue. I just ran a benchmark (chart: http://tinyurl.com/6akome, source: http://tinyurl.com/5wef7n) and performance is not that far off from regular Java objects. In general I think this is a penalty that you can ignore unless you'll create tons of objects in a tight loop.

That said, I'm also not sure what you mean with the interface problem. In theory that _could_ be a problem, but the whole point of using this proxy thing is not having to implement that interface yourself.

Xavi Miró said...

Hi, Robbie.

I've just seen your benchmarks and I still think performance can be an issue with this implementation. It will depend on the use of objects, but in your benchmarks the proxy implementation doubles the time of the manual implementation. This is a microbenchmark, so an application will not double its time simply by using the proxy builder, of course (there will be I/O and other objects that may consume much more time), but in applications where many of these objects are built it may be a performance penalty.

You're right about the interface problem. Although changing an interface in general can break the source code of classes that implement it, this interface is going to be implemented _only_ by the proxy builder, not by external clients, so it will not be a problem at all.

-- Xavi

Robbie Vanbrabant said...

I should probably also mention that I omitted three zeroes for the "Objects Created" X-Axis. So 20 = 20,000 and so on.

Xavi Miró said...

Thank you for your clarification, Robbie.

Although the implementation with the proxy builder is slower (a factor of 2) than the "manual" one, I agree with you, the time to build an object with your implementation will be tiny and nothing to be worried about, especially if we take into account this 1000 factor you mention in your last comment.

-- Xavi

Shripad Agashe said...

Hi Xavi,
I think we can make the product effectively mutable by packaging arrangements. I've provided more details on the same in my blog. http://sagashe.blogspot.com

Although packaging is not such a good option for making objects immutable.

Xavi Miró said...

Hi, Amuktamuk.

The pattern I'm talking about in this blog entry is not the "classic" Builder (from the GOF patterns book), but a new one. It seems to me that you're talking about the classic one, which is not very familiar to me (I haven't used it yet and I only know a little bit from the GOF book). I know there are several possible ways of implementing it and it seems reasonable to me what you say to improve immutability (if you limit the access of the modification methods to the minimum, you improve the immutability of your class).

-- Xavi

Unknown said...

I think a better way to handle the Builder after calling build than using the isBuilt boolean would be to have a Builder member within Builder and to set it to null inside build().

public class SomeObject {

    private final Integer id;
    private final String name;

    ...

    public Integer getId() {
        return id;
    }

    public String getName() {
        return name;
    }

    private SomeObject(Builder builder) {
        this.id = builder.id;
        this.name = builder.name;
    }

    public static class Builder {

        // Inner builder that holds the values; set to null once build() has run
        private Builder builder;

        private Integer id;
        private String name;

        ...

        public Builder(Integer id) {
            builder = new Builder();
            builder.id = id;
        }

        private Builder() {
        }

        public Builder name(String name) {
            validateBuilder();
            builder.name = name;
            // Return this (not the inner builder) so that chained calls are still validated
            return this;
        }

        ...

        public SomeObject build() {
            // Fail with a clear message if build() was already called
            validateBuilder();
            SomeObject someObject = new SomeObject(builder);

            builder = null;

            return someObject;
        }

        private void validateBuilder() {
            if ( builder == null ) {
                throw new IllegalStateException("You must instantiate a new Builder");
            }
        }
    }
}

Xavi Miró said...

Hi, James.

Your suggestion is good, but it duplicates the fields (id is in SomeObject and in the Builder, and the same for the rest of fields). I prefer not to duplicate them.

-- Xavi

Anonymous said...

Hi, Robbie.

I like your solution, but I suggest one little change:

private void check() {
    if (isBuilt) {
        throw new IllegalStateException("The object cannot be modified after built");
    }
}

Thank you!
Martín.

Niklas Matthies said...

The "immutable" objects created with this non-copying pattern are not thread-safe. If the result of the build() method is passed to a different thread, this other thread may still see a previous state, only partially built. To be thread-safe, either all fields have to be final, or the object has to be published by assigning it to a final field. See http://jeremymanson.blogspot.com/2008/04/immutability-in-java.html for example. Using "final", and hence copying the fields at "build time", is essential to real immutability in Java.

Xavi Miró said...

Hi, Niklas.

Very good point, you're right. The JVM memory model does not ensure that the values of the non-final fields would be visible from another thread. From what I've read, I think that making the non-final fields 'volatile' would suffice, but these concurrency aspects are rather complex, so I will try to think about it a little bit before changing this code example.

Thank you for your comment.

-- Xavi