Sunday, June 21, 2009

Shutdown: deterministic vs fuzzy

Since I published my lazy manifesto I have been gathering opinions on the soundness of its various aspects. One great eye opener came from the good people at the Apache Felix mailing list. It turns out my lazy scheme contains one very deterministic aspect that permeates both the importer and the exporter policies: the desire to have a deterministic bundle shutdown. There is of course another way to shut down a bundle, which for the fun of it I'll call "fuzzy" rather than "indeterministic". Let us now work our way backwards through the bundle lifecycle to discover how these two shutdown schemes lead to subtle but important differences in the importer synchronization schemes.

Deterministic

Under this scheme we want to guarantee that after a call to ServiceRegistration.unregister() returns there are no references to the service left in the wild. That includes both references from object fields and references from thread stacks. The cleanup of object references is achieved by synchronously calling the ServiceListeners plugged in by each importer. In this way the importer has a chance to flush out all references it has stored in object fields. This however is not enough. There might be "in-flight" references stored in local variables. These reside on the stacks of active threads and we have no direct control over them. The only thing to do is to make sure unregister() blocks until all method calls that hold the service in local variables complete. This means the importer must wrap every call to the service in a synchronized block. That block must use the same lock as the importer's ServiceListener. In this way the unregister() call will enter the importer and block until all calls to the service complete. Note that this means that the service unregistration can be postponed indefinitely if business control flow threads keep preempting the management control flow thread, which performs the bundle shutdown.

This is how we arrive at the following importer synchronization scheme:

/* 
 * Service tracking code 
 */
private final Object lock = new Object(); // private lock for HelloService (a String literal is interned and could be shared)
private HelloService serv;

void set(HelloService serv) {
  synchronized (lock) {
    this.serv = serv;
  }
}

HelloService get() {
  synchronized (lock) {
    if (serv == null) {
      throw new ServiceUnavailableException();
    }
    return serv;
  }
}

...

/* 
 * Service consuming code 
 */
synchronized (lock) {
  HelloService serv = get();
  serv.hello("OSGi");
}

One bad thing about this scheme is that we hold a lock while calling into another bundle. This lock is our own private lock, which rules out the possibility that code we can't control can wait on this lock. Still there is the potential for deadlocks from other threads calling inside our bundle. Another drawback is that this scheme kills the potential for concurrency inside the importer. Any threads that pass through the bundle will have to line up for the private lock when they try to use the service. I have used a read-write lock to alleviate this problem and have even profiled this with the Peaberry benchmark. It turns out this is likely not prohibitively expensive, but to be sure one must do a huge profiling job (different OSes, different JVMs, different CPUs). As you can see it is worth exploring alternatives that avoid these complications.
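The read-write lock variant mentioned above could look roughly like the sketch below. This is not the code that was profiled; the holder class, the call() method and the locally defined ServiceUnavailableException are hypothetical stand-ins chosen so that the example runs without an OSGi framework. Service calls share the read lock, while the listener side takes the write lock and therefore still waits for in-flight calls to drain.

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-ins so the sketch compiles on its own.
interface HelloService {
  String hello(String name);
}

class ServiceUnavailableException extends RuntimeException {
}

// Illustrative holder: one lock and one storage field per imported service.
class HelloServiceHolder {
  private final ReadWriteLock lock = new ReentrantReadWriteLock();
  private HelloService serv; // guarded by lock

  // Called from the importer's ServiceListener; pass null on unregistration.
  // Blocks until every in-flight call() has released the read lock.
  void set(HelloService serv) {
    lock.writeLock().lock();
    try {
      this.serv = serv;
    } finally {
      lock.writeLock().unlock();
    }
  }

  // Service consuming code: many threads may hold the read lock at once.
  String call(String name) {
    lock.readLock().lock();
    try {
      if (serv == null) {
        throw new ServiceUnavailableException();
      }
      return serv.hello(name);
    } finally {
      lock.readLock().unlock();
    }
  }
}
```

Concurrent callers no longer queue behind each other, but a lock is still held while calling into foreign code, so the deadlock concern stays.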

The return on our investment in importer synchronization code is the following bundle shutdown sequence:

  1. Unregister all services
    After this step we are free to tear down the bundle internals as we wish. If there are any references left to our services this is a bug in the importer.
  2. Deactivate
    Stop threads, close sockets, dispose widgets. We don't have to care about synchronization of the code that contains these resources.
  3. Detach
    Null the references to what's left. This cuts the dead bundle internals from the activator and leaves them to the garbage collector.
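As a rough illustration, the three steps could be arranged like this on the exporter side. The Registration interface is a stand-in for org.osgi.framework.ServiceRegistration so that the sketch runs without a framework, and the DeterministicExporter class with its worker thread is entirely hypothetical.

```java
// Stand-in for org.osgi.framework.ServiceRegistration (assumption, not the real type).
interface Registration {
  void unregister();
}

// Hypothetical exporter; in a real bundle this logic would live behind the activator.
class DeterministicExporter {
  private Registration registration;
  private Thread worker;

  void start(Registration registration) {
    this.registration = registration;
    worker = new Thread(() -> {
      try {
        Thread.sleep(Long.MAX_VALUE); // placeholder for real background work
      } catch (InterruptedException e) {
        // interrupted by stop(): fall through and let the thread exit
      }
    });
    worker.start();
  }

  void stop() {
    // 1. Unregister: under the deterministic scheme this returns only after
    //    the importers have flushed their fields and in-flight calls drained.
    registration.unregister();

    // 2. Deactivate: no importer can call in any more, so no extra locking.
    worker.interrupt();
    try {
      worker.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }

    // 3. Detach: leave the dead internals to the garbage collector.
    registration = null;
    worker = null;
  }

  boolean isDetached() {
    return registration == null && worker == null;
  }
}
```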

As exporters we benefit from this scheme because the normal "work-well or fail clean" guarantees we have to provide for our services end after the first step. If an importer keeps calling our services after step one, it can experience random buggy behavior, like calls that appear to succeed but return bad values. Even worse, since our service code is not supposed to be called at this time, we can end up with open files, unclosed sockets or other cleanup bugs.

The final remark I have on this shutdown scheme is that it heavily relies on the synchronous service event dispatch. I was quite surprised to discover I actually required this! I generally think the synchronous dispatch is dangerous precisely because it allows importers to block the management control flow. Also this creates the potential for unexpected call loops when the listener code finds a way to call back into the bundle that unregisters the service. Both of these can cause really nasty bugs.

Fuzzy

Under this scheme after ServiceRegistration.unregister() returns it is only guaranteed that references from object fields are flushed. References from thread stacks can remain. We can have less synchronization in the importer:

...
/* 
 * Service Consumer code 
 */
HelloService serv = get();
serv.hello("OSGi");

Now we remove the problems of holding locks when calling out to unknown code. Also the importer can be as concurrent as the service implementation allows. Fewer obligations for the importer translate to more obligations for the exporter. The service is now not permitted to ever exhibit random behavior. It must keep its "work well or fail clean" contract forever. I.e. the service must be properly composed before it is exported and behave consistently until garbage collected. The service unregistration sits somewhere between these two points to mark the beginning of a gradual (i.e. fuzzy) decline of service usage as late importers try to call and crash.

Besides simplifying the importing code, this scheme has the additional benefit of handling buggy importers that keep calling the stale service indefinitely. Under fuzzy shutdown these importers will at least crash cleanly rather than also cause trouble in the exporting bundle.

The Fuzzy shutdown sequence remains the same but with different meaning attached to each step:

  1. Unregister all services
    This is a bit redundant because all services will be unregistered by the OSGi framework after the shutdown completes. It still feels like good style to announce the imminent destruction before we close shop. This also caters to importers who choose to follow the deterministic shutdown import scheme for reasons of their own.
  2. Deactivate
    Stop threads, close sockets, dispose widgets. Here we do have to care about synchronization because legitimate importers can still call in. This means that all OSGi services have to handle concurrent access, either because of the application design or because they can be shut down from another thread.
  3. Detach
    Null the references that attach the bundle internals to the activator. Now we leave behind a ball of fail-fast objects that will linger until all local variable references to the service drain away.
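The "fail clean forever" obligation on the exporter can be sketched as a service with an explicit invalidation flag. The ClosableHelloService class and its close() method are hypothetical names, and IllegalStateException is just one possible way to fail cleanly.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the exported service interface.
interface HelloService {
  String hello(String name);
}

// Illustrative fail-fast service: works well until closed, fails cleanly afterwards.
class ClosableHelloService implements HelloService {
  private final AtomicBoolean closed = new AtomicBoolean();

  @Override
  public String hello(String name) {
    if (closed.get()) {
      // Late importers holding a stale reference crash here, cleanly.
      throw new IllegalStateException("HelloService has been shut down");
    }
    return "Hello " + name;
  }

  // Called by the exporter in the Deactivate step; safe to call more than once.
  void close() {
    if (closed.compareAndSet(false, true)) {
      // Release non-memory resources (threads, sockets, widgets) exactly once.
    }
  }
}
```

A service that touches real resources inside hello() would need more than a flag check, for example a lock shared between hello() and close(), because a call can pass the check just before close() runs.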

Notice that under Fuzzy shutdown the sum of tracking code inside each importer forms a kind of service cache. Each pair of private lock and service storage field forms one cell of this cache. The application code then pulls objects out of the cache every time it needs a service. When the service is gone we have a cache miss, i.e. a ServiceUnavailableException. The Peaberry framework makes this thread-safe cache explicit. Under the covers services are pulled into the cache upon a method call and linger inside for a period of several minutes. The only exceptions are services with sticky decorators. These are typically stateful services and must be kept around until the state loaded inside is used. This scheme is better than using a bunch of ServiceTrackers because each tracker will hold its service for the entire bundle lifetime even if it was used only once to initialize something.

Java is traditionally fuzzy

There are other precedents in Java for fuzzy shutdown. Consider the way threads are stopped: raise a "shutdown" flag, close your resources in a thread-safe manner and let the thread expire from natural causes. In our case calling unregister() is equivalent to raising a boolean flag. Closing resources covers both disposal of non-memory resources and discontinuation of imports used by our service: both will cause exceptions to calling importers. Generally it seems to be the Java way to have a deterministic startup and a fuzzy shutdown. We can summarize this in one sentence:

Be consistent at all times.

Typically in Java we start application subsystems in a way that guarantees our objects are exposed to application control flow only after being fully set up. E.g. wire together a set of objects, configure them, and only then call start()/open() methods or, in OSGi's case, register the service. When the time comes to stop we remain consistent right up to the point where the garbage collector frees the memory. As we know this happens when truly no one uses the object any more. To precipitate the release of memory as a final act we cause the deterministic release of non-memory resources via close()/dispose() calls. Then we let nature take its course.
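The thread-stopping idiom referred to in this section can be sketched as follows. The FlagWorker class and its method names are made up for illustration; the point is only that the stop request is a flag and the thread ends on its own.

```java
// Hypothetical worker that is stopped the "fuzzy" way: raise a flag, let it expire.
class FlagWorker implements Runnable {
  private volatile boolean shutdown; // volatile so the worker thread sees the update
  private final Thread thread = new Thread(this, "flag-worker");

  void start() {
    thread.start();
  }

  @Override
  public void run() {
    while (!shutdown) {
      try {
        Thread.sleep(10); // placeholder for one unit of real work
      } catch (InterruptedException e) {
        return; // treat interruption as a stop request too
      }
    }
    // The loop has ended by itself; thread-owned resources are released here.
  }

  // Comparable to unregister(): announces the end but does not force it.
  void requestShutdown() {
    shutdown = true;
  }

  // Waits up to the given time; returns true if the thread has expired.
  boolean awaitExit(long millis) {
    try {
      thread.join(millis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return !thread.isAlive();
  }
}
```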

Conclusion

To summarize:

  • Memory vs. Non-Memory
    The Deterministic shutdown scheme tries to guarantee the release of both memory and non-memory resources. The Fuzzy shutdown scheme guarantees the deterministic release of non-memory resources and counts on this to cause clients to release the memory resources as well. We can consider the services used by the bundle as a special case of non-memory resource because these are indeed resources external to the bundle that require explicit release.
  • Importer vs. Exporter
    The Deterministic shutdown scheme causes complexity in the importing code in order to avoid extra invalidation code in the exporter. The Fuzzy scheme embraces the invalidation code as a necessary consequence of being consistent at all times.

Which shutdown scheme is better? It depends on the cost of the invalidation code that needs to be added to the exporter. Empirical data shows that most services are designed for concurrent access anyway, so adding an additional close()/dispose() method does not cost too much. This cost can even drop to zero if the service is stateless - just detach it from the activator and let it operate in "bypass mode" until dropped from all importers.

In my initial article on Lazy dynamics I advocated the deterministic shutdown scheme and was against invalidation code in the exporter. In this article I am inclined to drop the determinism in favor of the traditional fuzzy Java shutdown. The multiple simplified importers and the additional safety seem to justify a bit of extra complexity in the exporter. Notice that this is all about releasing resources, not about making the inherently indeterministic service dynamics predictable - we all know this is a lost cause. Under both shutdown schemes you still have to code for clean error handling.

Finally I would be grateful for any additional arguments for either case. Or maybe you have your own shutdown scheme to propose?

Saturday, June 20, 2009

Current state of affairs

These are my comments on this Caucho blog.

Eclipse - not a good OSGi example

Despite its huge popularity Eclipse is far from being an archetypal example of OSGi usage. The main anti-OSGi feature of Eclipse is their decision not to use the OSGi service model. Instead they provide a complete replacement for OSGi services: the Extension Registry. As far as I know there are two reasons for this:

  • Legacy code: the extension registry was well established when Eclipse moved to OSGi and it would be too costly to toss it out.
  • Bootstrap time: Eclipse is a desktop app. It heavily relies on lazy class loading for improved startup times. The core OSGi service model requires the service implementation to be loaded before a service is exported. A major requirement of Eclipse is the ability to build the UI out of pure metadata and have the code loaded later. This can't be fulfilled using regular OSGi services.

The Eclipse extension registry is much more complex than the real OSGi service model, which is suitable for any app that does not have Eclipse's unique requirements. Also lately various runtimes have emerged on top of the OSGi service layer that allow the users to only load the service API at startup and postpone the loading of the implementation code until needed. Curiously the people from Eclipse seem to have played a significant role in the development of one of these runtimes - OSGi Declarative Services.

The OSGi dilemma

Generally speaking OSGi does not seem to tolerate gradual migration. You either embrace the high-isolation, service oriented model completely or suffer exploding complexity when trying for partial solutions. The problem here is that apparently any non-trivial Java app is organized around its own specialized modularity (sharing classes) and service orientation (sharing instances) engines. This is especially true for infrastructural frameworks such as component runtimes and persistence managers. This transforms the migration to OSGi into a serious rewrite. As Eclipse shows, if you are not satisfied with the OSGi service layer you can roll out your own relatively easily. The modularity however is the killer - you either let OSGi manage all the class loading for you or not use OSGi.

On top of it OSGi is a low level runtime core that merely enables these cool features but does not make it easy to use them. That is in fact a good design because to enable easy usage OSGi would need to specialize in a certain component/application model. The good news is that this modular core can easily be extended with multiple "opinionated" component runtimes. That is exactly what Spring Source are frantically coding right now. The even better news is that all of these component/application runtimes can interoperate transparently through the OSGi core.

Right now OSGi is hovering on the edge of being ready for prime-time. Many of the conveniences an enterprise developer is used to are yet to appear. So we have a double whammy:

  • Gradual migration of critical infrastructure is costly because it is all or nothing: either the best possible re-design (strong modularity, strong componentization) or no migration at all.
  • This raises the entry cost for new applications and hinders the migration of existing ones.

I think this lack of critical mass is the usual dilemma of any game-changing technology.

Join the fun

At the same time the core features of OSGi (strong modularity, service orientation and dynamics) are so compelling that OSGi evolution is rapidly driven by demand. For example, currently there are at least four frameworks that allow the writing of POJO applications on top of the OSGi service layer. Also the major enterprise infrastructures are being overhauled to work on OSGi. This is in fact the major focus of the soon-to-be-released OSGi 4.2 specification. The field is wide-open! There is no more exciting time to work on OSGi based infrastructure than right now when the future is being shaped.

As for the normal application developers who don't want to risk early adoption - they need to wait until the OSGi environment is properly set up for them. But be prepared. OSGi or something like it is inevitable in the near future.