Sunday, June 21, 2009

Shutdown: deterministic vs fuzzy

Since I published my lazy manifesto I have been gathering opinions on the soundness of its various aspects. One great eye-opener came from the good people on the Apache Felix mailing list. It turns out my lazy scheme contains one very deterministic aspect that permeates both the importer and the exporter policies: the desire to have a deterministic bundle shutdown. There is of course another way to shut down a bundle, which for the fun of it I'll call "fuzzy" rather than "indeterministic". Let us now work our way backwards through the bundle lifecycle to discover how these two shutdown schemes lead to subtle but important differences in the importer synchronization schemes.

Deterministic

Under this scheme we want to guarantee that after a call to ServiceRegistration.unregister() returns, there are no references to the service left in the wild. That includes both references from object fields and references from thread stacks. The cleanup of object references is achieved by synchronously calling the ServiceListener registered by each importer. In this way each importer has a chance to flush out all references it has stored in object fields. This, however, is not enough. There might be "in-flight" references stored in local variables. These reside on the stacks of active threads and we have no direct control over them. The only thing to do is to make sure unregister() blocks until all method calls that hold the service in local variables complete. This means the importer must wrap every call to the service in a synchronized block, and that block must use the same lock as the importer's ServiceListener. In this way the unregister() call will enter the importer's listener and block until all calls to the service complete. Note that this means the service unregistration can be postponed indefinitely if business control flow threads keep preempting the management control flow thread, which performs the bundle shutdown.

This is how we arrive at the following importer synchronization scheme:

/* 
 * Service tracking code 
 */
private final Object lock = new Object(); // truly private: a String literal is interned and shared JVM-wide
private HelloService serv;

void set(HelloService serv) {
  synchronized (lock) {
    this.serv = serv;
  }
}

HelloService get() {
  synchronized (lock) {
    if (serv == null) {
      throw new ServiceUnavailableException();
    }
    return serv;
  }
}

...

/* 
 * Service consuming code 
 */
synchronized (lock) {
  HelloService serv = get();
  serv.hello("OSGi");
}

One bad thing about this scheme is that we hold a lock while calling into another bundle. Because the lock is our own private object, code we don't control cannot wait on it. Still, there is potential for deadlocks involving other threads calling into our bundle. Another drawback is that this scheme kills the potential for concurrency inside the importer: any threads that pass through the bundle have to line up for the private lock when they try to use the service. I have used a read-write lock to alleviate this problem and have even profiled it with the Peaberry benchmark. It turns out this is likely not prohibitively expensive, but to be sure one must do a huge profiling job (different OSes, different JVMs, different CPUs). So it is worth exploring alternatives that avoid these complications.
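For reference, the read-write lock variant might look roughly like the minimal sketch below. HelloService, ServiceUnavailableException and RwImporter are hypothetical stand-ins defined locally so the example compiles without OSGi. Service calls take the shared read lock, so several threads can call at once; the listener callback set() takes the exclusive write lock, so unregistration still waits until every in-flight call has finished.

```java
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class RwImporter {
    /** Hypothetical imported service. */
    interface HelloService {
        String hello(String name);
    }

    /** Hypothetical unchecked exception signalling that the service is gone. */
    static class ServiceUnavailableException extends RuntimeException {
        ServiceUnavailableException() { super("HelloService unavailable"); }
    }

    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private HelloService serv; // guarded by lock

    /** Called from the synchronous ServiceListener; blocks until in-flight calls finish. */
    void set(HelloService serv) {
        lock.writeLock().lock();
        try {
            this.serv = serv;
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Business code: many threads may hold the read lock at the same time. */
    String callHello(String name) {
        lock.readLock().lock();
        try {
            if (serv == null) {
                throw new ServiceUnavailableException();
            }
            return serv.hello(name); // the reference never escapes the read-locked region
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

One caveat specific to this lock: ReentrantReadWriteLock cannot upgrade a read lock to a write lock, so a thread that triggers unregistration from inside a service call (the call loop problem of synchronous dispatch) would deadlock on itself. Passing true to the constructor requests fair ordering, which may help bound how long unregistration waits.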

The return on our investment in importer synchronization code is the following bundle shutdown sequence:

  1. Unregister all services
    After this step we are free to tear down the bundle internals as we wish. If any references to our services are left, that is a bug in the importer.
  2. Deactivate
    Stop threads, close sockets, dispose widgets. We don't have to care about synchronization of the code that contains these resources.
  3. Detach
    Null the references to what's left. This cuts the dead bundle internals from the activator and leaves them to the garbage collector.

As exporters we benefit from this scheme because the usual "work well or fail clean" guarantees we have to provide for our services end after the first step. If an importer keeps calling our services after step one, it can experience random buggy behavior, such as calls appearing to succeed but returning bad values. Even worse, since our service code is not supposed to be called at this time, we can end up with open files, unclosed sockets or other cleanup bugs.

The final remark I have on this shutdown scheme is that it heavily relies on the synchronous service event dispatch. I was quite surprised to discover I actually required this! I generally think the synchronous dispatch is dangerous precisely because it allows importers to block the management control flow. Also this creates the potential for unexpected call loops when the listener code finds a way to call back into the bundle that unregisters the service. Both of these can cause really nasty bugs.

Fuzzy

Under this scheme after ServiceRegistration.unregister() returns it is only guaranteed that references from object fields are flushed. References from thread stacks can remain. We can have less synchronization in the importer:

...
/* 
 * Service Consumer code 
 */
HelloService serv = get();
serv.hello("OSGi");
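Filled out, a fuzzy importer could look like this minimal sketch (FuzzyImporter, HelloService and ServiceUnavailableException are again hypothetical). The tracking code still guards the field with the private lock, but the consumer holds the lock only long enough to copy the reference into a local variable. The call itself happens outside the lock, so a concurrent set(null) does not wait for it and the in-flight call completes against the stale service.

```java
class FuzzyImporter {
    /** Hypothetical imported service. */
    interface HelloService {
        String hello(String name);
    }

    /** Hypothetical "cache miss" signal. */
    static class ServiceUnavailableException extends RuntimeException {
        ServiceUnavailableException() { super("HelloService unavailable"); }
    }

    private final Object lock = new Object(); // private lock, never exposed
    private HelloService serv;                // guarded by lock

    /** Tracking code: called by the ServiceListener; never waits for service calls. */
    void set(HelloService serv) {
        synchronized (lock) {
            this.serv = serv;
        }
    }

    /** Tracking code: returns the current service or fails fast. */
    HelloService get() {
        synchronized (lock) {
            if (serv == null) {
                throw new ServiceUnavailableException();
            }
            return serv;
        }
    }

    /** Consuming code: no lock is held while calling into the other bundle. */
    String greet(String name) {
        HelloService s = get(); // the in-flight reference lives only on this stack
        return s.hello(name);
    }
}
```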

Now we have removed the problem of holding locks while calling out to unknown code. Also, the importer can be as concurrent as the service implementation allows. Fewer obligations for the importer translate into more obligations for the exporter. The service is now never permitted to exhibit random behavior: it must keep its "work well or fail clean" contract forever. I.e. the service must be properly composed before it is exported and behave consistently until it is garbage collected. The service unregistration sits somewhere between these two points and marks the beginning of a gradual (i.e. fuzzy) decline in service usage, as late importers try to call and crash.
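On the exporter side, "fail clean forever" can be approximated by checking a closed flag on every call. The following is a minimal sketch, assuming a hypothetical HelloServiceImpl whose close() is invoked in the Deactivate step; the list field merely stands in for a real non-memory resource such as a socket. After close(), every call, including calls through stale references held by late importers, throws IllegalStateException instead of touching released resources.

```java
import java.util.ArrayList;
import java.util.List;

class HelloServiceImpl {
    private final Object lock = new Object();
    private boolean closed;                             // guarded by lock
    private final List<String> log = new ArrayList<>(); // stand-in for a non-memory resource

    String hello(String name) {
        synchronized (lock) {
            if (closed) {
                // Fail clean: a late importer gets a clear exception, not a bad value.
                throw new IllegalStateException("HelloService has been shut down");
            }
            log.add(name); // uses the resource only while it is known to be open
            return "Hello " + name;
        }
    }

    /** Deactivate step: safe to call from the management thread while importers still call in. */
    void close() {
        synchronized (lock) {
            closed = true;
            log.clear(); // release the resource; idempotent if called twice
        }
    }
}
```

A plain private lock serializes calls here for simplicity; a read-write lock (calls on the read side, close() on the write side) would let calls run concurrently if the underlying resource allows it.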

Besides simplifying the importing code, this scheme has the additional benefit of handling buggy importers that keep calling a stale service indefinitely. Under fuzzy shutdown these importers will at least crash cleanly rather than also cause trouble in the exporting bundle.

The Fuzzy shutdown sequence remains the same but with different meaning attached to each step:

  1. Unregister all services
    This is a bit redundant because all services will be unregistered by the OSGi framework after the shutdown completes. It still feels like good style to announce the imminent destruction before we close shop. This also caters to importers who choose to follow the deterministic shutdown import scheme for reasons of their own.
  2. Deactivate
    Stop threads, close sockets, dispose widgets. Here we do have to care about synchronization, because legitimate importers can still call in. This means that all OSGi services have to handle concurrent access, either because of the application design or because they can be shut down from another thread.
  3. Detach
    Null the references that attach the bundle internals to the activator. What we leave behind is a ball of fail-fast objects that lingers until all local variable references to the service drain away.

Notice that under fuzzy shutdown the sum of the tracking code inside each importer forms a kind of service cache. Each pair of private lock and service storage field forms one cell of this cache. The application code then pulls objects out of the cache every time it needs a service. When the service is gone we have a cache miss, i.e. a ServiceUnavailableException. The Peaberry framework makes this thread-safe cache explicit. Under the covers, services are pulled into the cache upon a method call and linger inside for several minutes. The only exceptions are services with sticky decorators. These are typically stateful services and must be kept around until the state loaded inside them is used. This scheme is better than using a bunch of ServiceTrackers because each tracker holds its service for the entire bundle lifetime, even if the service was used only once to initialize something.
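To make the "cache cell" idea concrete, here is a hypothetical sketch of a single cell. It is not Peaberry's actual API. On each call it pulls the service from a lookup function (standing in for a registry query), keeps it for a linger period, and drops the reference once that period elapses without use, so a service used once is not pinned for the whole bundle lifetime. The Supplier lookup, the linger parameter and the injectable clock are assumptions made to keep the example self-contained and testable.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** One cell of a lazily populated service cache (hypothetical sketch). */
class ServiceCell<T> {
    private final Object lock = new Object();
    private final Supplier<T> lookup;  // stands in for a registry lookup; returns null if absent
    private final long lingerNanos;    // how long an unused service stays cached
    private final LongSupplier clock;  // e.g. System::nanoTime; injectable for tests
    private T cached;                  // guarded by lock
    private long lastUse;              // guarded by lock

    ServiceCell(Supplier<T> lookup, long lingerNanos, LongSupplier clock) {
        this.lookup = lookup;
        this.lingerNanos = lingerNanos;
        this.clock = clock;
    }

    /** Returns the cached service, pulling it in on a miss. */
    T get() {
        synchronized (lock) {
            if (cached == null) {
                cached = lookup.get();
                if (cached == null) {
                    throw new IllegalStateException("service unavailable"); // cache miss
                }
            }
            lastUse = clock.getAsLong();
            return cached;
        }
    }

    /** Called periodically (e.g. by a timer): drops a service idle longer than the linger period. */
    void expire() {
        synchronized (lock) {
            if (cached != null && clock.getAsLong() - lastUse >= lingerNanos) {
                cached = null;
            }
        }
    }

    /** Called by the service listener when the service is unregistered. */
    void invalidate() {
        synchronized (lock) {
            cached = null;
        }
    }
}
```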

Java is traditionally fuzzy

There are other precedents in Java for fuzzy shutdown. Consider the way threads are stopped: raise a "shutdown" flag, close your resources in a thread-safe manner and let the thread expire from natural causes. In our case calling unregister() is equivalent to raising the boolean flag. Closing resources covers both the disposal of non-memory resources and the discontinuation of the imports used by our service: both will cause exceptions in calling importers. Generally it seems to be the Java way to have a deterministic startup and a fuzzy shutdown. We can summarize this in one sentence:

Be consistent at all times.
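The thread-stopping idiom mentioned above is ordinary cooperative cancellation: a volatile flag that the worker polls between units of work. A minimal sketch follows (Worker and its members are illustrative names). requestStop() plays the role of unregister(): it announces the shutdown and returns immediately, without waiting, and the thread finishes its current iteration and exits on its own.

```java
class Worker implements Runnable {
    private volatile boolean running = true; // the "shutdown" flag
    private volatile int iterations;         // work done so far; written only by the worker thread

    @Override
    public void run() {
        while (running) {
            iterations++;   // one unit of work (single writer, so the non-atomic ++ is safe)
            Thread.yield();
        }
        // Release thread-owned resources here, then fall through:
        // the thread expires "from natural causes".
    }

    /** Analogous to unregister(): raise the flag and return without waiting. */
    void requestStop() {
        running = false;
    }

    int iterations() {
        return iterations;
    }
}
```

A thread blocked in sleep() or wait() would additionally need Thread.interrupt() to notice the flag promptly.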

Typically in Java we start application subsystems in a way that guarantees our objects are exposed to application control flow only after being fully set up: wire together a set of objects, configure them, and only then call start()/open() methods or, in OSGi's case, register the service. When the time comes to stop, we remain consistent right up to the point where the garbage collector frees the memory. As we know, this happens when truly no one uses the object any more. To precipitate the release of memory, as a final act we trigger the deterministic release of non-memory resources via close()/dispose() calls. Then we let nature take its course.

Conclusion

To summarize:

  • Memory vs. Non-Memory
    The Deterministic shutdown scheme tries to guarantee the release of both memory and non-memory resources. The Fuzzy shutdown scheme guarantees the deterministic release of non-memory resources and counts on this to cause clients to release the memory resources as well. We can consider the services used by the bundle as a special case of non-memory resource because these are indeed resources external to the bundle that require explicit release.
  • Importer vs. Exporter
    The Deterministic shutdown scheme causes complexity in the importing code in order to avoid extra invalidation code in the exporter. The Fuzzy scheme embraces the invalidation code as a necessary consequence of being consistent at all times.

Which shutdown scheme is better? It depends on the cost of the invalidation code that needs to be added to the exporter. In practice most services are designed for concurrent access anyway, so adding a close()/dispose() method does not cost much. This cost can even drop to zero if the service is stateless: just detach it from the activator and let it operate in "bypass mode" until it is dropped by all importers.

In my initial article on Lazy dynamics I advocated the deterministic shutdown scheme and argued against invalidation code in the exporter. In this article I am inclined to drop the determinism in favor of the traditional fuzzy Java shutdown. The many simplified importers and the additional safety seem to justify a bit of extra complexity in the exporter. Notice that this is all about releasing resources, not about making the inherently indeterministic service dynamics predictable; we all know that is a lost cause. Under both shutdown schemes you still have to code for clean error handling.

Finally I would be grateful for any additional arguments for either case. Or maybe you have your own shutdown scheme to propose?
