Wednesday, August 19, 2009

Stateful vs Stateless

In this entry I will cautiously venture into the thorny territory of stateless-vs-stateful. For quite some time I have been frying my brain in attempts to introduce some structure into this space. The goal is to define patterns and practices for working with state under OSGi. Under OSGi bundles generally communicate through services, so I will do my musings from this perspective.

Hotswapping and the Service Use Spike

"Why not simply use state as we like?" you might ask. As is common with OSGi the short answer is "Because of dynamics!". The longer answer is that under OSGi a bundle can be updated at runtime and this will kill any state accumulated inside the bundle. Ideally we want client bundles to perceive this event as a hotswap of all services the updated bundle exports. I will define a service as being capable of hotswap as

The ability for callers to swap the live service object and experience a timing hiccup as the only side effect.

Using services under OSGi always requires us to go through the same steps:

  1. get
    Service object is obtained.
  2. use
    Service object is called one or more times.
  3. release
    Service object is released.

To reflect the peak of activity at the use step I decided to call this sequence the service use spike. The spike "rises" in the get step and "falls" in the release step. You can imagine these spikes drawn on a time axis as the pulse of some odd creature. Obviously a service can be hotswapped safely between the use spikes. The event of a bundle update can be represented as a vertical line cutting through the time axis. Because an update can happen at any time, it will inevitably cut through some use spikes. The type of service determines how the bundles that own the interrupted spikes deal with the failure.

Stateless Services

These are the "classic" services. The result of a call to a stateless service depends only on the passed parameters - i.e. it is a direct function of the parameters. Note that there is no problem with the results including side effects such as database modifications. The single and rather big constraint that makes a service stateless is...

A service qualifies as stateless if it does not store unrecoverable data in the objects comprising its implementation.

Here unrecoverable basically means data kept in memory rather than externalized into some data store. I.e. once the service is unregistered the data is lost to all clients. This definition permits stateless services to build up state as long as they remain indistinguishable from a service that calculates everything at every method call. I.e. we permit intermediate computation caching to improve performance.

Logically the use spike of stateless services spans a single method call. Therefore to retry a use spike we need only the information about this call. In most cases this contextual information is available on the stack of the current thread. E.g. the thread is at a point where it tries to call out to the service and discovers it is not available. It is then free to retry the call against an alternative service or, if none is present, wait until one appears. This type of behavior is even supported by the OSGi standard service tracking utility.
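
As a sketch, here is what tracker-based retry can look like. HelloService and ServiceUnavailableException are made-up names; ServiceTracker and its waitForService() method come from the standard org.osgi.util.tracker package:

import org.osgi.framework.BundleContext;
import org.osgi.util.tracker.ServiceTracker;

public class HelloClient {
  private final ServiceTracker tracker;

  public HelloClient(BundleContext bc) {
    /* Track all HelloService registrations in the container */
    tracker = new ServiceTracker(bc, HelloService.class.getName(), null);
    tracker.open();
  }

  public void sayHello() throws InterruptedException {
    /* Serve the get step: wait up to 5 seconds for a provider to appear */
    HelloService serv = (HelloService) tracker.waitForService(5000);
    if (serv == null) {
      throw new ServiceUnavailableException();
    }
    /* The use spike spans just this one call */
    serv.hello("OSGi");
  }
}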

For performance reasons the use spike of stateless services can be extended to span multiple method calls. This can be done by serving the get step from a cache that in turn obtains the services from the dynamic environment and holds on to them for a while. The release step is then postponed and the use spike is artificially extended. The net result: we do far fewer gets and releases than uses, and performance goes up.

Stateless services are used simultaneously by multiple clients and virtually always need to deal with issues of concurrent access.

Stateful Services

A result from a call to a stateful service depends on previous calls. Here two subsequent calls with identical parameters can yield different results. You can think of these as services with history, where every call mutates some internal state and thus influences the outcome of subsequent calls. Because the correctness of our program depends on the accumulated history we care dearly about calling the exact same object every time. This means that hotswapping stateful services is in the general case impossible.

Obviously the use spike of stateful services spans many calls, because the user must hold on to a service object as state is accumulated with every subsequent call. If the spike is interrupted at the N-th call, to do a retry we need to play back all (N-1) calls leading up to the failure. It is impractical or downright impossible to keep such a call log. For this reason we cannot hotswap stateful services. I.e. an update to a stateful service cannot go unnoticed. We treat the disappearance of a stateful service as a catastrophic event: we throw an exception, unroll the stack up to a fault barrier, clean up any private state we have associated with the stateful service, and perform contingency reactions.

Stateful services can serve only one client at a time and often are accessed sequentially by that client. So we can expect fewer concurrency issues here.

Because each client needs its own copy of the service we need a factory mechanism via which clients can produce new instances. The default OSGi factory mechanism is too restrictive. It caches service instances on a per-bundle basis and does not allow parameters to be passed at construction time. For this reason stateful services often come with a supporting stateless factory service. This service has one or more factory methods that clients call to obtain the "real" stateful service objects. This approach has the drawback that the OSGi container no longer has visibility over these secondary services. As a consequence we count on the clients not to forget to call a close()/dispose() method on the stateful object when they are done with it. OSGi definitely needs an improved factory mechanism if it is to support stateful services as equal citizens. It must support construction parameters and relinquish the caching policy into the hands of the exporting bundle. This is a theme for a future post.
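
A minimal sketch of the factory service pattern, with hypothetical Connection/ConnectionFactory names:

/* Stateless factory service - registered with the OSGi service registry */
public interface ConnectionFactory {
  /* Construction parameters the default OSGi mechanism cannot accept */
  Connection open(String host, int port);
}

/* Stateful object handed out by the factory - invisible to the container */
public interface Connection {
  void send(byte[] data);
  byte[] receive();

  /* The client must not forget to call this when done */
  void close();
}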

Once we have state we must worry about mutating it in a correct sequence. The beginning and the end of this sequence become particularly interesting when part of the state is stored in an external service. Under OSGi dynamics we cannot know when the external service will come and go. So how can we keep our half of the state in lockstep with the dynamic half? This is how we arrive at the notion of a lifecycle and at the problem of how and when to synchronize the lifecycles of dependent "lumps of state". In short:

As soon as we add even one bit of mutable state (e.g. an availability flag) to our service, we also need to add lifecycle to manage that bit.

A service interface plus a lifecycle is one common definition of a component. And there you have it! Managing stateful services leads to the need for a strong component model. I will analyze the further repercussions of this in future posts.

Conclusion

State is a fact of life in a general purpose programming setting. It is simply not convenient or practical to always externalize the state out of our service to make it stateless. One example is a bundle that implements a network stack. The clients of this stack need to use Connection objects directly. It would be an atrocity to force them to hold on to a ConnectionData object and pass it to a stateless ConnectionService every time they want to read or write.

Herein lies the dilemma of state: an explicit separation between code (locked in stateless services) and data (locked in DTOs) permits a scalable, composable, hotswappable design - all nice features of a functional-like programming style. At the same time this flies in the face of conventional OO design. This stark switch in style between design in the small (inside the bundle) and design in the large (composing bundles into systems) can be a source of more problems than just going ahead and using state where it makes sense.

In this post I tried to identify the problems caused by state. The idea was to invent some reference framework to think about this stuff so I can get to real solutions in later posts. If OSGi is to become the universal JVM middleware it aspires to be we need clear practices and frameworks to deal with state. Even if the ultimate answer turns out to be "Just don't use stateful services!" the exploration leading up to that is worth it.

Friday, August 14, 2009

Classload acrobatics: code generation under OSGi

In a previous blog I mentioned that the hardest problem we face when porting existing Java infrastructure to OSGi has to do with class loading. This blog is dedicated to the AOP wrappers, ORM mappers and similar code generation engines that face the harshest issues in this area. I will gradually introduce the main problem, present the best current solution, and develop a tiny bit of code that implements it. This blog comes with a working demo project that contains not only the code presented here, but also two ASM-based code generators you can play with.

Classload site conversion

Usually porting a Java framework to OSGi requires it to be refactored to the extender pattern. This pattern allows the framework to delegate all class loading to OSGi and at the same time retain control over the lifecycle of application code. The goal of the conversion is to replace things like

Class appClass = Class.forName("com.acme.devices.SinisterEngine");
...
ClassLoader appLoader = ...
Class appClass = appLoader.loadClass("com.acme.devices.SinisterEngine");

with

Bundle appBundle = ...
Class appClass = appBundle.loadClass("com.acme.devices.SinisterEngine");

Although we must do a non-trivial piece of work to get OSGi to load the application code for us, we at least have a nice and correct way to get things working. And work they will, even better than before, because now the user can add/remove applications just by installing/uninstalling bundles into the OSGi container. Also the user can break up an application into as many bundles as they wish, share libraries between applications, and all that sweet modular stuff.

Adapter ClassLoader

Sometimes the code we convert has externalized its class loading policy. This means the classes and methods of the framework take explicit ClassLoader parameters, allowing us to dictate where they load application code from. In this case the conversion to OSGi can become a mere question of adapting a Bundle object to the ClassLoader API. This is done by what I call an adapter ClassLoader.

public class BundleClassLoader extends ClassLoader {
  private final Bundle delegate;

  public BundleClassLoader(Bundle delegate) {
    this.delegate = delegate;
  }

  @Override
  public Class<?> loadClass(String name) throws ClassNotFoundException {
    return delegate.loadClass(name);
  }
}

Now we can pass this adapter to the framework code. We can also add bundle tracking code to create the adapters as new bundles come and go. I.e. we are able to adapt a Java framework to OSGi "externally", avoiding the exhausting browse through the codebase and the conversion of each individual classload site. Here is a highly schematic sample of some code that converts a framework to use OSGi class loading:

...
Bundle app = ...
BundleClassLoader appLoader = new BundleClassLoader(app);

DeviceSimulationFramework simfw = ...
simfw.simulate("com.acme.devices.SinisterEngine", appLoader);
...

Bridge ClassLoader

The coolest Java frameworks do fancy classworking on client code at runtime. The goal usually is to dynamically build classes out of stuff living in the application class space. Some examples are service proxies, AOP wrappers, and ORM mappers. Let's call these generated classes enhancements. Usually the enhancement implements some application-visible interface or extends an application-visible class. Sometimes additional interfaces and their implementations are mixed in as well.

Enhancements augment application code. I.e. the generated objects are meant to be called directly by the application. For example a service proxy is passed to business code to free it from the need to track a dynamic service. Similarly a wrapper that adds some AOP feature is passed to application code in place of the original object.

Enhancements start life as byte[] blocks produced by your favorite class engineering library (ASM, BCEL, CGLIB, ...). Once we have generated our class we must turn the raw bytes into a Class object. I.e. we must make some ClassLoader call its defineClass() method on our bytes. We have three separate problems to solve:

  • Class space completeness
    First we must determine the class space, into which we can define our enhancements. It must "see" enough classes to allow the enhancements to be fully linked.
  • Visibility
    ClassLoader.defineClass() is a protected method. We must find a good way to call it.
  • Class space consistency
    Enhancements mix classes from the extender and the application bundles in a way that is "invisible" to the OSGi container. As a result the enhancements can potentially be exposed to incompatible versions of the same class.

Class space completeness

Enhancements are backed by code private to the Java framework that generates them. Therefore the extender should introduce the new class into its own class space. On the other hand the enhancements implement interfaces or extend classes visible in the application class space. Therefore we should define the enhancement class there. Bummer!

Because there is no class space that sees all the classes we require, we have no other option but to make a new class space. A class space equals a ClassLoader instance, so our first job is to maintain one dedicated ClassLoader on top of every application bundle. These are called bridge ClassLoaders, because they merge two class loaders by chaining them like so:

public class BridgeClassLoader extends ClassLoader {
  private final ClassLoader secondary;

  public BridgeClassLoader(ClassLoader primary, ClassLoader secondary) {
    super(primary);
    this.secondary = secondary;
  }

  @Override
  protected Class<?> findClass(String name) throws ClassNotFoundException {
    return secondary.loadClass(name);
  }
}

Now we can use the BundleClassLoader developed earlier:

  /* Application space */
  Bundle app = ...
  ClassLoader appSpace = new BundleClassLoader(app);

  /*
   * Extender space
   *
   * We assume this code is executed in a non-static method inside the extender
   */
  ClassLoader extSpace = getClass().getClassLoader();

  /* Bridge */
  ClassLoader bridge = new BridgeClassLoader(appSpace, extSpace);

This loader will serve requests first from the application space and, if that fails, try the extender space. Notice that we still let OSGi do lots of heavy lifting for us. When we delegate to either class space we are in fact delegating to an OSGi-backed ClassLoader. I.e. the primary and secondary loaders can delegate to other bundle loaders in accordance with the import/export metadata of their respective bundles.

At this point we might be pleased with ourselves - I was for quite some time. The bitter truth however is that the extender and application class spaces combined may not be enough. Everything hinges on the particular way the JVM links classes (also known as resolving classes).

In brief
JVM resolution works on a fine-grained, sub-class level.

In detail
When the JVM links a class it does not need the complete descriptions of all classes referenced from the linked class. It only needs information about the individual methods, fields and types that are really used by the linked class. What to our intuition is a monolithic whole is to the JVM a class name, plus a superclass, plus a set of implemented interfaces, plus a set of method signatures, plus a set of field signatures. All these symbols are resolved independently and lazily. For example to link a method call the class space of the caller needs to supply Class objects only for the target class and for all types used in the method signature. Definitions for the numerous other things that the target class may contain are not needed and the ClassLoader of the calling class will never receive a request for them.

Formally
Class TA from class space SpaceA must be represented by the same Class object in class space SpaceB if and only if:

  • There exists a class TB from SpaceB that refers to TA from its symbol table (known also as the constant pool).
  • The OSGi container has chosen SpaceA as the provider of class TA for SpaceB.

By example
Imagine we have a bundle BndA that exports a class A. Class A has 3 methods distributed between 3 interfaces: IX.methodX(String), IY.methodY(String), IZ.methodZ(String). Imagine further we have a bundle BndB that has a class B. Somewhere in class B there is a reference A a = ... and a method call a.methodY("hello!"). To get class B to resolve we need to introduce into the class space of BndB class A, and class String. That's all! We don't need to import IX or IZ. We don't need to import even IY because class B does not use it - it uses only A. On the other hand when the exporting bundle BndA resolves class A it must supply IX, IY, IZ because they are directly referenced as interfaces implemented by class A. Finally even BndA does not have to supply any of the super-interfaces of IX, IY, IZ because they are not directly referenced from A.
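
In code the example looks roughly like this (all names as introduced above):

/* In BndA: class A directly references IX, IY and IZ, so when BndA
   resolves A it must supply all three interfaces */
public class A implements IX, IY, IZ {
  public void methodX(String s) { /* ... */ }
  public void methodY(String s) { /* ... */ }
  public void methodZ(String s) { /* ... */ }
}

/* In BndB: class B references only A and String from its constant pool */
public class B {
  void greet(A a) {
    /* Linking this call requires Class objects for A and String only -
       BndB never needs to import IX, IY or IZ */
    a.methodY("hello!");
  }
}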

Now let's imagine we want to present to class B an enhanced version of class A. The enhancement needs to extend class A and override some or all of its methods. Because of that the enhancement needs to see the classes used in the signatures of all overridden methods. To supply all required classes BndB must contain code that calls each method we mean to override. Otherwise it will have no reason to import the required classes. It is very likely however that BndB calls only a few of A's methods. Therefore BndB likely does not see enough classes to support the enhancement. The complete set can only be supplied by BndA. Double Bummer!

It turns out that we must bridge not the extender and application spaces but the extender space and the space of the enhanced class. I.e. rather than "bridge per application space" we must shift to "bridge per enhanced space". I.e. the application really requires us to bridge the class space of some third-party class it can see because its bundle imports it. How do we make that transitive leap from the application space to the space the application uses? Simple! As we know, every Class object can tell us which class space it is fully defined in. For example, all we need to do to get the defining class loader of A is call A.class.getClassLoader(). In many cases however we have a String name rather than a Class object, so how do we get A.class to begin with? Simple again! We ask the application bundle to give us the exact Class object it sees under the name "A". This is a critical step because we need the enhanced and original classes to be interchangeable within the application. Out of the potentially many available versions of class A we need to pick the class space of the one used by the application. Here is a schematic of how an extender can maintain a cache of class loader bridges:

...
/* Ask the app to resolve the target class */
Bundle app = ...
Class target = app.loadClass("com.acme.devices.SinisterEngine");

/* Get the defining class loader of the target */
ClassLoader targetSpace = target.getClassLoader();

/* Get the bridge for the class space of the target */
BridgeClassLoaderCache cache = ...
ClassLoader bridge = cache.resolveBridge(targetSpace);

where the bridge cache would look something like

public class BridgeClassLoaderCache {
  private final ClassLoader primary;
  private final Map<ClassLoader, WeakReference<ClassLoader>> cache;

  public BridgeClassLoaderCache(ClassLoader primary) {
    this.primary = primary;
    this.cache = new WeakHashMap<ClassLoader, WeakReference<ClassLoader>>();
  }

  public synchronized ClassLoader resolveBridge(ClassLoader secondary) {
    ClassLoader bridge = null;

    WeakReference<ClassLoader> ref = cache.get(secondary);
    if (ref != null) {
      bridge = ref.get();
    }

    if (bridge == null) {
      bridge = new BridgeClassLoader(primary, secondary);
      cache.put(secondary, new WeakReference<ClassLoader>(bridge));
    }

    return bridge;
  }
}

To prevent memory leaks due to ClassLoader retention I had to use both weak keys and weak values. The goal is to not retain the class space of an uninstalled bundle in memory. I had to use weak values because the value of each map entry strongly references the key, thus negating its weakness. This is the standard advice prescribed by the WeakHashMap javadoc. By using a weak cache I avoid the need to track a whole lot of bundles and react eagerly to their lifecycles.

Visibility

Okay, we finally have our exotic bridge class space. Now how do we define our enhancements in it? The problem, as I mentioned, is that defineClass() is a protected method of BridgeClassLoader. We could override it with a public method, but that would be rude and we would have to code our own checks to see if the requested enhancement has already been defined. Normally defineClass() is called from findClass() when it determines it can supply the requested class from a binary source. The only information findClass() can rely on to make this decision is the name of the class. So our BridgeClassLoader must think to itself:

This is a request for "A$Enhanced" so I must call the enhancement generator for class "A"! Then I call defineClass() on the produced byte[]. Then I return the new Class object.

There are two remarkable things about that statement.

  • We introduced a text protocol for the names of enhancement classes.
    We can pass to our ClassLoader only a single item of data - a String with the name of the requested class. At the same time we need to pass two items of data - the name of the original class and a flag marking it as subject to enhancement. We pack these two items into a single string of the form [name of target class]$Enhanced. Now findClass() can look for the enhancement marker $Enhanced and, when it is present, extract the name of the target class. In this way we also introduce a convention for the names of our enhancements. Whenever we see a class name ending in $Enhanced in a stack trace we know this is a dynamically generated class. To mitigate the risk of name clashes with normal classes we make the enhancement marker as exotic as Java allows (e.g. $__service_proxy__).
  • Enhancements are generated on demand.
    We will never try to generate an enhancement twice. The loadClass() method we inherited will first call findLoadedClass(); if that fails it will call parent.loadClass(), and only if that fails will it call findClass(). The fact that we use a strict protocol for the names guarantees findLoadedClass() will work the second time we get a request to enhance the same class. Couple this with the caching of bridge ClassLoaders and we get a pretty efficient solution where at no point do we bridge the same bundle space twice or generate redundant enhancement classes.

Here we must also mention the option to call defineClass() through reflection. This approach is used by cglib. I suppose this is a viable option when we want the user to pass us a ready-for-use ClassLoader. By using reflection we avoid the need to create yet another loader on top of it just so we can access its defineClass() method.
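
A sketch of the reflective variant - roughly the idea cglib uses, minus the security handling a real implementation needs:

import java.lang.reflect.Method;

public final class ReflectiveDefiner {
  public static Class<?> define(ClassLoader loader, String name, byte[] raw)
      throws Exception {
    /* defineClass() is protected, so we force access reflectively */
    Method defineClass = ClassLoader.class.getDeclaredMethod(
        "defineClass", String.class, byte[].class, int.class, int.class);
    defineClass.setAccessible(true);
    return (Class<?>) defineClass.invoke(loader, name, raw, 0, raw.length);
  }
}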

Class space consistency

At the end of the day what we have done is merge two class spaces that were not explicitly connected through the OSGi modular layer. Also we introduced a search order between those spaces, similar to the search order of the evil Java class path. I.e. we have potentially eroded the class space consistency of the OSGi container. Here is a scenario of how bad things can happen:

  1. Extender uses package com.acme.devices and requires exactly version 1.0.
  2. Application uses package com.acme.devices and requires exactly version 2.0.
  3. Class A refers directly to com.acme.devices.SinisterDevice.
  4. Class A$Enhanced refers directly to com.acme.devices.SinisterDevice from its internal implementation.
  5. Because we search the application space first, A$Enhanced will be linked against com.acme.devices.SinisterDevice version 2.0, while its internal code was compiled against com.acme.devices.SinisterDevice version 1.0.

As a result the application will see mysterious LinkageErrors and/or ClassCastExceptions. Triple Bummer!

Alas there does not yet exist an automated way to handle this problem. We must simply make sure the enhancement code refers directly only to "very private" implementation classes that are not likely to be used by anyone else. We can even build private adapters for any external APIs we might want to use and then refer to those from the enhancement code. Once we have a well-defined implementation subspace we can use that knowledge to limit the class leakage. We now delegate to the extender requests only for the special subset of private implementation classes. This will also limit the search order problem, allowing us to switch between application-first and extender-first search. One good policy to keep things under control is to have a dedicated package for all enhancement implementations. Then we only check for classes whose name begins with that package. Finally we sometimes need to judiciously relax this isolation policy for certain singleton packages like org.osgi.framework. I.e. we can feel pretty safe compiling our enhancement code directly against org.osgi.framework because at runtime everyone in the OSGi container will see the same org.osgi.framework - it is supplied by the OSGi core.
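
A sketch of such an isolation policy, assuming a made-up dedicated package com.acme.enhancer.internal for the private implementation classes:

public boolean isInternal(String className) {
  /* Private implementation classes always come from the extender */
  if (className.startsWith("com.acme.enhancer.internal.")) {
    return true;
  }
  /* Relaxation for a singleton package: everyone in the container
     sees the same org.osgi.framework at runtime */
  return className.startsWith("org.osgi.framework.");
}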

Putting it all together

Everything from this class loading saga can be distilled into the following ~100 lines of code.

public class Enhancer {
  private final ClassLoader privateSpace;
  private final Namer namer;
  private final Generator generator;
  private final Map<ClassLoader , WeakReference<ClassLoader>> cache;

  public Enhancer(ClassLoader privateSpace, Namer namer, Generator generator) {
    this.privateSpace = privateSpace;
    this.namer = namer;
    this.generator = generator;
    this.cache = new WeakHashMap<ClassLoader , WeakReference<ClassLoader>>();
  }

  @SuppressWarnings("unchecked")
  public <T> Class<T> enhance(Class<T> target) throws ClassNotFoundException {
    ClassLoader context = resolveBridge(target.getClassLoader());
    String name = namer.map(target.getName());
    return (Class<T>) context.loadClass(name);
  }

  private synchronized ClassLoader resolveBridge(ClassLoader targetSpace) {
    ClassLoader bridge = null;

    WeakReference<ClassLoader> ref = cache.get(targetSpace);
    if (ref != null) {
      bridge = ref.get();
    }

    if (bridge == null) {
      bridge = makeBridge(targetSpace);
      cache.put(targetSpace, new WeakReference<ClassLoader>(bridge));
    }

    return bridge;
  }

  private ClassLoader makeBridge(ClassLoader targetSpace) {
    /* Use the target space as a parent to be searched first */ 
    return new ClassLoader(targetSpace) {
      @Override
      protected Class<?> findClass(String name) throws ClassNotFoundException {
        /* Is this used privately by the enhancements? */
        if (generator.isInternal(name)) {
          return privateSpace.loadClass(name);
        }

        /* Is this a request for enhancement? */
        String unpacked = namer.unmap(name);
        if (unpacked != null) {
          byte[] raw = generator.generate(unpacked, name, this);
          return defineClass(name, raw, 0, raw.length);
        }

        /* Ask someone else */
        throw new ClassNotFoundException(name);
      }
    };
  }
}

public interface Namer {
  /** Map a target class name to an enhancement class name. */
  String map(String targetClassName);

  /** Try to extract a target class name or return null. */
  String unmap(String className);
}

public interface Generator {
  /** Test if this is a private implementation class. */
  boolean isInternal(String className);

  /** Generate enhancement bytes */
  byte[] generate(String inputClassName, String outputClassName, ClassLoader context);
}

Enhancer captures only the bridging pattern. I have externalized the code generation logic into a pluggable Generator. The generator receives a context ClassLoader from where it can pull classes and use reflection on them to drive the code generation. The text protocol for the enhancement class names is also pluggable via the Namer interface. Here is a final schematic of how such an enhancement framework can be used:

...
/* Setup the Enhancer on top of the current class space */
ClassLoader privateSpace = getClass().getClassLoader();
Namer namer = ...;
Generator generator = ...;
Enhancer enhancer = new Enhancer(privateSpace, namer, generator);
...

/* Enhance some class the app sees */
Bundle app = ...
Class target = app.loadClass("com.acme.devices.SinisterEngine");
Class<?> enhanced = enhancer.enhance(target);
...

The Enhancer framework presented above is more than pseudocode. In fact during the research for this blog I really built it and tested it with two separate code generators mixed in the same OSGi container. The result was too fun to keep for myself so I put it up on Google Code for everyone to play:

Enhancer

Those interested in the class generation process itself can examine the two demo ASM-based generators. Those who read the InfoQ article on service dynamics may notice that the proxy generator uses as private implementation the ServiceHolder code presented there. I try to put my code where my mouth is.

Conclusion

The classload acrobatics presented here are used in a number of infrastructural frameworks under OSGi. Classload bridging is used by Guice, Peaberry and Spring Dynamic Modules to get their AOP wrappers and service proxies to work. I have seen the classload adapter used in EclipseLink JPA to get it working on a non-Equinox OSGi container. When I hear the Spring guys say they did serious work on Tomcat to adapt it to OSGi I imagine they had to do classload site conversion, or a more serious refactor to externalize Tomcat's servlet class loading altogether.

Acknowledgements

Many of the lessons in this blog were extracted from the excellent code Stuart McCulloch wrote for Guice and Peaberry. For examples of industrial strength classload bridging look here and here. There you will see how to handle some additional aspects like security, the system class loader, better lazy caching, and concurrency. Thank you Stuart!

I am also obliged to this article by Peter Kriens. Before I read it I thought I had OSGi class loading figured out. I hope my meticulous explanations on JVM linking will be a useful contribution to Peter's work. Thank you Peter!

Tuesday, August 04, 2009

Published on InfoQ

Some time ago I authored an article for InfoQ on the topic of service dynamics. I am proud to say my creation was finally published. The article is a refined fusion of two of my previous blogs with some added content. I write this entry for the people who have read those previous blogs. After all I have to cater to the few readers I have accumulated so far if I want to stand a chance of getting more people to comment on my ideas. So...

Precious readers of mine! Do not bother reading the entire InfoQ article - you will be disappointed to discover mostly rehashed old stuff. Instead jump straight to the Fighting Code Distortion section, where the most important new content is located.

Monday, July 13, 2009

OSGi: Go Forth And Extend

This entry is my take on a trait of the current OSGi architecture that is, to my knowledge, unparalleled in any other modular environment. Before I dive in I must briefly introduce said architecture. Detailed descriptions can be found in the public OSGi specification.

OSGi Layers

Modular

This is what OSGi is best known for. The modular layer introduces the notion of a bundle, with strict visibility rules based on the bundle metadata. One interesting thing to notice is that the modular layer is static. If you code an application based on this layer you will not have any hot code update. The application would begin with a main() method contained in a bundle, from where the first classes and threads will be created. These can only be classes that other bundles expose to the "main bundle". These classes can then turn around and create other classes based on what their respective bundles can see, and so the process continues. Here we can have multiple versions existing side by side, and resource sharing is done via a graph of ClassLoaders rather than the traditional ClassLoader tree. The second interesting thing to notice is that this layer deals only with class sharing. E.g. we have no concept of sharing instantiated (or "live") objects between the bundles. In short, the modular layer today is what project Jigsaw is going to be when Java 7 comes out. But I digress...

Lifecycle

Lifecycle binds the modular and service layers in a common state machine. This is where dynamics are introduced to OSGi. Some states deal with the modular layer and describe when a bundle can participate in class (and resource) sharing. These are "installed", "resolved", and "uninstalled". The rest deal with the service layer and describe when a bundle can participate in instance sharing (and code execution). These states are "starting", "active", and "stopping".

These two sets form two state machines with the service layer machine being embedded in the modular layer machine's "resolved" state. So we can break the lifecycle into two sublayers: modular lifecycle and service lifecycle.

Service

This is where the interesting stuff happens. Each bundle has an activation hook. When the bundle is activated the lifecycle layer calls the hook to make the Resolved->Starting->Active transition. When the bundle is stopped the lifecycle layer calls the hook to make the reverse Active->Stopping->Resolved transition. When activated, the bundle hook object can mushroom into a runtime object structure. Some of the objects comprising this structure can be shared with other bundles. I.e. while the modular layer was about class sharing, the service layer is about instance sharing. So the two layers complement each other. A bundle must first share the classes that comprise the API of another bundle before it can share runtime instances of these classes. Some of the activation hooks can also start threads to drive application control flow. To do the actual instance sharing each activation hook receives as a parameter a private accessor to the inter-bundle environment (the service registry).
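
The activation hook is the familiar BundleActivator. A minimal sketch, with made-up HelloService/HelloServiceImpl names, shows both transitions and the instance sharing:

import org.osgi.framework.BundleActivator;
import org.osgi.framework.BundleContext;
import org.osgi.framework.ServiceRegistration;

public class Activator implements BundleActivator {
  private ServiceRegistration reg;

  public void start(BundleContext bc) {
    /* Resolved->Starting->Active: build the runtime structure and
       share part of it through the service registry */
    reg = bc.registerService(
        HelloService.class.getName(), new HelloServiceImpl(), null);
  }

  public void stop(BundleContext bc) {
    /* Active->Stopping->Resolved: withdraw the shared instances */
    reg.unregister();
  }
}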

Extender

The extender architectural pattern defines how a central bundle (the extender) can manage a certain facet of the logic of other bundles. This is kind of like dependency injection, but instead of objects entire chunks of logic are injected! For example a Service Layer Runtime can drive the service interactions of a bundle based on metadata found in that bundle. Similarly an extender can manage the URL space of another bundle by discovering and registering its servlets and resources with the HTTP server. Yet another extender can set up the database and the JPA entity manager. The applications of the pattern are endless. Its key elements, sketched in code after this list, are:

  1. Track the lifecycle of other bundles.
  2. Search newly resolved bundles for relevant metadata and interpret it.
  3. If needed load resources from the bundle.
  4. If needed load classes from the bundle and instantiate them.
  5. If needed export the instantiated objects on behalf of the bundle (as services).
  6. If needed import live objects (e.g. services) on behalf of the bundle.
  7. If needed start threads to drive the internals of the bundle.
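
Here is a minimal sketch of such an extender, covering steps 1, 2 and 4. The "Greeter-Class" header is a made-up piece of metadata:

import org.osgi.framework.Bundle;
import org.osgi.framework.BundleContext;
import org.osgi.framework.BundleEvent;
import org.osgi.framework.SynchronousBundleListener;

public class GreeterExtender implements SynchronousBundleListener {
  public void open(BundleContext bc) {
    bc.addBundleListener(this); /* 1. track the lifecycle of other bundles */
  }

  public void bundleChanged(BundleEvent event) {
    if (event.getType() != BundleEvent.STARTED) {
      return;
    }
    Bundle bundle = event.getBundle();
    /* 2. hunt for relevant metadata */
    String name = (String) bundle.getHeaders().get("Greeter-Class");
    if (name == null) {
      return;
    }
    try {
      /* 4. load and instantiate a class on behalf of the bundle */
      Object greeter = bundle.loadClass(name).newInstance();
      /* ...mix the greeter into the runtime structure we host... */
    } catch (Exception e) {
      /* bad metadata - skip this bundle */
    }
  }
}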

Looking at this list we might be reminded of the operations performed by the OSGi core (i.e. the "framework"). And this is exactly the case...

The bootstrap extender

The OSGi service layer follows the extender pattern quite closely:

  1. Track the lifecycle of other bundles.
    We can imagine the modular layer notifies the service layer that the bundle has entered the Resolved state.
  2. Search newly resolved bundles for relevant metadata and interpret it.
    The service layer looks for a Bundle-Activator header in the bundle manifest and interprets it as a fully qualified class name.
  3. If needed load classes from the bundle.
    The service layer loads the BundleActivator class through the bundle's own ClassLoader.
  4. If needed instantiate the loaded classes.
    The BundleActivator is instantiated using its default constructor.
  5. If needed start threads to drive the internals of the bundle.
    To do this the service layer calls the activator, which can, if it so chooses, start threads to drive application control flow.

So there we are - the service layer of the OSGi framework is nothing more than the first ever extender. The service registry is a runtime structure maintained by this extender and is completely separate from the bundle graph maintained at the modular layer. Some of the methods of BundleContext and Bundle form a separate sub-API that has nothing to do with bundles:

  • From BundleContext we have getServiceReference(), getServiceReferences(), registerService(), addServiceListener(), removeServiceListener()
  • From Bundle we have getRegisteredServices(), getServicesInUse()

In fact these methods can be moved into a ServiceContext, leaving Bundle/BundleContext to deal only with resource searching and class loading.

Now any bundle can do it

Before OSGi 4.0 it was exclusively the prerogative of the OSGi core (bundle 0) to play god with other bundles. This is because the OSGi API did not provide a way for one bundle to cause another bundle to load a class. I.e. it was not possible for one bundle to drive the module-level behavior of another bundle. OSGi 4.0 gave the fire of the gods to mortal bundles via a single method called Bundle.loadClass(). This essentially enabled the extender pattern. Now a bundle can instantiate classes belonging to another bundle. Later OSGi 4.1 increased this power via the Bundle.getBundleContext() method. Now an extender can also control the service-layer interactions of another bundle.

Let us now examine what doors into the OSGi core an extender can use to implement each major step of its operation:

  1. Track the lifecycle of other bundles.
    Use BundleContext.addBundleListener(). The extender should add the SynchronousBundleListener flavor. This guarantees that all methods on Bundle and BundleContext that depend on the state of the extended bundle will work as expected. I.e. the extender will always observe the target bundle in its current state.
  2. Search newly resolved bundles for relevant metadata.
    Use Bundle.getHeaders, Bundle.getEntry, Bundle.getEntryPaths, Bundle.findEntries to search through the content of the bundle. Using these methods it is even possible to obtain every single class file in the bundle and use that information (i.e. the class names) to load the classes later on. The important trait of all these methods is that none of them pass through the ClassLoader of the extended bundle. So the search will not cause premature resource consumption.
  3. Load resources from the bundle.
    Use Bundle.getResource, Bundle.getResources. These use the ClassLoader and cause class-space resolution for the extended bundle. On the other hand they can be used to search through jars embedded in the extended bundle.
  4. If needed load classes from the bundle.
    Use Bundle.loadClass on the extended bundle. This uses that bundle's ClassLoader to get a Class object. I.e. the extender looks "through the eyes" of another bundle and that bundle's imports and exports alone determine if the desired class can be loaded. At the same time the loaded class does not have to be exported from the target bundle. That's right - the extender can reach into the internal classes of another bundle as long as it knows their fully qualified names. These names are obtained in the "metadata hunt" that took place in the previous phase. Notice that we can manipulate these classes because the reflection API is common to all bundles and is provided by the system ClassLoader.
  5. Import/Export services on behalf of the bundle.
    Use Bundle.getBundleContext() on the manipulated bundle. To export services we must load their implementation classes as described above, create instances via reflection and then register them using the context of the manipulated bundle.

Using steps 1 through 4 we can implement the OSGi service registry as a bundle. Using step 5 we can bridge our home-grown registry with the one provided by the OSGi core.

Eclipse - extenders only

In fact there is a project out there that uses these exact means to implement its own complete replacement of the OSGi service registry. This of course is the Eclipse project and its extension registry. As far as I know Eclipse does not take step 5. It is still possible to use both OSGi services and Eclipse's own extension-point/extension system in one application because bundles are still activated via the standard OSGi hook, which gives them the opportunity to step into the OSGi service layer. Despite this, until recently it was relatively hard to mix extensions and services. Today the Peaberry project finally enables the seamless mixing of extensions and classic OSGi services.

Actually the entire extension-point/extension model is built on the extender pattern. Each extension point is in fact an extender - it pulls resources and data from another bundle and mixes them into a runtime structure that it hosts. Eclipse goes one step further: if each extension point must have code that hunts for metadata inside other bundles, why not extract this functionality into a common extender? That extender assembles the metadata from all bundles and provides a metadata repository accessible to everyone. This is a kind of meta-extender. In Eclipse this metadata repository is called the extension registry.

OSGi - extenders vs. services

Eclipse is a vivid demonstration of the maturity that OSGi has reached. With the addition of a couple of methods the OSGi API can be split almost cleanly into modular-API and service-API. Subsequently two types of bundles emerge: extenders and applications. The extenders operate almost purely on top of the modular half of the API: they pull out classes and resources from the application bundles and mix them into runtime structures. These structures do not have to reflect the underlying bundles in any way. E.g. looking at the way an extender has arranged servlets into an alias space we can no longer see the boundaries between the bundles from which these servlets were loaded. The applications are now full of metadata that describes to various extenders which pieces of the app need to be pulled out and mixed into the runtime structure maintained by the respective extender. A bundle that acts as an extender with respect to a certain type of application bundles can be an application with respect to another extender. Replace "extender" with "extension-point" and "application" with "extension" and we get the Eclipse runtime model. Unlike Eclipse, OSGi provides a simpler, lower level API that places no restrictions on the format of the metadata. The downside is that extenders have to do the bundle tracking and metadata parsing by themselves. For an in-depth comparison between the Eclipse and OSGi runtime models read this excellent article by Neil Bartlett.

Then there is the bootstrap extender: the service layer. Unlike most extenders this one hosts a truly generic runtime structure: the service registry. In a sense the service layer pulls out objects from other bundles and mixes them into this runtime structure. The service registry does preserve the division between bundles - we can say that the service and modular layers of OSGi are "aligned" with respect to bundle boundaries. I suppose this alignment has caused people to believe the two layers are inseparable when in fact they represent two distinct but complementary halves of the OSGi runtime. Because the service registry is available on any OSGi framework, bundles that can't subscribe to a specialized extender can always opt for the generic playground hosted by the OSGi service layer. I.e. (most) applications choose to use the service half of the OSGi API, but (as Eclipse demonstrates) this is no longer mandatory.

All of this brings up the question of whether the OSGi API is due for a makeover. Maybe we need to explicitly split the modular and service realms? I have some wildly speculative ideas on the subject, which warrant their own blog entry. Until then - go forth and extend!

Sunday, June 21, 2009

Shutdown: deterministic vs fuzzy

Since I published my lazy manifesto I have been gathering opinions on the soundness of its various aspects. One great eye opener came from the good people at the Apache Felix mailing list. It turns out my lazy scheme contains one very deterministic aspect that permeates both the importer and the exporter policies: the desire to have a deterministic bundle shutdown. There is of course another way to shut down a bundle, which for the fun of it I'll call "fuzzy" rather than "indeterministic". Let us now work our way backwards through the bundle lifecycle to discover how these two shutdown schemes lead to subtle but important differences in the importer synchronization schemes.

Deterministic

Under this scheme we want to guarantee that after a call to ServiceRegistration.unregister() returns there are no references to the service left in the wild. That includes both references from object fields and references from thread stacks. The cleanup of object references is achieved by synchronously calling the ServiceListeners plugged in by each importer. In this way the importer has a chance to flush out all references it has stored in object fields. This however is not enough. There might be "in-flight" references stored in local variables. These reside on the stacks of active threads and we have no direct control over them. The only thing to do is to make sure unregister() blocks until all method calls that reference the service from local variables complete. This means the importer must wrap every call to the service in a synchronized block. That block must use the same lock as the importer's ServiceListener. In this way the unregister() call will enter the importer and block until all calls to the service complete. Note that this means the service unregistration can be postponed indefinitely if business control flow threads keep preempting the management control flow thread, which performs the bundle shutdown.

This is how we arrive at the following importer synchronization scheme:

/* 
 * Service tracking code 
 */
private final Object lock = "Lock for HelloService";
private HelloService serv;

void set(HelloService serv) {
  synchronized (lock) {
    this.serv = serv;
  }
}

HelloService get() {
  synchronized (lock) {
    if (serv == null) {
      throw new ServiceUnavailableException();
    }
    return serv;
  }
}

...

/* 
 * Service consuming code 
 */
synchronized (lock) {
  HelloService serv = get();
  serv.hello("OSGi");
}

One bad thing about this scheme is that we hold a lock while calling into another bundle. This lock is our own private lock, which rules out the possibility that code we can't control waits on it. Still, there is the potential for deadlocks from other threads calling into our bundle. Another drawback is that this scheme kills the potential for concurrency inside the importer. Any threads that pass through the bundle will have to line up for the private lock when they try to use the service. I have used a read-write lock to alleviate this problem and have even profiled this with the Peaberry benchmark. It turns out this is likely not prohibitively expensive, but to be sure one must do a huge profiling job (different OSes, different JVMs, different CPUs). As you can see it is worth exploring alternatives that avoid these complications.
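
For reference, a sketch of the read-write variant: callers share the read lock and only the swap takes the exclusive write lock. HelloService and ServiceUnavailableException are the same made-up names used in the snippet above:

import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class HelloImporter {
  private final ReadWriteLock lock = new ReentrantReadWriteLock();
  private HelloService serv;

  void set(HelloService serv) {
    lock.writeLock().lock(); /* exclusive: waits for in-flight calls to drain */
    try {
      this.serv = serv;
    } finally {
      lock.writeLock().unlock();
    }
  }

  void hello() {
    lock.readLock().lock(); /* shared: callers no longer line up */
    try {
      if (serv == null) {
        throw new ServiceUnavailableException();
      }
      serv.hello("OSGi");
    } finally {
      lock.readLock().unlock();
    }
  }
}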

The return on our investment in importer synchronization code is the following bundle shutdown sequence:

  1. Unregister all services
    After this step we are free to tear down the bundle internals as we wish. If there are any references left to our services this is a bug in the importer.
  2. Deactivate
    Stop threads, close sockets, dispose widgets. We don't have to care about synchronization of the code that contains these resources.
  3. Detach
    Null the references to what's left. This cuts the dead bundle internals from the activator and leaves them to the garbage collector.

As exporters we benefit from this scheme because the normal "work well or fail clean" guarantees we have to provide for our services end after the first step. If an importer keeps calling our services after step one it can experience random buggy behavior, like calls appearing to be successful but returning bad values. Even worse, since our service code is not supposed to be called at this time, we can end up with open files, unclosed sockets or other cleanup bugs.

The final remark I have on this shutdown scheme is that it heavily relies on the synchronous service event dispatch. I was quite surprised to discover I actually required this! I generally think the synchronous dispatch is dangerous precisely because it allows importers to block the management control flow. Also this creates the potential for unexpected call loops when the listener code finds a way to call back into the bundle that unregisters the service. Both of these can cause really nasty bugs.

Fuzzy

Under this scheme after ServiceRegistration.unregister() returns it is only guaranteed that references from object fields are flushed. References from thread stacks can remain. We can have less synchronization in the importer:

...
/* 
 * Service Consumer code 
 */
HelloService serv = get();
serv.hello("OSGi");

Now we remove the problems of holding locks when calling out to unknown code. Also the importer can be as concurrent as the service implementation allows. Fewer obligations for the importer translate to more obligations for the exporter. The service is now not permitted to ever exhibit random behavior. It must keep its "work well or fail clean" contract forever. E.g. the service must be properly composed before it is exported and behave consistently until garbage collected. The service unregistration sits somewhere between these two points to mark the beginning of a gradual (i.e. fuzzy) decline of service usage, as late importers try to call and crash.
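
A sketch of this exporter-side obligation - a volatile flag checked on every call, so late importers fail clean instead of observing random behavior (HelloService is again a made-up name):

public class HelloServiceImpl implements HelloService {
  private volatile boolean closed;

  public void hello(String who) {
    if (closed) {
      /* A late caller: fail clean rather than behave randomly */
      throw new IllegalStateException("HelloService is closed");
    }
    /* ... normal, consistent behavior ... */
  }

  /* Called during the Deactivate step, possibly from another thread */
  public void close() {
    closed = true;
  }
}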

Besides simplified importing code, this scheme has the additional benefit of handling buggy importers that keep calling the stale service indefinitely. Under fuzzy shutdown these importers will at least crash cleanly rather than also cause trouble in the exporting bundle.

The Fuzzy shutdown sequence remains the same but with different meaning attached to each step:

  1. Unregister all services
    This is a bit redundant because all services will be unregistered by the OSGi framework after the shutdown completes. It still feels like good style to announce the imminent destruction before we close shop. This also caters to importers who choose to follow the deterministic shutdown import scheme for reasons of their own.
  2. Deactivate
    Stop threads, close sockets, dispose widgets. Here we do have to care about synchronization because legitimate importers can still call in. This means that all OSGi services have to handle concurrent access, either because of the application design or because they can be shut down from another thread.
  3. Detach
    Null the references that attach the bundle internals to the activator. Now we leave behind a ball of fail-fast objects that will linger until all local variable references to the service drain away.

Notice that under fuzzy shutdown the sum of tracking code inside each importer forms a kind of service cache. Each pair of private lock and service storage field forms one cell of this cache. The application code then pulls objects out of the cache every time it needs a service. When the service is gone we have a cache miss - i.e. a ServiceUnavailableException. The Peaberry framework makes this thread-safe cache explicit. Under the covers services are pulled into the cache upon a method call and linger inside for a period of several minutes. The only exceptions are services with sticky decorators. These are typically stateful services and must be kept around until the state loaded inside is used. This scheme is better than using a bunch of ServiceTrackers because each tracker will hold its service for the entire bundle lifetime even if it was used only once to initialize something.

Java is traditionally fuzzy

There are other precedents in Java for fuzzy shutdown. Consider the way threads are stopped: raise a "shutdown" flag, close your resources in a threadsafe manner, and let the thread expire from natural causes. In our case calling unregister() is equivalent to raising a boolean flag. Closing resources covers both the disposal of non-memory resources and the discontinuation of imports used by our service: both will cause exceptions for calling importers. Generally it seems to be the Java way to have a deterministic startup and a fuzzy shutdown. We can summarize this in one sentence:

Be consistent at all times.

Typically in Java we start application subsystems in a way that guarantees our objects are exposed to application control flow only after being fully set up. E.g. wire together a set of objects, configure them, and only then call start()/open() methods or, in OSGi's case, register the service. When the time comes to stop we remain consistent right up to the point where the garbage collector frees the memory. As we know this happens when truly no one uses the object any more. To precipitate the release of memory, as a final act we cause the deterministic release of non-memory resources via close()/dispose() calls. Then we let nature take its course.
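
The classic thread idiom, for comparison - a sketch only:

public class Pump implements Runnable {
  private volatile boolean shutdown;

  public void run() {
    while (!shutdown) {
      /* ... move data around ... */
    }
    /* expire from natural causes */
  }

  /* Analogous to unregister(): just raise the flag and walk away */
  public void shutdown() {
    shutdown = true;
  }
}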

Conclusion

To summarize:

  • Memory vs. Non-Memory
    The Deterministic shutdown scheme tries to guarantee the release of both memory and non-memory resources. The Fuzzy shutdown scheme guarantees the deterministic release of non-memory resources and counts on this to cause clients to release the memory resources as well. We can consider the services used by the bundle as a special case of non-memory resource because these are indeed resources external to the bundle that require explicit release.
  • Importer vs. Exporter
    The Deterministic shutdown scheme causes complexity in the importing code in order to avoid extra invalidation code in the exporter. The Fuzzy scheme embraces the invalidation code as a necessary consequence of being consistent at all times.

Which shutdown scheme is better? It depends on the cost of the invalidation code that needs to be added to the exporter. Empirical data shows that most services are designed for concurrent access anyway, so adding an additional close()/dispose() method does not cost too much. This cost can even drop to zero if the service is stateless - just detach it from the activator and let it operate in "bypass mode" until dropped from all importers.

In my initial article on Lazy dynamics I advocated the deterministic shutdown scheme and was against invalidation code in the exporter. In this article I am inclined to drop the determinism in favor of the traditional fuzzy Java shutdown. The multiple simplified importers and the additional safety seem to justify a bit of extra complexity in the exporter. Notice that this is all about releasing resources, not about making the inherently indeterministic service dynamics predictable - we all know this is a lost cause. Under both shutdown schemes you still have to code for clean error handling.

Finally I would be grateful for any additional arguments for either case. Or maybe you have your own shutdown scheme to propose?

Saturday, June 20, 2009

Current state of affairs

These are my comments on this Caucho blog.

Eclipse - not a good OSGi example

Despite its huge popularity Eclipse is far from being an archetypal example of OSGi usage. The main anti-OSGi feature of Eclipse is their decision not to use the OSGi service model. Instead they provide a complete replacement for OSGi services: the Extension Registry. As far as I know there are two reasons for this:

  • Legacy code: the extension registry was well established when Eclipse moved to OSGi and it would be too costly to toss it out.
  • Bootstrap time: Eclipse is a desktop app. It heavily relies on lazy class loading for improved startup times. The core OSGi service model requires the service implementation to be loaded before a service is exported. A major requirement of Eclipse is the ability to build the UI out of pure metadata and have the code loaded later. This can't be fulfilled using regular OSGi services.

The Eclipse extension registry is much more complex than the real OSGi service model, which remains perfectly suitable for any app that does not share Eclipse's unique requirements. Also lately various runtimes have emerged on top of the OSGi service layer that allow users to load only the service API at startup and postpone the loading of the implementation code until needed. Curiously the people from Eclipse seem to have played a significant role in the development of one of these runtimes - OSGi Declarative Services.

The OSGi dilemma

Generally speaking OSGi does not seem to tolerate gradual migration. You either embrace the high-isolation, service oriented model completely or suffer exploding complexity when trying for partial solutions. The problem here is that apparently any non-trivial Java app is organized around its own specialized modularity (sharing classes) and service orientation (sharing instances) engines. This is especially true for infrastructural frameworks such as component runtimes and persistence managers. This transforms the migration to OSGi into a serious rewrite. As Eclipse shows, if you are not satisfied with the OSGi service layer you can roll out your own relatively easily. The modularity however is the killer - you either let OSGi manage all the class loading for you or don't use OSGi at all.

On top of that OSGi is a low level runtime core that merely enables these cool features but does not make it easy to use them. That is in fact a good design because to enable easy usage OSGi would need to specialize in a certain component/application model. The good news is that this modular core can easily be extended with multiple "opinionated" component runtimes. That is exactly what Spring Source are frantically coding right now. The even better news is that all of these component/application runtimes can interoperate transparently through the OSGi core.

Right now OSGi is hovering on the edge of being ready for prime-time. Many of the conveniences an enterprise developer is used to are yet to appear. So we have a double whammy:

  • Gradual migration of critical infrastructure is costly because it is all or nothing: either the best possible re-design - strong modularity, strong componentization - or no migration at all.
  • This raises the entry cost for new applications and hinders the migration of existing ones.

I think this lack of critical mass is the usual dilemma of any game-changing technology.

Join the fun

At the same time the core features of OSGi: strong modularity, service orientation and dynamics, are so compelling that OSGi evolution is rapidly driven by demand. For example currently there are at least four frameworks that allow the writing of POJO applications on top of the OSGi service layer. Also the major enterprise infrastructures are being overhauled to work on OSGi. This is in fact the major focus of the soon-to-be-released OSGi 4.2 specification. The field is wide open! There is no more exciting time to work on OSGi based infrastructure than right now when the future is being shaped.

As for normal application developers who don't want to risk early adoption - they will need to wait until the OSGi environment is properly set up for them. But be prepared: OSGi or something like it is inevitable in the near future.

Tuesday, May 19, 2009

Service dynamics: the lazy man's way

The Problem

There is no doubt that the hardest topic in OSGi is how to deal with service dynamics. In this article I will give you the complete epic story of my suffering and enlightenment on the subject. I will start with the basic nature of the problem and then present two different ways to solve it. There are two key factors that make service dynamics fiendishly hard to get right.

Concurrency

Before I go further I feel obliged to explain one basic and somewhat startling fact: the OSGi container in practice does not run threads of its own! It is merely a "dead" threadsafe object structure on the heap. The "main" thread is used to set up this structure and start the initial set of bundles. Then it goes to sleep - its function is merely to prevent the JVM from shutting down due to lack of living threads. The "main" thread is typically awakened only at the end of the container shutdown sequence when all other threads are supposed to be dead. It is used to perform some final cleanup before it also dies and lets the JVM exit. This means that all useful work must be done by threads started from bundles during that initial startup sequence. I call these "active bundles". Usually the majority of bundles are "passive bundles". These don't start threads from their BundleActivator.start(). Instead they set up the imports of some service objects, which are then composed into new service objects, which are finally exported. After the start() call returns the bundle just sits there and waits for a thread to call its exported services. As elegant and lightweight as all this might be, it also means that the OSGi container does not enforce any threading model - it steps aside and lets the bundles sort it all out between themselves. The container object structure acts as a "passive bundle" (a bundle with ID 0 in fact) getting animated only when a thread from an "active bundle" calls in to perform some interaction with another bundle or with the container itself. Because at any time a random number of threads can call into the OSGi core the container implementers have their work cut out for them. You as an application coder are also not exempt from the suffering.
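
As a hedged illustration of the two bundle kinds (all names hypothetical): an active bundle starts and gracefully stops its own worker thread, while a passive bundle would do all its composing and registering inside start() and simply return.

import org.osgi.framework.BundleActivator;
import org.osgi.framework.BundleContext;

/* Hypothetical "active" bundle: it owns the only thread doing useful work. */
public class ActiveActivator implements BundleActivator, Runnable {
  private volatile boolean stopped;
  private Thread worker;

  public void start(BundleContext bc) {
    worker = new Thread(this, "worker");
    worker.start(); /* from here on this thread drives the business flow */
  }

  public void stop(BundleContext bc) throws InterruptedException {
    stopped = true;
    worker.interrupt();
    worker.join(); /* graceful shutdown before the management flow leaves stop() */
    worker = null;
  }

  public void run() {
    while (!stopped) {
      /* ...do useful work, typically by calling imported services... */
    }
  }
}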

The concurrency factor is then this: at all times an OSGi application is subjected simultaneously to two independent control flows. These are the "business logic flow" and the "dynamic management flow". The first one represents the useful work done by the app and has nothing to do with OSGi. Here you choose the design (thread pools, work queues etc) and code your bundles to follow the rules. The second control flow however is entirely out of your hands. It takes place generally when some management application you don't know about plays with the lifecycle of the bundles (this includes installing and uninstalling). Often there is more than one such application - each with its own threading rules just like your app. Some examples include provisioning over HTTP, management and monitoring over JMX, even a telnet console interface. Each of these can reach through the OSGi core, call BundleActivator.stop() on a bundle you depend on, and cause the withdrawal of a service you require. When this happens you must be ready to cooperate in the release of the service object. This odd arrangement is explained by the second factor I mentioned.

Direct service references

The second factor has to do with the way objects are exchanged between bundles. Here again OSGi is non-intrusive and lightweight: an importing bundle holds a direct reference to the object owned by the exporting bundle. The chief benefit of this design is that OSGi does not introduce method call overhead between bundles - calling a service is just as fast as calling a privately owned object. The downside is that the importing bundle must cooperate with the exporting bundle to properly release the service object. If an importer retains a reference to the dead service, multiple harmful effects take place:

  • Random effects from calls to the half-released service object.
    Because the service object is no longer backed by a bundle, calling it can yield anything from wrong results, to random runtime exceptions, to some flavor of IllegalStateException the exporter has chosen for marking invalid services.
  • Memory leaks because of ClassLoader retention.
    The ClassLoader of the exporter bundle will remain in memory even if the bundle is uninstalled. Obviously each object on the heap must have a concrete implementing class, which in this case is provided by the dead bundle. This leak will happen even if the importer sees the service object through an interface loaded from a third library bundle.

All this means that the importer must track the availability of the service at all times and release all references to the service object when it detects it is going down. Conversely when the service goes back online it must be picked up and propagated to the points where it is used.

The deadly combination

Now let's examine in detail how Concurrency and Direct Service References play together when a service is released. Because we have two execution flows (concurrency), which access the same object reference (direct references), we must synchronize carefully. To aid you in this matter OSGi notifies importers about service state changes in the same thread that executes the service unregistration (i.e. synchronously). In other words the management control flow passes directly through the ServiceListener of the importer. This allows the management flow and the business flow to meet inside the importer bundle. Such rendezvous are critical because the importing bundle can use a private lock to prevent race conditions for the service object reference. If the management flow obtains the lock first (by entering the ServiceListener) it will block the business flow and flush clean any references to the dying service. After the cleanup the business flow will usually resume with a RuntimeException notifying that the service is gone. Conversely if the business flow obtains the lock first it will block out the management flow and complete the current call to the dying service. In this case we count on the service exporter to first unregister the service and only then release its resources. If this sequence is followed the service will be fully usable during the last call the business flow makes before it is blocked out by the management flow. Notice that from the point of view of the importer service dynamics are all about crashing safely.

What if service events were delivered asynchronously? Well then the management flow would place an event on some queue and destroy the service without waiting for the clients to release it. Until the importers are notified by some event delivery thread they would be able to call the service while it is being destroyed. To prevent this from happening the exporter would have the additional responsibility to mark the service object as invalid so it can reject clients by tossing exceptions at them. Now we have code to check service validity and throw exceptions in both the exporter and the importer. Also this would likely require all methods of the service object to be synchronized by a common lock. Such a lock would be a coarse granularity lock because it is accessed by all importing code. As such it distorts the concurrency design of the application more than the multiple finer granularity locks used by the individual importers.

Even under the current synchronous event dispatch it is sometimes useful to place invalidation code in your services. This adds additional safety against badly coded importers. For example if a set of bundles form an independent reusable component you can place additional safety code in the services intended for external use while keeping the services used between the constituent bundles simple.

The solution

So far I have gone to great pains to describe... err... the pain of service dynamics. Now that you are hurting let us discuss the remedy. For now I have exhausted the subject of correct importer behavior. To recap: the importer must track the service and guarantee atomic service object swaps. No matter what other policies we invent we simply must follow this rule to be safe. Now let's add to this a service export policy. The sum of an import and an export policy should form a comprehensive doctrine about dealing with service dynamics. I will explore two export policies with their corresponding doctrines.

Eager

This school of thought shoots for safe service calls. Its motto is "To export a service is to announce it is ready for use". Consider what this means for services that are composed of imported objects. These objects are called "required services". A service can also be "optional" - e.g. logging. Under the eager motto when a required service goes down the export is no longer usable. So it must also be withdrawn from the OSGi service registry. This goes the other way too - when the required service comes back online the composite service must be registered once more. This results in cascades of service registrations and unregistrations as chains of dependent services come together and fall apart. Implementing this dynamic behavior varies from hard to exceptionally hard. The problem is that the imports and the exports have to come together into common tracking objects with the proper synchronization. Also quite often this dynamic dependency management is further compounded by the need to track events from non-service sources (for example we track a dynamic configuration, waiting for it to become valid).
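
To make the cost concrete, here is a hedged sketch of the flickering machinery for just one required import (Translator and Greeter are hypothetical names; real bundles multiply this for every import/export pair):

import org.osgi.framework.BundleContext;
import org.osgi.framework.Constants;
import org.osgi.framework.InvalidSyntaxException;
import org.osgi.framework.ServiceEvent;
import org.osgi.framework.ServiceListener;
import org.osgi.framework.ServiceReference;
import org.osgi.framework.ServiceRegistration;

interface Translator { String translate(String phrase); }
interface Greeter { void greet(String who); }

/* Exports a Greeter only while the required Translator import is present. */
class EagerExporter implements ServiceListener {
  private final BundleContext bc;
  private ServiceReference ref;    /* the tracked required import */
  private ServiceRegistration reg; /* our export; null while the import is missing */

  EagerExporter(BundleContext bc) { this.bc = bc; }

  synchronized void open() throws InvalidSyntaxException {
    bc.addServiceListener(this,
        "(" + Constants.OBJECTCLASS + "=" + Translator.class.getName() + ")");
    bind(bc.getServiceReference(Translator.class.getName()));
  }

  public synchronized void serviceChanged(ServiceEvent e) {
    switch (e.getType()) {
    case ServiceEvent.REGISTERED:
      if (reg == null) {
        bind(e.getServiceReference()); /* the import appeared: register our export */
      }
      break;
    case ServiceEvent.UNREGISTERING:
      if (e.getServiceReference() == ref && reg != null) {
        reg.unregister();              /* the import died: cascade the unregistration */
        reg = null;
        bc.ungetService(ref);
        ref = null;
        /* A complete version would also try to bind an alternative import here. */
      }
      break;
    }
  }

  private void bind(ServiceReference candidate) {
    if (candidate != null) {
      final Translator translator = (Translator) bc.getService(candidate);
      if (translator != null) {
        ref = candidate;
        reg = bc.registerService(Greeter.class.getName(), new Greeter() {
          public void greet(String who) {
            System.out.println(translator.translate("Hello, ") + who);
          }
        }, null);
      }
    }
  }
}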

Let us suppose we manage to write all of this boilerplate for each of our bundles. Now imagine how a thread races through the OSGi container when it executes the business control flow (e.g. useful work). It will start its path from some active bundle that drives the particular application. As soon as it calls a service object it will leave its "home" bundle and enter the bundle that exports the service. If that service is in turn implemented via other services the thread will hop into the bundle of each one and so on. For the original service call to succeed each and every hop must also succeed. It turns out we are trying to achieve a kind of transactional behavior - a call to a chain (or more generally a tree) of services either fully succeeds or can not be made in the first place because the root service is not registered. Under such strong guarantees the active bundle knows ahead of time (or eagerly) that a certain activity can't be performed and can take alternative actions. E.g. rather than react to an error it directly performs the respective error handling. I suppose by writing the complicated import-export boilerplate we avoid writing some exception handling code and don't need to worry about cleanup after a partially successful service call.

Unfortunately this idea of safe service dynamics is completely utopian. The eager model simply can't work and this is the main point I want to hammer in hard with this blog. Imagine some management control flow kicks in and stops a bundle two hops removed from the current position of the business control flow. Since the business flow has not yet entered the stopped bundle it will not be able to block the management flow from taking down its services. As a result our thread will run into a wall. Obviously no amount of "local" synchronization by each individual bundle along the path will guarantee the integrity of the entire path. What is needed is a third party - a transaction manager of sorts, to lock the entire path before the business flow starts traversing it. Since such a manager does not currently exist we can conclude that service flickering can't prevent errors caused by disappearing services.

This raises the question of whether there is some other benefit to justify the complexity caused by service flickering. We could argue that although we can't guarantee that a service call will succeed, at least service flickering can tell us the precise moment after which a service call is guaranteed to fail. This allows us to perform various auxiliary reactions right after a required service goes down. For example if a bundle draws buttons in your IDE and a direct or transitive dependency goes away it can pop a dialog or hide the buttons from the toolbar. Without the cascading destruction of the service chain the buttons will be right there on the toolbar and the user will get exceptions every time he clicks. I say this return does not even approach our huge boilerplate investment. Remember that this only works if every bundle along the dependency chain behaves eagerly - we have a lot of boilerplate to write. This becomes even more ridiculous if you consider the additional complications. Why should we blow the horn loudly during a routine bundle update that lasts 2 seconds? Maybe we should just "flicker" the buttons on the toolbar and postpone the dialog until the failure persists for more than 10 seconds. Should this period be configurable? Also who should react - only active bundles or every bundle along the service chain? Since we don't want to get flooded by dialogs (or other reactions in general) we must introduce some application-wide policy (read "crosscutting concern"). In short we have paid a lot to get back a dubious benefit and as a side effect have introduced a brand new crosscutting concern in our otherwise modular application.

Lazy

This approach defines the service export as "To export a service is to declare an entry point into the bundle". Since the export is merely a declaration it does not require any dynamic flickering. We simply accept that calling a service can result in an exception because of a missing direct or transitive service dependency. I call this model "lazy" because here we do not learn about a missing service unless we try to call it. If the service is not there we simply deal with the error. The complete dynamics doctrine then becomes:

  • Explicit registration is used only during bundle startup.
    Generally a bundle should follow this sequence in BundleActivator.start():
    1. Organize the import tracking (as described below).
    2. Build the bundle internal structure. Store its roots in non-final fields in the activator.
    3. If this is an active bundle start its threads.
    4. If there are objects to export register them now and store their ServiceRegistrations in non-final fields in the activator.
    Upon completion of this sequence the bundle is started and hooked to the service registry. Its internal structure is spared from garbage collection because it is referenced from within the activator and the activator in turn is referenced from the OSGi container. Now the management control flow can leave the activator and go about its business. If the bundle has started some threads to execute the business flow they can continue doing their work after the activator is no longer being executed.
  • Importers fail fast
    Every imported service must be tracked, and all code that uses the service must be synchronized with the code that swaps the service object in and out. When an attempt is made to call a missing service a RuntimeException is thrown. This exception is typically called ServiceUnavailableException (or SUE).
  • Service errors are handled like regular RuntimeExceptions (faults)
    Upon a SUE you do the same stuff you should do with most exceptions: propagate it to a top-level catch block (fault barrier), do cleanup as the stack unwinds or from the catch block, and finally complete the crashed activity in some other rational way. In detail (a code sketch follows after this list):
    1. If the service is optional just catch, log and proceed.
      If the service is not critical for the job at hand there's no need to crash. The SUE must be caught on the spot (e.g. we convert a fault to a contingency) and logged. Whether a service is optional depends on the concrete application. We can even imagine partially optional services where only some of the method calls are wrapped in a try/catch for the SUE while others lead to more comprehensive crashes.
    2. If the service is required and you are a passive bundle clean up your own resources and let the exception propagate.
      Passive bundles don't drive business logic and therefore don't own the activities that call into them. As such they have no right to decide how these activities are completed and should let the exception propagate to the owner active bundle. They still must clean up any internal resources associated with the service call in a try/finally block. Because good coding style requires such cleanup to be implemented anyway it turns out that for passive bundles lazy service dynamics cost nothing.
    3. If the service is required and you are an active bundle declare the current activity as crashed, log, clean up, try contingency actions.
      If you are the bundle that drives the crashed activity it's your responsibility to complete it one way or another. Good design requires that you wrap an exception barrier around the business logic code to absorb crashes. If there is need of resource cleanup you do it as usual. Then you do whatever the application logic dictates: display an error dialog to the user, send an error response to the client, etc.
  • Explicit service unregistration is used only during bundle shutdown
    All bundles should execute the following sequence in BundleActivator.stop():
    1. Bring down all exported services with calls to the respective ServiceRegistration.unregister()
      In this way we make sure no business control flow will call a service and wreak havoc with our shutdown sequence. Also we don't cause trouble to our importers by exposing partially destroyed services.
    2. If you are an active bundle perform a graceful shutdown of any threads you drive.
    3. Clean up any non-heap resources you own.
      Close server sockets, files, release UI widgets etc.
    4. Release your heap.
      This is done by explicitly nulling any fields contained in your BundleActivator. After stop() completes the bundle should only consume memory for its BundleActivator instance and its ClassLoader. These are both managed directly by the OSGi runtime. The bundle will mushroom again into a runtime structure on the heap if some management control flow reaches through the OSGi core to call once more BundleActivator.start() (e.g. a user clicks on "start bundle" in his JMX console).
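
Here is the promised sketch of the fail-fast plus fault-barrier combination (ServiceUnavailableException, Notifier and Worker are hypothetical stand-ins):

/* Hypothetical exception thrown by importers when a tracked service is missing. */
class ServiceUnavailableException extends RuntimeException {
  ServiceUnavailableException(String message) {
    super(message);
  }
}

/* A hypothetical imported service. */
interface Notifier {
  void ping(String message);
}

/* The fault barrier of an active bundle (point 3 above). */
class Worker {
  private final Notifier notifier;

  Worker(Notifier notifier) {
    this.notifier = notifier;
  }

  /* Called once per unit of work by a thread owned by this bundle. */
  void process(String message) {
    try {
      notifier.ping(message); /* may hop across bundles and throw a SUE */
    } catch (ServiceUnavailableException e) {
      /* Declare the activity crashed, log, and complete it another way. */
      System.err.println("Notification dropped: " + e.getMessage());
    }
  }
}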

The beauty of the lazy doctrine is that we manage to almost completely fold the hard problem of service dynamics into the much easier problem of dealing with exceptions properly. Turns out dynamics are not so horrible, they mostly force us to have a consistent error handling and cleanup policy - something any Java app worth its salt should have anyway.

There is a substantial wrinkle in this smooth picture - the service import code is still hard to write and is quite disruptive to the business logic code. You have to sprinkle synchronizations all over the place to prevent the management control flow and the business control flow from competing for service object references. This issue is addressed by...

Service proxies

There is an infinite number of ways to achieve correct lazy importing behavior. In practice however mostly variants of the pattern I am about to present lend themselves to the limited understanding of the human brain. This pattern is so compelling that very early in OSGi history a utility called ServiceTracker was introduced to capture it. I have used and coded this enough times (sick of it really) that I was able to emit these ~100 lines practically in one go, and there is a good chance you can paste them into your IDE and go import some services:

import org.osgi.framework.BundleContext;
import org.osgi.framework.Constants;
import org.osgi.framework.InvalidSyntaxException;
import org.osgi.framework.ServiceEvent;
import org.osgi.framework.ServiceListener;
import org.osgi.framework.ServiceReference;

class ServiceHolder<S> implements ServiceListener {
  private final BundleContext bc;
  private final Class<S> type;
  private ServiceReference ref;
  private S service;

  /**
   * Called from BundleActivator.start().
   *
   * (management control flow)
   */
  public ServiceHolder(Class<S> type, BundleContext bc) {
    this.type = type;
    this.bc = bc;
  }

  /**
   * Called by the app when it needs the service. The rest of the code in this
   * class supports this method.
   *
   * (application control flow)
   */
  public synchronized S get() {
    /* Fail fast if the service ain't here */
    if (service == null) {
      throw new RuntimeException("Service " + type + " is not available");
    }
    return service;
  }

  /**
   * Called from BundleActivator.start().
   *
   * (management control flow)
   */
  public synchronized void open() {
    /*
     * First hook our synchronized listener to the service registry. Now we
     * are able to block other management control flows in case they try to
     * change the service status while we initialize.
     */
    try {
      bc.addServiceListener(this, "(" + Constants.OBJECTCLASS + "=" + type.getName() + ")");
    } catch (InvalidSyntaxException e) {
      throw new RuntimeException("Unexpected", e);
    }

    init(bc.getServiceReference(type.getName()));
  }

  /**
   * Called from BundleActivator.stop().
   *
   * (management control flow)
   */
  public synchronized void close() {
    /* Unhook us so the cleanup is not messed up by service events. */
    bc.removeServiceListener(this);

    if (ref != null) {
      bc.ungetService(ref);
      ref = null;
      service = null; /* make get() fail fast after close() */
    }
  }

  /**
   * Called by the container when services of type S come and go.
   *
   * (management control flow)
   */
  public synchronized void serviceChanged(ServiceEvent e) {
    ServiceReference ref = e.getServiceReference();

    switch (e.getType()) {
    case ServiceEvent.REGISTERED:
      /* Do we need a service? */
      if (service == null) {
        init(ref);
      }
      break;

    case ServiceEvent.UNREGISTERING:
      /* Is this the service we hold? */
      if (this.ref == ref) {
        bc.ungetService(this.ref); /* release the dying service */
        this.ref = null;
        service = null;
        /* Switch to an alternative if possible */
        init(bc.getServiceReference(type.getName()));
      }
      break;
    }
  }

  @SuppressWarnings("unchecked")
  private void init(ServiceReference ref) {
    if (ref != null) {
      this.ref = ref;
      this.service = (S) bc.getService(ref);
    }
  }
}

There! Now this looks like a real programmer article. Let's imagine we want to import the following wicked cool service.

interface Hello {
  void greet(String who);
}

In the BundleActivator.start() we must set up a ServiceHolder.

private ServiceHolder<Hello> helloHolder;

void start(BundleContext bc) {
  helloHolder = new ServiceHolder<Hello>(Hello.class, bc);
  helloHolder.open();
}

We then propagate the holder inside our bundle to all the places where the service is needed. At each site where in a non-dynamic app we would call the service we instead place the following code:

synchronized (helloHolder) {
  helloHolder.get().greet("Todor");
}

The synchronized wrapper is required even with this one-liner because it is the only way to make sure the service object won't become invalid right after the get() call returns and just before the greet() call begins. Needless to say this is painful and ugly. But it's the only way to be correct. Or is it?

If you squint you will see that we have actually coded the guts of a thread-safe proxy. Let's complete the proxy by wrapping our holder in the original service interface:

class HelloProxy implements Hello {
  private final ServiceHolder<Hello> delegate;

  public HelloProxy(ServiceHolder<Hello> delegate) {
    this.delegate = delegate;
  }

  public void greet(String who) {
    synchronized (delegate) {
      delegate.get().greet(who);
    }
  }
}

Now we can create the HelloProxy in the activator and use it everywhere through Hello-typed references as if it was the original service. Except now we can store the "service" in final fields and pass it to constructors. Combine this with the rest of the Lazy doctrine and we get a clean separation between the dynamics handling boilerplate (locked in proxies and the activator) and the business logic code. Also the business code now looks just like a regular non-dynamic Java program. Cool! Except coding such proxies can get very tedious in the real world where we use many services with a lot more than one method. Fortunately such proxy generation is quite easy to code as a library or, even better, as an active Service Layer Runtime bundle.
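
For a taste of how such a library might work (a sketch only - real SLRs use bytecode generation), the JDK's java.lang.reflect.Proxy is enough to generalize HelloProxy for any interface, reusing the ServiceHolder from above:

import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

class ServiceProxies {
  /* Wraps a ServiceHolder in a dynamic proxy implementing the service interface. */
  @SuppressWarnings("unchecked")
  static <S> S proxy(Class<S> type, final ServiceHolder<S> holder) {
    return (S) Proxy.newProxyInstance(type.getClassLoader(), new Class<?>[] { type },
        new InvocationHandler() {
          public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
            /* Same trick as HelloProxy: hold the lock across get() and the call. */
            synchronized (holder) {
              try {
                return method.invoke(holder.get(), args);
              } catch (InvocationTargetException e) {
                throw e.getCause(); /* unwrap the service's own exception */
              }
            }
          }
        });
  }
}

With this one helper the activator shrinks to a single Hello hello = ServiceProxies.proxy(Hello.class, helloHolder) per import and the rest of the bundle never sees the holder.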

Before I explain how to sort out this last issue I must make an important observation: the eager and lazy models are not mutually exclusive. As the code above illustrates, in the core of every lazy bundle runs tracking and reaction code similar to the code that would support an eager bundle. The lazy bundle wraps this tracking core with a stable layer of proxies that shield the application code (and its control flow) from all the movement happening below. Still if you really need it usually you can plug code into the lower layer and have a hybrid eager(pre-proxy)/lazy(post-proxy) bundle. For example the eager part can do decorations or even complete transformations to the services before they are wrapped in proxies and passed to the lazy part. So if we exclude the dynamic service flickering the lazy model is really a natural evolution of the eager model to a higher level of abstraction.

Service Layer Runtimes

With OSGi 4.0 it became possible to implement a special type of bundle that can drive the service interactions of other bundles. I call these Service Layer Runtimes (or SLR) because they hide the raw OSGi service layer from the bundles they manage. Although SLRs come in all shapes and sizes they inevitably include a dependency injection component. This is because DI is a natural match for services, which typically enter the bundle from a single point like the activator and need to be propagated all over the bundle internals. Doing this manually is tedious (for example it might require you to build chains of setters called "fire brigades", or worse - use statics). Delegating this task to DI is a huge relief.

Peaberry

I will start with my personal favorite. It is pure Java and is developed as an extension to the sexy Guice framework, which means it is lightweight, powerful and XML free. Peaberry steers the user towards the lazy model and in fact I came up with the idea when thinking about how to use the framework most effectively. Using it feels largely like using pure Guice. All you need to do to get a service proxy is to bind the interface of the service to a special provider implemented by Peaberry:

bind(Hello.class).toProvider(service(Hello.class).single());

The proxies are then generated on the fly using ASM. From there normal Guice takes over and injects them as it would any other object. Code written in this way looks a lot like plain old Java SE, with dynamic proxies practically indistinguishable from local objects. Peaberry has many more features including ways to filter services, import all services of a given type as an Iterable, hook code to the dynamic tracking layer below, and apply decorations to services before they are wrapped in proxies. Finally Peaberry is service registry agnostic and allows you to seamlessly mix services from different sources - for example objects from the Eclipse registry can be mixed transparently with OSGi services.
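
To give a feel for the whole package, here is a sketch of a minimal client bundle. I am writing osgiModule() and the import locations from memory of the Peaberry examples, so treat them as assumptions to verify against the current docs:

import static org.ops4j.peaberry.Peaberry.osgiModule; /* assumed helper */
import static org.ops4j.peaberry.Peaberry.service;

import org.osgi.framework.BundleActivator;
import org.osgi.framework.BundleContext;

import com.google.inject.AbstractModule;
import com.google.inject.Guice;
import com.google.inject.Injector;

public class HelloClientActivator implements BundleActivator {
  private Injector injector;

  public void start(BundleContext bc) {
    injector = Guice.createInjector(osgiModule(bc), new AbstractModule() {
      @Override
      protected void configure() {
        /* Hello is bound to a dynamic service proxy generated by Peaberry. */
        bind(Hello.class).toProvider(service(Hello.class).single());
      }
    });
    /* Business objects now receive Hello via plain Guice injection; the proxy
       fails fast with a runtime exception if the backing service is missing. */
  }

  public void stop(BundleContext bc) {
    injector = null;
  }
}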

Alas Peaberry is not yet perfect. One area where it lags behind the other SLRs is dynamic configuration. The user has to use the ConfigurationAdmin service directly to change a configuration, or expose classes as ManagedService to receive dynamic configuration. The other area is the lack of an extender - the user still has to code a minimalistic BundleActivator to set up the Guice Injector. The good news is that Peaberry is currently under active development and these gaps are sure to be plugged soon.

Spring Dynamic Modules

A decent choice, which also supports the lazy model. Again the service proxies are generated transparently for the user. Spring DM relies on the Spring component model to do dependency injection. Although it seems to provide more features than Peaberry it feels much more heavyweight to use.

Declarative Services

This is the only SLR standardized by OSGi. It tries to solve the dynamics problem with traditional Java means. It is high level in that it has a component model. It is low level in that it exposes the components to more of the service dynamics (no proxies). A dependency can be defined as either "dynamic" or "static".

For dynamic dependencies a component must provide a pair of bind()/unbind() callbacks for each service dependency. OSGi DS will do the tracking and call the respective callback. The component then takes over and performs all the swapping and synchronization on its own. In this case OSGi DS saves the developer only the tracking code.
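
A rough sketch of such a component (the XML descriptor that declares bindHello()/unbindHello() as reference callbacks is omitted, and all names are hypothetical):

/* Hypothetical DS component with a "dynamic" Hello dependency. */
public class GreeterComponent {
  private Hello hello;

  /* Called by the DS runtime when a Hello service appears. */
  protected synchronized void bindHello(Hello hello) {
    this.hello = hello;
  }

  /* Called by the DS runtime when the bound Hello service goes away. */
  protected synchronized void unbindHello(Hello hello) {
    if (this.hello == hello) {
      this.hello = null;
    }
  }

  /* Business method: the component does its own swapping and fail-fast. */
  public synchronized void greetAll() {
    if (hello == null) {
      throw new IllegalStateException("Hello is gone");
    }
    hello.greet("everybody");
  }
}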

By default a dependency is "static" and the component only needs to provide bind() methods. Now the component does not have to worry about synchronization or release of the service object. Instead OSGi DS will re-create the entire component whenever the dependency changes. Alas this is the only way to make sure the old service object is released.

OSGi DS follows the eager service export policy: if a component is exposed as a service and some of its required dependencies go away the component is unregistered. The consequence is also the cascading deactivation of dependent components. As we saw this lifecycle scheme cannot prevent exceptions caused by failing transitive dependencies. The user code must still have proper error handling in place.

OSGi DS also supports the non-dependency-injection "lookup" style where your components receive a ComponentContext from which to pull out services.

iPojo

Architecturally iPojo seems to be "OSGi DS but done right". The OSGi DS heritage is all over the place. Here as well we deploy components as bundles with each component having its dependencies managed by a central bundle. As with OSGi DS the components can be exposed as services to the OSGi registry. However apart from this iPojo departs from OSGi DS in very important ways. Most importantly it is highly modular, allowing a component to specify a different pluggable handler for each dependency. For this reason it is hard to place iPojo firmly in either the Eager or the Lazy bucket as there are handlers that implement the service proxying behavior. Also the cascading deactivation of components is configurable. The handler magic is added via bytecode weaving, which, if you can get over the extra build step, pays off when deploying on resource constrained devices.

iPojo integrates the OSGi ConfigAdmin beautifully by establishing a 1-to-1 relationship between a component and its configuration. If you create a new configuration copy, this leads to the creation of a new component instance, and vice versa.

All in all iPojo is an interesting proposition developed solely with the OSGi environment in mind. It is definitely worth spending the time to explore. I would recommend iPojo over Spring DM as it feels simpler, cleaner and is more performant.

Update

Since this came out I have been working on refining it. There have been some important revelations worth checking out.