Wednesday, August 19, 2009

Stateful vs Stateless

In this entry I will cautiously venture into the thorny territory of stateless-vs-stateful. For quite some time I have been frying my brain in attempts to introduce some structure into this space. The goal is to define patterns and practices about working with state under OSGi. Generally under OSGi bundles communicate with services so I will do my musings from this perspective.

Hotswapping and the Service Use Spike

"Why not simply use state as we like?" you might ask. As is common with OSGi the short answer is "Because of dynamics!". The longer answer is that under OSGi a bundle can be updated at runtime and this will kill any state accumulated inside the bundle. Ideally we want client bundles to perceive this event as a hotswap of all services the updated bundle exports. I will define a service as being capable of hotswap as

The ability for callers to swap the live service object and experience a timing hiccup as the only side effect.

Using services under OSGi always requires us to go through the same steps:

  1. get
    Service object is obtained.
  2. use
    Service object is called one or more times.
  3. release
    Service object is released.

To reflect the peak of activity at the use step I decided to call this sequence the service use spike. The spike "rises" in the get step and "falls" in the release step. You can imagine these spikes drawn on a time axis as the pulse of some odd creature. Obviously a service can be hotswapped safely between the use spikes. The event of a bundle update can be represented as a vertical line cutting through the time axis. Because an update can happen at any time, inevitably it will cut through some use spikes. The type of service determines how the bundles that own the interrupted spikes deal with the failure.
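The get, use, release sequence can be sketched in plain Java. This is only an illustration: ServiceRegistry is a hypothetical stand-in for the OSGi service layer (inside a real bundle the get and release steps would go through BundleContext.getService() and BundleContext.ungetService()), and openSpikes() is an invented helper that shows when a hotswap would fall between spikes.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical stand-in for the OSGi service layer; not a real OSGi API. */
final class ServiceRegistry<S> {
  private final S service;
  private final AtomicInteger inUse = new AtomicInteger();

  ServiceRegistry(S service) {
    this.service = service;
  }

  /** Step 1: get. The spike "rises". */
  S get() {
    inUse.incrementAndGet();
    return service;
  }

  /** Step 3: release. The spike "falls". */
  void release() {
    inUse.decrementAndGet();
  }

  /** Number of use spikes currently open; a swap is safe when this is 0. */
  int openSpikes() {
    return inUse.get();
  }

  /** Runs one complete use spike: get, use, release. */
  <R> R useSpike(Function<S, R> use) {
    S s = get();
    try {
      return use.apply(s); // Step 2: use, one or more calls
    } finally {
      release(); // always release, even if the call fails
    }
  }
}
```

Wrapping the three steps in one method keeps the release in a finally block, so a failing call cannot leave a spike open.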

Stateless Services

These are the "classic" services. The result of a call to a stateless service depends only on the passed parameters, i.e. it is a direct function of the parameters. Note that it is no problem for the results to include side effects, including database modifications. The single and rather big constraint that makes a service stateless is...

A service qualifies as stateless if it does not store unrecoverable data in the objects comprising its implementation.

Here unrecoverable basically means data kept in memory rather than externalized into some data store. I.e. once the service is unregistered the data is lost to all clients. This definition permits stateless services to build up state as long as they remain indistinguishable from a service that calculates everything at every method call. I.e. we permit intermediate computation caching to improve performance.

Logically the use spike of stateless services spans a single method call. Therefore to retry a use spike we need only the information about this call. In most cases this contextual information is available on the stack of the current thread. E.g. the thread is at a point where it tries to call out to the service and discovers it is not available. It is then free to retry the call against an alternative service or, if none are present, wait until one appears. This type of behavior is even supported by the OSGi standard service tracking utility.
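A minimal sketch of such a retry, assuming the lookup is modelled as a Supplier that returns an empty Optional while no service is present. StatelessCaller, the lookup parameter and the polling loop are all hypothetical; a real bundle would more likely block on the service tracker than poll.

```java
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Supplier;

/** Retries a single-call use spike until some service instance shows up. */
final class StatelessCaller {
  /**
   * @param lookup hypothetical service lookup; empty means "not available right now"
   * @param call the single stateless call; all context needed to repeat it is captured here
   * @param maxAttempts how many lookups to try before giving up
   * @param waitMillis pause between attempts
   */
  static <S, R> R callWithRetry(Supplier<Optional<S>> lookup, Function<S, R> call,
      int maxAttempts, long waitMillis) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      Optional<S> service = lookup.get();
      if (service.isPresent()) {
        // Any available instance will do: the result depends only on the parameters.
        return call.apply(service.get());
      }
      if (attempt < maxAttempts) {
        try {
          Thread.sleep(waitMillis);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          throw new IllegalStateException("Interrupted while waiting for the service", e);
        }
      }
    }
    throw new IllegalStateException("Service not available after " + maxAttempts + " attempts");
  }
}
```

The retry needs nothing beyond the lookup and the call itself, which is exactly why it only works for stateless services.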

For performance reasons the use spike of stateless services can be extended to span multiple method calls. This can be done by serving the get step from a cache that in turn obtains the services from the dynamic environment and holds on to them for a while. The release step is then postponed and the use spike is artificially extended. The net result: we do far fewer get and release steps than use steps and performance goes up.

Stateless services are used simultaneously by multiple clients and virtually always need to deal with issues of concurrent access.

Stateful Services

A result from a call to a stateful service depends on previous calls. Here two subsequent calls with identical parameters can yield different results. You can think of these as services with history where every call mutates some internal state and thus influences the outcome of subsequent calls. Because the correctness of our program depends on the accumulated history we care dearly to call the exact same object every time. This means that hotswapping stateful services is in the general case impossible.

Obviously the use spike of stateful services spans many calls, because the user must hold on to a service object as state is accumulated with every subsequent call. If the spike is interrupted at the N-th call to do a retry we need to playback all (N-1) calls leading up to the failure. It is impractical or downright impossible to keep such a call log. For this reason we can not hotswap stateful services. I.e. an update to a stateful service can not go unnoticed. We treat the disappearing of a stateful service as a catastrophic event. I.e. we throw an exception, unroll the stack up to a fault barrier, cleanup any private state we have associated with the stateful service, do contingency reactions.
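The catastrophic-failure reaction can be sketched as follows. All names here (ServiceGoneException, Accumulator, FaultBarrierDemo) are hypothetical, and the unregister() call merely simulates a bundle update cutting through the use spike.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical exception signalling that a stateful service vanished mid-spike. */
final class ServiceGoneException extends RuntimeException {
  ServiceGoneException(String message) {
    super(message);
  }
}

/** Hypothetical stateful service: every result depends on the calls before it. */
final class Accumulator {
  private int total;
  private boolean gone;

  int add(int value) {
    if (gone) {
      throw new ServiceGoneException("accumulator was unregistered");
    }
    total += value;
    return total;
  }

  /** Simulates the exporting bundle being updated. */
  void unregister() {
    gone = true;
  }
}

final class FaultBarrierDemo {
  /** Runs a multi-call use spike and reports how it ended. */
  static String run(Accumulator service, int[] values, int failBeforeIndex) {
    List<Integer> privateState = new ArrayList<>(); // our half of the state
    try {
      for (int i = 0; i < values.length; i++) {
        if (i == failBeforeIndex) {
          service.unregister();
        }
        privateState.add(service.add(values[i]));
      }
      return "completed " + privateState;
    } catch (ServiceGoneException e) {
      // Fault barrier: we do not try to replay the earlier calls.
      int lost = privateState.size();
      privateState.clear(); // clean up state tied to the dead service
      return "aborted after " + lost + " calls";
    }
  }
}
```

Note that the barrier throws the partial results away instead of attempting to replay them against a fresh instance.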

Stateful services can serve only one client at a time and are often accessed sequentially by that client. So we can expect fewer concurrency issues here.

Because each client needs its own copy of the service we need a factory mechanism via which clients can produce new instances. The default OSGi factory mechanism is too restrictive. It caches service instances on a per-bundle basis and does not allow parameters to be passed at construction time. For this reason stateful services often come with a supporting stateless factory service. This service has one or more factory methods that clients call to obtain the "real" stateful service objects. This approach has the drawback that the OSGi container no longer has visibility over these secondary services. As a consequence we count on the clients to not forget to call a close()/dispose() method on the stateful object when they are done with it. OSGi definitely needs an improved factory mechanism if it is to support stateful services as equal citizens. It must support construction parameters and relinquish the caching policy into the hands of the exporting bundle. This is a theme for a future post.
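As an illustration of the factory approach, here is a minimal sketch in which a stateless ConnectionFactory hands out stateful Connection objects. Both types, the connect() parameters and the openConnections() counter are assumptions made for the example; only the factory itself would be registered as an OSGi service.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stateful service object; one instance per client. */
interface Connection extends AutoCloseable {
  void write(String data);

  String written();

  @Override
  void close(); // narrowed: no checked exception
}

/** Hypothetical stateless factory service. */
final class ConnectionFactory {
  private final AtomicInteger open = new AtomicInteger();

  /** Factory method with construction parameters. */
  Connection connect(final String host, final int port) {
    open.incrementAndGet();
    return new Connection() {
      private final StringBuilder buffer = new StringBuilder();
      private boolean closed;

      @Override
      public void write(String data) {
        if (closed) {
          throw new IllegalStateException("connection is closed");
        }
        buffer.append(data); // state accumulates with every call
      }

      @Override
      public String written() {
        return host + ":" + port + " <- " + buffer;
      }

      @Override
      public void close() {
        if (!closed) {
          closed = true;
          open.decrementAndGet();
        }
      }
    };
  }

  /** The container cannot see these objects, so the factory has to count them itself. */
  int openConnections() {
    return open.get();
  }
}
```

Making the stateful object AutoCloseable lets clients use try-with-resources, which reduces the risk of a forgotten close() but does not remove it.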

Once we have state we must worry about mutating it in a correct sequence. The beginning and the end of this sequence become particularly interesting when part of the state is stored into an external service. Under OSGi dynamics we can not know when the external service will come and go. So how can we keep our half of the state in lockstep with the dynamic half? This is how we arrive at the notion of a lifecycle and to the problem of how and when to synchronize the lifecycles of dependent "lumps of state". In short:

As soon as we add even one bit of mutable state (e.g. an availability flag) to our service, we also need to add lifecycle to manage that bit.

A service interface plus a lifecycle is one common definition of a component. And here you have it! Managing stateful services leads to the need for a strong component model. I will analyze the further repercussions of this in future posts.

Conclusion

State is a fact of life in a general purpose programming setting. It is simply not convenient or practical to always externalize the state out of our service so we make it stateless. One example is a bundle that implements a network stack. The clients of this stack need to use Connection objects directly. It would be an atrocity to force them to hold on to a ConnectionData object and pass it to a stateless ConnectionService every time they want to read or write.

Herein lies the dilemma of state: an explicit separation between code (locked in stateless services) and data (locked in DTOs) permits a scalable, composable, hotswappable design. All nice features of a functional-like programming style. At the same time this flies in the face of conventional OO design. This stark switch in style between design in the small (inside the bundle) and design in the large (composing bundles into systems) can be a source of more problems than just going ahead and using state where it makes sense.

In this post I tried to identify the problems caused by state. The idea was to invent some reference framework to think about this stuff so I can get to real solutions in later posts. If OSGi is to become the universal JVM middleware it aspires to be we need clear practices and frameworks to deal with state. Even if the ultimate answer turns out to be "Just don't use stateful services!" the exploration leading up to that is worth it.

Friday, August 14, 2009

Classload acrobatics: code generation under OSGi

In a previous blog I mentioned that the hardest problem we face when porting existing Java infrastructure to OSGi has to do with class loading. This blog is dedicated to the AOP wrappers, ORM mappers and similar code generation engines that face the harshest issues in this area. I will gradually introduce the main problem, present the best current solution, and develop a tiny bit of code that implements it. This blog comes with a working demo project that contains not only the code presented here, but also two ASM-based code generators you can play with.

Classload site conversion

Usually porting a Java framework to OSGi requires it to be refactored to the extender pattern. This pattern allows the framework to delegate all class loading to OSGi and at the same time retain control over the lifecycle of application code. The goal of the conversion is to replace things like

Class appClass = Class.forName("com.acme.devices.SinisterEngine");
...
ClassLoader appLoader = ...
Class appClass = appLoader.loadClass("com.acme.devices.SinisterEngine");

with

Bundle appBundle = ...
Class appClass = appBundle.loadClass("com.acme.devices.SinisterEngine");

Although we must do a non-trivial piece of work to get OSGi to load the application code for us, we at least have a nice and correct way to get things working. And work they will, even better than before, because now the user can add/remove applications just by installing/uninstalling bundles into the OSGi container. Also users can break up their application into as many bundles as they wish, share libraries between the applications and all that sweet modular stuff.

Adapter ClassLoader

Sometimes the code we convert has externalized its class loading policy. This means the classes and methods of the framework take explicit ClassLoader parameters allowing us to dictate where they load application code from. In this case the conversion to OSGi can become a mere question of adapting a Bundle object to the ClassLoader API. This is done by what I call an adapter ClassLoader.

public class BundleClassLoader extends ClassLoader {
  private final Bundle delegate;

  public BundleClassLoader(Bundle delegate) {
    this.delegate = delegate;
  }

  @Override
  public Class<?> loadClass(String name) throws ClassNotFoundException {
    return delegate.loadClass(name);
  }
}

Now we can pass this adapter to the framework code. We can also add bundle tracking code to create the adapters as new bundles come and go. I.e. we are able to adapt a Java framework to OSGi "externally" avoiding the exhausting browsing through the codebase and the conversion of each individual classload site. Here is a highly schematic sample of some code that converts a framework to use OSGi class loading:

...
Bundle app = ...
BundleClassLoader appLoader = new BundleClassLoader(app);

DeviceSimulationFramework simfw = ...
simfw.simulate("com.acme.devices.SinisterEngine", appLoader);
...

Bridge ClassLoader

The coolest Java frameworks do fancy classworking on client code at runtime. The goal usually is to dynamically build classes out of stuff living in the application class space. Some examples are service proxies, AOP wrappers, and ORM mappers. Let's call these generated classes enhancements. Usually the enhancement implements some application-visible interface or extends an application-visible class. Sometimes additional interfaces and their implementations are mixed in as well.

Enhancements augment application code. I.e. the generated objects are meant to be called directly by the application. For example a service proxy is passed to business code to free it from the need to track a dynamic service. Similarly a wrapper that adds some AOP feature is passed to application code in place of the original object.

Enhancements start life as byte[] blocks produced by your favorite class engineering library (ASM, BCEL, CGLIB, ...). Once we have generated our class we must turn the raw bytes into a Class object. I.e. we must make some ClassLoader call its defineClass() method on our bytes. We have three separate problems to solve:

  • Class space completeness
    First we must determine the class space into which we can define our enhancements. It must "see" enough classes to allow the enhancements to be fully linked.
  • Visibility
    ClassLoader.defineClass() is a protected method. We must find a good way to call it.
  • Class space consistency
    Enhancements mix classes from the extender and the application bundles in a way that is "invisible" to the OSGi container. As a result the enhancements can potentially be exposed to incompatible versions of the same class.

Class space completeness

Enhancements are backed by code private to the Java framework that generates them. Therefore the extender should introduce the new class into its own class space. On the other hand the enhancements implement interfaces or extend classes visible in the application class space. Therefore we should define the enhancement class there. Bummer!

Because there is no class space that sees all classes we require we have no other option but to make a new class space. A class space equals a ClassLoader instance so our first job is to maintain one dedicated ClassLoader on top of every application bundle. These are called bridge ClassLoaders, because they merge two class loaders by chaining them like so:

public class BridgeClassLoader extends ClassLoader {
  private final ClassLoader secondary;

  public BridgeClassLoader(ClassLoader primary, ClassLoader secondary) {
    super(primary);
    this.secondary = secondary;
  }

  @Override
  protected Class<?> findClass(String name) throws ClassNotFoundException {
    return secondary.loadClass(name);
  }
}

Now we can use the BundleClassLoader developed earlier:

  /* Application space */
  Bundle app = ...
  ClassLoader appSpace = new BundleClassLoader(app);

  /*
   * Extender space
   *
   * We assume this code is executed in a non-static method inside the extender
   */
  ClassLoader extSpace = getClass().getClassLoader();

  /* Bridge */
  ClassLoader bridge = new BridgeClassLoader(appSpace, extSpace);

This loader will serve requests first from the application space, and if that fails try the extender space. Notice that we still let OSGi do lots of heavy lifting for us. When we delegate to either class space we are in fact delegating to an OSGi-backed ClassLoader. I.e. the primary and secondary loaders can delegate to other bundle loaders in accordance with the import/export metadata of their respective bundles.
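The search order can be observed outside an OSGi container with plain JDK loaders. The sketch below is only an approximation: an empty loader whose parent is the bootstrap loader plays the application space, the loader of the demo class plays the extender space, and DemoBridgeClassLoader repeats the chaining idea so that the example stays self-contained. All names are hypothetical.

```java
/** Same chaining idea as the bridge loader, repeated to keep the sketch self-contained. */
final class DemoBridgeClassLoader extends ClassLoader {
  private final ClassLoader secondary;

  DemoBridgeClassLoader(ClassLoader primary, ClassLoader secondary) {
    super(primary); // searched first by the inherited loadClass()
    this.secondary = secondary;
  }

  @Override
  protected Class<?> findClass(String name) throws ClassNotFoundException {
    return secondary.loadClass(name); // reached only when the primary fails
  }
}

final class BridgeDemo {
  /** Marker class that only the "secondary" space can see. */
  static final class OnlyInSecondary {}

  static ClassLoader newBridge() {
    // Assumption: a loader with a bootstrap parent stands in for the application space.
    ClassLoader primary = new ClassLoader((ClassLoader) null) {};
    // Assumption: the loader of this class stands in for the extender space.
    ClassLoader secondary = BridgeDemo.class.getClassLoader();
    return new DemoBridgeClassLoader(primary, secondary);
  }

  /** Loads a class through a fresh bridge, converting the checked exception. */
  static Class<?> load(String name) {
    try {
      return newBridge().loadClass(name);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException("Neither space could supply " + name, e);
    }
  }
}
```

Calling BridgeDemo.load("java.lang.String") is served by the primary side, while a request for the OnlyInSecondary marker falls through to findClass() and is answered by the secondary loader with the very same Class object the demo code sees.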

At this point we might be pleased with ourselves - I was for quite some time. The bitter truth however is that the extender and application class spaces combined may not be enough. Everything hinges on the particular way the JVM links classes (also known as resolving classes).

In brief
JVM resolution works on a fine grained or sub-class level.

In detail
When the JVM links a class it does not need the complete descriptions of all classes referenced from the linked class. It only needs information about the individual methods, fields and types that are really used by the linked class. What to our intuition is a monolithic whole to the JVM is a class name, plus a superclass name, plus a set of implemented interfaces, plus a set of method signatures, plus a set of field signatures. All these symbols are resolved independently and lazily. For example to link a method call the class space of the caller needs to supply Class objects only for the target class and for all types used in the method signature. Definitions for the numerous other things that the target class may contain are not needed and the ClassLoader of the calling class will never receive a request for them.

Formally
Class TA from class space SpaceA must be represented by the same Class object in class space SpaceB if and only if:

  • There exists a class TB from SpaceB that refers to TA from its symbol table (known also as the constant pool).
  • The OSGi container has chosen SpaceA as the provider of class TA for SpaceB.

By example
Imagine we have a bundle BndA that exports a class A. Class A has 3 methods distributed between 3 interfaces: IX.methodX(String), IY.methodY(String), IZ.methodZ(String). Imagine further we have a bundle BndB that has a class B. Somewhere in class B there is a reference A a = ... and a method call a.methodY("hello!"). To get class B to resolve we need to introduce into the class space of BndB class A, and class String. That's all! We don't need to import IX or IZ. We don't need to import even IY because class B does not use it - it uses only A. On the other hand when the exporting bundle BndA resolves class A it must supply IX, IY, IZ because they are directly referenced as interfaces implemented by class A. Finally even BndA does not have to supply any of the super-interfaces of IX, IY, IZ because they are not directly referenced from A.

Now let's imagine we want to present to class B an enhanced version of class A. The enhancement needs to extend class A and override some or all of its methods. Because of that the enhancement needs to see the classes used in the signatures of all overridden methods. To supply all required classes BndB must contain code that calls each method we mean to override. Otherwise it will have no reason to import the required classes. It is very likely however that BndB calls only a few of A's methods. Therefore BndB likely does not see enough classes to support the enhancement. The complete set can only be supplied by BndA. Double Bummer!

Turns out that we must bridge not the extender and application spaces but the extender space and the space of the enhanced class. I.e. rather than "bridge per application space" we must shift to a "bridge per enhanced space". I.e. the application really requires us to bridge the class space of some third party class it can see because its bundle imports it. How do we do that transitive leap from the application space to the space the application uses? Simple! As we know every Class object can tell us which is the class space where it is fully defined. For example all we need to do to get the defining class loader of A is to call A.class.getClassLoader(). In many cases however we have a String name rather than a Class object so how do we get A.class to begin with? Simple again! We can ask the application bundle to give us the exact Class object it sees under the name "A". This is a critical step because we need the enhanced and original classes to be interchangeable within the application. Out of potentially many available versions of class A we need to pick the class space of the one used by the application. Here is a schematic of how an extender can maintain a cache of class loader bridges:

...
/* Ask the app to resolve the target class */
Bundle app = ...
Class target = app.loadClass("com.acme.devices.SinisterEngine");

/* Get the defining class loader of the target */
ClassLoader targetSpace = target.getClassLoader();

/* Get the bridge for the class space of the target */
BridgeClassLoaderCache cache = ...
ClassLoader bridge = cache.resolveBridge(targetSpace);

where the bridge cache would look something like

public class BridgeClassLoaderCache {
  private final ClassLoader primary;
  private final Map<ClassLoader, WeakReference<ClassLoader>> cache;

  public BridgeClassLoaderCache(ClassLoader primary) {
    this.primary = primary;
    this.cache = new WeakHashMap<ClassLoader, WeakReference<ClassLoader>>();
  }

  public synchronized ClassLoader resolveBridge(ClassLoader secondary) {
    ClassLoader bridge = null;

    WeakReference<ClassLoader> ref = cache.get(secondary);
    if (ref != null) {
      bridge = ref.get();
    }

    if (bridge == null) {
      bridge = new BridgeClassLoader(primary, secondary);
      cache.put(secondary, new WeakReference<ClassLoader>(bridge));
    }

    return bridge;
  }
}

To prevent memory leaks due to ClassLoader retention I had to use both weak keys and weak values. The goal is to not retain the class space of an uninstalled bundle in memory. I had to use weak values because the value of each map entry strongly references the key, thus negating its weakness. This is the standard advice prescribed by the WeakHashMap javadoc. By using a weak cache I avoid the need to track a whole lot of bundles and do eager reactions to their lifecycles.

Visibility

Okay we finally have our exotic bridge class space. Now how do we define our enhancements in it? The problem as I mentioned is that defineClass() is a protected method of BridgeClassLoader. We could override it with a public method but that would be rude and we would have to code our own checks to see if the requested enhancement has already been defined. Normally defineClass() is called from findClass(), when it determines it can supply the requested class from a binary source. The only information findClass() must rely on to make this decision is the name of the class. So our BridgeClassLoader must think to itself:

This is a request for "A$Enhanced" so I must call the enhancement generator for class "A"! Then I call defineClass() on the produced byte[]. Then I return the new Class object.

There are two remarkable things about that statement.

  • We introduced a text protocol for the names of enhancement classes.
    We can pass to our ClassLoader a single item of data - a String for the name of the requested class. At the same time we need to pass two items of data - the name of the original class and a flag marking it as a subject to enhancement. We pack these two items into a single string of the form
    [name of target class]"$Enhanced"
    Now findClass() can look for the enhancement marker $Enhanced and when it is present extract the name of the target class. In this way we also introduce a convention for the names of our enhancements. Whenever we see a class name ending in $Enhanced in a stack trace we know this is a dynamically generated class. To mitigate the risk of name clashes with normal classes we make the enhancement marker as exotic as Java allows (e.g. $__service_proxy__).
  • Enhancements are generated on demand.
    We will never try to generate an enhancement twice. The loadClass() method we inherited will first call findLoadedClass(), if that fails it will call parent.loadClass(), and only if that fails it will call findClass(). The fact that we use a strict protocol for the names guarantees findLoadedClass() will work the second time we get a request to enhance the same class. Couple this with the caching of bridge ClassLoaders and we get a pretty efficient solution where at no point do we bridge the same bundle space twice or generate redundant enhancement classes.
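The text protocol for enhancement names can be sketched in a few lines. SuffixNamer is a hypothetical helper written for this illustration; it packs the target name and the marker into one string and unpacks them again.

```java
/** Sketch of the naming protocol: target class name plus a marker suffix. */
final class SuffixNamer {
  private final String marker;

  SuffixNamer(String marker) {
    this.marker = marker;
  }

  /** Map a target class name to an enhancement class name. */
  String map(String targetClassName) {
    return targetClassName + marker;
  }

  /** Extract the target class name, or return null if the marker is absent. */
  String unmap(String className) {
    if (className.length() > marker.length() && className.endsWith(marker)) {
      return className.substring(0, className.length() - marker.length());
    }
    return null;
  }
}
```

With the marker "$Enhanced", map("com.acme.A") yields "com.acme.A$Enhanced", unmap() of that string returns "com.acme.A", and unmap("com.acme.A") returns null, which tells findClass() that the request is not for an enhancement.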

Here we must also mention the option to call defineClass() through reflection. This approach is used by cglib. I suppose this is a viable option when we want the user to pass us a ready-to-use ClassLoader. By using reflection we avoid the need to create yet another loader on top of that just so we can access its defineClass() method.

Class space consistency

At the end of the day what we have done is to merge two class spaces that were not explicitly connected through the OSGi modular layer. Also we introduced a search order between those spaces similar to the search order of the evil Java class path. I.e. we have potentially eroded the class space consistency of the OSGi container. Here is a scenario of how bad things can happen:

  1. Extender uses package com.acme.devices and requires exactly version 1.0
  2. Application uses package com.acme.devices and requires exactly version 2.0.
  3. Class A refers directly to com.acme.devices.SinisterDevice.
  4. Class A$Enhanced refers directly to com.acme.devices.SinisterDevice from its internal implementation.
  5. Because we search the application space first A$Enhanced will be linked against com.acme.devices.SinisterDevice version 2.0, while its internal code was compiled against com.acme.devices.SinisterDevice version 1.0.

As a result the application will see mysterious LinkageErrors and/or ClassCastExceptions. Triple Bummer!

Alas there does not yet exist an automated way to handle this problem. We must simply make sure the enhancement code refers directly only to "very private" implementation classes that are not likely to be used by anyone else. We can even build private adapters for any external APIs we might want to use and then refer to those from the enhancement code. Once we have a well defined implementation subspace we can use that knowledge to limit the class leakage. We now delegate to the extender requests only for the special subset of private implementation classes. This will also limit the search order problem allowing us to switch between application-first and extender-first search. One good policy to keep things under control is to have a dedicated package for all enhancement implementations. Then we only check for classes whose name begins with that package. Finally we sometimes need to judiciously relax this isolation policy for certain singleton packages like org.osgi.framework. I.e. we can feel pretty safe to compile our enhancement code directly against org.osgi.framework because at runtime everyone in the OSGi container will see the same org.osgi.framework - it is supplied by the OSGi core.
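One way to express the dedicated-package policy is a small name filter. This is a sketch under assumptions: PrivateSpaceFilter and the example package names are hypothetical, and matching shared packages by exact package name is just one possible choice.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Decides which class names may be served from the extender's private space. */
final class PrivateSpaceFilter {
  private final String internalPrefix;
  private final Set<String> sharedPackages;

  /**
   * @param internalPackage dedicated package holding all enhancement implementation classes
   * @param sharedPackages singleton packages that are safe to share, e.g. org.osgi.framework
   */
  PrivateSpaceFilter(String internalPackage, String... sharedPackages) {
    this.internalPrefix = internalPackage + ".";
    this.sharedPackages = new HashSet<>(Arrays.asList(sharedPackages));
  }

  boolean isInternal(String className) {
    // Anything in the dedicated package (or its subpackages) is private implementation.
    if (className.startsWith(internalPrefix)) {
      return true;
    }
    // Shared packages are matched exactly, so their subpackages stay isolated.
    int lastDot = className.lastIndexOf('.');
    String pkg = lastDot < 0 ? "" : className.substring(0, lastDot);
    return sharedPackages.contains(pkg);
  }
}
```

Every name the filter rejects is never requested from the extender space, which keeps the leak between the two class spaces as narrow as the policy describes.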

Putting it all together

Everything from this class loading saga can be distilled in the following ~100 lines of code.

public class Enhancer {
  private final ClassLoader privateSpace;
  private final Namer namer;
  private final Generator generator;
  private final Map<ClassLoader, WeakReference<ClassLoader>> cache;

  public Enhancer(ClassLoader privateSpace, Namer namer, Generator generator) {
    this.privateSpace = privateSpace;
    this.namer = namer;
    this.generator = generator;
    this.cache = new WeakHashMap<ClassLoader, WeakReference<ClassLoader>>();
  }

  @SuppressWarnings("unchecked")
  public <T> Class<T> enhance(Class<T> target) throws ClassNotFoundException {
    ClassLoader context = resolveBridge(target.getClassLoader());
    String name = namer.map(target.getName());
    return (Class<T>) context.loadClass(name);
  }

  private synchronized ClassLoader resolveBridge(ClassLoader targetSpace) {
    ClassLoader bridge = null;

    WeakReference<ClassLoader> ref = cache.get(targetSpace);
    if (ref != null) {
      bridge = ref.get();
    }

    if (bridge == null) {
      bridge = makeBridge(targetSpace);
      cache.put(targetSpace, new WeakReference<ClassLoader>(bridge));
    }

    return bridge;
  }

  private ClassLoader makeBridge(ClassLoader targetSpace) {
    /* Use the target space as a parent to be searched first */ 
    return new ClassLoader(targetSpace) {
      @Override
      protected Class<?> findClass(String name) throws ClassNotFoundException {
        /* Is this used privately by the enhancements? */
        if (generator.isInternal(name)) {
          return privateSpace.loadClass(name);
        }

        /* Is this a request for enhancement? */
        String unpacked = namer.unmap(name);
        if (unpacked != null) {
          byte[] raw = generator.generate(unpacked, name, this);
          return defineClass(name, raw, 0, raw.length);
        }

        /* Ask someone else */
        throw new ClassNotFoundException(name);
      }
    };
  }
}

public interface Namer {
  /** Map a target class name to an enhancement class name. */
  String map(String targetClassName);

  /** Try to extract a target class name or return null. */
  String unmap(String className);
}

public interface Generator {
  /** Test if this is a private implementation class. */
  boolean isInternal(String className);

  /** Generate enhancement bytes */
  byte[] generate(String inputClassName, String outputClassName, ClassLoader context);
}

Enhancer captures only the bridging pattern. I have externalized the code generation logic into a pluggable Generator. The generator receives a context ClassLoader from where it can pull classes and use reflection on them to drive the code generation. The text protocol for the enhancement class names is also pluggable via the Namer interface. Here is a final schematic code for how such an enhancement framework can be used:

...
/* Setup the Enhancer on top of the current class space */
ClassLoader privateSpace = getClass().getClassLoader();
Namer namer = ...;
Generator generator = ...;
Enhancer enhancer = new Enhancer(privateSpace, namer, generator);
...

/* Enhance some class the app sees */
Bundle app = ...
Class target = app.loadClass("com.acme.devices.SinisterEngine");
Class<SinisterDevice> enhanced = enhancer.enhance(target);
...

The Enhancer framework presented above is more than pseudocode. In fact during the research for this blog I really built it and tested it with two separate code generators mixed into the same OSGi container. The result was too fun to keep to myself so I put it up on Google Code for everyone to play with:

Enhancer

Those interested in the class generation process itself can examine the two demo ASM-based generators. Those who read the InfoQ article on service dynamics may notice that the proxy generator uses as private implementation the ServiceHolder code presented there. I try to put my code where my mouth is.

Conclusion

The classload acrobatics presented here are used in a number of infrastructural frameworks under OSGi. Classload bridging is used by Guice, Peaberry and Spring Dynamic Modules to get their AOP wrappers and service proxies to work. I have seen the classload adapter used on EclipseLink JPA to get it working on a non-Equinox OSGi container. When I hear the Spring guys say they did serious work on Tomcat to adapt it to OSGi I imagine they had to do classload site conversion or a more serious refactor to externalize Tomcat's servlet class loading altogether.

Acknowledgements

Many of the lessons in this blog were extracted from the excellent code Stuart McCulloch wrote for Guice and Peaberry. For examples of industrial strength classload bridging look here and here. There you will see how to handle some additional aspects like security, the system class loader, better lazy caching, and concurrency. Thank you Stuart!

I am also obliged to this article by Peter Kriens. Before I read it I thought I had OSGi class loading figured out. I hope my meticulous explanations on JVM linking will be a useful contribution to Peter's work. Thank you Peter!

Tuesday, August 04, 2009

Published on InfoQ

Some time ago I authored an article for InfoQ on the topic of service dynamics. I am proud to say my creation was finally published. The article is a refined fusion between two of my previous blogs with some added content. I write this entry for the people who have read those previous blogs. After all I have to cater to the few readers I have accumulated so far if I want to stand a chance of getting more people to comment on my ideas. So...

Precious readers of mine! Do not bother reading the entire InfoQ article - you will be disappointed to discover mostly rehashed old stuff. Instead jump straight to the Fighting Code Distortion section, where the most important new content is located.