Wednesday, September 14, 2005

Could not find server in ObjectManager list

You can register your DataSnap appserver as stateless by calling RegisterPooled in the overridden UpdateRegistry class method of your remote data module. The problem with the way httpsrvr.dll manages stateless objects is that, ehm, it's stateful. ;-) More precisely, it's stateless only within a single IIS session.

The Object Manager uses a stringlist of class IDs, and each item contains a list of instances of that class. Here is a brief summary of what happens between the client (TWebConnection) and the server (httpsrvr) for stateless/pooled appservers:
  1. The client sends an asCreateObject request with the appserver's CLSID to the server (see TStreamedConnection.DoConnect).
  2. The server checks its list of CLSIDs (creating a new item if it does not exist) and returns the item's index within the list to the client. No instance of the class is created at this point.
  3. The client stores the returned integer value which is used to identify the class in subsequent asInvoke (method call) requests.
  4. The client sends an asInvoke request to execute a method remotely on the appserver. (The communication between a TClientDataSet and a TProvider also works via remote method calls; the relevant methods are defined in the IAppServer interface, which is implemented by all TRemoteDataModule descendants.)
  5. The server checks its list of CLSIDs, finds the entry and checks its instance pool for any unlocked (i.e. not currently processing a request) instance. If no idle instance can be found at the moment, it will create a new one, up to the maximum size of the pool you specified in your RegisterPooled call. (If all instances are locked and the pool is already at its maximum size, the server will return a "Server is busy" error.)
  6. The client may send additional asInvoke requests as needed. The server dispatches the calls to any currently idle instance in the appropriate pool. Hence, your appserver logic must be stateless, i.e. it cannot assume that any context is preserved between the calls, since they may be executed by different instances.
  7. Cleanup on the server is performed by a separate garbage collector thread which releases any unused instances after a specified timeout. Even if no instance is any longer active (the pool size drops to zero), the item in the list of CLSIDs remains unchanged so clients may issue additional requests; these will be served by instances created on demand.
  8. When the client is finished with the appserver, it will send an asFreeObject request (see TDataDispatch.Destroy which is called due to reference counting). If the appserver has been registered as stateless, the server does not release any instance at this point - rather, the instances are kept in the pool, ready to serve additional requests or eventually be released by the garbage collector.
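The steps above can be sketched as a toy model. This is not DataSnap code: the class name ObjectManager, its methods, the dictionary used for an instance and the placeholder CLSID string are all made up for illustration, and the garbage collector from step 7 is left out. It only shows the bookkeeping: the token is simply a position in the list of CLSIDs, and instances are created on demand up to the pool limit.

```python
from dataclasses import dataclass, field


class ServerBusy(Exception):
    """Raised when every pooled instance is locked and the pool is full."""


@dataclass
class Pool:
    max_size: int
    instances: list = field(default_factory=list)  # each item: {"locked": bool}


class ObjectManager:
    """Hypothetical stand-in for the server-side list of CLSIDs."""

    def __init__(self, max_size=2):
        self.max_size = max_size
        self.clsids = []  # the position in this list is the token
        self.pools = []   # pools[i] belongs to clsids[i]

    def create_object(self, clsid):
        # Step 2: add the CLSID on first sight and return its index.
        # No instance is created here.
        if clsid not in self.clsids:
            self.clsids.append(clsid)
            self.pools.append(Pool(self.max_size))
        return self.clsids.index(clsid)

    def lock_instance(self, token):
        # Step 5: reuse an idle instance, or grow the pool up to max_size.
        pool = self.pools[token]
        for inst in pool.instances:
            if not inst["locked"]:
                inst["locked"] = True
                return inst
        if len(pool.instances) >= pool.max_size:
            raise ServerBusy("Server is busy")
        inst = {"locked": True}
        pool.instances.append(inst)
        return inst

    def free_object(self, token):
        # Step 8: for a pooled appserver nothing is released at this point.
        pass


manager = ObjectManager(max_size=2)
token = manager.create_object("{CLSID-A}")  # placeholder, not a real GUID
first = manager.lock_instance(token)
second = manager.lock_instance(token)
try:
    manager.lock_instance(token)  # both instances locked, pool is full
    busy = False
except ServerBusy:
    busy = True
first["locked"] = False           # the first call has finished
reused = manager.lock_instance(token)
```

Note that the client only ever holds the integer token; everything else lives in the server's memory.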

But what happens if you restart IIS at any point after step 3? Sure, upon receiving another HTTP request with a relevant URL, IIS will load and run httpsrvr.dll again, but the Object Manager's list of CLSIDs is empty after a fresh start; the integer value sent by the client makes no sense since it cannot be resolved to a CLSID. This is when the server sends the "Could not find server in ObjectManager list" error back to the client. In an even worse case, if other clients have sent requests in the meantime, the list indexes may be different, so the call will be dispatched to an instance of a different class, which will probably result in some COM exception because of the (very probable) typeinfo mismatch. (In a yet worse scenario, think about what happens when the method dispatch IDs and signatures of the two different appservers match by coincidence. ;-)) In either case, from the client's point of view, important state information (the association between the CLSID and its token) is lost.
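Both failure modes are easy to reproduce in a toy model. Again, this is a sketch and not the httpsrvr source: the class, the resolve method and the placeholder CLSID strings are hypothetical, and only the error text is taken from the real message.

```python
class ObjectManager:
    """Hypothetical model: tokens are indexes into an on-demand list."""

    def __init__(self):
        self.clsids = []

    def create_object(self, clsid):
        if clsid not in self.clsids:
            self.clsids.append(clsid)
        return self.clsids.index(clsid)

    def resolve(self, token):
        if token >= len(self.clsids):
            raise LookupError("Could not find server in ObjectManager list")
        return self.clsids[token]


# First IIS session: a client connects to two appservers.
before = ObjectManager()
token_a = before.create_object("{CLSID-A}")  # token 0
token_b = before.create_object("{CLSID-B}")  # token 1

# IIS restarts: the new session starts with an empty list.
after = ObjectManager()
try:
    after.resolve(token_b)
    lost = False
except LookupError:
    lost = True

# Another client connects to appserver B before our client calls again.
after.create_object("{CLSID-B}")
wrong = after.resolve(token_a)  # our token for A now points at B
```

The second case is the nasty one: no error is raised by the lookup itself, the call just lands on the wrong class.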

So how can this problem be solved? I can imagine two options:

  1. Modify the DataSnap code so it never uses the tokens. Let the client pack the CLSID with every call. The disadvantages of this approach are:
    • GUIDs take 16 bytes, as opposed to 4 bytes for an integer.
    • All your client executables out there will become obsolete/incompatible with your appservers; you'll have to recompile and redistribute them.
  2. Modify httpsrvr.dll so it ensures that the token values don't change from one IIS session to another. This can be done by preloading the list during initialization rather than on demand, so the index values do not depend on the order in which clients send requests.

I have chosen the latter option. To preload the list of registered appservers, I use the (slightly modified) GetMIDASAppServerList procedure from the MConnect unit. The thousands of already distributed client applications will continue working as before; in addition, I can restart my web server between their calls without interrupting their work.
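The idea behind the preloading fix, reduced to a sketch. The functions preload and token_for are hypothetical, and sorting the CLSIDs is my assumption here for getting a deterministic order; the actual change relies on whatever order the modified GetMIDASAppServerList produces. Either way, the tokens presumably stay valid only as long as the set of registered appservers does not change between sessions.

```python
def preload(registered_clsids):
    """Build the full CLSID list up front, in a deterministic order."""
    # Assumption: normalise case and sort so that the order never depends
    # on how the registered servers happen to be enumerated.
    return sorted({clsid.upper() for clsid in registered_clsids})


def token_for(clsids, clsid):
    """Return the token (list index) for an already preloaded CLSID."""
    return clsids.index(clsid.upper())


# Two IIS sessions that see the same registered appservers,
# enumerated in a different order.
session1 = preload(["{CLSID-B}", "{CLSID-A}"])
session2 = preload(["{CLSID-A}", "{CLSID-B}"])

# A token handed out in session 1 still resolves to the same CLSID
# in session 2, no matter which client connects first.
stable = token_for(session1, "{CLSID-B}") == token_for(session2, "{CLSID-B}")
```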