Schema Replication Notes.  16th January 2004


Paul Taylor presented a description of how schema replication for R-GMA may be implemented based on the Bully algorithm. He explained the principle of the
bully algorithm as applied here: there are multiple schemas, and the one that has the
lowest ID is the master. A master is elected each time a new schema joins, or
when the current master fails to respond.
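
The election rule described above can be sketched as follows. This is a minimal illustration only: it shows the outcome of an election (the lowest ID among the instances that respond), not the message exchange of the Bully algorithm itself. The function name elect_master and the is_alive callback are assumptions made for this sketch and are not part of R-GMA.

```python
def elect_master(instance_ids, is_alive):
    """Return the lowest ID among the responsive instances, or None if none respond.

    instance_ids: iterable of integer schema instance IDs (hypothetical).
    is_alive:     callback returning True if the instance with that ID responds.
    """
    alive = [i for i in instance_ids if is_alive(i)]
    return min(alive) if alive else None


ids = [3, 1, 7]

# All instances respond, so the lowest ID wins the election.
print(elect_master(ids, lambda i: True))

# Instance 1 has stopped responding, so a new election picks the next lowest ID.
print(elect_master(ids, lambda i: i != 1))
```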

This prompted various discussions on the details of the mechanism.
If one instance of the schema receives a request for an update, this instance contacts the master and asks `can I update'. The master responds with an OK and then sends the update to all other instances of the schema. One suggestion was that the master need not send the requesting schema a different message from the one it sends to all other schemas; the master could just return the update. This would be all right provided that, if the master did not allow the update, it sent a message or exception back to the requesting schema, notifying it that the update had failed.
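
A rough sketch of the suggested variant, in which the master sends the same update to every instance (including the requester) and raises an exception back to the requester if it refuses. Everything here is hypothetical: the class and method names, the in-memory "peers" list standing in for network calls, and the rejection rule (an empty column list) are all assumptions made for illustration.

```python
class UpdateRejected(Exception):
    """Returned to the requesting instance when the master refuses an update."""


class SchemaInstance:
    def __init__(self, instance_id):
        self.instance_id = instance_id
        self.tables = {}   # table name -> list of columns (stand-in for schema content)
        self.peers = []    # every known instance, including this one

    def master(self):
        # The instance with the lowest ID is the master.
        return min(self.peers, key=lambda p: p.instance_id)

    def request_update(self, table, columns):
        # Any instance may receive the request; it asks the master to carry it out.
        self.master().handle_update(table, columns)

    def handle_update(self, table, columns):
        # Runs on the master. The rejection rule is a placeholder assumption.
        if not columns:
            raise UpdateRejected(f"table {table!r} has no columns")
        # The same update goes to every instance, requester included.
        for peer in self.peers:
            peer.apply_update(table, columns)

    def apply_update(self, table, columns):
        self.tables[table] = list(columns)


instances = [SchemaInstance(i) for i in (4, 2, 9)]
for inst in instances:
    inst.peers = instances

# The request arrives at instance 9, which is not the master.
instances[2].request_update("cpuLoad", ["host", "load"])
print(all(inst.tables == {"cpuLoad": ["host", "load"]} for inst in instances))
```

Because the exception propagates back through request_update, the requesting instance learns of a failed update without needing a separate reply message.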

For schema replication to work, we must ensure that each schema gets the update message.

If a new schema comes up, with a lower ID than the master, it must first obtain the schema from the current master before it forces an election. 
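
The ordering above (copy first, then force the election) might look like this in outline. The cluster is modelled as a plain dictionary from instance ID to schema content; that representation and the function name join are assumptions for this sketch.

```python
import copy


def join(cluster, new_id):
    """Add a new schema instance to the cluster and return the master's ID.

    cluster: dict mapping instance ID -> schema content (hypothetical model).
    """
    if cluster:
        # Step 1: obtain the schema from the current master (lowest existing ID).
        current_master = min(cluster)
        state = copy.deepcopy(cluster[current_master])
    else:
        state = {}
    # Step 2: only now does the new instance join the cluster.
    cluster[new_id] = state
    # Step 3: the election; the new instance wins if its ID is the lowest.
    return min(cluster)


cluster = {5: {"cpuLoad": ["host", "load"]}, 8: {"cpuLoad": ["host", "load"]}}
new_master = join(cluster, 2)
print(new_master, cluster[2] == cluster[5])
```

If the election were forced before the copy, the new master would start with an empty schema and could propagate that state to the others.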

The system administrator will administer the system (i.e. set up the IDs of the schemas) initially. There will be no automated configuration.
Each schema instance should present a security credential to prove it's really allowed to register.

There are various synchronization and startup problems, and it was recognized that we would not solve all of these during the meeting. We thought about keeping a list of schemas in the registry. There is a basic bootstrapping problem that we need to think about when we re-design.

There is also the problem that the schemas could become fragmented if a network connection failed. For example, if the schemas are split between two countries, with several schemas in each country, then the two groups could start working independently and the schemas could diverge. This presents the problem of how to re-synchronize. Various possibilities were discussed, including having a quorum of schemas, only some of which are allowed to update, and not allowing updates unless there is a quorum of schemas.
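
One way to read the quorum idea is as a strict-majority rule: a group of schemas accepts updates only if it can reach more than half of all configured instances. The meeting did not settle on a definition of quorum, so the majority rule and the function name has_quorum below are assumptions.

```python
def has_quorum(reachable, total):
    """True if this group can reach a strict majority of all configured instances."""
    return reachable > total // 2


# Five instances split 3/2 by a network failure: only the larger side may update.
print(has_quorum(3, 5), has_quorum(2, 5))

# Four instances split 2/2: neither side has a majority, so no updates are accepted.
print(has_quorum(2, 4))
```

Under this rule at most one side of a split can accept updates, so the schemas cannot diverge, at the cost of refusing all updates when the split is even.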

The problem of how to remove a schema if it's no longer used in the registry was mentioned. A registry cannot know that it's removing the last reference to a schema. Possibly a weekly clean-up?

It was noted that registry replication has a different design. 

Testing the schema replication was mentioned, including testing in the resilience framework with extra servlets on one node to save having to test with a number of machines.