THE IDEAL WLCG INFORMATION SYSTEM
=================================
Author: Andrea Sciaba'
Date: January 8, 2013
NOTE: this is not the CMS point of view but my personal view, as it discusses some implementation details that a VO should not bother about. It should not be quoted as coming from CMS and can be safely ignored.

The purpose of the IS is to allow users to know:
1) what resources exist in the Grid (resource discovery)
2) the properties of the resources (resource selection and usage)
3) the status of the resources (resource monitoring)

1) and 2) are equally important, and more important than 3).

Resource discovery
------------------
The first step for the user is to know what resources are in the Grid. Typical queries are:
- List all sites
- List all instances of services of type X at site Y
At this level, there are only two categories of objects: sites and services. Each of them
has a unique identifier and services have a type and a parent site. This information is
static and it should never change during the lifetime of the site or service.
It should always be available, even if the site or service is temporarily unavailable.
The IS should provide a well defined and unique endpoint to retrieve this information, for
the whole WLCG (including the Tier-3 sites that are officially part of EGI, OSG or NorduGrid).
As such, this part of the IS may be a service specific to WLCG, like REBUS. Any change to
the published sites and services done by site administrators must be quickly propagated
to this service. How this happens is an implementation detail. The service interface should be
easy to use, possibly a REST interface using popular formats, like JSON or XML, for the
output.
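
As an illustration only, the two typical discovery queries could be served by something as
simple as the sketch below. The class names, field names and the example identifiers
(SITE-A, ce01.site-a.example, ...) are all hypothetical; a real service would expose the
same operations over HTTP and load its data from the site administrators' declarations.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Site:
    site_id: str  # unique, never changes during the lifetime of the site


@dataclass(frozen=True)
class Service:
    service_id: str    # unique, never changes during the lifetime of the service
    service_type: str  # e.g. "CE" or "SE"
    site_id: str       # parent site


class Registry:
    """Static catalogue of sites and services (hypothetical, in-memory)."""

    def __init__(self, sites, services):
        self._sites = {s.site_id: s for s in sites}
        self._services = list(services)

    def list_sites(self):
        # "List all sites", returned as a JSON array of identifiers.
        return json.dumps(sorted(self._sites))

    def list_services(self, service_type, site_id):
        # "List all instances of services of type X at site Y".
        if site_id not in self._sites:
            raise KeyError(f"unknown site: {site_id}")
        matches = [
            asdict(s)
            for s in self._services
            if s.service_type == service_type and s.site_id == site_id
        ]
        return json.dumps(matches)


# Invented example data, for illustration only.
registry = Registry(
    sites=[Site("SITE-A"), Site("SITE-B")],
    services=[
        Service("ce01.site-a.example", "CE", "SITE-A"),
        Service("se01.site-a.example", "SE", "SITE-A"),
        Service("ce01.site-b.example", "CE", "SITE-B"),
    ],
)

print(registry.list_sites())
print(registry.list_services("CE", "SITE-A"))
```

Note that nothing in the catalogue depends on whether a service is currently reachable, which
is what allows it to be always available.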

Resource properties
-------------------
Knowing the identifier of a resource is almost never enough to use it. Therefore the IS
should publish, directly or indirectly, all the information that is needed to use a service.
Here "indirectly" refers to the case where the IS publishes endpoints that can be queried
in turn to retrieve the resource properties.
To avoid reinventing the wheel, the GLUE 2 schema should be used to define the resource
properties. GLUE 2 is mature enough that no new requirements need to be imposed on it, so I
will not impose any. GLUE 1.3 is not maintained and cannot be part of a long-term perspective.
The resource properties, by nature, should be generated by the service itself as they
usually must be correct for the service to be usable as intended. Their publication should
be resilient to glitches and "short" downtimes (e.g. whenever the unavailability of the
service lasts less than the validity of the result of a resource selection query). Therefore
the availability of the properties information should NOT be used as an indication that
the service is available.
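
One possible way to obtain this resilience, shown here only as a sketch, is for the IS to keep
serving the last successfully published properties for a fixed retention period. The class
name, the retention value and the use of None to signal a failed refresh are assumptions made
for the example.

```python
class PropertyStore:
    """Keeps the last known properties of each service (hypothetical sketch)."""

    def __init__(self, retention_seconds):
        self._retention = retention_seconds
        self._entries = {}  # service_id -> (properties, time of last good refresh)

    def refresh(self, service_id, properties, now):
        # A failed refresh (properties is None) leaves the last known values
        # untouched, so a glitch of the service does not remove them.
        if properties is not None:
            self._entries[service_id] = (dict(properties), now)

    def get(self, service_id, now):
        # Returns the last known properties, or None if they are older than
        # the retention period. A non-None answer says nothing about whether
        # the service is up right now.
        entry = self._entries.get(service_id)
        if entry is None:
            return None
        properties, refreshed_at = entry
        if now - refreshed_at > self._retention:
            return None
        return properties


# Times are plain seconds; the one-hour retention is an arbitrary choice.
store = PropertyStore(retention_seconds=3600)
store.refresh("ce01.site-a.example", {"port": 8443}, now=0)
store.refresh("ce01.site-a.example", None, now=600)   # service glitch
print(store.get("ce01.site-a.example", now=900))      # still served
print(store.get("ce01.site-a.example", now=4000))     # expired
```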
Querying this information requires access to all attributes, both to retrieve their values
and to express requirements on them. It must be possible to exploit relationships between
different types of services and their sites. Typical queries are:
- List all CEs in country X accepting CMS jobs and return their complete Globus identifiers
- List all CEs "close" to a given SE
- List all CEs with SL6 and with more than 100 job slots
It is highly desirable to be able to query OSG, EGI and NG resources in exactly the same way.
A fully flexible query language is not absolutely required, but the query results should
be easily machine-parseable.
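
The typical queries above do not need a full query language. As a sketch, assuming the
properties are available as flat records, a requirement can be expressed as one test per
attribute. The attribute names used here (os, job_slots, close_ses, ...) are simplified
placeholders and not actual GLUE 2 attribute names; the data are invented.

```python
import json


def select(records, **requirements):
    """Return the records whose attributes satisfy every requirement.

    Each keyword names an attribute and gives a test applied to its value.
    Records that lack a required attribute are not selected.
    """
    return [
        r
        for r in records
        if all(attr in r and test(r[attr]) for attr, test in requirements.items())
    ]


# Invented example records, one per CE.
ces = [
    {"id": "ce01.site-a.example", "country": "IT", "vos": ["cms", "atlas"],
     "os": "SL6", "job_slots": 250, "close_ses": ["se01.site-a.example"]},
    {"id": "ce02.site-b.example", "country": "IT", "vos": ["atlas"],
     "os": "SL5", "job_slots": 80, "close_ses": ["se01.site-b.example"]},
    {"id": "ce03.site-c.example", "country": "DE", "vos": ["cms"],
     "os": "SL6", "job_slots": 100, "close_ses": ["se01.site-c.example"]},
]

# All CEs in country X accepting CMS jobs.
cms_in_it = select(ces, country=lambda v: v == "IT", vos=lambda v: "cms" in v)

# All CEs "close" to a given SE.
close_to_se = select(ces, close_ses=lambda v: "se01.site-c.example" in v)

# All CEs with SL6 and with more than 100 job slots.
sl6_large = select(ces, os=lambda v: v == "SL6", job_slots=lambda v: v > 100)

# Machine-parseable output.
print(json.dumps([r["id"] for r in sl6_large]))
```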
We can distinguish two types of properties:
a) those that MUST be correctly known for the service to be usable (e.g. ports or endpoints)
b) those that do not directly affect the usability, if wrong, but prevent some queries from
returning correct results (e.g. number of cores or total disk space).
It must be guaranteed that the published values are at least compliant with the schema and
meaningful. Ideally, it should never be possible to see an obviously wrong value being
published. This still leaves room for "realistic" but wrong values. Service developers should
ensure that properties of type a) are never wrong and properties of type b) are not "too"
wrong (e.g., size-like values should be within a few percent of their true value).
It should never happen that some properties are so broken that people give up on them and
stop even asking for them to be correct.
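
The two levels of correctness can be sketched as two separate checks: one that rejects values
that are not meaningful at all, and one that measures how far a size-like value is from the
truth. The attribute names, the allowed ranges and the 5% tolerance are assumptions chosen
for the example, not values taken from GLUE 2.

```python
# Hypothetical constraints: attribute -> (accepted types, minimum, maximum).
CONSTRAINTS = {
    "port": (int, 1, 65535),
    "logical_cpus": (int, 1, 10_000_000),
    "total_disk_tb": ((int, float), 0, 1_000_000),
}


def is_meaningful(name, value, constraints=CONSTRAINTS):
    """True if the value has the expected type and lies in the allowed range."""
    expected_type, lowest, highest = constraints[name]
    if isinstance(value, bool) or not isinstance(value, expected_type):
        return False  # bool is rejected explicitly, as it is a subclass of int
    return lowest <= value <= highest


def within_tolerance(published, true_value, rel_tol=0.05):
    """True if a size-like value is within rel_tol of its true value."""
    return abs(published - true_value) <= rel_tol * abs(true_value)


print(is_meaningful("port", 8443))       # plausible port
print(is_meaningful("port", -1))         # obviously wrong, must not be published
print(within_tolerance(980.0, 1000.0))   # "realistic" and close enough
print(within_tolerance(800.0, 1000.0))   # "realistic" but too wrong
```

Only the first check can be enforced by the IS alone; the second one needs an independent
knowledge of the true value and is therefore the responsibility of the service developers.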
Changes to these properties must be propagated within a few minutes, in particular for
type a). Typically such changes are due to configuration changes or upgrades of the service,
and are therefore the result of a human intervention.

Resource monitoring
-------------------
Service monitoring information need not come from the IS: it is acceptable to get it
from the service itself, according to its own methods. Here, the IS may provide a more
convenient way to access the information. If it does, though, it must ensure that the
information is correct and coherent with any other way to retrieve the same information
from the service. In WLCG, there is currently no use case for resource matching based on
highly dynamic information. In any case, there must not be "bogus" information: whatever
cannot be reliably calculated must not be published. Hopefully this will not be the case
for any mandatory attribute.

VO-custom information
---------------------
Experience shows that VOs need to generate their own site and service information. Examples
are:
- VO-specific site names (possibly aggregating several sites)
- custom services (i.e. services not known to GOCDB/OIM or completely managed by the VO)
- experiment contacts at sites
- services relevant for the VO at the site
- tier number
- parent/child relationships between sites
- pledges
- etc.
There is no need for the IS to accommodate this information, which is best managed by the
VO using their own databases. Traditionally, the ability to publish experiment "tags" to CEs
was successfully exploited, but alternatives could be devised. I do not consider it a
requirement. The VOs should do their best, though, not to unnecessarily duplicate information
easily discovered from the IS, or at least put in place automatic validation tools that
ensure that the information in their own databases does not become inconsistent.
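
An automatic validation tool of this kind can be very simple. The sketch below assumes that
both the VO database and the IS can be dumped as dictionaries keyed by service identifier, and
that the VO records being checked are only those duplicating IS information (custom services
are excluded). All names and data are invented for the example.

```python
def find_inconsistencies(vo_records, is_records, shared_fields):
    """Compare the VO copy of each service with what the IS publishes.

    Returns a list of (service_id, field, vo_value, is_value) tuples; the
    field is None when the service is not published by the IS at all.
    """
    problems = []
    for service_id, vo_rec in sorted(vo_records.items()):
        is_rec = is_records.get(service_id)
        if is_rec is None:
            problems.append((service_id, None, None, None))
            continue
        for field in shared_fields:
            if field in vo_rec and vo_rec[field] != is_rec.get(field):
                problems.append((service_id, field, vo_rec[field], is_rec.get(field)))
    return problems


# Invented example data.
is_dump = {
    "ce01.site-a.example": {"site": "SITE-A", "type": "CE"},
    "se01.site-a.example": {"site": "SITE-A", "type": "SE"},
}
vo_dump = {
    "ce01.site-a.example": {"site": "SITE-A", "type": "CE", "tier": 2},
    "se01.site-a.example": {"site": "SITE-B", "type": "SE", "tier": 2},
    "ce09.site-z.example": {"site": "SITE-Z", "type": "CE", "tier": 3},
}

for problem in find_inconsistencies(vo_dump, is_dump, shared_fields=["site", "type"]):
    print(problem)
```

VO-only fields, like the tier number here, are ignored because they are not listed among the
shared fields.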

Protocols and interfaces
------------------------
The only real requirements are a REST interface or a convenient API supported by popular
languages, standard output formats and a simple query language. The actual protocol
(e.g. LDAP) should be hidden, if possible.
All the GLUE attributes should be exposed by definition, as this does not add to
the complexity of the API and makes it easier to implement new use cases.

Role of GOCDB and OIM
---------------------
The purpose of GOCDB and OIM is to contain "custom" information useful for their Grid projects, for example to publish contact information, production status and downtimes.
As GOCDB and OIM are two very different systems, it would be desirable to implement a
uniform interface allowing users to transparently access the relevant information.
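
A uniform interface could be obtained with one small adapter per system that converts its
records to a common form, as in the following sketch for downtimes. The two input layouts
("source A" with epoch timestamps, "source B" with ISO 8601 strings) are invented for the
example and do not describe the actual output of GOCDB or OIM.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Downtime:
    """Common representation, independent of the originating system."""
    site: str
    start: datetime
    end: datetime
    source: str


def from_source_a(record):
    # Hypothetical layout: times as seconds since the epoch (UTC).
    return Downtime(
        site=record["site"],
        start=datetime.fromtimestamp(record["start_epoch"], tz=timezone.utc),
        end=datetime.fromtimestamp(record["end_epoch"], tz=timezone.utc),
        source="A",
    )


def from_source_b(record):
    # Hypothetical layout: times as ISO 8601 strings with a UTC offset.
    return Downtime(
        site=record["facility"],
        start=datetime.fromisoformat(record["begin"]),
        end=datetime.fromisoformat(record["finish"]),
        source="B",
    )


def sites_in_downtime(downtimes, when):
    """Sites with a downtime covering the given time, whatever the source."""
    return sorted({d.site for d in downtimes if d.start <= when < d.end})


# Invented example records.
jan8 = datetime(2013, 1, 8, tzinfo=timezone.utc)
jan9 = datetime(2013, 1, 9, tzinfo=timezone.utc)
downtimes = [
    from_source_a({"site": "SITE-A",
                   "start_epoch": int(jan8.timestamp()),
                   "end_epoch": int(jan9.timestamp())}),
    from_source_b({"facility": "SITE-B",
                   "begin": "2013-01-08T06:00:00+00:00",
                   "finish": "2013-01-08T18:00:00+00:00"}),
]

noon = datetime(2013, 1, 8, 12, tzinfo=timezone.utc)
print(sites_in_downtime(downtimes, noon))
```

The user only ever sees the common form, so neither system needs to change.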
Concerning the idea of expanding them to support more functionality proper to the IS,
it does not seem an ideal solution, as it requires changes to both systems to be made
in coordination.
However, it might be the best option for new services which do not have an information
provider and which are used by more than one VO.