That is, in order to run monitoring in a completely distributed fashion we have to redefine monitoring services as web/grid services. Thin data presentation clients (thin in the sense that they do not require experiment-specific software to run) will ask such services for a description of the monitored subsystem and for the monitoring data. These clients can also request the start of new tasks and thus have full control over monitoring. All the data will be transferred using grid/web service standards (XML, SOAP, WSDL, ...).
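To make the exchange concrete, here is a minimal thin-client sketch in Python. Everything in it is an assumption for illustration: the endpoint URL, the operation names (describeSubsystem, getMonitoringData, startTask) and the choice of the zeep SOAP library; no such service exists yet. The point is only that the client is driven entirely by the published WSDL and carries no experiment-specific software.

    # Hypothetical thin monitoring client: all names below are assumptions.
    from zeep import Client  # generic SOAP client, driven by the WSDL alone

    # The client learns everything about the service from its published
    # WSDL description; no experiment-specific software is needed.
    client = Client("http://monitor.example.org/MonitoringService?wsdl")

    # Ask the service for a description of the monitored subsystem...
    description = client.service.describeSubsystem("tracker")

    # ...fetch monitoring data (transferred as XML over SOAP)...
    histogram = client.service.getMonitoringData(subsystem="tracker",
                                                 quantity="occupancy")

    # ...and start a new monitoring task: the client has full control.
    task_id = client.service.startTask(subsystem="tracker",
                                       task="noise-scan")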
The benefits of grid computing for monitoring are:
If we consider how a typical next-generation experiment (CMS) is planning for this task, we see that we are in the phase of requirements definition [4]. The monitoring itself is done mostly in the so-called Filter Farm, a computer cluster that is part of a complex hardware system ensuring the readout of events [5]. The main open questions are: is it possible to develop a framework that answers all use cases? Will running the reconstruction framework on the filter farm satisfy all requirements?
This is the old (local) way to consider the problem. We would like to see whether we can complement it (or radically replace it) by defining the monitoring application not in terms of framework development but in terms of a hierarchy of web services. This means defining what the monitoring services are and what their "granularity" is. Note that at this level we are defining only a set of web addresses (or, if you prefer, a web interface). There is nothing in a web address that binds the answer to a specific implementation. This has the further advantage that the implementation can be changed as the environment changes. For example, we can store histograms in a database or compute them on the fly, as sketched below. The implementation will of course need the reconstruction framework to run on the filter farm, but the monitoring application interface is decoupled from the implementation details.
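The following sketch illustrates this decoupling. The names (HistogramService and its two implementations) are illustrative assumptions, not an existing design: the same interface can be served either from stored data or by computing on demand, and clients cannot tell the difference.

    # Sketch of one interface with interchangeable implementations.
    from abc import ABC, abstractmethod
    from collections import Counter

    class HistogramService(ABC):
        """The published monitoring interface: clients depend only on this."""

        @abstractmethod
        def get_histogram(self, subsystem: str, quantity: str) -> dict:
            """Return the requested histogram as a {bin: count} mapping."""

    class StoredHistogramService(HistogramService):
        """One implementation: histograms are read back from a database
        (here a plain dict stands in for the database)."""

        def __init__(self, db: dict):
            self.db = db

        def get_histogram(self, subsystem, quantity):
            return self.db[(subsystem, quantity)]

    class OnTheFlyHistogramService(HistogramService):
        """Another implementation: histograms are computed on the fly from
        event data; in reality this is where the reconstruction framework
        would run, e.g. on the filter farm."""

        def __init__(self, event_source):
            self.event_source = event_source  # callable yielding event records

        def get_histogram(self, subsystem, quantity):
            values = (event[quantity] for event in self.event_source(subsystem))
            return dict(Counter(values))

    # Swapping one implementation for the other changes nothing for clients:
    # the web address and the interface they talk to stay the same.
    service: HistogramService = StoredHistogramService(
        {("tracker", "occupancy"): {0: 120, 1: 80}})
    print(service.get_histogram("tracker", "occupancy"))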
This is what makes the monitoring application Grid- and Web-aware and should make it possible to better exploit Grid resources. We also have to consider that many monitoring tasks may require Tier0 and Tier1 resources, so it is important not to hard-code a given implementation in the software. (I would like to make this point clearer with an analogy: imagine what a disaster it would be for Google if answering a search depended on a few specific computers always being up and running some specific software. Google is able to adapt and scale so gracefully because any computer can be replaced by another in a second, and any number of computers can be added as the load increases.)
Some of the research issues that could be addressed by implementing this monitoring application on the grid are: