Monitoring HEP experiments with grid/web services

HEP detectors have a complex structure consisting of millions of electronic channels. These channels are grouped hierarchically into a few subsystems that are built and monitored by different collaborations. Each subsystem still has millions of channels to monitor, a task shared by hundreds of people distributed across laboratories around the world. In the past [1],[2], monitoring was done locally in the control room, and its interface was based mainly on a "histogram browser" that allowed the inspection (but also the creation and management) of a hierarchy of plots mimicking the tree of detector parts. In the new detectors, the huge number of channels to monitor and the growing number of people in charge require a new approach, which we can summarize as follows: a web browser replaces the histogram browser, and each histogram or group of histograms is accessible and manageable through a web address.
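The idea of mapping the detector-parts tree onto web addresses can be sketched as follows. This is a minimal illustration, not real experiment software; all node names (cms, tracker, occupancy, ...) are hypothetical examples, not actual detector paths.

```python
# Sketch: a detector-parts tree where every node (subsystem, module,
# histogram) becomes addressable by a URL-style path, mirroring the
# hierarchy of plots described in the text. Names are illustrative only.

class Node:
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []

    def paths(self, prefix=""):
        """Yield a web-address-like path for this node and all descendants."""
        here = f"{prefix}/{self.name}"
        yield here
        for child in self.children:
            yield from child.paths(here)

detector = Node("cms", [
    Node("tracker", [Node("occupancy"), Node("noise")]),
    Node("ecal", [Node("energy")]),
])

for path in detector.paths():
    print(path)
```

Each histogram is then reachable through an address such as /cms/tracker/occupancy, and a group of histograms through the address of its parent node.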

In other words, in order to run monitoring in a completely distributed fashion, we have to redefine monitoring services as web/grid services. Thin data presentation clients (thin because they require no experiment-specific software to run) will ask such services for the description of the monitored subsystem and for the monitoring data. These clients can also request the start of new tasks and have full control over monitoring. All data will be transferred using grid/web service standards (XML, SOAP, WSDL, ...).
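To make the "thin client" point concrete, here is a sketch of how a service might serialize a histogram to XML so that any client with a standard XML parser can consume it. The element and attribute names are assumptions for illustration, not a real schema of any experiment.

```python
# Sketch: a monitoring service returning a histogram as plain XML,
# so that a thin client needs no experiment-specific software to read it.
# The <histogram>/<bin> vocabulary here is an invented example schema.

import xml.etree.ElementTree as ET

def histogram_to_xml(name, edges, counts):
    """Encode a 1D histogram (bin edges + bin counts) as an XML string."""
    root = ET.Element("histogram", name=name)
    for lo, hi, n in zip(edges[:-1], edges[1:], counts):
        ET.SubElement(root, "bin", low=str(lo), high=str(hi), entries=str(n))
    return ET.tostring(root, encoding="unicode")

xml_doc = histogram_to_xml("occupancy", edges=[0, 1, 2, 3], counts=[5, 8, 2])
print(xml_doc)
```

A client would fetch such a document from the histogram's web address and render it with generic tools, never linking the experiment's reconstruction libraries.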

The benefits of grid computing for monitoring are:

If we consider how a typical next-generation experiment (CMS) is planning for this task, we see that we are in the phase of requirements definition [4]. The monitoring itself is done mostly in the so-called Filter Farm, a computer cluster that is part of a complex hardware system ensuring the readout of events [5]. The main questions are: is it possible to develop a framework that will answer all use cases? Will running the reconstruction framework on the filter farm satisfy all requirements?

This is the old (local) way to consider the problem. We would like to see whether we can complement it (or radically replace it) by defining the monitoring application not in terms of framework development but in terms of a hierarchy of web services. This means defining what the monitoring services are and what their "granularity" is. Note that at this level we are defining only a set of web addresses (or, if you prefer, a web interface). There is nothing in a web address that binds the answer to a specific implementation. This has the further advantage that the implementation can be changed as the environment changes: for example, we can store histograms in a database or compute them on the fly. The implementation will of course need the reconstruction framework to run on the filter farm, but the monitoring application interface is decoupled from the implementation details.
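The decoupling argued here can be sketched in a few lines: the "web interface" is just a mapping from addresses to handlers, and the handler behind an address can be swapped (a stored histogram versus one computed on the fly) without the client noticing. Paths, data, and handler names below are hypothetical.

```python
# Sketch: two interchangeable implementations behind the same kind of
# web address. The client only knows addresses; the service is free to
# answer from storage or by computing on the fly.

routes = {}

def route(path):
    """Register a handler function under a URL-style path."""
    def register(handler):
        routes[path] = handler
        return handler
    return register

def serve(path):
    """What a web server would do: dispatch the address to its handler."""
    return routes[path]()

# Implementation A: histogram pre-stored in a "database" (a dict here).
stored = {"occupancy": [5, 8, 2]}

@route("/cms/tracker/occupancy")
def stored_occupancy():
    return stored["occupancy"]

# Implementation B: the same answer, computed on the fly from raw hits.
raw_hits = [0, 1, 1, 2, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 2]

@route("/cms/tracker/occupancy-live")
def live_occupancy():
    counts = [0, 0, 0]
    for hit in raw_hits:
        counts[hit] += 1
    return counts

print(serve("/cms/tracker/occupancy"))
print(serve("/cms/tracker/occupancy-live"))
```

Both addresses return the same histogram; replacing one implementation with the other requires no change on the client side, which is exactly the property the text asks for.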

This is what makes the monitoring application Grid- and Web-aware, and it should make it possible to better exploit Grid resources. We must also consider that many monitoring tasks may require Tier0 and Tier1 resources, so it is important not to hard-code a given implementation in the software. (An analogy may make this point clearer: imagine what a disaster it would be for Google if answering a search depended on a few specific computers always being on and running some specific software. Google is able to adapt and scale so gracefully because any computer can be replaced with another in a second, and any number of computers can be added if the load increases.)

Some of the research issues that could be addressed by implementing this monitoring application on the grid are:

References:

  1. CDF General monitoring framework.

  2. CDF Strip tracker monitoring.

  3. Monitoring CMS Tracker construction and data quality using a grid/web service based on a visualization tool.

  4. CMS DAQ Monitoring requirements (Draft).

  5. CMS Physics Monitoring.

Maintained by Giuseppe Zito.