The inner chamber of the shell

An old fart's guide to CMS software (CMSSW)


Before starting

I am rewriting this guide for CMSSW. You can find the old Orca-based guide here.


CMSSW is an absolute good. CMSSW is life. All around its margins lies the gulf.
My purpose is very simple: I would like to learn in 10 minutes how to make a histogram of a physical quantity. To do this I don't want to follow postgraduate courses on OO things. So, first of all, of course, let's start from the CMS Software Page. A lot of material. Difficult to understand what can be useful to a beginner. Perhaps this Workbook and these tutorials: 6May2006, 6June2006, 19July2006.

Well, after an hour I still haven't solved my problem, but I have found wikis, forums, manuals and documents about everything but my simple problem. Perhaps I will need more than 10 minutes! Anyhow, there is a name that pops up almost everywhere: CMSSW. So I try to understand what this CMSSW is.

CMSSW is a framework. It is used wherever CMS software is needed. It implements a software bus model wherein there is one executable, called cmsRun, and many plug-in modules.
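The "software bus" idea itself is simple and can be sketched in a few lines of Python (a toy illustration only, not real CMSSW code: the registry, the module name "HistoMaker" and the config dictionary are all invented):

```python
# Toy sketch of a software-bus framework: one driver (playing the role
# of cmsRun) and plug-in modules looked up by name in a registry.
MODULE_REGISTRY = {}

def register(name):
    """Decorator that adds a module class to the plug-in registry."""
    def wrap(cls):
        MODULE_REGISTRY[name] = cls
        return cls
    return wrap

@register("HistoMaker")
class HistoMaker:
    def analyze(self, event):
        return "histogram of event %d" % event

def cms_run(config):
    """Driver: instantiate the modules named in the config and run
    each of them over every event, as cmsRun does with a cfg file."""
    modules = [MODULE_REGISTRY[name]() for name in config["path"]]
    results = []
    for event in config["events"]:
        for module in modules:
            results.append(module.analyze(event))
    return results

out = cms_run({"path": ["HistoMaker"], "events": [1, 2]})
```

The point of the model is that the driver never changes: adding a new analysis means registering a new module and naming it in the configuration.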

Ok, you got it! No? More or less, having to do your histogram with a framework is like getting a jet when you just want a bicycle to go home. Or getting a factory that builds hammers when you just need a simple hammer. The good news is that now you get all this code in one place, the CMS software CVS repository, neatly packed in a two-level hierarchy. You can browse this huge amount of code and also search it by using a tool called LXR. You can access it directly on /afs/

The people who work on this code tell me that a few years ago there was a dark age, when all the code was split in different realms with strange names like Orca, Cobra, Oscar, Famos, Iguanacms.

Six degrees of complexity: a recipe for the impatient user

Here is the CMSSW tutorial from the workbook:
(on lxplus)
cd /tmp 
mkdir $USER
cd $USER
scramv1 project CMSSW CMSSW_0_6_0
cd CMSSW_0_6_0/src
eval `scramv1 runtime -csh`
tar xfz Tutorial_0_5_0.tgz
scramv1 b
cd Tutorial/Analysis1/test
cmsRun tutorial.cfg
(I got some error messages pointing to the names of two files in tutorial.cfg: the problem is that I am using a CMSSW_0_5_0 tutorial with CMSSW_0_6_0; the format of the instructions has of course changed. By comparing the tutorial.cfg file with an updated file, VisDocumentation/VisTutorial/cmssw-reco.cfg, I can work out the corrections to make:
 file:/tmp/HTB_011609.root instead of HTB_011609.root
FileInPath file = "CondFormats/HcalMapping/test/hbho_ring12_tb04.txt" instead of string file = "hbho_ring12_tb04.txt"
end of comment)
I got a ton of printed lines and a brand new "tutorial.root" file. I can now start "root" by just writing
root tutorial.root
TBrowser b;
.q
(to exit)
The command TBrowser opens a graphic window and I can browse the histograms in tutorial.root. I am moved, almost crying. I was able to put together 7 modules of the framework in order to analyze some data and get some histograms. I have also requested the use of 6 other EventSetup modules, indicated in the code with es_module: these are special modules that implement resources or services available to normal modules. Data necessary to configure a module are defined as module parameters. All that by using the file tutorial.cfg. The only problem is that I don't have the slightest idea which plug-in modules I have to use to create my histogram, nor how my configuration file should be written. I have downloaded some C++ code and compiled it: what was its use? Thanks to the code I downloaded I was able to make my first trip in the CMSSW jet. What should I know to pilot this jet myself? Or should I always depend on some OO kid available nearby to do it for me?

But anyway, this example is useful to understand exactly why CMS software is so complex. If you compare the CMS experiment with previous experiments, you have the following additional layers of complexity between you and the data:

  1. C++ - Do you understand the code copied in the Tutorial directory in the example above? Do you understand the code in the CVS repository?
  2. Objects - Which objects is this application using? How are they implemented in the code? If I want to access other information, where should I look?
  3. GRID - Which objects are persistent? How do I access the events I am interested in?
  4. EDM+CMSSW framework - Ok, this should shield me from the previous layers, but I have to learn how to write configuration files that will get the data I need and process them with a cascade of plug-in modules until I get some output data.
  5. CVS - You must know this to access and manage the source code.
  6. SCRAM - Its importance can be seen from the previous example. So what do all these commands do? What is the meaning of the XML commands in the BuildFile files that you find everywhere in the CVS repository?

Know your tools!

From the list in the previous section, it must be evident that you now have at your disposal a really formidable set of new tools. Unfortunately it is very difficult to use them, since you don't even know their names! You feel like the sorcerer's apprentice trying to use the spell book of his master. But anyhow, let's try some spells and see what happens. (In issuing the commands that follow, it is important to realize that the results of "scramv1" and "cvs" commands depend on the directory where you type them.)
Spell / Result of the spell / Things to be careful about before you cast it
Click on Cern Computer Resources to have information on computers, disk space and other resources available at Cern
scramv1 list
List of all public projects and their releases. You also get the main directory of the release. By looking at .SCRAM/Linux__2.2/ under this directory, you have a list of all packages needed by the project.
scramv1 project CMSSW CMSSW_0_6_0 
Create your own private project starting from the public release indicated. Requires a lot of disk space!
scramv1 tool list
Lists all tools available in SCRAM. The command must be run in the directories of a private project.
scramv1 tool info toolname
Lists all information about a given tool. Ditto.
eval `scramv1 runtime  -csh`
Makes all the libraries known to SCRAM accessible
scramv1 runtime  -csh
Prints the result of the previous command without executing it. Use -sh if you use an sh-like shell.
scramv1 build
The equivalent of make for SCRAM. Executes the BuildFile contained in the directory where you are. Every BuildFile uses other BuildFiles (command use), which use other BuildFiles, etc. The command builds an executable (command bin) and/or libraries (command lib).
Give the option echo_INCLUDE to know which directories are searched for includes.
In a Buildfile put a first line with:
to add an arbitrary directory to search for Include Files.
scramv1 b distclean
To undo the effect of previous scram build in your local Release Area
scramv1 b CXXUSERFLAGS=-g
Compile with g flag for debugging
cmscvsroot projectname
Define CVS repository for the project so you can access the source
cvs --help-commands
List of CVS commands
cvs checkout  Modulename
cvs co Modulename
Get a local copy of the module source in your working directory that you can edit. Will get the head version, i.e. the most recent version. This can also be a version not working properly.
cvs co -r version Modulename
To checkout a particular(stable) version of the module
cvs diff
To know the differences between what is in the repository and what you have in your directory
cvs update -r version
To get your local copy in sync with the repository. This can be necessary when the head version is no longer working and you want to go back to some stable previous version. The version tag should normally be of the form ORCA_4_5_0.
cvs update -A
To reset tags. This can be necessary if you get the following message: "cvs add: cannot add file on non-branch tag"
cvs add Modulename
To add a new module to the repository. Only for developers! Must be followed by a cvs ci command. The file Root in the CVS directory must contain the following line:
cvs ci -m "message"  Modulename
To update a module in the repository. Only for developers!
cvs remove Modulename
To remove a module from the repository. Only for developers!
cvs tag -b release0
To tag a (stable) version as release0: a new branch is created. Only for developers!
cvs status
To know the current status of your working area modules
cvs rtag -b -r release0 release0.1 Modulename 
To connect branch release0.1 to branch release0. This can be useful to deal with bug corrections to a stable version. Only for developers!
cvs log Modulename
To list the versions of a module. Without Modulename it will work recursively on the directory.
cvs log -N -d ">2002-9-1" | more
To list all revisions done after the specified date
cvs diff -r version Modulename
diff with a previous version of a module
klog.krb gzito -cell
cvs -d 
commit -m "message"  
To access the repository from a node outside Cern

Recovering data from CMS software black hole

What I really need is a document that explains to me where all the relevant data are (corresponding to the list of data banks of previous experiments), plus an example program that shows me how to access these data.

The tutorial example will be useful to understand how this can be done. Let's look at a code snippet from

void DemoAnalyzer1::analyze(edm::Event const& e, edm::EventSetup const& iSetup) {
  // These declarations create handles to the types of records that you want
  // to retrieve from event "e".
  edm::Handle<HBHERecHitCollection> hbhe_hits;
  edm::Handle<HORecHitCollection> ho_hits;
  edm::Handle<HcalTBTriggerData> triggerD;

  // Pass the handle to the method "getByType", which is used to retrieve
  // one and only one instance of the type in question out of event "e". If
  // zero or more than one instance exists in the event an exception is thrown.
  e.getByType(hbhe_hits);
  e.getByType(ho_hits);
  e.getByType(triggerD);
}
Before going into the details of how to extract data from the containers, the code snippet shows that all event data are identified by a type and a label, and, if you know them, you can access the data by using the methods getByType, getManyByType, getByLabel, getManyByLabel. The label corresponds to two names, but the second name defaults to the null string. The first name indicates the producer, i.e. the module that produced the data. The second name indicates the data label. The data label can be discovered interactively by using ROOT on the input file. In the images produced by ROOT you get a characteristic list of names of event data. Each name is of the form dataType_producerName_dataName. For example: SiStripClusterCollection_ThreeThresholdClusterizer_ProcessOne. Another way to discover these names interactively is to use the Iguana event display (see later), which produces for each event a similar list using the names "Friendly Name" for the type, "Module Label" for the producer and "Instance Name" for the data label (a little confusing?).
To summarize: every chunk of event data is identified by 3 names. The method getByType uses the first name, getByLabel the second and the third. Note that when you make a request, there can be many blocks of event data which satisfy it.
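The naming convention can be made explicit with a tiny helper (an illustrative sketch, not part of CMSSW; it just splits the three underscore-separated parts described above):

```python
def split_event_data_name(name):
    """Split an event-data name of the form dataType_producerName_dataName
    (the convention described in the text) into its three parts."""
    data_type, producer, label = name.split("_")
    return {"type": data_type, "producer": producer, "label": label}

parts = split_event_data_name(
    "SiStripClusterCollection_ThreeThresholdClusterizer_ProcessOne")
```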

IGUANA : CMSSW visualization

(The situation of CMSSW event visualization is changing rapidly: please use this Cms event visualization (unofficial) FAQ to have an uptodate report.)
All these powerful tools and still no histogram! Perhaps IGUANA can help me. Iguana is used for event and other data visualization. Yes, it is an event display and it has a simple manual.
CMSSW_0_6_0 Visualisation (IGUANA_6_9_2)

Event Dump

By now I am starting to understand. All event data are connected to the edm::Event object. Each kind of event data has a "name" and can be retrieved in a uniform way using that name. Event data are stored in root files and can be inspected interactively using the program ROOT. You can even make some simple histograms of the data interactively. Hey, this is something that even a Fortran-damaged brain like mine can understand!

As this workbook page Different Ways to Make an Analysis explains, there are 3 different ways to do analysis: 1) with bare ROOT, 2) in Framework-Lite (FWLite) mode, 3) using the full CMSSW framework with an EDAnalyzer (i.e. you can go home 1) on a bicycle, 2) in a car, 3) in a jet). It is really a big relief to know that I don't need to learn the full CMSSW framework to do my histogram! But before I start inspecting root files I have to remember that I must:

cd CMSSW_0_6_0/src
eval `scramv1 runtime -csh`
this is important in order to get the right version of the ROOT program. A simple find in the code repository:
find /afs/ -name "*.root"
will give me a list of root files used to test the code. The directory Configuration/Applications/data seems to contain a lot of those data files, produced presumably by the configuration files in the same directory.

By examining a root file in SimTracker/SiPixelDigitizer I discover the name and format of pixel tracker simhit data. Nice!

Other files to inspect can be found by looking at VisDocumentation/VisTutorial/test/README.1st. By examining the events in dtdigis.root I discover the name and format of digis in the Muon Detector.

Now a file containing 5 almost complete events.

So the proverbial good news is that the data format can easily be inspected interactively by looking at events with root. The bad news is that the data format changes with almost every new release! So how to cope with this awful situation? The files in Configuration/Applications/data will help us. Suppose that you are at release CMSSW_1_1_0_pre2 and want to check what the format is now: easily done!

cd CMSSW_1_1_0_pre2/src
eval `scramv1 runtime -csh`
 cmscvsroot CMSSW
cvs login
cvs co -r CMSSW_1_1_0_pre2 Configuration/Applications/
cd Configuration/Applications/data/
cmsRun -p sim_rec_10muons_1-10GeV.cfg
After this I get a root file with 3 events produced with this release that I can inspect.

Where are all these petabytes of events?

CMS events are like the aliens referred to by Fermi in his famous question: So, where is everybody? While waiting for the real ones, there are huge deposits of simulated events. But where? Somewhere on the grid. The grid what?

Anyhow, let's try to start with these Release Validation samples. After a few lines I was able to collect the following self-explanatory acronyms: crab, dbs, phedex, lfn, pfn. I skip these and go directly to what seems to be a list of files with these strange names:

I am informed that I can use this name directly in the configuration file, in this way:
source = PoolSource { 
  untracked vstring fileNames = {'/store/unmerged/RelVal/2006/7/24/RelVal081Higgs-ZZ-4Mu/GEN-SIM-DIGI-RECO/0006/44E28C85-8D1B-DB11-9EEC-000E0C3EFB43.root'}
}
I try this with my Iguana cfg file and it seems to work but only running on a Cern computer. So where are they ...?

Reading again the helpful page I learn that I am using the LFN of the event dataset: L seems to be for Logical, so it is a kind of generic name of the dataset. This has to be contrasted with the PFN (where P stands for Physical: I am getting smart at these matters), which says where the events are physically stored. The answer from the same page is (for CERN):

i.e. you must add the string rfio:/castor/ before the LFN. So I am starting to understand. All these petabytes of events require Cern to use a huge mass storage system called CASTOR, where data is accessible through a protocol called rfio. The files in Castor can be manipulated by commands similar to those I use on my Linux box, but prefixed with rf:
So the command
rfdir /castor/
will show me the top-level CMS directory in Castor.
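The LFN-to-PFN translation is nothing more than string prefixing; here is a sketch (the site prefix below is a made-up placeholder, since the real Castor path depends on the site):

```python
def lfn_to_pfn(lfn, site_prefix):
    """Turn a logical file name (an LFN, starting with /store/) into a
    physical one by prepending the site's protocol and storage path.
    site_prefix is site-dependent, something like "rfio:/castor/..."."""
    if not lfn.startswith("/store/"):
        raise ValueError("not an LFN: " + lfn)
    return site_prefix + lfn

# "rfio:/castor/SITE" is a placeholder, not a real site path.
pfn = lfn_to_pfn("/store/unmerged/RelVal/0006/44E28C85.root",
                 "rfio:/castor/SITE")
```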

Now I have gigabytes of available space on my laptop. Why not copy a file locally? Since I don't have access to Castor from my laptop, I have to do it in two steps (on lxplus):

  cd /tmp
  rfcp /castor/ .
  scp E4D2DBBA-F250-DB11-98C6-000E0C3F0935.root
After ten minutes the events are on my laptop, ready to be inspected with Iguana or root!
Exploring Castor I discover a /castor/ directory with subdirectories that start with RelVal120: so these are CMSSW_1_2_0 events! I copy another file.
rfio and castor are only one possible protocol/store pair on the grid. Another is dCache. dCache stores of grid datasets can be "explored" using the command:
ls /pnfs/
Copying a dataset to local storage can be done in this case using the command dccp instead of rfcp.
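The protocol fixes the copy command mechanically (rfcp for rfio, dccp for dcache, as stated above); a toy lookup makes this explicit (the helper itself is invented for illustration):

```python
# Map a storage protocol to the command used to copy a file locally:
# rfio -> rfcp (Castor), dcache -> dccp.
COPY_COMMAND = {"rfio": "rfcp", "dcache": "dccp"}

def copy_command(protocol, path, dest="."):
    """Build the command line to fetch one file to a local directory."""
    if protocol not in COPY_COMMAND:
        raise ValueError("unknown protocol: " + protocol)
    return "%s %s %s" % (COPY_COMMAND[protocol], path, dest)
```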

In fact the whole story seems to be a lot more interesting. Datasets on the grid have a kind of generic name, the LFN, that you can use in your configuration file. The framework will take care of getting the copy of the dataset nearest to you. Which copy is used, and by which protocol (rfio, dcache, etc.), should be transparent to the user.

For example I got a post to hypernews which says:

  Reco output will be registered with datasetpaths like

To know the complete LFN I go to this service or this service, selecting "MCGlobal/Writer". Although the use of these two services may seem confusing and slow, it isn't very difficult to get tons of LFNs of datasets of reconstructed data like:
You can use more LFNs separated by commas in your cfg file. Datasets are grouped by run (7282 in this case) and for each dataset you know the number of events (71). Then I try the LFN in my cfg file. If I get an error, it could depend on many things:
  1. The computer you use has no access to the Grid (in this case you can either call an expert to connect your computer directly to the Grid, or use the trick previously explained to make a local copy on your computer).
  2. The dataset is corrupted, perhaps empty. A check using commands like "rfdir" (for the rfio protocol) or "ls" (for the dcache protocol) should sort out the problem.
  3. The dataset is OK but some event (perhaps the first one) is corrupted. In this case try to skip a few events.
To conclude this story, we must mention that it is suggested to use something called CRAB to analyze these data. Also, it seems that a certain ProdAgent appears wherever the grid is used. But these will be the subjects of future posts.

Where is my database?

The normal input-processing-output schema is translated in the configuration file into the following: source, es_source-module, es_module-output. This means more or less that there are two kinds of processing modules: normal modules, which process event data defined in a source block; and event setup modules, which implement "services" and consume data provided by the es_source block. This block defines the database of "non event data" used by the program. Note that normal modules don't access the database directly; instead they rely on these event setup modules or services to access these data.

So now the million-dollar question: where is this database? Looking at the cfg file I find lines like:

es_source = PoolDBESSource {
  VPSet toGet = {
     { string record = "SiStripFedCablingRcd" string tag = "SiStripCabling_TIF_v1" }
    ,{ string record = "SiStripPedestalsRcd" string tag = "SiStripPedNoise_TIF_v1_p" }
    ,{ string record = "SiStripNoisesRcd" string tag = "SiStripPedNoise_TIF_v1_n" }
  }
  bool loadAll = true
  string connect = "oracle://orcon/CMS_COND_STRIP"
  untracked string catalog = "relationalcatalog_oracle://orcon/CMS_COND_GENERAL"
  string timetype = "runnumber"
  untracked uint32 messagelevel = 0
  untracked bool loadBlobStreamer = true
  untracked uint32 authenticationMethod = 1
}

and, in another cfg file, the same records fetched "with Frontier":

es_source = PoolDBESSource {
  VPSet toGet = {
     { string record = "SiStripFedCablingRcd" string tag = "SiStripCabling_TIF_v1" }
    ,{ string record = "SiStripPedestalsRcd" string tag = "SiStripPedNoise_TIF_v1_p" }
    ,{ string record = "SiStripNoisesRcd" string tag = "SiStripPedNoise_TIF_v1_n" }
  }
  untracked bool siteLocalConfig = true
  string connect = "frontier://cms_conditions_data/CMS_COND_STRIP"
  string timetype = "runnumber"
  PSet DBParameters = {
    untracked string authenticationPath = ""
    untracked bool loadBlobStreamer = true
  }
}
It seems that the same data can be accessed in two ways: the oracle/orcon way and the "frontier" way. What ?!?

Let's try to understand: the Grid, or better the LCG (LHC Computing Grid), is a hierarchy of one central T0 node, a few regional T1 nodes and then many lesser T2 and T3 nodes. These data should be supplied to all of them. Fortunately only the T0, i.e. Cern, will update (for now) the database; all other nodes need only a read-only copy of it. So the "Database" is at Cern, at the central node, on a cluster of Oracle database servers (yes, Oracle, the famous database company). This central database can be accessed directly from Cern, but also from T1 nodes, where it is replicated (again by Oracle) automatically.

But the same database can be accessed with another, simpler "mechanism", the "frontier way", which uses a web protocol with URL requests and XML answers (essentially you query the database by sending a URL to the Frontier web server and you get the answer as an XML file). This way of access can of course be used from everywhere (we are on the Web!), but in order to be efficient and fast it uses squids (!?!). Not the gentle marine creatures, but local web servers used to store locally (cache, they call it) the responses to queries. In this way they speed up a lot the access to the central DB. That's the end of our story, populated by oracles, squids, frontiers and modules.
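The squid idea is plain result caching. A toy model shows why it helps: repeated queries are answered locally and the "central database" is contacted only once (everything here is invented for illustration):

```python
class CachingProxy:
    """Toy model of a squid sitting in front of a database: answer
    repeated queries from a local cache, hit the backend only on a miss."""
    def __init__(self, backend):
        self.backend = backend       # function: query URL -> answer
        self.cache = {}
        self.backend_calls = 0       # how often the central DB was contacted
    def query(self, url):
        if url not in self.cache:
            self.backend_calls += 1
            self.cache[url] = self.backend(url)
        return self.cache[url]

proxy = CachingProxy(lambda url: "<xml>payload for %s</xml>" % url)
first = proxy.query("/frontier?record=SiStripNoisesRcd")
second = proxy.query("/frontier?record=SiStripNoisesRcd")  # served from cache
```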

Welcome to the new 8-PD era of chaos: understanding the 8-headed monster

In programming we have this wonderful way to introduce complexity in programs: we make it transparent to the final user, meaning that the user still has the impression of using the same simple program. Likewise, when I buy a new car, I expect it to have the simple mechanical controls that were present 50 years ago, although now a car hides a lot of intelligent chips inside.

In CMS data taking there has been a very short golden age when everything seemed to be so simple. There was only one Primary Dataset (PD) containing all collision events. Out of this, a few Secondary Datasets (SD) were built for special studies. Then we had the usual skims (CS: common skims) that are welcome to users wanting to do a fast analysis on a sample of preselected events. There was a well known set of rules to select good collisions out of the PD. These were implemented in a simple recipe that any user could copy into their programs without thinking too much about its meaning.

But as the accelerator beams became more and more populated with protons and squeezed, and the frequency of collisions increased, the single-PD scheme was no longer possible. We had to split the single PD into 8. You would think that the way to do this is just to have 8 datasets which are created in parallel but have the same format, so that the end user only has to collect the complete list for a run and his/her program would work as before.

Think again about it. The 8 PDs each have a different event selection! Only their merge is almost equal to the old unique PD, except for a small detail: now you can also have repeated events! Here comes the 8-headed monster. To study good collisions you now have to analyze 8 different datasets, each one using different rules! This is a perverse scheme, the complete opposite of a transparent change!
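In practice this means that a naive loop over all 8 PDs must skip events already seen, for example by their (run, event) number. A toy sketch of the required bookkeeping (the dataset contents are invented):

```python
def merge_primary_datasets(datasets):
    """Merge several primary datasets whose selections overlap,
    keeping each (run, event) pair only once."""
    seen = set()
    merged = []
    for dataset in datasets:
        for run, event in dataset:
            if (run, event) not in seen:
                seen.add((run, event))
                merged.append((run, event))
    return merged

pd_a = [(7282, 1), (7282, 2)]
pd_b = [(7282, 2), (7282, 3)]   # event 2 also passed this PD's selection
merged = merge_primary_datasets([pd_a, pd_b])
```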

I am absolutely sure that the committee that decided this made the best decision from the technical point of view. The problem is how to make this change transparent to the final naive user. From this point of view, the way event triggering is done (the lines defining the selection rules refer to triggers) is a mess. You have at least three different levels (or types?) of trigger: L1 (Level 1), HLT (High Level Trigger), TT (Technical Trigger bits). Then you have the possibility to combine them in any logical combination. You can (pre)scale any trigger, meaning that you keep only a fraction of the events that fire that trigger. You have numbered trigger algo(rithms). Many triggers are in fact not hard-coded in hardware, but computer programs that can be changed at any time. New triggers can be added. Just to give an idea: this is the code defining the trigger for a single run and this is the result in terms of event counts and percentages.
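Prescaling by a factor n simply means keeping one event out of every n that fired the trigger; a toy sketch (not real trigger code):

```python
def prescale(triggered_events, n):
    """Keep every n-th event among those that fired the trigger
    (a prescale factor of n)."""
    return [e for i, e in enumerate(triggered_events) if i % n == 0]

kept = prescale(list(range(10)), 5)   # prescale factor 5
```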

Standalone Installation of CMSSW with apt: Post Tenebras Lux!

I had a lot of problems in the past installing CMS software on my computer. The thing that really drove me crazy was that after doing a lot of work installing the latest release, you had to redo it all again for the new release a few days later. I have used the apt installer and was especially pleased by the fact that I could install the newest release with the four commands described in the manual:
eval `$VO_CMS_SW_DIR/ config -path $VO_CMS_SW_DIR -csh`
apt-get update
apt-cache search cmssw
apt-get install cms+cmssw+CMSSW_0_8_0
I had misty eyes when, after a few minutes, I saw "Done" printed and could immediately test the new release on my computer. That's progress!

To conclude this topic, I must mention how you get rid of old installations when your computer gets completely full. Unfortunately this isn't mentioned in the manual. If you try the "obvious" apt-get remove cms+cmssw+CMSSW_1_2_0_pre4, you get the not so obvious message:

Package cms+cmssw+CMSSW_1_2_0_pre4 is not installed, so not removed

So I am trying this quick fix:
scramv1 remove CMSSW CMSSW_1_2_0_pre4
rm -rf /cms/slc3_ia32_gcc323/cms/cmssw/CMSSW_1_2_0_pre4
As you see, in this case programmers follow the golden rule of not making anything perfect, to avoid the envy of the Gods!

The dance of tags between Releases

Between one release and the next, an extraordinary dance of tags develops in CMSSW. Tags indicate a (temporary) version of a package ready to enter the next release. Now the problem is the following: if I want to test my package for the next release, almost certainly my changes depend on changes made by other people. This means that I can't test my package alone, but must download a set of other modified packages (called, in short, tags). In fact this isn't really sufficient, because to use those tags I must probably also modify my cfg file. This is what I call the Dance of Tags between releases.

Some benevolent colleague sends you a monstrous list of new tags necessary to test a new feature. Then you check out the tags one by one. The command:

will give you information about the new tags you have. Then you try "scramv1 b", hoping for the best. Normally this should work from the main "src" directory by the magic of "BuildFile". After a lot of time, a bunch of new libraries, plugins and stand-alone programs are built in a cache "lib/slc4_ia32_gcc345/".

If you don't get any error, you can then try your (updated) cfg file in the appropriate directory. Unfortunately you may start getting strange messages about missing plugins or "pure virtual method called". These are almost always due to the fact that the complex machine hasn't worked perfectly and you are now trying to use new and old code together.

At this point you start becoming desperate and you begin using strange commands like:

scramv1 b -r 
that seems to work like banging a hammer on a malfunctioning machine. Cross your fingers and try "scramv1 b" again, perhaps from another directory.

Getting information from HyperNews the collective memory of CMS

HyperNews CMS Forums is fast becoming the collective memory of CMS, with its hundreds of messages posted daily to many forums. What makes it so useful is the possibility to search all these messages. If your question is not answered by a simple search, then you can post your problem to the appropriate forum, contacting the experts and other interested CMS people.

I would like to test how successful this tool is. I have the following problem: the apt installer stops with the following error message:

E: Sub-process /home/zito/cms//bin/rpm-wrapper returned an error code (79)
I do a search using words from this sentence on HyperNews and discover that the forum Software Distribution Tools deals with problems like this. But there is no post that seems to answer my problem. Apparently no one has had this error code (79) before. So I make a post. After two hours I get the answer. Not so bad!


Proper data taking will start only in 2007, but we have to get ready! So I have started experimenting with what online access to CMS data could be. I am interested in tracker monitoring, so the proper place for me to get data is the so-called Filter Farm: a kind of computer cluster of Filter Units that makes it possible to access raw data to be used for data quality monitoring (DQM). Filter Farm, FU, DQM: a whole new bunch of acronyms to learn. However, considering the complexity behind it, the Physics and Data Quality Monitoring infrastructure is really simple. It is based on a three-tiered architecture with a Collector that receives data from many hardware and software Sources and makes them available to many Clients. Collector, Source (this should be an FU during data taking) and Client can run on different computers. The basic unit handled is a Monitoring Element (which corresponds more or less to a single online histogram). Monitoring Elements are served by sources in a tree: the Client builds its own tree of MEs by registering for those MEs it is interested in. Easy piece...
Now, what kind of software do you run in a DQM application? Exactly the same that you run offline.

Iguana in this case acts as a client.
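The three-tier idea (Sources publish Monitoring Elements to a Collector; Clients register for the ones they care about) can be sketched as a toy in a few lines; all names here are invented for illustration:

```python
class Collector:
    """Toy DQM collector: receives Monitoring Elements (MEs) from
    sources and delivers the subscribed ones to each client."""
    def __init__(self):
        self.elements = {}        # ME name -> latest histogram contents
        self.subscriptions = {}   # client name -> set of ME names
    def publish(self, name, histogram):
        """Called by a Source (e.g. a Filter Unit) for each ME update."""
        self.elements[name] = histogram
    def subscribe(self, client, name):
        """Called by a Client to register interest in one ME."""
        self.subscriptions.setdefault(client, set()).add(name)
    def deliver(self, client):
        """Return the subscribed MEs currently known to the collector."""
        wanted = self.subscriptions.get(client, set())
        return {n: h for n, h in self.elements.items() if n in wanted}

c = Collector()
c.publish("Tracker/Pixel/nDigis", [3, 1, 4])
c.publish("Muon/DT/occupancy", [1, 5])
c.subscribe("myClient", "Tracker/Pixel/nDigis")
view = c.deliver("myClient")
```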


These are small recipes learnt from experts by asking around, or just by watching experts working at a terminal.
Access interactively from lxplus with root a file known by its LFN (stored on the grid at Cern):
root rfio:/castor/<LFN>
Compile a package in such a way that you can inspect the code during execution with gdb: you have to download the package from the repository, add a line to the BuildFile and recompile it. For example:
cvs co -r CMSSW_2_0_7 VisReco/VisCustomTracker
cd VisReco/VisCustomTracker
vi BuildFile
(add after use commands, following line )
<flags CXXFLAGS="-O0 -g3 -fno-inline">
scramv1 b
Copy or select a few events from an input dataset to an output dataset: use the following configuration file:
process copyfile = {
  untracked PSet maxEvents = {untracked int32 input = 3}
  source = PoolSource {
    untracked vstring fileNames = {'file:/tmp/30525A21-BD8E-DC11-A52A-000423D9939C.root'}
  }
  module output = PoolOutputModule {
    untracked string fileName = "Simevents.root"
  }
  endpath copydata = {output}
}
Display the first events of a RECO dataset on castor
/afs/ rfio:/castor/ 
Find the dataset of a specific run in Castor: in DBS give the search "find run where run = 62938"
How to test a cfg file without events (for example, to check only the detector display in iguana): comment out in the cfg file the lines connected to event input and replace them with:
        source = EmptySource {untracked int32 maxEvents = 2}
How to try to understand if a Frontier connection is working: before running "cmsRun" with es_source = PoolDBESSource, define:
setenv FRONTIER_LOG_FILE /tmp/frontier.log
In the file so defined you will get the complete dialog between the program and the frontier database servers.
How to avoid that a cfg file crashes because of a "ProductNotFoundError": insert in the configuration file the following lines:
untracked PSet options = {
   vstring SkipEvent = {"ProductNotFound"}
}
More information here.
How to check a configuration file for Iguana and the input dataset: insert in the configuration file the following line at the beginning:
service = Tracer {}
Now run using the command line
cmsRun -p myconfigfile
You should get a report of errors in the cfg and in the events.
How to check in Iguana if a bug depends on input data: run iguana and, without selecting anything to display, check "auto event". Iguana will call the method "OnNewEvent()" for each twig. If you get a crash, you know that its cause is reading and loading the data in the event, not displaying it.
How to check that you have the right version of software to run a CMSSW
How to check flags and libraries used to build your CMSSW program
scramv1 b -v
How to create a few events to test the corrections to be done to the last integration build
TTbar.cfi -s GEN,SIM,DIGI,L1,DIGI2RAW -n 3 --eventcontent FEVTSIM --conditions FrontierConditions_GlobalTag,IDEAL_V1::All --relval 10000,100 --datatier 'GEN-SIM-DIGI-RAW' --dump_python
cmsRun myreco -s RAW2DIGI,RECO,POSTRECO -n 3 --filein file:TTbar_cfi__GEN_SIM_DIGI_L1_DIGI2RAW.root --eventcontent RECOSIM  --conditions FrontierConditions_GlobalTag,IDEAL_V1::All --dump_python

Look in the release Integration hypernews for more informations.
How to know the content of an event root file
EdmDumpEventContent filename
If the file is on the grid at cern (i.e. filename starts with /store/...) then give as filename "rfio:/castor/"
Find memory leaks in your program
valgrind --leak-check=full cmsRun -p mycfg
What to do if you are stuck in compiling the code with scramv1 b and you are not able to get your mods included in the program
Go back in the CMSSW_x_y_z/src directory and give the commands:
scramv1 b distclean
scramv1 b
To display with iguana the events in a root file without using a cfg file
iguana filename
For example:
iguana rfio:/castor/
iguana will automatically create a cfg file with the name iguana-<time>.cfg
To enable the printout from iguana

 env LOG=stderr iguana cfgfile 
or (to capture log messages in a file)
  env LOG=stderr iguana cfgfile  > & log.log
How to list all tags in the frontier db
cmscond_list_iov -c frontier:// -a
Look here for more information.

The eternal sunshine of the spotless tutorial

With each new tutorial there is hope in me that it will at last reveal all the secrets of CMS software. But after the usually brilliant presentation without a glitch, the disappointment comes in the next days. When you try to repeat what you learned, you start seeing things falling apart. After a month the tutorial material is completely useless. What happened? The CMS software is so complex, and in addition changes so swiftly, that the only way to create a good tutorial is to assemble the day before a "simplified platform" that will last only a few days. If the tutorial authors had to be completely frank about the subject, they would say: look guys, this is very complex, everything changes, there is no hope for you newcomer but to ask the expert to install everything for you. After this ordeal is over, yes, it is as simple as writing cmsRun -p myconf.cfg, but don't expect it to work for more than a few days. Then you have to restart again with the ordeal or follow a new tutorial.

Python too!

Although I still dream in Fortran, I love new languages like Python. They are powerful and modern like C++ but, at the same time, very simple to learn and use. So the idea of having in CMS the possibility to use this language to write configuration files looks promising. The same can be said for the possibility to use PyROOT (which is Python integrated in Root) to do analysis in an interactive way. Imagine the possibility to create interactively any class present in CMSSW and explore its use by calling its methods.

Unfortunately, I understand that integrating CMSSW with Python isn't so easy. All the .cfi files now existing in data directories (configuration file fragments built to be included in other cfg files) must be duplicated as files in python directories. Also, there should exist a kind of dictionary that describes to Python the actual implementation of each CMSSW C++ class. Once this is done you should be able to start playing with Python.

In the end a Python cfg file looks very similar to the original and there is no hint that we have gained something from the change. Anyhow, a Python cfg file is used exactly in the same way as a normal cfg file, i.e. with cmsRun.
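To make the comparison concrete, here is a sketch of how the copyfile recipe from the recipes section above might look as a Python configuration. This assumes CMSSW's FWCore.ParameterSet.Config API; the process name and module labels are illustrative, and the fragment only runs inside a CMSSW environment:

```python
import FWCore.ParameterSet.Config as cms

process = cms.Process("COPY")

# Same content as the old-style cfg: read 3 events, copy them to a new file.
process.maxEvents = cms.untracked.PSet(input = cms.untracked.int32(3))
process.source = cms.Source("PoolSource",
    fileNames = cms.untracked.vstring('file:/tmp/30525A21-BD8E-DC11-A52A-000423D9939C.root'))
process.output = cms.OutputModule("PoolOutputModule",
    fileName = cms.untracked.string("Simevents.root"))
process.copydata = cms.EndPath(process.output)
```

As you can see, it is almost a line-by-line translation of the old cfg; the gain is that this is now ordinary Python, so the configuration can be inspected and manipulated programmatically.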

If you want to use Python interactively to analyze a root file, then you have to give the following commands:

cvs co PhysicsTools/PythonAnalysis/
setenv PYTHONPATH $CMSSW_BASE/src/PhysicsTools/PythonAnalysis/python:$PYTHONPATH
cd PhysicsTools/PythonAnalysis/examples
(copy in this directory your root file: in this case TTbar.root)
python
from cmstools import *
from ROOT import *
b = TBrowser()
(the root browser opens: this is important to know what collections and informations for each collection we have)
events = EventTree("TTbar.root")
for event in events:
    tracks = event.generalTracks
    print len(tracks)
    for track in tracks:
        print track.pt()
CTRL_D 	(to exit from python)
I am delighted. I was able with a few lines to loop on all events, knowing how many generalTracks there are for each event, and then to loop on the tracks printing their pt. The only problem I had was to understand exactly what string to put in tracks=event.generalTracks to get the generalTracks. The examples present in the directory, although useful, didn't make the matter clear. According to the documentation you should either give a small "alias" or a very complex complete "branch name". At this point, miracles of Python, I discovered this little recipe to get all "aliases" and branch names in the event.
for alias in events.getListOfAliases():
    print alias.GetName(), alias.GetTitle()
As I said I love Python and its programmers. They always care that everything is simple.

Taming the monster: refactoring with fewer dependencies

Being used to procedural programming ("a la Fortran") I see a program as a machine that does some task. If the task is done well, and if I understand the way it is done well enough that I can change the code to make some modification, I don't care too much about how the program does it. It seems that with object oriented programming things aren't really like this. You usually don't build a program to solve some specific problem; you create a framework, i.e. a collection of classes that try to solve a class of problems.
In doing this, things can go very wrong because of the complexity of the relationships between classes (the dependencies).
It seems that just this went wrong with the first version of CMS software (Orca, Cobra, etc...). The software became like the mythical monster Hydra with nine heads: a tangle of classes. The only way out was to restart from scratch (CMSSW), this time trying to minimize dependencies. This was really a refactoring of the software, since most of the old code could be reused.

Just to see where the problem is, look at this picture. The dependencies are the black arrows.

The CMSSW is now at least a single-headed monster, and each day hundreds of changes are made to the code. Then, when it is night in Europe, the mythical nightly build begins. The CMS people try to see if the tiger is still tamed by compiling everything. After a few hours comes the verdict. Often everything is ok and it seems like a miracle: the tiger is still in the cage. But every few weeks a disaster happens and all packages get compilation or link errors. The following morning panic spreads in the collaboration as individuals read the nightly response. The tiger is out. Frenetic work starts to get it tamed again.

Happy End?

This story has no happy end. There is no easy way to start with CMSSW. Even developers are continuously challenged by new changes and have to struggle to get their code working. There is no royal road to CMS software. Even with a tool like Iguana, which should be very simple to use, the truth can be summarized by these conclusions that I presented in a tutorial session:

It is easy to visualize the CMS detector and data provided that ... you have five experts around!

  1. A hardware expert to buy for you the right pc, the right graphics card, ...
  2. A linux expert to install on this computer the right linux distribution, the drivers, etc.
  3. A CMS software expert to install the four or five CMSSW releases necessary to see all the data available
  4. A configuration file expert to hand you the right cfg file for your data and CMSSW release
  5. A data/grid expert to give you the right coordinates on the grid of the data you need to visualize

The CMSSW situation reminds me of what happened when IBM introduced a new machine with the infamous Job Control Language. The new machine (IBM 360) was a big success but no one really understood the JCL. The solution to this problem by us old farts was: get from a colleague a working deck of cards and use it again and again, without trying to understand the meaning of such garbage as //GO.SYSIN DD *. So, if you want to do your plot quickly, get a working cfg file from your expert colleague and don't try to understand it; only hope that it will still work with the next release. Otherwise you have to bother your colleague again.

CMSSW is a big success but it isn't for the faint of heart. If the language used in the cfg files is difficult, and the way it works isn't easily understood, you must consider that it has to deal with a very, very complex environment.

Guidelines for good CMS talk writing

Judging from the comments on a draft of a CMS talk written by me to be presented at the Chep09 conference, it contained almost all the mistakes a newcomer can make in such a presentation. For this reason I report here the guidelines for such a talk from CDF, in order to help young people avoid this mess. You can use my presentation as a negative example, since almost every slide breaks these guidelines.
    Guidelines for good talk writing from CDF
  1. Who is your audience? Your first task is to figure out who is going to listen to your talk, and what you want them to carry off with them. Are you selling the excitement and the adventure, or trying to convince difficult critics you got it right? Do they need or want nitty-gritty details or just the highlights and the conclusions? For young scientists, especially, this can be THE most difficult hurdle to overcome in giving a good talk. You have worked for months or years, you are intimately aware of and indeed in love with all the details, and you want to share it. In a conference talk, you must resist this temptation. And in CDF group meetings, the details may be important but without the context they are not useful. Start at the beginning and remind yourself why, exactly, you and your colleagues are doing the project. Make that your first slide...then telling how you did it, and what the results are will flow naturally. Then think again about who is actually sitting there in the audience, what their questions might be. Are there theorists? Grad students? Ancient (distinguished) professors? Make sure you give them some handles to grasp your subject. On every slide the basic message of that slide should be clear. Tell them what conclusions to draw, and then they will draw those conclusions. When in doubt, pitch your talk as if to a first-year grad student...
  2. Go high tech...mostly.

    Hand-written notes surrounding plots xeroxed onto plastic, handled until the finger grease smudges are darker than the shading of the histograms...yecchhh! Things have come a long way. Electronically produced presentations using LaTeX, Powerpoint, and other programs, projected with an LCD projector, are much more accessible to the audience, easier to prepare, transmit, and modify, and won't end up sliding off the table into a random pile on the floor. Programs like Powerpoint allow you to put in lots of fancy animation, which can be used to good effect, but which can also be annoying if used for something other than to enhance the impact of the presentation. For example, don't make text/plots appear or fly in unless there is a good reason, such as a new thing flying in on top of an old one. Fonts are fun, but can also be a source of annoyance. You really, really don't want to annoy your audience, either. Ditto for colors, and boxes, and clip art, and anything else your office mate thinks is annoying. (I was once at a talk comparing various SUSY models, and the speaker put photos of supermodels in swim suits on every slide, with little word bubbles coming out of their heads...I think this guy left the field.) There are indeed many tools to learn. Nearly all our plots, for example, start out as postscript, but end up in some other form which can be imported into a graphics program. In the process, these can be rendered illegible or worse. Make sure you understand the graphics format you are using, and the program you are using to convert from one format to another. Spend a little time learning the finer points of programs like Adobe Illustrator (an indispensable tool, by the way - get your boss to buy it) and you can make compact, high-resolution, transportable plot images.

  3. Make it readable!

    Now we come to one of the biggest problems in our field: making decent plots. Consider the following two plots. Which one do you want in your talk?

    In addition to making the plots readable, the TEXT must be readable too. Far too often speakers put such small font on the slide that it is not legible. As a rule of thumb, print out your slides, and measure the font size with a ruler. If it's less than about 4 mm, it will not be readable from the back of the room, and should be increased. You can't fit what you want to say if you make the font bigger? Then read the next section...
  4. A picture is worth 1000 words

    On one end of the bad talk spectrum is the kind that is all text: block after block of paragraphs, read to you by the speaker. Then the speaker will switch to a page with just a plot on it, with no explanatory text. Then it's back to the block paragraphs. While this represents an extreme, it does actually happen and it's not pretty. Very often, though, one encounters something nearly as bad: a talk where the plots are reduced to near-icon size, lots of big, clear, full sentences describing them on the page. Don't do that. Your plots should take up the vast majority of the area of your slides. If it isn't about 2/3 of the area, cut down the text and make the plots larger. In cutting down the text, pare it to the bone, then boil the bones. Get rid of articles, verbs, adjectives...anything that is not the essence of the information of what you are trying to say. Really, you don't need or want all those words! Also, do you really need the experiment logo and that of your institution on every page of your talk? It takes up a lot of area... Think about sitting in a talk. You want to see the plots, understand what they are saying, and get the message from the speaker, who should be adding what's not on the slide itself. If it's already written on the slide, you stop listening to the speaker!

  5. Less is more

    As noted above, you love your analysis. You have lived it for the past fourteen weeks, through sleepless nights, hours, and meetings. You know every detail, and think it's pretty cool. Clearly everyone wants to know all the details, right? What your audience wants to know is why you did it, a bit about how you did it, and what the result was and what happens next. (And whether it's the best anyone ever did!) Keep it to that, and you will have a winning talk. Add in a bunch of boring details and they'll start looking at the parallel session schedule for a better talk to go to. Keep it simple and to the point!

  6. Spend the time

    Sure, you could go to the web, grab someone else's talk, change a few fonts here and there, update a plot or two, and presto! You have yourself a new talk! Properly done and researched, a talk will typically take you a MINIMUM of a factor of 40 in time compared to the length of the delivered talk. A 30 minute talk takes at least 20 hours of solid prep time. Think about every slide, what you're going to say, and whether this is the best way to organize the material. Get hold of the original plots (you will need the eps to write the proceedings!), make the plots look nice, read the CDF notes, and design the talk yourself. No doubt you will oftentimes find yourself giving a talk which someone else from CDF gave recently. Naturally you will use the same plots, and probably a lot of the same text and ideas. Do your best, though, to make this your talk, and bring to it your unique perspective.

  7. Practice!

    Maybe it's clear to you what you mean, but is it clear to everyone else? The only way to know before delivering your talk is to practice what you want to say in front of yourself, and then in front of other people. You actually use different parts of your brain to speak and to listen, and you might be surprised to hear what you are actually saying. The only solution is to practice, practice, practice, with whoever you can get to listen. Listen to what your practice audience says, too, and make changes, because chances are they are right.

The last word

==================== This part not yet updated to CMSSW =========================================

Recovering data from CMS software black hole 2: containers and iterators

Hey, we have managed to run THREE sample programs correctly!! It's time to get professional about it, and start entering into the wonderful intricacies of C++ and Carf concerning containers (i.e. collections of objects) and iterators. First let's look at three snippets of code from the three examples:
  1. List of pile_up events
    // And now the event numbers of all pileup events
    cout << "--- PileUp events are: " ;
    G3EventProxy::pu_range PUrange = ev->pileups();
    for (G3EventProxy::pu_iterator ipu = PUrange.first; ipu != PUrange.second; ipu++) {
      PUeventsUsed++; // count them
      if (ipu != PUrange.first) cout << ", ";
      cout << (*ipu).id().eventInRun();
    }
    cout << endl;
  2. List of Calorimeter Towers
    RecItr<EcalPlusHcalTower> MyCaloTower(ev->recEvent());
    /* Print Run and Event number and fill them to our Ntuple */
    cout << "Run #" << ev->simTrigger()->id().runNumber() << "; ";
    cout << "Event #" << ev->simTrigger()->id().eventInRun() << endl;
    float Ecalo=0.0; // for the total calorimetric energy
    float Eecaltotal=0.0;
    float Ehcaltotal=0.0;
    HepPoint3D TowerPosition;
    /* Loop over all CaloCluster objects */
    while ( MyCaloTower.next() ) {
      Ecalo += MyCaloTower->Energy();             // sum up the total energy
      Eecaltotal+=MyCaloTower->EnergyEcalTower(); // sum up the Ecal energy
      Ehcaltotal+=MyCaloTower->EnergyHcalTower(); // sum up the Hcal energy
      TowerPosition = MyCaloTower->Position();
      /* Print energy, azimuth and pseudo-rapidity of a cluster and fill this
       * to our Ntuple */
      cout << "New Tower E(tot/Ecal/Hcal)=" << setw(8) << setprecision(3)
       << MyCaloTower->Energy() << "/"
       << MyCaloTower->EnergyEcalTower() << "/" 
       << MyCaloTower->EnergyHcalTower() << " GeV"
       << "; phi=" << setw(8) << setprecision(4) << TowerPosition.phi() << " rad"
       << "; eta=" << TowerPosition.pseudoRapidity() 
       << endl;
    }
  3. List of Clusters
    RecItr<CaloCluster> MyCluster(ev->recEvent(),"EcalFixedWindow_5x5");
    // Just print the event number (see CARF/G3Event/interface/G3EventHeader.h)
    cout << "===========================================================" << endl;
    cout << "=== Private analysis of event #"<< ev->simTrigger()->id().eventInRun() 
    << " in run #" << ev->simTrigger()->id().runNumber() << endl;
    eventsAnalysed++; // some statistics: count events and runs processed
    if (ev->simTrigger()->id().runNumber() != lastrun) {
      lastrun = (unsigned int) ev->simTrigger()->id().runNumber();
    }
    // Here is the loop over all clusters
    while ( MyCluster.next() ) {
      nClusters++; // count the clusters
      // Print some of the cluster properties 
      // see Calorimetry/CaloCommon/interface/CaloCluster.h
      cout << "Cluster " << nClusters <<": E=" << MyCluster->Energy()
       << ", eta="<< MyCluster->Eta()
       << ", phi="<< MyCluster->Phi() << endl;
      // Fill them to (our) histograms. Defined in ExClusterHistos.h
      UserHists->FillEepCluster(MyCluster->Energy(), MyCluster->Eta(), MyCluster->Phi());
    }
The three variables ipu, MyCaloTower and MyCluster are iterators. Iterators in oo programming are used whenever we want to examine a list or container of objects. Once you have defined an iterator for the container of objects that interests you (events, tracks, vertices, clusters, etc.), looping through the objects becomes trivial:
while ( iterator.next() ) { ... }
in the last two cases. In the first case (list of pileup events) it is slightly more complex:
 for(iterator=firstitem;iterator!=lastitem;iterator++){ }
In any case the iterator inside the loop points to the current object and we can use it to get all information about the object.
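To see the pattern stripped of all CMS detail, here is a toy Python sketch of the two loop styles. The class RecItr below is an invented stand-in for illustration only, not the real C++ class:

```python
class RecItr:
    """Toy iterator: wraps a container; next() advances and reports
    whether a current object is available (like the while-loop style)."""
    def __init__(self, objects):
        self._objects = list(objects)
        self._pos = -1

    def next(self):
        self._pos += 1
        return self._pos < len(self._objects)

    def current(self):
        return self._objects[self._pos]

# Style of examples 2 and 3: while (iterator.next()) { ... }
tower_energies = RecItr([12.5, 3.1, 7.8])
total_energy = 0.0
while tower_energies.next():
    total_energy += tower_energies.current()

# Style of example 1: walk from the first to the last item, printing a
# separator before every item except the first (as the pileup loop does).
pileup_ids = [101, 102, 103]
parts = []
for i, pu in enumerate(pileup_ids):
    if i != 0:
        parts.append(", ")
    parts.append(str(pu))
line = "".join(parts)
```

In both styles the iterator hides how the container is stored; the loop body only ever talks to the current object.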
From the code it is apparent that ORCA has a general purpose iterator named RecItr that can be used for all kinds of
RecObj objects. It is sufficient to write
RecItr<objectname> ip(ev->recEvent())
and the variable ip will point to the first object of the type indicated for the event indicated. This is a reconstructed object, and a Reconstruction on Demand, similar to the Action on Demand of the previous section, is performed in this case in the following way: if the requested RecObj is not present in the database, it is computed on the fly by the "default" module used for the object. In case 3 you see that it is also possible to select a RecObj computed by a module indicated by us (EcalFixedWindow_5x5). In this way we can test new reconstruction algorithms.
What reconstructed objects are available for our analysis apart from CaloCluster and EcalPlusHcalTower? Easy: you have to look at the documentation for RecObj and there you will find a clickable map of objects connected to it (in jargon, derived or inheriting objects). So CaloCluster is a son of RecObj. But these derived objects may themselves have derived classes that are also RecObj, etc. So EcalPlusHcalTower is a son of CaloCluster. As such you have access both to quantities in CaloCluster, like Energy, and to quantities in EcalPlusHcalTower, like EnergyHcalTower.
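The inheritance relation can be sketched with a toy Python hierarchy (invented, simplified classes; the real ORCA classes are C++). The point is only that a son class offers both its own methods and those of its parents:

```python
class RecObj:
    """Toy base class standing in for ORCA's RecObj."""
    pass

class CaloCluster(RecObj):
    """A cluster knows its total energy."""
    def __init__(self, energy):
        self._energy = energy

    def Energy(self):
        return self._energy

class EcalPlusHcalTower(CaloCluster):
    """A tower is a CaloCluster that also knows its Hcal part."""
    def __init__(self, ecal, hcal):
        CaloCluster.__init__(self, ecal + hcal)
        self._hcal = hcal

    def EnergyHcalTower(self):
        return self._hcal

tower = EcalPlusHcalTower(ecal=10.0, hcal=5.0)
# tower answers both Energy() (inherited from CaloCluster)
# and EnergyHcalTower() (its own), and it still is a RecObj.
```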
From the three examples it is also apparent that an ActiveObserver class gets a pointer to the current G3EventProxy as an argument. This is the variable ev. Then ev->recEvent() points to the reconstructed event information with all its RecObj, while ev->simTrigger() can be used to access information concerning the simulated event, like tracks, vertices and MC generator information. These are the containers ev->simTrigger()->tracks(), vertices() and rawgenparts(). Instead, ev->pileups() is a container of pileup events for the current event.

A complete event, object by object.

You remember when we did an event dump to understand how the various information was stored? At last, we know enough that we can try to get the equivalent with an oo database, i.e. recovering all persistent objects describing a single event. For example, we can take the first event processed in the simple example analysis. This is an event of the dataset eg_1gam_pt25 with owner jetNoPU_CERN of the federation cmspf01::/cms/reconstruction/user/jet0900/jet0900.boot. This is event 1 of Run 37 (as you can see from the output listing). We will use ootoolmgr, described previously, to track all information about the event in the federation: i.e. all persistent objects describing the event. The list of all ORCA objects in the ORCA manual will be our guide. Looking in the database catalog we find 12 files connected to this dataset; these are named
The other 9 have the same names but start with EVD1, EVD2, EVD3.
These files, as you can guess from the Autumn 2000 production information, contain only the Digis objects created by running SimReader or ooDigi. To complete the description of the event we must add the files created by G3Reader or ooHits, which have a different owner (jetHit120_2D_CERN) and have the following names:
these too are repeated with EVD1, EVD2, EVD3, for a total of another 20 files.
This information is summarized in the production sheet with the following lines:
eg_1gam_pt25   jetHit120_2D_CERN   Collections   0 1 2 3 
eg_1gam_pt25   jetHit120_2D_CERN   Events   0 1 2 3 
eg_1gam_pt25   jetHit120_2D_CERN   Hits   0 1 2 3 
eg_1gam_pt25   jetHit120_2D_CERN   MCInfo   0 1 2 3 
eg_1gam_pt25   jetHit120_2D_CERN   THits   0 1 2 3 

eg_1gam_pt25   jetNoPU_CERN   Collections   0 1 2 3 
eg_1gam_pt25   jetNoPU_CERN   Digis   0 1 2 3 
eg_1gam_pt25   jetNoPU_CERN   Events   0 1 2 3 
So event 1 of Run 37 must be tracked in these 32 Objectivity databases! The first group of files contains the objects connected to SimTrigger; the second, those pointed to from RecEvent.
Here starts the "dump" of event 1 of Run 37.

Scrambled (Build)files

It is time to understand those intriguing little files. What follows are the BuildFiles used in the Orca User manual examples:
  1. List of Run and event number
    <lib name=Workspace>
    <Group name=RecReader>
    <External ref=COBRA Use=CARF> 
    <bin file=ExRunEvent.cpp>my favourite application</bin>
    This is the smallest set that will work. Note that all external libraries are loaded using the BuildFile in COBRA/CARF. This BuildFile uses the tag RecReader to select the classes needed to read RecHits. SimHits are read by the libraries loaded with the tag SimReader.
    The lines starting with lib and bin tell Scram that it should create a shared library and an executable ExRunEvent. To find them just give the command
     where ExRunEvent 
    then search in the bin directory given and in the nearby lib directory.
    To find out which libraries Scram has loaded, you must look at the output that you get from the command scram build .
  2. List of Run and event number + getting Tracker layout information
    <lib name=Workspace>
      <Use name=Tracker>
    <Group name=RecReader>
    <External ref=COBRA Use=CARF> 
    <bin file=ExRunEvent.cpp>my favourite application</bin>
    We have only added a card telling Scram that the BuildFile of the subsystem Tracker must be used.
  3. List of towers
    <lib name=Tutorial>
      <lib name=EcalPlusHcalTower>
      <lib name=CaloCluster>
      <Group name=CaloRecHitReader>
      <Use name=Calorimetry>
      <Group name=RecReader>
    <External ref=COBRA Use=CARF>
    <External ref=root>
      <bin file=ExTutNtuple.cpp></bin>
    The main change here is that we request the use of the BuildFile in the subsystem Calorimetry. This BuildFile will load many different sets of libraries, and the set that we want is selected by the tag CaloRecHitReader. Note that in addition to these libraries, we explicitly request the libraries EcalPlusHcalTower and CaloCluster of subsystem Calorimetry.
    Finally, to create the n-tuple, the external system root is requested.
  4. List of clusters
    <External ref=cern>
    <External ref=cmsim>
    <External ref=HepODBMS>
    <External ref=Objectivity>
    <lib name=ExCalorimetry>
      <Group name=CaloHitReader>
      <Group name=CaloRHitWriter>
      <Group name=CaloRHitReader>
      <lib name=EcalFixedWindow>
      <lib name=CaloData>   
      <lib name=CaloCluster>
      <Use name=Calorimetry>
      <Group name=RecReader>
      <Use name=CARF>
      <Use name=Utilities>
      <bin file=ExClusterHistograms.cpp></bin>
      <Group name=CaloHitReader>
      <Group name=CaloRHitWriter>
      <Group name=CaloRHitReader>
      <lib name=EcalFixedWindow>
      <lib name=EcalDynamical>
      <lib name=CaloData>   
      <lib name=CaloCluster>
      <Use name=Calorimetry>
      <Group name=RecReader>
      <Use name=CARF>
      <Use name=Utilities>
      <bin file=ExCompClusterers.cpp></bin>
So, first of all we must specify in a BuildFile what we want to build with a tag <bin file=sourcefile></bin>.
To build an executable scram must use:
external ref=productname : libraries, include files and other stuff connected to the external product
lib name=libname : libraries found in the SCRAM search path
Use name=package : the package indicated; in fact this means a reference to the BuildFile of the package
Group name=groupname : the group will set a switch that controls the loading of used packages; what really happens depends on the BuildFile of the package
The tag environment is used to group other commands. The first and the last commands in a BuildFile must be <environment> and </environment>. If these are missing, SCRAM will imply their presence. Any other environment tag is used to separate different "environments", like in example 4 where we build two executables.
The interplay between the "Use" and the "Group" tags is difficult to understand unless you have a look at the BuildFile of the "used" package, for example CARF. Which libraries the package will "export" to you depends on the "Groups" named G3NoMain, G3Reader, SimReader, RecReader. Note the tag "export" used to define the interface to the package. Which groups to use for each package should (hopefully) be documented in the ORCA manual.
How do you decide which packages to use? For this you must look in your source and see where the include files come from. Unfortunately this isn't very easy, so it is better to proceed with another example.
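As a concrete illustration, using the two headers already seen in the code comments of the examples above: the first component of an include path names the subsystem (or package) whose BuildFile you must Use. This mapping is a plausible reading of the examples, not an official rule:

```
#include "Calorimetry/CaloCommon/interface/CaloCluster.h"  ->  <Use name=Calorimetry>
#include "CARF/G3Event/interface/G3EventHeader.h"          ->  <Use name=CARF>
```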

Another illuminating example

scram project ORCA ORCA_4_5_0
cd ORCA_4_5_0/src
eval `scram runtime -csh`
cmscvsroot ORCA
cvs co   Examples/CompGenRec
cd Examples/CompGenRec
scram b
setenv OO_FD_BOOT cmspf01::/cms/reconstruction/user/jet0900/jet0900.boot
setenv CARF_INPUT_DATASET_NAME eg_1gam_pt25
Let's look at the result: this example is interesting since the program will access generated muons, calorimeter clusters, tracker tracks and muon detector tracks through four methods (subroutines) called getGeneratorParticles, getCalorimeterClusters, getTrackerTracks, getMuonTracks. As you can see from the code, the access to each kind of information is far from simple, and the BuildFile itself is also complex. No wonder that we can't get a simple program filling some quantity in a histogram running: the navigation to access any single piece of information, which before was as easy as counting 1, 2, ... in a bank, is now a complete nightmare.

DetUnit: another piece of the puzzle

This is another important piece of the puzzle: it concerns the second layer of complexity, i.e. the object model of CMS. Before introducing this new object let's see again the phases of reconstruction. We have 3 phases, and each phase produces objects that can be in part persistent. Everything starts with SimHits stored in the DB. Then we have Digis, which must be equal to what we get with real data. Note that some Digis won't be stored in the DB since they will be immediately processed and transformed into RecHits stored in the DB. Vice versa, some RecHits will be only virtual objects, since they will be computed on the fly when needed. At the end of the reconstruction we have the RecObjs (tracks and clusters) stored in the DB.
Now let's go back to DetUnit and the detector. Each module of the detector is represented in the software by a DetUnit object. Let's look first at a pictorial representation of the various objects used (the so-called class diagram). So we have thousands of DetUnit objects modelling the real detector. Every DetUnit has pointers to other objects which contain the geometrical information of the module (absolute position, orientation, etc...). From this point of view this is similar to what we had in the past. The novelty is that now you can also access event data through DetUnit. All information associated with a detector module, like Digis, simulated (Geant) hits and reconstructed hits (clusters), is accessed through the corresponding DetUnit. In the case of simulated hits (SimHits) this happens through an object SimDet: i.e. DetUnit points to SimDet, which has a method returning a vector of SimHit pointers. RecHits are instead created on the spot by the corresponding DetUnit. Digis are provided by DetUnit through another object named Readout. The Readout object can be viewed as a container of Digis corresponding to a single DetUnit. In the Orca output you frequently see the sentence:
creating a ROU Slave Factory  for:
followed by some acronym. This is connected to the mechanism seen before. The first two letters in the acronym indicate which kind of subdetector ORCA is handling:
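The access paths just described can be sketched as a toy Python model. All classes below are invented simplifications of the real C++ objects, just to show who points to whom:

```python
class SimHit:
    """Toy simulated hit."""
    def __init__(self, energy_loss):
        self.energy_loss = energy_loss

class SimDet:
    """Holds the simulated hits of one detector module."""
    def __init__(self, hits):
        self._hits = hits

    def simHits(self):
        return self._hits

class Readout:
    """Container of Digis for a single DetUnit."""
    def __init__(self, digis):
        self.digis = digis

class DetUnit:
    """One detector module: geometry plus access to event data."""
    def __init__(self, position, sim_det, readout):
        self.position = position   # geometrical information
        self.sim_det = sim_det     # DetUnit -> SimDet -> SimHits
        self.readout = readout     # DetUnit -> Readout -> Digis

    def recHits(self):
        # RecHits are "created on the spot"; here, trivially, one per digi
        # (an invented toy reconstruction, not the real algorithm).
        return [digi * 2 for digi in self.readout.digis]

unit = DetUnit((0.0, 0.0, 1.0), SimDet([SimHit(0.3)]), Readout([10, 20]))
```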

Raw data physical clustering

Let's think of the Objectivity federation as a physical "container" composed of files ("databases") segmented in containers. The problem for the people writing the CMS software was:
  1. Decide which objects should be stored and which not.
  2. If an object is not stored, ensure that enough information is stored so that we can recreate it from what we have in the database.
  3. Decide what happens if we recalculate the same objects: do we update the previous objects or do we create a copy (cloning, in OO slang)?
  4. Try to put together objects that are mostly processed together.
  5. Decide what name to give to each database and container (not an easy thing when you have thousands of them).
  6. Decide where to store the single object.
  7. Last but not least, try to do this in such a way that can be implemented also with other databases(if CMS decides to change from Objectivity).
As you see, not an easy task. The main strategy used can be seen from the production page. On this page each number corresponds to a database: so this is a schematic representation of the federation. The databases are grouped into datasets. Each dataset is divided into two or more sections belonging to different owners. Note that a single owner encompasses many datasets. For each section you have some lines named "Events" and "Collections" which are repeated. This shows the strategy used: cloning events and collections of events (runs) instead of updating a single object. The first event object is created by ooHit, the others by running ooDigi with different parameters. The other databases (except for MCInfo) contain SimHits produced by Monte Carlo and Digis (i.e. Digits and RecHits that should be equal to raw data objects). Examining these databases you can see that the names of the containers refer to pieces of the detector. This is because it was decided that raw data belonging to the same sub-detector are clustered together.


Hey! I have become a CMS developer! I feel like a mediaeval knight after the investiture. How do you become a developer? Only another knight (pardon, developer) can give you the title, by adding your name to the list of developers responsible for some module in the repository. After that you are able to commit your changes to the repository.
Now I feel a big responsibility: how can my brain, damaged by decades of Fortran programming, cope with the new generation of OO knights? The code in the previous sections gives an idea of how Fortran-like my C++ programming is. I really have to start studying these 3 documents like the bible. Reading these guidelines, it is clear that I have to give up my beloved Fortran arrays and start using STL containers and iterators.

Changes,changes and more changes!

The good news about a programmer's life is that you never get bored. The bad news is that everything keeps changing, and CMS software is no exception. This picture gives a vivid idea of how releases follow releases and everything changes.
Let's start with a few new acronyms:

DDD - Detector Description Database: a database describing (guess what!) the detector in ASCII files containing XML tags, plus a C++ API to access this description. Fortunately, we final users can (probably) safely ignore all the technicalities.

LCG - LHC Computing Grid project. A group of people working (among other things) on the replacement of Objectivity with Root.

Root - CERN software by the same people that gave us Paw and Zebra. Root is in fact intended as an OO replacement for these successful packages. Root will also provide the object persistency, replacing Objectivity. In principle COBRA will shield end users like us from the complexities of Root. But, judging from the Root home page, it is also possible that, in the future, it will at last bring an easy Paw-like interface for those who don't know OOP.

POOL - Pool Of persistent Objects for LHC: the software provided by LCG that will take care of the persistency of objects (in slang, the persistency framework). This is the name of the Root-based replacement of Objectivity.

For what concerns this document, the most important change was the release of Iguana 4 with the new plugin architecture. This means, said in plain words, that objects like TwigMuBarRpcSimHits that we used in the previous paragraphs no longer exist: i.e. catastrophe! We have to restart from scratch!

ORCA_7.0 brings all these new things and it is interesting to compare the list of used packages with the same list for the previous release.

An inspection of the CVS repository for Visualization/MuonVis shows that the objects used in the previous paragraph have been thrown into the Attic, but they seem to be replaced by the following new objects:

But first things first, let's try to run Iguana! After many hours of browsing the code and the documentation, in the release note of ORCA_6_3_0 I find the magic words:
To start the visualisation, type "iguana" and select "COBRA". To display an object, first select the object and then click on the visualisaton box next to it.
Obvious, isn't it? So these are the steps to run the iguana plugin with ORCA_6_3_0 (the first release in which the plugin architecture was introduced in ORCA):
  1. cd /afs/
  2. eval `scram runtime -csh`
  3. cd
  4. source testfed.csh
  5. iguana

Now we try to add a branch "Event/Muon/Barrel/RpcMyHits" by cloning the branch "Event/Muon/Barrel/RpcSimHits". The complete procedure is:

  1. cd your local area
  2. project ORCA
  3. scram project ORCA ORCA_6_3_0
  4. cd ORCA_6_3_0/src
  5. cvs co -r ORCA_6_3_0 Visualisation
  6. cd Visualisation
  7. scram build
  8. iguana (it should work like before but using the code compiled in our area)
  9. cp MuonVis/src/ MuonVis/src/
  10. cp MuonVis/interface/VisMuBarRpcSimHitsTwig.h MuonVis/interface/VisMuBarRpcMyHitsTwig.h
  11. modify MuonVis/src/
  12. modify MuonVis/interface/VisMuBarRpcMyHitsTwig.h
  13. modify MuonVis/interface/xtypeinfo.h
  14. modify MuonVis/src/
  15. modify MuonVis/src/
  16. scram build
  17. iguana (cross your fingers!): result file
Hey, we are still in business! These are instead the steps to create a new branch in the Event subtree.
  1. modify CobraVisMain/src/ adding:
     m_document->addContentProxy ("COBRA/Event/CustomTracker");  
  2. Create a bunch of classes

Objectivity out, enters ROOT!

Let's enter in a brave new world : Root, Grid, RH7.3, LCG, etc ... Let's try ORCA_7_1_1 !
  1. cd /afs/
  2. eval `scram runtime -csh`
  3. cd
  4. cp orcarc .orcarc
  5. iguana
The result is the following error message:
Xlib:  extension "GLX" missing on display "lxplus080:23.0".
Inventor error in SoQtGLWidget::SoQtGLWidget(): OpenGL not available!
The brave new world has to wait...

Second try a few days later

  1. cd orca
  2. project ORCA
  3. scram project ORCA ORCA_7_2_0_pre13
  4. cd ORCA_7_2_0_pre13/src
  5. cvs co -r ORCA_7_2_0_pre13 Visualisation
  6. cd Visualisation
  7. setenv SCRAM_ARCH Linux__2.4/gcc3
  8. scram build
  9. cp ~/orcarc .orcarc
  10. eval `scram runtime -csh`
  11. iguana
The result is still:
Xlib:  extension "GLX" missing on display "lxplus010:15.0". 
But now I know that the problem is with my computer's X server, which lacks this OpenGL extension.

Third try, a month later, after I have installed XFree86 Version 4.1.0 and solved a problem with inaccessible test data samples with the help of Werner Jank

  1. cd orca
  2. project ORCA
  3. scram project ORCA ORCA_7_2_1
  4. cd ORCA_7_2_1/src
  5. cvs co -r ORCA_7_2_1 Visualisation
  6. cd Visualisation
  7. scram build
  8. cp ~/orcarc .orcarc
  9. eval `scram runtime -csh`
  10. iguana
It works again!
Not obvious, with all the changes done. The most important is that there is no Objectivity database anymore. Now you can get a list of the data files with a simple:
rfdir suncmsc:/data/valid/ORCA_7_2_0
Iguana (which now uses the Coin3D implementation of Open Inventor) is again almost unusable. We are two years back, with many visualizations no longer working and continuous crashes.

Fourth try with the next version of Orca containing Iguana

  1. cd orca
  2. project ORCA
  3. scram project ORCA ORCA_7_3_0
  4. cd ORCA_7_3_0/src
  5. cvs co -r ORCA_7_3_0 Visualisation
  6. cd Visualisation
  7. scram build
  8. cp ~/orcarc .orcarc
  9. eval `scram runtime -csh`
  10. env LOG=stderr iguana
    note where iguana freezes and remove the .reg file with the following command:
  11. rm /afs/
  12. env LOG=stderr iguana (now it is OK). But the next time I run Iguana after changing something in the plugins, I must remove the plugin cache with the following command:
  13. rm /afs/
Some not very obvious things to do. (This version also has a change in the name of the plugin: it is no longer named "COBRA" but "ORCA".)

Running ORCA_7_4_0

Orca 7_4_0 doesn't have Iguana, but we try to run a modified in Workspace in order to extract the tracker data.
  1. cd orca
  2. project ORCA
  3. scram project ORCA ORCA_7_4_0
  4. cd ORCA_7_4_0/src
  5. cvs co -r ORCA_7_4_0 Workspace
  6. cd Workspace
  7. scram b shared
  8. scram b bin
  9. eval `scram runtime -csh`
  10. cp ~/orcarc .orcarc
  11. ExRunEvent
Here are the modified Buildfile and two other files added in the same directory: , CuTkBuilderInORCA.h
Note in the .orcarc file that, for the first time, the information about datasets is taken from this catalog in XML.
This is the first ORCA release that uses POOL for persistency. In fact a look at the list of used packages reveals a host of new acronyms. These are LCG Applications shared among the LHC experiments.

Running ORCA_7_5_0

We proceed as for ORCA_7_4_0
  1. cd orca
  2. project ORCA
  3. scram project ORCA ORCA_7_5_0
  4. cd ORCA_7_5_0/src
  5. cvs co -r Tutorial031114a Workspace
  6. cd Workspace
  7. scram b shared
  8. scram b bin
  9. eval `scram runtime -csh`
  10. cp ~/orcarc .orcarc
  11. ExRunEvent

Running IGUANACMS 1.1.0 with ORCA_7_5_0

The examinations never end (this is the title of an Eduardo De Filippo play)!
We have to switch from IGUANA to IGUANACMS.
New site, new manual.
All the code in ORCA/Visualisation is obsolete.
Let's start again from scratch!
  1. cd orca
  2. project IGUANACMS
  3. scram project IGUANACMS IGUANACMS_1_1_0
  4. cd IGUANACMS_1_1_0
  5. cvs co -d src -r IGUANACMS_1_1_0 IGUANACMS
  6. cd src
  7. scram build
  8. eval `scram runtime -csh`
  9. cd VisOrca/VisOrcaMain/test
  10. cp ~/orcarc .orcarc
  11. iguana --list
Now we try to build a new module "VisCustomTracker"
  1. cd IGUANACMS_1_1_0/src/VisOrca
  2. cp -r VisTracker VisCustomTracker
  3. modify VisOrcaMain/src/ adding the following line:
     m_document->addContentProxy ("ORCA/Event/CustomTracker");   
  4. in VisCustomTracker rename the following files:
    mv  src/  src/
    mv  src/  src/
    mv  src/  src/
    mv  interface/VisTkEventContent.h  interface/VisCuTkEventContent.h
    mv  interface/VisTkTwig.h  interface/VisCuTkTwig.h
    mv  interface/VisTkSimHitsTwig.h  interface/VisCuTkSimHitsTwig.h
  5. Now modify these 8 files: .src/ , src/ , src/ , src/ , interface/VisCuTkEventContent.h , interface/VisCuTkTwig.h , interface/VisCuTkSimHitsTwig.h , interface/xtypeinfo.h
  6. scram b
  7. cd ..
  8. iguana
This is the result

Running IGUANACMS 1.3.0 with ORCA_7_5_2

This new version of Iguanacms also contains the CustomTracker plugin.
  1. cd orca
  2. project IGUANACMS
  3. scram project IGUANACMS IGUANACMS_1_3_0
  4. cd IGUANACMS_1_3_0/src
  5. cvs co -r IGUANACMS_1_3_0 VisOrca
  6. cd VisCustomTracker
  7. scram build
  8. eval `scram runtime -csh`
  9. cd ../VisOrca/VisOrcaMain/test
  10. iguana

Running IGUANACMS 1.7.0 with ORCA_8_1_1 and COBRA_7_8_1

  1. cd orca
  2. project IGUANACMS
  3. scram project IGUANACMS IGUANACMS_1_7_0_pre1
  4. cd IGUANACMS_1_7_0_pre1/src
  5. cvs co VisOrca/VisCustomTracker
  6. cvs co VisOrca/VisOrcaMain
  7. cd VisOrca/VisCustomTracker
  8. scram build
  9. eval `scram runtime -csh`
  10. cd VisOrcaMain/test
  11. vi .orcarc (to add :ORCA/Event/CustomTracker)
  12. iguana

Running IGUANACMS 1.10.0 based on IGUANA_5_1_1 with ORCA_8_3_0,OSCAR 3.3.1 and COBRA_7_8_6

  1. cd orca
  2. project IGUANACMS
  3. scram project IGUANACMS IGUANACMS_1_10_0
  4. cd IGUANACMS_1_10_0/src
  5. cvs co VisOrca/VisCustomTracker
  6. cd VisOrca/
  7. scram build
  8. eval `scram runtime -csh`
  9. cp ~/orca/IGUANACMS_1_9_1/src/VisOrca/VisOrcaMain/test/.orcarc orcarc
  10. iguana -c orcarc

Explore your Federation(2)!

Objectivity is out, but we still have federations. They are now implemented using LCG software. This means that you can have your federation on your computer but also on the Grid. Now we will do again this exercise of exploring our federation, in order to discover what has changed. Are you ready for the exploration? First you have to get the catalog name of the federation. For example: (there is a text copy here in case your browser doesn't display XML). This catalog describes the federation contained in /castor/ and composed of 341 files.
In order to use it we have to:
  1. setenv PATH /afs/$PATH
  2. setenv SCRAM_ARCH rh73_gcc32
  3. setenv CVSROOT
  4. cvs login
  5. type the cvs password
  6. mkdir ~/mypool
  7. cd mypool
  8. scram project POOL POOL_1_4_0
  9. cd ~/mypool/POOL_1_4_0/src
  10. cvs co -r POOL_1_4_0 Examples
  11. cd Examples/
  12. cd SimpleWriter
  13. scram b
  14. eval `scram runtime -csh`
  15. rehash
  16. SimpleWriter
  17. cd ../../
  18. cp Examples/SimpleWriter/pool.env .
  19. Modify pool.env replacing xmlcatalog_file:FileCatalog.xml with xmlcatalog_
  20. eval `scram runtime -csh pool.env`
Playing with this you discover that each file has an LFN (Logical File Name) and a PFN (Physical File Name). There are also, for each file, 7 "metadata" attributes: jobid, dataset, DataType, FileCategory, runid, DBoid, owner. Note that each "file" corresponds to an Objectivity "database".
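To give an idea of what such a catalog looks like, here is a hand-written sketch of a POOL XML catalog entry tying a PFN to an LFN with metadata. The file ID, paths and attribute values are invented, and the real schema may differ in detail:

```
<!-- Sketch of a POOL XML file catalog entry (all values invented) -->
<POOLFILECATALOG>
  <File ID="...">
    <physical>
      <pfn filetype="ROOT" name="rfio:/castor/.../somefile.root"/>
    </physical>
    <logical>
      <lfn name="hg03_hzz_2e2mu_130a.0001"/>
    </logical>
    <metadata att_name="dataset" att_value="hg03_hzz_2e2mu_130a"/>
    <metadata att_name="owner"   att_value="hg_2x1033PU761_TkMu"/>
  </File>
</POOLFILECATALOG>
```

Tools like the FClistPFN/FClistLFN family then query this catalog instead of an Objectivity federation.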

(Real) Developers see (design) patterns everywhere!

As I get deeper into the intricacies of the CMS object model, I am starting to understand some strange slang real OO developers use when they refer to the objects implemented. For example: Observer, Lazy Observer, Dispatcher, Builder, Proxy, Singleton, Adapter, Facade, Factory... These are all Design Patterns: more or less, programming recipes to solve well-known problems that appear frequently in OO programming. We have already used this pattern jargon in this document when we spoke about action on demand, containers and iterators.
The idea is good: don't reinvent the wheel, and use some clever ways already tested. The only problem is that it is very difficult for a newbie to OO programming to realize that his problem can be solved by the Singleton pattern nicely provided by other CMS developers in this directory.
For a funny discussion about the strange way oo programmers solve problems, look at this discussion about Why I hate frameworks.

Running the CMS Helloworld on one hundred computers all around the world

[Added February 2005] This is getting like a blog, so from now on I'll add the date the item was written.
By now you should know what the CMS "Hello world" is: it is the famous ExRunEvent in the module Workspace. This program does nothing useful except read the input events and give a summary of how many there were. In principle you can use it as a template to write more interesting programs, but this is not straightforward since you must know the CMS data model! Anyhow, we would like to run this program on the Grid. But before doing this we review briefly how to run it on your desktop. To run it locally, you must have access to some datasets accessible with rfio. In .orcarc you specify the POOL catalogue for the dataset:
InputFileCatalogURL = @{xmlcatalog_}@
and the Input Collection
InputCollections = /System/hg_2x1033PU761_TkMu_g133_CMS/hg03_hzz_2e2mu_130a/hg03_hzz_2e2mu_130a

Now we run ExRunEvent on the Grid. The following steps are very well documented and need to be done only once.

Now you can give the command (always on the UI) that will enable you to use the Grid for 12 hours.
Now you can try all the commands described in the LCG-2 User Guide. For example executing some command on another Grid computer:
globus-job-run /bin/hostname
To run our Hello world on the Grid, we must write a file in a language called JDL (Job Description Language) specifying which program we must run, the input files, the output files, etc., and then send this file to the Grid. Not easy for a newbie. So I'll use the tool CRAB, which will do everything for you. First of all you download CRAB using CVS. Then you have to modify the CRAB file crab.cfg. The program needs only the name "ExRunEvent", but you must first have done a
eval `scram runtime -csh`
in the correct directory (i.e. the program will use the environment variables set by SCRAM to get everything).
The program also needs a copy of ".orcarc". To create and send your first 2 jobs to the Grid, you now write:
./ -bunch_creation 2 -bunch_submission 2
To check job status:
edg-job-status -i Jobs/log/submission_id.log
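For the curious: a JDL file doing by hand what CRAB automates might look roughly like this. This is only a sketch; the attributes follow the LCG-2 JDL conventions, but the sandbox file names are invented:

```
# Sketch of a JDL job description for ExRunEvent (file names invented)
Executable    = "ExRunEvent";
Arguments     = "";
StdOutput     = "ExRunEvent.out";
StdError      = "ExRunEvent.err";
InputSandbox  = {"ExRunEvent", ".orcarc"};
OutputSandbox = {"ExRunEvent.out", "ExRunEvent.err"};
```

CRAB generates a file of this kind for every job, fills in the sandboxes from your SCRAM environment, and submits it for you.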

Let's start again from EDM!

Up to now, changes in CMS software have been relatively small. The change from Objectivity to POOL federations was gradual and in some way transparent to an end user like me, because I was always using COBRA, the CMS framework. But now, that's a BIG change! Let's rewrite the CMS framework. Forget COBRA, start using EDM! Well, not really, since, if we drop COBRA now, we have to stop everything. Let's say that for some time (hopefully a few months, but who knows?) we have to provide two versions of each CMS application, one for COBRA, the other for EDM. Wow, that's something! It's interesting to know the reasons for such a rewriting. I'll paste some citations: The change has also been marked by a new CMS software web page.
New acronyms: CMSSW contains the new software. Here you can find a nice DataFormats directory with a description of the raw data. Raw data are produced by FEDs, packed in standard headers/trailers, then collected in the Builder Units (BU) and made available in the Filter Units (FU). You can think of FEDs, BUs and FUs as specialized online computers organized in clusters. The FED raw data are not convenient for direct use. Some processing is necessary before they can be used: the result of these operations is the Digi. At the level of the Digi it is not the FED that is recorded as the source of the raw data but the module. A module is a detector unit described by a DetUnit and identified by a detector identifier, DetUnitId. All these features are provided in the new framework through a Geometry service. Let's try it:

CMSSW_0_0_1_pre9 The result is an ASCII dump of the geometric quantities of the tracker modules.
Note that the C++ code defines a SEAL plugin (?!?) and you run this plugin by using the file "runP.txt", where you define the environment in which the plugin should run. Geometry is now an EventSetup object (?!). No more singletons with observers in the new framework! DetUnit has been replaced by GeomDetUnit.


CMSSW_0_2_0 Visualisation Some documentation is available from here and from VisDocumentation/VisManual.

CMSSW_0_3_0_pre4 Visualisation(IGUANA_6_4_0)

At last: the CMS workbook is coming!

After many years of marching in the dark, a year before data taking starts, while the CMS software is slowly being rewritten inside CMSSW, there is light at the end of the tunnel: The CMS Workbook

From its Introduction:

My hope is that it will make this guide obsolete.


Page author Giuseppe Zito:
Last update: