CMSSW is an absolute good. CMSSW is life. All around its margins lies the gulf.
Well, after an hour I still haven't solved my problem, but I have found wikis, forums, manuals and documents about everything except my simple problem. Perhaps I will need more than 10 minutes! Anyhow, there is a name that pops up almost everywhere: CMSSW. So I try to understand what this CMSSW is.
CMSSW is a framework. It is used everywhere CMS software is needed. It implements a software bus model in which there is one executable, called cmsRun, and many plug-in modules.
Ok you got it! No?
More or less, having to do your histogram with a framework is like getting a jet when you just want a bicycle to go home. Or getting a factory that builds hammers when you just need a simple hammer. The good news is that now you get all this code in one place, the CMS software CVS repository, neatly packed in a two-level hierarchy. You can browse this huge amount of code and also search it by using a tool called LXR. You can access it directly at /afs/cern.ch/cms/Releases/CMSSW/.
The people who work on this code tell me that a few years ago there was a dark age when all the code was split into different realms with strange names like Orca, Cobra, Oscar, Famos, Iguanacms.
Six degrees of complexity: recipe for the impatient user
Here is the CMSSW tutorial from the workbook:
(on lxplus)
cd /tmp
mkdir $USER
cd $USER
scramv1 project CMSSW CMSSW_0_6_0
cd CMSSW_0_6_0/src
eval `scramv1 runtime -csh`
wget http://cern.ch/jmans/cms/Tutorial_0_5_0.tgz
tar xfz Tutorial_0_5_0.tgz
bash downloadData.sh
scramv1 b
cd Tutorial/Analysis1/test
cmsRun tutorial.cfg
(I got some error messages pointing to the names of two files in tutorial.cfg: the problem is that I am using a CMSSW_0_5_0 tutorial with CMSSW_0_6_0, and the format of the instructions has of course changed. By comparing the tutorial.cfg file with an updated file, VisDocumentation/VisTutorial/cmssw-reco.cfg, I can work out the corrections to make: file:/tmp/HTB_011609.root instead of HTB_011609.root, and FileInPath file = "CondFormats/HcalMapping/test/hbho_ring12_tb04.txt" instead of string file = "hbho_ring12_tb04.txt". End of comment.)
I got a ton of printed lines and a brand new "tutorial.root" file. I can now start "root" by just writing
root tutorial.root
TBrowser b;
.q (to exit)
The command TBrowser opens a graphics window and I can browse the histograms in tutorial.root. I am moved, almost crying. I was able to put together 7 modules of the framework in order to analyze some data and get some histograms. I have also requested the use of 6 more EventSetup modules, indicated in the code with
es_module
: these are special modules that implement resources or services available to normal modules. The data necessary to configure a module are defined as module parameters.
All that by using the file tutorial.cfg. The only problem is that I don't have the slightest idea which plug-in modules I have to use to create my histogram, or how my configuration file should be written.
I have downloaded some C++ code and compiled it: what was its use? Thanks to the code I downloaded I was able to take my first flight in the CMSSW jet. What should I know to pilot this jet myself? Or should I always depend on some expert available nearby
to do it for me?
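Just to get a feeling for what such a plug-in module looks like, here is a minimal sketch of an EDAnalyzer that books and fills one histogram. This is only a sketch: the class name, the histogram and the use of the TFileService are my own choices for illustration, and the exact header paths, macros and method signatures changed from release to release, so check them against the release you actually use.

// MyHistoAnalyzer.cc -- a hypothetical minimal analyzer (all names are illustrative)
#include "FWCore/Framework/interface/EDAnalyzer.h"
#include "FWCore/Framework/interface/Event.h"
#include "FWCore/Framework/interface/MakerMacros.h"
#include "FWCore/ParameterSet/interface/ParameterSet.h"
#include "FWCore/ServiceRegistry/interface/Service.h"
#include "PhysicsTools/UtilAlgos/interface/TFileService.h"
#include "DataFormats/Common/interface/Handle.h"
#include "DataFormats/HcalRecHit/interface/HcalRecHitCollections.h"
#include "TH1F.h"

class MyHistoAnalyzer : public edm::EDAnalyzer {
public:
  explicit MyHistoAnalyzer(const edm::ParameterSet&) : h_energy_(0) {}

  virtual void beginJob(const edm::EventSetup&) {
    // Book a histogram through the TFileService, which writes it to an output root file
    edm::Service<TFileService> fs;
    h_energy_ = fs->make<TH1F>("hbheEnergy", "HBHE rec hit energy", 100, 0., 10.);
  }

  virtual void analyze(const edm::Event& e, const edm::EventSetup&) {
    // Grab the HBHE rec hits from the event and fill the histogram
    edm::Handle<HBHERecHitCollection> hits;
    e.getByType(hits);
    for (HBHERecHitCollection::const_iterator i = hits->begin(); i != hits->end(); ++i)
      h_energy_->Fill(i->energy());
  }

private:
  TH1F* h_energy_;
};

DEFINE_FWK_MODULE(MyHistoAnalyzer);

In the cfg file such a module would be declared with something like module demo = MyHistoAnalyzer {} and put into a path, but again, the configuration syntax is exactly the part that keeps changing.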
But, anyway, this example is useful to understand exactly why CMS software is so complex. If you compare the CMS experiment with previous experiments, you have the following additional layers of complexity between you and the data:
Spell | Result of the spell | Things to be careful about before you cast it |
Click on Cern Computer Resources | to have information on computers, disk space and other resources available in Cern | |
scramv1 list | List of all public projects and their releases | You also get the main directory of the release. By looking at .SCRAM/Linux__2.2/ after this directory, you have a list of all packages needed by the project. |
scramv1 project CMSSW CMSSW_0_6_0 | Create your own private project starting from the public release indicated. | Requires a lot of disk space! |
scramv1 tool list | Lists all tools available in SCRAM | The command must be done in the directories of a private project |
scramv1 tool info toolname | Lists all information about a given tool | ditto |
eval `scramv1 runtime -csh` | Makes all the libraries known to SCRAM accessible | |
scramv1 runtime -csh | Prints the result of the previous command without executing it | use -sh if you use an sh-like shell |
scramv1 build | The equivalent of make for SCRAM. Executes the BuildFile contained in the directory where you are. | Every BuildFile uses other BuildFiles (command use), which use other BuildFiles, etc. The command builds an executable (command bin) and/or libraries (command lib). Give the option echo_INCLUDE to know which directories are searched for includes. In a BuildFile put a first line with INCLUDE+=path/dir to add an arbitrary directory to the include file search. |
scramv1 b distclean | To undo the effect of previous scram build in your local Release Area | |
scramv1 b CXXUSERFLAGS=-g | Compile with the -g flag for debugging | |
cmscvsroot projectname | Define CVS repository for the project so you can access the source | |
cvs --help-commands | List of CVS commands | |
cvs checkout Modulename or cvs co Modulename | Get a local copy of the module source in your working directory that you can edit | Will get the head version, i.e. the most recent version. This can also be a version that is not working properly. |
cvs co -r version Modulename | To checkout a particular(stable) version of the module | |
cvs diff | To know the differences between what is in the repository and what you have in your directory | |
cvs update -r version | To get your local copy in sync with the repository | This can be necessary when the head version is no longer working and you want to go back to some stable previous version. The version tag should normally be of the form ORCA_4_5_0 |
cvs update -A | To reset tags | This can be necessary if you get the following message "cvs add: cannot add file on non-branch tag" |
cvs add Modulename | To add a new module to the repository | Only for developers! Must be followed by a cvs ci command. The file Root in the CVS directory must contain the following line: :kserver:cmscvs.cern.ch:/cvs_server/repositories/ORCA |
cvs ci -m "message" Modulename | To update a module in the repository | Only for developers! |
cvs remove Modulename | To remove a module from the repository | Only for developers! |
cvs tag -b release0 | To tag a (stable) version as release0: a new branch is created | Only for developers! |
cvs status | To know the current status of your working area modules | |
cvs rtag -b -r release0 release0.1 Modulename | To connect branch release0.1 to branch release0. This can be useful to deal with bug corrections to a stable version. | Only for developers! |
cvs log Modulename | To list the versions of a module | Without Modulename it will work recursively on the directory |
cvs log -N -d ">2002-9-1" | more | To list all revisions done after the specified date | |
cvs diff -r version Modulename | diff with a previous version of a module | |
klog.krb gzito -cell cern.ch ; cvs -d :kserver:cmscvs.cern.ch:/cvs_server/repositories/IGUANACMS commit -m "message" | To access the repository from a node outside CERN |
The tutorial example will be useful to understand how this can be done.
Let's look at a code snippet from DemoAnalyzer1.cc.
void DemoAnalyzer1::analyze(edm::Event const& e, edm::EventSetup const& iSetup) {
  // These declarations create handles to the types of records that you want
  // to retrieve from event "e".
  //
  edm::Handle<HBHERecHitCollection> hbhe_hits;
  edm::Handle<HORecHitCollection> ho_hits;
  edm::Handle<HcalTBTriggerData> triggerD;

  // Pass the handle to the method "getByType", which is used to retrieve
  // one and only one instance of the type in question out of event "e". If
  // zero or more than one instance exists in the event an exception is thrown.
  //
  e.getByType(hbhe_hits);
  e.getByType(ho_hits);
  e.getByType(triggerD);
analyze receives a reference e to the edm::Event object, which contains all the event data. e.getByType(handle) retrieves the data from the event and stores them in a container in memory, accessed through the edm::Handle. The retrieval methods available are getByType, getManyByType, getByLabel and getManyByLabel.
The label corresponds to two names, but the second name defaults to the null string. The first name indicates the producer, i.e. the module that produced the data; the second name indicates the data label.
The data label can be discovered in an interactive way by using ROOT on the input file.
In the images produced by root you get a characteristic list of names of event data. Each name
is in the form dataType_producerName_dataName
. For example:
SiStripClusterCollection_ThreeThresholdClusterizer_ProcessOne.
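For the record, this is more or less the bare-ROOT session I use to see that list of names (a sketch: tutorial.root is the file produced above, and any EDM root file would do):

// started with: root -l tutorial.root
TFile* f = TFile::Open("tutorial.root");
f->ls();                                   // list the top-level objects in the file
TTree* events = (TTree*)f->Get("Events");  // EDM files keep the event data in a tree called "Events"
events->Print();                           // prints every branch; the names have the dataType_producerName_dataName form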
Another way to discover these names interactively is to use the Iguana event display (see later), which produces for each event a similar list using the names "Friendly Name" for the type, "Module Label" for the producer and "Instance Name" for the data label (a little confusing?).
getByType
uses the first name, getByLabel
the second and the third. Note that when you do a request, there can be many blocks of event data
which satisfy the request.
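In code the difference between the two access styles looks like this (a sketch: the collection type and the two label strings are simply taken from the example name above, and must match what the producers in your configuration really wrote):

// inside analyze(const edm::Event& e, const edm::EventSetup&):
edm::Handle<SiStripClusterCollection> clusters;

// by type: works only if exactly one SiStripClusterCollection is present in the event
e.getByType(clusters);

// by label: pick the product made by the module labelled "ThreeThresholdClusterizer"
// with data (instance) name "ProcessOne"
e.getByLabel("ThreeThresholdClusterizer", "ProcessOne", clusters);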
All event data are contained in the edm::Event object. Each kind of event data has a "name" and can be retrieved in a uniform way using that name. Event data are stored in root files and can be inspected interactively using the program ROOT. You can even make some simple histograms interactively on the data. Hey, this is something that even a Fortran-damaged brain like mine can understand! As the workbook page Different Ways to Make an Analysis explains, there are 3 different ways to do analysis: 1) with bare ROOT, 2) in Framework-Lite (FWLite) mode, 3) using the full CMSSW framework with an EDAnalyzer (i.e. you can go home 1) on a bicycle, 2) in a car, 3) in a jet). It is really a big relief to know that I don't need to learn the full CMSSW framework to do my histogram! But before I start inspecting root files I have to remember that I must:
cd CMSSW_0_6_0/src
eval `scramv1 runtime -csh`
This is important in order to get the right version of the ROOT program. A simple find in the code repository:
find /afs/cern.ch/cms/Releases/CMSSW/CMSSW_0_6_0/src/ -name "*.root"
will give me a list of root files used to test the code. The directory Configuration/Applications/data seems to contain a lot of those data files, presumably produced by the configuration files in the same directory.
By examining a root file in SimTracker/SiPixelDigitizer I discover the name and format of pixel tracker simhit data. Nice!
Other files to inspect can be found by looking at VisDocumentation/VisTutorial/test/README.1st. By examining the events in dtdigis.root I discover the name and format of the digis in the muon detector.
Now a file containing 5 almost complete events.
So the proverbial good news is that the data format can easily be inspected interactively by looking at events with root. The bad news is that the data format changes with almost every new release! So how to cope with this awful situation? The files in Configuration/Applications/data will help us. Suppose that you are at release CMSSW_1_1_0_pre2 and want to check what the format is now: easily done!
cd CMSSW_1_1_0_pre2/src
eval `scramv1 runtime -csh`
cmscvsroot CMSSW
cvs login
cvs co -r CMSSW_1_1_0_pre2 Configuration/Applications/
cd Configuration/Applications/data/
cmsRun -p sim_rec_10muons_1-10GeV.cfg
After this I get a root file with 3 events produced with this release that I can inspect.
Anyhow, let's try to start with these Release Validation samples. After a few lines I was able to collect the following self-explanatory acronyms: crab, dbs, phedex, lfn, pfn. I skip these and go directly to what seems to be a list of files with these strange names:
/store/unmerged/RelVal/2006/10/1/RelVal102CJets50-120/GEN-SIM-DIGI-RECO/0006/76AC6FC5-F250-DB11-BBDA-000E0C3F0D36.root. I am informed that I can use this name directly in the configuration file in this way:
source = PoolSource {
    untracked vstring fileNames = {'/store/unmerged/RelVal/2006/7/24/RelVal081Higgs-ZZ-4Mu/GEN-SIM-DIGI-RECO/0006/44E28C85-8D1B-DB11-9EEC-000E0C3EFB43.root'}
}
I try this with my Iguana cfg file and it seems to work, but only when running on a CERN computer. So where are they ...?
Reading again the helpful page I learn that I am using the LFN of the event dataset: L seems to stand for Logical, so it is a kind of generic name of the dataset. This is to be contrasted with the PFN (where P stands for Physical: I am getting smart at these matters), which says where the events are physically stored. The answer from the same page is (for CERN):
rfio:/castor/cern.ch/cms/LFN
i.e. you must add the string rfio:/castor/cern.ch/cms/ before the LFN. So I am starting to understand.
All these petabytes of events require CERN to use a huge mass storage system called CASTOR, where data are accessible through a protocol called rfio. The files in CASTOR can be manipulated by commands similar to those I use on my Linux box, but with an additional rf in front:
rfdir
rfcp
rfrm
So the command
rfdir /castor/cern.ch/cms/
will show me the top-level cms directory in CASTOR.
Now I have gigabytes of available space on my laptop. Why not copy a file locally? Since I don't have access to CASTOR from my laptop I have to do it in two steps (on lxplus):
cd /tmp
rfcp /castor/cern.ch/cms/store/unmerged/RelVal/2006/10/1/RelVal102BJets50-120/GEN-SIM-DIGI-RECO/0006/E4D2DBBA-F250-DB11-98C6-000E0C3F0935.root .
scp E4D2DBBA-F250-DB11-98C6-000E0C3F0935.root zito@pcmennea.ba.infn.it:/data/
After ten minutes the events are on my laptop, ready to be inspected with Iguana or root!
Browsing CASTOR, I discover a /castor/cern.ch/cms/store/RelVal/2006/12/16/ directory with subdirectories whose names start with RelVal120: so these are CMSSW_1_2_0 events! I copy another file.
ls /pnfs/cmsfarm1.ba.infn.it/data/cms/phedex/LFN
In this case the copy of the dataset to local storage can be done using the command dccp instead of rfcp.
In fact the whole story seems to be a lot more interesting. Datasets on the Grid have a kind of generic name, the LFN, that you can use in your configuration file. The framework will take care of getting the copy of the dataset nearest to you. Which copy is used and by which protocol (rfio, dcache, etc.) should be transparent to the user.
For example I got a post to hypernews which says:
Reco output will be registered with datasetpaths like /TAC-TIBTOB-120-DAQ-EDM/RECO/CMSSW_1_3_0_pre6-DIGI-RECO-Run-00007287-SliceTest
To know the complete LFN I go to this service or this service, selecting "MCGlobal/Writer". Although the use of these two services may seem confusing and slow, it isn't very difficult to get tons of LFNs of datasets of reconstructed data like:
/store/data/2007/4/6/TAC-SliceTIBTOBTEC-120-DAQ-EDM-CMSSW_1_3_0_pre6-DIGI-RECO-Run-0007282/DIGI-RECO/0000/02058003-08E5-DB11-983F-000E0C3F0614.root
You can use several LFNs in your cfg file, separated by commas. Datasets are grouped by run (7282 in this case) and for each dataset you know the number of events (71). Then I try the LFN in my cfg file. If I get an error it could depend on many things:
So now the million-dollar question: where is this database? Looking at the cfg file I find lines like:
es_source = PoolDBESSource {
    VPSet toGet = {
        { string record = "SiStripFedCablingRcd" string tag = "SiStripCabling_TIF_v1" },
        { string record = "SiStripPedestalsRcd" string tag = "SiStripPedNoise_TIF_v1_p" },
        { string record = "SiStripNoisesRcd" string tag = "SiStripPedNoise_TIF_v1_n" }
    }
    bool loadAll = true
    #WithFrontier
    untracked bool siteLocalConfig = true
    string connect = "oracle://orcon/CMS_COND_STRIP"
    untracked string catalog = "relationalcatalog_oracle://orcon/CMS_COND_GENERAL"
    string timetype = "runnumber"
    untracked uint32 messagelevel = 0
    untracked bool loadBlobStreamer = true
    untracked uint32 authenticationMethod = 1
}
or
es_source = PoolDBESSource {
    VPSet toGet = {
        { string record = "SiStripFedCablingRcd" string tag = "SiStripCabling_TIF_v1" },
        { string record = "SiStripPedestalsRcd" string tag = "SiStripPedNoise_TIF_v1_p" },
        { string record = "SiStripNoisesRcd" string tag = "SiStripPedNoise_TIF_v1_n" }
    }
    untracked bool siteLocalConfig = true
    string connect = "frontier://cms_conditions_data/CMS_COND_STRIP"
    string timetype = "runnumber"
    PSet DBParameters = {
        untracked string authenticationPath = ""
        untracked bool loadBlobStreamer = true
    }
}
It seems that the same data can be accessed in two ways: the oracle/orcon way and the "frontier" way. What?!?
Let's try to understand: the Grid, or better the LCG (LHC Computing Grid), is a hierarchy of one central T0 node, a few regional T1 nodes, and then many lesser T2 and T3 nodes. These data should be supplied to all of them. Fortunately only the T0, i.e. CERN, will update (for now) the database; all other nodes need only a read-only copy of it. So the "database" is at CERN, at the central node, on a cluster of Oracle database servers (yes, Oracle, the famous database company). This central database can be accessed directly from CERN but also from T1 nodes, where it is replicated (again by Oracle) automatically.
But the same database can be accessed with another, simpler "mechanism", the "frontier" way, which uses a web protocol with URL requests and XML answers (essentially you query the database by sending a URL to the Frontier web server and you get the answer back as an XML file). This way of access can of course be used from everywhere (we are on the Web!), but to be efficient and fast it relies on squids (!?!). Not the gentle marine creatures, but local web servers used to store locally (cache, they call it) the responses to queries. In this way they speed up access to the central DB a lot. That's the end of our story populated by oracles, squids, frontiers and modules.
Welcome to the new 8PD era of chaos: understanding the 8-headed monster
In programming we have this wonderful way to introduce complexity in programs. We make it transparent to the final user: meaning that the user still has the impression that he is using the same simple program.
Also, when I buy a new car, I expect it to have the simple mechanical gadgets that were present 50 years ago, although now a car hides a lot of intelligent chips inside.
In CMS data taking there was a very short golden age when everything seemed so simple. There was only one Primary Dataset (PD) containing all collision events. Out of this, a few Secondary Datasets (SD) were built for special studies. Then we had the usual skims (CS: common skims), which are welcome to users wanting to do a fast analysis on a sample of preselected events. There was a well-known set of rules to select good collisions out of the PD. These were implemented in a simple recipe that any user simply copied into their programs without thinking too much about its meaning.
But as the accelerator beams became more and more populated with protons and more squeezed, and the frequency of collisions increased, the single-PD scheme was no longer possible. We had to split the single PD into 8.
You would think that the way to do this is just to have 8 datasets which are created in parallel but have the same format, so the end user only has to collect the complete list for a run and his/her program would work as before.
Think again. The 8 PDs each have a different event selection! Only their merge is almost equal to the old unique PD, except for the small detail that now you can also have repeated events! Here comes the 8-headed monster. To study good collisions you now have to analyze 8 different datasets, each one using different rules! This is a perverse scheme, the complete opposite of a transparent change!
I am absolutely sure that the committee that decided this has made the best decision from the technical point of view.
The problem is how to make this change transparent to the final naive user. From this point of view the way event triggering is done (the lines defining the rules refer to triggers) is a mess. You have at least three different levels (or types?) of trigger: L1 (Level 1), HLT (High Level Trigger), TT (Technical Trigger bits). Then you have the possibility to combine them in any logical combination. You can (pre)scale any trigger, meaning that you take only a fraction of the events that fire that trigger. You have numbered trigger algo(rithms). Many triggers are in fact not hard-coded in hardware, but computer programs that can be changed at any time. New triggers can be added. Just to give an idea: this is the code defining the trigger for a single run and this is the result in terms of event counts and percentages.
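To make all this trigger talk a bit more concrete, here is a hedged sketch of how an analyzer can look at the HLT decisions stored in each event. The process name "HLT" is an assumption, the InputTag header moved between releases, and the loop only prints the path index, so treat it as an illustration rather than a recipe.

#include <iostream>
#include "DataFormats/Common/interface/Handle.h"
#include "DataFormats/Common/interface/TriggerResults.h"
#include "FWCore/Framework/interface/Event.h"
#include "FWCore/Utilities/interface/InputTag.h"   // in older releases this header lived elsewhere

void printFiredPaths(const edm::Event& e) {
  edm::Handle<edm::TriggerResults> trig;
  // "TriggerResults" is the usual module label; "HLT" is the assumed process name
  e.getByLabel(edm::InputTag("TriggerResults", "", "HLT"), trig);
  // One bit per HLT path: accept(i) says whether path i fired for this event
  for (unsigned int i = 0; i < trig->size(); ++i)
    if (trig->accept(i))
      std::cout << "HLT path " << i << " fired" << std::endl;
}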
Standalone Installation of CMSSW with apt: Post Tenebras Lux!
I had a lot of problems in the past installing CMS software on my computer.
The thing that really drove me crazy was that after doing a lot of work installing the latest release, you had to redo it again for the new release after a
few days. I have used the apt installer and was especially pleased by the fact that I could install the newest release with four commands described in the manual:
eval `$VO_CMS_SW_DIR/aptinstaller.sh config -path $VO_CMS_SW_DIR -csh`
apt-get update
apt-cache search cmssw
apt-get install cms+cmssw+CMSSW_0_8_0
I had misty eyes when I saw (after a few minutes) the "Done" printed and I
could immediately test the new release on my computer.
That's progress!
To conclude this topic I must mention here how you get rid of old installations when your computer gets completely full. Unfortunately this isn't mentioned in the manual. If you try the "obvious":
apt-get remove cms+cmssw+CMSSW_1_2_0_pre4
you get the not so obvious message:
Package cms+cmssw+CMSSW_1_2_0_pre4 is not installed, so not removed.
So I am trying this quick fix:
scramv1 remove CMSSW CMSSW_1_2_0_pre4
rm -rf /cms/slc3_ia32_gcc323/cms/cmssw/CMSSW_1_2_0_pre4
As you see in this case, programmers always follow the golden rule of not making
something perfect to avoid the envy of the Gods!
Some benevolent colleague sends you a monstrous list of new tags necessary to test a new feature. Then you check out the tags one by one. The command:
showtags
will give you information about what new tags you have. Then you try "scramv1 b", hoping for the best. Normally this should work from the main "src" directory by the magic of the "BuildFile". After a lot of time a bunch of new libraries, plugins and stand-alone programs are built in a cache "lib/slc4_ia32_gcc345/".
If you don't get any error, then you can try your (updated) cfg file in the appropriate directory. Unfortunately you start getting strange messages about missing plugins or "pure virtual method called". These are almost always due to the fact that the complex machine hasn't worked perfectly and you are now trying to use new and old code together.
At this point you start becoming desperate and you begin using strange commands like:
scramv1 b -r
which seems to work like banging a hammer on a malfunctioning machine. Cross your fingers and try "scramv1 b" again, perhaps from another directory.
I would like to test how successful this tool is. I have the following problem: the apt installer stops with the following error message:
E: Sub-process /home/zito/cms//bin/rpm-wrapper returned an error code (79)
I do a search on the words of this sentence on HyperNews and discover that the Forum "Software Distribution Tools" deals with problems like this. But there is no post that seems to answer my problem. Apparently no one has had this error code (79) before. So I make a post. After two hours I get the answer. Not so bad!
Access interactively from lxplus with root a file known by its LFN (stored on the grid at CERN) | root rfio:/castor/cern.ch/cms/<LFN> |
Compile a package in such a way that you can inspect the code during execution with gdb | Download the package from the repository, add a line to its BuildFile and recompile it, for example: cvs co -r CMSSW_2_0_7 VisReco/VisCustomTracker ; cd VisReco/VisCustomTracker ; vi BuildFile (add, after the use commands, the following line) <flags CXXFLAGS="-O0 -g3 -fno-inline"> ; scramv1 b |
Copy or select a few events from an input dataset to an output dataset | Use the following configuration file: process copyfile = { untracked PSet maxEvents = {untracked int32 input = 3} source = PoolSource { untracked vstring fileNames = {'file:/tmp/30525A21-BD8E-DC11-A52A-000423D9939C.root'} } module output = PoolOutputModule { untracked string fileName = "Simevents.root" } endpath copydata = {output} } |
Save it as myconf.cfg and run it with
cmsRun -p myconf.cfg
but don't expect it to work for more than a few days. Then you have to start the ordeal all over again, or follow a new tutorial.
Unfortunately, I understand that integrating CMSSW with Python shouldn't be so easy. All the .cfi files now existing in data directories (configuration file fragments built to be included in other cfg files) must be duplicated as _cfi.py files in python directories.
Also, there should exist a kind of dictionary that describes to Python the actual implementation of each CMSSW C++ class. Once this is done you should be able to start playing with Python.
In the end a Python cfg file looks very similar to the original and there is no hint that we have gained anything from the change. Anyhow, a Python cfg file is used exactly in the same way as a normal cfg file, i.e.: cmsRun cfgfile_cfg.py.
If you want to use Python interactively to analyze a root file, then you have to give the following commands:
cvs co PhysicsTools/PythonAnalysis/
setenv PYTHONPATH $CMSSW_BASE/src/PhysicsTools/PythonAnalysis/python:$PYTHONPATH
cd PhysicsTools/PythonAnalysis/examples
(copy your root file into this directory: in this case TTbar.root)
python
b = TBrowser()
(the root browser opens: this is important to know what collections we have and what information each collection carries)
events = EventTree("TTbar.root")
for event in events:
    tracks = event.generalTracks
    print len(tracks)
    for track in tracks:
        print track.pt()
(return)
CTRL_D (to exit from python)
I am delighted. With a few lines I was able to loop over all the events, knowing how many
generalTracks there are in each event, and then to loop over the tracks printing their pt. The only problem I had was to understand exactly what string to put in tracks = event.generalTracks in order to get the generalTracks. The examples present in the directory, although useful, didn't make the matter clear. According to the documentation you should give either a short "alias" or a very complex complete "branch name".
At this point, miracles of python, I discovered this little recipe to get all
"aliases"and branch names in event.
for alias in events.getListOfAliases(): alias.Print()As I said I love Python and its programmers. They always care that everything is simple.
Just to see where the problem is, look at this picture. The dependencies are the black arrows.
CMSSW is now at least a single-headed monster, and each day hundreds of changes are made to the code. Then, when it is night in Europe, the mythical nightly build begins. The CMS people try to see if the tiger is still tamed by compiling everything. After a few hours comes the verdict. Often everything is ok and it seems like a miracle: the tiger is still in the cage. But every few weeks a disaster happens and all packages get compilation or link errors. The following morning panic spreads through the collaboration as people read the nightly report. The tiger is out. Frenetic work starts to get it tamed again.
It is easy to visualize the CMS detector and data provided that ... you have five experts around!
CMSSW is a big success, but it isn't for the faint of heart. And if the language used in the cfg files is difficult and the way it works isn't easily understood, you must consider that it has to deal with a very, very complex environment.
Hand-written notes surrounding plots xeroxed onto plastic, handled until the finger grease smudges are darker than the shading of the histograms...yecchhh! Things have come a long way. Electronically produced presentations using LaTeX, Powerpoint, and other programs, projected with an LCD projector, are much more accessible to the audience, easier to prepare, transmit, and modify, and won't end up sliding off the table into a random pile on the floor. Programs like Powerpoint allow you to put in lots of fancy animation, which can be used to good effect, but which can also be annoying if used for something other than to enhance the impact of the presentation. For example, don't make text/plots appear or fly in unless there is a good reason, such as a new thing flying in on top of an old one. Fonts are fun, but can also be a source of annoyance. You really, really don't want to annoy your audience, either. Ditto for colors, and boxes, and clip art, and anything else your office mate thinks is annoying. (I was once at a talk comparing various SUSY models, and the speaker put a photo of a supermodel in a swim suit on every slide, with little word bubbles coming out of their heads...I think this guy left the field.) There are indeed many tools to learn. Nearly all our plots, for example, start out as postscript, but end up in some other form which can be imported into a graphics program. In the process, these can be rendered illegible or worse. Make sure you understand the graphics format you are using, and the program you are using to convert from one format to another. Spend a little time learning the finer points of programs like Adobe Illustrator (an indispensable tool, by the way - get your boss to buy it) and you can make compact, high-resolution, transportable plot images.
Now we come to one of the biggest problems in our field: making decent plots. Consider the following two plots. Which one do you want in your talk?
On one end of the bad talk spectrum is the kind that is all text: block after block of paragraphs, read to you by the speaker. Then the speaker will switch to a page with just a plot on it, with no explanatory text. Then it's back to the block paragraphs. While this represents an extreme, it does actually happen and it's not pretty. Very often, though, one encounters something nearly as bad: a talk where the plots are reduced to near-icon size, with lots of big, clear, full sentences describing them on the page. Don't do that. Your plots should take up the vast majority of the area of your slides. If it isn't about 2/3 of the area, cut down the text and make the plots larger. In cutting down the text, pare it to the bone, then boil the bones. Get rid of articles, verbs, adjectives...anything that is not the essence of the information of what you are trying to say. Really, you don't need or want all those words! Also, do you really need the experiment logo and that of your institution on every page of your talk? It takes up a lot of area... Think about sitting in a talk. You want to see the plots, understand what they are saying, and get the message from the speaker, who should be adding what's not on the slide itself. If it's already written on the slide, you stop listening to the speaker!
As noted above, you love your analysis. You have lived it for the past fourteen weeks, through sleepless nights, hours, and meetings. You know every detail, and think it's pretty cool. Clearly everyone wants to know all the details, right? What your audience wants to know is why you did it, a bit about how you did it, and what the result was and what happens next. (And whether it's the best anyone ever did!) Keep it to that, and you will have a winning talk. Add in a bunch of boring details and they'll start looking at the parallel session schedule for a better talk to go to. Keep it simple and to the point!
Sure, you could go to the web, grab someone else's talk, change a few fonts here and there, update a plot or two, and presto! You have yourself a new talk! Properly done and researched, a talk will typically take you a MINIMUM of a factor of 40 in time compared to the length of the delivered talk. A 30 minute talk takes at least 20 hours of solid prep time. Think about every slide, what you're going to say, and whether this is the best way to organize the material. Get hold of the original plots (you will need the eps to write the proceedings!), make the plots look nice, read the CDF notes, and design the talk yourself. No doubt you will oftentimes find yourself giving a talk which someone else from CDF gave recently. Naturally you will use the same plots, and probably a lot of the same text and ideas. Do your best, though, to make this your talk, and bring to it your unique perspective.
Maybe it's clear to you what you mean, but is it clear to everyone else? The only way to know before delivering your talk is to practice what you want to say in front of yourself, and then in front of other people. You actually use different parts of your brain to speak and to listen, and you might be surprised to hear what you are actually saying. The only solution is to practice, practice, practice, with whoever you can get to listen. Listen to what your practice audience says, too, and make changes, because chances are they are right.
The prototype mentioned below is tagged for tonight's nightly. The mutex for the lock is in FWCore/Utilities and the service that uses it is in FWCore/Services. The service can be used with the PoolSource to synchronize the threads. To do that all you need to do is add: service = LockService{} to your cfg file (and of course make sure that the DQMServices are locking on the same mutex).
during CRUZET4 processing we had troubles again because of ESProducts taken in beginJob. ESProducts can be IOV dependent, but beginJob() has no knowledge of the "current run" (so you get garbage and crashes). Please check your code and remove access to ESProducts in beginJob unless you are 100% sure that it is safe (if you have any doubt please ask). Tags are expected as soon as possible (tracker and egamma already provided some fixes). You can simply move the "get" of the ESHandle into beginRun (if you are sure your IOV is only run based and not time based) or directly into the "produce" method of your producer (if you need to do a time-expensive operation on the retrieved object you can check the cacheIdentifier before retrieving, or write an additional ES producer).
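Translated into code, the fix looks roughly like the sketch below. I borrowed the SiStrip cabling record from the conditions example earlier on this page; the header paths are my best guess and may need adjusting for your release. The point is only where the get is done: in beginRun() or analyze()/produce(), never in beginJob().

#include "FWCore/Framework/interface/ESHandle.h"
#include "FWCore/Framework/interface/EventSetup.h"
#include "CondFormats/SiStripObjects/interface/SiStripFedCabling.h"
#include "CondFormats/DataRecord/interface/SiStripFedCablingRcd.h"

// Call this from beginRun(...) or from analyze()/produce(), where the EventSetup
// has a valid IOV; calling it from beginJob() gives garbage, as the post above says.
void fetchCabling(const edm::EventSetup& es) {
  edm::ESHandle<SiStripFedCabling> cabling;
  es.get<SiStripFedCablingRcd>().get(cabling);   // the framework caches this per IOV
}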
==================== This part not yet updated to CMSSW =========================================
// And now the event numbers of all pileup events
cout << "--- PileUp events are: " ;
G3EventProxy::pu_range PUrange = ev->pileups();
for (G3EventProxy::pu_iterator ipu = PUrange.first; ipu != PUrange.second; ipu++) {
PUeventsUsed++; // count them
if (ipu != PUrange.first) {
cout << ", ";
}
cout << (*ipu).id().eventInRun();
}
cout << endl;
RecItr<EcalPlusHcalTower> MyCaloTower(ev->recEvent());
/* Print Run and Event number and fill them to our Ntuple */
cout << "Run #" << ev->simTrigger()->id().runNumber() << "; ";
cout << "Event #" << ev->simTrigger()->id().eventInRun() << endl;
UserNtuples->FillGeneral(ev->simTrigger()->id().runNumber(),ev->simTrigger()->id().eventInRun());
float Ecalo=0.0; // for the total calorimetric energy
float Eecaltotal=0.0;
float Ehcaltotal=0.0;
HepPoint3D TowerPosition;
/* Loop over all CaloCluster objects */
while (MyCaloTower.next()) {
Ecalo += MyCaloTower->Energy(); // sum up the total energy
Eecaltotal+=MyCaloTower->EnergyEcalTower(); // sum up the Ecal energy
Ehcaltotal+=MyCaloTower->EnergyHcalTower(); // sum up the Hcal energy
TowerPosition = MyCaloTower->Position();
/* Print energy, azimuth and pseudo-rapidity of a cluster and fill this
* to our Ntuple */
cout.setf(ios::showpoint);
cout << "New Tower E(tot/Ecal/Hcal)=" << setw(8) << setprecision(3)
<< MyCaloTower->Energy() << "/"
<< MyCaloTower->EnergyEcalTower() << "/"
<< MyCaloTower->EnergyHcalTower() << " GeV"
<< "; phi=" << setw(8) << setprecision(4) << TowerPosition.phi() << " rad"
<< "; eta=" << TowerPosition.pseudoRapidity()
<< endl;
UserNtuples->AddTower(MyCaloTower->Energy(),TowerPosition.phi(),
TowerPosition.pseudoRapidity());
}
RecItr<CaloCluster> MyCluster(ev->recEvent(),"EcalFixedWindow_5x5");
// Just print the event number (see CARF/G3Event/interface/G3EventHeader.h)
cout << "===========================================================" << endl;
cout << "=== Private analysis of event #"<< ev->simTrigger()->id().eventInRun()
<< " in run #" << ev->simTrigger()->id().runNumber() << endl;
eventsAnalysed++; // some statistics: count events and runs processed
if (ev->simTrigger()->id().runNumber() != lastrun) {
lastrun = (unsigned int) ev->simTrigger()->id().runNumber();
runsAnalysed++;
}
// Here is the loop over all clusters
while (MyCluster.next()) {
nClusters++; // count the clusters
// Print some of the cluster properties
// see Calorimetry/CaloCommon/interface/CaloCluster.h
cout << "Cluster " << nClusters <<": E=" << MyCluster->Energy()
<< ", eta="<< MyCluster->Eta()
<< ", phi="<< MyCluster->Phi() << endl;
// Fill them to (our) histograms. Defined in ExClusterHistos.h
UserHists->FillEepCluster(MyCluster->Energy(), MyCluster->Eta(),
MyCluster->Phi());
}
while (iterator.next()) { }
in the last two cases. In the first case (list of pileup events) it is slightly more complex:
for (iterator = firstitem; iterator != lastitem; iterator++) { }
In any case the iterator inside the loop points to the current object and we can use it to get all information about the object.
RecItr<objectname> ip(ev->RecEvent())
and the variable ip will point to the first object of the type indicated for the event indicated. This is a reconstructed object, and a Reconstruction on Demand, similar to the Action on Demand of the previous section, is performed in this case in the following way: if the requested RecObj is not present in the database, it is computed on the fly by the "default" module used for the object. In case 3 you see that it is also possible to select a RecObj computed by a module indicated by us (EcalFixedWindow_5x5). In this way we can test new reconstruction algorithms.
EVD0_Events.eg_1gam_pt25.jetNoPU_CERN
EVD0_Digis.eg_1gam_pt25.jetNoPU_CERN
EVD0_Collections.eg_1gam_pt25.jetNoPU_CERN
The other 9 have the same name but starting with EVD1, EVD2, EVD3.
EVD0_Collections.eg_1gam_pt25.jetHit120_2D_CERN
EVD0_Events.eg_1gam_pt25.jetHit120_2D_CERN
EVD0_Hits.eg_1gam_pt25.jetHit120_2D_CERN
EVD0_MCInfo.eg_1gam_pt25.jetHit120_2D_CERN
EVD0_THits.eg_1gam_pt25.jetHit120_2D_CERN
These too are repeated with EVD1, EVD2, EVD3, giving another 20 files in total.
eg_1gam_pt25 jetHit120_2D_CERN Collections 0 1 2 3
eg_1gam_pt25 jetHit120_2D_CERN Events 0 1 2 3
eg_1gam_pt25 jetHit120_2D_CERN Hits 0 1 2 3
eg_1gam_pt25 jetHit120_2D_CERN MCInfo 0 1 2 3
eg_1gam_pt25 jetHit120_2D_CERN THits 0 1 2 3
eg_1gam_pt25 jetNoPU_CERN Collections 0 1 2 3
eg_1gam_pt25 jetNoPU_CERN Digis 0 1 2 3
eg_1gam_pt25 jetNoPU_CERN Events 0 1 2 3
So event 1 of run 37 must be tracked in these 32 Objectivity databases! The first group of files contains the objects connected to SimTrigger; the second those pointed to from RecEvent.
<environment>
  <lib name=Workspace><lib>
  <Group name=RecReader>
  <External ref=COBRA Use=CARF>
  <bin file=ExRunEvent.cpp>my favourite application</bin>
</environment>
This is the smallest set that will work. Note that all external libraries are loaded using the BuildFile in COBRA/CARF. This BuildFile uses the tag RecReader to select the classes needed to read RecHits. SimHits are read by the libraries loaded with the tag SimReader.
To find where ExRunEvent ended up, search in the bin directory given and in the nearby lib directory.
<environment>
  <lib name=Workspace><lib>
  <Use name=Tracker>
  <Group name=RecReader>
  <External ref=COBRA Use=CARF>
  <bin file=ExRunEvent.cpp>my favourite application</bin>
</environment>
We have only added a card telling SCRAM that the BuildFile of the subsystem Tracker must be used.
<environment>
  <lib name=Tutorial>
  <lib name=EcalPlusHcalTower>
  <lib name=CaloCluster>
  <Group name=CaloRecHitReader>
  <Use name=Calorimetry>
  <Group name=RecReader>
  <External ref=COBRA Use=CARF>
  <External ref=root>
  <bin file=ExTutNtuple.cpp></bin>
</environment>
The main change here is that we request the use of the BuildFile in the subsystem Calorimetry. This BuildFile will load many different sets of libraries, and the set that we want is selected by the tag CaloRecHitReader. Note that in addition to these libraries we explicitly request the libraries of the subsystem Calorimetry, EcalPlusHcalTower and CaloCluster.
<External ref=cern>
<External ref=cmsim>
<External ref=HepODBMS>
<External ref=Objectivity>
<lib name=ExCalorimetry>
  <environment>
    <Group name=CaloHitReader>
    <Group name=CaloRHitWriter>
    <Group name=CaloRHitReader>
    <lib name=EcalFixedWindow>
    <lib name=CaloData>
    <lib name=CaloCluster>
    <Use name=Calorimetry>
    <Group name=RecReader>
    <Use name=CARF>
    <Use name=Utilities>
    <bin file=ExClusterHistograms.cpp></bin>
    </Use>
    </Use>
    </Use>
    </lib>
  </environment>
  <environment>
    <Group name=CaloHitReader>
    <Group name=CaloRHitWriter>
    <Group name=CaloRHitReader>
    <lib name=EcalFixedWindow>
    <lib name=EcalDynamical>
    <lib name=CaloData>
    <lib name=CaloCluster>
    <Use name=Calorimetry>
    <Group name=RecReader>
    <Use name=CARF>
    <Use name=Utilities>
    <bin file=ExCompClusterers.cpp></bin>
    </Use>
    </Use>
    </Use>
    </lib>
  </environment>
</External>
</External>
</External>
</External>
external ref=productname | libraries, include files and other stuff connected to the external product |
lib name=libname | libraries found in the SCRAM search path. |
Use name=package | the package indicated. In fact this means a reference to the BuildFile of the package. |
Group name=groupname | The group will set a switch that will control the loading of used packages. What really happens depends on the BuildFile of the package. |
scram project ORCA ORCA_4_5_0
cd ORCA_4_5_0/src
eval `scram runtime -csh`
cmscvsroot ORCA
cvs co Examples/CompGenRec
cd Examples/CompGenRec
scram b
setenv OO_FD_BOOT cmspf01::/cms/reconstruction/user/jet0900/jet0900.boot
setenv CARF_INPUT_OWNER jetNoPU_CERN
setenv CARF_INPUT_DATASET_NAME eg_1gam_pt25
rehash
ExCompMuon
Let's look at the result. This example is interesting since the program accesses generated muons, calorimeter clusters, tracker tracks and muon detector tracks through four methods (subroutines) called getGeneratorParticles, getCalorimeterClusters, getTrackerTracks, getMuonTracks. As you can see from the code, the access to each kind of information is far from simple, and the BuildFile itself is also complex. No wonder we can't get a simple program that fills some quantity into a histogram running: the navigation needed to access any single piece of information, which before was as easy as counting 1, 2, ... in a bank, is now a complete nightmare.
creating a ROU Slave Factory for:
followed by some acronym. This is connected to the mechanism seen before. The first two letters in the acronym indicate which kind of subdetector ORCA is handling:
DDD - The Detector Description Database is a database describing (guess what!) the detector, consisting of ascii files containing XML tags plus a C++ API to access this description. Fortunately, we final users can (probably) safely ignore all the technicalities.
LCG - LHC Computing Grid project. A group of people working (among other things) on the replacement of Objectivity with Root.
Root - CERN software by the same people that gave us Paw and Zebra. Root is in fact intended as an OO replacement for these successful packages. Root will also provide the object persistency, replacing Objectivity. In principle COBRA will hide end-users like us from the complexities of Root. But, judging from the Root home page, it is also possible that, in the future, it will at last bring an easy Paw-like interface for those who don't know OOP.
POOL - Pool Of persistent Objects for LHC is the software provided by LCG that will take care of the persistency of objects (in slang, the persistency framework); this is the name of the Root-based replacement of Objectivity.
As far as this document is concerned, the most important change was the release of Iguana 4 with the new plugin architecture. This means, said in plain words, that objects like TwigMuBarRpcSimHits that we used in the previous paragraphs no longer exist: i.e. catastrophe! We have to restart from scratch!
ORCA_7.0 brings all these new things and it is interesting to compare the list of used packages with the same list for the previous release.
An inspection of the CVS repository for Visualization/MuonVis shows that the objects used in the previous paragraph have been thrown into the Attic, but they seem to be replaced by the following new objects:
To start the visualisation, type "iguana" and select "COBRA". To display an object, first select the object and then click on the visualisation box next to it. Obvious, isn't it! So these are the steps to run the iguana plugin with ORCA_6_3_0 (this is the first release in which the plugin architecture was introduced in Orca):
Now we try to add a branch "Event/Muon/Barrel/RpcMyHits" by cloning the branch "Event/Muon/Barrel/RpcSimHits". The complete procedure is:
m_document->addContentProxy ("COBRA/Event/CustomTracker");
Xlib: extension "GLX" missing on display "lxplus080:23.0". Inventor error in SoQtGLWidget::SoQtGLWidget(): OpenGL not available!The brave new world has to wait...
Xlib: extension "GLX" missing on display "lxplus010:15.0".But now I know that the problem is with my computer X server that lacks this Opengl extension.
rfdir suncmsc:/data/valid/ORCA_7_2_0
Iguana (which now uses the Coin3D implementation of Open Inventor) is again almost unusable. We are two years back, with many visualizations no longer working and continuous crashes.
m_document->addContentProxy ("ORCA/Event/CustomTracker");
mv src/VisTkEventContent.cc src/VisCuTkEventContent.cc
mv src/VisTkTwig.cc src/VisCuTkTwig.cc
mv src/VisTkSimHitsTwig.cc src/VisCuTkSimHitsTwig.cc
mv interface/VisTkEventContent.h interface/VisCuTkEventContent.h
mv interface/VisTkTwig.h interface/VisCuTkTwig.h
mv interface/VisTkSimHitsTwig.h interface/VisCuTkSimHitsTwig.h
ootoolmgr. It allows you to explore only the catalog, but not the data (the single objects in the containers). For these we have to use CMS sample programs.
Check out with cvs the program ExRunEvent in module Workspace. This program does nothing useful except reading the input events and giving a summary of how many there were. In principle you can use it as a template to write more interesting programs, but this is not straightforward since you must know the CMS data model! Anyhow, we would like to run this program on the Grid. But before doing this we briefly review how to run it on your desktop.
tcsh
setenv VO_CMS_SW_DIR /opt/exp_software/cms
source /opt/exp_software/cms/cmsset_default.csh
InputFileCatalogURL = @{xmlcatalog_http://webcms.ba.infn.it/cms-software/orca/hg03_hzz_2e2mu_130a_rfio.xml}@
and the Input Collection
InputCollections = /System/hg_2x1033PU761_TkMu_g133_CMS/hg03_hzz_2e2mu_130a/hg03_hzz_2e2mu_130a
Now we run ExRunEvent on the Grid. The following steps are very well documented and need to be done only once.
Create the .globus directory and convert your certificate into the key files expected there:
mkdir .globus
openssl pkcs12 -nocerts -in cert.p12 -out ~/.globus/userkey.pem
openssl pkcs12 -clcerts -nokeys -in cert.p12 -out ~/.globus/usercert.pem
chmod 0400 ~/.globus/userkey.pem
grid-proxy-init
which will enable you to use the Grid for 12 hours.
globus-job-run gridba2.ba.infn.it /bin/hostname
To run our Helloworld on the Grid we must write a file in a language called JDL (Job Description Language), specifying which program we must run, the input files, the output files, etc., and then send this file to the Grid. Not easy for a newbie. So I'll use the tool CRAB, which will do everything for you. First of all you download CRAB using CVS.
crab.cfg
eval `scram runtime -csh`
in the correct directory (i.e. the program will use the environment variables set by SCRAM to get everything).
./crab.py -bunch_creation 2 -bunch_submission 2
To check the job status:
edg-job-status -i Jobs/log/submission_id.log
cvs co -r CMSSW_0_0_1_pre9 Geometry/CommonDetUnit
cvs co -r CMSSW_0_0_1_pre9 Geometry/TrackerSimAlgo
cvs co -r CMSSW_0_2_0 Geometry/CommonDetUnit
cvs co -r CMSSW_0_2_0 Geometry/TrackerSimAlgo
process GeometryTest = {
    # empty input service, fire 10 events
    source = EmptySource {untracked int32 maxEvents = 2}
    es_source = XMLIdealGeometryESSource {string GeometryConfiguration="testConfiguration.xml"}
    es_module = TrackerGeometricDetESModule {}
    es_module = TrackerDigiGeometryESModule {}
    module print = AsciiOutputModule {}
    # module prod = TrackerDigiGeometryAnalyzer {}
    module prod = TrackerDigiGeometryAnalyzer {}
    # provide a scheduler path
    path p1 = {prod}
}
cvs co VisFramework/VisFrameworkBase
cvs co -r CMSSW_0_3_0_pre4 Geometry/MuonCommonData
cvs co -r CMSSW_0_3_0_pre4 Geometry/CSCSimAlgo
cvs co -r CMSSW_0_3_0_pre4 Geometry/CMSCommonData
cvs co -r CMSSW_0_3_0_pre4 DetectorDescription/Parser
From its Introduction: