In CMS, the SM processes are generated and simulated centrally, and we don't have to worry about them. But for a specific BSM search, the physicist has to take care of the BSM samples themselves, if they are not being handled already. For this section, I am using vector-like leptons as a reference BSM signal (arXiv:1510.03556). First, we need to integrate this new physics model into the existing Standard Model in an event generator like MadGraph. Typically, theorists share their BSM models in the Universal FeynRules Output (UFO) format, which can be read by automated matrix-element generators. These files contain Python modules that can easily be included as an extension to MadGraph. This allows us to play with the new particles and the Feynman rules associated with them. I have taken the latest VLL UFO files from the repository: github.com/prudhvibhattiprolu.
This is the order of installation for this setup.
Here are some important packages required before installation. I am listing the versions in my setup, but any recent versions should also work. I recommend using conda to install everything. It's important that ROOT is built with the same Python version that is used here. For detailed instructions on how to handle Python environments and install ROOT properly, visit here.
Packages | Versions |
---|---|
python | 3.10.9 |
cmake | 3.22.1 |
git | 2.34.1 |
g++, gcc | 11.4.0 |
ROOT | 6.26/10 |
I would also recommend keeping all of these MC-generation tools inside the same directory. For me, the working directory is /home/phazarik/mcgeneration.
MadGraph can be downloaded from the official website. Building is not needed; the binary is available at mg5amcnlo/bin/mg5_aMC. I have given the full path to MadGraph in my .bashrc file as follows.
alias mg5="/home/phazarik/mcgeneration/mg5amcnlo/bin/mg5_aMC"
If you are not importing any BSM model, the MadGraph setup is done at this point. In case of a BSM model (like VLL in my case), the model files are unpacked into the MadGraph models directory as follows.
/home/phazarik/mcgeneration/mg5amcnlo/models/VLLS_NLO
/home/phazarik/mcgeneration/mg5amcnlo/models/VLLD_NLO
The model files in my example were written for MadGraph v2, which was built on python2, but they should also work with the latest MadGraph v3, which uses python3. To use MadGraph v3, the model files are auto-converted to python3 and then imported into MadGraph as follows.
shell>> mg5                        # Takes me into the MadGraph interface.
MG5_aMC> set auto_convert_model T  # For compatibility with python3.
MG5_aMC> import model VLLD_NLO
If output like the following pops up, then the setup is ready.
INFO: Change particles name to pass to MG5 convention
Pass the definition of 'j' and 'p' to 5 flavour scheme.
definitions of multiparticles l+ / l- / vl / vl~ unchanged
multiparticle all = g ghg ghg~ u c d s b u~ c~ d~ s~ b~ a gha gha~ ve vm vt e- mu- ta- ve~ vm~ vt~ e+ mu+ ta+ t t~ z w+ ghz ghwp ghwm h w- ghz~ ghwp~ ghwm~ taup nup taup~ nup~
INFO: Change particles name to pass to MG5 convention
definitions of multiparticles p / j / l+ / l- / vl / vl~ unchanged
multiparticle all = g ghg ghg~ u c d s b u~ c~ d~ s~ b~ a gha gha~ ve vm vt e- mu- ta- ve~ vm~ vt~ e+ mu+ ta+ t t~ z w+ ghz ghwp ghwm h w- ghz~ ghwp~ ghwm~ taup nup taup~ nup~
HepMC is widely used for exchanging event information between event generators and detector-simulation tools. For this exercise, HepMC3 can be obtained from GitLab and built as follows. Some usage instructions are available here.
git clone https://gitlab.cern.ch/hepmc/HepMC3.git  # This creates a source directory called 'HepMC3'.
mkdir hepmc3_build hepmc3_install                  # Two directories where HepMC3 is built and installed.
cd hepmc3_build                                    # This is where building HepMC3 happens.
cmake -DCMAKE_INSTALL_PREFIX=../hepmc3_install -Dmomentum:STRING=GEV -Dlength:STRING=MM ../HepMC3
make          # This requires the C++ compilers (as checked by the previous command), and takes some time.
make install  # This transfers files to the install directory.
PYTHIA is a program for simulating particle interactions as well as hadronization. It can be downloaded from the official website, where installation instructions are also provided. The following steps install PYTHIA with the HepMC3 configuration. Make sure to give the full path to HepMC3 during configuration. I am using version 8.312, but these instructions should work for other recent versions as well.
wget https://www.pythia.org/download/pythia83/pythia8312.tgz  # Downloading pythia.
tar xvfz pythia8312.tgz                                       # Unzipping the tarball.
cd pythia8312
In the configuration, I am including the HepMC3 library. For this, I need to pass the full path to HepMC3 to the ./configure command as follows.
./configure --with-hepmc3=/home/phazarik/mcgeneration/hepmc3_install

# Alternative build commands:
#./configure --with-hepmc3=/home/phazarik/mcgeneration/hepmc3_install \
#    --with-python-include=/home/phazarik/miniconda3_backup_2024_10_09/envs/analysis/include/python3.10 \
#    --with-python-bin=/home/phazarik/miniconda3_backup_2024_10_09/envs/analysis/bin
#./configure
#./configure --with-python

# If everything goes right, the following report should pop up.
#---------------------------------------------------------------------
#|                PYTHIA Configuration Summary                       |
#---------------------------------------------------------------------
#  Architecture                = LINUX
#  C++ compiler     CXX        = g++
#  C++ dynamic tags CXX_DTAGS  = -Wl,--disable-new-dtags
#  C++ flags        CXX_COMMON = -O2 -std=c++11 -pedantic -W -Wall -Wshadow -fPIC -pthread
#  C++ shared flag  CXX_SHARED = -shared
#  Further options  =
# The following optional external packages will be used:
#  + HEPMC3 (-I/home/phazarik/mcgeneration/hepmc3_install/include)

make clean  # Removes temporary files from previous attempts, if any.
make        # This takes a couple of minutes.
Hadronization happens in the pythiaXXXX/examples directory, which is why I built PYTHIA in my work area for convenience. It can also be kept alongside the other tools, with the output files exported to the work area for the next steps. In any case, once PYTHIA is built, I added the following paths to my .bashrc file; these are needed for C++ compilation of the hadronizer codes.
export PYTHIA8=/mnt/d/work/temp/mcgeneration/pythia8312
export PYTHIA8DATA=$PYTHIA8/share/Pythia8/xmldoc
export PATH=$PYTHIA8/bin:$PATH
export LD_LIBRARY_PATH=$PYTHIA8/lib:$LD_LIBRARY_PATH
In some examples, I have also seen PYTHIA used through a Python-based interface. For this, one can easily install PYTHIA and HepMC3 from conda-forge. This is not needed for the toy example that I share in the next section, and for a new user I would not recommend it, because managing multiple versions of the same tools gets messy.
conda install -c conda-forge pythia8  # not important
conda install -c conda-forge hepmc3   # not important
I also found a nice YouTube video on this Python-based PYTHIA interface, which is part of an HSF workshop. It broadly covers the basics of what can be done with PYTHIA, starting from event generation.
Delphes is a fast-simulation framework for high-energy-physics detectors. Delphes outputs are roughly equivalent to NanoAOD, but the information is structured differently in the ROOT files. It can be downloaded from the GitHub repository, where installation instructions are available as well. Building Delphes is pretty straightforward.
git clone https://github.com/delphes/delphes.git
cd delphes
make  # This takes a couple of minutes
In some examples, PYTHIA, HepMC3, etc. can be used from the MadGraph interface itself (I'm exploring this, not an expert yet). For this, the paths to PYTHIA and Delphes should be included in the MadGraph configuration. This has to be done only once. I also added my FastJet setup, in case I need it later.
set pythia8_path /home/phazarik/mcgeneration/pythia8312
set delphes_path /home/phazarik/mcgeneration/delphes
set fastjet /home/phazarik/fastjet-3.4.2/bin/fastjet-config  # optional
These paths are needed only when the hadronization etc. is done inside MadGraph itself, by importing these modules. In the toy example, I am doing each step of the sample generation individually. The paths can also be added manually by editing the input/mg5_configuration.txt file in the MadGraph directory. Also, MadGraph may ask for lhapdf, a standard tool for evaluating parton distribution functions (PDFs). This can be installed using conda as follows.
conda install -c conda-forge lhapdf
After these installations, the following MC-generation tools are available in my setup.
Packages | Versions | Sources |
---|---|---|
mg5amnlo | 3.5.4 | GitHub |
hepmc3 | 3.3.0 | GitLab |
pythia | 8.312 | pythia.org |
delphes | 3.5.0 | GitHub |
lhapdf | 6.5.4 | conda-forge |
Note that MadGraph can also install these tools by itself, using install pythia8 (for PYTHIA and HepMC together) and install Delphes (for Delphes alone). These are kept in mg5amcnlo/HEPTools. But I don't trust this yet, since there is no control over the versions; I prefer to do it manually.
The flowchart below illustrates the event-simulation workflow I am going to use. The process begins with MadGraph, where parton-level events are generated in the LHE format. The output LHE file is then fed into PYTHIA for hadronization, where partons are converted into physical hadrons, resulting in a DAT file. Finally, detector simulation is done using Delphes with a CMS card, which simulates how CMS would record these events, producing a ROOT file that can be used for further analysis.
In the generation phase, tools like MadGraph are used to simulate hard scattering events based on quantum field theory. MadGraph takes in a process definition and a set of parameters from the run_card and param_card. The input file typically consists of these cards and configuration files, which define the physics process and the collider setup. The output of this phase is in the LHE format (Les Houches Event), a standardized text file that contains detailed information about the parton-level event, such as particle IDs, momenta, and event weights. The LHE file serves as the bridge to further event processing.
Let's try to generate a simple Drell-Yan process: pp → Z → ll. For this, be in your work area, go to MadGraph prompt, and define the process.
mg5  # My shortcut for opening the MadGraph prompt.

# Now inside the MadGraph prompt:
display particles       # Displays all the available individual particles.
display multiparticles  # Displays all the labels for groups of particles.
generate p p > z > l+ l-
output ppToZToLL
exit
The output line creates a new directory for the pp → Z → ll process, including the run card, param card, etc. In principle, all of this can be done in a single step in the MadGraph prompt, but it's convenient to customize the parameters later. The cards are kept in ppToZToLL/Cards. Let's edit the run_card.dat file, change/add some important parameters as follows, and keep everything else as it is.
# Edited line (number of events changed to 10):
10 = nevents ! Number of unweighted events requested

# Newly added lines:
# -- customize according to your need --
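When generating many samples, the same run_card edit can be scripted instead of done by hand. Here is a minimal Python sketch (a hypothetical helper, not part of the MadGraph tooling) that rewrites the nevents entry of a run_card:

```python
import re

def set_nevents(run_card_text: str, nevents: int) -> str:
    """Replace the 'nevents' entry in a MadGraph run_card with a new value."""
    # run_card lines look like: "  <value> = nevents ! Number of unweighted events requested"
    return re.sub(r"^\s*\d+\s*=\s*nevents",
                  f"  {nevents} = nevents",
                  run_card_text, flags=re.MULTILINE)

card = "  10000 = nevents ! Number of unweighted events requested\n"
print(set_nevents(card, 10))  # prints "  10 = nevents ! Number of unweighted events requested"
```

The same pattern works for any other `<value> = <name>` entry in the card.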
The ppToZToLL/bin directory contains the binaries used to run the event generator, and ppToZToLL/Events contains the outputs of each run. Let's run a test and see what it does.
# Be inside the ppToZToLL directory.
./bin/generate_events testrun -f > /dev/null 2>&1
# testrun          = name of the directory to be generated inside ppToZToLL/Events
# -f               = suppresses the MadGraph CLI prompts and takes the parameters from the run_card
# > /dev/null 2>&1 = suppresses any GUI-related issues (in WSL)
This creates a ppToZToLL/Events/testrun directory containing a .gz file, which can be unzipped and used for the later steps.
cd Events/testrun
gunzip unweighted_events.lhe.gz
This unzipped LHE file, unweighted_events.lhe, contains all the generated parton-level information, including particle IDs, momenta, event weights, and other metadata.
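To get a feel for what the LHE file contains, here is a small pure-Python sketch that pulls the particle IDs, status codes, and four-momenta out of one `<event>` block. The event below is hand-written for illustration (it is not real MadGraph output), but the column layout follows the Les Houches convention:

```python
# A hand-written <event> block for illustration only.
# Columns: id status mother1 mother2 color1 color2 px py pz E m lifetime spin
lhe_event = """<event>
 4   0  1.0  9.118e+01  7.8e-03  1.18e-01
  2 -1  0 0  501   0  0.0 0.0  45.6 45.6  0.0 0.  1.
 -2 -1  0 0    0 501  0.0 0.0 -45.6 45.6  0.0 0. -1.
 23  2  1 2    0   0  0.0 0.0   0.0 91.2 91.2 0.  0.
 11  1  3 3    0   0 12.3 0.0  43.9 45.6  0.0 0. -1.
</event>"""

def parse_event(block: str):
    """Return a list of (pdg_id, status, px, py, pz, E) tuples from one <event> block."""
    lines = [l for l in block.strip().splitlines() if not l.startswith("<")]
    particles = []
    for line in lines[1:]:  # skip the event-header line
        f = line.split()
        pdg, status = int(f[0]), int(f[1])
        px, py, pz, E = map(float, f[6:10])
        particles.append((pdg, status, px, py, pz, E))
    return particles

for pdg, status, px, py, pz, E in parse_event(lhe_event):
    print(f"id={pdg:4d} status={status:2d} E={E:6.1f} GeV")
```

For serious work, a proper LHE reader (e.g. the one shipped with pylhe or MadGraph) is preferable, but this shows how little magic the format involves.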
The hadronization step happens in the pythiaXXXX/examples directory. For this, we need a piece of code that reads the LHE file and hadronizes the events, and the pythiaXXXX/examples/Makefile must be edited to include this code so that it is compiled along with the rest of the examples. The hadronizer code is called hadlhe.cc and can be found here. The modified Makefile can be found here, and should replace the old one.
These files can also be directly downloaded into the correct directory as follows.
cd [path to pythia]/examples
wget -O hadlhe.cc https://raw.githubusercontent.com/phazarik/phazarik.github.io/main/mypages/files/codes/hadlhe.cc
wget -O Makefile https://raw.githubusercontent.com/phazarik/phazarik.github.io/main/mypages/files/codes/Makefile_for_pythia.txt
The hadlhe.cc file should be edited to give the correct path to the input LHE file that was generated using MadGraph, the number of events in the input file, and a desired output path for the dat file that contains the hadronized output. This dat file is later imported into Delphes for detector simulation. The hadronizer is executed as follows.
# Inside the examples directory:
make clean   # to get rid of any previously compiled files
make hadlhe  # this looks for hadlhe.cc, compiles it, and creates an executable
./hadlhe     # execution
The Delphes simulation can be run using the DelphesHepMC3 executable followed by three arguments: the card name, the output file, and the input file. The first two arguments are the same in every case, so I added them to my .bashrc as follows.
alias delphes="/home/phazarik/mcgeneration/delphes/DelphesHepMC3 /home/phazarik/mcgeneration/delphes/cards/delphes_card_CMS.tcl"
For this toy example, I ran the following from my work-directory to produce a delphes tree.
delphes testroot.root pythia8312/examples/ppToZToLL_10.dat
So far, the whole MC production process has been done step by step: first, event generation at the LHE level; then producing the hadronized gen-level information using PYTHIA; and then detector simulation using Delphes. This whole exercise is done to understand what happens at the back end. The hadronization step produces large dat files, which need to be processed by Delphes before being useful. It turns out that this extra step can be avoided by merging the work of Delphes and PYTHIA, as illustrated in the flowchart below.
For this, the following things need to be done. First, make sure that the PYTHIA paths are exported in the .bashrc file.

export PYTHIA8=[path to pythia8312] # Give the full path here
export PYTHIA8DATA=$PYTHIA8/share/Pythia8/xmldoc
export PATH=$PYTHIA8/bin:$PATH
export LD_LIBRARY_PATH=$PYTHIA8/lib:$LD_LIBRARY_PATH

Then, rebuild Delphes with PYTHIA8 support enabled.

cd /home/phazarik/mcgeneration/delphes
make clean
make HAS_PYTHIA8=true # This will take a while
Next, create a PYTHIA configuration file named pythia8_ppToZToLL.cfg and keep it in the process directory. This contains the specific set of instructions needed during hadronization, including the path to the input files, how many events to process, etc. This file is fed as an input to Delphes. If the number of events specified in the configuration does not match the number of events available in the LHE file, PYTHIA will still proceed with whatever number of events it can read.
! Pythia8 configuration for the pp -> Z -> LL process
Main:numberOfEvents = 10       ! Number of events to simulate
Main:timesAllowErrors = 10     ! How many errors Pythia will allow before stopping

! Set up beam parameters
Beams:idA = 2212               ! Proton beam
Beams:idB = 2212               ! Proton beam
Beams:eCM = 13000.0            ! Center-of-mass energy in GeV

! Load the LHEF events generated by MadGraph
Beams:LHEF = /mnt/d/work/temp/mcgeneration/ppToZToLL/Events/testrun/unweighted_events.lhe

! Pythia8-specific physics settings
WeakSingleBoson:ffbar2gmZ = on ! Enable Z boson production
23:onMode = off                ! Turn off all Z decays
23:onIfAny = 11 13             ! Allow only decays to leptons (e+ e- and mu+ mu-)

! Hadronization and jet clustering
HadronLevel:all = on
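Since PYTHIA silently proceeds with however many events it finds, it can be handy to count the events in the LHE file before setting Main:numberOfEvents. A minimal Python sketch (the demo file here is a stand-in for a real unweighted_events.lhe):

```python
import tempfile

def count_lhe_events(path: str) -> int:
    """Count <event> blocks in an (uncompressed) LHE file."""
    n = 0
    with open(path) as f:
        for line in f:
            if line.strip() == "<event>":
                n += 1
    return n

# Demo on a tiny hand-written file; a real file would be something
# like ppToZToLL/Events/testrun/unweighted_events.lhe.
with tempfile.NamedTemporaryFile("w", suffix=".lhe", delete=False) as tmp:
    tmp.write("<LesHouchesEvents>\n<event>\n</event>\n<event>\n</event>\n</LesHouchesEvents>\n")
    name = tmp.name

print(count_lhe_events(name))  # prints 2
```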
Instead of DelphesHepMC3, run DelphesPythia8 along with the CMS card. For this, I created an alias in the .bashrc file for my convenience.
alias delphespythia="/home/phazarik/mcgeneration/delphes/DelphesPythia8 /home/phazarik/mcgeneration/delphes/cards/delphes_card_CMS.tcl"
Once these changes are made, the following can be run from the work-directory.
delphespythia ppToZToLL/pythia8_ppToZToLL.cfg test_output.root
I find this convenient because I don't have to go into the pythia8312/examples directory, create a mess there with all the temporary folders, and deal with the DAT files. However, this method complains about some missing libraries related to ExRootAnalysis at the beginning, and ROOT prints some warnings about the same while loading the output files. As long as we are not using those features, this is fine.
To summarize the merged workflow:
- Define the process in the MadGraph prompt and create the process directory with output processname.
- Edit run_card.dat and specify the number of events.
- Generate the LHE events with ./bin/generate_events testrun.
- Unzip the LHE file in Events/testrun.
- Run delphes/DelphesPythia8 by providing the CMS card, the PYTHIA config file, and the output file as arguments.
For analyzing Delphes trees and writing histograms, I created a MakeSelector()-based setup, with instructions available in this GitHub repository.
The previous sections describe how to generate MC completely outside the CMSSW framework, using Delphes to approximate the detector simulation. This section focuses on generating gridpacks, which are required to produce MC samples within the CMSSW framework, where the full detector simulation is performed using GEANT4. This section is completely independent of the earlier ones.
MC generation in CMS follows a standard sequence: Gridpack > LHEGS > Premix > AODSIM > MINIAODSIM > NANOAODSIM.
These steps include event generation (e.g., MadGraph), parton showering and hadronization (Pythia), detector simulation (GEANT4 via CMSSW), and
full reconstruction. Each step is configured through CMSSW fragments, and production is aligned with centrally defined campaign configurations.
Users generate gridpacks using the gridpack_generation.sh script from the genproductions/MadGraph5_aMCatNLO tool on GitHub, specifying the model and process. For central production, the gridpack, Pythia fragment, number of events, and other metadata are provided to the NPS MC contact. For local validation, the gridpack is processed with cmsDriver.py to create GEN-SIM or full AOD workflows.
As an example, I am producing a VLL doublet (electron type) sample with mass of both new leptons being 600 GeV. I prepared some helper scripts to manage and organize the outputs. These are to be brought to the work area as follows.
- modeldict.yaml: YAML file containing information on the signal models. This is used to auto-generate the cards from a template.
Next, the genproductions tool is cloned inside a directory named Run3.
mkdir Run3 && cd Run3 git clone https://github.com/cms-sw/genproductions.git --depth=1
For Run 2, a specific branch, mg265UL, is recommended; that's why it should be kept in a separate directory to avoid conflicts.
mkdir Run2 && cd Run2 git clone https://github.com/cms-sw/genproductions.git --depth=1 -b mg265UL
At this point, the work area should look like this:
.
├── Run3
│   └── genproductions
├── generate_cards.py
├── generate_one_gridpack.py
├── modeldict.yaml
├── move_cards.py
└── templates
    ├── customizecards.dat
    ├── extramodels.dat
    ├── proc_card_doublet.dat
    ├── proc_card_singlet.dat
    └── run_card.dat
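To illustrate what the card generation amounts to, here is a stripped-down sketch of the template-filling step using Python's string.Template. The dictionary keys and the card content are hypothetical; the real modeldict.yaml schema and template cards differ:

```python
from string import Template

# One mass point, as it might be parsed from modeldict.yaml
# (hypothetical keys; the real YAML schema may differ).
point = {"model": "VLLD_ele", "mass": 600}

# A stripped-down stand-in for templates/customizecards.dat;
# PDG IDs 17/18 are the 4th-generation lepton codes (taup/nup).
template = Template(
    "set param_card mass 17 $mass\n"
    "set param_card mass 18 $mass\n"
)

card = template.substitute(mass=point["mass"])
print(card)
```

Looping this over all mass points in the YAML file and writing one card set per point is essentially what generate_cards.py does.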
The card generation is handled automatically by generate_cards.py, which loops over the different mass points described in the YAML file, fills in the parameters (such as beam energy, decay leptons, masses of the VLLs, etc.) in the template cards, and creates a set of cards for each mass point. After this, move_cards.py is used to move the generated cards to the genproductions/bin/MadGraph5_aMCatNLO/cards/VLL directory. Now the setup is ready to generate gridpacks. At this point, the relevant files/directories in the genproductions setup should look like this:
.
└── Run3
    └── genproductions
        └── bin
            └── MadGraph5_aMCatNLO
                ├── PLUGIN
                ├── Utilities
                ├── cards
                │   └── VLL
                └── gridpack_generation.sh
Also, make sure that the BSM model tarball (VLL.tgz in this case) is available in the central repository (cms-project-generators).
A note on VLL M-100: In the cards, the logic for producing the vector-like charged lepton L, and the neutral lepton N (both of
which couple to muons in this example) is the following.
define lepton = mu+ mu- vm vm~  # Taken from YAML
generate p p > L L, (L > z lepton), (L > h lepton)                                       # Pair production
add process p p > N N, (N > w+ lepton), (N > w- lepton)                                  # Pair production
add process p p > L N, (L > z lepton), (L > h lepton), (N > w+ lepton), (N > w- lepton)  # Associated production
However, there is an issue with these for the low mass-point. For M = 100 GeV, some of the decay chains (e.g., L > higgs + lepton with mH = 125 GeV) are not kinematically allowed (if the Higgs is on-shell). So MadGraph tries to include these subprocesses but fails midway. The same thing is true for the associated production. The gridpack generation tries to compile all 24 processes regardless of whether they are viable or not. If even one fails, the entire gridpack generation fails. MadGraph is designed to produce on-shell bosons with minimal instructions, and it is safe to skip this mass point if it's not too important.
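This failure mode can be anticipated with a quick kinematic check before launching a gridpack. A toy sketch (tree-level, on-shell two-body decays only, which is a simplification of what MadGraph actually does):

```python
# Quick viability check for the VLL decay channels at a given mass point.
# Only on-shell two-body decays are considered, which is a simplification.
M_H, M_Z, M_W = 125.25, 91.19, 80.38  # boson masses in GeV

def open_channels(m_vll):
    """Return the decay channels that are kinematically open for a VLL of mass m_vll (GeV)."""
    channels = {"L -> Z lep": M_Z, "L -> H lep": M_H, "N -> W lep": M_W}
    return [name for name, m_boson in channels.items() if m_vll > m_boson]

print(open_channels(100))  # the Higgs channel is closed at M = 100 GeV
print(open_channels(600))  # all channels open
```

Running something like this over the YAML mass points flags M = 100 GeV before the gridpack compilation fails midway.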
To generate one gridpack, the following command can be run from the genproductions/bin/MadGraph5_aMCatNLO directory.
chmod +x gridpack_generation.sh
./gridpack_generation.sh VLLD_ele_M600 cards/VLL/VLLD_ele_M600 local ALL el8_amd64_gcc10 CMSSW_12_4_8
Where:
- VLLD_ele_M600: name of the gridpack to be generated.
- cards/VLL/VLLD_ele_M600: path to the process-card directory.
- local: run mode (condor can be used here).
- ALL: use all available cores; a specific number of cores can also be given here.
- el8_amd64_gcc10: SCRAM architecture (must be compatible with the CMSSW release).
- CMSSW_12_4_8: version of CMSSW to be used for the subsequent steps.
To simplify the process, I have automated gridpack generation using generate_one_gridpack.py. This script automatically sets the run mode, SCRAM architecture, and CMSSW version, requiring only the sample name as input. After generating the gridpack, it also transfers the output to EOS to conserve space in the AFS workspace. The script can be executed from the base working directory as shown below:
python3 generate_one_gridpack.py --name VLLD_ele_M600
Full automation has not been implemented for this setup, as gridpack generation is a one-time task and the process cards are rarely modified. The resulting gridpacks will be available in the EOS directory.
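For reference, here is a sketch of how such a wrapper might assemble the gridpack_generation.sh command line. This is a hypothetical reconstruction of what generate_one_gridpack.py does, not the actual script (which also handles the EOS transfer):

```python
# Hypothetical reconstruction of the command-building step inside
# generate_one_gridpack.py; defaults mirror the manual command above.
def gridpack_command(name: str,
                     card_dir: str = "cards/VLL",
                     mode: str = "local",
                     cores: str = "ALL",
                     arch: str = "el8_amd64_gcc10",
                     cmssw: str = "CMSSW_12_4_8") -> str:
    return (f"./gridpack_generation.sh {name} {card_dir}/{name} "
            f"{mode} {cores} {arch} {cmssw}")

print(gridpack_command("VLLD_ele_M600"))
```

The command string would then be passed to something like subprocess.run from the genproductions/bin/MadGraph5_aMCatNLO directory.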
The genproductions/bin/MadGraph5_aMCatNLO directory also contains several submit_*.sh scripts, such as submit_cmsconnect_gridpack_generation.sh and submit_condor_gridpack_generation.sh. These are wrapper scripts around gridpack_generation.sh, originally created to simplify submission in specific environments like CMSConnect or HTCondor. While they can still be useful for batch submission, site-specific configurations, or legacy workflows, they are not strictly necessary; the current setup uses the main script directly, without relying on the wrappers.
This section describes the local validation of the gridpacks after they are successfully generated, i.e., the full production chain to produce NanoAOD. I am taking VLLD_ele_M600 as an example. First, pick a separate work area for validating the test gridpack and unzip it as follows.
mkdir VLLD_ele_M600
tar -xf VLLD_ele_M600_el8_amd64_gcc10_CMSSW_12_4_8_tarball.tar.xz -C VLLD_ele_M600
This creates a predefined structure required for event generation. A breakdown of the key contents is as follows:
VLLD_ele_M600
├── InputCards
├── gridpack_generation.log
├── merge.pl
├── mgbasedir
├── process
│   ├── madevent
│   └── run.sh
└── runcmsgrid.sh
- InputCards/: contains the cards used to generate the gridpack.
- gridpack_generation.log: full log of the gridpack creation process, for debugging.
- mgbasedir/: contains the full MadGraph installation with model files, binaries, etc. needed to reproduce the generation.
- process/: includes process-specific configuration and scripts, notably run.sh for launching the internal generation step.
- runcmsgrid.sh: the main script used to generate LHE events.
Once the gridpack has been extracted, parton-level events in LHE (Les Houches Event) format can be produced using the
runcmsgrid.sh
script. This script is included inside the unpacked gridpack directory and serves as the interface to MadGraph,
MadSpin (if applicable), and Pythia (if configured). It prepares the runtime environment, handles all necessary steps of event generation, and
produces a cmsgrid_final.lhe
file containing the events. To run the script for a small test sample of events:
cd VLLD_ele_M600
./runcmsgrid.sh 1000 12345 4
Where:
- 1000 is the number of events to generate
- 12345 is the random seed
- 4 is the number of threads or parallel jobs to use
This also creates a temporary CMSSW environment under the hood and ensures all dependencies are properly set up for the run. The output file
cmsgrid_final.lhe
contains all the generated parton-level four-vectors, with each event uniquely defined. Make sure to generate
enough events here to support the statistics needed for downstream steps such as full simulation and analysis.
This step takes the parton-level LHE events and runs hadronization and particle showering using Pythia8 to produce GEN-SIM files that simulate
particles and their interactions with the detector. A CMSSW configuration is required for this step. The gridpack was produced with
CMSSW_12_4_8
. However, I will proceed with CMSSW_13_0_13
to match 2022 (preEE) MC conditions. First, set up the CMSSW
environment and directory structure as follows.
Note that CMSSW_12_4_X or CMSSW_13_0_X can be used interchangeably for GEN-SIM. However, Run 2 productions require a separate branch of the genproductions repository and use an older version of MadGraph, along with older CMSSW versions executed inside a cmssw-el7 container. These setups are not interchangeable with the Run 3 setup.
echo $SCRAM_ARCH  # It has to be compatible with the target CMSSW release
cmsrel CMSSW_13_0_13
cd CMSSW_13_0_13/src/
mkdir -p Configuration/GenProduction/python/
The Configuration/GenProduction/python
directory is used to store the fragment (configuration script) describing how to hadronize
the LHE events. The directory structure is important because cmsDriver.py
(configuration maker tool) expects fragments to be in the
Python path under this namespace. Here is an example of a hadronizer, without using externalLHEProducer
(since we are testing a
local LHE file).
import FWCore.ParameterSet.Config as cms

from Configuration.Generator.Pythia8CommonSettings_cfi import *
from Configuration.Generator.MCTunes2017.PythiaCP5Settings_cfi import *  ## Used till 2022
#from Configuration.Generator.MCTunesRun3ECM13p6TeV.PythiaCP5Settings_cfi import *  ## From 2023 onwards
from Configuration.Generator.PSweightsPythia.PythiaPSweightsSettings_cfi import *

generator = cms.EDFilter("Pythia8ConcurrentHadronizerFilter",
    comEnergy = cms.double(13600.),
    maxEventsToPrint = cms.untracked.int32(1),
    pythiaPylistVerbosity = cms.untracked.int32(1),
    pythiaHepMCVerbosity = cms.untracked.bool(False),
    nAttempts = cms.uint32(1),
    PythiaParameters = cms.PSet(
        pythia8CommonSettingsBlock,
        pythia8CP5SettingsBlock,
        pythia8PSweightsSettingsBlock,
        parameterSets = cms.vstring(
            'pythia8CommonSettings',
            'pythia8CP5Settings',
            'pythia8PSweightsSettings'
        )
    )
)

ProductionFilterSequence = cms.Sequence(generator)
Make sure that the center-of-mass energy in the fragment (comEnergy = cms.double(13600.)) matches the center-of-mass energy of the LHE events. This is crucial for correct hadronization and particle showering. Save the fragment as myfragment.py. Once it is in place, compile the setup.
scram b -j8
which cmsDriver.py
This should display the path to the cmsDriver.py available in the current CMSSW environment. This tool is used to convert the generated LHE file into a GEN-SIM file. For debugging purposes, it is often useful to generate a configuration file first using the --no_exec option; this config file can be inspected or modified before executing. Adjust filenames, number of events, and detector-specific settings like the beamspot and global conditions as needed. Below, I follow an example of Run3Summer22 (preEE) conditions.
cmsDriver.py Configuration/GenProduction/python/myfragment.py \
    --filein file:../../cmsgrid_final.lhe \
    --fileout file:VLLD_ele_M600_GENSIM.root \
    --mc \
    --eventcontent RAWSIM \
    --datatier GEN-SIM \
    --beamspot Realistic25ns13p6TeVEarly2022Collision \
    --step GEN,SIM \
    --nThreads 8 \
    --geometry DB:Extended \
    --era Run3 \
    --conditions 130X_mcRun3_2022_realistic_v5 \
    --customise Configuration/DataProcessing/Utils.addMonitoring \
    --python_filename cfg_1_GENSIM.py \
    --no_exec \
    -n 100
This creates a configuration file named cfg_1_GENSIM.py without running the job immediately. Matching the parameters --conditions, --beamspot, and --era with the intended simulation campaign and CMSSW release is important.
Finally, run the generation step.
cmsRun cfg_1_GENSIM.py
This produces a GEN-SIM ROOT file named VLLD_ele_M600_GENSIM.root
from the LHE input, suitable for the next simulation steps. This
file contains both the generator-level information (i.e. final-state particles from MadGraph/Pythia) and the full detector simulation output
from GEANT4, which emulates how those particles would interact with the CMS detector. This includes simulated energy deposits, tracking hits,
and digitized detector responses. The GEN-SIM step is notably slow and computationally heavy because of the detailed physics
and geometry involved in the simulation. While crucial for realistic MC production, such jobs are typically done on grid resources rather than
locally, except for small-scale validation like this. During the validation, it also runs the GenXsecAnalyzer, which computes the cross-section
of the generated events, and displays the time taken per event. Estimation of time-per-event and
size-per-event is needed so that PdmV can estimate how long the central production might take.
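As a back-of-the-envelope example of why these per-event numbers matter, here is the arithmetic with illustrative (not measured) values:

```python
# Back-of-the-envelope production estimate from GenXsecAnalyzer-style numbers.
# The values below are illustrative placeholders, not measured ones.
time_per_event = 45.0   # seconds per event (GEN-SIM is the slow step)
size_per_event = 1.2    # MB per event
n_events = 500_000      # requested sample size
n_cores = 8

cpu_hours = time_per_event * n_events / 3600
wall_hours = cpu_hours / n_cores
total_size_gb = size_per_event * n_events / 1024

print(f"CPU time : {cpu_hours:.0f} h")
print(f"Walltime : {wall_hours:.1f} h on {n_cores} cores")
print(f"Output   : {total_size_gb:.0f} GB")
```

PdmV performs essentially this kind of scaling, with the measured per-event numbers, to budget a central campaign.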
In this step, the simulated detector hits are converted into raw detector data format, and this is where pileup interactions can be added by overlaying minimum bias events on top of the hard scattering. Access to pileup datasets from DBS requires a valid VOMS proxy. In this example, I am not providing a pileup dataset.
cmsDriver.py step1 \
    --filein file:VLLD_ele_M600_GENSIM.root \
    --fileout file:VLLD_ele_M600_DIGIRAW.root \
    --eventcontent FEVTDEBUGHLT \
    --datatier GEN-SIM-DIGI-RAW \
    --step DIGI,L1,DIGI2RAW,HLT:@fake2 \
    --nThreads 8 \
    --geometry DB:Extended \
    --era Run3 \
    --conditions 130X_mcRun3_2022_realistic_v5 \
    --beamspot Realistic25ns13p6TeVEarly2022Collision \
    --customise Configuration/DataProcessing/Utils.addMonitoring \
    --python_filename cfg_2_DIGIRAW.py \
    --no_exec \
    --mc \
    -n 100
cmsRun cfg_2_DIGIRAW.py
In this step, the full detector reconstruction is performed on the RAW or DIGIRAW data, producing reconstructed physics objects such as tracks, jets, electrons, and muons. This completes the previous digitization step and produces fully analysis-ready objects.
cmsDriver.py step2 \
    --filein file:VLLD_ele_M600_DIGIRAW.root \
    --fileout file:VLLD_ele_M600_AOD.root \
    --eventcontent AODSIM \
    --datatier AODSIM \
    --step RAW2DIGI,L1Reco,RECO,RECOSIM \
    --nThreads 8 \
    --geometry DB:Extended \
    --era Run3 \
    --conditions 130X_mcRun3_2022_realistic_v5 \
    --beamspot Realistic25ns13p6TeVEarly2022Collision \
    --customise Configuration/DataProcessing/Utils.addMonitoring \
    --python_filename cfg_3_AOD.py \
    --no_exec \
    --mc \
    -n 100
cmsRun cfg_3_AOD.py
MINIAOD is a reduced format derived from AOD where reconstruction is not redone but the data is skimmed and slimmed to contain only essential reconstructed objects and variables. Some high-level corrections and possibly DNN-based identification variables can be added at this stage.
cmsDriver.py step3 \
    --filein file:VLLD_ele_M600_AOD.root \
    --fileout file:VLLD_ele_M600_MINIAOD.root \
    --eventcontent MINIAODSIM \
    --datatier MINIAODSIM \
    --step PAT \
    --nThreads 8 \
    --geometry DB:Extended \
    --era Run3 \
    --conditions 130X_mcRun3_2022_realistic_v5 \
    --beamspot Realistic25ns13p6TeVEarly2022Collision \
    --customise Configuration/DataProcessing/Utils.addMonitoring \
    --python_filename cfg_4_MINIAOD.py \
    --no_exec \
    --mc \
    -n 100
cmsRun cfg_4_MINIAOD.py
NanoAOD further reduces the data size for fast physics analysis, containing selected reconstructed objects and variables, often including DNN outputs for particle identification or event classification. No reconstruction is performed here; it uses the objects produced in previous steps.
cmsDriver.py step4 \
    --filein file:VLLD_ele_M600_MINIAOD.root \
    --fileout file:VLLD_ele_M600_NANOAOD.root \
    --eventcontent NANOAODSIM \
    --datatier NANOAODSIM \
    --step NANO \
    --nThreads 8 \
    --geometry DB:Extended \
    --era Run3 \
    --conditions 130X_mcRun3_2022_realistic_v5 \
    --beamspot Realistic25ns13p6TeVEarly2022Collision \
    --customise Configuration/DataProcessing/Utils.addMonitoring \
    --python_filename cfg_5_NANOAOD.py \
    --no_exec \
    --mc \
    -n 100
cmsRun cfg_5_NANOAOD.py > nanoaod.log
If a step fails, reduce the number of events with -n or skip specific events with process.source.skipEvents. To skip events that raise ProductNotFound errors automatically, add process.options = cms.untracked.PSet(SkipEvent = cms.untracked.vstring('ProductNotFound')) to the config. Always validate the outputs with edmDumpEventContent to ensure the key branches are populated.