← Go back

Private MC sample generation

Local MC generation setup


In CMS, the SM processes are generated and simulated centrally, and we don't have to worry about them. But for a specific BSM search, the physicist has to take care of the BSM samples themselves, if they are not being handled already. For this section, I am using vector-like leptons as a reference BSM signal arXiv:1510.03556. First, we need to integrate this new-physics model into the existing Standard Model implementation in an event generator like MadGraph. Typically, theorists share their BSM model in the Universal FeynRules Output (UFO) format, which automated matrix-element generators can read directly. A UFO model is a set of Python modules that can easily be included as an extension to MadGraph. This lets us play with the new particles and the Feynman rules associated with them. I have taken the latest VLL UFO files from the repository: github.com/prudhvibhattiprolu.

This is the order of installation for this setup.

  1. Setting up MadGraph along with the BSM models.
  2. Building HepMC3.
  3. Installing and configuring pythia for hadronization.
  4. Building Delphes for detector simulation.

Here are some important packages required before installation. I am listing the versions in my setup, but any recent versions should also work. I recommend using conda to install everything. It's important that ROOT is built with the same Python version that is used here. For detailed instructions on how to handle Python environments and install ROOT properly, visit here.

Package     Version
python      3.10.9
cmake       3.22.1
git         2.34.1
g++, gcc    11.4.0
ROOT        6.26/10

I would also recommend keeping all of these MC-generation tools inside the same directory. For me, the working directory is /home/phazarik/mcgeneration.
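
For reference, a minimal conda-based sketch for these prerequisites could look like the following. The environment name (mcgen) and the conda-forge channel are my choices here, not part of the original setup; adjust versions as needed.

                # A minimal sketch, assuming conda-forge; the environment name "mcgen" is arbitrary.
                conda create -n mcgen python=3.10
                conda activate mcgen
                conda install -c conda-forge root cmake git   # this ROOT is built against the environment's python
                # g++/gcc can come from the system (e.g. 11.4.0) or from conda-forge's compiler packages.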

Setting up MadGraph, including the BSM models

MadGraph can be downloaded from the official website. Building is not needed. The executable is available at mg5amcnlo/bin/mg5_aMC. I have given the full path to MadGraph in my .bashrc file as follows.

alias mg5="/home/phazarik/mcgeneration/mg5amcnlo/bin/mg5_aMC"
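
For completeness, one way to obtain MadGraph and end up with the directory layout used above is sketched below, assuming the GitHub mirror; unpacking the tarball from the official website into the same location works just as well.

                # Sketch only: clone the GitHub mirror (or unpack the official tarball here instead).
                cd /home/phazarik/mcgeneration
                git clone https://github.com/mg5amcnlo/mg5amcnlo.git   # optionally check out the release tag you want
                ./mg5amcnlo/bin/mg5_aMC                                # launch it once to make sure it starts, then type exit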

If you are not importing any BSM model, the MadGraph setup is done at this point. If you are using a BSM model (like the VLLs in my case), unpack the model files into the MadGraph models directory as follows.

                /home/phazarik/mcgeneration/mg5amcnlo/models/VLLS_NLO
                /home/phazarik/mcgeneration/mg5amcnlo/models/VLLD_NLO
              
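
Getting the model directories in place is just a matter of unpacking the UFO archive into models/. A sketch, with a hypothetical tarball name:

                # Sketch only: the tarball name is hypothetical; use whatever the model author provides.
                cd /home/phazarik/mcgeneration/mg5amcnlo/models
                tar xvzf /path/to/VLLD_NLO.tar.gz
                ls VLLD_NLO    # a UFO model is a set of python modules: particles.py, parameters.py, vertices.py, ...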

The model files in my example were written for MadGraph v2, which ran on Python 2, but they can also be used with the latest MadGraph v3, which uses Python 3. For that, MadGraph is asked to auto-convert the model files to Python 3 while importing them, as follows.

                shell>> mg5 #Takes me into MadGraph interface.
                MG5_aMC> set auto_convert_model T #For compatibility with python3. 
                MG5_aMC> import model VLLD_NLO
              

If output like the following appears, then the setup is ready.

                INFO: Change particles name to pass to MG5 convention
                Pass the definition of 'j' and 'p' to 5 flavour scheme.
                Kept definitions of multiparticles l+ / l- / vl / vl~ unchanged
                Defined multiparticle all = g ghg ghg~ u c d s b u~ c~ d~ s~ b~ a gha gha~ ve vm vt e- mu- ta- ve~ vm~ vt~ e+ mu+ ta+ t t~ z w+ ghz ghwp ghwm h w- ghz~ ghwp~ ghwm~ taup nup taup~ nup~
                INFO: Change particles name to pass to MG5 convention
                Kept definitions of multiparticles p / j / l+ / l- / vl / vl~ unchanged
                Defined multiparticle all = g ghg ghg~ u c d s b u~ c~ d~ s~ b~ a gha gha~ ve vm vt e- mu- ta- ve~ vm~ vt~ e+ mu+ ta+ t t~ z w+ ghz ghwp ghwm h w- ghz~ ghwp~ ghwm~ taup nup taup~ nup~

Setting up HepMC3

HepMC is widely used for exchanging event information between event generators and detector simulation tools. For this exercise, HepMC3 can be downloaded from GitLab. Some usage instructions are available here.

Note: HepMC3 must be built before installing PYTHIA, because PYTHIA is configured with (linked against) HepMC3.

HepMC3 can be cloned from GitLab and built as follows.

                git clone https://gitlab.cern.ch/hepmc/HepMC3.git # This will create a source directory called 'HepMC3'.
                mkdir hepmc3_build  hepmc3_install                # This will create two directories where hepmc3 is built and installed.
                cd hepmc3_build                                   # This is where building hepmc3 happens.
                cmake -DCMAKE_INSTALL_PREFIX=../hepmc3_install -Dmomentum:STRING=GEV -Dlength:STRING=MM ../HepMC3
                make                                              # This requires the C++ compilers (as checked by the previous command), and takes some time.
                make install                                      # This will transfer files to the install directory.
              
Note: HepMC3 is built with a certain Python version. Changing the Python version (or the conda environment) while using HepMC3 might cause problems later.
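
As a quick, optional check that the install step worked, the libraries and the small helper script installed by HepMC3 can be inspected under the install prefix:

                ls /home/phazarik/mcgeneration/hepmc3_install/lib                       # expect libHepMC3.so and friends
                /home/phazarik/mcgeneration/hepmc3_install/bin/HepMC3-config --version  # helper script installed by HepMC3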

Setting up PYTHIA

PYTHIA is a program for simulating particle collisions, including parton showering and hadronization. It can be downloaded from the official website, where installation instructions are also provided. The following steps install PYTHIA with the HepMC3 configuration. Make sure to give the full path to HepMC3 during configuration. I am using version 8.312, but these instructions should work for other recent 8.3xx versions as well.

                wget https://www.pythia.org/download/pythia83/pythia8312.tgz  # Downloading pythia.
                tar xvfz pythia8312.tgz                                       # Unzipping the tarball.
                cd pythia8312
              

In the configuration, I am including the HepMC3 library. For this, I need to pass the full path to HepMC3 to the ./configure command as follows.


                ./configure --with-hepmc3=/home/phazarik/mcgeneration/hepmc3_install
 
                # Alternative build commands:
                #./configure --with-hepmc3=/home/phazarik/mcgeneration/hepmc3_install \
                #            --with-python-include=/home/phazarik/miniconda3_backup_2024_10_09/envs/analysis/include/python3.10 \
                #            --with-python-bin=/home/phazarik/miniconda3_backup_2024_10_09/envs/analysis/bin
                # ./configure
                # ./configure --with-python

                #If everything goes right, the following report should pop up.
                #---------------------------------------------------------------------
                #|                    PYTHIA Configuration Summary                   |
                #---------------------------------------------------------------------
                #  Architecture                = LINUX
                #  C++ compiler     CXX        = g++
                #  C++ dynamic tags CXX_DTAGS  = -Wl,--disable-new-dtags
                #  C++ flags        CXX_COMMON = -O2 -std=c++11 -pedantic -W -Wall -Wshadow -fPIC -pthread
                #  C++ shared flag  CXX_SHARED = -shared
                #  Further options             =

                #The following optional external packages will be used:
                #+ HEPMC3 (-I/home/phazarik/mcgeneration/hepmc3_install/include)

                make clean                       # Removes temporary files from previous attempts, if any.
                make                             # This takes a couple of minutes.
              
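
An optional sanity check after the build, to confirm that the libraries exist and that HepMC3 was actually picked up by ./configure:

                # Still inside the pythia8312 directory.
                ls lib                                    # expect libpythia8.a and libpythia8.so
                ./bin/pythia8-config --cxxflags --libs    # compile/link flags, generated by ./configure
                grep HEPMC3 Makefile.inc                  # the HepMC3 paths chosen during ./configure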

Hadronization happens in the pythiaXXXX/examples directory. That's why I have built PYTHIA in my work area for convenience. It can also be kept along with the other tools, with the output files exported to the work area for the next steps. Anyway, once PYTHIA is built, I added the following paths to my .bashrc file, which are needed for C++ compilation of the hadronizer codes.

                export PYTHIA8=/mnt/d/work/temp/mcgeneration/pythia8312
                export PYTHIA8DATA=$PYTHIA8/share/Pythia8/xmldoc
                export PATH=$PYTHIA8/bin:$PATH
                export LD_LIBRARY_PATH=$PYTHIA8/lib:$LD_LIBRARY_PATH
              

If you are a big fan of Python ... [optional]

In some of the examples, I have also seen PYTHIA used through a Python-based interface. For this, one can easily install PYTHIA and HepMC3 from conda-forge. But this is not needed for the toy example that I have shared in the next section. For a new user, I would not recommend it, because managing multiple versions of the same tools gets messy.

                conda install -c conda-forge pythia8        # not important
                conda install -c conda-forge hepmc3         # not important
              

I also found a nice YouTube video on this Python-based PYTHIA interface, which is part of an HSF workshop. The video broadly covers the basics of a lot of things that can be done with PYTHIA, starting from event generation.

Setting up Delphes

Delphes is a fast-simulation framework for high-energy physics detectors. Delphes output is conceptually similar to NanoAOD, but the information is structured differently in the ROOT files. It can be downloaded from the GitHub repository, where installation instructions are available as well. Building Delphes is pretty straightforward.

                git clone https://github.com/delphes/delphes.git
                cd delphes
                make             # This takes a couple of minutes
              
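
As an optional check, the executables end up in the top-level delphes directory, and running one without arguments simply prints its usage:

                ls DelphesHepMC3 DelphesPythia8 2>/dev/null   # DelphesPythia8 only appears after the HAS_PYTHIA8 build described later
                ./DelphesHepMC3                               # with no arguments, it prints a usage message and exits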

Wrapping up ... [optional]

In some examples, PYTHIA, HepMC3, etc. can be used from the MadGraph interface itself (I'm exploring it, not an expert yet). For this, the paths to PYTHIA and Delphes should be included in the MadGraph configuration. This has to be done only once. I also added my fastjet setup, in case I need it later.

                set pythia8_path /home/phazarik/mcgeneration/pythia8312
                set delphes_path /home/phazarik/mcgeneration/delphes
                set fastjet /home/phazarik/fastjet-3.4.2/bin/fastjet-config          #optional
              

These paths are needed only when the hadronization, etc., is done inside MadGraph itself, by importing these modules. But in the toy example, I am doing each step of the sample generation individually. The paths can also be added manually by editing the input/mg5_configuration.txt file in the MadGraph directory. Also, MadGraph may ask for something called LHAPDF, which is a standard tool for evaluating parton distribution functions (PDFs). This can be installed using conda as follows.

                conda install -c conda-forge lhapdf
              

After these installations, the following is the list of MC-generation tools in my setup.

Package     Version   Source
mg5amcnlo   3.5.4     GitHub
hepmc3      3.3.0     GitLab
pythia      8.312     pythia.org
delphes     3.5.0     GitHub
lhapdf      6.5.4     conda-forge
Note: PYTHIA, HepMC and Delphes can also be installed by simply going to the MadGraph prompt and running install pythia8 (for PYTHIA and HepMC together) and install Delphes (for Delphes alone). These are kept in mg5amcnlo/HEPTools. But I don't trust it yet, and there is no control over the versions. I prefer to do it manually.

↑ back to top

Toy example


The flowchart below illustrates the event-simulation workflow I am going to use. The process begins with generation in MadGraph, where parton-level events are produced in the LHE format. The output LHE file is then fed into PYTHIA for hadronization, where partons are converted into physical hadrons, resulting in a DAT file. Finally, detector simulation is done using Delphes with a CMS card, which simulates how the CMS detector would record these events, producing a ROOT file that can be used for further analysis.

graph LR;
    A[Generation in MadGraph. Input: interaction parameters, Output: LHE]
    --> B[Hadronization in PYTHIA. Input: LHE file, Output: DAT file];
    B --> C[Simulation in Delphes. Input: HepMC file, Output: ROOT file];

Generation (QFT → LHE through MadGraph)

In the generation phase, tools like MadGraph are used to simulate the hard-scattering events based on quantum field theory. MadGraph takes a process definition and a set of parameters from the run_card and param_card. These cards and configuration files define the physics process and the collider setup. The output of this phase is in the LHE (Les Houches Event) format, a standardized text file that contains detailed information about the parton-level event, such as particle IDs, momenta, and event weights. The LHE file serves as the bridge to further event processing.

Let's try to generate a simple Drell-Yan process: pp → Z → ll. For this, go to your work area, open the MadGraph prompt, and define the process.

                mg5                       # my shortcut for opening MadGraph prompt.
                
                #Now inside MadGraph prompt.
                
                display particle          # displays all the available individual particles.
                display multiparticles    # displays all the labels for groups of particles.

                generate p p > z > l+ l-
                output ppToZToLL

                exit
              

The output line creates a new directory for the pp → Z → ll process, including the run card, param card, etc. In principle, all of this can be done in a single step in the MadGraph prompt, but it's convenient to customize the parameters later. The cards are kept in ppToZToLL/Cards. Let's edit the run_card.dat file and change/add some important parameters as follows, keeping everything else as it is.

                #Edited line:
                10  = nevents ! Number of unweighted events requested (changed from the default)
                
                #Newly added lines:
                # -- customize according to your need --
              
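
For reference, a few other run_card.dat lines that are commonly adjusted look like this. The values shown are illustrative defaults for a 13 TeV setup; the exact numbers and comments in your card may differ slightly.

                #Other commonly edited lines (illustrative values):
                6500.0  = ebeam1  ! beam 1 total energy in GeV
                6500.0  = ebeam2  ! beam 2 total energy in GeV
                0       = iseed   ! rnd seed (0 = assigned automatically)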

The ppToZToLL/bin directory contains the binaries used to run the event generator, and ppToZToLL/Events contains the outputs for each run. Let's run a test and see what it does.

                # Be inside ppToZToLL directory.

                ./bin/generate_events testrun -f > /dev/null 2>&1
                
                # testrun          = name of the directory to be generated inside ppToZToLL/Events
                # -f               = suppresses the MadGraph CLI prompts and takes the parameters from run_card.
                # > /dev/null 2>&1 = Suppresses any GUI related issues (in WSL) 
              

This creates a ppToZToLL/Events/testrun directory containing a .gz file, which can be unzipped and used in the later steps.

                cd Events/testrun
                gunzip unweighted_events.lhe.gz
              

This unzipped LHE file, unweighted_events.lhe, contains all the generated parton-level information, including particle IDs, momenta, event weights, and other metadata.
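
A quick look inside the file confirms what is stored there; for example (the exact banner wording can vary between MadGraph versions):

                head -n 3 unweighted_events.lhe                        # starts with the <LesHouchesEvents> header and banner
                grep -m1 -i "Integrated weight" unweighted_events.lhe  # cross section quoted in the banner
                grep -c "<event>" unweighted_events.lhe                # number of events stored in the file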

Hadronization (LHE → DAT through PYTHIA and HepMC3)

The hadronization step happens in the pythiaXXXX/examples directory. For this, we need a program that reads the LHE file and hadronizes the events, and we need to edit pythiaXXXX/examples/Makefile to include this program so that it is compiled along with the rest of the examples.

  • The hadronizer file is named hadlhe.cc, and can be found here.
  • The updated Makefile can be found here, and should replace the old one.

These files can also be directly downloaded into the correct directory as follows.

                  cd [path to pythia]/examples
                  wget -O hadlhe.cc https://raw.githubusercontent.com/phazarik/phazarik.github.io/main/mypages/files/codes/hadlhe.cc
                  wget -O Makefile https://raw.githubusercontent.com/phazarik/phazarik.github.io/main/mypages/files/codes/Makefile_for_pythia.txt
              

The hadlhe.cc file should be edited to give the correct path to the input LHE file that was generated using MadGraph, the number of events in the input file, and a desired output path for the DAT file that contains the hadronized output. This DAT file is later imported into Delphes for detector simulation. The hadronizer is compiled and executed as follows.

                #inside the examples directory

                make clean               # to get rid of any previous compiled files
                make hadlhe              # this looks for hadlhe.cc, compiles it, and creates an executable
                ./hadlhe                 # execution
              

Simulation (DAT → ROOT through Delphes)

The Delphes simulation is run using the executable DelphesHepMC3, followed by three arguments: the detector card, the output file, and the input file. The first two arguments are the same in every case for me, so I wrapped them in an alias in my .bashrc as follows.

                alias delphes="/home/phazarik/mcgeneration/delphes/DelphesHepMC3 /home/phazarik/mcgeneration/delphes/cards/delphes_card_CMS.tcl"
              

For this toy example, I ran the following from my work directory to produce a Delphes tree.

                delphes testroot.root pythia8312/examples/ppToZToLL_10.dat
              
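
Without the alias, the same call written out in full reads:

                /home/phazarik/mcgeneration/delphes/DelphesHepMC3 \
                    /home/phazarik/mcgeneration/delphes/cards/delphes_card_CMS.tcl \
                    testroot.root \
                    pythia8312/examples/ppToZToLL_10.dat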

Hadronization and Simulation [combined step]

So far, the whole MC production has been done step by step: first, event generation at the LHE level, then producing the hadronized gen-level information using PYTHIA, and then detector simulation using Delphes. This whole exercise is meant for understanding what happens at the back-end. The hadronization step produces large DAT files, which need to be handled by Delphes before being useful. It turns out that this extra step can be avoided by merging the work of Delphes and PYTHIA, as illustrated in the flowchart below.

graph LR;
    A[Generation in MadGraph. Input: interaction parameters, Output: LHE]
    --> B[Hadronization and Simulation in Delphes. Input: LHE + PYTHIA config file, Output: ROOT file];

For this, the following things need to be done.

  • Make sure that the PYTHIA paths are available. The following lines can be added to .bashrc.
                        export PYTHIA8=[path to pythia8312] #Give the full path here
                        export PYTHIA8DATA=$PYTHIA8/share/Pythia8/xmldoc
                        export PATH=$PYTHIA8/bin:$PATH
                        export LD_LIBRARY_PATH=$PYTHIA8/lib:$LD_LIBRARY_PATH
                      
  • Configure Delphes with PYTHIA8.
                        cd /home/phazarik/mcgeneration/delphes
                        make clean
                        make HAS_PYTHIA8=true   # This will take a while
                      
  • Write a PYTHIA config file for the process, e.g. pythia8_ppToZToLL.cfg, and keep it in the process directory. It contains a specific set of instructions needed during hadronization, including the path to the input file, how many events to process, etc. This file is fed as an input to Delphes. If the number of events specified in the configuration does not match the number of events available in the LHE file, PYTHIA will still proceed with whatever number of events it can read. The configuration I used is shown below.
                      ! Pythia8 configuration for pp -> Z -> LL process
    
                      Main:numberOfEvents = 10     ! Number of events to simulate
                      Main:timesAllowErrors = 10   ! How many errors Pythia will allow before stopping
    
                      ! Set up beam parameters
                      Beams:idA = 2212              ! Proton beam
                      Beams:idB = 2212              ! Proton beam
                      Beams:eCM = 13000.0           ! Center-of-mass energy in GeV
    
                      ! Load the LHEF events generated by MadGraph
                      Beams:frameType = 4           ! needed so that Pythia reads the LHE file given below
                      Beams:LHEF = /mnt/d/work/temp/mcgeneration/ppToZToLL/Events/testrun/unweighted_events.lhe
    
                      ! Pythia8-specific physics settings
                      WeakSingleBoson:ffbar2gmZ = on   ! Enable Z boson production
                      23:onMode = off                  ! Turn off all Z decays
                      23:onIfAny = 11 13               ! Allow only decays to leptons (e+ e- and mu+ mu-)
    
                      ! Hadronization and jet clustering
                      HadronLevel:all = on
                    
  • Instead of running DelphesHepMC3, run DelphesPythia8 along with the CMS card. For this, I created an alias in the .bashrc file for my convenience.
                        alias delphespythia="/home/phazarik/mcgeneration/delphes/DelphesPythia8 /home/phazarik/mcgeneration/delphes/cards/delphes_card_CMS.tcl"
                      

Once these changes are made, the following can be run from the work-directory.

                delphespythia ppToZToLL/pythia8_ppToZToLL.cfg test_output.root
              

I find it convenient because I don't have to go to the pythia8312/examples directory, create a mess there with all the temporary folders, and deal with the DAT files. However, this method complains about some missing libraries related to ExRootAnalysis at the beginning, and ROOT prints some warnings about the same while loading the output files. But as long as we are not using those features, it's fine.
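
A quick way to confirm that the output is usable is to list the branches of the Delphes tree directly in ROOT (assuming the output file name from above):

                root -l -b -q -e 'TFile f("test_output.root"); f.Get("Delphes")->Print();'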


Summary

  1. Generate LHE events in MadGraph.
    • Define the processes carefully in the MadGraph prompt.
    • Save the process by doing output processname.
    • Edit run_card.dat and specify the number of events.
    • Run ./bin/generate_events testrun.
    • Unzip the output file in Events/testrun.
  2. Run hadronization and detector simulation.
    • Write a PYTHIA config file mentioning the path to the LHE file along with hadronization parameters and how many events to run on.
    • Run delphes/DelphesPythia8 by providing the CMS card, the PYTHIA config file, and the output file as arguments.

For analyzing Delphes trees and writing histograms, I created a MakeSelector()-based setup, with instructions available in this GitHub repository.

↑ back to top

Central production in CMS


Setting up a CMSSW environment

First, set up a suitable environment on lxplus. For this example, I am going with slc7_amd64_gcc700, which should be compatible with the CMSSW release you want (CMSSW_12_4_8 in this example). This can be checked by running the following and finding the right CMSSW version in the list.

                scram --arch slc7_amd64_gcc700 list CMSSW
              

Unfortunately, in this particular case, lxplus7 is no longer maintained. So we need to use a Singularity container by running the following right after logging into lxplus. More on this can be found here.

                cmssw-el7
                echo $SCRAM_ARCH                        # This should display some slc7 architecture.
                export SCRAM_ARCH=slc7_amd64_gcc700     # This will pick the right architecture.
                echo $SCRAM_ARCH                        # Make sure that you have the right architecture.
              

Now, go to a desired work area (preferably in EOS), create a CMSSW work area, and set up the environment.

                cmsrel CMSSW_12_4_8
                cd CMSSW_12_4_8/src
                cmsenv
              

Gridpack generation

Riya is working on this. I will update this part later.

Local validation [manual, for one gridpack]

I am going to demonstrate this using CMSSW_12_4_8. I logged into lxplus8 (with the el8_amd64_gcc10 architecture). Inside src, I copied a gen-fragment template for Run2 Ultra Legacy, edited the gridpack location to point to the locally available gridpack (produced with the same CMSSW release), and compiled the setup.

                wget --no-check-certificate --content-disposition --retry-connrefused --tries=3 -P Configuration/GenProduction/python/ https://cms-pdmv.cern.ch/mcm/public/restapi/requests/get_fragment/EXO-RunIISummer20UL18wmLHEGEN-01288
                mv Configuration/GenProduction/python/EXO-RunIISummer20UL18wmLHEGEN-01288 Configuration/GenProduction/python/EXO-RunIISummer20UL18wmLHEGEN-01288.py

                #Make sure the file extension is py and edit the gridpack location.

                scram b -j8
              
  1. LHE step:
                        cmsDriver.py Configuration/GenProduction/python/EXO-RunIISummer20UL18wmLHEGEN-01288.py \
                        --python_filename EXO-RunIISummer20UL18wmGEN-01288_cfg.py \
                        --eventcontent RAWSIM,LHE --customise Configuration/DataProcessing/Utils.addMonitoring \
                        --datatier GEN,LHE --fileout file:EXO-RunIISummer20UL18wmGEN-01288.root \
                        --conditions 106X_upgrade2018_realistic_v4 --beamspot Realistic25ns13TeVEarly2018Collision \
                        --step LHE,GEN --geometry DB:Extended --era Run2_2018 --no_exec --mc -n 10
                      
  2. RECO step:

↑ back to top