Installation

System requirements

Hardware

IDEAL will run CPU-intensive simulations. In order to get results within a clinically reasonable time frame, many instances of GateRTion can be run in parallel on sufficiently fast CPUs. While this can in principle be achieved with a single machine, in the following we assume a typical setup with a small/medium size cluster.

Submission Node

  • At least 4 cores with a clock speed greater than 2GHz.

  • At least 16 GiB of RAM.

  • Local disk space for the operating system and HTCondor; 100 GiB should be sufficient.

  • Shared disk: see below.

Calculation Nodes

  • In total at least 40 cores (preferably 100-200) with a clock speed greater than 2GHz.

  • At least 8 GiB of RAM per core [1].

  • Local disk space for the operating system and HTCondor; 100 GiB should be sufficient.
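The RAM-per-core guideline above can be checked on each calculation node with a few lines of bash. This is only a sketch: the function names are hypothetical and not part of IDEAL, and it assumes a Linux node where nproc and /proc/meminfo are available.

```shell
#!/bin/bash
# Hypothetical helper (not part of IDEAL): integer GiB of RAM per core,
# given the total RAM in KiB and the number of cores.
ram_per_core_gib() {
    local mem_kib="$1" cores="$2"
    echo $(( mem_kib / 1024 / 1024 / cores ))
}

# Report the values for the current node and succeed only if the
# "at least 8 GiB of RAM per core" guideline is met.
# Note: nproc counts logical CPUs, so hyperthreads are included.
check_calc_node() {
    local cores mem_kib per_core
    cores=$(nproc)
    mem_kib=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    per_core=$(ram_per_core_gib "$mem_kib" "$cores")
    echo "cores=$cores ram_per_core=${per_core}GiB"
    [ "$per_core" -ge 8 ]
}
```

Calling check_calc_node on a node prints the detected values. Because the division rounds down and MemTotal is slightly below the installed RAM, a node that sits exactly at the limit may be reported as just below it.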

Shared disk (internal)

  • At least half a terabyte.

  • Storage of all software, configuration and simulation data.

  • Accessible by the submission and calculation nodes.

  • Internal cluster network and storage hardware should provide at least O(10Gbit/second) read and write speed.

  • Should support a high rewrite rate. During a simulation, temporary results are saved for all cores, typically every two minutes, O(1Gib/core).

  • To create a shared disk, we advise using nfs-kernel-server, although other similar tools are available. The key steps to create a shared disk on the folder <dir_shared> are the following:

On server:

sudo gedit /etc/exports
add line: <dir_shared> IP_CLIENT1(rw,no_subtree_check) IP_CLIENT2(rw,no_subtree_check) IP_CLIENT3(rw,no_subtree_check)

sudo exportfs -ra
sudo ufw allow from IP_CLIENT1 to any port nfs
sudo ufw allow from IP_CLIENT2 to any port nfs
sudo ufw allow from IP_CLIENT3 to any port nfs
sudo ufw status

On clients:

mkdir <dir_shared>
sudo mount IP_HOST:<dir_shared> <dir_shared>
sudo gedit /etc/fstab
add line: IP_HOST:<dir_shared> <dir_shared> nfs rw 0 0

A more detailed explanation can be found here: https://www.blasbenito.com/post/shared-folder-beowulf-cluster/
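For clusters with more than a handful of nodes, the /etc/exports line and the matching firewall commands can be generated instead of typed by hand. The following is a minimal sketch: the function names, the directory and the IP addresses are hypothetical, and the functions only print text, so the output still has to be reviewed and applied manually.

```shell
#!/bin/bash
# Hypothetical helper: print one /etc/exports line that grants read/write
# access to the shared directory for every client IP given as argument.
make_exports_line() {
    local dir="$1"; shift
    local line="$dir" ip
    for ip in "$@"; do
        line="$line ${ip}(rw,no_subtree_check)"
    done
    echo "$line"
}

# Hypothetical helper: print the matching ufw commands, one per client.
make_ufw_rules() {
    local ip
    for ip in "$@"; do
        echo "sudo ufw allow from $ip to any port nfs"
    done
}

# Example with made-up values:
make_exports_line /srv/ideal_shared 10.0.0.11 10.0.0.12
make_ufw_rules 10.0.0.11 10.0.0.12
```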

Network access to/from submission node

The submission node should be accessible by the user, or be connected with an external server that functions as the user interface. To this end, the submission node should be connected to a reasonably fast internal network that allows access to a shared directory system (typically CIFS) or HTTPS connections with at least one other server. The recommended data upload and download speed is 1Gbit/second or faster.

Mounting Windows file shares

A typical clinical computing environment is dominated by MS Windows devices, and if the environment includes a Windows File Share (CIFS) then it can be convenient to mount this from the submit node of the IDEAL cluster. This can then be used for DICOM input to and output from IDEAL.

Ask your local MS Windows system administrator which subfolder(s) on the file share you can use for IDEAL input/output, and with which user credentials. Some administrators prefer to use personal user accounts for everything (so they can track who did what, in case something went wrong), others prefer to define “service user” accounts that can be used by several users for a particular (limited) purpose. Create a new folder on the submit node (/var/data/IDEAL/io in the example below) and save the user credentials in a text file secrets.txt in that folder, with -r-------- file permissions (readable only by you).
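As an illustration, the credentials file could be created as sketched here. The function name is hypothetical; the username=, password= and domain= keys are the ones that mount.cifs reads from a credentials file. Whether a domain entry is needed depends on your Windows environment.

```shell
#!/bin/bash
# Hypothetical helper: write a mount.cifs credentials file that is
# readable only by its owner (-r--------).
write_cifs_credentials() {
    local file="$1" user="$2" password="$3" domain="$4"
    rm -f "$file"
    # The restrictive umask ensures the file is never world-readable,
    # not even briefly between creation and chmod.
    ( umask 077
      printf 'username=%s\npassword=%s\ndomain=%s\n' \
             "$user" "$password" "$domain" > "$file" )
    chmod 400 "$file"
}

# Example with made-up values:
# write_cifs_credentials /var/data/IDEAL/io/secrets.txt svc_ideal 'S3cret' MYDOMAIN
```

Passing the password as a command line argument leaves it in the shell history, so writing the file with a text editor and then running chmod 400 on it may be preferable in practice.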

Then run the following script (or edit /etc/fstab, if you are comfortable doing that) to create a “read only” mount point for reading input and a “read and write” mount point for writing output. Both mount points can point to the same remote folder.

#!/bin/bash
set -x
set -e

# you need to define the names and paths here
ideal_remote="//servername.domainname/path/to/IDEAL/folder"
ideal_rw="/var/data/IDEAL/io/IDEAL_rw"
ideal_ro="/var/data/IDEAL/io/IDEAL_ro"
creds="/var/data/IDEAL/io/secrets.txt"

for d in "$ideal_rw" "$ideal_ro"; do
        if [ ! -d "$d" ] ; then
                mkdir -p "$d"
        fi
done

# you need to provide the actual uid and gid here
creds_uid_gid="credentials=$creds,uid=montecarlo,gid=montecarlo"

rw_opts="-o rw,file_mode=0660,dir_mode=0770,$creds_uid_gid"
ro_opts="-o ro,file_mode=0440,dir_mode=0550,$creds_uid_gid"
sudo mount.cifs "$ideal_remote" "$ideal_rw" $rw_opts
sudo mount.cifs "$ideal_remote" "$ideal_ro" $ro_opts
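If you prefer the /etc/fstab route, the two entries would carry the same options as the mount.cifs calls in the script. The sketch below only prints candidate lines; the function name is hypothetical, and the paths and the montecarlo owner are taken from the example script and need to be adapted.

```shell
#!/bin/bash
# Hypothetical helper: print an /etc/fstab entry for a CIFS share,
# mirroring the rw/ro options used in the mount script.
make_cifs_fstab_line() {
    local remote="$1" mountpoint="$2" mode="$3" creds="$4" owner="$5"
    local perms
    case "$mode" in
        rw) perms="file_mode=0660,dir_mode=0770" ;;
        ro) perms="file_mode=0440,dir_mode=0550" ;;
        *)  echo "mode must be rw or ro" >&2; return 1 ;;
    esac
    echo "$remote $mountpoint cifs $mode,$perms,credentials=$creds,uid=$owner,gid=$owner 0 0"
}

# Example with the placeholder values from the script:
remote="//servername.domainname/path/to/IDEAL/folder"
creds="/var/data/IDEAL/io/secrets.txt"
make_cifs_fstab_line "$remote" /var/data/IDEAL/io/IDEAL_rw rw "$creds" montecarlo
make_cifs_fstab_line "$remote" /var/data/IDEAL/io/IDEAL_ro ro "$creds" montecarlo
```

The printed lines can be appended to /etc/fstab after review. Running sudo mount -a afterwards is a quick way to see whether the entries are accepted.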

Software

Operating System

For all cluster nodes: Linux. Any major modern Linux distribution (e.g. [Ubuntu] 24.04 or later) should work.

Python

  • Python [Python3] version >= 3.10 installed on all nodes.

  • Submission node: virtualenv and pip are used to install modules that are not part of the standard library.

  • In case the IDEAL cluster is not directly connected to the internet, the intranet should contain a repository that is accessible by the submission node and provides up-to-date releases of the required Python modules (see “Installing necessary python modules” below).

HTCondor

IDEAL relies on the [HTCondor] cluster management system for running many simulations in parallel [2]. All major Linux distributions provide HTCondor as a standard package. The full documentation of HTCondor can be found on the HTCondor web page. To install:

sudo apt update
sudo apt install htcondor

Some of the specific details for configuring and running HTCondor are described below. These are meant as guidance; the optimal configuration may depend on the details of the available cluster.

Configuration

Each (submit or calculation) node has HTCondor configuration files stored under /etc/condor/. The /etc/condor/condor_config file contains the default settings of a subset of all configurable options. This file should not be edited, since any edits may be overwritten by OS updates. The settings below may be added either to the /etc/condor/condor_config.local file, or in a series of files /etc/condor/config.d/NNN_XXXXXX, where NNN are numbers (to define the order) and XXXXXX are keywords that help you remember what kind of settings are defined in them.

The options described below are important for running IDEAL. The values of the settings are sometimes used in the definition of other settings, so be careful with the order in which you add them.

The configuration can be identical for all nodes, except for the daemon settings.
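The numbered files under /etc/condor/config.d/ described above can be created with a small helper that enforces the NNN_XXXXXX naming. This is a sketch only: the function name, the numbers and the keywords are hypothetical, and the example writes to a scratch directory so that it can be tried without root permissions.

```shell
#!/bin/bash
# Hypothetical helper: write the settings read from standard input to
# <dir>/NNN_<keyword>, where NNN is the zero-padded order number.
# The path of the new file is printed.
write_condor_fragment() {
    local dir="$1" number="$2" keyword="$3"
    local file
    # 10# forces decimal, so that an input like "010" is not read as octal
    file=$(printf '%s/%03d_%s' "$dir" "$((10#$number))" "$keyword")
    cat > "$file"
    echo "$file"
}

# Example in a scratch directory, using a made-up IP address:
scratch=$(mktemp -d)
echo 'CONDOR_HOST = 10.0.0.1' | write_condor_fragment "$scratch" 10 condor_host
ls "$scratch"
```

On a real node the fragments would go to /etc/condor/config.d/ and need to be written with root permissions. Afterwards, condor_config_val -config lists the configuration files that HTCondor actually reads, and condor_config_val CONDOR_HOST prints the resulting value of a single setting.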

Condor host

The submit node should be the “condor host”, which is declared by setting CONDOR_HOST to the IP address of the submit node:

CONDOR_HOST = w.x.y.z

Enable communication with other nodes

The simplest way to configure this is to enable communication (“allow write”) for each node with all nodes in the cluster, including the node itself. ALLOW_WRITE is a comma-separated list of all hostnames and IP addresses. For ease of reading, the nodes can be added one by one, with each line extending the previous value of the list, like this:

ALLOW_WRITE = $(FULL_HOSTNAME), $(IP_ADDRESS), 127.0.0.1, 127.0.1.1
ALLOW_WRITE = $(ALLOW_WRITE), submit_node_hostname, w.x.y.z
ALLOW_WRITE = $(ALLOW_WRITE), calc_node_hostname, w.x.y.z
ALLOW_WRITE = $(ALLOW_WRITE), calc_node_hostname, w.x.y.z
ALLOW_WRITE = $(ALLOW_WRITE), calc_node_hostname, w.x.y.z

Which daemons on which nodes

This is the only item that requires different configuration for submit and calculation nodes.

For the submit node:

DAEMON_LIST  = MASTER, COLLECTOR, NEGOTIATOR, SCHEDD, GANGLIAD

For the calculation nodes:

ALLOW_NEGOTIATOR = $(CONDOR_HOST) $(IP_ADDRESS) 127.*
DAEMON_LIST  = MASTER, STARTD, SCHEDD

Network and filesystem

Make sure to configure the correct ethernet port name and the full host name of the submit node:

BIND_ALL_INTERFACES = True
NETWORK_INTERFACE = ethernet_port_name
CUSTOM_FILE_DOMAIN = submit_node_full_hostname
FILESYSTEM_DOMAIN = $(CUSTOM_FILE_DOMAIN)
UID_DOMAIN = $(CUSTOM_FILE_DOMAIN)

Resource limits

HTCondor should try to use all CPU power, but refrain from starting jobs if the disk space, RAM or swap usage exceeds some safe thresholds:

SLOT_TYPE_1 = cpus=100%,disk=90%,ram=90%,swap=10%
NUM_SLOTS_TYPE_1 = 1
SLOT_TYPE_1_PARTITIONABLE = True

Resource guards

Define what to do when an already running job exceeds its resource limits:

MachineMemoryString = "$(Memory)"
SUBMIT_EXPRS = $(SUBMIT_EXPRS)  MachineMemoryString
MachineDiskString = "$(Disk)"
SUBMIT_EXPRS = $(SUBMIT_EXPRS)  MachineDiskString
SYSTEM_PERIODIC_HOLD_memory = MATCH_EXP_MachineMemory =!= UNDEFINED && \
                       MemoryUsage > 1.0*int(MATCH_EXP_MachineMemoryString)
SYSTEM_PERIODIC_HOLD_disc = MATCH_EXP_MachineDisk =!= UNDEFINED && \
                       DiskUsage > int(MATCH_EXP_MachineDiskString)
SYSTEM_PERIODIC_HOLD = ($(SYSTEM_PERIODIC_HOLD_disc)) || ($(SYSTEM_PERIODIC_HOLD_memory))
SYSTEM_PERIODIC_HOLD_REASON = ifThenElse(SYSTEM_PERIODIC_HOLD_memory, \
                           "Used too much memory", ""), ifThenElse(SYSTEM_PERIODIC_HOLD_disc, \
                           "Used too much disk space","Reason unknown")

MEMORY_USED_BY_JOB_MB = ResidentSetSize/1024
MEMORY_EXCEEDED = ifThenElse(isUndefined(ResidentSetSize), False, ( ($(MEMORY_USED_BY_JOB_MB)) > RequestMemory ))
PREEMPT = ($(PREEMPT)) || ($(MEMORY_EXCEEDED))
WANT_SUSPEND = ($(WANT_SUSPEND)) && ($(MEMORY_EXCEEDED)) =!= TRUE
WANT_HOLD = ( $(MEMORY_EXCEEDED) )
WANT_HOLD_REASON = \
        ifThenElse( $(MEMORY_EXCEEDED), \
        "$(MemoryUsage) $(Memory) Your job exceeded the amount of requested memory on this machine.",\
         undefined )

Miscellaneous

##########################################
COUNT_HYPERTHREAD_CPUS=FALSE
START = TRUE
SUSPEND = FALSE
PREEMPT = FALSE
PREEMPTION_REQUIREMENTS = FALSE
KILL = FALSE
ALL_DEBUG = D_FULLDEBUG D_COMMAND
POOL_HISTORY_DIR = /var/log/condor/condor_history
KEEP_POOL_HISTORY = True
MaxJobRetirementTime    = (1 *  $(MINUTE))
CLAIM_WORKLIFE = 600
MAX_CONCURRENT_DOWNLOADS = 15
MAX_CONCURRENT_UPLOADS = 15

After changing the files:

sudo systemctl enable condor
sudo systemctl restart condor

NOTE: On older HTCondor versions (< 9), the condor commands are slightly different. Consult the documentation for the installed version of HTCondor.

NOTE: Condor needs to be started only once, on each machine belonging to the cluster.

To check that condor is running and that all machines are correctly included in the cluster, run:

condor_status
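To turn this check into something scriptable, the machine names reported by condor_status can be counted and compared with the number of nodes you expect. The sketch below uses the -autoformat option of condor_status to print only the Machine attribute of each slot; the function name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: count the distinct machine names read from
# standard input (one name per line, empty lines are ignored).
count_unique_machines() {
    awk 'NF && !seen[$1]++ { n++ } END { print n + 0 }'
}

# Usage on the submit node (requires a running HTCondor pool):
#   condor_status -autoformat Machine | count_unique_machines
```

If the printed number is lower than the number of calculation nodes, the missing nodes are either not running condor or are not allowed to communicate with the condor host.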

GATE-RTion

GATE-RTion [GateRTion] is a special release of Gate [GateGeant4], dedicated to clinical applications in pencil beam scanning particle therapy. From IDEAL v2.0, GateRTion is installed automatically by IDEAL, via pip.

IDEAL installation

Installing the IDEAL scripts

IDEAL is obtained by cloning from GitLab or by unpacking a tar ball, both provided at the IDEAL source repository: https://gitlab.com/djboersma/ideal. In a future release, we hope that the code can simply be installed with pip install ideal (which would then also perform some of the post-install steps). The code should be installed on the shared disk of the IDEAL cluster. The install directory will be referred to in this manual as the “IDEAL top directory”. The IDEAL top directory has the following contents:

IDEAL top directory contents

Name           Type    Description
-------------  ------  ----------------------------------------------------
bin            Folder  Executable scripts and IDEAL_env.sh
cfg            Folder  System configuration file(s)
docs           Folder  Source file for this documentation
ideal          Folder  Python modules implementing the IDEAL functionality
gpl-3.0.txt    File    Open Source license, referred to by the LICENSE file
LICENSE        File    Open Source license
RELEASE_NOTES  File    Summary of changes between releases

The first_install.py script

IDEAL will not function correctly immediately after a clean install (cloning it from GitLab or extracting it from a tar ball).

Right after the install, it is recommended to run the bin/first_install.py script. This script will attempt to create a minimal working setup by performing the following steps:

  • Install some additional python modules (using virtualenv) in a so-called “virtual environment” named venv. This step will also install GateRTion v2.

  • Create folders for commissioning data (definitions of beam lines, CTs, phantoms), logs, temporary data and output.

  • Specify the available resources and the simulation preferences in a “system configuration” file cfg/system.cfg in the IDEAL install directory.

The script tries to perform all the trivial steps of the installation. Simple examples of a beam line model, CT protocols and a phantom are provided. These examples are hopefully useful to give an idea of where and how you should install your own beam models, CT protocols and phantoms. The details are described in the Commissioning chapter.

This script is supposed to be run after all previous steps have been performed. Specifically:

  • A Linux cluster is available running the same OS on all nodes (e.g. Ubuntu 24.04) and with a fast shared disk that is accessible by all cluster nodes and has at least 200 GiB of free space.

  • HTCondor is installed and configured.

  • Python version >= 3.10 and virtualenv are installed.

The first_install.py script will check these assumptions, but the checks are not exhaustive.
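Since those checks are not exhaustive, it can be useful to verify the main prerequisites by hand before running the script. The following sketch is independent of first_install.py and does not reproduce its checks; the function names are hypothetical, and the version comparison relies on the -V option of GNU sort.

```shell
#!/bin/bash
# Hypothetical helper: succeed if version $1 is greater than or equal to
# version $2 (compared with GNU sort -V, so that 3.10 ranks above 3.9).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Hypothetical helper: check the prerequisites listed above on this node.
check_prereqs() {
    local pyver ok=0
    pyver=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])') || ok=1
    version_ge "${pyver:-0}" 3.10 || { echo "Python >= 3.10 not found"; ok=1; }
    command -v virtualenv > /dev/null   || { echo "virtualenv not found"; ok=1; }
    command -v condor_status > /dev/null || { echo "HTCondor not found"; ok=1; }
    return "$ok"
}
```

Running check_prereqs on the submit node prints one line per missing prerequisite and returns a non-zero status if anything is missing.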

It is recommended to also give the name of the clinic (with the -C option). Many more options are available, see the script’s --help output.

Installing necessary python modules

The installation step described in this section is performed by the first_install.py script.

If you did not run the first_install.py script, then please read the rest of this section.

IDEAL needs several external python modules that are not included in a default python installation. In order to avoid interference with the python module requirements of other applications, the preferred way of installing these modules is to use a virtual environment called venv in the IDEAL top directory. This may be done using the following series of commands (which may be provided in an install script in a later release of IDEAL) in a bash shell after a cd to the IDEAL top directory:

virtualenv -p python3 --prompt='(IDEAL 2.0) ' venv
source ./venv/bin/activate
pip install --upgrade pip
pip install opengate==10.1.0
pip install filelock htcondor itk matplotlib numpy pydicom python-daemon python-dateutil scipy Flask Flask-RESTful requests PyJWT
pip install pandas PyYAML cryptography Flask-SQLAlchemy apiflask dacite structlog
deactivate

If you decide to install the virtual environment under a different path, then you need to edit the bin/IDEAL_env.sh script to use the correct path in the source /path/to/virtualenv/bin/activate line, or to remove that line altogether. You can of course add extra modules with pip.

Set up the data directories

Like the creation of the virtual environment, this installation step may be automated in the next release. IDEAL needs a few folders to store logging, temporary data and output, respectively. In a bash shell after a cd to the IDEAL top directory, do:

mkdir data
mkdir data/logging
mkdir data/workdir
mkdir data/output
mkdir data/MyClinicCommissioningData
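The same directory layout can be created with a helper that is safe to run more than once, since mkdir -p does not complain about folders that already exist. This is a sketch: the function name is hypothetical, and the top directory and clinic name are passed as arguments.

```shell
#!/bin/bash
# Hypothetical helper: create the IDEAL data directories below the given
# top directory; running it again on an existing layout is harmless.
make_ideal_data_dirs() {
    local top="$1" clinic="$2"
    mkdir -p "$top/data/logging" \
             "$top/data/workdir" \
             "$top/data/output" \
             "$top/data/${clinic}CommissioningData"
}

# Example, run from the IDEAL top directory:
# make_ideal_data_dirs . MyClinic
```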

The subdirectories of data are described in more detail below.

logging

The logging directory is where all the debugging level output will be stored. In case something goes wrong, these logging files may help to investigate what went wrong. When you report issues to the developers, it can be useful to attach the log file(s).

workdir

The workdir directory will contain a subfolder for every time you use IDEAL to perform a dose calculation. The unique name of each subfolder is composed of the user’s initials, the name and/or label of the plan and a time stamp of when you submitted the job. The subfolder will contain all data needed to run the GATE simulations, to preprocess the input data and to postprocess the output data:

  • The data needed to run the simulations in Gate 10.

  • Output files, from all condor subjobs running this simulation.

  • Files that are used or generated by HTCondor for managing all the jobs.

  • Two more IDEAL-specific log files, namely: preprocessor.log, postprocessor.log.

The temporary data can take up a lot of space, typically a few dozen GiB, depending on the number of voxels in the CT (after cropping it to a minimal bounding box containing the “External” ROI and the TPS dose distribution) and on the number of cores in your cluster. After a successful run, the temporary data is archived in compressed form, so that it remains available for later analysis if there are questions about the final result.

Note

When an IDEAL job runs unsuccessfully, the temporary data is NOT automatically compressed/archived, since the user may want to investigate. Do not forget to delete or compress these data after the investigation has concluded, to avoid inadvertently filling up the disk too quickly. Launching the job monitoring program “bin/log_daemon.py” can help mitigate this issue.
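To find candidates for such a cleanup, the work directory can be scanned for job subfolders that have not been touched for a while. The sketch below is a generic example and not an IDEAL tool: the function name is hypothetical, it relies on GNU find and du, and it cannot tell failed jobs from jobs that are still under investigation.

```shell
#!/bin/bash
# Hypothetical helper: list the job subfolders of a work directory whose
# modification time is more than the given number of days ago, together
# with their disk usage.
list_stale_workdirs() {
    local workdir="$1" days="$2"
    find "$workdir" -mindepth 1 -maxdepth 1 -type d -mtime +"$days" \
         -exec du -sh {} +
}

# Example, run from the IDEAL top directory:
# list_stale_workdirs data/workdir 30
```

The function only lists folders, so deleting or compressing them remains a deliberate manual step.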

output

The output directory will contain a subfolder for each IDEAL job, using the same naming scheme as for the work directories. In IDEAL’s system configuration file the user (with admin/commissioning role) can define which output will actually be saved, e.g. physical and/or effective dose, DICOM and/or MHD. This output directory serves to store the original output of the IDEAL job. If the path of a second output directory is given in the system configuration file, then the job output subfolder will be copied to that second location (e.g. on a CIFS file share, where it can be accessed by users on Windows devices).

Commissioning Data Directory

In the example, MyClinic could be replaced by the name of your particle therapy clinic. If you are a researcher who studies plans from multiple different clinics, you may want to create a commissioning data directory for each clinic.

This directory will contain the commissioning data for your particle therapy clinic. The details are laid out in the Commissioning chapter.

Footnotes