Installation

System requirements

Hardware

IDEAL runs CPU-intensive simulations. [GateRTion] is single-threaded, so in order to get results within a clinically reasonable time frame, many instances of GateRTion need to run in parallel on sufficiently fast CPUs. While this can in principle be achieved with a single machine, in the following we assume a typical setup with a small to medium sized cluster.

Submission Node

  • At least 4 cores with a clock speed greater than 2GHz.

  • At least 16 GiB of RAM.

  • Local disk space for the operating system and HTCondor; 100 GiB should be sufficient.

  • Shared disk: see below.

Calculation Nodes

  • In total at least 40 cores (preferably 100-200) with a clock speed greater than 2GHz.

  • At least 8 GiB of RAM per core [1].

  • Local disk space for the operating system and HTCondor; 100 GiB should be sufficient.

Shared disk (internal)

  • At least half a terabyte.

  • Storage of all software, configuration and simulation data.

  • Accessible by the submission and calculation nodes.

  • Internal cluster network and storage hardware should provide at least O(10Gbit/second) read and write speed.

  • Should support a high rewrite rate. During a simulation, temporary results are saved for all cores, typically every two minutes and on the order of 1 GiB per core.

  • To create a shared disk, we advise using nfs-kernel-server, although other similar tools are available. The key steps to create a shared disk on the folder <dir_shared> are the following:

On server:

sudo gedit /etc/exports
# add the following line to /etc/exports:
# <dir_shared> IP_CLIENT1(rw,no_subtree_check) IP_CLIENT2(rw,no_subtree_check) IP_CLIENT3(rw,no_subtree_check)
sudo exportfs -ra
sudo ufw allow from IP_CLIENT1 to any port nfs
sudo ufw allow from IP_CLIENT2 to any port nfs
sudo ufw allow from IP_CLIENT3 to any port nfs
sudo ufw status

On clients:

sudo mkdir -p <dir_shared>
sudo mount IP_HOST:<dir_shared> <dir_shared>
sudo gedit /etc/fstab
# add the following line to /etc/fstab so that the mount persists across reboots:
# IP_HOST:<dir_shared> <dir_shared> nfs rw 0 0
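
To verify that the export is visible and correctly mounted on a client (a quick sanity check, assuming the nfs-common package, which provides showmount, is installed), something like the following can be used:

showmount -e IP_HOST
df -h <dir_shared>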

A more detailed explanation can be found here: https://www.blasbenito.com/post/03_shared_folder_in_cluster/

Network access to/from submission node

The submission node should be accessible by the user, or be connected to an external server that functions as the user interface. To this end, the submission node should be connected to a reasonably fast internal network that allows access to a shared directory system (typically CIFS) or HTTPS connections with at least one other server. The recommended data upload and download speed is 1 Gbit/second or faster.

Mounting Windows file shares

A typical clinical computing environment is dominated by MS Windows devices, and if the environment includes a Windows File Share (CIFS) then it can be convenient to mount this from the submit node of the IDEAL cluster. This can then be used for DICOM input to and output from IDEAL.

Ask your local MS Windows system administrator which subfolder(s) on the file share you can use for IDEAL input/output, and with which user credentials. Some administrators prefer to use personal user accounts for everything (so they can track who did what, in case something goes wrong), others prefer to define “service user” accounts that can be used by several users for a particular (limited) purpose. Create a new folder on the submit node (/var/data/IDEAL/io in the example below) and save the user credentials in a text file secrets.txt with -r-------- file permissions (readable only by you).
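
The credentials file uses the standard mount.cifs key-value format; a minimal sketch (user name, password and domain are placeholders to be replaced with the values from your administrator):

username=svc_ideal
password=ReplaceWithRealPassword
domain=YOURDOMAIN

The -r-------- permissions can be set with chmod 400 secrets.txt.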

Then run the following script (or edit /etc/fstab, if you are comfortable doing that) to create a “read only” mount point for reading input and a “read and write” mount point for writing output. The two mount points can point to the same remote folder.

#!/bin/bash
set -x
set -e

# you need to define the names and paths here
ideal_remote="//servername.domainname/path/to/IDEAL/folder"
ideal_rw="/var/data/IDEAL/io/IDEAL_rw"
ideal_ro="/var/data/IDEAL/io/IDEAL_ro"
creds="/var/data/IDEAL/io/secrets.txt"

for d in "$ideal_rw" "$ideal_ro"; do
        if [ ! -d "$d" ] ; then
                mkdir -p "$d"
        fi
done

# you need to provide the actual uid and gid here
creds_uid_gid="credentials=$creds,uid=montecarlo,gid=montecarlo"

rw_opts="-o rw,file_mode=0660,dir_mode=0770,$creds_uid_gid"
ro_opts="-o ro,file_mode=0440,dir_mode=0550,$creds_uid_gid"
sudo mount.cifs "$ideal_remote" "$ideal_rw" $rw_opts
sudo mount.cifs "$ideal_remote" "$ideal_ro" $ro_opts
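
For reference, the equivalent /etc/fstab entries would look roughly like this (a sketch, assuming the same placeholder server path, mount points, credentials file and uid/gid as in the script above):

//servername.domainname/path/to/IDEAL/folder /var/data/IDEAL/io/IDEAL_rw cifs rw,file_mode=0660,dir_mode=0770,credentials=/var/data/IDEAL/io/secrets.txt,uid=montecarlo,gid=montecarlo 0 0
//servername.domainname/path/to/IDEAL/folder /var/data/IDEAL/io/IDEAL_ro cifs ro,file_mode=0440,dir_mode=0550,credentials=/var/data/IDEAL/io/secrets.txt,uid=montecarlo,gid=montecarlo 0 0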

Software

Operating System

For all cluster nodes: Linux. Any major modern distribution (e.g. [Ubuntu] 18.04 or later) should work.

Python

  • Python [Python3] version 3.6 or later should be installed on all nodes.

  • Submission node: virtualenv and pip are used to install modules that are not part of the standard library.

  • In case the IDEAL cluster is not directly connected to the internet, the intranet should contain a package repository that is accessible by the submission node and provides up-to-date releases of the python modules listed under “Installing necessary python modules” below (filelock, htcondor, itk, matplotlib, numpy, pydicom, python-daemon, python-dateutil and scipy); a sketch of how to point pip at such a repository follows this list.
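
The details depend on how the intranet repository is set up. As one hedged example, if it exposes a standard pip-compatible package index (the URL below is a placeholder), the submission node can be configured to use it via /etc/pip.conf (or ~/.config/pip/pip.conf):

[global]
index-url = https://pypi.mirror.example.local/simple/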

HTCondor

IDEAL relies on the [HTCondor] cluster management system for running many simulations in parallel [2]. Any recent release (e.g. 8.6.8) should work well. All major Linux distributions provide HTCondor as a standard package. The full documentation of HTCondor can be found on the HTCondor web page. To install:

sudo apt update
sudo apt install htcondor

Below, some of the specific details for configuring and running HTCondor are described. These are meant as guidance; the optimal configuration may depend on the details of the available cluster.

Configuration

Each (submit or calculation) node has HTCondor configuration files stored under /etc/condor/. The /etc/condor/condor_config file contains the default settings of a subset of all configurable options. This file should not be edited, since any edits may be overwritten by OS updates. The settings below may be added either to the /etc/condor/condor_config.local file, or in a series of files /etc/condor/config.d/NNN_XXXXXX, where NNN are numbers (to define the order) and XXXXXX are keywords that help you remember what kind of settings are defined in them.
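
As a purely hypothetical example of that naming scheme, the settings described in the following subsections could be split over files like these:

/etc/condor/config.d/010_allow_write
/etc/condor/config.d/020_daemons
/etc/condor/config.d/030_network_filesystem
/etc/condor/config.d/040_resource_limits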

The options described below are important for running IDEAL. The values of the settings are sometimes used in the definition of other settings, so be careful with the order in which you add them.

The configuration can be identical for all nodes, except for the daemon settings.

Condor host

The submit node should be the “condor host”, which is declared by setting CONDOR_HOST to the IP address of the submit node:

CONDOR_HOST = w.x.y.z

Enable communication with other nodes

The simplest way to configure this is to just enable communication (“allow write”) for each node with all nodes in the cluster, including the node itself. ALLOW_WRITE is a comma-separated list of all hostnames and IP addresses. For ease of reading, the nodes can be added one by one, like this:

ALLOW_WRITE = $(FULL_HOSTNAME), $(IP_ADDRESS), 127.0.0.1, 127.0.1.1
ALLOW_WRITE = $(ALLOW_WRITE), submit_node_hostname, w.x.y.z
ALLOW_WRITE = $(ALLOW_WRITE), calc_node_hostname, w.x.y.z
ALLOW_WRITE = $(ALLOW_WRITE), calc_node_hostname, w.x.y.z
ALLOW_WRITE = $(ALLOW_WRITE), calc_node_hostname, w.x.y.z

Which daemons on which nodes

This is the only item that requires different configuration for submit and calculation nodes.

For the submit node:

DAEMON_LIST  = MASTER, COLLECTOR, NEGOTIATOR, SCHEDD, GANGLIAD

For the calculation nodes:

ALLOW_NEGOTIATOR = $(CONDOR_HOST) $(IP_ADDRESS) 127.*
DAEMON_LIST  = MASTER, STARTD, SCHEDD

Network and filesystem

Make sure to configure the correct ethernet port name and the full host name of the submit node:

BIND_ALL_INTERFACES = True
NETWORK_INTERFACE = ethernet_port_name
CUSTOM_FILE_DOMAIN = submit_node_full_hostname
FILESYSTEM_DOMAIN = $(CUSTOM_FILE_DOMAIN)
UID_DOMAIN = $(CUSTOM_FILE_DOMAIN)

Resource limits

HTCondor should try to use all CPU power, but refrain from starting jobs if disk space, RAM or swap usage would exceed safe thresholds:

SLOT_TYPE_1 = cpus=100%,disk=90%,ram=90%,swap=10%
NUM_SLOTS_TYPE_1 = 1
SLOT_TYPE_1_PARTITIONABLE = True

Resource guards

Define what to do when an already running job exceeds its resource limits:

MachineMemoryString = "$(Memory)"
SUBMIT_EXPRS = $(SUBMIT_EXPRS)  MachineMemoryString
MachineDiskString = "$(Disk)"
SUBMIT_EXPRS = $(SUBMIT_EXPRS)  MachineDiskString
SYSTEM_PERIODIC_HOLD_memory = MATCH_EXP_MachineMemory =!= UNDEFINED && \
                       MemoryUsage > 1.0*int(MATCH_EXP_MachineMemoryString)
SYSTEM_PERIODIC_HOLD_disc = MATCH_EXP_MachineDisk =!= UNDEFINED && \
                       DiskUsage > int(MATCH_EXP_MachineDiskString)
SYSTEM_PERIODIC_HOLD = ($(SYSTEM_PERIODIC_HOLD_disc)) || ($(SYSTEM_PERIODIC_HOLD_memory))
SYSTEM_PERIODIC_HOLD_REASON = ifThenElse(SYSTEM_PERIODIC_HOLD_memory, \
                           "Used too much memory", ""), ifThenElse(SYSTEM_PERIODIC_HOLD_disc, \
                           "Used too much disk space","Reason unknown")

MEMORY_USED_BY_JOB_MB = ResidentSetSize/1024
MEMORY_EXCEEDED = ifThenElse(isUndefined(ResidentSetSize), False, ( ($(MEMORY_USED_BY_JOB_MB)) > RequestMemory ))
PREEMPT = ($(PREEMPT)) || ($(MEMORY_EXCEEDED))
WANT_SUSPEND = ($(WANT_SUSPEND)) && ($(MEMORY_EXCEEDED)) =!= TRUE
WANT_HOLD = ( $(MEMORY_EXCEEDED) )
WANT_HOLD_REASON = \
        ifThenElse( $(MEMORY_EXCEEDED), \
        "$(MemoryUsage) $(Memory) Your job exceeded the amount of requested memory on this machine.",\
         undefined )

Miscellaneous

##########################################
COUNT_HYPERTHREAD_CPUS=FALSE
START = TRUE
SUSPEND = FALSE
PREEMPT = FALSE
PREEMPTION_REQUIREMENTS = FALSE
KILL = FALSE
ALL_DEBUG = D_FULLDEBUG D_COMMAND
POOL_HISTORY_DIR = /var/log/condor/condor_history
KEEP_POOL_HISTORY = True
MaxJobRetirementTime    = (1 *  $(MINUTE))
CLAIM_WORKLIFE = 600
MAX_CONCURRENT_DOWNLOADS = 15
MAX_CONCURRENT_UPLOADS = 15

After changing the files, start the HTCondor master daemon and trigger a reconfiguration:

sudo condor_master
condor_reconfig

(sudo is not needed in a virtual machine.) NOTE: condor_master needs to be started only once, on EACH machine belonging to the cluster. To check that condor is running and that all machines are correctly included in the cluster, the user can run:

condor_status

ROOT

Any release with major release number 6 should work.
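
If ROOT is already installed and its environment has been sourced, the release in use can be checked with:

root-config --version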

Geant4

GATE-RTion requires Geant4 version 10.03.p03, compiled without multithreading.
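
As a hedged sketch of the corresponding build configuration (source and install paths are placeholders; the install prefix matches the environment script shown further below), multithreading is controlled at the CMake step with the GEANT4_BUILD_MULTITHREADED option:

# run from a separate build directory
cmake -DCMAKE_INSTALL_PREFIX=/usr/local/Geant4/10.03.p03 \
      -DGEANT4_BUILD_MULTITHREADED=OFF \
      -DGEANT4_INSTALL_DATA=ON \
      /path/to/geant4.10.03.p03
make -j"$(nproc)"
make install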

GATE-RTion

GATE-RTion [GateRTion] is a special release of Gate [GateGeant4], dedicated to clinical applications in pencil beam scanning particle therapy. If all nodes have the same hardware, then it can be compiled and installed once on the shared disk of the cluster and then be used by all nodes. If the cluster nodes have different types of CPU, it can be better to compile Geant4, ROOT and Gate-RTion separately on each node and install them on the local disks (always under the same local path).

After installation, a short shell script should be created that can be “sourced” in order to set up the shell environment for running Gate, including the paths not only of Gate itself, but also of the Geant4 and ROOT libraries and data sets. For instance:

source "/usr/local/Geant4/10.03.p03/bin/geant4.sh"
source "/usr/local/ROOT/v6.12.06/bin/thisroot.sh"
export PATH="/usr/local/GATE/GateRTion-1.0/bin:$PATH"

IDEAL installation

Installing the IDEAL scripts

For the current 1.0rc release, IDEAL is obtained by cloning from GitLab or unpacking a tar ball provided at the IDEAL source repository: https://gitlab.com/djboersma/ideal. In a future release (1.0), we hope that the code can simply be installed with pip install ideal (which would then also perform some of the post-install steps). The code should be installed on the shared disk of the IDEAL cluster. The install directory will be referred to in this manual as the “IDEAL top directory”. The IDEAL top directory has the following contents:

IDEAL top directory contents

Name           Type    Description
bin            Folder  Executable scripts and IDEAL_env.sh
cfg            Folder  System configuration file(s)
docs           Folder  Source file for this documentation
ideal          Folder  Python modules implementing the IDEAL functionality
gpl-3.0.txt    File    Open Source license, referred to by the LICENSE file
LICENSE        File    Open Source license
RELEASE_NOTES  File    Summary of changes between releases

The first_install.py script

IDEAL will not function correctly immediately after a clean install (cloning it from GitLab or extracting it from a tar ball).

Right after the install, it is recommended to run the bin/first_install.py script. This script will attempt to create a minimal working setup:

  • Some additional python modules need to be installed (using virtualenv) in a so-called “virtual environment” named venv.

  • Folders for commissioning data (definitions of beam lines, CTs, phantoms), logs, temporary data and output need to be created.

  • The available resources and the simulation preferences need to be specified in a “system configuration” file cfg/system.cfg in the IDEAL install directory.

The script tries to perform all the trivial steps of the installation. Simple examples of a beam line model, CT protocols and a phantom are provided. These examples are hopefully useful to give an idea of where and how you should install your own beam models, CT protocols and phantoms. The details are described in the Commissioning chapter.

This script is supposed to be run after all previous steps have been performed. Specifically:

  • A Linux cluster is available running the same OS on all nodes (e.g. Ubuntu 18.04) and with a fast shared disk that is accessible by all cluster nodes and has at least 200 GiB of free space.

  • Geant4, ROOT and GateRTion should all be installed on the shared disk. A gate_env.sh shell script is available to configure a shell environment (source /path/to/gate_env.sh) such that these are usable. Specifically, the Gate --version command should return the version blurb corresponding to “GateRTion 1.0”.

  • HTCondor is installed and configured. All nodes on the Linux cluster run the same OS (e.g. Ubuntu 18.04).

  • Python version 3.6 or newer and virtualenv are installed.

The first_install.py script will check these assumptions, but the checks are not exhaustive.

The minimum input for the script is the file path of the gate_env.sh script. It is recommended to also give the name of the clinic (with the -C option). Many more options are available; see the script’s --help output.
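
As a hypothetical example (whether the gate_env.sh path is passed positionally or via an option, and the exact option names, should be verified against the --help output of your release; MyClinic is a placeholder), a first installation might look like this:

cd /path/to/ideal            # the IDEAL top directory on the shared disk
./bin/first_install.py -C MyClinic /path/to/gate_env.sh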

Installing necessary python modules

The installation step described in this section is performed by the first_install.py script.

If you did not run the first_install.py script, then please read the rest of this section.

IDEAL needs several external python modules that are not included in a default python installation. In order to avoid interference with python module needs for other applications, the preferred way of installing these modules is using a virtual environment called venv in the IDEAL top directory. This may be done using the following series of commands (which may be provided in an install script in a later release of IDEAL) in a bash shell after a cd to the IDEAL top directory:

virtualenv -p python3 --prompt='(IDEAL 1.0) ' venv
source ./venv/bin/activate
pip install filelock htcondor itk matplotlib numpy pydicom
pip install python-daemon python-dateutil scipy
deactivate
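
As an optional sanity check that the modules ended up in the virtual environment (a sketch that imports only a subset of them):

source ./venv/bin/activate
python -c "import filelock, htcondor, itk, pydicom, scipy; print('modules OK')"
deactivate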

(The modules ipython, Sphinx and PyQt5 are optional. The first enables interactive, python-based analysis, the second enables you to generate these docs yourself, and the third enables the somewhat clunky sokrates.py GUI interface.)

If you decide to install the virtual environment under a different path, then you need to edit the bin/IDEAL_env.sh script to use the correct path in the source /path/to/virtualenv/bin/activate line, or to remove that line altogether.

Installing additional python modules

You can of course add extra modules with pip. There are three modules in particular that might be desirable when working with IDEAL:

  • ipython: a python command line program, which can be useful for debugging (e.g., query DICOM files using the pydicom module)

  • Sphinx: enables you to generate these docs yourself (cd docs; make html).

  • PyQt5: enables running the sokrates.py GUI. It’s a bit clunky, but some users like it.

In a fresh shell, cd to the IDEAL install directory and then run:

source ./venv/bin/activate
pip install ipython Sphinx PyQt5
deactivate

Alternatively, in a shell in which you already ran source bin/IDEAL_env.sh, you can directly run pip install ipython Sphinx PyQt5.

Set up the data directories

Like the virtual environment setup, this installation step may be automated in the next release. IDEAL needs a few folders to store logging output, temporary data, job output and commissioning data, respectively. In a bash shell, after a cd to the IDEAL top directory, do:

mkdir data
mkdir data/logging
mkdir data/workdir
mkdir data/output
mkdir data/MyClinicCommissioningData

The subdirectories of data are described in more detail below.

logging

The logging directory is where all the debugging level output will be stored. In case something goes wrong, these logging files may help to investigate what went wrong. When you report issues to the developers, it can be useful to attach the log file(s).

workdir

The workdir directory will contain a subfolder for every time you use IDEAL to perform a dose calculation. The unique name of each subfolder is composed of the user’s initials, the name and/or label of the plan and a time stamp of when you submitted the job. The subfolder will contain all data needed to run the GATE simulations, preprocess the input data and postprocess the output data:

  • The GATE directory with the mac scripts and data needed to run the simulations.

  • Temporary output, saved every few minutes, from all condor subjobs running this simulation.

  • Files that are used or generated by HTCondor for managing all the jobs.

  • Three more IDEAL-specific log files, namely: preprocessor.log, postprocessor.log and job_control_daemon.log.

The temporary data can take up a lot of space, typically a few dozen GiB, depending on the number of voxels in the CT (after cropping it to a minimal bounding box containing the “External” ROI and the TPS dose distribution) and on the number of cores in your cluster. After a successful run, the temporary data is archived in compressed form, for debugging in case errors happened or if there are questions about the final result.

Note

When an IDEAL job runs unsuccessfully, the temporary data is NOT automatically compressed/archived, since the user may want to investigate. Do not forget to delete or compress these data after the investigation has concluded, to avoid inadvertently filling up the disk too quickly.

Even after compressed archiving, the job work directory still occupies up to a few GiB per plan, which will add up when running IDEAL routinely for many plans [3].

output

The output directory will contain a subfolder for each IDEAL job, using the same naming scheme as for the work directories. In IDEAL’s system configuration file the user (with admin/commissioning role) can define which output will actually be saved, e.g. physical and/or effective dose, DICOM and/or MHD. This output directory serves to store the original output of the IDEAL job. If the path of a second output directory is given in the system configuration file, then the job output subfolder is copied to that second location (e.g. on a CIFS file share, where it can be accessed by users on Windows devices).

Commissioning Data Directory

In the example, MyClinic could be replaced by the name of your particle therapy clinic. If you are a researcher who studies plans from multiple clinics, you may want to create a commissioning data directory for each clinic.

This directory will contain the commissioning data for your particle therapy clinic. The details are laid out in the Commissioning chapter.

Footnotes