For the large 274-node commissioning runs, the following modules were loaded:
module load gnu_comp/9.3.0
module load openmpi/4.1.1
module load gsl/2.4
together with the following UCX settings:
export UCX_TLS=self,sm,ud
export UCX_UD_MLX5_RX_QUEUE_LEN=16384
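One way to keep these settings together is a small setup script that is sourced at the top of a job script. The following is only a sketch under stated assumptions: the guard around `module` is an addition so that the script also works on machines without environment modules, and echoing the UCX variables into the job log is a convenience, not part of the original runs.

```shell
#!/bin/bash
# Sketch of an environment setup script, meant to be sourced from a job script.

# Load the toolchain only if the `module` command is available
# (assumption: on the cluster it is provided by the environment modules system).
if command -v module >/dev/null 2>&1; then
    module load gnu_comp/9.3.0
    module load openmpi/4.1.1
    module load gsl/2.4
fi

# UCX settings used for the commissioning runs.
export UCX_TLS=self,sm,ud
export UCX_UD_MLX5_RX_QUEUE_LEN=16384

# Print every UCX_* variable so the active settings are recorded in the job log.
env | grep '^UCX_' | sort
```

Recording the variables in the log makes it easier to confirm afterwards which transport settings a given run actually used.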
If you are writing restart files, it is also important to set the file striping correctly; the right settings depend on whether you are writing to /snap8 (preferred) or /cosma8. This can make a huge difference to runtimes. Please ask if you are unsure.
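As a rough illustration only, and assuming the target is a Lustre file system with the standard `lfs` client tool available, striping on a restart directory could be inspected and set as sketched below. `RESTART_DIR` and `STRIPE_COUNT` are hypothetical names introduced here, and the default stripe count shown is a placeholder, not a recommended value; the appropriate value for /snap8 or /cosma8 should come from the support team.

```shell
#!/bin/bash
# Hypothetical values: replace both with what is recommended for your file system.
RESTART_DIR="${RESTART_DIR:-./restart}"   # hypothetical restart directory
STRIPE_COUNT="${STRIPE_COUNT:-1}"         # placeholder, NOT a recommendation

mkdir -p "$RESTART_DIR"

if command -v lfs >/dev/null 2>&1; then
    # Show the directory's current default striping.
    lfs getstripe -d "$RESTART_DIR"
    # Set the default stripe count for files created in this directory from now on.
    lfs setstripe -c "$STRIPE_COUNT" "$RESTART_DIR"
else
    echo "lfs not found: not a Lustre client, striping left unchanged"
fi
```

Striping set on a directory only applies to files created afterwards, so it needs to be in place before the first restart files are written.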