Solution Number: 1096
Title: COMSOL and Multithreading
Platform: All Platforms
Versions: All versions
Categories: Solver
Keywords:

Problem Description

This solution describes how COMSOL takes advantage of multicore computers.

Solution

COMSOL supports two modes of parallel operation that can be combined: shared-memory parallelism and distributed-memory parallelism (cluster support). Shared-memory parallelism is supported with all COMSOL license types, while distributed-memory parallelism requires a floating network license. Shared-memory parallelism can utilize all CPU sockets on a computer, but for computers with multiple sockets it can sometimes be advantageous to use a floating network license to utilize the computer's full capacity; for further information, please see Hybrid Computing: Advantages of Shared and Distributed Memory Combined. This solution is dedicated to shared-memory parallel operations. For distributed-memory parallel operations, see Solution 1001.

Shared-memory processing, or multithreading, is important for the performance of COMSOL computations. Some terms that are frequently used when describing multithreading are:

  • Core: A physical processor core used in shared-memory parallelism by a computational node with multiple processors.
  • Speedup: How many times faster a job runs on N cores compared to 1 core, on a specific compute node. The speedup depends on the problem type, the hardware, and the hardware drivers used.
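As an illustration of the speedup definition above, the following minimal sketch computes speedup and parallel efficiency from two measured wall-clock times. The function names and the timings are hypothetical and are not part of COMSOL.

```python
def speedup(time_one_core: float, time_n_cores: float) -> float:
    """How many times faster the job ran on N cores than on 1 core."""
    if time_one_core <= 0 or time_n_cores <= 0:
        raise ValueError("wall-clock times must be positive")
    return time_one_core / time_n_cores


def parallel_efficiency(time_one_core: float, time_n_cores: float, n_cores: int) -> float:
    """Speedup divided by the number of cores (1.0 would be ideal scaling)."""
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    return speedup(time_one_core, time_n_cores) / n_cores


# Hypothetical timings for the same model on the same compute node:
# 1200 s on 1 core and 200 s on 8 cores.
s = speedup(1200.0, 200.0)
e = parallel_efficiency(1200.0, 200.0, 8)
print(f"speedup = {s:.1f}x, efficiency = {e:.0%}")  # speedup = 6.0x, efficiency = 75%
```

Both timings should come from the same compute node and the same model, since the speedup depends on the problem type and the hardware.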

Windows

On Windows platforms, the default number of processor cores used by COMSOL is the total number of available physical cores. For example, if you have a 2 x dual core machine, 4 cores will be used in parallel by a COMSOL Multiphysics process by default.

If you want COMSOL to leave out one or more processor cores, you can manually set the number of cores used for a job. To change the default behavior, start the COMSOL Desktop and set the Number of processors option in the Multicore and Cluster Computing section of the Preferences.

Alternatively, create a new shortcut on your Desktop to the COMSOL executable and modify it to set the desired number of threads.

  1. Create a new shortcut on the Desktop.
  2. Right-click the shortcut and select Properties.
  3. Change the Target field to
    "C:\Program Files\COMSOL\COMSOL50\Multiphysics\bin\win64\comsol.exe" -np 2
    if you want COMSOL to use only 2 cores.
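The shortcut target above is simply the COMSOL executable followed by the -np option. As a sketch, the same command line could be assembled and launched from a script. The helper names below are hypothetical; only the executable path and the -np option come from this solution.

```python
import subprocess
from typing import List


def comsol_command(executable: str, n_cores: int) -> List[str]:
    """Build the argument list '<executable> -np <n_cores>'."""
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    return [executable, "-np", str(n_cores)]


def launch_comsol(executable: str, n_cores: int) -> None:
    """Start COMSOL with the given core count (requires a COMSOL installation)."""
    subprocess.run(comsol_command(executable, n_cores), check=True)


# Path taken from the shortcut example above; adjust it to your installation.
exe = r"C:\Program Files\COMSOL\COMSOL50\Multiphysics\bin\win64\comsol.exe"
print(comsol_command(exe, 2))
```

Passing the arguments as a list avoids quoting problems with the space in "Program Files".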

Mac OS X

On Mac OS X, controlling the number of processor cores used by COMSOL is only possible when launching COMSOL from the Terminal. The default behavior is to use all available physical processor cores for the COMSOL Multiphysics application. You can find how many processor cores you have in the System Profiler application, or by using the command sysctl hw.ncpu. Note that hw.ncpu counts logical processors, so it includes hyperthreads; sysctl hw.physicalcpu reports the physical core count. You can override the default behavior by using the command line switches. For example, start COMSOL with the command
/Applications/COMSOL50/Multiphysics/bin/comsol -np 2

Linux

The number of logical processors available to a COMSOL process can be displayed on some systems with the command
grep -c ^processor /proc/cpuinfo

Note that if hyperthreading is activated, you need to divide the core count reported by the above command by the relevant hyperthreading factor (2) to get the physical core count. COMSOL does not benefit from hyperthreading; if COMSOL is started with more threads than there are physical CPU cores, performance will decrease.
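Instead of dividing by a fixed factor, the physical core count can often be read directly from /proc/cpuinfo. The sketch below assumes the "physical id" and "core id" fields are present, which is typical on x86 Linux but not guaranteed on every architecture; when they are missing it falls back to the logical count. The function name and the sample text are hypothetical.

```python
from typing import Tuple


def count_cores(cpuinfo_text: str) -> Tuple[int, int]:
    """Return (logical processors, physical cores) from /proc/cpuinfo text."""
    logical = 0
    physical = set()
    package = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator line between processor entries
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical += 1
            package = None
        elif key == "physical id":
            package = value
        elif key == "core id":
            # A physical core is identified by its (socket, core) pair.
            physical.add((package, value))
    # Fall back to the logical count if the topology fields are absent.
    return logical, (len(physical) or logical)


# Shortened sample: one socket, two cores, hyperthreading active.
SAMPLE = """\
processor   : 0
physical id : 0
core id     : 0

processor   : 1
physical id : 0
core id     : 1

processor   : 2
physical id : 0
core id     : 0

processor   : 3
physical id : 0
core id     : 1
"""

logical, physical = count_cores(SAMPLE)
print(f"logical={logical} physical={physical}")  # logical=4 physical=2
```

On a real system the text would come from reading the file /proc/cpuinfo; the physical count is the value to pass to -np at most.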

On Linux, the default behavior is to use all available physical cores for the COMSOL Multiphysics application. You can override the default behavior by using the command line switches. For example, start COMSOL with the command comsol -np 2.

Hyperthreading

COMSOL currently does not benefit from hyperthreading. This means that by default, COMSOL will use as many threads as there are physical CPU cores on the system. The result is that if hyperthreading is active, the Windows Task Manager will show at most 50% CPU utilization (for the COMSOL process) when COMSOL is running. This is expected when hyperthreading is activated. Turning hyperthreading off will not increase COMSOL performance; we recommend that hyperthreading is used if available in order to get reasonable performance for other applications when COMSOL is running.
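The 50% figure follows from simple arithmetic: one thread per physical core, divided by two logical processors per core. A small sketch of that calculation, with a hypothetical function name and assuming a hyperthreading factor of 2:

```python
from typing import Optional


def expected_cpu_utilization(physical_cores: int,
                             threads_per_core: int = 2,
                             comsol_threads: Optional[int] = None) -> float:
    """Fraction of logical processors kept busy by the COMSOL threads.

    By default one thread per physical core is assumed, as described above.
    """
    logical = physical_cores * threads_per_core
    threads = physical_cores if comsol_threads is None else comsol_threads
    return min(threads, logical) / logical


print(f"{expected_cpu_utilization(8):.0%}")                    # 50%
print(f"{expected_cpu_utilization(8, comsol_threads=4):.0%}")  # 25%
```

The second line corresponds to starting COMSOL with -np 4 on a machine with 8 physical cores and hyperthreading active.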

The -mpmode option

The values "turnaround" and "throughput" for -mpmode correspond directly to the OpenMP runtime settings for the KMP_LIBRARY environment variable. The -mpmode option overrides the system settings (if KMP_LIBRARY is not set). For more information on the turnaround and throughput modes, please see the section on "Execution modes" at https://software.intel.com/en-us/node/522689. All options use KMP_BLOCKTIME = 200 by default. When -mpmode is not set at all, turnaround is the default. The "serial" mode is not used by COMSOL. The third value that COMSOL lists for -mpmode is "owner". The owner option is similar to turnaround; the difference is that owner also specifies a thread affinity that is optimized for the number of sockets on the computer, so owner is more aggressive than turnaround.
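The correspondence described above can be summarized as a small lookup. This is only a sketch of the mapping as stated in this solution, not COMSOL's actual implementation: the function name is hypothetical, mapping owner to the turnaround library mode is an assumption based on "similar to turnaround", and the thread affinity that owner adds is not modeled because its settings are not given here.

```python
from typing import Dict, Optional

# -mpmode value -> KMP_LIBRARY value, as described in the text above.
KMP_LIBRARY_FOR_MPMODE = {
    "turnaround": "turnaround",
    "throughput": "throughput",
    # Assumption: owner behaves like turnaround plus a thread affinity
    # whose exact settings are not stated, so affinity is left out.
    "owner": "turnaround",
}


def openmp_settings(mpmode: Optional[str] = None) -> Dict[str, str]:
    """Return the OpenMP environment settings implied by an -mpmode value."""
    mode = "turnaround" if mpmode is None else mpmode  # documented default
    if mode not in KMP_LIBRARY_FOR_MPMODE:
        raise ValueError(f"unsupported -mpmode value: {mode!r}")
    return {
        "KMP_LIBRARY": KMP_LIBRARY_FOR_MPMODE[mode],
        "KMP_BLOCKTIME": "200",  # default for all options
    }


print(openmp_settings())              # default when -mpmode is not set
print(openmp_settings("throughput"))
```

The "serial" value is rejected on purpose, since the text states that COMSOL does not use it.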

Troubleshooting

My new server has 48 cores; however, speedup is poor when increasing the number of threads beyond a certain number. What gives?

  1. Problem size matters for speedup. Speedup for very large models (like several million degrees of freedom) is better. If you use very small models, speedup will be limited when using many cores. In addition, the maximum possible speedup is limited by the non-parallel fraction of the algorithms. This limit is described by Amdahl's law.
  2. If you are using the MUMPS direct solver, switch to the PARDISO direct solver in COMSOL. It provides better shared-memory speedup for high numbers of cores than MUMPS.
  3. Start COMSOL with the MKL library; it may work better than the default ACML library if you use many cores on AMD machines:
    "C:\Program Files\COMSOL\COMSOL50\Multiphysics\bin\win64\comsol.exe" -blas mkl
  4. Memory bandwidth also limits speedup. For a system with 16 memory channels, 16 is the maximum attainable speedup for a linear system solver even if 48 cores (or more) are available. In practice, depending on the physics and solver settings, the actual maximum speedup is typically in the 8-13 range, regardless of the number of memory channels or CPU cores available on the computer.
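To make items 1 and 4 concrete, the sketch below evaluates Amdahl's law, S(N) = 1 / ((1 - p) + p / N), where p is the parallel fraction of the run time, and then caps the result at the number of memory channels. The cap is a deliberately simplified model of item 4, and the parallel fractions used are hypothetical values, not measurements of COMSOL.

```python
def amdahl_speedup(parallel_fraction: float, n_cores: int) -> float:
    """Amdahl's law: ideal speedup on n_cores for a given parallel fraction."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / n_cores)


def capped_speedup(parallel_fraction: float, n_cores: int, memory_channels: int) -> float:
    """Amdahl speedup, additionally limited by the number of memory channels.

    Simplifying assumption: the memory-channel count acts as a hard ceiling.
    """
    return min(amdahl_speedup(parallel_fraction, n_cores), float(memory_channels))


# Hypothetical parallel fractions on a 48-core machine with 16 memory channels.
for p in (0.90, 0.95, 0.99):
    print(f"p={p:.2f}: Amdahl {amdahl_speedup(p, 48):.1f}x, "
          f"capped {capped_speedup(p, 48, 16):.1f}x")
```

Even with 99% of the work parallelized, the modeled speedup on 48 cores stays at the memory-channel ceiling, which is consistent with speedup flattening beyond a certain number of threads.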

See Also

See also Selecting hardware (solution 866).

See also Running COMSOL on clusters (solution 1001).


Disclaimer

COMSOL makes every reasonable effort to verify the information you view on this page. Resources and documents are provided for your information only, and COMSOL makes no explicit or implied claims to their validity. COMSOL does not assume any legal liability for the accuracy of the data disclosed. Any trademarks referenced in this document are the property of their respective owners. Consult your product manuals for complete trademark details.