This page provides access to a GPU-accelerated version of the widely used Object Oriented Micromagnetic Framework (OOMMF). The GPU acceleration yields up to a 70x GPU-to-CPU speed-up on the latest GPUs. GPU OOMMF was developed in collaboration with Dr. Michael J. Donahue of the National Institute of Standards and Technology (NIST).

The GPU implementation leaves most user-facing OOMMF components unchanged; only the lower-level computational modules are modified. OOMMF users can therefore run their existing models as before, but at a much higher speed. The current implementation supports two running modes: if no GPU modules are loaded, the code runs purely on the CPU, exactly as standard CPU OOMMF; otherwise, the heavy computational work is offloaded to the GPU while the lightweight front-end work remains on the CPU.
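The mode selection above is driven entirely by which module classes the input file requests. The following Python sketch is loosely illustrative only (it is not OOMMF's actual extension mechanism, and the module names are taken from the example below): a model runs on the GPU simply because GPU variants of the modules are named in its Specify blocks.

```python
# Illustrative sketch of class-name-based backend selection.
# CPU_MODULES / GPU_MODULES mirror the naming convention used in the
# example MIF file on this page; the dispatch logic itself is hypothetical.
CPU_MODULES = {"Oxs_UniformExchange", "Oxs_Demag", "Oxs_EulerEvolve"}
GPU_MODULES = {"Oxs_GPU_UniformExchange_New", "GPU_Demag", "GPU_EulerEvolve"}

def backend_for(specified_classes):
    """Return 'gpu' if any GPU module class is requested, else 'cpu'."""
    if any(name in GPU_MODULES for name in specified_classes):
        return "gpu"   # heavy compute offloaded to the GPU
    return "cpu"       # behaves exactly like standard CPU OOMMF

print(backend_for({"Oxs_UniformExchange", "Oxs_Demag"}))          # cpu
print(backend_for({"Oxs_GPU_UniformExchange_New", "GPU_Demag"}))  # gpu
```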

The implementation is open source and is available for download. Please cite the following article if you use GPU-accelerated OOMMF in your research.

S. Fu, W. Cui, M. Hu, R. Chang, M.J. Donahue, V. Lomakin, "Finite Difference Micromagnetic Solvers with Object Oriented Micromagnetic framework (OOMMF) on Graphics Processing Units," IEEE Transactions on Magnetics, vol. 52, no. 4, pp. 1-9, 2016.

Download

Installation

Windows

1. Install the Nvidia CUDA Toolkit.

2. Download the GPU OOMMF package and unpack it.

3. Open Control Panel -> System and Security -> System -> Advanced System Settings -> Advanced -> Environment Variables, and add a new variable CUDA_HOME = %CUDA_PATH%.

4. Type the following commands in a Visual Studio x64 Cross Tools Command Prompt:

cd %PATH_TO_GPU_OOMMF_DIR%

tclsh oommf.tcl pimake clean

tclsh oommf.tcl pimake

Linux

1. Install the Nvidia CUDA Toolkit.

2. Download the GPU OOMMF package and unpack it.

3. Type the following commands in a terminal. (Adjust the Tcl/Tk version and paths to match the installation on your system.)

export OOMMF_TCL_INCLUDE_DIR=/usr/include/tcl8.6

export OOMMF_TK_INCLUDE_DIR=/usr/include/tcl8.6

export OOMMF_TK_CONFIG=/usr/lib/x86_64-linux-gnu/tk8.6/tkConfig.sh

export OOMMF_TCL_CONFIG=/usr/lib/x86_64-linux-gnu/tcl8.6/tclConfig.sh

export CUDA_HOME=<PATH_TO_CUDA_DIR>

cd <PATH_TO_GPU_OOMMF_DIR>

tclsh oommf.tcl pimake clean

tclsh oommf.tcl pimake
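Before running pimake, it can be useful to confirm that all of the environment variables from the steps above are actually set. The helper below is a hypothetical convenience check, not part of OOMMF or its build system:

```python
# Hypothetical pre-build check: list the build variables from the
# installation steps that are not yet set in the environment.
import os

REQUIRED_VARS = [
    "OOMMF_TCL_INCLUDE_DIR",
    "OOMMF_TK_INCLUDE_DIR",
    "OOMMF_TK_CONFIG",
    "OOMMF_TCL_CONFIG",
    "CUDA_HOME",
]

def missing_vars(env=os.environ):
    """Return the names of required build variables missing or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]

if missing_vars():
    print("Set these before building:", ", ".join(missing_vars()))
```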

Examples

The GPU OOMMF modules are activated in the same way as the standard CPU modules; the only necessary changes are the class names in the MIF input files. For example, replacing the energy and time-evolution Specify blocks in the muMAG standard problem 3 input MIF file with the following code activates the GPU modules.

A complete MIF file for GPU OOMMF can be downloaded here. More examples are included in the GPU OOMMF package.


# Uniaxial anisotropy computed by GPU
Specify GPU_UniaxialAnisotropy_New [subst {
  K1 $K1
  axis {0 0 1}
}]

# Exchange computed by GPU
Specify Oxs_GPU_UniformExchange_New [subst {
  A $A
}]

# Demag computed by GPU
Specify GPU_Demag {}

# Zeeman field computed by GPU.
# Unnecessary for standard problem 3, but included here to
# show how to activate the Zeeman field on GPU.
Specify GPU_FixedZeeman [subst {
  multiplier [expr {1/(10000*$mu0)}]
  field {0 0 0}
}]

# Time evolver on GPU
Specify GPU_EulerEvolve {
  alpha 0.6
}

# Time driver on GPU
Specify Oxs_GPU_TimeDriver [subst {
  basename prob3
  vector_field_output_format {text %.7g}
  scalar_output_format %.15g
  evolver GPU_EulerEvolve
  mesh :mesh
  stopping_time 0.5e-9
  stopping_dm_dt 0.01
  Ms {$Ms}
  m0 [list $m0]
}]
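A note on the FixedZeeman block above: OOMMF expects applied fields in A/m, and the multiplier 1/(10000*mu0) is exactly the factor that converts a field specified in Oe into A/m (1 Oe = 10^-4 T / mu0 = 1000/(4*pi) A/m, about 79.58 A/m). A quick numerical check:

```python
import math

mu0 = 4 * math.pi * 1e-7          # vacuum permeability [T*m/A]
multiplier = 1 / (10000 * mu0)    # the factor from the FixedZeeman block

# The Oe -> A/m conversion factor, for comparison:
oe_in_a_per_m = 1000 / (4 * math.pi)

print(multiplier)        # ~79.577, identical to the Oe -> A/m factor
```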

Available GPU Modules

Modules In Preparation

Benchmarks

Test problem: cubic magnetic particle

Test environment: Windows (64 bit), Nvidia GTX 690 (one GPU device with 2GB memory used; the latest Titan X is more than 2x faster), Intel Xeon E5-1650 @ 3.2GHz

Metric: wall-clock time per field evaluation, in ms (speed-ups in parentheses are relative to the 1-thread CPU time)

Problem Scale (#Cells)   GPU (single prec.)   GPU (double prec.)   CPU 1-thread (double prec.)   CPU 6-thread (double prec.)
4K   (16^3)              0.91  (1.8x)         1.18   (1.4x)        1.65                          0.71  (2.3x)
32K  (32^3)              1.59  (8.8x)         2.99   (4.7x)        14.00                         5.49  (2.5x)
256K (64^3)              5.64  (25.0x)        17.58  (8.0x)        140.7                         51.56 (2.7x)
2M   (128^3)             42.75 (29.0x)        148.1  (8.4x)        1242.9                        470.1 (2.6x)
4M   (128x128x256)       102.2 (34.8x)        N/A                  3556.4                        993.4 (3.6x)
Problem Scale (#Cells)   GPU (single prec.)   GPU (double prec.)   CPU 1-thread (double prec.)   CPU 6-thread (double prec.)
4K   (16^3)              0.84  (2.7x)         0.84   (2.7x)        2.24                          0.51  (4.4x)
32K  (32^3)              0.94  (19.4x)        2.36   (7.8x)        18.29                         3.90  (4.7x)
256K (64^3)              5.23  (34.3x)        17.35  (10.4x)       179.54                        39.29 (4.6x)
2M   (128^3)             46.63 (31.0x)        153.20 (9.5x)        1450.9                        324.9 (4.5x)
4M   (128x128x256)       94.58 (31.0x)        N/A                  3040.9                        680.3 (4.5x)
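The speed-ups in parentheses are wall-time ratios against the 1-thread CPU column. As a sanity check, recomputing them from the 2M-cell row of the first table:

```python
# Speed-up = (CPU 1-thread time) / (accelerated time), per field evaluation.
# Values taken from the 2M-cell row of the first benchmark table above.
cpu_1t = 1242.9   # ms, CPU 1-thread (double prec.)
gpu_sp = 42.75    # ms, GPU (single prec.)
cpu_6t = 470.1    # ms, CPU 6-thread (double prec.)

print(round(cpu_1t / gpu_sp, 1))   # ~29.1, matching the listed 29.0x
print(round(cpu_1t / cpu_6t, 1))   # 2.6, matching the listed 2.6x
```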