Walker, there is no pathway, the pathway comes while you go

                                         - Antonio Machado

Tutorial: More Advanced Simulations

The Data Synthesis tutorial introduces the Camino simulation capabilities and shows how to run some simple simulations, but the simulation is capable of a lot more than this. This tutorial describes some of its more advanced features and how to simulations that use them. In particular, the simulation allows the diffusion environment to be modelled in quite complicated ways, these environments are called "substrates" and in this tutorial we'll be covering how to run several different substrates such as

Crossing fibres
Cylinders with gamma-distributed radii
Mesh-based substrates

A substrate is envisaged to sit inside a single voxel, with spins diffusing across it. The boundaries of the voxel are ususally periodic so that the substrate defines an environment made up of an infinite, 3D array of whatever you specify. The measurement model in the simulation does not capture the trade-off between voxel size and SNR and hence simulation "voxels" can be quite a bit smaller than those in actual scans. The simulation is, and has always been, intended as a tool to simulate signals due to sub-voxel structure, rather than large spatially-extended structures.

Running simulations and common options

Simulations are all run via the datasynth command, and the substrate is specified using the -substrate switch. Different substrates require different parameters to be specified, and hence different additional switches will be required which will be discussed in the next section. Certain options, however, are common to all (or nearly all) simulations and we will discuss them as follows. -walkers Specifies how many spins to use in the simulations. This is an important consideration. More spins equates to greater accuracy, but increases the amount of time a simulation takes to run. Simple simulations require around 10000 spins but simulations on more complex substrates require more spins to capture the additional complexity, particularly at higher b-value. For the simulations covered in this tutorial we recomment AT LEAST 100000 spins. More may well be required.
-tmax Specifies the number of updates in the simulation. This can be though of as defining the temporal resolution of the simualtion. The total duration of a simulation's dynamics (i.e. the period of time that is simulated, not how long it takes to run!) is defined by its schemefile. This duration will be split into a number of updates of equal length, dt. This is an important number, since together with the diffusivity it defines the length of steps in each spins random walk. It also defines how long a simulation will take to run. More updates equates to longer execution. Investigations have shown that simple substrates require about 1000 updates, although very complex substrates and very high b-values may need more (5000 or more).

-diffusivity Specifies the free diffusivity for pins in the simulation in SI units. The default value is 2.0E-9m^s/s but this can be changed as needed. All physical quantities in Camino (including b-value) use SI units (seconds, meters, Teslas) so it may be neccessary to convert to SI before specifying.

-p Specifies the probability that a spin will step through a barrier unaffected instead of interacting with it. This is usually set to 0, making barriers impermeable.

-voxels Specifies the number of voxels of data to synthesise. It is important to specify this to be at least 1. By default, a simulation will run once and then fill remaining voxels with different noise-realisations of the synthesised data. To run a spearate simulation in each voxel, use the -separateruns option.

-initial SPecifies the initial conditions of spins by a keyword. uniform distributes spins uniformly (but randomly) across the substrate. delta locates them all at the smae point in the centre of the substarte, intra distributes them uniformly in the intracellular space and extra does the same but in the extracellular space. Note that some substrates do not have a reliable definition of intracellular and extracellular, and in this case the use of intra and extra will lead to unexpected results and is deprecated.

-seed Specifies the seed for the random number generators to use. Being a Monte-Carlo simulation, the simulation makes extensive use of random numbers when running. Specifying different seeds will cause different sequences of random numbers to be generated and hence different trajectories of spins to be developed as (slightly) different synthetic measurements to be synthesised. A seed can be any integer, although it is better to use large numbers (8 or 9 digits) as more complex seeds give lower correlations in random number sequences.

Half a Command

Putting this together, we can go a long way towards running a simulation. In fact, we can specify everything in a simulation except the substrate.
datasynth -walkers 100000 -tmax 1000 -p 0.0 -voxels 1 -schemefile myscheme.scheme -initial uniform

Specifies a simulation with 100000 spins, 1000 updtaes, zero permeability and spins initially uniformly distributed across the substrate using myscheme.scheme to specify the pulse sequence.

In fact, although it usually the best thing to do, it is not always appropriate to specify a scheme file. The next section describes running simulations with and without scheme files.

Scheme files and simulations

As previously mentioed, a scheme file defines a pulse sequence used by the simulation to generate synthetic data. There are several different formats of scheme file which describe different kinds of pulse sequences at different levels of complexity. The simulation can make use of any of them, although in the case of very simple scheme files, assumptions will be made about the specifics of the sequence.
-schemefile Specifies the scheme file to use for data synthesis. The scheme file describes the pulse sequence used to generate diffusion-weighted data. These specify gradients, pulse durations, orientations, separations, etc and are no different to those used by other parts of Camino.

The simulation needs to run at least one set of dynamics to generate a set of synthetic measurements. Simulations are complex and can take anything from minutes to days depending on complexity and hardware. The longer a schemefile is, the more memory will be required by the simulation, since each spin stores a phase shift for each entry in the scheme file. For this reason, it may not be advisable to put hundreds of measurements into a single scheme file but rather split them across different simulations.

Trajectories files

In some applications, however, we might want to synthesise many different sets of measurements from the same simulation. For example, I might want to compare the sensitivity of different acquisitions to a partticular parameter, or to investigate the behaviour of the diffusion signal as a function of one or more scan parameter. In this case, it may not be appropriate or even feasible to run a separate simulation for every protocol used and for this reason it is possible to separate the simulation dynamics from the data synthesis phase.

This is done using an intermediate file called a trajectories (or traj) file. A traj file contains the complete trajectories of all spins in a given simulation and may be parsed into a set of measurents at a later date. In order to run a simulation in this way, use the -duration switch along with -trajfile

-duration Specifies the duration of the dynamics (in seconds) if no scheme file is used. You can find out what this should be by using the longest acquisition in your scheme file. The simulation should be long enough to run at least from the onset of the first gradient block to the end of the last.
-trajfile SPecifies the name of the trajfile generated.
Beware, however, that trajfiles are very large. A typical traj file will run to several or tens of Gigabytes, so a considerable amount of disk space will be required.

trajfiles are parsed into data using a separate command called scan
scan -trajfile mytraj.traj -schemefile myscheme.scheme > scan outpt.bfloat

This command reads the trajectories file specified and synthesise diffusion weighted data using myscheme.scheme. Due to the size of the files involved this will take at least several seconds and may take several minutes for larger files.

Different Substrates

At its heart, the diffusion simulation is a model of diffusion in an environment that restricts the motion of spins. Data are generated by simulating the action of a pulse sequence on the spins, and hence data is a function of spin trajectories, which are themselves strongly influenced by the environment. The structure contained in the environemtn is called the "substrate" and it is important to choose a substrate that is appropriate for the study you are interested in. New substrates are being added to the simulation all the time, but this tutorial covers the most useful and flecible types provided by the simulation.

Since diffusion MRI has historically been concerned with imaging white matter, many of the substrates supported by the simulation are designed to model white matter structure and are hence composed of cylinders. However, more recently the limitations of cylinders have started to become important and so a more general form of substrate was implemented that deascribes the environment with arbitrary triangle meshes, making it possible to simulate diffusion in extremely detailed tissue environments.

Empty substrates

The absolute simplest simulation possible is in an empty environment without any restricting structures at all. This is called an Empty substrate, and it's the simulation's equivalent of putting a bottle of water in the scanner, i.e. it's a model of free diffusion. Empty substrates can be surprisingly useful for testing purposes and are also very fast to run because of the lack of intersection checking. To run and empty substrate use:
datasynth -walkers 100000 -tmax 1000 -voxels 1 -p 0.0 -schemefile my.scheme -initial uniform -substrate empty > freeDiffusion.bfloat

the diffusivity can be controlled by adding a -diffusivity option. remembering to specify the diffusivity in SI units.

Regular-packed cylinders

The simplest non-trivial simulation that is supported is an environment containing regularly packed cylinders of constant radius. In this situation, the environment is composed of an infinite array of cylinders parallel to the z-axis. There are two ways of arranging cylinders in this way: square-packing, in which cylinders are placed at the intersections on a square grid, and hexagonal-packing in which they are placed on the intersections of a triangular grid.

From a data-synthesis perspective there is not a great deal of difference between the two of these packings but it is worth noting that a hexagonal packing is more effiecient at filling space than square pavking. The maximum intracellular volume fraction from non-overlapping square packed cylinders is around 71% whereas from hexagonal packing it is 92%.

The simulation supports both of these configurations, they are specified by the -packing switch followed by either square or hex. Cylinders in the simulation do not necessarily have to be abutting (i.e. close-packed, so that their edges touch), although this is certainly possible. The size and spacing of cylinders is controlled by the -cylinderrad and -cylindersep options respectively. -cylinderrad specifies the RADIUS of all cylinders and -cylindersep the separation between cylinders. The cylindersep variable is defined as the distance between the central axes of the cylinders and hence must be at least twice the cylinder radius. Separations closer than this will cause cylinders to overlap and intersect and may cause unexpected results. Both distances must be specified in meters (NOT microns etc).

An example command to run a simulation with a regular-packed cylinder substrate is:
datasynth -walkers 100000 -tmax 1000 -voxels 1 -p 0.0 -schemefile my.scheme -initial uniform -substrate cylinder -packing hex -cylinderrad 1E-6 -cylindersep 2.1E-6 > regcylinders.bfloat

This will run a simulation with 100000 spins, 1000 updates and scan parameters specified by my.scheme on a hexagonally packed regular cylinder substrate with impermeable cylinders of radius 1E-6m and separation 2.1E-6m and send the results to the file regcylinders.bfloat. Changing hex to square will change the packing configuration accordingly.

Crossing Cylinders

A situation that is often of interest in diffusion MR research is where we have more than one principle fibre direction. The simulation is able to model crossing fibres with a specified crossing angle. This substrate contains two populations of fibres in interleaved planes. One population is parallel to the z-axis and another is rotated about the y-axis by a given angle with respect to the first.

Cylinders on this substrate are arranged in parallel in the xz-plane in parallel layers one cylinder thick. I.e. a plane of cylinders parallel to the z-axis, with a rotated with respect the first, then another parallel z-axis and so on. Cylinders are all of a constant radius.

An example command to use here is
datasynth -walkers 100000 -tmax 1000 -voxels 1 -p 0.0 -schemefile my.scheme -initial uniform -substrate crossing -crossangle 0.7854 -cylinderrad 1E-6 -cylindersep 2.1E-6 > crossingcyls45.bfloat

Here we've specified a crossing substrate. The crossing angle is specified in radians (NOT degrees) using the -crossangle 0.7854. This is approximately pi/4, or 45 degrees. The crossing angle can take on any value, just make sure you use radians!

Irregularly Packed, Distributed Radius Cylinders

Useful as they are, regularly packed cylinders of constant radius do not capture much of the compleity of white matter tissue, and in fact we can do quite a bit better. In this section we describe a more realistic type of substrate. Parallel cylinders with gamma-distributed radius and not symmetry in their arrangement. For brevity we refer to this substrate as gamma-cylinders. For historical reasons, it is referred to on the commandline as an "imflammation" substrate (see Hall & Alexander IEEE TMI 2009).

The gamma cylinder substrate contains a specified number of cylinders whose radii are drawn from a gamma distribution. This distribution is specified by two parameters, which may also be given on the command line. Cylinders are placed randomly in a square region of a given size such that no two overlap and that the edges of the square are periodic: a cylinder that overlaps an edge is repeated on the oppiste side of the square.

An example command for simulating diffusion with gamma distributed cylinders is
datasynth -walkers 100000 -tmax 1000 -voxels 1 -p 0.0 -schemefile my.scheme -initial uniform -substrate inflammation -increments 1 -gamma 1.84037 7.8E-7 -substratesize 3.5E-5 -numcylinders 100 > gammacyls.bfloat

Here we've specified an inflammation substrate that contains 100 cylinders which will be placed in a region 35x35um. Becasue there is no simple way to predict the intracellular volume fraction from the number of cylinders, substrate size and distribution parameters, some experimentation may be necessary to obtain realistic volume fractions. The intracellular volume fraction of the substrate is reported in the runtime log (and on the console output). Too small a substrate size will mean that the simulation will not be able to fit all cylinders in the space, and hence distort the dynamics. Too large a substrate size will mean that there is too much space between cylinders and hence too low an intracellular volume fraction.

A more complex substrate also brings other considerations. Firstly, it may be necessary to model a larger number of spins in order to probe its complexity. Here we have used 100000, but for more demanding simulations including high b-value measurements it may be neccessary to increase this substantially. It is not uncommon to use 500000 spins and 2000 or more timesteps in order to obtain good convergence of results. Furthermore, a disordered substrate means that in order to eliminate bias in simulations it is necessary to run several simulations on different realisations of the same substrate. This means different sets of cylinders with the same statistical properties arranged in different ways in space. Fortunately, this is simple to do. The seed for the random number generator can be changed using the -seed option followed by any large integer. Any integer will do, but large (8-9 digits) numbers will complicated binary representations will work better than simple ones. Changing the seed will change the arrangement of cylinders and also the trajectories executed by spins during the simulation. Averaging over these is called an ensemble average, and is good practice when using disordered substrates.

Mesh substrates

The most flexible and realistic type of substrate currently supported by the simulation is the Mesh Substrate. Mesh substrates model the environment as a collection of triangles and as such can model almost any shape in three dimensions. Mesh substrates have been successfully used to simulate diffusion in environments constructed from micrograph images (see Panagiotaki et al 2010) and to explore structures whose radius varies along their length.

Mesh substrates are constructed using a PLY file. This is a simple file format used for 3D geometry. They are simple to construct, and the format is described here. is an example of the format as parsed by the simulation. In common with everything else in Camino, all distances in the PLY file used will be assumed to be in meters.

A basic example command for running a simulation on a mesh substrate is
datasynth -walkers 100000 -tmax 1000 -voxels 1 -p 0.0 -schemefile my.scheme -initial uniform -substrate mesh -plyfile mysubstrate.ply > meshsubstrate.bfloat

Here we're running the usual 100000 spins and 1000 updates to generate a single voxel of data using the my.scheme scheme file with spins initially uniformly distributed across the substrate. The substrate here is constructed of all the triangles in the PLY file. Spins diffuse around in the space containing the mesh and any spin that leaves the region occupied by the substrate will re-enter on the opposite side. This means that spins whose trajectories cross a periodic boundary experience a discontinuity in their environment and to avoid biases in the signal caused by these discontinuities, we minimise the contribution from boundary-crossing spins by only generating data from the central portion of the voxel. This is a cube-within-a-cube which shares its centre with the substrate. The size of this central region is by default 0.75 times the length of each side of the substrate. This proportion can be changed by using the -voxelsizefrac followed by a number between zero and 1. The larger this region the more spins will be contained in the central region, but the more the signal will be effected by spins that have crossed the boundaries of the substrate. The default value has provided a good compromise in previous work. For some substrates (such as those with periodic boundaries) it may make sense to generate data from the entire substrate. In this case use -voxelsizefrac 1.0.

Because meshes are very complicated, simulations with meshes can be very computationally demanding. Previous work (see Panagiotaki et al MICCAI 2010) has employed meshes with as many as 200000 triangles, simulating 80000 spins and 1250 updates required around 24hours on a 3.2GHz dual-core linux PC. It is important to run with a large number of spins and updates in order to achieve unbiased results with good convergence.

Other Options

This section describe some other miscellanious features of the simulation that may be useful.

Different pulse sequences

First of all, it's worth mentioning that the simulation can work with a variety of scheme file formats. In addition to the standard version 1 scheme file that describes standard PGSE acquisitions, the simulation can work with twice refocussed spin-echo (TRSE)sequences using version 3 scheme files and completely general gradient waveforms using gradient waveform schemes (see tutorial on general waveform scheme files which can express, for example, double wave-vector or oscillating gradients.). A general waveform scheme can be used to describe arbitrary, 3 dimensional pulse shapes and so are able to describe almost any conceivable sequence. The simulation will work seamlessly with any of these schemes, simply specify a scheme file of the appropriate format on the command line.

Permeability

In all commands given previously, membrane permeability was set to zero via the -p 0.0 switch. This specifies a probability that when a spin encounters a barrier, it will step through it rather than being reflected. Specifying a finite probability will cause membranes to become more permeable. Care should be taken witht his probability since in combination with the simulation time increment dt = duration/tmax it defines a rate of spins through membranes. Changing dt will therefore change this rate, and if dt is changed, the membrane permeability must also be altered to account for this.

Stats files

In addition to generating synthetic MRI signals, the simulation is capable of generating statistics directly from the dynamics. These are generated in stats files. To generate a stats file, just specify its name from the command line using -statsfile myStatsFile.dat along with the rest of your simulation command. By default, a stats file will contain the mean squared-displacement of spins in the x, y and z directions for all time points in the simulation. This allows, for example, the time-dependent instrinsic diffusivity to be calculated. Stats files are binary files, formatted with each entry as a double precision floating point number (Java type double) as follows:
<duration of simulation (dt*tmax)>
<Number of spins>
<Number of updates in simulation (also a double)>
<time1>
<MSDx>
<MSDy>
<MSDz>
<time2>
<MSDx>
<MSDy>
<MSDz>
...

Framework exists in the code for implementing other type of stats files with different type of data in them, although there is currently no command line mechanism for changing statsfile type.

That about covers things. Have fun in Monte-Carlo.

AdvSimTutorial