Warning
This section is a work in progress, documenting the most recent code generation framework, which is in the package brian.experimental.codegen2.
To generate code, we start with a basic statement or set of statements we want to evaluate for all neurons, or for all synapses, and then apply various transformations to generate code that will do this. We start from a structured, language-invariant representation of the set of basic statements. We then ‘resolve’ the unknown symbols in it. This is done recursively, the resolution of each symbol can add vectorised statements or loops to the current representation, and add data to a namespace that will be associated to the final code. Symbols will be things like a NeuronGroup state variable, or a synaptic weight value. The output of this process is a new, more complicated structured representation, including things like loops if necessary. Next, we convert this structured representation into a code string. Finally, this code string is JIT-compiled into an executable object.
You can use Brian’s equations format to generate C/C++ code for a numerical integration step, for example:
eqs = '''
dv/dt = (ge+gi-(v+49*mV))/(20*ms) : volt
dge/dt = -ge/(5*ms) : volt
dgi/dt = -gi/(10*ms) : volt
'''
code, vars, params = make_c_integrator(eqs, method=euler, dt=0.1*ms)
print code
has output:
double _temp_v = 50.0*ge + 50.0*gi - 50.0*v - 2.45;
double _temp_ge = -200.0*ge;
double _temp_gi = -100.0*gi;
v += _temp_v*0.0001;
ge += _temp_ge*0.0001;
gi += _temp_gi*0.0001;
See the documentation for the function make_c_integrator().
The basic way to use the code generation module is as follows:
This process is very clearly illustrated in the source code for CodeGenStateUpdater.
The following are the main elements of the code generation package:
We start with a worked example. Consider the statement:
V = V+1
Here V is a NeuronGroup state variable. We wish to transform this into code that can be executed. In the case of Python, the output would look like:
_neuron_index = slice(None)
V = _arr_V[_neuron_index]
_arr_V[_neuron_index] = V+1
The symbol _arr_V would be added directly to the namespace.
In the case of C++ it would look like:
for(int _neuron_index=0; _neuron_index<_len__arr_V; _neuron_index++)
{
double &V = _arr_V[_neuron_index];
V = V+1;
}
Here the symbols _arr_V and _len__arr_V` would be added to the namespace. The reason for these complicated names is to do with making the code as generic as possible, not introducing namespace clashes (symbols starting with _ are reserved), etc.
The way the process works is we start with the statement V=V+1 and a Symbol object with name V, specifically a NeuronGroupStateVariableSymbol. The statement V=V+1 depends on V with both a Read and Write dependency. We therefore have to ‘resolve’ the symbol V. To do this we call the method resolve() on V.
In the case of Python, this gives us:
V = _arr_V[_neuron_index]
_arr_V[_neuron_index] = V+1
It adds _arr_V to the namespace, and creates a dependency on _neuron_index. The reason that V=V+1 is translated to _arr_V[_neuron_index] = V+1 is that on the left hand side we have a write variable, and on the right hand side we have a read variable. In Python, when vectorising, we have no choice but to give the underlying array with its slice when writing to an array. However, at this point the code generation framework doesn’t know what _neuron_index will be, so it could be, for example, an array of indices. In this case, suppose we did V*V it would be more efficient to compute V=_arr_V[_neuron_index] and then compute V*V than to compute _arr_V[_neuron_index]*_arr_V[_neuron_index], and in the case where _neuron_index=slice(None) it is no slower, so we always do this.
In the case of C++, the first resolution step gives us:
double &V = _arr_V[_neuron_index];
V = V+1;
For the second resolution step, we need to resolve _neuron_index, which is a symbol of type SliceIndex, telling us that _neuron_index varies over all neurons. Note that we could also have _neuron_index being an ArrayIndex, for examples spikes, and then this could be used for a reset operation (we would iterate only over those indices of neurons which had spiked). Here though, we iterate over all neurons. In Python, calling the resolve() method of _neuron_index gives us:
_neuron_index = slice(None)
V = _arr_V[_neuron_index]
_arr_V[_neuron_index] = V+1
and in C++:
for(int _neuron_index=0; _neuron_index<_len__arr_V; _neuron_index++)
{
double &V = _arr_V[_neuron_index];
V = V+1;
}
In both cases, the _neuron_index symbol is resolved and the process is complete.
Note that we have actually mixed two stages here, the stage of generating a structured representation of the code using CodeItem objects, and the stage of generating code strings using CodeItem.convert_to(). In fact, the converting of, for example, V to _arr_V[_neuron_index] only happens at the second stage.
The first stage, acting on the structured representation of nested CodeItem objects is resolved using the function resolve(). This calls Symbol.resolve() for each of the symbols in turn. The resolution order is determined by an optimal efficiency algorithm, see the reference documentation for resolve() for the full algorithm description.
Symbol.resolve() can do an arbitrary transformation of the input CodeItem, but typically it will either do something like:
load()
item
save()
Or something like:
for name in array:
item
See the reference documentation for Symbol.resolve(), and the documentation for the most important symbols:
This step is relatively straightforward, each CodeItem object has its convert_to method called iteratively. The important one is in MathematicalStatement, where the left hand side usage is replaced by Symbol.write() and the right hand side usage is replaced by Symbol.read(). In addition, at this stage the syntax of mathematical statements is corrected, e.g. Python’s x**y is replaced by C++’s pow(x,y) using sympy.
The four objects used for code generation in Brian are:
An integration scheme is generated from an Equations object using the make_integration_step() function. See reference documentation for that function for details.
This is carried out by CodeGenStateUpdater, which can be used as a Brian brian.StateUpdater object in brian.NeuronGroup.
As an example, for Euler integration, the differential equations:
dx/dt = expr
are separated by Equations into variable x with expression expr. This then becomes:
_temp_x := expr
x += _temp_x*dt
This can then be resolved by the code generation mechanisms described already.
TODO: synaptic propagation, including docstrings and code comments
NOTE: GPU functionality not included for synaptic propagation yet.
Warning
GPU code is highly transitional, many details may change in the future.
GPU code is handled by five classes:
For more details, see the reference documentation for the classes in the order above.
Note that CodeGenConnection is the only code generation version of a Brian class which is not GPU enabled at present.
To extend code generation, you will probably need to add new Symbol classes. Read the documentation for this class to start, and the documentation for the most important symbols:
See also CodeItem, particularly the process described in CodeItem.generate().
The overall structure of the classes in the code generation package are included below, for reference.
Contains a list of CodeItem objects which are considered to be executed in serial order. The list is passed as arguments to the init method, so if you want to pass a list you can initialise as:
block = Block(*items)
Helper class used as the base for various control structures such as for loops, if statements. These are typically not language-invariant and should only be output in the resolution process by symbols (which know the language they are resolving to). Consists of strings start and end, a list of contents (as for Block), and explicit sets of dependencies and resolved (these are self-dependencies/resolved). The output code consists of the start string, the indented converted contents, and then the end string. For example, for a C for loop, we would have start='for(...){ and end='}'.
Simply a base class, does nothing.
A for loop in Python, the structure is:
for var in container:
content
Where var and container are strings, and content is a CodeItem or list of items.
Dependencies can be given explicitly, or by default they are Read(x) for each word x in container. Resolved can be given explicitly, or by default it is set(var).
A for loop in C, the structure is:
for(spec)
{
content
}
You specify a string var which is the variable the loop is iterating over, and a string spec should be of the form 'int i=0; i<n; i++'. The content is a CodeItem or list of items. The dependencies and resolved sets can be given explicitly, or by default they are extracted, respectively, from the set of words in spec, and set([var]).
Just a base class.
If statement in Python, structure is:
if cond:
content
Dependencies can be specified explicitly, or are automatically extracted as the words in string cond, and resolved can be specified explicitly or by default is set().
If statement in C, structure is:
if(cond)
{
content
}
Dependencies can be specified explicitly, or are automatically extracted as the words in string cond, and resolved can be specified explicitly or by default is set().
An item of code, can be anything from a single statement corresponding to a single line of code, right up to a block with nested loops, etc.
Should define the following attributes (default values are provided):
This structure of having default implementations allows several routes to derive a class from here, e.g.:
Returns a string representation of the code for this item in the given language. From the user point of view, you should call generate(), but in developing new CodeItem derived classes you need to implement this. The default behaviour is simply to concatenate the strings returned by the subitems.
Returns a Code object. The method resolves the symbols using resolve(), converts to a string with convert_to() and then converts that to a Code object with Language.code_object().
The basic Code object used for all Python/C/GPU code generation.
The Code object has the following attributes:
Each language (e.g. PythonCode) extends some or all of the methods:
It will usually not be necessary to override the call mechanism:
Base class for Read and Write dependencies.
A dependency marks that a CodeItem depends on a given symbol. Each dependency has a name.
Used to indicate a read dependency, i.e. the value of the symbol is read.
Used to indicate a write dependency, i.e. the value of the symbol is written to.
Returns the set of names of the variables which are either read to or written to in a set of dependencies.
Returns a frozen version of inputcode with equations and namespace.
Replaces each occurrence in inputcode of a variable name in the namespace ns with its value if it is of int or float type. Variables with names in brian.Equations eqs are not replaced, and neither are dt or t.
Returns a frozen set of equations.
Each expression defining an equation is frozen as in freeze_with_equations().
A mathematical expression such as x*y+z.
Has an attribute dependencies which is Read(var) for all words var in expr.
Has a method convert_to() defined the same way as CodeItem.convert_to().
Converts expression into a string for the given language using the given set of symbols. Replaces each Symbol appearing in the expression with sym.read(), and if the language is C++ or GPU then uses sympy.CCodePrinter().doprint() to convert the syntax, e.g. x**y becomes pow(x,y).
Applies a dict of word substitutions.
The dict substitutions consists of pairs (word, rep) where each word word appearing in expr is replaced by rep. Here a ‘word’ means anything matching the regexp \bword\b.
Returns a docstring with the indentation removed according to the Python standard
split=True returns the output as a list of lines
Changing numtabs adds a custom indentation afterwards
Indents a given string or list of lines
split=True returns the output as a list of lines
Return all the identifiers in a given string expr, that is everything that matches a programming language variable like expression, which is here implemented as the regexp \b[A-Za-z_][A-Za-z0-9_]*\b.
Removes all empty lines from the multi-line string s.
Warning
GPU code is highly transitional, many details may change in the future.
Generates final kernel source code and used to launch kernels.
Used in conjunction with GPUManager. Each kernel is prepared with prepare() which generates source code and adds symbols to the GPUSymbolMemoryManager. The GPUManager compiles the code and sets the gpu_func attribute, and the kernel can then be called via run().
The initialisation method extracts variable _gpu_vector_index from the namespace and stores it as attribute index, and _gpu_vector_slice as the pair (start, end).
Generates kernel source code and adds symbols to memory manager.
We extract the number of GPU indices from the namespace, _num_gpu_indices.
We loop through the namespace, and for each value determine it to be either an array or a single value. If it is an array, then we place it in the GPUSymbolMemoryManager, otherwise we add it to the list of arguments provided to the function call. This allows scalar variables like t to be transmitted to the kernel function in its arguments.
We then generate a kernel of the following (Python template) form:
__global__ void {name}({funcargs})
{{
int {vector_index} = blockIdx.x * blockDim.x + threadIdx.x;
if(({vector_index}<({start}))||({vector_index}>=({end})))
return;
{code_str}
}}
We also compute the block size and grid size using the user provided maximum block size.
Calls the pycuda GPU function prepare() method for low-overhead function calls.
Calls the function on the GPU, extracting the scalar variables in the argument list from the namespace.
This object controls everything on the GPU.
It uses a GPUKernel object for managing kernels, and a GPUSymbolMemoryManager object for managing symbol memory.
The class is used by:
Memory is mirrored on GPU and CPU. In the present implementation, in the development phase only, each call to run() will copy all symbols from CPU to GPU before running the GPU kernel, and afterwards copy all symbols from GPU back to CPU. In the future, this will be disabled and symbol memory copies will be handled explicitly by calls to methods copy_to_device() and copy_to_host().
Adds a kernel with the given name, code and namespace. Creates a GPUKernel object.
Proxy to GPUSymbolMemoryManager.add_symbols().
Compiles code using pycuda.compiler.SourceModule and extracts kernel functions with pycuda.compiler.SourceModule.get_function(). The GPUKernel.gpu_func attribute is set for each kernel.
Proxy to GPUSymbolMemoryManager.copy_to_device().
Proxy to GPUSymbolMemoryManager.copy_to_host().
Combines kernel source into one source file, and adds memory management kernel functions. These simple kernels simply copy a pointer to a previously specified name. This is necessary because when pycuda is used to allocate memory, it doesn’t give it a name only a pointer, and the kernel functions use a named array.
Copies allocated memory pointers to named global memory pointer variables so that kernels can use them. The kernel names to do this are in the GPUSymbolMemoryManager.symbol_upload_funcnames dict (keys are symbol names), and the allocated pointers are in the GPUSymbolMemoryManager.device dict.
Not used at present. Will be used to combine multiple kernels with the same vectorisation index for efficiency.
Compiles code and initialises memory.
Performs the following steps:
Runs the named kernel. Calls GPUKernel.run(). Note that all symbols are copied to and from the GPU before and after the kernel run, although this is only for the development phase and will change later.
Manages symbol memory on the GPU.
Stores an attribute device and host which are dicts, with keys the symbol names, and values pycuda.gpuarray.GPUArray and numpy.ndarray respectively. Add symbols with add_symbols(), which will allocate memory.
Adds a collection of symbols.
Each item in items is of the form (symname, hostarr, devname) where symname is the symbol name, hostarr is the numpy.ndarray containing the data, and devname is the name the array pointer should have on the device.
Allocates memory on the device, and copies data to the GPU.
Copy the memory in the numpy.ndarray for symname to the allocated device memory. If symname==True, do this for all symbols. You can also pass a list for symname.
As for copy_to_device() but copies memory from device to host.
Generates declarations for array pointer names on the device, and kernels to copy device pointers to the array pointers. General form is:
__device__ {dtypestr} *{name};
__global__ void set_array_{name}({dtypestr} *_{name})
{
{name} = _{name};
}
Stores the kernel function names in attribute symbol_upload_funcnames (dict with keys being symbol names).
Returns a string with declarations and kernels combined.
The list of symbol names managed.
Code object for GPU.
For the user, works as the same as any other Code object. Behind the scenes, source code is passed to the GPUManager gpu_man from the GPULanguage object, via GPUManager.add_kernel(). Compilation is handled by GPUManager.prepare(), and running code by GPUManager.run().
Simply calls GPUManager.prepare().
Simply runs the kernel via GPUManager.run().
Language object for GPU.
Has an attribute gpu_man, the GPUManager object responsible for allocating, copying memory, etc. One is created if you do not specify one.
Utility class for defining numerical integration scheme
Initialise with a set of equations eqs. You can now iterate over this object in two ways, firstly over all the differential equations:
for var, expr in eqscontainer:
yield f(expr)
Or over just the differential equations with nonzero expressions (i.e. not including dx/dt=0 for parameters):
for var, expr in eqscontainer.nonzero:
yield f(expr)
Here var is the name of the symbol, and expr is a string, the right hand side of the differential equation dvar/dt=expr.
Also has attributes:
Return an integration step from a method and a set of equations.
The method should be a function method(eqs) which receives a EquationsContainer object as its argument, and yield s statements. For example, the euler() integration step is defined as:
def euler(eqs):
for var, expr in eqs.nonzero:
yield '_temp_{var} := {expr}'.format(var=var, expr=expr)
for var, expr in eqs.nonzero:
yield '{var} += _temp_{var}*dt'.format(var=var, expr=expr)
Euler integration
2nd order Runge-Kutta integration
Exponential-Euler integration
Base class for languages, each should provide a name attribute, and a method code_object().
Python language.
alias of PythonCode
Gives C/C++ format code for the integration step of a differential equation.
Returns a triple (code, vars, params):
Resolves symbols in item in the optimal order.
The first stage of this algorithm is to construct a dependency graph on the symbols.
The optimal order is resolve loops as late as possible. We actually construct the inverse of the resolution order, which is the intuitive order (i.e. if the first thing we do is loop over a variable, then that variable is the last symbol we resolve).
We start by finding the set of symbols which have no dependencies. The graph is acyclic so this always possible. Then, among those candidates, if possible we choose loopless symbols first (this corresponds to executing loops as late as possible). With this symbol removed from the graph we repeat until all symbols are placed in order.
We then resolve in reverse order (because we start with the inner loop and add code outwards). At the beginning of this stage, vectorisable is set to True. But after we encounter the first multi-valued symbol we set vectorisable to False (we can only vectorise one loop, and it has to be the innermost one). This vectorisation is used by both Python and GPU but not C++. Each resolution step calls CodeItem.resolve() on the output of the previous stage.
Just a base class, supposed to indicate single line statements.
A language-specific single line of code, which should only be used in the resolution step by a Symbol which knows the language it is resolving to. The string code and the set of dependencies and resolved have to be given explicitly.
Define a variable from an array and an index in C.
For example:
double &V = __arr_V[neuron_index];
Initialisation arguments are:
A single line mathematical statement.
The structure is var op expr.
If op==':=' then this statement will resolve var, otherwise it will add a Write dependency for var. The other dependencies come from expr.
When converting to a code string, the following takes place:
Generate a list of statements from a user-defined string.
The rule for definition inference is that you scan through the lines, and a set of already defined symbols is maintained (starting from eqs and defined if provided), and an = op is changed to := if the name on the LHS is not already in the set of symbols already defined.
Gives the C language specifier for numpy data types. For example, numpy.int32 maps to int32_t in C.
Perhaps this method is given somewhere in numpy, but I couldn’t find it.
State updater using code generation, supports Python, C++, GPU.
Initialised with:
Creates a Block from the equations and the method, gets a set of Symbol objects from get_neuron_group_symbols(), and defines the symbol _neuron_index as a SliceIndex. Then calls CodeItem.generate() to get the Code object.
Inserts t and dt into the namespace, and _num_neurons and _num_gpu_indices in case they are needed.
Base class for all symbols.
Every symbol has attributes name and language which should be a string and Language object respectively. The symbol class should define some or all of the methods below.
Returns the set of dependencies of this symbol, can be overridden.
Called by resolve(), can be overridden to perform more complicated loading code. By default, returns an empty Block.
Should return True if this symbol is considered to have multiple values, for example if you are iterating over an array like so:
for(int i=0; i<n; i++)
{
double &x = arr[i];
...
}
Here the symbol x is single-valued and depends on the symbol i which is multi-valued and whose resolution required a loop. By default returns False unless the class has an attribute multiple_values in which case this is returned.
The string that should be used when this symbol is read, by default just the symbol name.
Should return True if the resolution of this symbol will require a loop. The resolve() function uses this to optimise the symbol resolution order.
Creates a modified item in which the symbol has been resolved.
For example, if we started from the expression:
x += 1
and we wanted to produce the following C++ code:
for(int i=0; i<n; i++)
{
double &x = __arr_x[i];
x += 1;
}
we would need to take the expression x+=1 and embed it inside a loop.
Function arguments:
The default implementation first calls update_namespace(), then creates a new Block consisting of the value returned by load(), the item, and the value returned by save(). Finally, this symbol’s name is added to the resolved set for this block.
Called by resolve(), can be overridden to perform more complicated saving code. By default, returns an empty Block.
Returns True if the language specified at initialisation is supported. By default, checks if the language name is in the class attribute supported_languages (list), however can be overridden.
Called by resolve(), can be overridden to modify the namespace, e.g. adding data.
The string that should be used when this symbol is written, by default just the symbol name.
This Symbol is guaranteed by the context to be inserted into the namespace at runtime and can be used without modification to the name, for example t or dt.
Returns True.
This symbol is used to specify a value taken from an array.
Schematically: name = arr[index].
Introduces a read-dependency on index and array_name.
Read-dependency on index.
Method generated by language_invariant_symbol_method().
Languages and methods follow:
Uses CDefineFromArray.
If read is false, does nothing. Otherwise, returns a CodeStatement of the form:
name = array_name[index]
Adds pair (array_name, arr) to namespace.
Method generated by language_invariant_symbol_method().
Languages and methods follow:
Returns array_name[index].
Symbol for a state variable.
Wraps ArraySymbol.
Arguments:
Multi-valued symbol that ranges over a slice.
Schematically: name = slice(start, end)
Returns True except for Python.
Method generated by language_invariant_symbol_method().
Languages and methods follow:
Returns item embedded in a C for loop.
If not vectorisable return resolve_c(). If vectorisable we mark it by adding _gpu_vector_index = name and _gpu_vector_slice = (start, end) to the namespace. The GPU code will handle this later on.
If vectorisable and all then we simply return item and add name=slice(None) to the namespace.
If vectorisable and not all then we prepend the following statement to item:
name = slice(start, end)
If not vectorisable then we add a for loop over xrange(start, end).
Multi-valued symbol giving an index that iterates through an array.
Schematically: name = array_name[array_slice]
Dependencies are collected from those arguments that are used (item, array_name, array_len, array_slice).
Returns True except for Python.
Method generated by language_invariant_symbol_method().
Languages and methods follow:
Returns a C++ for loop of the form:
for(int index_name=start; index_name<end; index_name++)
{
const int name = array_name[index_name];
...
}
If defined (start, end)=array_slice otherwise (start, end)=(0, array_len).
If not vectorisable, use resolve_c(). If vectorisable, we set the following in the namespace:
_gpu_vector_index = index_name
_gpu_vector_slice = (start, end)
Where start and end are as in resolve_c(). This marks that we want to vectorise over this index, and the GPU code will handle this later. Finally, we prepend the item with:
const int name = array_name[index_name];
If vectorisable it will prepend one of these two forms to item:
name = array_name
name = array_name[start:end]
(where (start, end) = array_slice if provided).
If not vectorisable, it will return a for loop over either array_name or array_name[start:end]`.
Helper function to create methods for Symbol classes.
Sometimes it is clearer to write a separate method for each language the Symbol supports. This function can generate a method that can take any language, and calls the desired method. For example, if you had defined two methods load_python and load_c then you would define the load method as follows:
load = language_invariant_symbol_method('load',
{'python':load_python, 'c':load_c})
The fallback gives a method to call if no language-specific method was found. A docstring can be provided to doc.
Returns a dict of NeuronGroupStateVariable from a group.
Arguments: