C++ Examples
There are many ways to implement a given calculation. This document contains examples of how to apply Enzyme to different kinds of patterns that show up frequently in C++. To make the code snippets below more concise, we’ll assume that each example snippet has the following definitions prepended to the translation unit:
int enzyme_dup;
int enzyme_dupnoneed;
int enzyme_out;
int enzyme_const;
template < typename return_type, typename ... T >
return_type __enzyme_fwddiff(void*, T ... );
template < typename return_type, typename ... T >
return_type __enzyme_autodiff(void*, T ... );
Free Functions: Return-By-Value ¶
Consider a function with one scalar input and one scalar output, \(f(x) \coloneqq x^2.\) We can implement this in C++ as a free function:
double f(double x) { return x * x; }
The derivative of this function is simply \(f'(x) = 2x\), but rather than implement that by hand, let’s see how do it with automatically with Enzyme:
double x = 5.0;
double dx = 1.0;
double df_dx = __enzyme_fwddiff<double>((void*)f, enzyme_dup, x, dx);
printf("f(x) = %f, f'(x) = %f", f(x), df_dx);
// prints f(x) = 25.000000, f'(x) = 10.000000
The first argument tells enzyme which function we want to differentiate, and the subsequent arguments describe where to evaluate the derivatve and in what “direction”.
Link to example: https://fwd.gymni.ch/Hx6hwt
Next, we’ll look at a function with two inputs and one output, $$ f(x, y) \coloneqq x \, y + \frac{1}{y} $$
The “derivative” (i.e. the Jacobian) of this function is a row vector with two entries: $$ \bold{J} = \left[ \begin{array}{cc} \displaystyle \frac{\partial f}{\partial x} & \displaystyle \frac{\partial f}{\partial y} \end{array} \right] $$ There are two fundamental operations that involve the Jacobian:
- “forward mode” differentiation: compute \(df := \bold{J} \cdot d\bold{x}\), (where \(d\bold{x} = [dx \,\, dy]^\top\))
- “reverse mode” differentiation: compute \(\boldsymbol{\mu} := \bold{J}^\top \lambda\)
There’s lots more to say here about when to use forward mode or reverse mode, but that’s beyond the scope of this document, so we’ll just focus on how to use Enzyme to perform these two kinds of differentiation for us.
Like before, start by implementing the original function
double f(double x, double y) { return x * y + 1.0 / y; }
Forward Mode ¶
__enzyme_fwddiff
implements forward-mode differentiation, and we use it by specifying where to evaluate \(\bold{J}\), and what values to use for \(d\bold{x}\)
double x = 3.0;
double y = 2.0;
double dx = 1.0;
double dy = 1.0;
double df = __enzyme_fwddiff<double>((void*)f, enzyme_dup, x, dx, enzyme_dup, y, dy);
printf("f(x) = %f, df = %f", f(x, y), df)
// prints f(x) = 6.500000, df = 4.750000
But what if I only want to (forward) differentiate with respect to some of the inputs?
There are two approaches:
-
Set
dx
ordy
to zeroes as needed for unwanted derivativesdouble dfdx = __enzyme_fwddiff<double>((void*)f, enzyme_dup, x, 1.0, enzyme_dup, y, 0.0); double dfdy = __enzyme_fwddiff<double>((void*)f, enzyme_dup, x, 0.0, enzyme_dup, y, 1.0); printf("dfdx = %f, dfdy = %f\n", dfdx, dfdy); // prints dfdx = 2.000000, dfdy = 2.750000
-
Specify
enzyme_const
to indicate which arguments are not to be differentiateddouble dfdx = __enzyme_fwddiff<double>((void*)f, enzyme_dup, x, 1.0, enzyme_const, y); double dfdy = __enzyme_fwddiff<double>((void*)f, enzyme_const, x, enzyme_dup, y, 1.0); printf("dfdx = %f, dfdy = %f\n", dfdx, dfdy); // prints dfdx = 2.000000, dfdy = 2.750000
Option 1 has the benefit of flexibility– we can choose to turn differentiation with respect to certain variables on or off at runtime. Option 2 hard codes which derivatives can be computed, which narrows scope and potentially improves performance. The appropriate choice will depend on the specific needs of your project.
Link to example: https://fwd.gymni.ch/OJMdAx
Reverse Mode ¶
__enzyme_autodiff
implements reverse-mode differentiation. We tell enzyme which function to differentiate, and pass information about where to evaluate the Jacobian. The enzyme_out
specifier indicates which components (i.e. rows of the vector-jacobian product \(\bold{J}^\top \lambda\)) we want
struct double2{ double x, y; };
...
double x = 3.0;
double y = 2.0;
auto [mu_x, mu_y] = __enzyme_autodiff<double2>((void*)f, enzyme_out, x, enzyme_out, y);
printf("mu_x = %f, mu_y = %f\n", mu_x, mu_y);
// prints mu_x = 2.000000, mu_y = 2.750000
The value returned by __enzyme_autodiff
is the concatenation of the different enzyme_out
quantities. Here, we define a struct, double2
to represent those output values (and use C++17 structured binding to split them into mu_x, mu_y
).
Note: it may be tempting to store the outputs in a
std::tuple< double, double >
, but the memory layout ofstd::tuple
is implementation defined (e.g. some compilers implementstd::tuple<T, U, V>
withV
first,U
second, andT
last). So, please don’t store the outputs of__enzyme_autodiff
in astd::tuple
!
Link to example: https://fwd.gymni.ch/HMFTsY
Free Functions: Return-By-Reference ¶
Another possible way to implement the previous function is with an out-parameter:
void f(double x, double y, double & output) { output = x * y + 1.0 / y; }
Differentiating functions like this with enzyme is similar to the return-by-value case, but with some small differences.
If we only care about the derivative output, and not the function value itself, we can use the enzyme_dupnoneed
descriptor. This lets the compiler optimize away unnecessary calculations associated with evaluating the output.
Forward Mode ¶
// these will be overwritten by __enzyme_fwddiff
double z = 0;
double dz = 0;
#if 1
__enzyme_fwddiff<void>((void*)f, enzyme_dup, x, dx,
enzyme_dup, y, dy,
enzyme_dup, &z, &dz);
printf("f(x,y) = %f, df = %f\n", z, dz);
// prints f(x,y) = 6.500000, df = 7.500000
#else
__enzyme_fwddiff<void>((void*)f, enzyme_dup, x, dx,
enzyme_dup, y, dy,
enzyme_dupnoneed, &z, &dz);
printf("f(x,y) = %f, df = %f\n", z, dz);
// prints f(x,y) = 0.000000, df = 7.500000
#endif
Note: the by-reference arguments of the function are passed to __enzyme_fwddiff
by address.
Link to example: https://fwd.gymni.ch/BtpzAA
Reverse Mode ¶
vector-Jacobian product \(\boldsymbol{\mu} := \bold{J}^\top \lambda\) is implemented as
// lambda is an input to __enzyme_autodiff
double lambda = 2.0;
#if 1
double2 mu = __enzyme_autodiff<double2>((void*)f, enzyme_out, x,
enzyme_out, y,
enzyme_dup, &z, &lambda);
printf("z = %f, mu.x = %f, mu.y = %f\n", z, mu.x, mu.y);
// prints z = 6.500000, mu.x = 4.000000, mu.y = 5.500000
#else
double2 mu = __enzyme_autodiff<double2>((void*)f, enzyme_out, x,
enzyme_out, y,
enzyme_dupnoneed, &z, &lambda);
printf("z = %f, mu.x = %f, mu.y = %f\n", z, mu.x, mu.y);
// prints z = 0.000000, mu.x = 4.000000, mu.y = 5.500000
#endif
Link to example: https://fwd.gymni.ch/eYmIzw
Function Templates ¶
Function templates are treated much the same way as regular functions, except we need to explicitly include the template arguments when passing the function to enzyme. For example, if we had the following definition:
template < typename T >
void f(T x, T y, T & output) { output = x * y + 1.0 / y; }
Then the first argument looks like
__enzyme_fwddiff<void>((void*)f<double>, enzyme_dup, x, dx,
enzyme_dup, y, dy,
enzyme_dupnoneed, &z, &dz);
Link to example: https://fwd.gymni.ch/WJVVRt
Member Functions ¶
Differentiating member functions with Enzyme is a little bit trickier, since a member
function in C++ is effectively a function with an implicitly passed argument (the this
pointer).
This means that if we have an object with a member function
struct MyObject {
double f(double y) { return x * y + 1.0 / y; }
double x;
};
we can’t pass &MyObject::f
directly to __enzyme_fwddiff(...)
. Instead, what
we can do is write a free function that calls the desired member function:
double wrapper(MyObject obj, double y) {
return obj.f(y);
}
and pass that to enzyme:
double dfdy = __enzyme_fwddiff<double>((void*)wrapper, enzyme_const, obj,
enzyme_dup, y, dy);
printf("dfdy = %f\n", dfdy);
// prints dfdy = 5.500000
Note: if the member function belongs to a class or struct with no member variables, it may be optimized away at the call site, which disrupts the argument ordering and leads to an error. See: https://github.com/EnzymeAD/Enzyme/issues/1388#issuecomment-1728114457 for more information on a workaround.
A more general implementation of the wrapper function (that works with different objects and argument types) is given below:
template < typename T, typename ... arg_types >
auto wrapper(T obj, arg_types && ... args) {
return obj.f(args ... );
}
...
double dfdy = __enzyme_fwddiff<double>((void*)wrapper<MyObject, double>,
enzyme_const, obj,
enzyme_dup, &y, &dy);
printf("dfdy = %f\n", dfdy);
// prints dfdy = 5.500000
Question: why do
y
anddy
now have&
in front when being passed to__enzyme_fwddiff
?
When passing information to __enzyme_autodiff
, __enzyme_fwddiff
:
- if the differentiated function takes an argument by value, then we pass it to enzyme by value
- if the differentiated function takes an argument by pointer, reference or rvalue-ref, then we pass it to enzyme by pointer
Link to example: https://fwd.gymni.ch/y0scib
If we want to differentiate with respect to the data members of the object (x
in this case), we can annotate the object argument as enzyme_dup
:
MyObject dobj{2.0};
double dfdx = __enzyme_fwddiff<double>((void*)wrapper<MyObject, double>,
enzyme_dup, obj, dobj
enzyme_const, &y);
printf("dfdx = %f\n", dfdx);
Link to example: https://fwd.gymni.ch/wlC87F
Functors and Lambda Functions ¶
Functors are C++ objects that implement an operator()
member, so they can be invoked like functions. Our previous example could have been implemented in the following way instead
struct MyObject {
double operator()(double y) const { return x * y + 1.0 / y; }
double x;
};
double wrapper(const MyObject & f, double y) { return f(y); }
...
double dfdy = __enzyme_fwddiff<double>((void*)(wrapper<MyObject, double>),
enzyme_const, (void*)&obj,
enzyme_dup, &y, &dy);
printf("dfdy = %f\n", dfdy);
// prints dfdy = 0.750000
We can see that it is handled in the same way as regular member functions (passed to enzyme through a wrapper class).
Lambda functions are another common C++ idiom for expressing function definitions in-line:
auto f = [](double x, double y) { return x * y + 1.0 / y; };
Behind the scenes, the compiler expands the definitions of the lambdas above into functor objects
// what the compiler "sees" when we type
// auto f = [](double x, double y) { return x * y + 1.0 / y; };
class __lambda_f {
public:
inline /*constexpr */ double operator()(double x, double y) const {
return (x * y) + (1.0 / y);
}
using retType_f = double (*)(double, double);
inline constexpr operator retType_f () const noexcept {
return __invoke;
};
private:
static inline /*constexpr */ double __invoke(double x, double y) {
return __lambda_f{}.operator()(x, y);
}
};
__lambda_f f = __lambda_f{};
This means there are a few ways to differentiate this kind of lambda function:
-
Passing directly to Enzyme
double dfdx = __enzyme_fwddiff<double>((void*)+f, enzyme_dup, x, dx, enzyme_const, y); printf("dfdx = %f\n", dfdx); // dfdx = 6.200000
Note: since
f
doesn’t capture anything, it can implicitly convert to a function pointer, which Enzyme can handle directly. The weird+f
in the first argument is a rarely-used unary operator that makesf
convert to a function pointer. -
Using the member function wrapper from above
double dfdy = __enzyme_fwddiff<double>((void*)(wrapper<decltype(f), double, double>), enzyme_const, (void*)&f, enzyme_const, &x, enzyme_dup, &y, &dy); printf("dfdy = %f\n", dfdy); // dfdy = 0.750000
Note: the lambda is explicitly cast to
(void*)
to suppress a compilation error. This error arises because, regularly, C++ has no way to specialize anextern
template like__enzyme_fwddiff(...)
using a type in a local scope (i.e. the lambda function), but Enzyme does not have this limitation. -
Using a similar wrapper with non-type template parameter (requires C++20)
// C++20 or later template < auto obj, typename ... arg_types > auto wrapper(arg_types && ... args) { return obj(args ... ); } ... double dfdy = __enzyme_fwddiff<double>((void*)(wrapper<f, double, double>), enzyme_const, &x, enzyme_dup, &y, &dy); printf("dfdy = %f\n", dfdy); // dfdy = 0.750000
Link to examples: https://fwd.gymni.ch/wkgeoL
Instead, if a lambda function captures variables from its surrounding scope
double x = 2.0;
auto f = [x](double y) { return x * y + 1.0 / y; };
then the compiler-generated functor object for the lambda expression is slightly different:
// what the compiler "sees" when we type
// double x = 2.0;
// auto f = [x](double y) { return x * y + 1.0 / y; };
class __lambda_f {
public:
inline /*constexpr */ double operator()(double y) const {
return (x * y) + (1.0 / y);
}
__lambda_f(double & _x) : x{_x}{}
private:
double x;
};
double x = 2.0;
__lambda_f f = __lambda_f{x};
An important difference is that by capturing, the lambda no longer generates the static __invoke()
method or the implicit conversion to a function pointer, which means option #1 above (passing directly to Enzyme) will not work.