Wednesday, November 26, 2014

The Python API and C++

Recently, for a job interview, I was asked to write a Python module with a C or C++ implementation to solve an otherwise simple task. Obviously, I chose C++. While I had never used the Python API before, I found the existing information on extending Python with C quite sufficient. What surprised me, however, was how little information existed for using C++. A few libraries exist, like Boost.Python, PyCXX, and some utilities that parse C++ to create Python bindings, but I didn't find much in the way of actual information without examining the sources of these libraries.

I will not discuss at length why someone would want to implement a Python module in another language (efficiency? better library support for certain tasks? preference?), but why C++? The Python API has basically no type safety--everything is a PyObject *, whether it represents a string, number, or tuple. It requires a considerable amount of boilerplate--something we can reduce by using the C++ type system. And it presents some interesting technical challenges, which are what I will focus on. I will assume some knowledge of the Python API.

Note: I will be basing this on Python 2.7. Yes, Python 3 is newer, but due to incompatibilities it is not a drop-in replacement, and it is also not my system default. Also, I have little experience with the Python API, so do not take this article as authoritative; it represents a journal of my experiments.

I have started working on a little utility library (https://github.com/splinterofchaos/py-cxx) for personal use, but for a listing of the code for this article, see the gist: https://gist.github.com/splinterofchaos/b099149a701edfa5948f


Writing a Python Module: The Basics

First, we will want to create a Python module, which alone is rather uninteresting. For a more in-depth study, one should refer to https://docs.python.org/2/extending/extending.html.

Every module requires an init function which tells the interpreter what functions, types, and objects the module offers. For now, let's consider a module that counts how many times a certain function gets called.
#include <Python.h>

PyObject *count(PyObject *self, PyObject *args)
{
  static int i = 0;
  PySys_WriteStdout("%i\n", ++i);  // Just like printf.
  return PyInt_FromLong(i);
}

static PyMethodDef countMethods[] = {
  {"count", count, METH_VARARGS, "Returns the number of times called."},
  {NULL, NULL, 0, NULL}
};

PyMODINIT_FUNC initcount()
{
  PyObject *m = Py_InitModule("count", countMethods);
}
See setup.py for building this example.

Here, countMethods contains the defined functions in a {name, c-function, function-type, __doc__-string} structure. count must be a PyCFunction, a function taking self (probably null) and args (argument tuple) parameters and returning an object. METH_VARARGS lets the interpreter know this is a regular function--other types of functions do exist, but more on that later.

The PyMODINIT_FUNC macro tells Python that, obviously, this function initializes the module. Note that even Py_InitModule() returns a regular Python object!
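For instance (a sketch, not part of the module as written), since the module is just another object, the init function could hang an attribute off of it:
PyMODINIT_FUNC initcount()
{
  PyObject *m = Py_InitModule("count", countMethods);
  if (!m)
    return;

  // The same as writing count.version = 1 from Python.
  PyObject *one = PyInt_FromLong(1);
  PyObject_SetAttrString(m, "version", one);
  Py_DECREF(one);
}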

There are several improvements we can make. The more interesting one, a helper for defining methods, comes in a moment, but first a mundane one: an overloaded function, to_py_int(), that dispatches between PyInt_FromLong(), PyInt_FromSsize_t(), and friends so that callers never pick the wrong conversion.
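A minimal sketch of what that overload set might look like (hypothetical; Py_ssize_t is left out since it can be the same type as long on some platforms):
// Pick the right conversion by overload instead of by function name.
inline PyObject *to_py_int(long x)          { return PyInt_FromLong(x); }
inline PyObject *to_py_int(unsigned long x) { return PyLong_FromUnsignedLong(x); }
inline PyObject *to_py_int(long long x)     { return PyLong_FromLongLong(x); }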

On to the method helper. Aside from METH_VARARGS, a function can be registered as METH_KEYWORDS, in which case it takes a third parameter, a dictionary of keyword arguments, and so is not a PyCFunction; as METH_NOARGS, which must still be a PyCFunction and receives self, but always NULL for args; or as METH_O, which is also called with two arguments, but receives the single argument itself rather than an argument tuple. It may be convenient to write a function that takes a pointer to a specific type instead of the generic PyObject, but by casting we lose certain safety guarantees, and it becomes easy to do something stupid, like writing a function with the wrong number of arguments or registering it under the wrong METH_* variant.
#include <type_traits>
template<typename R, typename...X>
constexpr int arity(R(*)(X...)) {
  return sizeof...(X);
}

template<typename R, typename...X>
constexpr bool returns_PyObject(R(*)(X...)) {
  // Result is either a PyObject, or a subclass of one.
  return std::is_convertible<R, PyObject *>::value;
}

template<typename R, typename...X>
constexpr bool is_PyCFunction(R(*)(X...)) {
  return false;
}

template<>
constexpr bool is_PyCFunction(PyCFunction) {
  return true;
}

template<typename F>
constexpr int method_type(F f) {
  return arity(f) == 3     ? METH_KEYWORDS
       : is_PyCFunction(f) ? METH_VARARGS
                           : METH_O;
}

template<typename F>
constexpr PyMethodDef method_def(const char *name, const char *doc,
                                 int type, F f)
{
  static_assert(arity(F()) == 2 || arity(F()) == 3,
                "Methods must have an arity of 2 or 3");
  static_assert(returns_PyObject(F()), "Methods must return a PyObject *.");
  return {name, (PyCFunction)f, type, doc};
}

template<typename F>
constexpr PyMethodDef method_def(const char *name, const char *doc, F f)
{
  return method_def(name, doc, method_type(f), f);
}

static PyMethodDef countMethods[] = {
  method_def("count", "Returns the number of times called.", count),
  {NULL, NULL, 0, NULL}
};
Note that in order to use the static_asserts, we construct an F instead of passing f, because f, being a function parameter, is not a constant expression.

Now we can declare methods in a type-safe manner without having to spell out the METH_* flag or losing any safety. While this may be a little limiting (for example, we can't use a lambda to define the method), one can always fall back to writing a plain PyMethodDef entry.
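As an example of the deduction at work, a keyword-accepting function needs no flags spelled out at all (repeat is hypothetical and not part of the count module):
#include <string>

// Three parameters, so method_type() deduces METH_KEYWORDS and method_def()
// performs the PyCFunction cast.
PyObject *repeat(PyObject *self, PyObject *args, PyObject *kwds)
{
  const char *word;
  int times = 1;
  static char *kwlist[] = { (char *)"word", (char *)"times", nullptr };
  if (!PyArg_ParseTupleAndKeywords(args, kwds, "s|i", kwlist, &word, &times))
    return nullptr;

  std::string result;
  for (int i = 0; i < times; ++i)
    result += word;
  return PyString_FromString(result.c_str());
}

static PyMethodDef repeatMethods[] = {
  method_def("repeat", "Repeats a word some number of times.", repeat),
  {NULL, NULL, 0, NULL}
};
From Python, this should permit calls like repeat("ha", times=3).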

Note: It may be safe to define a function that takes no arguments and cast it to a PyCFunction; however, I don't know whether that holds across all architectures and ABI calling conventions.
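The portable alternative is to accept both parameters and simply ignore the second, which the interpreter always passes as NULL (pi is a hypothetical example):
// A METH_NOARGS function keeps the PyCFunction signature; args is always NULL.
PyObject *pi(PyObject *self, PyObject *args)
{
  return PyFloat_FromDouble(3.141592653589793);
}
Since this signature is indistinguishable from a METH_VARARGS function, method_type() cannot deduce METH_NOARGS, and such a function would have to be registered with the flag written by hand.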

One thing lacking from the count example is actually using the args parameter. For that, we will need PyArg_ParseTuple().


A Type-Safe PyArg_ParseTuple()

Let's use the example of finding the cross product of two vectors.
#include <Python.h>

#include "Py.h"  // includes method_def()

PyObject *cross(PyObject *self, PyObject *args)
{
  float a, b, c;
  float x, y, z;

  if (!PyArg_ParseTuple(args, "(fff)(fff)", &a, &b, &c, &x, &y, &z))
    return nullptr;

  float i = b*z - c*y;
  float j = c*x - a*z;
  float k = a*y - b*x;

  return Py_BuildValue("fff", i, j, k);
}

static PyMethodDef vecMethods[] = {
  method_def("cross", "Returns the cross product of two 3D vectors.", cross),
  {NULL, NULL, 0, NULL}
};

PyMODINIT_FUNC initvec()
{
  PyObject *m = Py_InitModule("vec", vecMethods);
}
This lets us write, in Python, cross((a,b,c), (x,y,z)). Even simple functions like this benefit from a statically typed implementation: in Python, every operation on a variable requires a runtime type check, lest you try to add a string to an integer. Here, we perform nine arithmetic operations but only check the types of the six initial arguments.

PyArg_ParseTuple() is really quite simple; you pass in args, a format string (in this case using f for float), and pointers to the variables you want filled. If the tuple doesn't fit the expected format, it sets an error, so we can just return NULL. We do our calculation and call Py_BuildValue(), which creates a tuple when given more than one value. Unfortunately, this approach is verbose and not type-safe. We can fix that, but first we must build a format string to pass in, preferably at compile time.

First, we can use, for convenience, a typedef of std::integer_sequence to build a list of chars.
template<char...cs>
using CharList = std::integer_sequence<char, cs...>;
Then, define mappings for PyArg_ParseTuple.
template<typename...T>
struct CharListConcat;

template<typename T>
struct CharListConcat<T> {
  using type = T;
};

template<typename...U, char...cs, char...cs2>
struct CharListConcat<CharList<cs...>, CharList<cs2...>, U...> {
  using type = typename CharListConcat<CharList<cs..., cs2...>, U...>::type;
};

template<typename...T>
using CharListConcat_t = typename CharListConcat<T...>::type;

template<typename T>
struct PTCharListOf;  // Maps a C++ type to its PyArg_ParseTuple() format characters.

template<> struct PTCharListOf<float> {
  using type = CharList<'f'>;
};

template<typename...Ts>
struct PTCharListOf<std::tuple<Ts...>> {
  using type = CharListConcat_t<CharList<'('>,
                                typename PTCharListOf<std::decay_t<Ts>>::type...,
                                CharList<')'>>;
};

template<typename T>
using PTCharListOf_t = typename PTCharListOf<T>::type;
Unfortunately, this strategy is a bit limited--we couldn't pass in an std::vector to get the desired effect because we wouldn't know at compile time how many elements go into it. A better option would be to add a PyObject * specialization for PTCharListOf and manually check that the result is a list.

template<> struct PTCharListOf<PyObject *> {
  using type = CharList<'O'>;
};
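As an aside, here is roughly what that manual check might look like in an ordinary function (a sketch; sum_list is hypothetical and uses the raw API, since our ParseTuple() isn't finished yet):
PyObject *sum_list(PyObject *self, PyObject *args)
{
  PyObject *list;
  if (!PyArg_ParseTuple(args, "O", &list))
    return nullptr;

  if (!PyList_Check(list)) {
    PyErr_SetString(PyExc_TypeError, "expected a list");
    return nullptr;
  }

  double total = 0;
  for (Py_ssize_t i = 0; i < PyList_GET_SIZE(list); ++i)
    total += PyFloat_AsDouble(PyList_GET_ITEM(list, i));

  // PyFloat_AsDouble() sets an error on non-numeric elements.
  return PyErr_Occurred() ? nullptr : PyFloat_FromDouble(total);
}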
Next, we define a type to build the format:

template<typename...Ts>
struct ParseTupleBuilder { };

template<typename CL, typename T, typename...Ts>
struct ParseTupleBuilder<CL, T, Ts...> {
  using type = ParseTupleBuilder<CharListConcat_t<CL, PTCharListOf_t<T>>,
                                 Ts...>;
  constexpr static const char *fmt = type::fmt;
};

template<char...cs>
struct ParseTupleBuilder<CharList<cs...>> {
  using type = CharList<cs...>;

  static const char fmt[sizeof...(cs) + 1];
};

template<char...cs>
const char ParseTupleBuilder<CharList<cs...>>::fmt[] = { cs..., '\0' };

template<typename...Ts>
constexpr const char *ParseTupleFormat(Ts...) {
  return ParseTupleBuilder<CharList<>, std::decay_t<Ts>...>::fmt;
}
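As a quick sanity check of the machinery so far (a usage sketch; assumes <tuple>, <cassert>, and <cstring> are also included):
void format_example()
{
  float a, b, c, x, y, z;
  PyObject *obj = nullptr;

  // Two tied 3-float tuples reproduce the hand-written "(fff)(fff)" from the
  // earlier cross() example; a PyObject * maps to 'O'.
  assert(std::strcmp(ParseTupleFormat(std::tie(a, b, c), std::tie(x, y, z)),
                     "(fff)(fff)") == 0);
  assert(std::strcmp(ParseTupleFormat(obj, a), "Of") == 0);
}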
One interesting thing: when I initialized fmt inside ParseTupleBuilder, I got an error from inside Python on typing "import vec" claiming that fmt's constructor had not been run. The Python docs warn that C++ extensions cannot rely on static objects with constructors if Python itself was built with a C compiler; defining fmt outside the struct seems to fix this.

Finally, we can start defining ParseTuple(). The strategy I chose was to build an std::tuple of arguments to send to PyArg_ParseTuple() and examine each argument in a helper function. This will require two helpers, defined below, apply_tuple() and map_tuple().
template<typename F, typename T, size_t...Is>
decltype(auto) apply_tuple(F&& f, T&& t, std::index_sequence<Is...>) {
  return std::forward<F>(f)(std::get<Is>(std::forward<T>(t))...);
}

template<typename F, typename T, size_t...Is>
decltype(auto) map_tuple(F&& f, T&& t, std::index_sequence<Is...>) {
  return std::make_tuple(std::forward<F>(f)(std::get<Is>(std::forward<T>(t)))...);
}

template<typename F, typename...Ts,
         typename Is = std::make_index_sequence<sizeof...(Ts)>>
decltype(auto) map_tuple(F&& f, std::tuple<Ts...> &t) {
  return map_tuple(std::forward<F>(f), t, Is());
}

template<typename...Bound,
         typename Indicies = std::make_index_sequence<sizeof...(Bound)>>
bool ParseTuple_impl(std::tuple<Bound...> &&bound) {
  return apply_tuple(PyArg_ParseTuple, bound, Indicies());
}

template<typename...Bound, typename Arg, typename...Args>
bool ParseTuple_impl(std::tuple<Bound...> &&bound, Arg &a, Args &...as) {
  return ParseTuple_impl(std::tuple_cat(std::move(bound), std::make_tuple(&a)),
                          as...);
}

template<typename...Bound, typename...Args>
bool ParseTuple_impl(std::tuple<Bound...> &&bound, Optional, Args &...as) {
  return ParseTuple_impl(std::move(bound), as...);
}

template<typename...Bound, typename...Ts, typename...Args>
bool ParseTuple_impl(std::tuple<Bound...> &&bound, std::tuple<Ts &...> &t,
                     Args &...as) {
  auto &&mapped = map_tuple([](auto &x) { return &x; }, t);
  return ParseTuple_impl(std::tuple_cat(bound, std::move(mapped)),
                         as...);
}

template<typename...Args>
bool ParseTuple(PyObject *args, Args &&...as) {
  return ParseTuple_impl(std::make_tuple(args, ParseTupleFormat(as...)),
                          as...);
}
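A word on the Optional overloads above: the listing never defines Optional, but presumably it is a small tag type from Py.h that contributes the '|' marker (the start of optional arguments) to the format string while adding no pointer of its own. A minimal guess at its definition:
// Assumption: marks where optional arguments begin, like '|' in a format string.
struct Optional { };

template<> struct PTCharListOf<Optional> {
  using type = CharList<'|'>;
};
With that, ParseTuple(args, x, Optional(), y) would produce the format "f|f" (for floats) and pass only &x and &y on to PyArg_ParseTuple().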
Before getting back to our cross product function, we will also want a BuildValue() function. Please excuse the repetitive nature of this code.
template<typename...Bound,
         typename Indicies = std::make_index_sequence<sizeof...(Bound)>>
PyObject *BuildValue_impl(std::tuple<Bound...> &&bound) {
  return apply_tuple(Py_BuildValue, bound, Indicies());
}

template<typename...Bound, typename Arg, typename...Args>
PyObject *BuildValue_impl(std::tuple<Bound...> &&bound, Arg a, Args ...as) {
  return BuildValue_impl(std::tuple_cat(std::move(bound), std::make_tuple(a)),
                         as...);
}

template<typename...Bound, typename...Args>
PyObject *BuildValue_impl(std::tuple<Bound...> &&bound, Optional, Args &...as) {
  return BuildValue_impl(std::move(bound), as...);
}

template<typename...Bound, typename...Ts, typename...Args>
PyObject *BuildValue_impl(std::tuple<Bound...> &&bound, std::tuple<Ts...> &t,
                          Args &...as) {
  return BuildValue_impl(std::tuple_cat(bound, std::move(t)), as...);
}

template<typename...Args>
PyObject *BuildValue(Args &...as) {
  return BuildValue_impl(std::make_tuple(ParseTupleFormat(as...)),
                          as...);
}
And finally, getting back to our cross product...
PyObject *cross(PyObject *self, PyObject *args)
{
  float a, b, c;
  float x, y, z;

  if (!ParseTuple(args, std::tie(a,b,c), std::tie(x,y,z)))
    return nullptr;

  float i = b*z - c*y;
  float j = c*x - a*z;
  float k = a*y - b*x;

  return BuildValue(i, j, k);
}
That sure was a lot of work, but it created a simple interface that's hard to use improperly.

Extending Python Types

Probably the cornerstone of extending Python itself is defining new types that interact well with the existing Python infrastructure. For efficiency's sake, the more variables we can statically type and hold in a struct, the better. The Python docs suggest extending a type this way:
typedef struct {
    PyObject_HEAD
    ...
} MyType;
The PyObject_HEAD macro supplies the fields common to every Python object, so that casting a MyType pointer to a PyObject pointer is valid. This is a common technique for representing inheritance in C; in C++, however, we can get the same effect with actual inheritance.

Also, every Python type requires an accompanying PyTypeObject, which is itself a PyObject. The PyTypeObject stores a great deal of runtime information about the type: which functions allocate it, deallocate it, and convert it to a string, its methods, its base class, and more.

We could give our extension type a constructor, but it may be wisest not to. One of the fields of the type object, tp_alloc, defines how memory for the type is allocated, including setting the reference count to one, filling in the ob_type field (a member of PyObject), and a few other things. It must work even for classes that inherit from our custom type, and it is tied closely enough to Python's internals that I think it is best left alone; it can be left as NULL in the PyTypeObject without trouble.

More interesting would be tp_new, which must point to a function that calls tp_alloc and initializes the memory, and must be defined in order to create instances of our new type. We can define tp_new to use a placement new for objects in our type that require construction.

In general, an extension of PyObject will look like this:
template<typename T>
struct Extention : PyObject
{
  static PyTypeObject type;

  T ext;

  T       &get()       & { return this->ext; }
  const T &get() const & { return this->ext; }

  T       *ptr()       & { return &this->ext; }
  const T *ptr() const & { return &this->ext; }
};
We can define a default tp_new and tp_dealloc and initialize type like so:
template<typename T,
         typename = std::enable_if_t<std::is_default_constructible<T>::value>>
newfunc default_new()
{
  return [](PyTypeObject *type, PyObject *args, PyObject *kwds)
  {
    using Self = Extention<T>;
    Self *self = (Self *) type->tp_alloc(type, 0);
    if (self)
      new (self->ptr()) T();
    return (PyObject *) self;
  };
}

template<typename T,
         typename = std::enable_if_t<!std::is_default_constructible<T>::value>>
auto default_new() {
  return [](PyTypeObject *type, PyObject *args, PyObject *kwds)
  {
    return type->tp_alloc(type, 0);
  };
}

template<typename T>
PyTypeObject Extention<T>::type = {
  PyObject_HEAD_INIT(NULL)
  0,                         // ob_size
  0,                         // tp_name
  sizeof(Extention<T>),      // tp_basicsize
  0,                         // tp_itemsize
  destructor([](PyObject *self) {          // tp_dealloc
    ((Extention *) self)->get().T::~T();
    self->ob_type->tp_free(self);
  }),
  0,                         // tp_print
  0,                         // tp_getattr
  0,                         // tp_setattr
  0,                         // tp_compare
  0,                         // tp_repr
  0,                         // tp_as_number
  0,                         // tp_as_sequence
  0,                         // tp_as_mapping
  0,                         // tp_hash 
  0,                         // tp_call
  0,                         // tp_str
  0,                         // tp_getattro
  0,                         // tp_setattro
  0,                         // tp_as_buffer
  Py_TPFLAGS_DEFAULT,        // tp_flags
  0,                         // tp_doc 
  0,                         // tp_traverse 
  0,                         // tp_clear 
  0,                         // tp_richcompare 
  0,                         // tp_weaklistoffset 
  0,                         // tp_iter 
  0,                         // tp_iternext 
  0,                         // tp_methods 
  0,                         // tp_members 
  0,                         // tp_getset 
  0,                         // tp_base 
  0,                         // tp_dict 
  0,                         // tp_descr_get 
  0,                         // tp_descr_set 
  0,                         // tp_dictoffset 
  0,                         // tp_init 
  0,                         // tp_alloc 
  default_new<T>(),          // tp_new
};
PyTypeObject does have a few more fields, but the compiler sets them to 0 for us. We do, however, have to set tp_basicsize in order for the right amount of memory to be allocated. Since a type in C++ may not be default-constructible, default_new() may return a function that does not construct the object; this must be done in tp_init.
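For a type that is not default-constructible, the construction has to happen in tp_init with a placement new. A quick sketch (Interval and init_interval are hypothetical, not part of the article's library):
#include <new>  // placement new

// A C++ type with no default constructor.
struct Interval {
  float lo, hi;
  Interval(float l, float h) : lo(l), hi(h) { }
};

using PyInterval = Extention<Interval>;

// default_new<Interval>() only calls tp_alloc, so tp_init constructs in place.
int init_interval(PyInterval *self, PyObject *args, PyObject *)
{
  float lo, hi;
  if (!ParseTuple(args, lo, hi))
    return -1;
  new (self->ptr()) Interval(lo, hi);
  return 0;
}
At module initialization, tp_init would be set just as it is for Vec below.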

Now, returning to the cross product example, consider this:
struct Vec {
  float x, y, z;
};

using PyVec = Extention<Vec>;

int init_vec(PyVec *self, PyObject *args, PyObject *)
{
  Vec &v = self->get();
  if (!ParseTuple(args, v.x, v.y, v.z))
    return -1;
  return 0;
}

PyObject *vec_str(PyVec *self)
{
  return PyString_FromString(("<"  + std::to_string(self->get().x) +
                              ", " + std::to_string(self->get().y) +
                              ", " + std::to_string(self->get().z) +
                              ">").c_str());
}

PyMODINIT_FUNC initvec()
{
  PyVec::type.tp_name = "vec.Vec";
  PyVec::type.tp_init = (initproc) init_vec;
  PyVec::type.tp_repr = PyVec::type.tp_str = (reprfunc) vec_str;
  if (PyType_Ready(&PyVec::type) < 0)
    return;

  PyObject *m = Py_InitModule("vec", vecMethods);
  if (!m)
    return;

  Py_INCREF(&PyVec::type);
  PyModule_AddObject(m, "Vec", (PyObject *) &PyVec::type);
}
Note that tp_repr is used to display the result of evaluating an expression, and tp_str is used for printing. tp_init is used to construct our value and relates to Vec.__init__() in Python. PyType_Ready() finalizes the type and fills in some of the missing tp_* fields. We add the type to the module as a global object and increment its reference count so Python doesn't try to destruct it. For brevity, I decided not to include functions to check the type safety of the initproc and reprfunc casts.

Since Vec is default constructible, we only need to worry about assigning the members in the init function.

And now, cross looks like this:
PyObject *cross(PyObject *self, PyObject *args)
{
  PyObject *o1, *o2;
  if (!ParseTuple(args, o1, o2))
    return nullptr;

  // Ensure o1 and o2 are the right types; otherwise raise a TypeError so we
  // don't return NULL without an exception set.
  if (!PyType_IsSubtype(o1->ob_type, &PyVec::type) ||
      !PyType_IsSubtype(o2->ob_type, &PyVec::type)) {
    PyErr_SetString(PyExc_TypeError, "expected two Vec objects");
    return nullptr;
  }
  
  Vec &v = ((PyVec *) o1)->get(), &w = ((PyVec *) o2)->get();
  float i = v.y*w.z - v.z*w.y;
  float j = v.z*w.x - v.x*w.z;
  float k = v.x*w.y - v.y*w.x;

  PyObject *ret = PyVec::type.tp_new(&PyVec::type, nullptr, nullptr);

  PyObject *val = BuildValue(i, j, k);
  init_vec((PyVec *) ret, val, nullptr);
  Py_DECREF(val);

  return ret;
}
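One refinement worth considering (an assumption on my part, not in the original listing): a helper that allocates an Extention<T> through its tp_alloc slot and constructs the payload in place, assuming PyType_Ready() has already been called on the type.
#include <new>      // placement new
#include <utility>  // std::forward

template<typename T, typename...Args>
Extention<T> *make_extention(Args &&...args)
{
  using Self = Extention<T>;
  Self *self = (Self *) Self::type.tp_alloc(&Self::type, 0);
  if (self)
    new (self->ptr()) T{std::forward<Args>(args)...};
  return self;
}
With it, the tail of cross() could shrink to a single line,
  return (PyObject *) make_extention<Vec>(i, j, k);
skipping the round trip through BuildValue() and init_vec().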


Conclusions

Despite this being quite a long article, it has only scratched the surface of the Python API. There are many restrictions, and the C API certainly puts a cramp on C++'s style, but the moral of this story is that just because you need to work with a C API doesn't mean you can't use modern C++ techniques.