Friday, August 7, 2020

OpenGL tutorials can do better than raw pointer math.

 If you've ever followed an OpenGL tutorial, you probably saw code like this:

  float verts[] = {
    // Position   TexCoords
    -X, -Y, Z,    0.0f, 1.0f,
     X, -Y, Z,    1.0f, 1.0f,
     X,  Y, Z,    1.0f, 0.0f,
    -X,  Y, Z,    0.0f, 0.0f

  GLuint vbo = glGenBuffer();
  glBindBuffer(GL_ARRAY_BUFFER, vbo);
  glBufferData(GL_ARRAY_BUFFER, sizeof(verts), verts, GL_STATIC_DRAW);

So far not... terrible. The verts array may contain additional color information or random other things, but it'll basically look like this. Later, we have to tell the graphics card to actually draw this data, informing our GLSL shaders how to read it:

  GLint vertex_pos_attrib = glGetUniformLocation("vertex_pos");
  glVertexAttribPointer(vertex_pos_attrib, 3, GL_FLOAT,
                        GL_FALSE, 5 * sizeof(float), (void*)0); // A
  GLint tex_coord_attrib = glGetUniformLocation("tex_coord");
  glVertexAttribPointer(tex_coord_attrib, 2, GL_FLOAT, 
                        GL_FALSE, 5 * sizeof(float), (void*)(3 * sizeof(float)));

and this is where these tutorials seem to be dropping the ball. The call I marked with A is sending information to the graphics card that the GLSL variable, vertex_pos, should be filled with two floats, or 3 elements of data, a stride of 5 * sizeof(float) bytes between vertices, and 0 bytes offset from the beginning of the vertices array buffer. The next call passes in nearly identical information, but 2 elements and 3 * sizeof(float) bytes from the beginning. The extra (void*) cast is just converting the argument to the expected type.

This code is brittle for just too many reasons.

  1. If vertex information is added or removed, all the pointer offset math has to change.
  2. The same goes for if the datatype changes, like from an int32 to int64.
  3. If one forgets the * sizeof(float) part, there could be trouble when moving to other data types as one might guess the size in bytes wrong.
  4. The math gets more complicated if types of varying sizes are used.

Just use a struct

  struct Vertex {
    GLfloat pos[3];
    GLfloat tex_coords[2];

  float verts[] = {
    // Position     TexCoords
    {{-X, -Y, Z},  {0.0f, 1.0f}},
    {{ X, -Y, Z},  {1.0f, 1.0f}},
    {{ X,  Y, Z},  {1.0f, 0.0f}},
    {{-X,  Y, Z},  {0.0f, 0.0f}}

Now our data is more organized. We can share information about the binary layout of our code with other parts of our program, write functions for generating  points, constructors, and all sorts of good stuff. We can even use other, less basic data types, in theory. My code uses glm::vec3s here, for example.

We need the corresponding glVertexAttribPointer calls to change a bit, too. I've seen a number of attempts at this, sometimes with the idea that one can take a member pointer and deference it at null in order to get the offset, then convert it back into a pointer. Or one could just use offsetof(). Finally, we can achieve something like this:

                      sizeof(Vertex::tex_coords) / sizeof(GLfloat), GL_FLOAT, 
                      (void*)offsetof(Vertex, tex_coords));

and we're pretty close to something approaching ideal. It's unfortunate that we have to know that Vertex::tex_coords is made up of GLfloats, though. It could be advisable to define a global, constexpr unsigned int D = 3, which is used both instead of hardcoding in the Vertex definition and here instead of sizeof(), but the gains are marginal since we still have to pass in GL_FLOAT as the next parameter

Still, to improve further...


This is the actual code I use:

struct GlTraits<GLfloat> {
  static constexpr auto glType = GL_FLOAT;

template<typename RealType, typename T, typename Mem>
inline void vertexAttribPointer(GLuint index, GLboolean normalized,
                                const Mem T::*mem) {
  // This is basically C's offsetof() macro generalized to member pointers.
  RealType* pointer = (RealType*)&(((T*)nullptr)->*mem);
  glVertexAttribPointer(index, sizeof(Mem) / sizeof(RealType),
                        GlTraits<RealType>::glType, normalized, sizeof(T),

Mem here is often going to be a glm::vec3 or something similar, T being my vertex class. Unfortunately, RealType needs to be passed in as an explicit type since neither Mem nor T suggest what the OpenGL type actually is. Theoretically, if this library knew what a glm::vec3 was, it could also deduce the real type from the parameter, or one could specialize GlTraits for it and include a typedef or using statement that gave the "real" type.

I use the function like this:

  vertexAttribPointer<float>(tex_coord_attrib, GL_FALSE, &Vertex::tex_coord);
  vertexAttribPointer<float>(vertex_pos_attrib, GL_FALSE, &Vertex::pos);



Many OpenGL tutorials are suited for younger, less experienced programmers and so--giving their authors some credit--writers may not want to confuse readers with language features their readers hadn't seen before, like offsetof(), but I think this does a disservice to beginners as these tutorials are also authoritative on how good, well-written OpenGL code looks. They may not want to go into the template madness that I enjoy, but even tutorials targeting C or C++ can be significantly improved just by using a data structure for vertex information. Beginners should be taught to write resilient programs that resists the urge to hardcode sizes, types, and offsets based on their specifications at the time of writing and instead rely on the compiler more.

Brittle code can punish beginners when they try to expand their knowledge as the slightest changes can cause their screen to stop rendering, or render odd artifacts and this can be difficult to debug. At worst, it will cause segfaults.

Wednesday, July 29, 2020

The Entity Component System

Note: code highlighting thanks to

Note2: This blog is primarily on template meta-programming--when the opportunity presents--using it to solve practical problems. This post assumes knowledge of C++17 fold expressions. If the reader is unfamiliar, this may not be the best learning material, but if the reader gets the idea and wants a practical example, please read on.

Update July 30 2020: I realized a bit late a bug where non-const values weren't always respect in the final solution. I haven't tracked it down yet so just be warned.

Update July31: Fixed a bug that caused an infinite loop.

When I started programming, I made a simple game called Orbital Chaos and the code was organized very simply. 

class Actor {
  // ...

class Player : public Actor {
  // ...

class Enemy : public Actor {
  // ...

So each thing that an actor could do was represented in the base class and the derived classes overrode that behavior in classic OOP style. Those were simpler days.

I decided to rewrite my code to what I now understand is a fairly industry-standard idea: The entity component system or ECS for short. From what I gather, an ECS is a simple system where entities are merely ID's with a set of components which define all relevant information important to that entity. For example, its <X,Y> coordinates in the 2D space of a 2d game, the type of an entity ({TREE, ITEM, PLAYER, etc...}), perhaps information on how to draw it... basically anything that would have gone in Actor or Player.

Proponents of ECS systems will argue for being a better alternative for OOP, not requiring the base class to consider every possible action or attribute in the game and cache efficiency--storing like information close together so that iterations over all entities hopefully involve less cache misses. Sometimes it's also argued that using ID's to relate objects to each other is much safer than direct pointers. The alternative is often std::shared_ptr with locking. Lastly, entities are easier to extend, mix, match, and define than objects, not having to worry about hierarchies, mix-ins, and multiple inheritance to get the desired behavior.

Personally, I think writing a base class that considers every possible action is not good OOP, but doing it the "right" way is probably less appealing than the data-oriented ECS style. OOP works great for abstract objects with variations on behavior, but game entities tend to vary more on configuration. I think this simplicity sometimes comes at the cost of having to reinvent mix-ins and hierarchies, but one can still put pointers and virtual in an ECS by storing pointers. 

Information on ECS's on the internet varies in quality as do their implementations. I won't try to argue that I will present an objectively better implementation than any others, these are just my ideas and the story of my process to write one.

Ideas I didn't like

class Entity {
  GraphicsComponent gfx_component;
  PhysicsComponent phys_component;
  EtcComponent etc;
  // ...

I think this is closer to the "right" OOP solution as it encapsulates the complexity into smaller pieces which can then be as simple or intricate as necessary. In practice, the segmentation of code logic and how systems relate to each other may become unclear. In a grid-based game like mine, entities will have an integer <X,Y> position on the grid which maps to a very different position when rendering to the screen which may even by offset slightly by animations. How do components refer to each other? It encourages using virtual functions to customize behavior, which is very tidy, but also involve a lot of boilerplate code and isn't incredibly efficient. Lastly, it doesn't store like data together.

There's also the idea of having the Entity store a vector of component objects and it's one of the more common implementations I see, but this has the additional issue that one class needs to consider all the complexity of the game. For example, if each component object has a init(), update(), etc() functions, they have to understand how to update themselves and interact with the world, but the world needs to understand how to contain them leading to two mutually referential system. That can be fine, but in my previous attempt, I found this really complicated the definition of more interesting components.

using ComponentId = unsigned int;

template<typename T>
struct ComponentData {
    EntityId id;
    T data;

    ComponentData(EntityId id, T data)
        : id(id), data(std::move(data)) { }
class Buffer { char* data = nullptr; // ... }; class EntityComponentSystem { std::unordered_map<ComponentId, Buffer> components; };

I think this idea gets closer. Each component type has a unique ComponentId which is determined by the type system through template magic and each Buffer holds an array of these components that is manually managed by itself since std:: classes like vector need to know what type it contains and how large it is at compile time. This std::unordered_map could be replaced by an std::array if the number of different component  types are known in advance.

Actually, if the component types themselves are known and I happen to love tuples, we could just do this:

template<typename T>
struct ComponentData {
    EntityId id;
    T data;

    ComponentData(EntityId id, T data)
        : id(id), data(std::move(data)) { }

using EntityId = unsigned int;

class EntityComponentSystem {
  std::tuple<std::vector<ComponentData<Components>>...> components;

and that's what I ended up with in my first iteration (link). One big and notable downside is that now, if I have a grid position, for which I use my custom Vec<int> class, and a graphical position, which uses a Vec<int> as well, they'll be stored as the same component. It's tedious, but I implemented GraphicalPosition and GridPosition classes to differentiate.

It also means that the declaration of the ECS type needs to contain every component used in the game, but I don't find that too problematic.

From the above idea, we could construct a simple interface like this:

  EntityComponentSystem<Pos, TextureRef> ecs;
  auto e1 = iu.write_new_entity(Pos(1, 2), TextureRef(tex1));
  auto e2 = iu.write_new_entity(Pos(3, 4), TextureRef(tex2));
  for (auto id : ecs.ids()) {
    Pos p;
    TextureRef tref;
    if (, p, tref))
      draw(p, tref);

In the end, I use a much more efficient interface, but this gives an idea about how it might be used. e1 and e2 are entity ID's that have a Pos and TextureRef component which can be read back out in order to draw them onto the screen if the entity has both components.

One note, before we move on: Why an std::vector and not a hash table keyed by the entity ID? Many entity component systems do it this way and it's great for O(1) lookups, but the code I have planned out has little need for fast random access and hash tables don't make good use of CPU caches. If I run into performance issues, I can see if a hash fixes it, but for now I'll stick to this. I think the larger concern is that a particle system might require more efficient random deletion.

If only life was so simple...

Since my entity ID's monotonically increase (omitting int32 overflow), keeping the component vectors sorted is generally simple enough. I used std::lower_bound() to figure out where to insert new elements, just in case, but generally entities don't gain or lose components over time so they're almost always inserted at the end. The component arrays are essentially parallel where std::get<std::vector<ComponentData<X>>>[i] would resolve to the i'th X component of probably the i'th entity, and ditto for Y. However, not all entities have all components so some checking might be needed.

Say I want an algorithm that does this:

for (grid_pos, graphical_pos) in ecs.read_grid_and_graphical_pos() {
   graphical_pos = grid_pos * TILE_SIZE

Well, if I can iterate through one, I can do two, right? Start at the beginning, make sure they're pointing at the same element, and in the increment step, increment one of them and then the other until it has the same EntityId or higher and keep going back and forth until either hits the end or they both have the same ID again.

Great! Simple! Problem: I have potentially N iterators and they're all of different types. I want this syntax:

int sum = 0;
for (auto [id, i, u] : ecs.read_all<int, unsigned>()) sum += i + u,

Obviously, this is non-trivial with tuples since we can't store iterators to each container in a vector, we have to store them in yet another tuple.

My first idea was to have a range class parameterized by the containers (std::vector<ComponentData<T>>...) and then for it to define nested iterator classes parameterized by the iterator types of that container. The basic issue with this is that the container could be const or non-const, but I had to be able to differentiate since in one case, the iterator class would store std::vector<T>::iterator and in the other, ::const_iterator so I ended up writing it twice. But what I landed with is so much better.

template<typename F, typename...T>
constexpr auto tuple_map(F&& f, const std::tuple<T...>& t) {
  return std::tuple(f(std::get<T>(t))...);

template<typename F, typename...T>
auto tuple_map(F&& f, std::tuple<T...>& t) {
  return std::tuple(f(std::get<T>(t))...);

// A range abstraction that allows multiple component data series to be iterated
// lazily over in a range-based for loop.
template<typename StoreTuple>
class ComponentRange {
  // A tuple of references to component storage.
StoreTuple stores_; protected: template<typename IteratorTuple> struct Iterator { // ,,, IteratorTuple its; IteratorTuple ends; bool sentinel = false; // To know if all iterators are pointing to the same entity, we'll neet // to remember what entity that is. EntityId max_id; // ... Iterator(IteratorTuple its, IteratorTuple ends) : its(std::move(its)), ends(std::move(ends)) { // ... } Iterator() { sentinel = true; } }; public: explicit ComponentRange(StoreTuple stores) : stores_(stores) { } auto begin() const { auto b = [](auto&& store) { return store.begin(); }; auto e = [](auto&& store) { return store.end(); }; return Iterator(tuple_map(b, stores_), tuple_map(e, stores_)); } auto end() const { return decltype(begin())(); } };

The general idea is to be as unconstraining of types as possible so where ever we can get away with not specifying a type, we do. StoreTuple will generally resolve to something ugly like...

std::tuple<std::vector<ComponentData<X>>&, std::vector<ComponentData<Y>>&, ...>

tuple_map and tuple_foreach (the same function, it just doesn't return a new tuple) are going to be the bread and butter of this class. For reference about how difficult these functions were to write in C++11/14, I implemented them way back in "Fun With Tuples".

If any of the its is at the end, the whole tuples is, which made comparing  to the end a bit tricky, thus a sentinel is used. I think a variation on this idea could have been to consider two Iterators equal if any of their its are equal, though.

We can't iterate through the tuples easily  so it's best we work on them uniformly, but we don't want any advancing past the max_id, which is going to be the highest ID of an entity of all iterators. Just iterating once is easy.

    template<typename It>
    bool at_end(It it) const {
      return it == std::get<It>(ends);
    // Note that the alternative to the above is
    // std::get<std::decay_t<decltype(it)>>(ends) if auto-typed (e.g. lambda).

    template<typename Iter>
    void increment_iter(Iter& it) {
      if (!at_end(it)) ++it;
      if (!at_end(it)) max_id = std::max(it->id, max_id);

This handles the case where an iterator was under max_id, but in an attempt to catch up it went over and set a new maximum.

Next, we have to handle incrementing all the iterators until they all line up on one ID.

    // A helper to catch up iterators that are behind.
    void increment_if_lower_than_max_id() {
      auto impl = [&](auto& it) {
        if (!at_end(it) && it->id < max_id) increment_iter(it);
      tuple_foreach(impl, its);

    bool any_at_end() const {
      auto any = [](auto...args) { return (args || ...); };
      auto pred = [this](const auto& it) { return at_end(it); };
      return std::apply(any, tuple_map(pred, its));

    // True if all iterators point to the same ID.
    bool all_same() const {
      auto all = [](auto...args) { return (args && ...); };
      auto equals_max_id =
        [this](const auto& it) { return it->id == max_id; };
      return std::apply(all, tuple_map(equals_max_id, its));

    // After iteration, or on initialization, increment any iterators that
    // are behind.
    void catch_up() {
      while (!any_at_end() && !all_same()) increment_if_lower_than_max_id();
if (any_at_end()) its = ends; } Iterator& operator++() { // Increment at least once. tuple_foreach([this](auto& it) { increment_iter(it); }, its); catch_up(); return *this; } Iterator operator++(int) { Iterator old = *this; ++(*this); return old; }

And finally, comparisons.

    bool operator==(const Iterator& other) const {
      return sentinel ? other == *this : any_at_end();
    bool operator!=(const Iterator& other) const {
      return !(*this == other);

    auto operator*() const {
      auto data = [](const auto& it) -> auto& { return it->data; };
      return std::tuple_cat(std::tuple(max_id), tuple_map(data, its));

The final gotcha is that on construction, not all the iterators might be pointing to the same ID, so we have to call catch_up() immediately. The full code for the constructor looks like this:

    Iterator(IteratorTuple its, IteratorTuple ends)
      : its(std::move(its)), ends(std::move(ends)) {
        max_id = EntityId{0};
            [&](auto it) {
              if (!at_end(it)) max_id = std::max(max_id, it->id);
            }, its);
        catch_up();  // Ensure a valid initial starting position.

    Iterator() { sentinel = true; }


The implementation is incomplete at the time of this writing and some of the interface is inconsistent or unnecessary, but it's a start.

And for the project in which I'm using it: (Originally written in Python; the C++ conversion is on the cpp branch at the time of this writing.)

Writing code like this used to be tedious, difficult, and involve heaps of boilerplate, but at this point, fold expressions over a variable number of types and arguments feels easier than a for loop from i=0 to N.

The biggest frustrations for functional programming in C++ remain being things like how in order to take a template function and convert it to a callable function object, one has to wrap in a lambda and there exists weak partial application. For example, in our begin() function.

  auto begin() const {
    auto b = [](auto&& store) { return store.begin(); };
    auto e = [](auto&& store) { return store.end(); };
    return Iterator(tuple_map(b, stores_), tuple_map(e, stores_));

But things like fold expressions and class template argument deduction make things absolutely much easier.

As for the iterator class, the logic is much less trivial than I'd hope an iterator class to be, but I think the performance should be decent. If my program ends up spending more time incrementing iterators than using the values obtained, I think my bigger issue will be that my game's not doing much. Most of the iterations through the ECS should be of very similar entities and so in the general case, it should increment once and exit.

As for the entity class itself, I'll have to test it out a bunch. Obviously, I worry that adding or removing large numbers of entities could be problematic, but I imagine it'll be good enough for my purposes and there's plenty of room for optimization, like maintaining a free list of deleted entity IDs and component data, for one example. Still, when I'd used the same in Python, it worked well so it should be better here.